Compare commits

...

109 Commits

Author SHA1 Message Date
Brian Behlendorf
7cbe7bbbd4 Tag 2.3.0-rc4
Some checks failed
checkstyle / checkstyle (push) Has been cancelled
CodeQL / Analyze (cpp) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
zfs-qemu / Setup (push) Has been cancelled
zloop / zloop (push) Has been cancelled
zfs-qemu / qemu-x86 (push) Has been cancelled
zfs-qemu / Cleanup (push) Has been cancelled
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-12-12 16:20:30 -08:00
Chunwei Chen
2dcc8fe035 Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:20:30 -08:00
Alexander Motin
38875918d8 BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 16:20:30 -08:00
Rob Norris
0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
73a73cba71 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
87947f2440 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris
0e87150b6c cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Alexander Motin
7742e29387 Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 09:05:02 -08:00
Alexander Motin
f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Rob Norris
d874f27776 zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-05 09:33:21 -08:00
Alexander Motin
84d7d53e91 Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-05 09:33:21 -08:00
Alexander Motin
d90042dedb Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-05 09:33:21 -08:00
Mark Johnston
0e46085ee6 FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-05 09:33:21 -08:00
Mariusz Zaborski
3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
shodanshok
5988de77b0 Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-05 09:33:21 -08:00
Alexander Motin
00debc1361 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Alexander Motin
af10714e42 FreeBSD: Return ifndef IN_BASE back to fix the build
FreeBSD's libprocstat seems to build kernel code in user space,
which does not work here due to undefined vnode_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Rob Norris
747781a345 zinject(8): rename "ioctl" to "flush"
Doc bug missed in d7605ae77.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16827
2024-12-02 18:14:26 -08:00
Alexander Motin
b17ea73f9d Fix regression in dmu_buf_will_fill()
Direct I/O implementation added condition to call dbuf_undirty()
only in case of block cloning.  But the condition is not right if
the block is no longer dirty in this TXG, but still in DB_NOFILL
state.  It resulted in block not reverting to DB_UNCACHED and
following NULL de-reference on attempt to access absent db_data.

While there, add assertions for db_data to make debugging easier.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16829
2024-12-02 18:14:26 -08:00
Alexander Motin
17cdb7a2b1 Add missing parenthesis in VERIFYF()
Without them the order of operations might get unexpected.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16826
2024-12-02 18:14:26 -08:00
Pavel Snajdr
b673bcba4d Linux: fix zfs_uio_dio_check_for_zero_page
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16812 
Closes #16689
Closes #16642
2024-12-02 18:14:26 -08:00
Ivan Volosyuk
d8886275df Linux: Fix detection of register_sysctl_sz
Adjust the m4 function to mimic sentinel we use in spl-proc.c
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
2024-12-02 18:14:26 -08:00
Rob Norris
1aee375947 zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-12-02 18:14:26 -08:00
Pavel Snajdr
a1907b038a Assert if we're logging after final txg was set
This allowed to debug #16714, fixed in #16782.  Without assertions
added here it is difficult to figure out what logs cause the problem,
since the assertion happens in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
2024-12-02 18:14:26 -08:00
Alexander Motin
d359f7f547 FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks copy_file_range() source as shared.  FreeBSD was doing
it also, but then was changed to exclusive, partially because KPI
of that time was doing so, and partially seems out of caution.
Considering zfs_clone_range() uses range locks on both source and
destination, neither should require exclusive vnode locks. But one
step at a time, just sync it with Linux for now.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16789
Closes #16797
2024-12-02 18:14:26 -08:00
Alexander Motin
90603601b4 FreeBSD: Lock vnode in zfs_ioctl()
Previously vnode was not locked there, unlike Linux.  It required
locking it in vn_flush_cached_data(), which recursed on the lock
if called from zfs_clone_range(), having the vnode locked.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16789
Closes #16796
2024-12-02 18:14:26 -08:00
Pavel Snajdr
ecd0b1528e Linux: Fix zfs_prune panics
by protecting against sb->s_shrink eviction on umount with newer kernels

deactivate_locked_super calls shrinker_free and only then
sops->kill_sb cb, resulting in UAF on umount when trying
to reach for the shrinker functions in zpl_prune_sb of
in-umount dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16770
2024-12-02 18:14:26 -08:00
Brian Behlendorf
3ed1d608a8 Linux 6.12 compat: META
Update the META file to reflect compatibility with the 6.12 kernel.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16793
2024-11-21 08:24:37 -08:00
Alexander Motin
c165daa0b1 BRT: Clear bv_entcount_dirty on destroy
This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not.  Should not matter in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16791
2024-11-21 08:24:37 -08:00
Alexander Motin
1a5414ba2f BRT: More optimizations after per-vdev splitting
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications.  It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
 - Remove avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason.  Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point.  This dramatically reduces lock contention
on massive deletes of cloned blocks.  The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
 - Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
 - Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16773
2024-11-21 08:24:37 -08:00
Alexander Motin
409aad3f33 BRT: Rework structures and locks to be per-vdev
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks.  It created lots of lock contention in many workload.

This change makes most of block cloning data structures per-vdev,
which allows to lock them separately.  The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader.  Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process.  There should be
no functional difference aside of some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
1917c26944 ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
2b64d41be8 BRT: Don't call brt_pending_remove() on holes/embedded
We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin
9753feaa63 ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
tleydxdy
3c0b8da206 fix: block incompatible kernel from being installed
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes #16139
2024-11-21 08:24:37 -08:00
Mark Johnston
d7abeef621 zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16785
2024-11-21 08:24:37 -08:00
Brian Behlendorf
7fb7eb9a63 ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-21 08:24:37 -08:00
Alexander Motin
3f9af023f6 L2ARC: Stop rebuild before setting spa_final_txg
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782
2024-11-21 08:24:37 -08:00
Alexander Motin
f7675ae30f Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it.  If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16759
2024-11-21 08:24:37 -08:00
Rob Norris
920603990a Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16776
2024-11-21 08:24:37 -08:00
Steve Mokris
9a4b2f08d3 Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #16777
2024-11-21 08:24:37 -08:00
Alexander Motin
8023d9d4b5 Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780
2024-11-21 08:24:37 -08:00
Ameer Hamza
23f063d2e6 zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16751
2024-11-21 08:24:37 -08:00
Rob Norris
18474efeec AUTHORS: refresh with recent new contributors
Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16762
2024-11-15 15:08:59 -08:00
José Luis Salvador Rufo
260065099e tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:34 -08:00
Ameer Hamza
4c9f2cec46 zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16750
2024-11-14 16:52:10 -08:00
Mark Johnston
ee3677d321 Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dum_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16758
2024-11-14 16:52:06 -08:00
Rob Norris
0274a9a57d dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #6783
Closes #16708
Closes #16722 
Closes #16697
2024-11-14 16:52:02 -08:00
Alexander Motin
025f8b2e74 L2ARC: Move different stats updates earlier
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
Closes #16743
2024-11-14 16:51:58 -08:00
Mark Johnston
37e8f3ae17 Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:20 -08:00
Mark Johnston
7313c6e382 Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:17 -08:00
Umer Saleem
1c6b0302ef Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:12 -08:00
Umer Saleem
7e3af4658b JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:07 -08:00
Brian Behlendorf
1a54b13aaf Tag 2.3.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-11-07 11:33:51 -08:00
Umer Saleem
9061a4da0b JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:33:51 -08:00
Sam James
e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Brian Atkinson
8131793d6f Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:33:51 -08:00
Chunwei Chen
c82eb27b22 ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:54:32 -08:00
tstabrawa
661bb434e6 Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
tstabrawa
ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Uglymotha
b96845b632 Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:46:39 -08:00
Alexander Motin
55cbd1f9bd Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used.  Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.

Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes.  Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:46:39 -08:00
Rob Norris
880b73956b zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-04 16:46:39 -08:00
Tony Hutter
d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris
7546fbd6e9 zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 09:49:54 -07:00
Rich Ercolani
903d3f9187 Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-11-01 09:49:49 -07:00
Rob Norris
86b5853cfb vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-11-01 09:49:23 -07:00
Rob Norris
19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Serapheim Dimitropoulos
8ac70aade7 Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-11-01 09:49:05 -07:00
Alexander Motin
bbc0d34bfd On the first vdev open ignore impossible ashift hints
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-11-01 09:49:00 -07:00
Dimitry Andric
f3823a9ab2 Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-11-01 09:48:55 -07:00
Dimitry Andric
fd2cae969f Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-11-01 09:48:51 -07:00
Tony Hutter
f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Alexander Motin
7e3ce4efaa Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-11-01 09:48:42 -07:00
Tino Reichardt
77d81974b6 Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-11-01 09:48:37 -07:00
Brian Behlendorf
5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Rob Norris
ede715d1e4 spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function point down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably a just a consistency fix
anyway.

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content go with my intuition about the smallest, least invaisve
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- 1aad641c79
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 13:02:07 -07:00
Rob Norris
e30c69365d config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-21 13:02:07 -07:00
Rob Norris
78d39d91fa zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-21 13:02:07 -07:00
Tomohiro Kusumi
2b359c7824 Fix compile-time warnings caused by duplicate struct typedefs
Some compiler/versions warn these typedefs according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-21 13:02:07 -07:00
Alexander Motin
ace2e17a9b zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-21 13:02:07 -07:00
Rob Norris
b4cd10ce5b libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however unseful it may be), and fix a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
f52d7aaaac libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris
d5db840260 libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #16653
2024-10-21 13:02:07 -07:00
Rob Norris
bcd61d9579 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Umer Saleem
36a67b50a2 Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-21 13:02:07 -07:00
Warner Losh
3d9129a7b6 freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-21 13:02:07 -07:00
Brian Behlendorf
0409c47fe0 Tag 2.3.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-13 19:25:58 -07:00
Tino Reichardt
b5a3825244 ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Tino Reichardt
77df762a1b ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit to lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parralel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Brian Behlendorf
56871e465a Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fallback to strerror().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-13 19:25:58 -07:00
Brian Behlendorf
c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf
5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Brian Behlendorf
7f830d783b CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 16:21:25 -07:00
Martin Matuška
58162960a1 zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 16:21:25 -07:00
Rob Norris
666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson
26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Martin Matuška
774dcba86d FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 13:45:02 -07:00
Brian Behlendorf
09f6b2ebe3 ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-09 13:44:58 -07:00
Matthew Heller
2609d93b65 vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-09 13:44:55 -07:00
Brian Behlendorf
10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
Umer Saleem
0df10dc911 Only serialize native-deb* targets
.NOTPARALLEL target is being forced on userspace as well. This commit
removes .NOTPARALEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-09 13:44:46 -07:00
Rob Norris
0fbe9d352c zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-09 13:44:42 -07:00
Martin Matuška
84f44ec07f Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-09 13:44:36 -07:00
Shengqi Chen
fc9608e2e6 Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:32 -07:00
Shengqi Chen
d32c05949a contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:26 -07:00
JKDingwall
1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Tino Reichardt
f019b445f3 ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-09 13:44:18 -07:00
Tino Reichardt
03822a61be ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Lets remove it for now.

Update also the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-09 13:43:46 -07:00
196 changed files with 3419 additions and 2213 deletions

View File

@ -11,7 +11,7 @@ concurrency:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
permissions:
actions: read
contents: read

View File

@ -18,19 +18,21 @@ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# /etc/ksmtuned.conf - Configuration file for ksmtuned
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=10
KSM_NPAGES_BOOST=300
KSM_NPAGES_DECAY=-50
KSM_NPAGES_MIN=64
KSM_NPAGES_MAX=2048
KSM_SLEEP_MSEC=30
KSM_THRES_COEF=25
KSM_THRES_CONST=2048
KSM_NPAGES_BOOST=0
KSM_NPAGES_DECAY=0
KSM_NPAGES_MIN=1000
KSM_NPAGES_MAX=25000
KSM_THRES_COEF=80
KSM_THRES_CONST=8192
LOGFILE=/var/log/ksmtuned.log
DEBUG=1

View File

@ -14,7 +14,7 @@ OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-09-16"
FREEBSD="$REPO/releases/download/v2024-10-05"
URLzs=""
# Ubuntu mirrors
@ -52,43 +52,55 @@ case "$OS" in
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
fedora39)
OSNAME="Fedora 39"
OSv="fedora39"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2"
;;
fedora40)
OSNAME="Fedora 40"
OSv="fedora39"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
;;
freebsd13r)
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2"
;;
freebsd13-3r)
OSNAME="FreeBSD 13.3-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.3-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13-4r)
OSNAME="FreeBSD 13.4-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13)
OSNAME="FreeBSD 13.4-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
freebsd14-0r)
OSNAME="FreeBSD 14.0-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.0-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd14r)
freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14)
freebsd13-4s)
OSNAME="FreeBSD 13.4-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-1s)
OSNAME="FreeBSD 14.1-STABLE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd15)
freebsd15-0c)
OSNAME="FreeBSD 15.0-CURRENT"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"

View File

@ -13,10 +13,10 @@ function archlinux() {
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio dhclient dkms fakeroot \
fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils parted \
pax perf python-packaging python-setuptools qemu-guest-agent ksh samba \
sysstat rng-tools rsync wget xxhash
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
@ -30,11 +30,11 @@ function debian() {
echo "##[group]Install Development Tools"
sudo apt-get install -y \
acl alien attr autoconf bc cpio curl dbench dh-python dkms fakeroot \
fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev libaio-dev \
libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev libelf-dev \
libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev libtool \
libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
acl alien attr autoconf bc cpio cryptsetup curl dbench dh-python dkms \
fakeroot fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev \
libaio-dev libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev \
libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
@ -66,16 +66,23 @@ function rhel() {
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo dnf group install -y "Development Tools"
# Alma wants "Development Tools", Fedora 41 wants "development-tools"
if ! sudo dnf group install -y "Development Tools" ; then
echo "Trying 'development-tools' instead of 'Development Tools'"
sudo dnf group install -y development-tools
fi
sudo dnf install -y \
acl attr bc bzip2 curl dbench dkms elfutils-libelf-devel fio gdb git \
jq kernel-rpm-macros ksh libacl-devel libaio-devel libargon2-devel \
libattr-devel libblkid-devel libcurl-devel libffi-devel ncompress \
libselinux-devel libtirpc-devel libtool libudev-devel libuuid-devel \
lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester parted perf \
python3 python3-cffi python3-devel python3-packaging kernel-devel \
python3-setuptools qemu-guest-agent rng-tools rpcgen rpm-build rsync \
samba sysstat systemd watchdog wget xfsprogs-devel xxhash zlib-devel
acl attr bc bzip2 cryptsetup curl dbench dkms elfutils-libelf-devel fio \
gdb git jq kernel-rpm-macros ksh libacl-devel libaio-devel \
libargon2-devel libattr-devel libblkid-devel libcurl-devel libffi-devel \
ncompress libselinux-devel libtirpc-devel libtool libudev-devel \
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
zlib-devel
echo "##[endgroup]"
}
@ -111,6 +118,7 @@ case "$1" in
archlinux
;;
debian*)
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
debian
echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms

View File

@ -83,7 +83,7 @@ function rpm_build_and_install() {
echo "##[endgroup]"
echo "##[group]Install"
run sudo dnf -y --skip-broken localinstall $(ls *.rpm | grep -v src.rpm)
run sudo dnf -y --nobest install $(ls *.rpm | grep -v src.rpm)
echo "##[endgroup]"
}

View File

@ -14,17 +14,21 @@ PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
# definitions of per operating system
# default values per test vm:
VMs=2
CPU=2
# cpu pinning
CPUSET=("0,1" "2,3")
case "$OS" in
freebsd*)
VMs=2
CPU=3
# FreeBSD can't be optimized via ksmtuned
RAM=6
;;
*)
VMs=2
CPU=3
RAM=7
# Linux can be optimized via ksmtuned
RAM=8
;;
esac
@ -73,6 +77,7 @@ EOF
--cpu host-passthrough \
--virt-type=kvm --hvm \
--vcpus=$CPU,sockets=1 \
--cpuset=${CPUSET[$((i-1))]} \
--memory $((1024*RAM)) \
--memballoon model=virtio \
--graphics none \

View File

@ -11,12 +11,10 @@ function output() {
}
function outfile() {
test -s "$1" || return
cat "$1" >> "out-$logfile.md"
}
function outfile_plain() {
test -s "$1" || return
output "<pre>"
cat "$1" >> "out-$logfile.md"
output "</pre>"
@ -45,6 +43,8 @@ if [ ! -f out-1.md ]; then
tar xf "$tarfile"
test -s env.txt || continue
source env.txt
# when uname.txt is there, the other files are also ok
test -s uname.txt || continue
output "\n## Functional Tests: $OSNAME\n"
outfile_plain uname.txt
outfile_plain summary.txt

View File

@ -22,8 +22,8 @@ jobs:
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora39", "fedora40", "freebsd13", "freebsd13r", "freebsd14", "freebsd14r", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora40", "freebsd13", "freebsd14", "ubuntu24"]'
FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora40", "fedora41", "freebsd13-4r", "freebsd14-0r", "freebsd14-1s", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora41", "freebsd13-3r", "freebsd14-1r", "ubuntu24"]'
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
@ -46,10 +46,12 @@ jobs:
strategy:
fail-fast: false
matrix:
# all:
# os: [almalinux8, almalinux9, archlinux, centos-stream9, fedora39, fedora40, debian11, debian12, freebsd13, freebsd13r, freebsd14, freebsd14r, freebsd15, ubuntu20, ubuntu22, ubuntu24]
# openzfs:
# os: [almalinux8, almalinux9, centos-stream9, debian11, debian12, fedora39, fedora40, freebsd13, freebsd13r, freebsd14, freebsd14r, ubuntu20, ubuntu22, ubuntu24]
# rhl: almalinux8, almalinux9, centos-stream9, fedora40, fedora41
# debian: debian11, debian12, ubuntu20, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD Release: freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r
# FreeBSD Stable: freebsd13-4s, freebsd14-1s
# FreeBSD Current: freebsd15-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04
steps:

View File

@ -70,6 +70,7 @@ Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
@ -78,6 +79,7 @@ Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>

View File

@ -423,6 +423,7 @@ CONTRIBUTORS:
Mathieu Velten <matmaul@gmail.com>
Matt Fiddaman <github@m.fiddaman.uk>
Matthew Ahrens <matt@delphix.com>
Matthew Heller <matthew.f.heller@gmail.com>
Matthew Thode <mthode@mthode.org>
Matthias Blankertz <matthias@blankertz.org>
Matt Johnston <matt@fugro-fsi.com.au>
@ -562,6 +563,7 @@ CONTRIBUTORS:
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
@ -574,6 +576,7 @@ CONTRIBUTORS:
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Sietse <sietse@wizdom.nu>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
@ -629,6 +632,7 @@ CONTRIBUTORS:
Trevor Bautista <trevrb@trevrb.net>
Trey Dockendorf <treydock@gmail.com>
Troels Nørgaard <tnn@tradeshift.com>
tstabrawa <tstabrawa@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>

4
META
View File

@ -2,9 +2,9 @@ Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.0
Release: rc1
Release: rc4
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.11
Linux-Maximum: 6.12
Linux-Minimum: 4.18

View File

@ -662,10 +662,7 @@ def section_arc(kstats_dict):
print()
print('ARC hash breakdown:')
prt_i1('Elements max:', f_hits(arc_stats['hash_elements_max']))
prt_i2('Elements current:',
f_perc(arc_stats['hash_elements'], arc_stats['hash_elements_max']),
f_hits(arc_stats['hash_elements']))
prt_i1('Elements:', f_hits(arc_stats['hash_elements']))
prt_i1('Collisions:', f_hits(arc_stats['hash_collisions']))
prt_i1('Chain max:', f_hits(arc_stats['hash_chain_max']))

View File

@ -1131,7 +1131,7 @@ dump_zap(objset_t *os, uint64_t object, void *data, size_t size)
!!(zap_getflags(zc.zc_zap) & ZAP_FLAG_UINT64_KEY);
if (key64)
(void) printf("\t\t0x%010lx = ",
(void) printf("\t\t0x%010" PRIu64 "x = ",
*(uint64_t *)attrp->za_name);
else
(void) printf("\t\t%s = ", attrp->za_name);
@ -1967,17 +1967,53 @@ dump_dedup_ratio(const ddt_stat_t *dds)
static void
dump_ddt_log(ddt_t *ddt)
{
if (ddt->ddt_version != DDT_VERSION_FDT ||
!(ddt->ddt_flags & DDT_FLAG_LOG))
return;
for (int n = 0; n < 2; n++) {
ddt_log_t *ddl = &ddt->ddt_log[n];
char flagstr[64] = {0};
if (ddl->ddl_flags > 0) {
flagstr[0] = ' ';
int c = 1;
if (ddl->ddl_flags & DDL_FLAG_FLUSHING)
c += strlcpy(&flagstr[c], " FLUSHING",
sizeof (flagstr) - c);
if (ddl->ddl_flags & DDL_FLAG_CHECKPOINT)
c += strlcpy(&flagstr[c], " CHECKPOINT",
sizeof (flagstr) - c);
if (ddl->ddl_flags &
~(DDL_FLAG_FLUSHING|DDL_FLAG_CHECKPOINT))
c += strlcpy(&flagstr[c], " UNKNOWN",
sizeof (flagstr) - c);
flagstr[1] = '[';
flagstr[c++] = ']';
}
uint64_t count = avl_numnodes(&ddl->ddl_tree);
if (count == 0)
continue;
printf(DMU_POOL_DDT_LOG ": %lu log entries\n",
zio_checksum_table[ddt->ddt_checksum].ci_name, n, count);
printf(DMU_POOL_DDT_LOG ": flags=0x%02x%s; obj=%llu; "
"len=%llu; txg=%llu; entries=%llu\n",
zio_checksum_table[ddt->ddt_checksum].ci_name, n,
ddl->ddl_flags, flagstr,
(u_longlong_t)ddl->ddl_object,
(u_longlong_t)ddl->ddl_length,
(u_longlong_t)ddl->ddl_first_txg, (u_longlong_t)count);
if (dump_opt['D'] < 4)
if (ddl->ddl_flags & DDL_FLAG_CHECKPOINT) {
const ddt_key_t *ddk = &ddl->ddl_checkpoint;
printf(" checkpoint: "
"%016llx:%016llx:%016llx:%016llx:%016llx\n",
(u_longlong_t)ddk->ddk_cksum.zc_word[0],
(u_longlong_t)ddk->ddk_cksum.zc_word[1],
(u_longlong_t)ddk->ddk_cksum.zc_word[2],
(u_longlong_t)ddk->ddk_cksum.zc_word[3],
(u_longlong_t)ddk->ddk_prop);
}
if (count == 0 || dump_opt['D'] < 4)
continue;
ddt_lightweight_entry_t ddlwe;
@ -1991,7 +2027,7 @@ dump_ddt_log(ddt_t *ddt)
}
static void
dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
dump_ddt_object(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
{
char name[DDT_NAMELEN];
ddt_lightweight_entry_t ddlwe;
@ -2016,11 +2052,8 @@ dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
ddt_object_name(ddt, type, class, name);
(void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
name,
(u_longlong_t)count,
(u_longlong_t)dspace,
(u_longlong_t)mspace);
(void) printf("%s: dspace=%llu; mspace=%llu; entries=%llu\n", name,
(u_longlong_t)dspace, (u_longlong_t)mspace, (u_longlong_t)count);
if (dump_opt['D'] < 3)
return;
@ -2043,24 +2076,52 @@ dump_ddt(ddt_t *ddt, ddt_type_t type, ddt_class_t class)
(void) printf("\n");
}
static void
dump_ddt(ddt_t *ddt)
{
if (!ddt || ddt->ddt_version == DDT_VERSION_UNCONFIGURED)
return;
char flagstr[64] = {0};
if (ddt->ddt_flags > 0) {
flagstr[0] = ' ';
int c = 1;
if (ddt->ddt_flags & DDT_FLAG_FLAT)
c += strlcpy(&flagstr[c], " FLAT",
sizeof (flagstr) - c);
if (ddt->ddt_flags & DDT_FLAG_LOG)
c += strlcpy(&flagstr[c], " LOG",
sizeof (flagstr) - c);
if (ddt->ddt_flags & ~DDT_FLAG_MASK)
c += strlcpy(&flagstr[c], " UNKNOWN",
sizeof (flagstr) - c);
flagstr[1] = '[';
flagstr[c] = ']';
}
printf("DDT-%s: version=%llu [%s]; flags=0x%02llx%s; rootobj=%llu\n",
zio_checksum_table[ddt->ddt_checksum].ci_name,
(u_longlong_t)ddt->ddt_version,
(ddt->ddt_version == 0) ? "LEGACY" :
(ddt->ddt_version == 1) ? "FDT" : "UNKNOWN",
(u_longlong_t)ddt->ddt_flags, flagstr,
(u_longlong_t)ddt->ddt_dir_object);
for (ddt_type_t type = 0; type < DDT_TYPES; type++)
for (ddt_class_t class = 0; class < DDT_CLASSES; class++)
dump_ddt_object(ddt, type, class);
dump_ddt_log(ddt);
}
static void
dump_all_ddts(spa_t *spa)
{
ddt_histogram_t ddh_total = {{{0}}};
ddt_stat_t dds_total = {0};
for (enum zio_checksum c = 0; c < ZIO_CHECKSUM_FUNCTIONS; c++) {
ddt_t *ddt = spa->spa_ddt[c];
if (!ddt || ddt->ddt_version == DDT_VERSION_UNCONFIGURED)
continue;
for (ddt_type_t type = 0; type < DDT_TYPES; type++) {
for (ddt_class_t class = 0; class < DDT_CLASSES;
class++) {
dump_ddt(ddt, type, class);
}
}
dump_ddt_log(ddt);
}
for (enum zio_checksum c = 0; c < ZIO_CHECKSUM_FUNCTIONS; c++)
dump_ddt(spa->spa_ddt[c]);
ddt_get_dedup_stats(spa, &dds_total);
@ -2119,9 +2180,6 @@ dump_brt(spa_t *spa)
return;
}
brt_t *brt = spa->spa_brt;
VERIFY(brt);
char count[32], used[32], saved[32];
zdb_nicebytes(brt_get_used(spa), used, sizeof (used));
zdb_nicebytes(brt_get_saved(spa), saved, sizeof (saved));
@ -2132,11 +2190,8 @@ dump_brt(spa_t *spa)
if (dump_opt['T'] < 2)
return;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL)
continue;
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (!brtvd->bv_initiated) {
printf("BRT: vdev %" PRIu64 ": empty\n", vdevid);
continue;
@ -2152,34 +2207,54 @@ dump_brt(spa_t *spa)
if (dump_opt['T'] < 3)
return;
/* -TTT shows a per-vdev histograms; -TTTT shows all entries */
boolean_t do_histo = dump_opt['T'] == 3;
char dva[64];
if (!do_histo)
printf("\n%-16s %-10s\n", "DVA", "REFCNT");
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd == NULL || !brtvd->bv_initiated)
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (!brtvd->bv_initiated)
continue;
uint64_t counts[64] = {};
zap_cursor_t zc;
zap_attribute_t *za = zap_attribute_alloc();
for (zap_cursor_init(&zc, brt->brt_mos, brtvd->bv_mos_entries);
for (zap_cursor_init(&zc, spa->spa_meta_objset,
brtvd->bv_mos_entries);
zap_cursor_retrieve(&zc, za) == 0;
zap_cursor_advance(&zc)) {
uint64_t refcnt;
VERIFY0(zap_lookup_uint64(brt->brt_mos,
VERIFY0(zap_lookup_uint64(spa->spa_meta_objset,
brtvd->bv_mos_entries,
(const uint64_t *)za->za_name, 1,
za->za_integer_length, za->za_num_integers,
&refcnt));
uint64_t offset = *(const uint64_t *)za->za_name;
if (do_histo)
counts[highbit64(refcnt)]++;
else {
uint64_t offset =
*(const uint64_t *)za->za_name;
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx", vdevid,
(u_longlong_t)offset);
printf("%-16s %-10llu\n", dva, (u_longlong_t)refcnt);
snprintf(dva, sizeof (dva), "%" PRIu64 ":%llx",
vdevid, (u_longlong_t)offset);
printf("%-16s %-10llu\n", dva,
(u_longlong_t)refcnt);
}
}
zap_cursor_fini(&zc);
zap_attribute_free(za);
if (do_histo) {
printf("\nBRT: vdev %" PRIu64
": DVAs with 2^n refcnts:\n", vdevid);
dump_histogram(counts, 64, 0);
}
}
}
@ -4266,6 +4341,10 @@ dump_uberblock(uberblock_t *ub, const char *header, const char *footer)
(void) printf("\ttimestamp = %llu UTC = %s",
(u_longlong_t)ub->ub_timestamp, ctime(&timestamp));
char blkbuf[BP_SPRINTF_LEN];
snprintf_blkptr(blkbuf, sizeof (blkbuf), &ub->ub_rootbp);
(void) printf("\tbp = %s\n", blkbuf);
(void) printf("\tmmp_magic = %016llx\n",
(u_longlong_t)ub->ub_mmp_magic);
if (MMP_VALID(ub)) {
@ -6874,7 +6953,7 @@ iterate_deleted_livelists(spa_t *spa, ll_iter_t func, void *arg)
for (zap_cursor_init(&zc, mos, zap_obj);
zap_cursor_retrieve(&zc, attrp) == 0;
(void) zap_cursor_advance(&zc)) {
dsl_deadlist_open(&ll, mos, attrp->za_first_integer);
VERIFY0(dsl_deadlist_open(&ll, mos, attrp->za_first_integer));
func(&ll, arg);
dsl_deadlist_close(&ll);
}
@ -8204,16 +8283,13 @@ dump_mos_leaks(spa_t *spa)
}
}
if (spa->spa_brt != NULL) {
brt_t *brt = spa->spa_brt;
for (uint64_t vdevid = 0; vdevid < brt->brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = &brt->brt_vdevs[vdevid];
if (brtvd != NULL && brtvd->bv_initiated) {
for (uint64_t vdevid = 0; vdevid < spa->spa_brt_nvdevs; vdevid++) {
brt_vdev_t *brtvd = spa->spa_brt_vdevs[vdevid];
if (brtvd->bv_initiated) {
mos_obj_refd(brtvd->bv_mos_brtvdev);
mos_obj_refd(brtvd->bv_mos_entries);
}
}
}
/*
* Visit all allocated objects and make sure they are referenced.

View File

@ -67,19 +67,19 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
const lr_create_t *lrc = arg;
const _lr_create_t *lr = &lrc->lr_create;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
const char *name, *link;
lr_attr_t *lrattr;
name = (char *)(lr + 1);
name = (const char *)&lrc->lr_data[0];
if (lr->lr_common.lrc_txtype == TX_CREATE_ATTR ||
lr->lr_common.lrc_txtype == TX_MKDIR_ATTR) {
lrattr = (lr_attr_t *)(lr + 1);
lrattr = (lr_attr_t *)&lrc->lr_data[0];
name += ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
}
if (txtype == TX_SYMLINK) {
link = name + strlen(name) + 1;
link = (const char *)&lrc->lr_data[strlen(name) + 1];
(void) printf("%s%s -> %s\n", tab_prefix, name, link);
} else if (txtype != TX_MKXATTR) {
(void) printf("%s%s\n", tab_prefix, name);
@ -104,7 +104,7 @@ zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
(u_longlong_t)lr->lr_doid, (const char *)&lr->lr_data[0]);
}
static void
@ -115,7 +115,7 @@ zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1));
(const char *)&lr->lr_data[0]);
}
static void
@ -124,8 +124,8 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_rename_t *lrr = arg;
const _lr_rename_t *lr = &lrr->lr_rename;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
const char *snm = (const char *)&lrr->lr_data[0];
const char *tnm = (const char *)&lrr->lr_data[strlen(snm) + 1];
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
@ -211,7 +211,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
abd_copy_from_buf(data, &lr->lr_data[0], lr->lr_length);
}
(void) printf("%s", tab_prefix);
@ -309,7 +309,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_setsaxattr_t *lr = arg;
char *name = (char *)(lr + 1);
const char *name = (const char *)&lr->lr_data[0];
(void) printf("%sfoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid);
@ -318,7 +318,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sXAT_VALUE NULL\n", tab_prefix);
} else {
(void) printf("%sXAT_VALUE ", tab_prefix);
char *val = name + (strlen(name) + 1);
const char *val = (const char *)&lr->lr_data[strlen(name) + 1];
for (int i = 0; i < lr->lr_size; i++) {
(void) printf("%c", *val);
val++;

View File

@ -445,8 +445,8 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* its a loopback event from spa_async_remove(). Just
* ignore it.
*/
if (vs->vs_state == VDEV_STATE_REMOVED &&
state == VDEV_STATE_REMOVED)
if ((vs->vs_state == VDEV_STATE_REMOVED && state ==
VDEV_STATE_REMOVED) || vs->vs_state == VDEV_STATE_OFFLINE)
return;
/* Remove the vdev since device is unplugged */

View File

@ -139,7 +139,8 @@ dev_event_nvlist(struct udev_device *dev)
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
if (parent_dev != NULL &&
(value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;

View File

@ -2162,6 +2162,7 @@ zfs_do_get(int argc, char **argv)
cb.cb_type = ZFS_TYPE_DATASET;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0}
};
@ -3760,8 +3761,13 @@ collect_dataset(zfs_handle_t *zhp, list_cbdata_t *cb)
if (cb->cb_json) {
if (pl->pl_prop == ZFS_PROP_NAME)
continue;
const char *prop_name;
if (pl->pl_prop != ZPROP_USERPROP)
prop_name = zfs_prop_to_name(pl->pl_prop);
else
prop_name = pl->pl_user_prop;
if (zprop_nvlist_one_property(
zfs_prop_to_name(pl->pl_prop), propstr,
prop_name, propstr,
sourcetype, source, NULL, props,
cb->cb_json_as_int) != 0)
nomem();
@ -3852,6 +3858,7 @@ zfs_do_list(int argc, char **argv)
nvlist_t *data = NULL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZFS_OPTION_JSON_NUMS_AS_INT},
{0, 0, 0, 0}
};
@ -7436,9 +7443,15 @@ share_mount(int op, int argc, char **argv)
uint_t nthr;
jsobj = data = item = NULL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":ajRlvo:Of" : "al"))
!= -1) {
while ((c = getopt_long(argc, argv,
op == OP_MOUNT ? ":ajRlvo:Of" : "al",
op == OP_MOUNT ? long_options : NULL, NULL)) != -1) {
switch (c) {
case 'a':
do_all = 1;
@ -8374,8 +8387,14 @@ zfs_do_channel_program(int argc, char **argv)
boolean_t sync_flag = B_TRUE, json_output = B_FALSE;
zpool_handle_t *zhp;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "nt:m:j")) != -1) {
while ((c = getopt_long(argc, argv, "nt:m:j", long_options,
NULL)) != -1) {
switch (c) {
case 't':
case 'm': {
@ -9083,7 +9102,13 @@ zfs_do_version(int argc, char **argv)
int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{0, 0, 0, 0}
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) {
case 'j':
json = B_TRUE;
@ -9187,7 +9212,7 @@ main(int argc, char **argv)
* Special case '-V|--version'
*/
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zfs_do_version(argc, argv));
return (zfs_version_print() != 0);
/*
* Special case 'help'

View File

@ -512,7 +512,8 @@ get_usage(zpool_help_t idx)
return (gettext("\tinitialize [-c | -s | -u] [-w] <pool> "
"[<device> ...]\n"));
case HELP_SCRUB:
return (gettext("\tscrub [-s | -p] [-w] [-e] <pool> ...\n"));
return (gettext("\tscrub [-e | -s | -p | -C] [-w] "
"<pool> ...\n"));
case HELP_RESILVER:
return (gettext("\tresilver <pool> ...\n"));
case HELP_TRIM:
@ -6882,8 +6883,13 @@ collect_pool(zpool_handle_t *zhp, list_cbdata_t *cb)
if (cb->cb_json) {
if (pl->pl_prop == ZPOOL_PROP_NAME)
continue;
const char *prop_name;
if (pl->pl_prop != ZPROP_USERPROP)
prop_name = zpool_prop_to_name(pl->pl_prop);
else
prop_name = pl->pl_user_prop;
(void) zprop_nvlist_one_property(
zpool_prop_to_name(pl->pl_prop), propstr,
prop_name, propstr,
sourcetype, NULL, NULL, props, cb->cb_json_as_int);
} else {
/*
@ -7340,6 +7346,7 @@ zpool_do_list(int argc, char **argv)
current_prop_type = ZFS_TYPE_POOL;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID},
@ -7965,8 +7972,11 @@ zpool_do_online(int argc, char **argv)
poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL)
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1);
}
for (i = 1; i < argc; i++) {
vdev_state_t oldstate;
@ -7987,12 +7997,15 @@ zpool_do_online(int argc, char **argv)
&l2cache, NULL);
if (tgt == NULL) {
ret = 1;
(void) fprintf(stderr, gettext("couldn't find device "
"\"%s\" in pool \"%s\"\n"), argv[i], poolname);
continue;
}
uint_t vsc;
oldstate = ((vdev_stat_t *)fnvlist_lookup_uint64_array(tgt,
ZPOOL_CONFIG_VDEV_STATS, &vsc))->vs_state;
if (zpool_vdev_online(zhp, argv[i], flags, &newstate) == 0) {
if ((rc = zpool_vdev_online(zhp, argv[i], flags,
&newstate)) == 0) {
if (newstate != VDEV_STATE_HEALTHY) {
(void) printf(gettext("warning: device '%s' "
"onlined, but remains in faulted state\n"),
@ -8018,6 +8031,9 @@ zpool_do_online(int argc, char **argv)
}
}
} else {
(void) fprintf(stderr, gettext("Failed to online "
"\"%s\" in pool \"%s\": %d\n"),
argv[i], poolname, rc);
ret = 1;
}
}
@ -8102,8 +8118,11 @@ zpool_do_offline(int argc, char **argv)
poolname = argv[0];
if ((zhp = zpool_open(g_zfs, poolname)) == NULL)
if ((zhp = zpool_open(g_zfs, poolname)) == NULL) {
(void) fprintf(stderr, gettext("failed to open pool "
"\"%s\""), poolname);
return (1);
}
for (i = 1; i < argc; i++) {
uint64_t guid = zpool_vdev_path_to_guid(zhp, argv[i]);
@ -8411,12 +8430,13 @@ wait_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool scrub [-s | -p] [-w] [-e] <pool> ...
* zpool scrub [-e | -s | -p | -C] [-w] <pool> ...
*
* -e Only scrub blocks in the error log.
* -s Stop. Stops any in-progress scrub.
* -p Pause. Pause in-progress scrub.
* -w Wait. Blocks until scrub has completed.
* -C Scrub from last saved txg.
*/
int
zpool_do_scrub(int argc, char **argv)
@ -8432,9 +8452,10 @@ zpool_do_scrub(int argc, char **argv)
boolean_t is_error_scrub = B_FALSE;
boolean_t is_pause = B_FALSE;
boolean_t is_stop = B_FALSE;
boolean_t is_txg_continue = B_FALSE;
/* check options */
while ((c = getopt(argc, argv, "spwe")) != -1) {
while ((c = getopt(argc, argv, "spweC")) != -1) {
switch (c) {
case 'e':
is_error_scrub = B_TRUE;
@ -8448,6 +8469,9 @@ zpool_do_scrub(int argc, char **argv)
case 'w':
wait = B_TRUE;
break;
case 'C':
is_txg_continue = B_TRUE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@ -8459,6 +8483,18 @@ zpool_do_scrub(int argc, char **argv)
(void) fprintf(stderr, gettext("invalid option "
"combination :-s and -p are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_pause && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-p and -C are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_stop && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-s and -C are mutually exclusive\n"));
usage(B_FALSE);
} else if (is_error_scrub && is_txg_continue) {
(void) fprintf(stderr, gettext("invalid option "
"combination :-e and -C are mutually exclusive\n"));
usage(B_FALSE);
} else {
if (is_error_scrub)
cb.cb_type = POOL_SCAN_ERRORSCRUB;
@ -8467,6 +8503,8 @@ zpool_do_scrub(int argc, char **argv)
cb.cb_scrub_cmd = POOL_SCRUB_PAUSE;
} else if (is_stop) {
cb.cb_type = POOL_SCAN_NONE;
} else if (is_txg_continue) {
cb.cb_scrub_cmd = POOL_SCRUB_FROM_LAST_TXG;
} else {
cb.cb_scrub_cmd = POOL_SCRUB_NORMAL;
}
@ -9224,6 +9262,12 @@ vdev_stats_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *nv,
}
}
if (cb->cb_print_dio_verify) {
nice_num_str_nvlist(vds, "dio_verify_errors",
vs->vs_dio_verify_errors, cb->cb_literal,
cb->cb_json_as_int, ZFS_NICENUM_1024);
}
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
&notpresent) == 0) {
nice_num_str_nvlist(vds, ZPOOL_CONFIG_NOT_PRESENT,
@ -10010,9 +10054,8 @@ print_removal_status(zpool_handle_t *zhp, pool_removal_stat_t *prs)
(void) printf(gettext("Removal of %s canceled on %s"),
vdev_name, ctime(&end));
} else {
uint64_t copied, total, elapsed, mins_left, hours_left;
uint64_t copied, total, elapsed, rate, mins_left, hours_left;
double fraction_done;
uint_t rate;
assert(prs->prs_state == DSS_SCANNING);
@ -10108,9 +10151,8 @@ print_raidz_expand_status(zpool_handle_t *zhp, pool_raidz_expand_stat_t *pres)
copied_buf, time_buf, ctime((time_t *)&end));
} else {
char examined_buf[7], total_buf[7], rate_buf[7];
uint64_t copied, total, elapsed, secs_left;
uint64_t copied, total, elapsed, rate, secs_left;
double fraction_done;
uint_t rate;
assert(pres->pres_state == DSS_SCANNING);
@ -10975,6 +11017,7 @@ zpool_do_status(int argc, char **argv)
struct option long_options[] = {
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-flat-vdevs", no_argument, NULL,
ZPOOL_OPTION_JSON_FLAT_VDEVS},
@ -12583,6 +12626,7 @@ zpool_do_get(int argc, char **argv)
current_prop_type = cb.cb_type;
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
{"json-int", no_argument, NULL, ZPOOL_OPTION_JSON_NUMS_AS_INT},
{"json-pool-key-guid", no_argument, NULL,
ZPOOL_OPTION_POOL_KEY_GUID},
@ -13497,7 +13541,12 @@ zpool_do_version(int argc, char **argv)
int c;
nvlist_t *jsobj = NULL, *zfs_ver = NULL;
boolean_t json = B_FALSE;
while ((c = getopt(argc, argv, "j")) != -1) {
struct option long_options[] = {
{"json", no_argument, NULL, 'j'},
};
while ((c = getopt_long(argc, argv, "j", long_options, NULL)) != -1) {
switch (c) {
case 'j':
json = B_TRUE;
@ -13613,7 +13662,7 @@ main(int argc, char **argv)
* Special case '-V|--version'
*/
if ((strcmp(cmdname, "-V") == 0) || (strcmp(cmdname, "--version") == 0))
return (zpool_do_version(argc, argv));
return (zfs_version_print() != 0);
/*
* Special case 'help'

View File

@ -6717,6 +6717,17 @@ out:
*
* Only after a full scrub has been completed is it safe to start injecting
* data corruption. See the comment in zfs_fault_inject().
*
* EBUSY may be returned for the following six cases. It's the callers
* responsibility to handle them accordingly.
*
* Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub
* 2. Normal Scrub Paused Error Scrub
* 3. Normal Scrub Paused Pause Normal Scrub
* 4. Error Scrub Running Normal Scrub or Error Scrub
* 5. Error Scrub Paused Pause Error Scrub
* 6. Resilvering Anything else
*/
static int
ztest_scrub_impl(spa_t *spa)
@ -8082,8 +8093,14 @@ ztest_raidz_expand_check(spa_t *spa)
(void) printf("verifying an interrupted raidz "
"expansion using a pool scrub ...\n");
}
/* Will fail here if there is non-recoverable corruption detected */
VERIFY0(ztest_scrub_impl(spa));
int error = ztest_scrub_impl(spa);
if (error == EBUSY)
error = 0;
VERIFY0(error);
if (ztest_opts.zo_verbose >= 1) {
(void) printf("raidz expansion scrub check complete\n");
}

View File

@ -58,9 +58,9 @@ deb-utils: deb-local rpm-utils-initramfs
pkg1=$${name}-$${version}.$${arch}.rpm; \
pkg2=libnvpair3-$${version}.$${arch}.rpm; \
pkg3=libuutil3-$${version}.$${arch}.rpm; \
pkg4=libzfs5-$${version}.$${arch}.rpm; \
pkg5=libzpool5-$${version}.$${arch}.rpm; \
pkg6=libzfs5-devel-$${version}.$${arch}.rpm; \
pkg4=libzfs6-$${version}.$${arch}.rpm; \
pkg5=libzpool6-$${version}.$${arch}.rpm; \
pkg6=libzfs6-devel-$${version}.$${arch}.rpm; \
pkg7=$${name}-test-$${version}.$${arch}.rpm; \
pkg8=$${name}-dracut-$${version}.noarch.rpm; \
pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \
@ -72,7 +72,7 @@ deb-utils: deb-local rpm-utils-initramfs
path_prepend=`mktemp -d /tmp/intercept.XXXXXX`; \
echo "#!$(SHELL)" > $${path_prepend}/dh_shlibdeps; \
echo "`which dh_shlibdeps` -- \
-xlibuutil3linux -xlibnvpair3linux -xlibzfs5linux -xlibzpool5linux" \
-xlibuutil3linux -xlibnvpair3linux -xlibzfs6linux -xlibzpool6linux" \
>> $${path_prepend}/dh_shlibdeps; \
## These -x arguments are passed to dpkg-shlibdeps, which exclude the
## Debianized packages from the auto-generated dependencies of the new debs,
@ -93,13 +93,17 @@ debian:
cp -r contrib/debian debian; chmod +x debian/rules;
native-deb-utils: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-utils" > debian/deb-build.lock; \
cp contrib/debian/control debian/control; \
$(DPKGBUILD) -b -rfakeroot -us -uc;
$(DPKGBUILD) -b -rfakeroot -us -uc; \
$(RM) -f debian/deb-build.lock
native-deb-kmod: native-deb-local debian
while [ -f debian/deb-build.lock ]; do sleep 1; done; \
echo "native-deb-kmod" > debian/deb-build.lock; \
sh scripts/make_gitrev.sh; \
fakeroot debian/rules override_dh_binary-modules;
fakeroot debian/rules override_dh_binary-modules; \
$(RM) -f debian/deb-build.lock
native-deb: native-deb-utils native-deb-kmod
.NOTPARALLEL: native-deb native-deb-utils native-deb-kmod

View File

@ -17,14 +17,21 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_COMPLETE_AND_EXIT], [
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
dnl #
dnl # 5.17 API: enum pid_type * as new 4th dequeue_signal() argument,
dnl # 5768d8906bc23d512b1a736c1e198aa833a6daa4 ("signal: Requeue signals in the appropriate queue")
dnl # prehistory:
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # siginfo_t *info)
dnl #
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info);
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type);
dnl # 4.20: kernel_siginfo_t introduced, replaces siginfo_t
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl kernel_siginfo_t *info)
dnl #
dnl # 6.12 API: first arg struct_task* removed
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type);
dnl # 5.17: enum pid_type introduced as 4th arg
dnl # int dequeue_signal(struct task_struct *task, sigset_t *mask,
dnl # kernel_siginfo_t *info, enum pid_type *type)
dnl #
dnl # 6.12: first arg struct_task* removed
dnl # int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info,
dnl # enum pid_type *type)
dnl #
AC_MSG_CHECKING([whether dequeue_signal() takes 4 arguments])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_4arg], [
@ -33,11 +40,11 @@ AC_DEFUN([ZFS_AC_KERNEL_KTHREAD_DEQUEUE_SIGNAL], [
[dequeue_signal() takes 4 arguments])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether dequeue_signal() a task argument])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_task], [
AC_MSG_CHECKING([whether 3-arg dequeue_signal() takes a type argument])
ZFS_LINUX_TEST_RESULT([kthread_dequeue_signal_3arg_type], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TASK, 1,
[dequeue_signal() takes a task argument])
AC_DEFINE(HAVE_DEQUEUE_SIGNAL_3ARG_TYPE, 1,
[3-arg dequeue_signal() takes a type argument])
], [
AC_MSG_RESULT(no)
])
@ -56,17 +63,6 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_COMPLETE_AND_EXIT], [
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_task], [
#include <linux/sched/signal.h>
], [
struct task_struct *task = NULL;
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(task, mask, info);
])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_4arg], [
#include <linux/sched/signal.h>
], [
@ -78,6 +74,17 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KTHREAD_DEQUEUE_SIGNAL], [
error = dequeue_signal(task, mask, info, type);
])
ZFS_LINUX_TEST_SRC([kthread_dequeue_signal_3arg_type], [
#include <linux/sched/signal.h>
], [
sigset_t *mask = NULL;
kernel_siginfo_t *info = NULL;
enum pid_type *type = NULL;
int error __attribute__ ((unused));
error = dequeue_signal(mask, info, type);
])
])
AC_DEFUN([ZFS_AC_KERNEL_KTHREAD], [

View File

@ -36,7 +36,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ], [
ZFS_LINUX_TEST_SRC([has_register_sysctl_sz], [
#include <linux/sysctl.h>
],[
struct ctl_table test_table[] __attribute__((unused)) = {0};
struct ctl_table test_table[] __attribute__((unused)) = {{}};
register_sysctl_sz("", test_table, 0);
])
])

View File

@ -1,33 +0,0 @@
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_invalidate_folio], [
#include <linux/fs.h>
static void
test_invalidate_folio(struct folio *folio, size_t offset,
size_t len) {
(void) folio; (void) offset; (void) len;
return;
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.invalidate_folio = test_invalidate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO], [
dnl #
dnl # Linux 5.18 uses invalidate_folio in lieu of invalidate_page
dnl #
AC_MSG_CHECKING([whether invalidate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_invalidate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_INVALIDATE_FOLIO, 1, [invalidate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -0,0 +1,27 @@
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_migrate_folio], [
#include <linux/fs.h>
#include <linux/migrate.h>
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.migrate_folio = migrate_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO], [
dnl #
dnl # Linux 6.0 uses migrate_folio in lieu of migrate_page
dnl #
AC_MSG_CHECKING([whether migrate_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_migrate_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_MIGRATE_FOLIO, 1, [migrate_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -1,32 +0,0 @@
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO], [
ZFS_LINUX_TEST_SRC([vfs_has_release_folio], [
#include <linux/fs.h>
static bool
test_release_folio(struct folio *folio, gfp_t gfp) {
(void) folio; (void) gfp;
return (0);
}
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.release_folio = test_release_folio,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_RELEASE_FOLIO], [
dnl #
dnl # Linux 5.19 uses release_folio in lieu of releasepage
dnl #
AC_MSG_CHECKING([whether release_folio exists])
ZFS_LINUX_TEST_RESULT([vfs_has_release_folio], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_RELEASE_FOLIO, 1, [release_folio exists])
],[
AC_MSG_RESULT([no])
])
])

View File

@ -77,8 +77,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_SGET
ZFS_AC_KERNEL_SRC_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_SRC_VFS_READ_FOLIO
ZFS_AC_KERNEL_SRC_VFS_RELEASE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_SRC_VFS_DIRECT_IO
ZFS_AC_KERNEL_SRC_VFS_READPAGES
@ -189,8 +188,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_SGET
ZFS_AC_KERNEL_VFS_FILEMAP_DIRTY_FOLIO
ZFS_AC_KERNEL_VFS_READ_FOLIO
ZFS_AC_KERNEL_VFS_RELEASE_FOLIO
ZFS_AC_KERNEL_VFS_INVALIDATE_FOLIO
ZFS_AC_KERNEL_VFS_MIGRATE_FOLIO
ZFS_AC_KERNEL_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_VFS_DIRECT_IO
ZFS_AC_KERNEL_VFS_READPAGES

View File

@ -33,7 +33,7 @@ AC_DEFUN([ZFS_AC_CONFIG_USER], [
ZFS_AC_CONFIG_USER_MAKEDEV_IN_MKDEV
ZFS_AC_CONFIG_USER_ZFSEXEC
AC_CHECK_FUNCS([execvpe issetugid mlockall strlcat strlcpy gettid])
AC_CHECK_FUNCS([execvpe issetugid mlockall strerror_l strlcat strlcpy gettid])
AC_SUBST(RM)
])

View File

@ -12,14 +12,14 @@ dist_noinst_DATA += %D%/openzfs-libpam-zfs.postinst
dist_noinst_DATA += %D%/openzfs-libpam-zfs.prerm
dist_noinst_DATA += %D%/openzfs-libuutil3.docs
dist_noinst_DATA += %D%/openzfs-libuutil3.install.in
dist_noinst_DATA += %D%/openzfs-libzfs4.docs
dist_noinst_DATA += %D%/openzfs-libzfs4.install.in
dist_noinst_DATA += %D%/openzfs-libzfs6.docs
dist_noinst_DATA += %D%/openzfs-libzfs6.install.in
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.docs
dist_noinst_DATA += %D%/openzfs-libzfsbootenv1.install.in
dist_noinst_DATA += %D%/openzfs-libzfs-dev.docs
dist_noinst_DATA += %D%/openzfs-libzfs-dev.install.in
dist_noinst_DATA += %D%/openzfs-libzpool5.docs
dist_noinst_DATA += %D%/openzfs-libzpool5.install.in
dist_noinst_DATA += %D%/openzfs-libzpool6.docs
dist_noinst_DATA += %D%/openzfs-libzpool6.install.in
dist_noinst_DATA += %D%/openzfs-python3-pyzfs.install
dist_noinst_DATA += %D%/openzfs-zfs-dkms.config
dist_noinst_DATA += %D%/openzfs-zfs-dkms.dkms

View File

@ -6,6 +6,6 @@ contrib/pyzfs/libzfs_core/bindings/__pycache__/
contrib/pyzfs/pyzfs.egg-info/
debian/openzfs-libnvpair3.install
debian/openzfs-libuutil3.install
debian/openzfs-libzfs4.install
debian/openzfs-libzfs6.install
debian/openzfs-libzfs-dev.install
debian/openzfs-libzpool5.install
debian/openzfs-libzpool6.install

View File

@ -78,9 +78,9 @@ Architecture: linux-any
Depends: libssl-dev | libssl1.0-dev,
openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}),
openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzfsbootenv1 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}),
openzfs-libzpool6 (= ${binary:Version}),
${misc:Depends}
Replaces: libzfslinux-dev
Conflicts: libzfslinux-dev
@ -90,18 +90,18 @@ Description: OpenZFS filesystem development files for Linux
libraries of OpenZFS filesystem.
.
This package includes the development files of libnvpair3, libuutil3,
libzpool5 and libzfs4.
libzpool6 and libzfs6.
Package: openzfs-libzfs4
Package: openzfs-libzfs6
Section: contrib/libs
Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends}
# The libcurl4 is loaded through dlopen("libcurl.so.4").
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988521
Recommends: libcurl4
Breaks: libzfs2, libzfs4
Replaces: libzfs2, libzfs4, libzfs4linux
Conflicts: libzfs4linux
Breaks: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Replaces: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Conflicts: libzfs6linux
Description: OpenZFS filesystem library for Linux - general support
OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums,
@ -123,13 +123,13 @@ Description: OpenZFS filesystem library for Linux - label info support
.
The zfsbootenv library provides support for modifying ZFS label information.
Package: openzfs-libzpool5
Package: openzfs-libzpool6
Section: contrib/libs
Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends}
Breaks: libzpool2, libzpool5
Replaces: libzpool2, libzpool5, libzpool5linux
Conflicts: libzpool5linux
Breaks: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Replaces: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Conflicts: libzpool6linux
Description: OpenZFS pool library for Linux
OpenZFS is a storage platform that encompasses the functionality of
traditional filesystems and volume managers. It supports data checksums,
@ -246,8 +246,8 @@ Architecture: linux-any
Pre-Depends: ${misc:Pre-Depends}
Depends: openzfs-libnvpair3 (= ${binary:Version}),
openzfs-libuutil3 (= ${binary:Version}),
openzfs-libzfs4 (= ${binary:Version}),
openzfs-libzpool5 (= ${binary:Version}),
openzfs-libzfs6 (= ${binary:Version}),
openzfs-libzpool6 (= ${binary:Version}),
python3,
${misc:Depends},
${shlibs:Depends}

View File

@ -98,6 +98,7 @@ usr/share/man/man8/zpool-attach.8
usr/share/man/man8/zpool-checkpoint.8
usr/share/man/man8/zpool-clear.8
usr/share/man/man8/zpool-create.8
usr/share/man/man8/zpool-ddtprune.8
usr/share/man/man8/zpool-destroy.8
usr/share/man/man8/zpool-detach.8
usr/share/man/man8/zpool-ddtprune.8
@ -113,6 +114,7 @@ usr/share/man/man8/zpool-list.8
usr/share/man/man8/zpool-offline.8
usr/share/man/man8/zpool-online.8
usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-prefetch.8
usr/share/man/man8/zpool-reguid.8
usr/share/man/man8/zpool-remove.8
usr/share/man/man8/zpool-reopen.8

View File

@ -344,7 +344,7 @@ mount_fs()
# Need the _original_ datasets mountpoint!
mountpoint=$(get_fs_value "$fs" mountpoint)
ZFS_CMD="mount -o zfsutil -t zfs"
ZFS_CMD="mount.zfs -o zfsutil"
if [ "$mountpoint" = "legacy" ] || [ "$mountpoint" = "none" ]; then
# Can't use the mountpoint property. Might be one of our
# clones. Check the 'org.zol:mountpoint' property set in
@ -359,9 +359,8 @@ mount_fs()
# isn't the root fs.
return 0
fi
# Don't use mount.zfs -o zfsutils for legacy mountpoint
if [ "$mountpoint" = "legacy" ]; then
ZFS_CMD="mount -t zfs"
ZFS_CMD="mount.zfs"
fi
# Last hail-mary: Hope 'rootmnt' is set!
mountpoint=""

View File

@ -276,7 +276,11 @@ _LIBZUTIL_H void update_vdev_config_dev_sysfs_path(nvlist_t *nv,
* Thread-safe strerror() for use in ZFS libraries
*/
static inline char *zfs_strerror(int errnum) {
#ifdef HAVE_STRERROR_L
return (strerror_l(errnum, uselocale(0)));
#else
return (strerror(errnum));
#endif
}
#ifdef __cplusplus

View File

@ -1,10 +1,5 @@
/*
* Copyright (c) 2010 Isilon Systems, Inc.
* Copyright (c) 2010 iXsystems, Inc.
* Copyright (c) 2010 Panasas, Inc.
* Copyright (c) 2013-2016 Mellanox Technologies, Ltd.
* Copyright (c) 2015 François Tigeot
* All rights reserved.
* Copyright (c) 2024 Warner Losh.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -26,76 +21,14 @@
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _LINUX_COMPILER_H_
#define _LINUX_COMPILER_H_
#include <sys/cdefs.h>
/*
* FreeBSD's LinuxKPI compiler.h as far back as FreeBSD 12 has what we need,
* except zfs_fallthrough.
*/
#pragma once
#include <compat/linuxkpi/common/include/linux/compiler.h>
#define __user
#define __kernel
#define __safe
#define __force
#define __nocast
#define __iomem
#define __chk_user_ptr(x) ((void)0)
#define __chk_io_ptr(x) ((void)0)
#define __builtin_warning(x, y...) (1)
#define __acquires(x)
#define __releases(x)
#define __acquire(x) do { } while (0)
#define __release(x) do { } while (0)
#define __cond_lock(x, c) (c)
#define __bitwise
#define __devinitdata
#define __deprecated
#define __init
#define __initconst
#define __devinit
#define __devexit
#define __exit
#define __rcu
#define __percpu
#define __weak __weak_symbol
#define __malloc
#define ___stringify(...) #__VA_ARGS__
#define __stringify(...) ___stringify(__VA_ARGS__)
#define __attribute_const__ __attribute__((__const__))
#undef __always_inline
#define __always_inline inline
#define noinline __noinline
#define ____cacheline_aligned __aligned(CACHE_LINE_SIZE)
#define zfs_fallthrough __attribute__((__fallthrough__))
#if !defined(_KERNEL) && !defined(_STANDALONE)
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
#define typeof(x) __typeof(x)
#define uninitialized_var(x) x = x
#define __maybe_unused __unused
#define __always_unused __unused
#define __must_check __result_use_check
#define __printf(a, b) __printflike(a, b)
#define barrier() __asm__ __volatile__("": : :"memory")
#define ___PASTE(a, b) a##b
#define __PASTE(a, b) ___PASTE(a, b)
#define ACCESS_ONCE(x) (*(volatile __typeof(x) *)&(x))
#define WRITE_ONCE(x, v) do { \
barrier(); \
ACCESS_ONCE(x) = (v); \
barrier(); \
} while (0)
#define lockless_dereference(p) READ_ONCE(p)
#define _AT(T, X) ((T)(X))
#endif /* _LINUX_COMPILER_H_ */

View File

@ -70,15 +70,6 @@ hlist_del(struct hlist_node *n)
n->next->pprev = n->pprev;
}
/* BEGIN CSTYLED */
#define READ_ONCE(x) ({ \
__typeof(x) __var = ({ \
barrier(); \
ACCESS_ONCE(x); \
}); \
barrier(); \
__var; \
})
#define HLIST_HEAD_INIT { }
#define HLIST_HEAD(name) struct hlist_head name = HLIST_HEAD_INIT
#define INIT_HLIST_HEAD(head) (head)->first = NULL

View File

@ -95,10 +95,6 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#ifndef expect
#define expect(expr, value) (__builtin_expect((expr), (value)))
#endif
#ifndef __linux__
#define likely(expr) expect((expr) != 0, 1)
#define unlikely(expr) expect((expr) != 0, 0)
#endif
#define PANIC(fmt, a...) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, fmt, ## a)
@ -109,7 +105,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
__FILE__, __FUNCTION__, __LINE__))
#define VERIFYF(cond, str, ...) do { \
if (unlikely(!cond)) \
if (unlikely(!(cond))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY(" #cond ") failed " str "\n", __VA_ARGS__);\
} while (0)
@ -205,7 +201,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3UF(LEFT, OP, RIGHT, STR, ...) do { \
@ -217,7 +213,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3PF(LEFT, OP, RIGHT, STR, ...) do { \

View File

@ -31,9 +31,9 @@
#include_next <sys/sdt.h>
#ifdef KDTRACE_HOOKS
/* BEGIN CSTYLED */
SDT_PROBE_DECLARE(sdt, , , set__error);
/* BEGIN CSTYLED */
#define SET_ERROR(err) ({ \
SDT_PROBE1(sdt, , , set__error, (uintptr_t)err); \
err; \

View File

@ -50,7 +50,7 @@
#define kfpu_fini() do {} while (0)
#endif
#define simd_stat_init() 0
#define simd_stat_fini() 0
#define simd_stat_init() do {} while (0)
#define simd_stat_fini() do {} while (0)
#endif

View File

@ -68,47 +68,30 @@ enum symfollow { NO_FOLLOW = NOFOLLOW };
#include <vm/vm_object.h>
typedef struct vop_vector vnodeops_t;
#define VOP_FID VOP_VPTOFH
#define vop_fid vop_vptofh
#define vop_fid_args vop_vptofh_args
#define a_fid a_fhp
#define rootvfs (rootvnode == NULL ? NULL : rootvnode->v_mount)
#ifndef IN_BASE
static __inline int
vn_is_readonly(vnode_t *vp)
{
return (vp->v_mount->mnt_flag & MNT_RDONLY);
}
#endif
#define vn_vfswlock(vp) (0)
#define vn_vfsunlock(vp) do { } while (0)
#define vn_ismntpt(vp) \
((vp)->v_type == VDIR && (vp)->v_mountedhere != NULL)
#define vn_mountedvfs(vp) ((vp)->v_mountedhere)
#ifndef IN_BASE
#define vn_has_cached_data(vp) \
((vp)->v_object != NULL && \
(vp)->v_object->resident_page_count > 0)
#ifndef IN_BASE
static __inline void
vn_flush_cached_data(vnode_t *vp, boolean_t sync)
{
if (vm_object_mightbedirty(vp->v_object)) {
int flags = sync ? OBJPC_SYNC : 0;
vn_lock(vp, LK_SHARED | LK_RETRY);
zfs_vmobject_wlock(vp->v_object);
vm_object_page_clean(vp->v_object, 0, 0, flags);
zfs_vmobject_wunlock(vp->v_object);
VOP_UNLOCK(vp);
}
}
#endif
#define vn_exists(vp) do { } while (0)
#define vn_invalid(vp) do { } while (0)
#define vn_free(vp) do { } while (0)
#define vn_matchops(vp, vops) ((vp)->v_op == &(vops))
#define VN_HOLD(v) vref(v)
@ -123,9 +106,6 @@ vn_flush_cached_data(vnode_t *vp, boolean_t sync)
#define vnevent_rename_dest(vp, dvp, name, ct) do { } while (0)
#define vnevent_rename_dest_dir(vp, ct) do { } while (0)
#define specvp(vp, rdev, type, cr) (VN_HOLD(vp), (vp))
#define MANDLOCK(vp, mode) (0)
/*
* We will use va_spare is place of Solaris' va_mask.
* This field is initialized in zfs_setattr().

View File

@ -26,8 +26,10 @@
#ifndef _ABD_OS_H
#define _ABD_OS_H
#ifdef _KERNEL
#include <sys/vm.h>
#include <vm/vm_page.h>
#endif
#ifdef __cplusplus
extern "C" {
@ -47,8 +49,10 @@ struct abd_linear {
#endif
};
#ifdef _KERNEL
__attribute__((malloc))
struct abd *abd_alloc_from_pages(vm_page_t *, unsigned long, uint64_t);
#endif
#ifdef __cplusplus
}

View File

@ -109,7 +109,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
__FILE__, __FUNCTION__, __LINE__))
#define VERIFYF(cond, str, ...) do { \
if (unlikely(!cond)) \
if (unlikely(!(cond))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY(" #cond ") failed " str "\n", __VA_ARGS__);\
} while (0)
@ -205,7 +205,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3UF(LEFT, OP, RIGHT, STR, ...) do { \
@ -217,7 +217,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
__VA_ARGS); \
__VA_ARGS__); \
} while (0)
#define VERIFY3PF(LEFT, OP, RIGHT, STR, ...) do { \

View File

@ -38,8 +38,7 @@
#include <sys/rwlock.h>
#include <sys/wait.h>
#include <sys/wmsum.h>
typedef struct kstat_s kstat_t;
#include <sys/kstat.h>
#define TASKQ_NAMELEN 31

View File

@ -42,7 +42,7 @@
#define TS_ZOMB EXIT_ZOMBIE
#define TS_STOPPED TASK_STOPPED
typedef void (*thread_func_t)(void *);
typedef void (*thread_func_t)(void *) __attribute__((noreturn));
#define thread_create_named(name, stk, stksize, func, arg, len, \
pp, state, pri) \

View File

@ -30,6 +30,8 @@
extern "C" {
#endif
struct abd;
struct abd_scatter {
uint_t abd_offset;
uint_t abd_nents;
@ -41,10 +43,8 @@ struct abd_linear {
struct scatterlist *abd_sgl; /* for LINEAR_PAGE */
};
typedef struct abd abd_t;
typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *);
int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
int abd_iterate_page_func(struct abd *, size_t, size_t, abd_iter_page_func_t *,
void *);
/*
@ -52,11 +52,11 @@ int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
* Note: these are only needed to support vdev_classic. See comment in
* vdev_disk.c.
*/
unsigned int abd_bio_map_off(struct bio *, abd_t *, unsigned int, size_t);
unsigned long abd_nr_pages_off(abd_t *, unsigned int, size_t);
unsigned int abd_bio_map_off(struct bio *, struct abd *, unsigned int, size_t);
unsigned long abd_nr_pages_off(struct abd *, unsigned int, size_t);
__attribute__((malloc))
abd_t *abd_alloc_from_pages(struct page **, unsigned long, uint64_t);
struct abd *abd_alloc_from_pages(struct page **, unsigned long, uint64_t);
#ifdef __cplusplus
}

View File

@ -69,6 +69,7 @@ typedef struct vfs {
boolean_t vfs_do_relatime;
boolean_t vfs_nbmand;
boolean_t vfs_do_nbmand;
kmutex_t vfs_mntpt_lock;
} vfs_t;
typedef struct zfs_mnt {

View File

@ -347,6 +347,7 @@ void l2arc_fini(void);
void l2arc_start(void);
void l2arc_stop(void);
void l2arc_spa_rebuild_start(spa_t *spa);
void l2arc_spa_rebuild_stop(spa_t *spa);
#ifndef _KERNEL
extern boolean_t arc_watch;

View File

@ -942,6 +942,7 @@ typedef struct arc_sums {
wmsum_t arcstat_evict_l2_eligible_mru;
wmsum_t arcstat_evict_l2_ineligible;
wmsum_t arcstat_evict_l2_skip;
wmsum_t arcstat_hash_elements;
wmsum_t arcstat_hash_collisions;
wmsum_t arcstat_hash_chains;
aggsum_t arcstat_size;

View File

@ -86,28 +86,38 @@ typedef struct brt_vdev_phys {
uint64_t bvp_savedspace;
} brt_vdev_phys_t;
typedef struct brt_vdev {
struct brt_vdev {
/*
* Pending changes from open contexts.
*/
kmutex_t bv_pending_lock;
avl_tree_t bv_pending_tree[TXG_SIZE];
/*
* Protects bv_mos_*.
*/
krwlock_t bv_mos_entries_lock ____cacheline_aligned;
/*
* Protects all the fields starting from bv_initiated.
*/
krwlock_t bv_lock ____cacheline_aligned;
/*
* VDEV id.
*/
uint64_t bv_vdevid;
/*
* Is the structure initiated?
* (bv_entcount and bv_bitmap are allocated?)
*/
boolean_t bv_initiated;
uint64_t bv_vdevid ____cacheline_aligned;
/*
* Object number in the MOS for the entcount array and brt_vdev_phys.
*/
uint64_t bv_mos_brtvdev;
/*
* Object number in the MOS for the entries table.
* Object number in the MOS and dnode for the entries table.
*/
uint64_t bv_mos_entries;
dnode_t *bv_mos_entries_dnode;
/*
* Entries to sync.
* Is the structure initiated?
* (bv_entcount and bv_bitmap are allocated?)
*/
avl_tree_t bv_tree;
boolean_t bv_initiated;
/*
* Does the bv_entcount[] array needs byte swapping?
*/
@ -120,6 +130,26 @@ typedef struct brt_vdev {
* This is the array with BRT entry count per BRT_RANGESIZE.
*/
uint16_t *bv_entcount;
/*
* bv_entcount[] potentially can be a bit too big to sychronize it all
* when we just changed few entcounts. The fields below allow us to
* track updates to bv_entcount[] array since the last sync.
* A single bit in the bv_bitmap represents as many entcounts as can
* fit into a single BRT_BLOCKSIZE.
* For example we have 65536 entcounts in the bv_entcount array
* (so the whole array is 128kB). We updated bv_entcount[2] and
* bv_entcount[5]. In that case only first bit in the bv_bitmap will
* be set and we will write only first BRT_BLOCKSIZE out of 128kB.
*/
ulong_t *bv_bitmap;
/*
* bv_entcount[] needs updating on disk.
*/
boolean_t bv_entcount_dirty;
/*
* brt_vdev_phys needs updating on disk.
*/
boolean_t bv_meta_dirty;
/*
* Sum of all bv_entcount[]s.
*/
@ -133,65 +163,27 @@ typedef struct brt_vdev {
*/
uint64_t bv_savedspace;
/*
* brt_vdev_phys needs updating on disk.
* Entries to sync.
*/
boolean_t bv_meta_dirty;
/*
* bv_entcount[] needs updating on disk.
*/
boolean_t bv_entcount_dirty;
/*
* bv_entcount[] potentially can be a bit too big to sychronize it all
* when we just changed few entcounts. The fields below allow us to
* track updates to bv_entcount[] array since the last sync.
* A single bit in the bv_bitmap represents as many entcounts as can
* fit into a single BRT_BLOCKSIZE.
* For example we have 65536 entcounts in the bv_entcount array
* (so the whole array is 128kB). We updated bv_entcount[2] and
* bv_entcount[5]. In that case only first bit in the bv_bitmap will
* be set and we will write only first BRT_BLOCKSIZE out of 128kB.
*/
ulong_t *bv_bitmap;
uint64_t bv_nblocks;
} brt_vdev_t;
avl_tree_t bv_tree;
};
/*
* In-core brt
*/
typedef struct brt {
krwlock_t brt_lock;
spa_t *brt_spa;
#define brt_mos brt_spa->spa_meta_objset
uint64_t brt_rangesize;
uint64_t brt_usedspace;
uint64_t brt_savedspace;
avl_tree_t brt_pending_tree[TXG_SIZE];
kmutex_t brt_pending_lock[TXG_SIZE];
/* Sum of all entries across all bv_trees. */
uint64_t brt_nentries;
brt_vdev_t *brt_vdevs;
uint64_t brt_nvdevs;
} brt_t;
/* Size of bre_offset / sizeof (uint64_t). */
/* Size of offset / sizeof (uint64_t). */
#define BRT_KEY_WORDS (1)
#define BRE_OFFSET(bre) (DVA_GET_OFFSET(&(bre)->bre_bp.blk_dva[0]))
/*
* In-core brt entry.
* On-disk we use bre_offset as the key and bre_refcount as the value.
* On-disk we use ZAP with offset as the key and count as the value.
*/
typedef struct brt_entry {
uint64_t bre_offset;
uint64_t bre_refcount;
avl_node_t bre_node;
blkptr_t bre_bp;
uint64_t bre_count;
uint64_t bre_pcount;
} brt_entry_t;
typedef struct brt_pending_entry {
blkptr_t bpe_bp;
int bpe_count;
avl_node_t bpe_node;
} brt_pending_entry_t;
#ifdef __cplusplus
}
#endif

View File

@ -171,7 +171,6 @@ typedef struct dbuf_dirty_record {
* gets COW'd in a subsequent transaction group.
*/
arc_buf_t *dr_data;
blkptr_t dr_overridden_by;
override_states_t dr_override_state;
uint8_t dr_copies;
boolean_t dr_nopwrite;
@ -179,14 +178,21 @@ typedef struct dbuf_dirty_record {
boolean_t dr_diowrite;
boolean_t dr_has_raw_params;
/* Override and raw params are mutually exclusive. */
union {
blkptr_t dr_overridden_by;
struct {
/*
* If dr_has_raw_params is set, the following crypt
* params will be set on the BP that's written.
* If dr_has_raw_params is set, the
* following crypt params will be set
* on the BP that's written.
*/
boolean_t dr_byteorder;
uint8_t dr_salt[ZIO_DATA_SALT_LEN];
uint8_t dr_iv[ZIO_DATA_IV_LEN];
uint8_t dr_mac[ZIO_DATA_MAC_LEN];
};
};
} dl;
struct dirty_lightweight_leaf {
/*
@ -264,6 +270,27 @@ typedef struct dmu_buf_impl {
*/
uint8_t db_level;
/* This block was freed while a read or write was active. */
uint8_t db_freed_in_flight;
/*
* Evict user data as soon as the dirty and reference counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to evict this dbuf,
* but couldn't due to outstanding references. Evict once the refcount
* drops to 0.
*/
uint8_t db_pending_evict;
/* Number of TXGs in which this buffer is dirty. */
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
/*
* Protects db_buf's contents if they contain an indirect block or data
* block of the meta-dnode. We use this lock to protect the structure of
@ -288,6 +315,9 @@ typedef struct dmu_buf_impl {
*/
dbuf_states_t db_state;
/* In which dbuf cache this dbuf is, if any. */
dbuf_cached_state_t db_caching_status;
/*
* Refcount accessed by dmu_buf_{hold,rele}.
* If nonzero, the buffer can't be destroyed.
@ -304,39 +334,10 @@ typedef struct dmu_buf_impl {
/* Link in dbuf_cache or dbuf_metadata_cache */
multilist_node_t db_cache_link;
/* Tells us which dbuf cache this dbuf is in, if any */
dbuf_cached_state_t db_caching_status;
uint64_t db_hash;
/* Data which is unique to data (leaf) blocks: */
/* User callback information. */
dmu_buf_user_t *db_user;
/*
* Evict user data as soon as the dirty and reference
* counts are equal.
*/
uint8_t db_user_immediate_evict;
/*
* This block was freed while a read or write was
* active.
*/
uint8_t db_freed_in_flight;
/*
* dnode_evict_dbufs() or dnode_evict_bonus() tried to
* evict this dbuf, but couldn't due to outstanding
* references. Evict once the refcount drops to 0.
*/
uint8_t db_pending_evict;
uint8_t db_dirtycnt;
/* The buffer was partially read. More reads may follow. */
uint8_t db_partial_read;
} dmu_buf_impl_t;
#define DBUF_HASH_MUTEX(h, idx) \
@ -351,6 +352,8 @@ typedef struct dbuf_hash_table {
typedef void (*dbuf_prefetch_fn)(void *, uint64_t, uint64_t, boolean_t);
extern kmem_cache_t *dbuf_dirty_kmem_cache;
uint64_t dbuf_whichblock(const struct dnode *di, const int64_t level,
const uint64_t offset);

View File

@ -381,6 +381,7 @@ typedef struct dmu_buf {
#define DMU_POOL_CREATION_VERSION "creation_version"
#define DMU_POOL_SCAN "scan"
#define DMU_POOL_ERRORSCRUB "error_scrub"
#define DMU_POOL_LAST_SCRUBBED_TXG "last_scrubbed_txg"
#define DMU_POOL_FREE_BPOBJ "free_bpobj"
#define DMU_POOL_BPTREE_OBJ "bptree_obj"
#define DMU_POOL_EMPTY_BPOBJ "empty_bpobj"

View File

@ -89,7 +89,7 @@ extern int zfs_livelist_min_percent_shared;
typedef int deadlist_iter_t(void *args, dsl_deadlist_entry_t *dle);
void dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object);
int dsl_deadlist_open(dsl_deadlist_t *dl, objset_t *os, uint64_t object);
void dsl_deadlist_close(dsl_deadlist_t *dl);
void dsl_deadlist_iterate(dsl_deadlist_t *dl, deadlist_iter_t func, void *arg);
uint64_t dsl_deadlist_alloc(objset_t *os, dmu_tx_t *tx);

View File

@ -198,7 +198,7 @@ void dsl_dir_set_reservation_sync_impl(dsl_dir_t *dd, uint64_t value,
dmu_tx_t *tx);
void dsl_dir_zapify(dsl_dir_t *dd, dmu_tx_t *tx);
boolean_t dsl_dir_is_zapified(dsl_dir_t *dd);
void dsl_dir_livelist_open(dsl_dir_t *dd, uint64_t obj);
int dsl_dir_livelist_open(dsl_dir_t *dd, uint64_t obj);
void dsl_dir_livelist_close(dsl_dir_t *dd);
void dsl_dir_remove_livelist(dsl_dir_t *dd, dmu_tx_t *tx, boolean_t total);
int dsl_dir_wait(dsl_dir_t *dd, dsl_dataset_t *ds, zfs_wait_activity_t activity,

View File

@ -179,6 +179,12 @@ typedef struct dsl_scan {
dsl_errorscrub_phys_t errorscrub_phys;
} dsl_scan_t;
typedef struct {
pool_scan_func_t func;
uint64_t txgstart;
uint64_t txgend;
} setup_sync_arg_t;
typedef struct dsl_scan_io_queue dsl_scan_io_queue_t;
void scan_init(void);
@ -189,7 +195,8 @@ void dsl_scan_setup_sync(void *, dmu_tx_t *);
void dsl_scan_fini(struct dsl_pool *dp);
void dsl_scan_sync(struct dsl_pool *, dmu_tx_t *);
int dsl_scan_cancel(struct dsl_pool *);
int dsl_scan(struct dsl_pool *, pool_scan_func_t);
int dsl_scan(struct dsl_pool *, pool_scan_func_t, uint64_t starttxg,
uint64_t txgend);
void dsl_scan_assess_vdev(struct dsl_pool *dp, vdev_t *vd);
boolean_t dsl_scan_scrubbing(const struct dsl_pool *dp);
boolean_t dsl_errorscrubbing(const struct dsl_pool *dp);

View File

@ -42,7 +42,8 @@ extern "C" {
#define FM_EREPORT_ZFS_DATA "data"
#define FM_EREPORT_ZFS_DELAY "delay"
#define FM_EREPORT_ZFS_DEADMAN "deadman"
#define FM_EREPORT_ZFS_DIO_VERIFY "dio_verify"
#define FM_EREPORT_ZFS_DIO_VERIFY_WR "dio_verify_wr"
#define FM_EREPORT_ZFS_DIO_VERIFY_RD "dio_verify_rd"
#define FM_EREPORT_ZFS_POOL "zpool"
#define FM_EREPORT_ZFS_DEVICE_UNKNOWN "vdev.unknown"
#define FM_EREPORT_ZFS_DEVICE_OPEN_FAILED "vdev.open_failed"

View File

@ -265,6 +265,7 @@ typedef enum {
ZPOOL_PROP_DEDUP_TABLE_SIZE,
ZPOOL_PROP_DEDUP_TABLE_QUOTA,
ZPOOL_PROP_DEDUPCACHED,
ZPOOL_PROP_LAST_SCRUBBED_TXG,
ZPOOL_NUM_PROPS
} zpool_prop_t;
@ -1088,6 +1089,7 @@ typedef enum pool_scan_func {
typedef enum pool_scrub_cmd {
POOL_SCRUB_NORMAL = 0,
POOL_SCRUB_PAUSE,
POOL_SCRUB_FROM_LAST_TXG,
POOL_SCRUB_FLAGS_END
} pool_scrub_cmd_t;

View File

@ -53,6 +53,7 @@ extern "C" {
/*
* Forward references that lots of things need.
*/
typedef struct brt_vdev brt_vdev_t;
typedef struct spa spa_t;
typedef struct vdev vdev_t;
typedef struct metaslab metaslab_t;
@ -821,6 +822,8 @@ extern void spa_l2cache_drop(spa_t *spa);
/* scanning */
extern int spa_scan(spa_t *spa, pool_scan_func_t func);
extern int spa_scan_range(spa_t *spa, pool_scan_func_t func, uint64_t txgstart,
uint64_t txgend);
extern int spa_scan_stop(spa_t *spa);
extern int spa_scrub_pause_resume(spa_t *spa, pool_scrub_cmd_t flag);
@ -1079,6 +1082,7 @@ extern uint64_t spa_get_deadman_failmode(spa_t *spa);
extern void spa_set_deadman_failmode(spa_t *spa, const char *failmode);
extern boolean_t spa_suspended(spa_t *spa);
extern uint64_t spa_bootfs(spa_t *spa);
extern uint64_t spa_get_last_scrubbed_txg(spa_t *spa);
extern uint64_t spa_delegation(spa_t *spa);
extern objset_t *spa_meta_objset(spa_t *spa);
extern space_map_t *spa_syncing_log_sm(spa_t *spa);

View File

@ -318,6 +318,7 @@ struct spa {
uint64_t spa_scan_pass_scrub_spent_paused; /* total paused */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_issued; /* issued bytes per pass */
uint64_t spa_scrubbed_last_txg; /* last txg scrubbed */
/* error scrub pause time in milliseconds */
uint64_t spa_scan_pass_errorscrub_pause;
@ -412,8 +413,12 @@ struct spa {
uint64_t spa_dedup_dspace; /* Cache get_dedup_dspace() */
uint64_t spa_dedup_checksum; /* default dedup checksum */
uint64_t spa_dspace; /* dspace in normal class */
uint64_t spa_rdspace; /* raw (non-dedup) --//-- */
boolean_t spa_active_ddt_prune; /* ddt prune process active */
struct brt *spa_brt; /* in-core BRT */
brt_vdev_t **spa_brt_vdevs; /* array of per-vdev BRTs */
uint64_t spa_brt_nvdevs; /* number of vdevs in BRT */
uint64_t spa_brt_rangesize; /* pool's BRT range size */
krwlock_t spa_brt_lock; /* Protects brt_vdevs/nvdevs */
kmutex_t spa_vdev_top_lock; /* dueling offline/remove */
kmutex_t spa_proc_lock; /* protects spa_proc* */
kcondvar_t spa_proc_cv; /* spa_proc_state transitions */

View File

@ -57,7 +57,7 @@ void vdev_raidz_reconstruct(struct raidz_map *, const int *, int);
void vdev_raidz_child_done(zio_t *);
void vdev_raidz_io_done(zio_t *);
void vdev_raidz_checksum_error(zio_t *, struct raidz_col *, abd_t *);
struct raidz_row *vdev_raidz_row_alloc(int);
struct raidz_row *vdev_raidz_row_alloc(int, zio_t *);
void vdev_raidz_reflow_copy_scratch(spa_t *);
void raidz_dtl_reassessed(vdev_t *);

View File

@ -223,11 +223,15 @@ int zap_lookup_norm(objset_t *ds, uint64_t zapobj, const char *name,
boolean_t *normalization_conflictp);
int zap_lookup_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints, uint64_t integer_size, uint64_t num_integers, void *buf);
int zap_lookup_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints, uint64_t integer_size, uint64_t num_integers, void *buf);
int zap_contains(objset_t *ds, uint64_t zapobj, const char *name);
int zap_prefetch(objset_t *os, uint64_t zapobj, const char *name);
int zap_prefetch_object(objset_t *os, uint64_t zapobj);
int zap_prefetch_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints);
int zap_prefetch_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints);
int zap_lookup_by_dnode(dnode_t *dn, const char *name,
uint64_t integer_size, uint64_t num_integers, void *buf);
@ -236,9 +240,6 @@ int zap_lookup_norm_by_dnode(dnode_t *dn, const char *name,
matchtype_t mt, char *realname, int rn_len,
boolean_t *ncp);
int zap_count_write_by_dnode(dnode_t *dn, const char *name,
int add, zfs_refcount_t *towrite, zfs_refcount_t *tooverwrite);
/*
* Create an attribute with the given name and value.
*

View File

@ -208,25 +208,25 @@ typedef uint64_t zio_flag_t;
#define ZIO_FLAG_PROBE (1ULL << 16)
#define ZIO_FLAG_TRYHARD (1ULL << 17)
#define ZIO_FLAG_OPTIONAL (1ULL << 18)
#define ZIO_FLAG_DIO_READ (1ULL << 19)
#define ZIO_FLAG_VDEV_INHERIT (ZIO_FLAG_DONT_QUEUE - 1)
/*
* Flags not inherited by any children.
*/
#define ZIO_FLAG_DONT_QUEUE (1ULL << 19) /* must be first for INHERIT */
#define ZIO_FLAG_DONT_PROPAGATE (1ULL << 20)
#define ZIO_FLAG_IO_BYPASS (1ULL << 21)
#define ZIO_FLAG_IO_REWRITE (1ULL << 22)
#define ZIO_FLAG_RAW_COMPRESS (1ULL << 23)
#define ZIO_FLAG_RAW_ENCRYPT (1ULL << 24)
#define ZIO_FLAG_GANG_CHILD (1ULL << 25)
#define ZIO_FLAG_DDT_CHILD (1ULL << 26)
#define ZIO_FLAG_GODFATHER (1ULL << 27)
#define ZIO_FLAG_NOPWRITE (1ULL << 28)
#define ZIO_FLAG_REEXECUTED (1ULL << 29)
#define ZIO_FLAG_DELEGATED (1ULL << 30)
#define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 31)
#define ZIO_FLAG_DONT_QUEUE (1ULL << 20) /* must be first for INHERIT */
#define ZIO_FLAG_DONT_PROPAGATE (1ULL << 21)
#define ZIO_FLAG_IO_BYPASS (1ULL << 22)
#define ZIO_FLAG_IO_REWRITE (1ULL << 23)
#define ZIO_FLAG_RAW_COMPRESS (1ULL << 24)
#define ZIO_FLAG_RAW_ENCRYPT (1ULL << 25)
#define ZIO_FLAG_GANG_CHILD (1ULL << 26)
#define ZIO_FLAG_DDT_CHILD (1ULL << 27)
#define ZIO_FLAG_GODFATHER (1ULL << 28)
#define ZIO_FLAG_NOPWRITE (1ULL << 29)
#define ZIO_FLAG_REEXECUTED (1ULL << 30)
#define ZIO_FLAG_DELEGATED (1ULL << 31)
#define ZIO_FLAG_DIO_CHKSUM_ERR (1ULL << 32)
#define ZIO_ALLOCATOR_NONE (-1)
#define ZIO_HAS_ALLOCATOR(zio) ((zio)->io_allocator != ZIO_ALLOCATOR_NONE)
@ -647,6 +647,7 @@ extern void zio_vdev_io_redone(zio_t *zio);
extern void zio_change_priority(zio_t *pio, zio_priority_t priority);
extern void zio_checksum_verified(zio_t *zio);
extern void zio_dio_chksum_verify_error_report(zio_t *zio);
extern int zio_worst_error(int e1, int e2);
extern enum zio_checksum zio_checksum_select(enum zio_checksum child,

View File

@ -35,7 +35,6 @@
(void) __atomic_add_fetch(target, 1, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_INC(8, uint8_t)
ATOMIC_INC(16, uint16_t)
ATOMIC_INC(32, uint32_t)
@ -44,7 +43,6 @@ ATOMIC_INC(uchar, uchar_t)
ATOMIC_INC(ushort, ushort_t)
ATOMIC_INC(uint, uint_t)
ATOMIC_INC(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_DEC(name, type) \
@ -53,7 +51,6 @@ ATOMIC_INC(ulong, ulong_t)
(void) __atomic_sub_fetch(target, 1, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_DEC(8, uint8_t)
ATOMIC_DEC(16, uint16_t)
ATOMIC_DEC(32, uint32_t)
@ -62,7 +59,6 @@ ATOMIC_DEC(uchar, uchar_t)
ATOMIC_DEC(ushort, ushort_t)
ATOMIC_DEC(uint, uint_t)
ATOMIC_DEC(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_ADD(name, type1, type2) \
@ -77,7 +73,6 @@ atomic_add_ptr(volatile void *target, ssize_t bits)
(void) __atomic_add_fetch((void **)target, bits, __ATOMIC_SEQ_CST);
}
/* BEGIN CSTYLED */
ATOMIC_ADD(8, uint8_t, int8_t)
ATOMIC_ADD(16, uint16_t, int16_t)
ATOMIC_ADD(32, uint32_t, int32_t)
@ -86,7 +81,6 @@ ATOMIC_ADD(char, uchar_t, signed char)
ATOMIC_ADD(short, ushort_t, short)
ATOMIC_ADD(int, uint_t, int)
ATOMIC_ADD(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_SUB(name, type1, type2) \
@ -101,7 +95,6 @@ atomic_sub_ptr(volatile void *target, ssize_t bits)
(void) __atomic_sub_fetch((void **)target, bits, __ATOMIC_SEQ_CST);
}
/* BEGIN CSTYLED */
ATOMIC_SUB(8, uint8_t, int8_t)
ATOMIC_SUB(16, uint16_t, int16_t)
ATOMIC_SUB(32, uint32_t, int32_t)
@ -110,7 +103,6 @@ ATOMIC_SUB(char, uchar_t, signed char)
ATOMIC_SUB(short, ushort_t, short)
ATOMIC_SUB(int, uint_t, int)
ATOMIC_SUB(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_OR(name, type) \
@ -119,7 +111,6 @@ ATOMIC_SUB(long, ulong_t, long)
(void) __atomic_or_fetch(target, bits, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_OR(8, uint8_t)
ATOMIC_OR(16, uint16_t)
ATOMIC_OR(32, uint32_t)
@ -128,7 +119,6 @@ ATOMIC_OR(uchar, uchar_t)
ATOMIC_OR(ushort, ushort_t)
ATOMIC_OR(uint, uint_t)
ATOMIC_OR(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_AND(name, type) \
@ -137,7 +127,6 @@ ATOMIC_OR(ulong, ulong_t)
(void) __atomic_and_fetch(target, bits, __ATOMIC_SEQ_CST); \
}
/* BEGIN CSTYLED */
ATOMIC_AND(8, uint8_t)
ATOMIC_AND(16, uint16_t)
ATOMIC_AND(32, uint32_t)
@ -146,7 +135,6 @@ ATOMIC_AND(uchar, uchar_t)
ATOMIC_AND(ushort, ushort_t)
ATOMIC_AND(uint, uint_t)
ATOMIC_AND(ulong, ulong_t)
/* END CSTYLED */
/*
@ -159,7 +147,6 @@ ATOMIC_AND(ulong, ulong_t)
return (__atomic_add_fetch(target, 1, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_INC_NV(8, uint8_t)
ATOMIC_INC_NV(16, uint16_t)
ATOMIC_INC_NV(32, uint32_t)
@ -168,7 +155,6 @@ ATOMIC_INC_NV(uchar, uchar_t)
ATOMIC_INC_NV(ushort, ushort_t)
ATOMIC_INC_NV(uint, uint_t)
ATOMIC_INC_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_DEC_NV(name, type) \
@ -177,7 +163,6 @@ ATOMIC_INC_NV(ulong, ulong_t)
return (__atomic_sub_fetch(target, 1, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_DEC_NV(8, uint8_t)
ATOMIC_DEC_NV(16, uint16_t)
ATOMIC_DEC_NV(32, uint32_t)
@ -186,7 +171,6 @@ ATOMIC_DEC_NV(uchar, uchar_t)
ATOMIC_DEC_NV(ushort, ushort_t)
ATOMIC_DEC_NV(uint, uint_t)
ATOMIC_DEC_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_ADD_NV(name, type1, type2) \
@ -201,7 +185,6 @@ atomic_add_ptr_nv(volatile void *target, ssize_t bits)
return (__atomic_add_fetch((void **)target, bits, __ATOMIC_SEQ_CST));
}
/* BEGIN CSTYLED */
ATOMIC_ADD_NV(8, uint8_t, int8_t)
ATOMIC_ADD_NV(16, uint16_t, int16_t)
ATOMIC_ADD_NV(32, uint32_t, int32_t)
@ -210,7 +193,6 @@ ATOMIC_ADD_NV(char, uchar_t, signed char)
ATOMIC_ADD_NV(short, ushort_t, short)
ATOMIC_ADD_NV(int, uint_t, int)
ATOMIC_ADD_NV(long, ulong_t, long)
/* END CSTYLED */
#define ATOMIC_SUB_NV(name, type1, type2) \
@ -225,7 +207,6 @@ atomic_sub_ptr_nv(volatile void *target, ssize_t bits)
return (__atomic_sub_fetch((void **)target, bits, __ATOMIC_SEQ_CST));
}
/* BEGIN CSTYLED */
ATOMIC_SUB_NV(8, uint8_t, int8_t)
ATOMIC_SUB_NV(char, uchar_t, signed char)
ATOMIC_SUB_NV(16, uint16_t, int16_t)
@ -234,7 +215,6 @@ ATOMIC_SUB_NV(32, uint32_t, int32_t)
ATOMIC_SUB_NV(int, uint_t, int)
ATOMIC_SUB_NV(long, ulong_t, long)
ATOMIC_SUB_NV(64, uint64_t, int64_t)
/* END CSTYLED */
#define ATOMIC_OR_NV(name, type) \
@ -243,7 +223,6 @@ ATOMIC_SUB_NV(64, uint64_t, int64_t)
return (__atomic_or_fetch(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_OR_NV(8, uint8_t)
ATOMIC_OR_NV(16, uint16_t)
ATOMIC_OR_NV(32, uint32_t)
@ -252,7 +231,6 @@ ATOMIC_OR_NV(uchar, uchar_t)
ATOMIC_OR_NV(ushort, ushort_t)
ATOMIC_OR_NV(uint, uint_t)
ATOMIC_OR_NV(ulong, ulong_t)
/* END CSTYLED */
#define ATOMIC_AND_NV(name, type) \
@ -261,7 +239,6 @@ ATOMIC_OR_NV(ulong, ulong_t)
return (__atomic_and_fetch(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_AND_NV(8, uint8_t)
ATOMIC_AND_NV(16, uint16_t)
ATOMIC_AND_NV(32, uint32_t)
@ -270,7 +247,6 @@ ATOMIC_AND_NV(uchar, uchar_t)
ATOMIC_AND_NV(ushort, ushort_t)
ATOMIC_AND_NV(uint, uint_t)
ATOMIC_AND_NV(ulong, ulong_t)
/* END CSTYLED */
/*
@ -300,7 +276,6 @@ atomic_cas_ptr(volatile void *target, void *exp, void *des)
return (exp);
}
/* BEGIN CSTYLED */
ATOMIC_CAS(8, uint8_t)
ATOMIC_CAS(16, uint16_t)
ATOMIC_CAS(32, uint32_t)
@ -309,7 +284,6 @@ ATOMIC_CAS(uchar, uchar_t)
ATOMIC_CAS(ushort, ushort_t)
ATOMIC_CAS(uint, uint_t)
ATOMIC_CAS(ulong, ulong_t)
/* END CSTYLED */
/*
@ -322,7 +296,6 @@ ATOMIC_CAS(ulong, ulong_t)
return (__atomic_exchange_n(target, bits, __ATOMIC_SEQ_CST)); \
}
/* BEGIN CSTYLED */
ATOMIC_SWAP(8, uint8_t)
ATOMIC_SWAP(16, uint16_t)
ATOMIC_SWAP(32, uint32_t)
@ -331,7 +304,6 @@ ATOMIC_SWAP(uchar, uchar_t)
ATOMIC_SWAP(ushort, ushort_t)
ATOMIC_SWAP(uint, uint_t)
ATOMIC_SWAP(ulong, ulong_t)
/* END CSTYLED */
void *
atomic_swap_ptr(volatile void *target, void *bits)

View File

@ -25,19 +25,32 @@
#include <sys/backtrace.h>
#include <sys/types.h>
#include <sys/debug.h>
#include <unistd.h>
/*
* libspl_backtrace() must be safe to call from inside a signal hander. This
* mostly means it must not allocate, and so we can't use things like printf.
* Output helpers. libspl_backtrace() must not block, must be thread-safe and
* must be safe to call from a signal handler. At least, that means not having
* printf, so we end up having to call write() directly on the fd. That's
* awkward, as we always have to pass through a length, and some systems will
* complain if we don't consume the return. So we have some macros to make
* things a little more palatable.
*/
#define spl_bt_write_n(fd, s, n) \
do { ssize_t r __maybe_unused = write(fd, s, n); } while (0)
#define spl_bt_write(fd, s) spl_bt_write_n(fd, s, sizeof (s))
#if defined(HAVE_LIBUNWIND)
#define UNW_LOCAL_ONLY
#include <libunwind.h>
/*
* Convert `v` to ASCII hex characters. The bottom `n` nybbles (4-bits ie one
* hex digit) will be written, up to `buflen`. The buffer will not be
* null-terminated. Returns the number of digits written.
*/
static size_t
libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
spl_bt_u64_to_hex_str(uint64_t v, size_t n, char *buf, size_t buflen)
{
static const char hexdigits[] = {
'0', '1', '2', '3', '4', '5', '6', '7',
@ -45,10 +58,10 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
};
size_t pos = 0;
boolean_t want = (digits == 0);
boolean_t want = (n == 0);
for (int i = 15; i >= 0; i--) {
const uint64_t d = v >> (i * 4) & 0xf;
if (!want && (d != 0 || digits > i))
if (!want && (d != 0 || n > i))
want = B_TRUE;
if (want) {
buf[pos++] = hexdigits[d];
@ -62,40 +75,181 @@ libspl_u64_to_hex_str(uint64_t v, size_t digits, char *buf, size_t buflen)
void
libspl_backtrace(int fd)
{
ssize_t ret __attribute__((unused));
unw_context_t uc;
unw_cursor_t cp;
unw_word_t loc;
unw_word_t v;
char buf[128];
size_t n;
int err;
ret = write(fd, "Call trace:\n", 12);
/* Snapshot the current frame and state. */
unw_getcontext(&uc);
/*
* TODO: walk back to the frame that tripped the assertion / the place
* where the signal was recieved.
*/
/*
* Register dump. We're going to loop over all the registers in the
* top frame, and show them, with names, in a nice three-column
* layout, which keeps us within 80 columns.
*/
spl_bt_write(fd, "Registers:\n");
/* Initialise a frame cursor, starting at the current frame */
unw_init_local(&cp, &uc);
while (unw_step(&cp) > 0) {
unw_get_reg(&cp, UNW_REG_IP, &loc);
ret = write(fd, " [0x", 5);
n = libspl_u64_to_hex_str(loc, 10, buf, sizeof (buf));
ret = write(fd, buf, n);
ret = write(fd, "] ", 2);
unw_get_proc_name(&cp, buf, sizeof (buf), &loc);
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
ret = write(fd, buf, n);
ret = write(fd, "+0x", 3);
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf));
ret = write(fd, buf, n);
#ifdef HAVE_LIBUNWIND_ELF
ret = write(fd, " (in ", 5);
unw_get_elf_filename(&cp, buf, sizeof (buf), &loc);
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
ret = write(fd, buf, n);
ret = write(fd, " +0x", 4);
n = libspl_u64_to_hex_str(loc, 2, buf, sizeof (buf));
ret = write(fd, buf, n);
ret = write(fd, ")", 1);
#endif
ret = write(fd, "\n", 1);
/*
* libunwind's list of possible registers for this architecture is an
* enum, unw_regnum_t. UNW_TDEP_LAST_REG is the highest-numbered
* register in that list, however, not all register numbers in this
* range are defined by the architecture, and not all defined registers
* will be present on every implementation of that architecture.
* Moreover, libunwind provides nice names for most, but not all
* registers, but these are hardcoded; a name being available does not
* mean that register is available.
*
* So, we have to pull this all together here. We try to get the value
* of every possible register. If we get a value for it, then the
* register must exist, and so we get its name. If libunwind has no
* name for it, we synthesize something. These cases should be rare,
* and they're usually for uninteresting or niche registers, so it
* shouldn't really matter. We can see the value, and that's the main
* thing.
*/
uint_t cols = 0;
for (uint_t regnum = 0; regnum <= UNW_TDEP_LAST_REG; regnum++) {
/*
* Get the value. Any error probably means the register
* doesn't exist, and we skip it.
*/
if (unw_get_reg(&cp, regnum, &v) < 0)
continue;
/*
* Register name. If libunwind doesn't have a name for it,
* it will return "???". As a shortcut, we just treat '?'
* is an alternate end-of-string character.
*/
const char *name = unw_regname(regnum);
for (n = 0; name[n] != '\0' && name[n] != '?'; n++) {}
if (n == 0) {
/*
* No valid name, so make one of the form "?xx", where
* "xx" is the two-char hex of libunwind's register
* number.
*/
buf[0] = '?';
n = spl_bt_u64_to_hex_str(regnum, 2,
&buf[1], sizeof (buf)-1) + 1;
name = buf;
}
/*
* Two spaces of padding before each column, plus extra
* spaces to align register names shorter than three chars.
*/
spl_bt_write_n(fd, " ", 5-MIN(n, 3));
/* Register name and column punctuation */
spl_bt_write_n(fd, name, n);
spl_bt_write(fd, ": 0x");
/*
* Convert register value (from unw_get_reg()) to hex. We're
* assuming that all registers are 64-bits wide, which is
* probably fine for any general-purpose registers on any
* machine currently in use. A more generic way would be to
* look at the width of unw_word_t, but that would also
* complicate the column code a bit. This is fine.
*/
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
/* Every third column, emit a newline */
if (!(++cols % 3))
spl_bt_write(fd, "\n");
}
/* If we finished before the third column, emit a newline. */
if (cols % 3)
spl_bt_write(fd, "\n");
/* Now the main event, the backtrace. */
spl_bt_write(fd, "Call trace:\n");
/* Reset the cursor to the top again. */
unw_init_local(&cp, &uc);
do {
/*
* Getting the IP should never fail; libunwind handles it
* specially, because its used a lot internally. Still, no
* point being silly about it, as the last thing we want is
* our crash handler to crash. So if it ever does fail, we'll
* show an error line, but keep going to the next frame.
*/
if (unw_get_reg(&cp, UNW_REG_IP, &v) < 0) {
spl_bt_write(fd, " [couldn't get IP register; "
"corrupt frame?]");
continue;
}
/* IP & punctuation */
n = spl_bt_u64_to_hex_str(v, 16, buf, sizeof (buf));
spl_bt_write(fd, " [0x");
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, "] ");
/*
* Function ("procedure") name for the current frame. `v`
* receives the offset from the named function to the IP, which
* we show as a "+offset" suffix.
*
* If libunwind can't determine the name, we just show "???"
* instead. We've already displayed the IP above; that will
* have to do.
*
* unw_get_proc_name() will return ENOMEM if the buffer is too
* small, instead truncating the name. So we treat that as a
* success and use whatever is in the buffer.
*/
err = unw_get_proc_name(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write_n(fd, buf, n);
/* Offset from proc name */
spl_bt_write(fd, "+0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
} else
spl_bt_write(fd, "???");
#ifdef HAVE_LIBUNWIND_ELF
/*
* Newer libunwind has unw_get_elf_filename(), which gets
* the name of the ELF object that the frame was executing in.
* Like `unw_get_proc_name()`, `v` recieves the offset within
* the file, and UNW_ENOMEM indicates that a truncate filename
* was left in the buffer.
*/
err = unw_get_elf_filename(&cp, buf, sizeof (buf), &v);
if (err == 0 || err == -UNW_ENOMEM) {
for (n = 0; n < sizeof (buf) && buf[n] != '\0'; n++) {}
spl_bt_write(fd, " (in ");
spl_bt_write_n(fd, buf, n);
/* Offset within file */
spl_bt_write(fd, " +0x");
n = spl_bt_u64_to_hex_str(v, 2, buf, sizeof (buf));
spl_bt_write_n(fd, buf, n);
spl_bt_write(fd, ")");
}
#endif
spl_bt_write(fd, "\n");
} while (unw_step(&cp) > 0);
}
#elif defined(HAVE_BACKTRACE)
#include <execinfo.h>
@ -103,15 +257,12 @@ libspl_backtrace(int fd)
void
libspl_backtrace(int fd)
{
ssize_t ret __attribute__((unused));
void *btptrs[64];
size_t nptrs = backtrace(btptrs, 64);
ret = write(fd, "Call trace:\n", 12);
spl_bt_write(fd, "Call trace:\n");
backtrace_symbols_fd(btptrs, nptrs, fd);
}
#else
#include <sys/debug.h>
void
libspl_backtrace(int fd __maybe_unused)
{

View File

@ -92,20 +92,20 @@ zfs_dio_page_aligned(void *buf)
static inline boolean_t
zfs_dio_offset_aligned(uint64_t offset, uint64_t blksz)
{
return (IS_P2ALIGNED(offset, blksz));
return ((IS_P2ALIGNED(offset, blksz)) ? B_TRUE : B_FALSE);
}
static inline boolean_t
zfs_dio_size_aligned(uint64_t size, uint64_t blksz)
{
return ((size % blksz) == 0);
return (((size % blksz) == 0) ? B_TRUE : B_FALSE);
}
static inline boolean_t
zfs_dio_aligned(uint64_t offset, uint64_t size, uint64_t blksz)
{
return (zfs_dio_offset_aligned(offset, blksz) &&
zfs_dio_size_aligned(size, blksz));
return ((zfs_dio_offset_aligned(offset, blksz) &&
zfs_dio_size_aligned(size, blksz)) ? B_TRUE : B_FALSE);
}
static inline void

View File

@ -70,7 +70,7 @@ if BUILD_FREEBSD
libzfs_la_LIBADD += -lutil -lgeom
endif
libzfs_la_LDFLAGS += -version-info 5:0:1
libzfs_la_LDFLAGS += -version-info 6:0:0
pkgconfig_DATA += %D%/libzfs.pc

View File

@ -1,4 +1,4 @@
<abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.4'>
<abi-corpus version='2.0' architecture='elf-amd-x86_64' soname='libzfs.so.6'>
<elf-needed>
<dependency name='libzfs_core.so.3'/>
<dependency name='libnvpair.so.3'/>
@ -3132,7 +3132,8 @@
<enumerator name='ZPOOL_PROP_DEDUP_TABLE_SIZE' value='36'/>
<enumerator name='ZPOOL_PROP_DEDUP_TABLE_QUOTA' value='37'/>
<enumerator name='ZPOOL_PROP_DEDUPCACHED' value='38'/>
<enumerator name='ZPOOL_NUM_PROPS' value='39'/>
<enumerator name='ZPOOL_PROP_LAST_SCRUBBED_TXG' value='39'/>
<enumerator name='ZPOOL_NUM_PROPS' value='40'/>
</enum-decl>
<typedef-decl name='zpool_prop_t' type-id='af1ba157' id='5d0c23fb'/>
<typedef-decl name='regoff_t' type-id='95e97e5e' id='54a2a2a8'/>
@ -5984,7 +5985,8 @@
<underlying-type type-id='9cac1fee'/>
<enumerator name='POOL_SCRUB_NORMAL' value='0'/>
<enumerator name='POOL_SCRUB_PAUSE' value='1'/>
<enumerator name='POOL_SCRUB_FLAGS_END' value='2'/>
<enumerator name='POOL_SCRUB_FROM_LAST_TXG' value='2'/>
<enumerator name='POOL_SCRUB_FLAGS_END' value='3'/>
</enum-decl>
<typedef-decl name='pool_scrub_cmd_t' type-id='a1474cbd' id='b51cf3c2'/>
<enum-decl name='zpool_errata' id='d9abbf54'>

View File

@ -471,13 +471,15 @@ int
zpool_get_userprop(zpool_handle_t *zhp, const char *propname, char *buf,
size_t len, zprop_source_t *srctype)
{
nvlist_t *nv, *nvl;
nvlist_t *nv;
uint64_t ival;
const char *value;
zprop_source_t source = ZPROP_SRC_LOCAL;
nvl = zhp->zpool_props;
if (nvlist_lookup_nvlist(nvl, propname, &nv) == 0) {
if (zhp->zpool_props == NULL)
zpool_get_all_props(zhp);
if (nvlist_lookup_nvlist(zhp->zpool_props, propname, &nv) == 0) {
if (nvlist_lookup_uint64(nv, ZPROP_SOURCE, &ival) == 0)
source = ival;
verify(nvlist_lookup_string(nv, ZPROP_VALUE, &value) == 0);
@ -2796,7 +2798,7 @@ zpool_scan(zpool_handle_t *zhp, pool_scan_func_t func, pool_scrub_cmd_t cmd)
}
/*
* With EBUSY, five cases are possible:
* With EBUSY, six cases are possible:
*
* Current state Requested
* 1. Normal Scrub Running Normal Scrub or Error Scrub

View File

@ -932,6 +932,7 @@ libzfs_run_process_impl(const char *path, char *argv[], char *env[], int flags,
pid = fork();
if (pid == 0) {
/* Child process */
setpgid(0, 0);
devnull_fd = open("/dev/null", O_WRONLY | O_CLOEXEC);
if (devnull_fd < 0)

View File

@ -212,7 +212,7 @@ if BUILD_FREEBSD
libzpool_la_LIBADD += -lgeom
endif
libzpool_la_LDFLAGS += -version-info 5:0:0
libzpool_la_LDFLAGS += -version-info 6:0:0
if TARGET_CPU_POWERPC
module/zfs/libzpool_la-vdev_raidz_math_powerpc_altivec.$(OBJEXT) : CFLAGS += -maltivec

View File

@ -35,9 +35,25 @@ typedef struct zfs_dbgmsg {
static list_t zfs_dbgmsgs;
static kmutex_t zfs_dbgmsgs_lock;
static uint_t zfs_dbgmsg_size = 0;
static uint_t zfs_dbgmsg_maxsize = 4<<20; /* 4MB */
int zfs_dbgmsg_enable = B_TRUE;
static void
zfs_dbgmsg_purge(uint_t max_size)
{
while (zfs_dbgmsg_size > max_size) {
zfs_dbgmsg_t *zdm = list_remove_head(&zfs_dbgmsgs);
if (zdm == NULL)
return;
uint_t size = zdm->zdm_size;
kmem_free(zdm, size);
zfs_dbgmsg_size -= size;
}
}
void
zfs_dbgmsg_init(void)
{
@ -74,6 +90,8 @@ __zfs_dbgmsg(char *buf)
mutex_enter(&zfs_dbgmsgs_lock);
list_insert_tail(&zfs_dbgmsgs, zdm);
zfs_dbgmsg_size += size;
zfs_dbgmsg_purge(zfs_dbgmsg_maxsize);
mutex_exit(&zfs_dbgmsgs_lock);
}

View File

@ -18,7 +18,7 @@
.\"
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd October 2, 2024
.Dd November 1, 2024
.Dt ZFS 4
.Os
.
@ -436,7 +436,7 @@ write.
It can also help to identify if reported checksum errors are tied to Direct I/O
writes.
Each verify error causes a
.Sy dio_verify
.Sy dio_verify_wr
zevent.
Direct Write I/O checkum verify errors can be seen with
.Nm zpool Cm status Fl d .
@ -1333,9 +1333,10 @@ results in vector instructions
from the respective CPU instruction set being used.
.
.It Sy zfs_bclone_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable the experimental block cloning feature.
Enables access to the block cloning feature.
If this setting is 0, then even if feature@block_cloning is enabled,
attempts to clone blocks will act as though the feature is disabled.
using functions and system calls that attempt to clone blocks will act as
though the feature is disabled.
.
.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be

View File

@ -92,6 +92,11 @@ before a generic mapping for the same slot.
In this way a custom mapping may be applied to a particular channel
and a default mapping applied to the others.
.
.It Sy zpad_slot Ar digits
Pad slot numbers with zeros to make them
.Ar digits
long, which can help to make disk names a consistent length and easier to sort.
.
.It Sy multipath Sy yes Ns | Ns Sy no
Specifies whether
.Xr vdev_id 8
@ -122,7 +127,7 @@ device is connected to.
The default is
.Sy 4 .
.
.It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy ses
.It Sy slot Sy bay Ns | Ns Sy phy Ns | Ns Sy port Ns | Ns Sy id Ns | Ns Sy lun Ns | Ns Sy bay_lun Ns | Ns Sy ses
Specifies from which element of a SAS identifier the slot number is
taken.
The default is
@ -138,6 +143,9 @@ use the SAS port as the slot number.
use the scsi id as the slot number.
.It Sy lun
use the scsi lun as the slot number.
.It Sy bay_lun
read the slot number from the bay identifier and append the lun number.
Useful for multi-lun multi-actuator hard drives.
.It Sy ses
use the SCSI Enclosure Services (SES) enclosure device slot number,
as reported by

View File

@ -28,7 +28,7 @@
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\" Copyright (c) 2023, Klara Inc.
.\"
.Dd July 29, 2024
.Dd November 18, 2024
.Dt ZPOOLPROPS 7
.Os
.
@ -135,6 +135,19 @@ A unique identifier for the pool.
The current health of the pool.
Health can be one of
.Sy ONLINE , DEGRADED , FAULTED , OFFLINE, REMOVED , UNAVAIL .
.It Sy last_scrubbed_txg
Indicates the transaction group (TXG) up to which the most recent scrub
operation has checked and repaired the dataset.
This provides insight into the data integrity status of their pool at
a specific point in time.
.Xr zpool-scrub 8
can utilize this property to scan only data that has changed since the last
scrub completed, when given the
.Fl C
flag.
This property is not updated when performing an error scrub with the
.Fl e
flag.
.It Sy leaked
Space not released while
.Sy freeing

View File

@ -14,7 +14,7 @@
.\" Copyright (c) 2017 Lawrence Livermore National Security, LLC.
.\" Copyright (c) 2017 Intel Corporation.
.\"
.Dd November 18, 2023
.Dd October 27, 2024
.Dt ZDB 8
.Os
.
@ -408,6 +408,8 @@ blocks cloned, the space saving as a result of cloning, and the saving ratio.
.It Fl TT
Display the per-vdev BRT statistics, including total references.
.It Fl TTT
Display histograms of per-vdev BRT refcounts.
.It Fl TTTT
Dump the contents of the block reference tables.
.It Fl u , -uberblock
Display the current uberblock.

View File

@ -71,7 +71,7 @@ The following fields are displayed:
Used for scripting mode.
Do not print headers and separate fields by a single tab instead of arbitrary
white space.
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Print the output in JSON format.
Specify
.Sy --json-int

View File

@ -59,7 +59,7 @@
.Xc
Displays all ZFS file systems currently mounted.
.Bl -tag -width "-j"
.It Fl j
.It Fl j , -json
Displays all mounted file systems in JSON format.
.El
.It Xo

View File

@ -50,7 +50,7 @@ and any attempts to access or modify other pools will cause an error.
.
.Sh OPTIONS
.Bl -tag -width "-t"
.It Fl j
.It Fl j , -json
Display channel program output in JSON format.
When this flag is specified and standard output is empty -
channel program encountered an error.

View File

@ -130,7 +130,7 @@ The value
can be used to display all properties that apply to the given dataset's type
.Pq Sy filesystem , volume , snapshot , No or Sy bookmark .
.Bl -tag -width "-s source"
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Display the output in JSON format.
Specify
.Sy --json-int

View File

@ -23,7 +23,7 @@
.\"
.\" lint-ok: WARNING: sections out of conventional order: Sh SYNOPSIS
.\"
.Dd April 4, 2024
.Dd December 2, 2024
.Dt ZINJECT 8
.Os
.
@ -268,7 +268,7 @@ Run for this many seconds before reporting failure.
.It Fl T Ar failure
Set the failure type to one of
.Sy all ,
.Sy ioctl ,
.Sy flush ,
.Sy claim ,
.Sy free ,
.Sy read ,

View File

@ -98,7 +98,10 @@ This can be an indicator of problems with the underlying storage device.
The number of delay events is ratelimited by the
.Sy zfs_slow_io_events_per_second
module parameter.
.It Sy dio_verify
.It Sy dio_verify_rd
Issued when there was a checksum verify error after a Direct I/O read has been
issued.
.It Sy dio_verify_wr
Issued when there was a checksum verify error after a Direct I/O write has been
issued.
This event can only take place if the module parameter

View File

@ -98,7 +98,7 @@ See the
.Xr zpoolprops 7
manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of properties in JSON format.
Specify
.Sy --json-int
@ -157,7 +157,7 @@ See the
.Xr vdevprops 7
manual page for more information on the available pool properties.
.Bl -tag -compact -offset Ds -width "-o field"
.It Fl j Op Ar --json-int
.It Fl j , -json Op Ar --json-int
Display the list of properties in JSON format.
Specify
.Sy --json-int

View File

@ -59,7 +59,7 @@ is specified, the command exits after
.Ar count
reports are printed.
.Bl -tag -width Ds
.It Fl j Op Ar --json-int, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-pool-key-guid
Display the list of pools in JSON format.
Specify
.Sy --json-int

View File

@ -109,7 +109,7 @@ Stops and cancels an in-progress removal of a top-level vdev.
.El
.
.Sh EXAMPLES
.\" These are, respectively, examples 14 from zpool.8
.\" These are, respectively, examples 15 from zpool.8
.\" Make sure to update them bidirectionally
.Ss Example 1 : No Removing a Mirrored top-level (Log or Data) Device
The following commands remove the mirrored log device
@ -142,9 +142,43 @@ The command to remove the mirrored log
.Ar mirror-2 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-2
.Pp
At this point, the log device no longer exists
(both sides of the mirror have been removed):
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
.Ed
.Pp
The command to remove the mirrored data
.Ar mirror-1 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-1
.Pp
After
.Ar mirror-1 No has been evacuated, the pool remains redundant, but
the total amount of space is reduced:
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
.Ed
.
.Sh SEE ALSO
.Xr zpool-add 8 ,

View File

@ -26,7 +26,7 @@
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 22, 2023
.Dd November 18, 2024
.Dt ZPOOL-SCRUB 8
.Os
.
@ -36,9 +36,8 @@
.Sh SYNOPSIS
.Nm zpool
.Cm scrub
.Op Fl s Ns | Ns Fl p
.Op Ns Fl e | Ns Fl p | Fl s Ns | Fl C Ns
.Op Fl w
.Op Fl e
.Ar pool Ns
.
.Sh DESCRIPTION
@ -114,6 +113,10 @@ The pool must have been scrubbed at least once with the
feature enabled to use this option.
Error scrubbing cannot be run simultaneously with regular scrubbing or
resilvering, nor can it be run when a regular scrub is paused.
.It Fl C
Continue scrub from last saved txg (see zpool
.Sy last_scrubbed_txg
property).
.El
.Sh EXAMPLES
.Ss Example 1

View File

@ -70,7 +70,7 @@ See the
option of
.Nm zpool Cm iostat
for complete details.
.It Fl j Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
.It Fl j , -json Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
Display the status for ZFS pools in JSON format.
Specify
.Sy --json-int
@ -82,14 +82,18 @@ Specify
.Sy --json-pool-key-guid
to set pool GUID as key for pool objects instead of pool names.
.It Fl d
Display the number of Direct I/O write checksum verify errors that have occured
on a top-level VDEV.
Display the number of Direct I/O read/write checksum verify errors that have
occured on a top-level VDEV.
See
.Sx zfs_vdev_direct_write_verify
in
.Xr zfs 4
for details about the conditions that can cause Direct I/O write checksum
verify failures to occur.
Direct I/O reads checksum verify errors can also occur if the contents of the
buffer are being manipulated after the I/O has been issued and is in flight.
In the case of Direct I/O read checksum verify errors, the I/O will be reissued
through the ARC.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk

View File

@ -405,9 +405,43 @@ The command to remove the mirrored log
.Ar mirror-2 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-2
.Pp
At this point, the log device no longer exists
(both sides of the mirror have been removed):
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
.Ed
.Pp
The command to remove the mirrored data
.Ar mirror-1 No is :
.Dl # Nm zpool Cm remove Ar tank mirror-1
.Pp
After
.Ar mirror-1 No has been evacuated, the pool remains redundant, but
the total amount of space is reduced:
.Bd -literal -compact -offset Ds
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
.Ed
.
.Ss Example 16 : No Displaying expanded space on a device
The following command displays the detailed information for the pool

View File

@ -3281,7 +3281,6 @@ nvs_xdr_nvp_##type(XDR *xdrs, void *ptr, ...) \
#endif
/* BEGIN CSTYLED */
NVS_BUILD_XDRPROC_T(char);
NVS_BUILD_XDRPROC_T(short);
NVS_BUILD_XDRPROC_T(u_short);
@ -3289,7 +3288,6 @@ NVS_BUILD_XDRPROC_T(int);
NVS_BUILD_XDRPROC_T(u_int);
NVS_BUILD_XDRPROC_T(longlong_t);
NVS_BUILD_XDRPROC_T(u_longlong_t);
/* END CSTYLED */
/*
* The format of xdr encoded nvpair is:

View File

@ -31,5 +31,4 @@
#include <sys/queue.h>
#include <sys/sdt.h>
/* CSTYLED */
SDT_PROBE_DEFINE1(sdt, , , set__error, "int");

View File

@ -620,9 +620,16 @@ abd_borrow_buf_copy(abd_t *abd, size_t n)
/*
* Return a borrowed raw buffer to an ABD. If the ABD is scattered, this will
* no change the contents of the ABD and will ASSERT that you didn't modify
* the buffer since it was borrowed. If you want any changes you made to buf to
* be copied back to abd, use abd_return_buf_copy() instead.
* not change the contents of the ABD. If you want any changes you made to
* buf to be copied back to abd, use abd_return_buf_copy() instead. If the
* ABD is not constructed from user pages from Direct I/O then an ASSERT
* checks to make sure the contents of the buffer have not changed since it was
* borrowed. We can not ASSERT the contents of the buffer have not changed if
* it is composed of user pages. While Direct I/O write pages are placed under
* write protection and can not be changed, this is not the case for Direct I/O
* reads. The pages of a Direct I/O read could be manipulated at any time.
* Checksum verifications in the ZIO pipeline check for this issue and handle
* it by returning an error on checksum verification failure.
*/
void
abd_return_buf(abd_t *abd, void *buf, size_t n)
@ -632,8 +639,34 @@ abd_return_buf(abd_t *abd, void *buf, size_t n)
#ifdef ZFS_DEBUG
(void) zfs_refcount_remove_many(&abd->abd_children, n, buf);
#endif
if (abd_is_linear(abd)) {
if (abd_is_from_pages(abd)) {
if (!abd_is_linear_page(abd))
zio_buf_free(buf, n);
} else if (abd_is_linear(abd)) {
ASSERT3P(buf, ==, abd_to_buf(abd));
} else if (abd_is_gang(abd)) {
#ifdef ZFS_DEBUG
/*
* We have to be careful with gang ABD's that we do not ASSERT
* for any ABD's that contain user pages from Direct I/O. See
* the comment above about Direct I/O read buffers possibly
* being manipulated. In order to handle this, we jsut iterate
* through the gang ABD and only verify ABD's that are not from
* user pages.
*/
void *cmp_buf = buf;
for (abd_t *cabd = list_head(&ABD_GANG(abd).abd_gang_chain);
cabd != NULL;
cabd = list_next(&ABD_GANG(abd).abd_gang_chain, cabd)) {
if (!abd_is_from_pages(cabd)) {
ASSERT0(abd_cmp_buf(cabd, cmp_buf,
cabd->abd_size));
}
cmp_buf = (char *)cmp_buf + cabd->abd_size;
}
#endif
zio_buf_free(buf, n);
} else {
ASSERT0(abd_cmp_buf(abd, buf, n));
zio_buf_free(buf, n);

View File

@ -103,6 +103,7 @@ dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
db->db_offset + bufoff);
thiscpy = MIN(PAGESIZE, tocpy - copied);
va = zfs_map_page(*ma, &sf);
ASSERT(db->db_data != NULL);
memcpy((char *)db->db_data + bufoff, va, thiscpy);
zfs_unmap_page(sf);
ma += 1;
@ -172,6 +173,7 @@ dmu_read_pages(objset_t *os, uint64_t object, vm_page_t *ma, int count,
ASSERT3U(db->db_size, >, PAGE_SIZE);
bufoff = IDX_TO_OFF(m->pindex) % db->db_size;
va = zfs_map_page(m, &sf);
ASSERT(db->db_data != NULL);
memcpy(va, (char *)db->db_data + bufoff, PAGESIZE);
zfs_unmap_page(sf);
vm_page_valid(m);
@ -211,8 +213,10 @@ dmu_read_pages(objset_t *os, uint64_t object, vm_page_t *ma, int count,
*/
tocpy = MIN(db->db_size - bufoff, PAGESIZE - pgoff);
ASSERT3S(tocpy, >=, 0);
if (m != bogus_page)
if (m != bogus_page) {
ASSERT(db->db_data != NULL);
memcpy(va + pgoff, (char *)db->db_data + bufoff, tocpy);
}
pgoff += tocpy;
ASSERT3S(pgoff, >=, 0);
@ -290,6 +294,7 @@ dmu_read_pages(objset_t *os, uint64_t object, vm_page_t *ma, int count,
bufoff = IDX_TO_OFF(m->pindex) % db->db_size;
tocpy = MIN(db->db_size - bufoff, PAGESIZE);
va = zfs_map_page(m, &sf);
ASSERT(db->db_data != NULL);
memcpy(va, (char *)db->db_data + bufoff, tocpy);
if (tocpy < PAGESIZE) {
ASSERT3S(i, ==, *rahead - 1);

View File

@ -187,12 +187,10 @@ param_set_arc_max(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, arc_max,
CTLTYPE_ULONG | CTLFLAG_RWTUN | CTLFLAG_MPSAFE,
NULL, 0, param_set_arc_max, "LU",
"Maximum ARC size in bytes (LEGACY)");
/* END CSTYLED */
int
param_set_arc_min(SYSCTL_HANDLER_ARGS)
@ -218,12 +216,10 @@ param_set_arc_min(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, arc_min,
CTLTYPE_ULONG | CTLFLAG_RWTUN | CTLFLAG_MPSAFE,
NULL, 0, param_set_arc_min, "LU",
"Minimum ARC size in bytes (LEGACY)");
/* END CSTYLED */
extern uint_t zfs_arc_free_target;
@ -252,13 +248,11 @@ param_set_arc_free_target(SYSCTL_HANDLER_ARGS)
* NOTE: This sysctl is CTLFLAG_RW not CTLFLAG_RWTUN due to its dependency on
* pagedaemon initialization.
*/
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, arc_free_target,
CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_MPSAFE,
NULL, 0, param_set_arc_free_target, "IU",
"Desired number of free pages below which ARC triggers reclaim"
" (LEGACY)");
/* END CSTYLED */
int
param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS)
@ -278,84 +272,64 @@ param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, arc_no_grow_shift,
CTLTYPE_INT | CTLFLAG_RWTUN | CTLFLAG_MPSAFE,
NULL, 0, param_set_arc_no_grow_shift, "I",
"log2(fraction of ARC which must be free to allow growing) (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_write_max;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_write_max,
CTLFLAG_RWTUN, &l2arc_write_max, 0,
"Max write bytes per interval (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_write_boost;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_write_boost,
CTLFLAG_RWTUN, &l2arc_write_boost, 0,
"Extra write bytes during device warmup (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_headroom;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_headroom,
CTLFLAG_RWTUN, &l2arc_headroom, 0,
"Number of max device writes to precache (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_headroom_boost;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_headroom_boost,
CTLFLAG_RWTUN, &l2arc_headroom_boost, 0,
"Compressed l2arc_headroom multiplier (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_feed_secs;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_feed_secs,
CTLFLAG_RWTUN, &l2arc_feed_secs, 0,
"Seconds between L2ARC writing (LEGACY)");
/* END CSTYLED */
extern uint64_t l2arc_feed_min_ms;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, l2arc_feed_min_ms,
CTLFLAG_RWTUN, &l2arc_feed_min_ms, 0,
"Min feed interval in milliseconds (LEGACY)");
/* END CSTYLED */
extern int l2arc_noprefetch;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, l2arc_noprefetch,
CTLFLAG_RWTUN, &l2arc_noprefetch, 0,
"Skip caching prefetched buffers (LEGACY)");
/* END CSTYLED */
extern int l2arc_feed_again;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, l2arc_feed_again,
CTLFLAG_RWTUN, &l2arc_feed_again, 0,
"Turbo L2ARC warmup (LEGACY)");
/* END CSTYLED */
extern int l2arc_norw;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, l2arc_norw,
CTLFLAG_RWTUN, &l2arc_norw, 0,
"No reads during writes (LEGACY)");
/* END CSTYLED */
static int
param_get_arc_state_size(SYSCTL_HANDLER_ARGS)
@ -370,7 +344,6 @@ param_get_arc_state_size(SYSCTL_HANDLER_ARGS)
extern arc_state_t ARC_anon;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, anon_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_anon, 0, param_get_arc_state_size, "Q",
@ -381,11 +354,9 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, anon_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, anon_data_esize, CTLFLAG_RD,
&ARC_anon.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in anonymous state");
/* END CSTYLED */
extern arc_state_t ARC_mru;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, mru_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_mru, 0, param_get_arc_state_size, "Q",
@ -396,11 +367,9 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mru_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mru_data_esize, CTLFLAG_RD,
&ARC_mru.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in mru state");
/* END CSTYLED */
extern arc_state_t ARC_mru_ghost;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, mru_ghost_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_mru_ghost, 0, param_get_arc_state_size, "Q",
@ -411,11 +380,9 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mru_ghost_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mru_ghost_data_esize, CTLFLAG_RD,
&ARC_mru_ghost.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in mru ghost state");
/* END CSTYLED */
extern arc_state_t ARC_mfu;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, mfu_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_mfu, 0, param_get_arc_state_size, "Q",
@ -426,11 +393,9 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mfu_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mfu_data_esize, CTLFLAG_RD,
&ARC_mfu.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in mfu state");
/* END CSTYLED */
extern arc_state_t ARC_mfu_ghost;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, mfu_ghost_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_mfu_ghost, 0, param_get_arc_state_size, "Q",
@ -441,11 +406,9 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mfu_ghost_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, mfu_ghost_data_esize, CTLFLAG_RD,
&ARC_mfu_ghost.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in mfu ghost state");
/* END CSTYLED */
extern arc_state_t ARC_uncached;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, uncached_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_uncached, 0, param_get_arc_state_size, "Q",
@ -456,16 +419,13 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, uncached_metadata_esize, CTLFLAG_RD,
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, uncached_data_esize, CTLFLAG_RD,
&ARC_uncached.arcs_esize[ARC_BUFC_DATA].rc_count, 0,
"size of evictable data in uncached state");
/* END CSTYLED */
extern arc_state_t ARC_l2c_only;
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, l2c_only_size,
CTLTYPE_S64 | CTLFLAG_RD | CTLFLAG_MPSAFE,
&ARC_l2c_only, 0, param_get_arc_state_size, "Q",
"size of l2c_only state");
/* END CSTYLED */
/* dbuf.c */
@ -477,19 +437,15 @@ SYSCTL_NODE(_vfs_zfs, OID_AUTO, zfetch, CTLFLAG_RW, 0, "ZFS ZFETCH (LEGACY)");
extern uint32_t zfetch_max_distance;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs_zfetch, OID_AUTO, max_distance,
CTLFLAG_RWTUN, &zfetch_max_distance, 0,
"Max bytes to prefetch per stream (LEGACY)");
/* END CSTYLED */
extern uint32_t zfetch_max_idistance;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs_zfetch, OID_AUTO, max_idistance,
CTLFLAG_RWTUN, &zfetch_max_idistance, 0,
"Max bytes to prefetch indirects for per stream (LEGACY)");
/* END CSTYLED */
/* dsl_pool.c */
@ -527,12 +483,10 @@ param_set_active_allocator(SYSCTL_HANDLER_ARGS)
*/
extern int zfs_metaslab_sm_blksz_no_log;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs_metaslab, OID_AUTO, sm_blksz_no_log,
CTLFLAG_RDTUN, &zfs_metaslab_sm_blksz_no_log, 0,
"Block size for space map in pools with log space map disabled. "
"Power of 2 greater than 4096.");
/* END CSTYLED */
/*
* When the log space map feature is enabled, we accumulate a lot of
@ -541,12 +495,10 @@ SYSCTL_INT(_vfs_zfs_metaslab, OID_AUTO, sm_blksz_no_log,
*/
extern int zfs_metaslab_sm_blksz_with_log;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs_metaslab, OID_AUTO, sm_blksz_with_log,
CTLFLAG_RDTUN, &zfs_metaslab_sm_blksz_with_log, 0,
"Block size for space map in pools with log space map enabled. "
"Power of 2 greater than 4096.");
/* END CSTYLED */
/*
* The in-core space map representation is more compact than its on-disk form.
@ -556,29 +508,23 @@ SYSCTL_INT(_vfs_zfs_metaslab, OID_AUTO, sm_blksz_with_log,
*/
extern uint_t zfs_condense_pct;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs, OID_AUTO, condense_pct,
CTLFLAG_RWTUN, &zfs_condense_pct, 0,
"Condense on-disk spacemap when it is more than this many percents"
" of in-memory counterpart");
/* END CSTYLED */
extern uint_t zfs_remove_max_segment;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs, OID_AUTO, remove_max_segment,
CTLFLAG_RWTUN, &zfs_remove_max_segment, 0,
"Largest contiguous segment ZFS will attempt to allocate when removing"
" a device");
/* END CSTYLED */
extern int zfs_removal_suspend_progress;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, removal_suspend_progress,
CTLFLAG_RWTUN, &zfs_removal_suspend_progress, 0,
"Ensures certain actions can happen while in the middle of a removal");
/* END CSTYLED */
/*
* Minimum size which forces the dynamic allocator to change
@ -588,12 +534,10 @@ SYSCTL_INT(_vfs_zfs, OID_AUTO, removal_suspend_progress,
*/
extern uint64_t metaslab_df_alloc_threshold;
/* BEGIN CSTYLED */
SYSCTL_QUAD(_vfs_zfs_metaslab, OID_AUTO, df_alloc_threshold,
CTLFLAG_RWTUN, &metaslab_df_alloc_threshold, 0,
"Minimum size which forces the dynamic allocator to change its"
" allocation strategy");
/* END CSTYLED */
/*
* The minimum free space, in percent, which must be available
@ -603,12 +547,10 @@ SYSCTL_QUAD(_vfs_zfs_metaslab, OID_AUTO, df_alloc_threshold,
*/
extern uint_t metaslab_df_free_pct;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs_metaslab, OID_AUTO, df_free_pct,
CTLFLAG_RWTUN, &metaslab_df_free_pct, 0,
"The minimum free space, in percent, which must be available in a"
" space map to continue allocations in a first-fit fashion");
/* END CSTYLED */
/* mmp.c */
@ -631,28 +573,22 @@ param_set_multihost_interval(SYSCTL_HANDLER_ARGS)
extern int zfs_ccw_retry_interval;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, ccw_retry_interval,
CTLFLAG_RWTUN, &zfs_ccw_retry_interval, 0,
"Configuration cache file write, retry after failure, interval"
" (seconds)");
/* END CSTYLED */
extern uint64_t zfs_max_missing_tvds_cachefile;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, max_missing_tvds_cachefile,
CTLFLAG_RWTUN, &zfs_max_missing_tvds_cachefile, 0,
"Allow importing pools with missing top-level vdevs in cache file");
/* END CSTYLED */
extern uint64_t zfs_max_missing_tvds_scan;
/* BEGIN CSTYLED */
SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, max_missing_tvds_scan,
CTLFLAG_RWTUN, &zfs_max_missing_tvds_scan, 0,
"Allow importing pools with missing top-level vdevs during scan");
/* END CSTYLED */
/* spa_misc.c */
@ -681,11 +617,9 @@ sysctl_vfs_zfs_debug_flags(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, debugflags,
CTLTYPE_UINT | CTLFLAG_MPSAFE | CTLFLAG_RWTUN, NULL, 0,
sysctl_vfs_zfs_debug_flags, "IU", "Debug flags for ZFS testing.");
/* END CSTYLED */
int
param_set_deadman_synctime(SYSCTL_HANDLER_ARGS)
@ -768,10 +702,8 @@ param_set_slop_shift(SYSCTL_HANDLER_ARGS)
extern int space_map_ibs;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, space_map_ibs, CTLFLAG_RWTUN,
&space_map_ibs, 0, "Space map indirect block shift");
/* END CSTYLED */
/* vdev.c */
@ -795,13 +727,11 @@ param_set_min_auto_ashift(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, min_auto_ashift,
CTLTYPE_UINT | CTLFLAG_RWTUN | CTLFLAG_MPSAFE,
&zfs_vdev_min_auto_ashift, sizeof (zfs_vdev_min_auto_ashift),
param_set_min_auto_ashift, "IU",
"Min ashift used when creating new top-level vdev. (LEGACY)");
/* END CSTYLED */
int
param_set_max_auto_ashift(SYSCTL_HANDLER_ARGS)
@ -822,14 +752,12 @@ param_set_max_auto_ashift(SYSCTL_HANDLER_ARGS)
return (0);
}
/* BEGIN CSTYLED */
SYSCTL_PROC(_vfs_zfs, OID_AUTO, max_auto_ashift,
CTLTYPE_UINT | CTLFLAG_RWTUN | CTLFLAG_MPSAFE,
&zfs_vdev_max_auto_ashift, sizeof (zfs_vdev_max_auto_ashift),
param_set_max_auto_ashift, "IU",
"Max ashift used when optimizing for logical -> physical sector size on"
" new top-level vdevs. (LEGACY)");
/* END CSTYLED */
/*
* Since the DTL space map of a vdev is not expected to have a lot of
@ -837,11 +765,9 @@ SYSCTL_PROC(_vfs_zfs, OID_AUTO, max_auto_ashift,
*/
extern int zfs_vdev_dtl_sm_blksz;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, dtl_sm_blksz,
CTLFLAG_RDTUN, &zfs_vdev_dtl_sm_blksz, 0,
"Block size for DTL space map. Power of 2 greater than 4096.");
/* END CSTYLED */
/*
* vdev-wide space maps that have lots of entries written to them at
@ -850,19 +776,15 @@ SYSCTL_INT(_vfs_zfs, OID_AUTO, dtl_sm_blksz,
*/
extern int zfs_vdev_standard_sm_blksz;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, standard_sm_blksz,
CTLFLAG_RDTUN, &zfs_vdev_standard_sm_blksz, 0,
"Block size for standard space map. Power of 2 greater than 4096.");
/* END CSTYLED */
extern int vdev_validate_skip;
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs, OID_AUTO, validate_skip,
CTLFLAG_RDTUN, &vdev_validate_skip, 0,
"Enable to bypass vdev_validate().");
/* END CSTYLED */
/* vdev_mirror.c */
@ -870,17 +792,13 @@ SYSCTL_INT(_vfs_zfs, OID_AUTO, validate_skip,
extern uint_t zfs_vdev_max_active;
/* BEGIN CSTYLED */
SYSCTL_UINT(_vfs_zfs, OID_AUTO, top_maxinflight,
CTLFLAG_RWTUN, &zfs_vdev_max_active, 0,
"The maximum number of I/Os of all types active for each device."
" (LEGACY)");
/* END CSTYLED */
/* zio.c */
/* BEGIN CSTYLED */
SYSCTL_INT(_vfs_zfs_zio, OID_AUTO, exclude_metadata,
CTLFLAG_RDTUN, &zio_exclude_metadata, 0,
"Exclude metadata buffers from dumps as well");
/* END CSTYLED */

View File

@ -291,8 +291,12 @@ zfs_ioctl(vnode_t *vp, ulong_t com, intptr_t data, int flag, cred_t *cred,
case F_SEEK_HOLE:
{
off = *(offset_t *)data;
error = vn_lock(vp, LK_SHARED);
if (error)
return (error);
/* offset parameter is in/out */
error = zfs_holey(VTOZ(vp), com, &off);
VOP_UNLOCK(vp);
if (error)
return (error);
*(offset_t *)data = off;
@ -452,8 +456,10 @@ mappedread_sf(znode_t *zp, int nbytes, zfs_uio_t *uio)
if (!vm_page_wired(pp) && pp->valid == 0 &&
vm_page_busy_tryupgrade(pp))
vm_page_free(pp);
else
else {
vm_page_deactivate_noreuse(pp);
vm_page_sunbusy(pp);
}
zfs_vmobject_wunlock(obj);
}
} else {
@ -3928,6 +3934,7 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
if (zfs_enter_verify_zp(zfsvfs, zp, FTAG) != 0)
return (zfs_vm_pagerret_error);
object = ma[0]->object;
start = IDX_TO_OFF(ma[0]->pindex);
end = IDX_TO_OFF(ma[count - 1]->pindex + 1);
@ -3936,32 +3943,44 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
* Note that we need to handle the case of the block size growing.
*/
for (;;) {
uint64_t len;
blksz = zp->z_blksz;
len = roundup(end, blksz) - rounddown(start, blksz);
lr = zfs_rangelock_tryenter(&zp->z_rangelock,
rounddown(start, blksz),
roundup(end, blksz) - rounddown(start, blksz), RL_READER);
rounddown(start, blksz), len, RL_READER);
if (lr == NULL) {
if (rahead != NULL) {
*rahead = 0;
rahead = NULL;
}
if (rbehind != NULL) {
*rbehind = 0;
rbehind = NULL;
}
break;
/*
* Avoid a deadlock with update_pages(). We need to
* hold the range lock when copying from the DMU, so
* give up the busy lock to allow update_pages() to
* proceed. We might need to allocate new pages, which
* isn't quite right since this allocation isn't subject
* to the page fault handler's OOM logic, but this is
* the best we can do for now.
*/
for (int i = 0; i < count; i++)
vm_page_xunbusy(ma[i]);
lr = zfs_rangelock_enter(&zp->z_rangelock,
rounddown(start, blksz), len, RL_READER);
zfs_vmobject_wlock(object);
(void) vm_page_grab_pages(object, OFF_TO_IDX(start),
VM_ALLOC_NORMAL | VM_ALLOC_WAITOK | VM_ALLOC_ZERO,
ma, count);
zfs_vmobject_wunlock(object);
}
if (blksz == zp->z_blksz)
break;
zfs_rangelock_exit(lr);
}
object = ma[0]->object;
zfs_vmobject_wlock(object);
obj_size = object->un_pager.vnp.vnp_size;
zfs_vmobject_wunlock(object);
if (IDX_TO_OFF(ma[count - 1]->pindex) >= obj_size) {
if (lr != NULL)
zfs_rangelock_exit(lr);
zfs_exit(zfsvfs, FTAG);
return (zfs_vm_pagerret_bad);
@ -3987,10 +4006,32 @@ zfs_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind,
* ZFS will panic if we request DMU to read beyond the end of the last
* allocated block.
*/
error = dmu_read_pages(zfsvfs->z_os, zp->z_id, ma, count, &pgsin_b,
&pgsin_a, MIN(end, obj_size) - (end - PAGE_SIZE));
for (int i = 0; i < count; i++) {
int dummypgsin, count1, j, last_size;
if (vm_page_any_valid(ma[i])) {
ASSERT(vm_page_all_valid(ma[i]));
continue;
}
for (j = i + 1; j < count; j++) {
if (vm_page_any_valid(ma[j])) {
ASSERT(vm_page_all_valid(ma[j]));
break;
}
}
count1 = j - i;
dummypgsin = 0;
last_size = j == count ?
MIN(end, obj_size) - (end - PAGE_SIZE) : PAGE_SIZE;
error = dmu_read_pages(zfsvfs->z_os, zp->z_id, &ma[i], count1,
i == 0 ? &pgsin_b : &dummypgsin,
j == count ? &pgsin_a : &dummypgsin,
last_size);
if (error != 0)
break;
i += count1 - 1;
}
if (lr != NULL)
zfs_rangelock_exit(lr);
ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
@ -6159,7 +6200,7 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
} else {
#if (__FreeBSD_version >= 1302506 && __FreeBSD_version < 1400000) || \
__FreeBSD_version >= 1400086
vn_lock_pair(invp, false, LK_EXCLUSIVE, outvp, false,
vn_lock_pair(invp, false, LK_SHARED, outvp, false,
LK_EXCLUSIVE);
#else
vn_lock_pair(invp, false, outvp, false);

View File

@ -370,8 +370,6 @@ zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
*/
if (zp->z_id == zfsvfs->z_root && zfsvfs->z_parent == zfsvfs)
ZTOV(zp)->v_flag |= VROOT;
vn_exists(ZTOV(zp));
}
void

View File

@ -1686,11 +1686,10 @@ zio_do_crypt_data(boolean_t encrypt, zio_crypt_key_t *key,
freebsd_crypt_session_t *tmpl = NULL;
uint8_t *authbuf = NULL;
memset(&puio_s, 0, sizeof (puio_s));
memset(&cuio_s, 0, sizeof (cuio_s));
zfs_uio_init(&puio, &puio_s);
zfs_uio_init(&cuio, &cuio_s);
memset(GET_UIO_STRUCT(&puio), 0, sizeof (struct uio));
memset(GET_UIO_STRUCT(&cuio), 0, sizeof (struct uio));
#ifdef FCRYPTO_DEBUG
printf("%s(%s, %p, %p, %d, %p, %p, %u, %s, %p, %p, %p)\n",
@ -1824,7 +1823,6 @@ error:
}
#if defined(_KERNEL) && defined(HAVE_SPL)
/* CSTYLED */
module_param(zfs_key_max_salt_uses, ulong, 0644);
MODULE_PARM_DESC(zfs_key_max_salt_uses, "Max number of times a salt value "
"can be used for generating encryption keys before it is rotated");

View File

@ -33,7 +33,6 @@
* But we would still default to the current default of not to do that.
*/
static unsigned int spl_panic_halt;
/* CSTYLED */
module_param(spl_panic_halt, uint, 0644);
MODULE_PARM_DESC(spl_panic_halt, "Cause kernel panic on assertion failures");

View File

@ -54,7 +54,6 @@
unsigned long spl_hostid = 0;
EXPORT_SYMBOL(spl_hostid);
/* CSTYLED */
module_param(spl_hostid, ulong, 0644);
MODULE_PARM_DESC(spl_hostid, "The system hostid.");

View File

@ -48,7 +48,6 @@
#define smp_mb__after_atomic(x) smp_mb__after_clear_bit(x)
#endif
/* BEGIN CSTYLED */
/*
* Cache magazines are an optimization designed to minimize the cost of
* allocating memory. They do this by keeping a per-cpu cache of recently
@ -97,7 +96,6 @@ static unsigned int spl_kmem_cache_kmem_threads = 4;
module_param(spl_kmem_cache_kmem_threads, uint, 0444);
MODULE_PARM_DESC(spl_kmem_cache_kmem_threads,
"Number of spl_kmem_cache threads");
/* END CSTYLED */
/*
* Slab allocation interfaces

Some files were not shown because too many files have changed in this diff Show More