Proxmox-Port/zfs - zfs - Gitea: Git with a cup of tea

mirror of https://github.com/openzfs/zfs.git synced 2025-10-01 19:56:28 +00:00

Author	SHA1	Message	Date
Alexander Motin	94413bc75d	zdb: Filter log spacemaps by vdev When requested to dump metaslabs only for specific vdev, apply the filter also to log spacemaps to reduce the output. Unfortunately filtering by metaslab numbers is more difficult so leave those. While there, tune the output formatting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17643	2025-08-21 11:19:46 -04:00
Alan Somers	d3c1d27afd	zdb: better handling for corrupt block pointers When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program. When corruption is detected zdb will exit with an error code of 3. Sponsored by: ConnectWise Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com> Signed-off-by: Alan Somers <asomers@gmail.com> Closes #17166	2025-08-12 14:16:37 -07:00
René Wirnata	1d0b94c4e7	zed: prettify slack notification message This converts the body of a ZED slack notification from plain text to code block style to help with readability. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: René Wirnata <rene.wirnata@pandascience.net> Closes #17610	2025-08-11 09:44:51 -07:00
Rob Norris	72602f6ad9	ZIL: "crash" the ZIL if the pool suspends during fallback If the ZIL runs into trouble, it calls txg_wait_synced(), which blocks on suspend. We want it to not block on suspend, instead returning an error. On the surface, this is simple: change all calls to txg_wait_synced_flags(TXG_WAIT_SUSPEND), and then thread the error return back to the zil_commit() caller. Handling suspension means returning an error to all commit waiters. This is relatively straightforward, as zil_commit_waiter_t already has zcw_zio_error to hold the write IO error, which signals a fallback to txg_wait_synced_flags(TXG_WAIT_SUSPEND), which will fail, and so the waiter can now return an error from zil_commit(). However, commit waiters are normally signalled when their associated write (LWB) completes. If the pool has suspended, those IOs may not return for some time, or maybe not at all. We still want to signal those waiters so they can return from zil_commit(). We have a list of those in-flight LWBs on zl_lwb_list, so we can run through those, detach them and signal them. The LWB itself is still in-flight, but no longer has attached waiters, so when it returns there will be nothing to do. (As an aside, ITXs can also supply completion callbacks, which are called when they are destroyed. These are directly connected to LWBs though, so are passed the error code and destroyed there too). At this point, all ZIL waiters have been ejected, so we only have to consider the internal state. We potentially still have ITXs that have not been committed, LWBs still open, and LWBs in-flight. The on-disk ZIL is in an unknown state; some writes may have been written but not returned to us. We really can't rely on any of it; the best thing to do is abandon it entirely and start over when the pool returns to service. But, since we may have IO out that won't return until the pool resumes, we need something for it to return to. 
The simplest solution I could find, implemented here, is to "crash" the ZIL: accept no new ITXs, make no further updates, and let it empty out on its normal schedule, that is, as txgs complete and zil_sync() and zil_clean() are called. We set a "restart txg" to three txgs in the future (syncing + TXG_CONCURRENT_STATES), at which point all the internal state will have been cleared out, and the ZIL can resume operation (handled at the top of zil_clean()). This commit adds zil_crash(), which handles all of the above: - sets the restart txg - capture and signal all waiters - zero the header zil_crash() is called when txg_wait_synced_flags(TXG_WAIT_SUSPEND) returns because the pool suspended (ESHUTDOWN). The rest of the commit is just threading the errors through, and related housekeeping. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17398	2025-08-08 16:43:26 -07:00
Rob Norris	99a5f5d1ba	ZIL: pass commit errors back to ITX callbacks ITX callbacks are used to signal that something can be cleaned up after a itx is committed. Presently that's only used when syncing out mapped pages (msync()) to mark dirty pages clean. This extends the callback interface so it can be passed an error, and take a different cleanup action if necessary. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17398	2025-08-08 16:43:20 -07:00
Rob Norris	967b15b888	ZIL: allow zil_commit() to fail with error This changes zil_commit() to have an int return, and updates all callers to check it. There are no corresponding internal changes yet; it will always return 0. Since zil_commit() is an indication that the caller _really_ wants the associated data to be durably stored, I've annotated it with the __warn_unused_result__ compiler attribute (via __must_check), to emit a warning if it's ever used without doing something with the return code. I hope this will mean we never misuse it in the future. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17398	2025-08-08 16:43:09 -07:00
Rob Norris	82d6f7b047	Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:42 -07:00
Rob Norris	f7bdd84328	Prefer VERIFY0P(n) over VERIFY(n == NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:37 -07:00
Rob Norris	611b95da18	Prefer VERIFY0(n) over VERIFY3S(n, ==, 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:32 -07:00
Rob Norris	5c7df3bcac	Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:25 -07:00
Rob Norris	c39e076f23	Prefer VERIFY0(n) over VERIFY(n == 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:40:59 -07:00
Mariusz Zaborski	0c376d0f59	Document the new '-a' zpool option Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org> Closes #17585	2025-08-06 17:11:47 -07:00
Alek P	3e004369f7	Removed unused zio_decompress_fail_fraction variable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alek Pinchuk <alek.pinchuk@connectwise.com> Closes #17599	2025-08-06 17:10:03 -07:00
Alexander Motin	60f714e6e2	Implement physical rewrites Based on the previous commit, this implements the `zfs rewrite -P` flag, making ZFS keep blocks' logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. At the same time, snapshot space usage will reflect the additional space used by newly allocated blocks. Since this begins to use the new "rewrite" flag in the block pointers, this commit introduces a new read-compatible per-dataset feature physical_rewrite. It must be enabled for the command to not fail; it is activated on first use and deactivated on deletion of the last affected dataset. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17565	2025-08-06 10:36:56 -07:00
Alexander Motin	4ae8bf406b	Allow physical rewrite without logical During regular block writes ZFS sets both logical and physical birth times equal to the current TXG. During dedup and block cloning the logical birth time is still set to the current TXG, but the physical one may be copied from the original block that was used. This represents the fact that logically the user data has changed, but physically it is the same old block. But block rewrite introduces a new situation, when a block is not changed logically, but is stored in a different place in the pool. From the ARC, scrub and some other perspectives this is a new block, but for user applications or incremental replication, for example, it is not. A somewhat similar thing happens during the remap phase of device removal, but in that case blocks are still accounted as allocated at their logical birth times. This patch introduces a new "rewrite" flag in the block pointer structure, allowing to differentiate a physical rewrite (when the block is actually reallocated at the physical birth time) from the device removal case (when the logical birth time is used). The new functionality is not used at this point, and the only expected change is that the error log is now kept in terms of physical birth times, rather than logical, since if a block with a logged error was somehow rewritten, then the previous error does not matter any more. This change also introduces a new TRAVERSE_LOGICAL flag to the traverse code, allowing zfs send, redact and diff to work in the context of logical birth times, ignoring physical-only rewrites. It also changes nothing at this point due to the lack of those writes, but they will come in a following patch. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17565	2025-08-06 10:36:07 -07:00
Mariusz Zaborski	894edd084e	Add TXG timestamp database This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG. With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #16853	2025-08-06 10:31:21 -07:00
Igor Ostapenko	cb5e7e097d	range_tree: Provide more debug details upon unexpected add/remove Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17581	2025-07-31 10:44:42 -04:00
Akash B	b6e8db509d	zpool/zfs: Add '-a|--all' option to scrub, trim, initialize Add support for the '-a | --all' option to perform trim, scrub, and initialize operations on all pools. Previously, specifying a pool name was mandatory for these operations. With this enhancement, users can now execute these operations across all pools at once, without needing to manually iterate over each pool from the command line. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Akash B <akash-b@hpe.com> Closes #17524	2025-07-29 14:50:44 -07:00
Andriy Tkachuk	4bd7a2eaa5	zdb: fix checksum calculation for decompressed blocks Currently, when reading compressed blocks with -R and decompressing them with :d option and specifying lsize, which is normally bigger than psize for compressed blocks, the checksum is calculated on decompressed data. But it makes no sense since zfs always calculates checksum on physical, i.e. compressed data. So reading the same block produces different checksum results depending on how we read it, whether we decompress it or not, which, again, makes no sense. Fix: use psize instead of lsize when calculating the checksum so that it is always calculated on the physical block size, no matter was it compressed or not. Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17547	2025-07-24 21:24:15 -04:00
Ameer Hamza	a8646a8186	ZED: Fix device type detection and pool iteration logic During hotplug REMOVED events, devid matching fails for partition-based spares because devid information is not stored in pool config for partitioned devices. However, when devid is populated by the hotplug event, the original code skipped the search logic entirely, skipping vdev_guid matching and resulting in wrong device type detection that caused spares to be incorrectly identified as l2arc devices. Additionally, fix zfs_agent_iter_pool() to use the return value from zfs_agent_iter_vdev() instead of relying on search parameters, which was previously ignored. Also add pool_guid optimization to enable targeted pool searching when pool_guid is available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17545	2025-07-24 15:47:46 -07:00
Rob Norris	bf38c15071	everywhere: misc unnecessary var init/update These are all cases where we initialise or update a variable, and then never use it. None of them particularly matter, as the compiler should optimise them all away during dead store elimination, but some static analysers complain about them and they are extra work for casual readers to follow, so worth removing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:23:58 -07:00
shodanshok	cecff09faa	add uncompressed_size to arc_summary Add uncompressed ARC size to statistics reported by arc_summary. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #17556	2025-07-22 15:06:09 -07:00
Alexander Motin	be1e991a1a	Allow and prefer special vdevs as ZIL Before this change ZIL blocks were allocated only from normal or SLOG vdevs. In typical situation when special vdevs are SSDs and normal are HDDs it could cause weird inversions when data blocks are written to SSDs, but ZIL referencing them to HDDs. This change assumes that special vdevs typically have much better (or at least not worse) latency than normal, and so in absence of SLOGs should store ZIL blocks. It means similar to normal vdevs introduction of special embedded log allocation class and updating the allocation fallback order to: SLOG -> special embedded log -> special -> normal embedded log -> normal. The code tries to guess whether data block is going to be written to normal or special vdev (it can not be done precisely before compression) and prefer indirect writes for blocks written to a special vdev to avoid double-write. For blocks that are going to be written to normal vdev, special vdev by default plays as SLOG, reducing write latency by the cost of higher special vdev wear, but it is tunable via module parameter. This should allow HDD pools with decent SSD as special vdev to work under synchronous workloads without requiring additional SLOG SSD, impractical in many scenarios. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17505	2025-07-18 18:44:14 -07:00
Paul Dagnelie	b21e04e8d9	Fix zdb pool/ with -k When examining the root dataset with zdb -k, we get into a mismatched state. main() knows we are not examining the whole pool, but it strips off the trailing slash. import_checkpointed_state() then thinks we are examining the whole pool, and does not update the target path appropriately. The fix is to directly inform import_checkpointed_state that we are examining a filesystem, and not the whole pool. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17536	2025-07-15 17:01:49 -07:00
Rob Norris	fce18e04d5	libzpool: tunable-based option interface for zdb/ztest Removes the old dlsym() based option setter and adds a new function handle_tunable_option() that can set, get and list all the tunables in the system. And then wire it up to zdb and ztest. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:47:03 -07:00
Brian Behlendorf	ea38787f2e	Revert "Fix incorrect expected error in ztest" This reverts commit `2076011e0c`. The comment which explains EINVAL should be expected for this case was wrong, not the code. The kernel will return ENOTSUP when attaching a distributed spare to the wrong top-level dRAID vdev. See the check for this in spa_vdev_attach(). Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17503	2025-07-09 14:34:02 -07:00
Paul Dagnelie	a981cb69e4	Implement dynamic gang header sizes ZFS gang block headers are currently fixed at 512 bytes. This is increasingly wasteful in the era of larger disk sector sizes. This PR allows any size allocation to work as a gang header. It also contains supporting changes to ZDB to make gang headers easier to work with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:02:53 -07:00
Paul Dagnelie	e845be28e7	Add no-upgrade featureflag Adds a featureflag that is not enabled during upgrades unless listed explicitly. This is useful for features that could cause issues unless applied carefully; for example, a feature that could make a root pool unbootable if bootloaders don't yet have support for it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:01:59 -07:00
Ameer Hamza	523d9d6007	Validate mountpoint on path-based unmount using statx Use statx to verify that path-based unmounts proceed only if the mountpoint reported by statx matches the MNTTAB entry reported by libzfs, aborting the operation if they differ. Align `zfs umount /path` behavior with `zfs umount dataset`. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17481	2025-07-08 22:10:00 -04:00
Igor Ostapenko	ee0cb4cb89	ztest: Fix false positive of ENOSPC handling Before running a pass zs_enospc_count is checked to free up some space by destroying a random dataset. But the space freed may still be not re-usable during the TXG_DEFER window breaking the next dataset creation in ztest_generic_run(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17506	2025-07-03 16:00:13 -07:00
Alexander Motin	bf846dcb7d	Release topology restrictions on special/dedup Special vdevs were originally designed as a small blocks storage for dRAID, for which role RAIDZ/dRAID topologies are not good. But it is more often used as SSD storage for metadata and hot data of HDD pools. In these use cases narrow RAIDZ of SSDs might be fine, so we should not introduce unnecessary restrictions, and ZFS internally does not care. Similar applies to dedup vdevs. Original DDT used 4KB blocks, for which anything but mirror was a terrible storage. But new FDT implementation uses 32KB blocks by default, which are much less demanding even including compression, and which could be increased even higher now, if needed. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17496	2025-07-02 09:33:47 -07:00
Brian Behlendorf	48ce292ea0	Clarify and restrict dmu_tx_assign() errors There are three possible cases where dmu_tx_assign() may encounter a fatal error. When there is a true lack of free space (ENOSPC), when there is a lack of quota space (EDQUOT), or when data required to perform the transaction cannot be read from disk (EIO). See the dmu_tx_check_ioerr() function for additional details on the motivation for checking for I/O errors early. Prior to this change dmu_tx_assign() would return the contents of tx->tx_err which covered a wide range of possible error codes (EIO, ECKSUM, ESRCH, etc). In practice, none of the callers could do anything useful with this level of detail and simply returned the error. Therefore, this change converts all tx->tx_err errors to EIO, adds ASSERTs to dmu_tx_assign() to cover the only possible errors, and clarifies the function comment to include EIO as a possible fatal error. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian D Behlendorf <behlendo@slag12.llnl.gov> Closes #17463	2025-06-23 15:48:30 -04:00
Rob Norris	560e3170ef	dsl_dataset: rename dmu_objset_clone* to dsl_dataset_clone* And make its check and sync functions visible, so I can hook them up to zcp_synctask. Rename not strictly necessary, but it definitely looks more like a dsl_dataset thing than a dmu_objset thing, to the extent that those things even have a meaningful distinction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17426	2025-06-10 14:52:43 -07:00
Ameer Hamza	f5a6dd8b70	zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message Disks can be removed either by the administrator via hotplug or by the kernel when a disk failure occurs. The previous message implied that removal was always manual, which could be confusing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17400	2025-05-30 09:15:10 -07:00
Rob Norris	44e3266894	events: include zio type in IO error reports Usually the IO type can be inferred from the other fields (in particular, priority and flags), but sometimes it's not easy to see. This is just another little debug helper. May 27 2025 00:54:54.024110493 ereport.fs.zfs.data class = "ereport.fs.zfs.data" ena = 0x1f5ecfae600801 ... zio_delta = 0x0 zio_type = 0x2 [WRITE] zio_priority = 0x3 [ASYNC_WRITE] zio_objset = 0x0 Document zio_type and zio_priority. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17381	2025-05-30 10:29:29 -04:00
Cameron Harr	92157c840c	Refactor man page and CLI help output per mandoc The man page and the usage statement from the CLI have been refactored to abide by the ManDoc standard. Style changes include: * Upper-case letters before lower-case * List short options w/o arguments first * Then list short options w/ arguments * Then list long arguments Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Cameron Harr <harr1@llnl.gov> Closes #17357	2025-05-23 09:10:30 -07:00
Cameron Harr	cdb4c44684	Reformat cli help and man page to be in sync The man page and CLI usage statements were both a little out of sync and neither fully alphabetized correctly. That has been fixed. One outstanding question is whether to get rid of the ellipses on the CLI usage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Cameron Harr <harr1@llnl.gov> Closes #16004 Closes #17357	2025-05-23 09:10:21 -07:00
Ameer Hamza	f0baaa329a	arcstat: prevent ZeroDivisionError when L2ARC becomes empty Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard@ryao.dev> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17348	2025-05-19 16:27:24 -07:00
Alexander Motin	734eba251d	Wire O_DIRECT also to Uncached I/O (#17218) Before Direct I/O was implemented, I implemented a lighter version I called Uncached I/O. It uses the normal DMU/ARC data path with some optimizations, but evicts data from caches as soon as possible and reasonable. Originally I wired it only to the primarycache property, but am now completing the integration all the way up to the VFS. While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations. It requires I/Os to be page aligned, does not allow speculative prefetch, etc. Uncached I/O does not have those limitations, but instead requires an additional memory copy, though still one less than regular cached I/O. As such it should fill the gap in between. Considering this I've disabled annoying EINVAL errors on misaligned requests, adding a tunable for those who want to test their applications. To pass the information between the layers I had to change a number of APIs. But as a side effect upper layers can now control not only the caching, but also speculative prefetch. I haven't wired it to VFS yet, since it requires looking at some OS specifics. But while there I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Fixes #17027 Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-05-13 14:26:55 -07:00
Alexander Motin	49fbdd4533	Introduce zfs rewrite subcommand (#17246) This allows rewriting the content of specified file(s) as-is without modifications, but with a different location, compression, checksum, dedup, copies and other parameter values. It is faster than read plus write, since it does not require data copying to user-space. It is also faster for sync=always datasets, since without data modification it does not require ZIL writing. Also since it is protected by normal range locks, it can be done under any other load. Also it does not affect file's modification time or other properties. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com>	2025-05-12 10:22:17 -07:00
Artem	27f3d94940	Sort the blocking snapshots list #12751 (#17264) When multiple snapshots prevent the destruction/rollback of the respective dataset/snapshot/volume via zfs destroy or zfs rollback, the error message does not list the blocking snapshots sorted according to their order of creation. This causes inconvenience and can lead to confusion, and also creates a contrast with a returned message from zfs list -t snap function. Closes: #12751 Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-05-01 17:40:23 -07:00
Tony Hutter	155847c72d	GCC 15: Fix unterminated-string-initialization (#17244) Fix build errors on Fedora 42 like: module/zcommon/zfs_valstr.c:193:16: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (3 chars into 2 available) The arrays in zpool_vdev_os.c and zfs_valstr.c don't need to be NULL terminated, but we do so to make GCC happy. Closes: #17242 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-04-16 09:33:29 -07:00
IIIPr0t0typ3III	189dc26296	Fixed zfs_notify_email for programs like sendmail zfs_notify_email will now include an empty line separating the header from the body of the email in case the subject is not provided via a command line argument. This is necessary for programs like sendmail to function correctly (everything up to the first empty line is interpreted as header, which previously resulted in either missing message parts or unsent emails) Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Felix Schmidt <felixschmidt20@aol.com> Closed #17238	2025-04-12 11:58:19 -04:00
Syed Shahrukh Hussain	78a7c78bdf	Added fix for zpool get state segfaults with two or more vdevs (#15972). (#17213) The problem was identified in handling of the zpool get state command line arguments. A pointer vdev was used to point to the argv[1], and its address set to cb.cb_vdevs.cb_names(pointer to array of strings) so any increment to cb_names resulted in a segfault. Fix covers a special case of root parameter at argv[1] and remaining cases are handled by passing in the argv + 1, which allows cb_names iteration of next command line arguments (vdevs). Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Syed Shahrukh Hussain <syed.shahrukh@ossrevival.org>	2025-04-04 15:34:38 -07:00
Ameer Hamza	6f6c504700	Show default quotas in zfs userspace tools Update zfs userspace, groupspace, and projectspace to display the default quotas when no per-ID specific quota is configured. This ensures tool outputs align with enforced limits. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:36:45 -07:00
Martin Matuška	87f8bf6b0c	Multiple printf() size fixes (#17199) cmd/zinject/zinject.c: - use PRIu64 when printing uint64_t tests/zfs-tests/cmd/clonefile.c: - use an unsigned long long to store result from strtoull() - use %jd for printing off_t, %zu for size_t, %zd for ssize_t tests/zfs-tests/tests/functional/vdev_disk/page_alignment.c: - use %zx to print size_t Discovered when compiling on FreeBSD i386. Signed-off-by: Martin Matuska <mm@FreeBSD.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: @ImAwsumm	2025-04-01 15:27:03 -07:00
Ameer Hamza	30cc2331f4	zed: Ensure spare activation after kernel-initiated device removal In addition to hotplug events, the kernel may also mark a failing vdev as REMOVED. This was observed in a customer report and reproduced by forcing the NVMe host driver to disable the device after a failed reset due to command timeout. In such cases, the spare was not activated because the device had already transitioned to a REMOVED state before zed processed the event. To address this, explicitly attempt hot spare activation when the kernel marks a device as REMOVED. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17187	2025-03-28 15:48:38 -04:00
Paul Dagnelie	9250403ba6	Make ganging redundancy respect redundant_metadata property (#17073) The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, sets it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if the users care more about performance than resilience, their gang blocks will be even more of a penalty than usual. We add logic to calculate the number of gang headers copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-19 15:58:29 -07:00
Rob Norris	f69631992d	dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) This helps avoid confusion with the similarly-named txg_wait_synced(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-18 16:04:22 -07:00
Rob Norris	137045be98	SPDX: license tags: BSD-2-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:46 -07:00

1635 Commits