Proxmox-Port/zfs - zfs - Gitea: Git with a cup of tea

mirror of https://github.com/openzfs/zfs.git synced 2025-10-01 19:56:28 +00:00

Author	SHA1	Message	Date
Rob Norris	2755e2aa60	spa_activity_check: narrow scope of MMP vars They aren't used outside these very small blocks, and their initial values are never used at all. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:23:07 -07:00
Rob Norris	9292071565	linux/kmem: remove HAVE_ATOMIC64_T and kmem_alloc_used wrappers Seems like we haven't set it since the SPL was pulled into the main ZFS tree. In removing the define, I've taken the 64-bit version (ie the one that _hasn't_ been running since back then) because it looks like it's closer to the intended width by the way it's used. Since the macros are no longer needed as a selector, pull those too. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:08:07 -07:00
Rob Norris	1c483cf3d0	linux/kmem: remove long-obsolete __GFP compat flags Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:07:53 -07:00
Rob Norris	96d20d7d59	linux/kmem: remove PF_FSTRANS and PF_MEMALLOC_NOIO compat Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:07:36 -07:00
shodanshok	cecff09faa	add uncompressed_size to arc_summary Add uncompressed ARC size to statistics reported by arc_summary. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #17556	2025-07-22 15:06:09 -07:00
shodanshok	a7a144e655	enforce arc_dnode_limit Linux kernel shrinker in the context of null/root memcg does not scan dentry and inode caches added by a task running in non-root memcg. For ZFS this means that dnode cache routinely overflows, evicting valuable meta/data and putting additional memory pressure on the system. This patch restores zfs_prune_aliases as fallback when the kernel shrinker does nothing, enabling zfs to actually free dnodes. Moreover, it (indirectly) calls arc_evict when dnode_size > dnode_limit. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #17487 Closes #17542	2025-07-21 10:32:01 -07:00
Alexander Motin	be1e991a1a	Allow and prefer special vdevs as ZIL Before this change ZIL blocks were allocated only from normal or SLOG vdevs. In typical situation when special vdevs are SSDs and normal are HDDs it could cause weird inversions when data blocks are written to SSDs, but ZIL referencing them to HDDs. This change assumes that special vdevs typically have much better (or at least not worse) latency than normal, and so in absence of SLOGs should store ZIL blocks. It means similar to normal vdevs introduction of special embedded log allocation class and updating the allocation fallback order to: SLOG -> special embedded log -> special -> normal embedded log -> normal. The code tries to guess whether data block is going to be written to normal or special vdev (it can not be done precisely before compression) and prefer indirect writes for blocks written to a special vdev to avoid double-write. For blocks that are going to be written to normal vdev, special vdev by default plays as SLOG, reducing write latency by the cost of higher special vdev wear, but it is tunable via module parameter. This should allow HDD pools with decent SSD as special vdev to work under synchronous workloads without requiring additional SLOG SSD, impractical in many scenarios. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17505	2025-07-18 18:44:14 -07:00
Chunwei Chen	2669b00f13	Define sops->free_inode() to prevent use-after-free during lookup On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can be dereferenced without refcounts and locks. For this reason, dentry and inode must only be freed after RCU grace period. However, zfs currently frees inode in zfs_inode_destroy synchronously and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on Linux 5.2 and after, if we define sops->free_inode(), the kernel will do call_rcu() for us. This issue may be triggered more easily with init_on_free=1 boot parameter: BUG: kernel NULL pointer dereference, address: 0000000000000020 RIP: 0010:selinux_inode_permission+0x10e/0x1c0 Call Trace: ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? security_inode_permission+0x37/0x60 ? __die_body.cold+0x8/0xd ? no_context+0x113/0x220 ? exc_page_fault+0x6d/0x130 ? asm_exc_page_fault+0x1e/0x30 ? selinux_inode_permission+0x10e/0x1c0 security_inode_permission+0x37/0x60 link_path_walk.part.0.constprop.0+0xb5/0x360 ? path_init+0x27d/0x3c0 path_lookupat+0x3e/0x1a0 filename_lookup+0xc0/0x1d0 ? __check_object_size.part.0+0x123/0x150 ? strncpy_from_user+0x4e/0x130 ? getname_flags.part.0+0x4b/0x1c0 vfs_statx+0x72/0x120 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120 __do_sys_newlstat+0x39/0x70 ? __x64_sys_ioctl+0x8d/0xd0 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x62/0xc7 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Co-authored-by: Chunwei Chen <david.chen@nutanix.com> Closes #17546	2025-07-18 08:45:13 -07:00
Alexander Motin	d7ab07dfb4	ZIL: Force writing of open LWB on suspend Under parallel workloads ZIL may delay writes of open LWBs that are not full enough. On suspend we do not expect anything new to appear since zil_get_commit_list() will not let it pass, only returning TXG number to wait for. But I suspect that waiting for the TXG commit without having the last LWB issued may not wait for its completion, resulting in panic described in #17509. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17521	2025-07-17 15:31:19 -07:00
Paul Dagnelie	c1e51c55f5	Correct weight recalculation of space-based metaslabs Currently, after a failed allocation, the metaslab code recalculates the weight for a metaslab. However, for space-based metaslabs, it uses the maximum free segment size instead of the normal weighting algorithm. This is presumably because the normal metaslab weight is (roughly) intended to estimate the size of the largest free segment, but it doesn't do that reliably at most fragmentation levels. This means that recalculated metaslabs are forced to a weight that isn't really using the same units as the rest of them, resulting in undesirable behaviors. We switch this to use the normal space-weighting function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Sponsored-by: Klara, Inc. Closes #17531	2025-07-16 10:20:57 -07:00
Paul Dagnelie	b21e04e8d9	Fix zdb pool/ with -k When examining the root dataset with zdb -k, we get into a mismatched state. main() knows we are not examining the whole pool, but it strips off the trailing slash. import_checkpointed_state() then thinks we are examining the whole pool, and does not update the target path appropriately. The fix is to directly inform import_checkpointed_state that we are examining a filesystem, and not the whole pool. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17536	2025-07-15 17:01:49 -07:00
Rob Norris	d323fbf49c	FreeBSD: zfs_putpages: don't undirty pages until after write completes In syncing mode, zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework that case to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. The original version of this (`238eab7dc1`) copied the Linux model and did the cleanup in a ZIL callback for both async and sync. This was a mistake, as FreeBSD does not have a separate "busy for writeback" flag like Linux which keeps the page usable. The full sbusy flag locks the entire page out until the itx callback fires, which for async is after txg sync, which could be literal seconds in the future. For the async case, the data is already on the DMU and the in-memory ZIL, which is sufficient for async writeback, so the old method of logging it without a callback, undirtying the page and returning is more than sufficient and reclaims that lost performance. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Johnston <markj@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17533	2025-07-15 15:58:15 -07:00
Mark Johnston	ee2a2d941a	Revert "FreeBSD: zfs_putpages: don't undirty pages until after write completes" This causes async putpages to leave the pages sbusied for a long time, which hurts concurrency. Revert for now until we have a better approach. This reverts commit `238eab7dc1`. Reported by: Ihor Antonov <ngor@hugpoint.tech> Discussed with: Rob Norris <rob.norris@klarasystems.com> References: freebsd/freebsd-src@738a9a7 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Johnston <markj@FreeBSD.org> Ported-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17533	2025-07-15 15:58:11 -07:00
Rob Norris	1b84bd1dff	ZTS: test that zdb can work with libzpool tunables Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:47:08 -07:00
Rob Norris	fce18e04d5	libzpool: tunable-based option interface for zdb/ztest Removes the old dlsym() based option setter and adds a new function handle_tunable_option() that can set, get and list all the tunables in the system. And then wire it up to zdb and ztest. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:47:03 -07:00
Rob Norris	cb9742e532	libspl: add API for manipulating tunables Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:46:58 -07:00
Rob Norris	967ce75669	libspl: implement ZFS_MODULE_PARAM for userspace For each tunable declaration, we create a zfs_tunable_t with its details, and then a pointer to it in the 'zfs_tunables' ELF section, that we can access later with a little support from the linker. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:46:51 -07:00
Rob Norris	3a494c6d2a	mod.h: make consistent across all three platforms mod.h only exists to include the platform-specific mod_os.h, so we can get rid of it and just call the platform header mod.h. Then, create a libspl mod.h, and move the relevant items to it so we can start building on it. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:46:14 -07:00
Carl George	fe3b2b76cf	CI: Add CentOS Stream 9/10 to the FULL_OS runner list Testing on CentOS Stream provides several months advance notice of changes coming to the RHEL kernel. This should help OpenZFS be proactive instead of reactive to new RHEL minor versions. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Carl George <carlwgeorge@gmail.com> ZFS-CI-Type: full Closes #16904 Closes #17526	2025-07-15 10:00:35 -07:00
Attila Fülöp	8de8e0df9f	objtool wrapper: use absolute path to call the wrapper Older kernel versions run make outside of the build directory. This works since all paths are absolute. Relative paths will fail in such a scenario. Use an absolute path to the objtool wrapper as well, since the relative path breaks the build on older kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #17541	2025-07-14 15:10:02 -07:00
Tino Reichardt	2461e6f636	Delete unused .cirrus.yml The Cirrus_CI was planned for testing FreeBSD, but never really used I think. Currently it's not needed anymore, so remove it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17155 Closes #17535	2025-07-11 08:49:06 -07:00
Tino Reichardt	d6dcae3166	ZTS: Fix FreeBSD 15.0 ksh errors The package ksh93 is replaced by ksh now. This works for FreeBSD 13 and 14 also. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17523	2025-07-09 14:40:32 -07:00
Alexander Motin	f66b57c87d	CI: Switch from FreeBSD 13.4 to 13.5 FreeBSD 13.4 is EOL since June 30, 2025. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #17519	2025-07-09 14:38:32 -07:00
Brian Behlendorf	ea38787f2e	Revert "Fix incorrect expected error in ztest" This reverts commit `2076011e0c`. The comment which explains EINVAL should be expected for this case was wrong, not the code. The kernel will return ENOTSUP when attaching a distributed spare to the wrong top-level dRAID vdev. See the check for this in spa_vdev_attach(). Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17503	2025-07-09 14:34:02 -07:00
Paul Dagnelie	a981cb69e4	Implement dynamic gang header sizes ZFS gang block headers are currently fixed at 512 bytes. This is increasingly wasteful in the era of larger disk sector sizes. This PR allows any size allocation to work as a gang header. It also contains supporting changes to ZDB to make gang headers easier to work with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:02:53 -07:00
Paul Dagnelie	e845be28e7	Add no-upgrade featureflag Adds a featureflag that is not enabled during upgrades unless listed explicitly. This is useful for features that could cause issues unless applied carefully; for example, a feature that could make a root pool unbootable if bootloaders don't yet have support for it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:01:59 -07:00
rmacklem	4c2a7f85d5	FreeBSD: Add support for _PC_HAS_HIDDENSYSTEM In FreeBSD there is now a pathconf name _PC_HAS_HIDDENSYSTEM. This patch adds support for it to OpenZFS. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Closes #17518	2025-07-08 22:11:22 -04:00
Ameer Hamza	523d9d6007	Validate mountpoint on path-based unmount using statx Use statx to verify that path-based unmounts proceed only if the mountpoint reported by statx matches the MNTTAB entry reported by libzfs, aborting the operation if they differ. Align `zfs umount /path` behavior with `zfs umount dataset`. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17481	2025-07-08 22:10:00 -04:00
Rob Norris	6af8db61b1	metaslab: don't pass whole zio to throttle reserve APIs They only need a couple of fields, and passing the whole thing just invites fiddling around inside it, like modifying flags, which then makes it much harder to understand the zio state from inside zio.c. We move the flag update to just after a successful throttle in zio.c. Rename ZIO_FLAG_IO_ALLOCATING to ZIO_FLAG_ALLOC_THROTTLED Better describes what it means, and makes it look less like IO_IS_ALLOCATING, which means something different. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17508	2025-07-04 23:22:22 -04:00
Rob Norris	92d3b4ee2c	zio: rename `io_reexecute` as `io_post`; use it for the direct IO checksum error flag We're not supposed to modify someone else's io_flags, so we need another way to propagate DIO_CHKSUM_ERR. If we squint, we can see that io_reexecute is really just recording exceptional events that a parent (or its parents) will need to do something about. It just happens that the only things we've had historically are two forms of reexecution: now or later (suspend). So, rename it to io_post, as in, post-IO info/events/actions. And now we have a few spare bits for other conditions. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17507	2025-07-04 23:16:14 -04:00
Igor Ostapenko	ee0cb4cb89	ztest: Fix false positive of ENOSPC handling Before running a pass zs_enospc_count is checked to free up some space by destroying a random dataset. But the space freed may still be not re-usable during the TXG_DEFER window breaking the next dataset creation in ztest_generic_run(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17506	2025-07-03 16:00:13 -07:00
Meriel Luna Mittelbach	d411ea2e4d	Add templated zfs-mount@.service Runs `zfs mount -R <dataset>` at boot, after `zfs mount -a`. Intended to replace `mountpoint=legacy` in certain mount setups. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Meriel Luna Mittelbach <lunarlambda@gmail.com> Closes #17483	2025-07-03 14:24:07 -07:00
Brian Behlendorf	c98a393cb6	CI: run ztest on compressed zpool When running ztest under the CI a common failure mode is for the underlying filesystem to run out of available free space. Since the storage associated with a GitHub-hosted runner is fixed, we instead create a pool and use a compressed ZFS dataset to store the ztest vdev files. This significantly increases the available capacity since the data written by ztest is highly compressible. A compression ratio of over 40:1 is conservatively achieved using the default lz4 compression. Autotrimming is enabled to ensure freed blocks are discarded from the backing cipool vdev file. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17501	2025-07-03 10:27:05 -07:00
Alexander Motin	4e92aee233	Relax special_small_blocks restrictions special_small_blocks is applied to blocks after compression, so it makes no sense to demand its values to be power of 2. At most they could be multiple of 512, but that would still buy us nothing, so let's allow them to be any within SPA_MAXBLOCKSIZE. Also special_small_blocks does not really need to depend on the set recordsize, enabled pool features or presence of special vdev. At worst in any of those cases it will just do nothing, so we should not complicate users' lives by artificial limitations. While there, polish comments for recordsize and volblocksize. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17497	2025-07-02 11:11:37 -07:00
Martin Rüegg	17ee0fd4fa	pyzfs: Adapt python lib directory evaluation from ax_python_devel.m4 `71216b91d2` introduced a regression on debian/ubuntu systems during build. The reason being, that building the RPM for pyzfs was using a different library path than building the library itself. This is now harmonized. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #16155 Closes #17480	2025-07-02 09:58:22 -07:00
Martin Rüegg	6d838ec0b6	pyzfs: Update ax_python_devel.m4 to serial 37 Fixes an obvious typo, where a variable was missing the required leading dollar sign ($) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #17480	2025-07-02 09:57:50 -07:00
Alexander Motin	bf846dcb7d	Release topology restrictions on special/dedup Special vdevs were originally designed as a small blocks storage for dRAID, for which role RAIDZ/dRAID topologies are not good. But it is more often used as SSD storage for metadata and hot data of HDD pools. In these use cases narrow RAIDZ of SSDs might be fine, so we should not introduce unnecessary restrictions, and ZFS internally does not care. Similar applies to dedup vdevs. Original DDT used 4KB blocks, for which anything but mirror was a terrible storage. But new FDT implementation uses 32KB blocks by default, which are much less demanding even including compression, and which could be increased even higher now, if needed. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17496	2025-07-02 09:33:47 -07:00
Chunwei Chen	eacf618a65	Missing tests in make pkg ``` Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed verification. ``` Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #17491	2025-06-30 16:16:27 -07:00
Olivier Certner	dee62e074a	spa: ZIO_TASKQ_ISSUE: Use symbolic priority This allows to change the meaning of priority differences in FreeBSD without requiring code changes in ZFS. This upstreams commit fd141584cf89d7d2 from FreeBSD src. Sponsored-by: The FreeBSD Foundation Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Olivier Certner <olce@FreeBSD.org> Closes #17489	2025-06-30 10:24:23 -04:00
Paul Dagnelie	69ee01aa4b	Fix bug caused by rounding in vdev_raidz_asize_to_psize When an allocation is happening on a raidz vdev, the number of sectors allocated is rounded up to a multiple of nparity + 1. If this results in the allocation spilling into an extra row, then the corresponding call to vdev_raidz_asize_to_psize will incorrectly assume that parity sectors were allocated for that spilled row, even though no data is stored there. If we determine that happened, we need to subtract out those extra sectors before performing the rest of the capacity calculation. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17490	2025-06-27 14:54:20 -04:00
Rob Norris	ea076d6921	vdev_raidz_asize_to_psize: return psize, not asize Since `246e588`, gang blocks written to raidz vdevs will write past the end of their allocation, corrupting themselves, other data, or both. The reason is simple - when allocating the gang children, we call vdev_psize_to_asize() to find out how much data we should load into the allocation we just did. vdev_raidz_asize_to_psize() had a bug; it computed the psize, but returned the original asize. The raidz layer dutifully writes that much out, into space beyond the end of the allocation. If there's existing data there, it gets overwritten, causing checksum errors when that data is read. Even if there's no data there (unlikely, given that gang blocks are in play at all), that area is not considered allocated, so can be allocated and overwritten later. The fix is simple: return the psize we just computed. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17488	2025-06-26 10:19:59 -04:00
Mark Johnston	0a2163d194	FreeBSD: Ensure that z_pflags is initialized for new znodes The field is subsequently accessed in zfs_mknode(), in zfs_inherit_projid(). The Linux implementation of zfs_create_fs() has this initialization already; there is no counterpart to zfs_create_share_dir() that I can see. Reported-by: KMSAN Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #17486	2025-06-25 12:07:17 -04:00
Paul Dagnelie	d461a67d0a	Ensure that gang_copies is always at least as large as copies As discussed in the comments of PR #17004, you can theoretically run into a case where a gang child has more copies than the gang header, which can lead to some odd accounting behavior (and even trip a VERIFY). While the accounting code could be changed to handle this, it fundamentally doesn't seem to make a lot of sense to allow this to happen. If the data is supposed to have a certain level of reliability, that isn't actually achieved unless the gang_copies property is set to match it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17484	2025-06-25 12:05:36 -04:00
Rob Norris	46a4075100	Linux 6.16: remove writepage and readahead_page Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17443	2025-06-23 15:51:02 -04:00
Brian Behlendorf	48ce292ea0	Clarify and restrict dmu_tx_assign() errors There are three possible cases where dmu_tx_assign() may encounter a fatal error: when there is a true lack of free space (ENOSPC), when there is a lack of quota space (EDQUOT), or when data required to perform the transaction cannot be read from disk (EIO). See the dmu_tx_check_ioerr() function for additional details on the motivation for checking for I/O errors early. Prior to this change dmu_tx_assign() would return the contents of tx->tx_err, which covered a wide range of possible error codes (EIO, ECKSUM, ESRCH, etc). In practice, none of the callers could do anything useful with this level of detail and simply returned the error. Therefore, this change converts all tx->tx_err errors to EIO, adds ASSERTs to dmu_tx_assign() to cover the only possible errors, and clarifies the function comment to include EIO as a possible fatal error. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian D Behlendorf <behlendo@slag12.llnl.gov> Closes #17463	2025-06-23 15:48:30 -04:00
Paul Dagnelie	8170eb6ebc	Fix TestGroup warning due to missing tags Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17473	2025-06-19 14:41:31 -07:00
Alexander Motin	5e5253be84	FreeBSD: Wire projects support While FreeBSD itself does not support projects, there is no reason why it can't be controlled via `zfs project` and other subcommands. Most of the code is actually already there and just needs some revival and sync with Linux, plus enabling some tests not depending on the OS support. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17423	2025-06-19 14:39:20 -07:00
Paul Dagnelie	717213d431	Fix other nonrot bugs There are still a variety of bugs involving the vdev_nonrot property that will cause problems if you try to run the test suite with segment-based weighting disabled, and with other things in the weighting code. Parents' nonrot property needs to be updated when children are added. When vdevs are expanded and more metaslabs are added, the weights have to be recalculated (since the number of metaslabs is an input to the lba bias function). When opening, faulted or unopenable children should not be considered for whether a vdev is nonrot or not (since the nonrot property is determined during a successful open, this can cause false negatives). And draid spares need to have the nonrot property set correctly. Sponsored-by: Eshtek, creators of HexOS Sponsored-by: Klara, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17469	2025-06-19 09:25:58 -04:00
Tino Reichardt	585dbbf13b	ZTS: Use FreeBSD cloudinit images FreeBSD has provided CI-IMAGES for some time. These images are based on nuageinit, which does not support fqdn and sudo, for example. So we currently need some workarounds to get them working. The FreeBSD images will be more compatible with cloud-init in the near future. Then we can remove the workarounds. These versions are used for testing: - freebsd13-4r (RELEASE) - freebsd14-3s (STABLE) - freebsd15-0c (CURRENT) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17462	2025-06-18 10:19:21 -04:00
Attila Fülöp	6cf17f6538	Linux build: handle CONFIG_OBJTOOL_WERROR=y Linux 5.16 by default fails the build on objtool warnings. We have known and understood objtool warnings we can't fix without involving Linux maintainers. To work around this we introduce an objtool wrapper script which removes the `--Werror` flag. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #17456	2025-06-16 08:12:09 -07:00


10142 Commits