mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs synced 2025-04-28 13:20:02 +00:00

Author	SHA1	Message	Date
Robert Evans	3a445f2ef5	Remove duplicate dedup_legacy_create in common.run Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Robert Evans <evansr@google.com> Closes #16926	2025-01-05 17:25:22 -08:00
Richard Kojedzinszky	dc0324bfa9	fix: make zfs_strerror really thread-safe and portable #15793 wanted to make zfs_strerror threadsafe, unfortunately, it turned out that strerror_l() usage was wrong, and also, some libc implementations dont have strerror_l(). zfs_strerror() now simply calls original strerror() and copies the result to a thread-local buffer, then returns that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Closes #15793 Closes #16640 Closes #16923	2025-01-04 10:33:27 -08:00
Don Brady	939e0237c5	Too many vdev probe errors should suspend pool Similar to what we saw in #16569, we need to consider that a replacing vdev should not be considered as fully contributing to the redundancy of a raidz vdev even though current IO has enough redundancy. When a failed vdev_probe() is faulting a disk, it now checks if that disk is required, and if so it suspends the pool until the admin can return the missing disks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #16864	2025-01-04 10:28:33 -08:00
Robert Evans	50cbb14641	Add Makefile dependencies for scripts/zfs-tests.sh -c This updates the Makefile to be more correct for parallel make. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Robert Evans <evansr@google.com> Closes #16030 Closes #16922	2025-01-03 19:04:01 -08:00
Toomas Soome	ee3bde9dad	ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT, use save_tunable and restore_tunable to restore the pre-test state. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #16919	2025-01-03 14:48:30 -08:00
Rob Norris	c02e1cf055	vdev_open: clear async remove flag after reopen It's possible for a vdev to be flagged for async remove after the pool has suspended. If the removed device has been returned when the pool is resumed, the ASYNC_REMOVE task will still run at the end of txg, and remove the device from the pool again. To fix, we clear the async remove flag at reopen, just as we did for the async fault flag in `5de3ac223`. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16921	2025-01-03 14:42:06 -08:00
Toomas Soome	e94549d868	ZTS: remove unused TESTDIRS from pam/cleanup.ksh Remove TESTDIRS as it is not set for pam tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #16920	2025-01-03 14:41:03 -08:00
pstef	478b09577a	zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD This works around /usr/lib/go-1.18/pkg/tool/linux_amd64/link: mapping output file failed: invalid argument It's happened to me under a Linux jail, but it's also happened to other people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: pstef <pstef@users.noreply.github.com> Closes #16918	2025-01-03 09:03:14 -08:00
Toomas Soome	d35f9f2e84	ZTS: checkpoint_discard_busy does not set 16M on cleanup Originally hex value is used as decimal. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #16917	2025-01-02 15:57:24 -08:00
Toomas Soome	d6b4110d71	ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs cleanup.ksh is assuming we have TESTDIRS set. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #16915	2025-01-02 15:53:53 -08:00
Toomas Soome	8dc15ef4b3	ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around Before we can remove test files, we need to unmount datasets used by test first. See also: zfs_mount_all_mountpoints.ksh Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #16914	2025-01-02 13:29:12 -08:00
James Reilly	3c2267a873	ZTS: add centos stream10 (#16904) Added centos as optional runners via workflow_dispatch and removed centos-stream9 from the FULL_OS runner list, as CentOS is not officially supported by ZFS. This commit will add preliminary support for EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6 months Signed-off-by: James Reilly <jreilly1821@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>	2025-01-02 09:28:56 -08:00
Andrew Walker	25238baad5	Add missing zfs_exit() when snapdir is disabled (#16912 ) zfs_vget doesn't zfs_exit when erroring out due to snapdir being disabled. Signed-off-by: Andrew Walker <awalker@ixsystems.com> Reviewed-by: @bmeagherix Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2024-12-30 17:06:48 -08:00
shodanshok	54126fdb5b	set zfs_arc_shrinker_limit to 0 by default zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to aggressive kernel reclaim. While useful, the current default (10000) is too prone to OOM especially when MGLRU-enabled kernels with default min_ttl_ms are used. Even when no OOM happens, it often causes too much swap usage. This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim requests. ARC now plays better with both kernel shrinker and pagecache but, should ARC collapse happen again, MGLRU behavior can be tuned or even disabled. Anyway, zfs should not cause OOM when ARC can be released. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #16909	2024-12-29 11:50:19 -08:00
Ameer Hamza	9dd5fe1095	zvol: implement platform-independent part of block cloning In Linux, block devices currently lack support for `copy_file_range` API because the kernel does not provide the necessary functionality. However, there is an ongoing upstream effort to address this limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/. We have adopted this upstream kernel patch into the TrueNAS kernel and made some additional modifications to enable block cloning specifically for the zvol block device. This patch implements the platform- independent portions of these changes for inclusion in OpenZFS. This patch does not introduce any new functionality directly into OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when zvols are migrated to non-TrueNAS systems that support Clone Range replay in the ZIL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16901	2024-12-29 11:41:30 -08:00
Alexander Motin	a153397f41	ZTS: Reduce file size in redacted_panic to 1GB This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT, probably due to debugging it has in kernel, it does not complete within 10 minutes, ending up killed. As I see all the redacting here happens within the first ~128MB of the file, so I hope it won't matter if there is 1GB of data instead of 2GB. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #11141	2024-12-29 11:19:32 -08:00
Alexander Motin	b66d910113	ZTS: Remove procfs use from zpool_import_status procfs might be not mounted on FreeBSD. Plus checking for specific PID might be not exactly reliable. Check for empty list of jobs instead. Premature loop exit can result in failed test and failed cleanup, failing also some following tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #11141	2024-12-29 11:19:25 -08:00
Alexander Motin	8bf1e83eef	ZTS: Remove non-standard awk hex numbers usage FreeBSD recently removed non-standard hex numbers support from awk. Nor does it support the -n argument, which enables it in gawk. Instead of depending on those, rewrite the list_file_blocks() function to handle the hex math in shell instead of awk. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #11141	2024-12-29 11:17:27 -08:00
Rob Norris	c4e5fa5e17	ZTS: test clearing pool and vdev userprops Confirming that clearing pool and vdev userprops produce the same result: an empty value, with default source. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16887	2024-12-29 11:12:16 -08:00
Rob Norris	03b7cfdef3	spa_sync_props: remove pool userprops by setting empty-string People have noted there's no way to remove a pool userprop, only zero it. Turns out vdev userprops had a method, by setting empty-string. So this makes pool userprops follow the same behaviour. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16887	2024-12-29 11:12:04 -08:00
Rob Norris	779c5a5deb	zpool_get_vdev_prop_value: show missing vdev userprops If a vdev userprop is not found, present it as value '-', default source, so it matches the output from pool userprops. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16887	2024-12-29 11:11:40 -08:00
Alexander Motin	89f796dec6	ZTS: Increase write sizes for RAIDZ/dRAID tests Many RAIDZ/dRAID tests filled files doing millions of 100 or even 10 byte writes. It makes very little sense since we are not micro-benchmarking syscalls or VFS layer here, while before the blocks reach the vdev layer absolute majority of the small writes will be aggregated. In some cases I see we spend almost as much time creating the test files as actually running the tests. And sometimes the tests even time out after that. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16905	2024-12-27 10:01:22 -05:00
Rob Norris	c37a2ddaaa	microzap: set hard upper limit of 1M The count of chunks in a microzap block is stored as an uint16_t (mze_chunkid). Each chunk is 64 bytes, and the first is used to store a header, so there are 32767 usable chunks, which is just under 2M. 1M is the largest power-2-rounded block size under 2M, so we must set the limit there. If it goes higher, the loop in mzap_addent can overflow and fall into the PANIC case. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16888	2024-12-26 17:10:09 -05:00
Alexander Motin	1acd246964	Fix readonly check for vdev user properties VDEV_PROP_USERPROP is equal to VDEV_PROP_INVAL and so is not a real property. That's why vdev_prop_readonly() does not work right for it. In particular it may declare all vdev user properties readonly on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16890	2024-12-20 17:25:35 -05:00
Umer Saleem	219a89cbbf	Skip iterating over snapshots for share properties Setting sharenfs and sharesmb properties on a dataset can become costly if there are a large number of snapshots, since setting the share properties iterates over all snapshots present for a dataset. If it is the root dataset for which we are trying to set the share property, snapshots for all child datasets and their children will also be iterated. There is no need to iterate over snapshots for share properties because we do not allow share properties, or any other property except user properties, to be set on a snapshot itself. This commit skips iterating over snapshots for share properties, and instead iterates over all child datasets and their children for share properties. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16877	2024-12-19 15:02:58 -05:00
Rob Norris	f00a57a786	zfs_main: fix alignment on props usage output I guess we've got some long property names since this was first set up! Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16883	2024-12-19 11:04:56 -05:00
Tino Reichardt	e5ac7786bd	CI: Fix FreeBSD 13.4 STABLE build In #16869 we added FreeBSD 13.4 STABLE, but forgot that the virtio nic within FreeBSD 13.x is buggy. This fix adds the needed rtl8139 nic to the VM. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #16885	2024-12-19 11:01:34 -05:00
Rob Norris	ab7cbbe789	zprop: fix value help for ZPOOL_PROP_CAPACITY It's a percentage and documented as such, but we were showing it as <size>. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16881	2024-12-18 15:25:12 -08:00
Brian Behlendorf	830a531249	CI: Add FreeBSD 14.2 RELEASE+STABLE builds Update the CI to include FreeBSD 14.2 as a regularly tested platform. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16869	2024-12-17 08:58:33 -08:00
Brian Atkinson	882a809983	Use pin_user_pages API for Direct I/O requests As of kernel v5.8, pin_user_pages* interfaces were introduced. These interfaces use the FOLL_PIN flag. This is now the preferred interface for Direct I/O requests in the kernel. The reasoning for using this new interface for Direct I/O requests is explained in the kernel documentation: Documentation/core-api/pin_user_pages.rst If pin_user_pages_unlocked is available, then all Direct I/O requests will use this new API to stay up to date with the kernel API requirements. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16856	2024-12-16 10:24:30 -08:00
Brian Atkinson	c6442bd3b6	Removing old code outside of 4.18 kernels There were checks still in place to verify we could completely use iov_iter's on the Linux side. All interfaces are available as of kernel 4.18, so there is no reason to check whether we should use that interface at this point. This PR completely removes the UIO_USERSPACE type. It also removes the checks for the direct_IO interface. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16856	2024-12-16 10:23:45 -08:00
Shengqi Chen	acda137d8c	simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined on armel. Add a guard to avoid compilation errors. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #16871	2024-12-16 09:40:41 -08:00
Brian Behlendorf	22259fb24d	Fix stray "no" in configure output This is purely a cosmetic fix which removes a stray "no" from the configure output. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16867	2024-12-14 14:05:12 -08:00
Alexander Motin	ff6266ee9b	Fix use-after-free regression in RAIDZ expansion We should not dereference rra after the last zio_nowait() is called. It seems very unlikely, but ASAN in ztest managed to catch it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16868	2024-12-14 14:02:11 -08:00
kotauskas	586304ac44	Remount datasets on soft-reboot The one-shot zfs-mount.service is incorrectly deemed active by Systemd after a systemctl soft-reboot. As such, soft-rebooting prevents zfs mount -a from being run automatically. This commit makes it so that zfs-mount.service is marked as being undone by the time umount.target is reached, so that zfs.target then pulls it in again and gets it restarted after a soft reboot. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: kotauskas <v.toncharov@gmail.com> Closes #16845	2024-12-13 13:50:50 -08:00
Rob Norris	46e06feded	flush: only detect lack of flush support in one place It seems there's no good reason for vdev_disk & vdev_geom to explicitly detect no support for flush and set vdev_nowritecache. Instead, just signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess() take care of it in one place. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16855	2024-12-13 12:19:54 -08:00
Rob Norris	fbea92432a	flush: don't report flush error when disabling flush support The first time a device returns ENOTSUP in response to a flush request, we set vdev_nowritecache so we don't issue flushes in the future and instead just pretend they succeeded. However, we still return an error for the initial flush, even though we just decided such errors are meaningless! So, when setting vdev_nowritecache in response to a flush error, also reset the error code to assume success. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16855	2024-12-13 12:19:20 -08:00
Poscat	76f57ab9f7	build: use correct bashcompletiondir on arch Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: poscat <poscat@poscat.moe> Closes #16861	2024-12-13 12:06:40 -08:00
Rob Norris	ecc0970e3e	backtrace: fix off-by-one on string output sizeof("foo") includes the trailing null byte, so all the output had nulls through it. Most terminals quietly ignore it, but it makes some tools misdetect file types and other annoyances. Easy fix: subtract 1. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16862	2024-12-13 10:12:14 -08:00
Chunwei Chen	6c9b4f18d3	Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db except for overridden case. However, this exception causes a race where dbuf_new_size could free the arc_buf after the last dereference of datap and causes use-after-free. We fix this by cloning the buf regardless if it's overridden. The race: -- P0 P1 dbuf_hold_impl() // dbuf_hold_copy passed // because db_data_pending NULL dbuf_sync_leaf() // doesn't clone datap // datap derefed to db_buf dbuf_write(datap) dbuf_new_size() dmu_buf_will_dirty() dbuf_fix_old_data() // alloc new buf for P0 dr // but can't change *datap arc_alloc_buf() arc_buf_destroy() // alloc new buf for db_buf // and destroy old buf dbuf_write() // continue abd_get_from_buf(data->b_data, arc_buf_size(data)) // use-after-free -- Here's an example when it happens: BUG: kernel NULL pointer dereference, address: 000000000000002e RIP: 0010:arc_buf_size+0x1c/0x30 [zfs] Call Trace: dbuf_write+0x3ff/0x580 [zfs] dbuf_sync_leaf+0x13c/0x530 [zfs] dbuf_sync_list+0xbf/0x120 [zfs] dnode_sync+0x3ea/0x7a0 [zfs] sync_dnodes_task+0x71/0xa0 [zfs] taskq_thread+0x2b8/0x4e0 [spl] kthread+0x112/0x130 ret_from_fork+0x1f/0x30 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Co-authored-by: Chunwei Chen <david.chen@nutanix.com> Closes #16854	2024-12-12 16:18:45 -08:00
Alexander Motin	19a04e5ad2	BRT: Check bv_mos_entries in brt_entry_lookup() When vdev first sees some block cloning, there is a window when brt_maybe_exists() might already return true since something was cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet created. In such case we should not try to look into the ZAP and dereference NULL bv_mos_entries_dnode. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16851	2024-12-12 10:22:41 -08:00
Rob Norris	e0039c7057	Remove unnecessary CSTYLED escapes on top-level macro invocations cstyle can handle these cases now, so we don't need to disable it. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:57 -08:00
Rob Norris	0de8ae56f7	cstyle: ignore old non-POSIX types in macro invocations In code generation macros, we often use names like `uint` when constructing handler functions. These are not being used as types, so exclude them from the admonishment to use POSIX type names. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:53 -08:00
Rob Norris	ba00a6f9a3	cstyle: understand macro params can be empty It's not uncommon to have empty parameters in code generator macros, usually when multiple parameters are concatenated or stringified into a single token or literal. So, exclude the space-before-comma check, which will allow construction like `MACRO_CALL(foo, , baz)`. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:36 -08:00
Rob Norris	903895ea5f	cstyle: understand basic top-level macro invocations We quite often invoke macros outside of functions, usually to generate functions or data. cstyle interprets these as function headers, which at least have opinions for indenting. This introduces a separate state for top-level macro invocations, and excludes it from matching functions. For the moment, most of the existing rules will continue to apply, but this gives us a way to add or remove rules targeting macros specifically. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:12 -08:00
Alexander Motin	a44eaf1690	Optimize RAIDZ expansion - Instead of copying one ashift-sized block per ZIO, copy as much as we have contiguous data up to 16MB per old vdev. To avoid data moves use gang ABDs, so that read ZIOs can directly fill buffers for write ZIOs. ABDs have much smaller overhead than ZIOs in both memory usage and processing time, plus big I/Os do not depend on I/O aggregation and scheduling to reach decent performance on HDDs. - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms. - Use 32bit range tree when possible (practically always now) to slightly reduce memory usage. - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as for main ones. - Fix rate overflows in `zpool status` reporting. With these changes expanding RAIDZ1 from 4 to 5 children I am able to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are limited by devices instead of CPU. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15680 Closes #16819	2024-12-06 08:50:16 -08:00
Alexander Motin	e8b333e4d3	Fix false assertion in dmu_tx_dirty_buf() on cloning Same as writes block cloning can increase block size and number of indirection levels. That means it can dirty block 0 at level 0 or at new top indirection level without explicitly holding them. A block cloning test case for large offsets has been added. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16825	2024-12-05 11:48:08 -08:00
Don Brady	44446dccdb	During pool export flush the ARC asynchronously This also includes removing L2 vdevs asynchronously. This commit also guarantees that spa_load_guid is unique. The zpool reguid feature introduced the spa_load_guid, which is a transient value used for runtime identification purposes in the ARC. This value is not the same as the spa's persistent pool guid. However, the value is seeded from spa_generate_load_guid() which does not check for uniqueness against the spa_load_guid from other pools. Although extremely rare, you can end up with two different pools sharing the same spa_load_guid value! So we guarantee that the value is always unique and additionally not still in use by an async arc flush task. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #16215	2024-12-05 08:58:20 -08:00
Rob Norris	2507db612d	zdb_il: use flex array member to access ZIL records In `6f50f8e16` we added flex arrays to lr_XX_t structs to silence kernel bounds check warnings. Userspace code was mostly not updated to use them though. It seems that in the right circumstances, compilers can get confused about sizes in the same way, and throw warnings. This commit switches those uses over to use the flex array fields also. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16832	2024-12-04 19:03:20 -05:00
Alexander Motin	a01504b35c	Improve speculative prefetcher for block cloning - Issue prescient prefetches for demand indirect blocks after the first one. It should be quite rare for reads/writes, but much more useful for cloning due to much bigger (up to 1022 blocks) accesses. It covers the gap during the first couple accesses when we can not speculate yet, but we know what is needed right now. It reduces dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode(). - Increase maximum prefetch distance for indirect blocks from 64 to 128MB. It should cover the maximum 1022 blocks of block cloning access size in case of default 128KB recordsize used. In case of bigger recordsize the above prescient prefetch should also help. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16814	2024-12-04 15:19:05 -08:00
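
Commit dc0324bfa9 above describes zfs_strerror() as calling the original strerror() and copying the result into a thread-local buffer. A minimal sketch of that pattern follows. The name demo_strerror and the 128-byte buffer size are assumptions made for illustration, not the actual OpenZFS code, and whether strerror() itself may be called concurrently still depends on the libc in use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buffer size, chosen only for this sketch. */
#define	DEMO_ERRBUF_LEN	128

/*
 * Hypothetical stand-in for the approach the commit message describes:
 * call strerror() and hand back a per-thread copy of its result.
 */
const char *
demo_strerror(int errnum)
{
	/* One buffer per thread (C11), so threads never share storage. */
	static _Thread_local char buf[DEMO_ERRBUF_LEN];

	/* strerror() may return a pointer into shared static storage. */
	const char *msg = strerror(errnum);
	assert(msg != NULL);

	/* Copy, truncating if needed, and always NUL-terminate. */
	strncpy(buf, msg, sizeof (buf) - 1);
	buf[sizeof (buf) - 1] = '\0';
	return (buf);
}
```

A caller would use it like strerror(), for example fprintf(stderr, "%s\n", demo_strerror(errno)). The returned pointer stays valid only until the same thread calls the function again.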

9715 Commits
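
Commit ecc0970e3e in the list above turns on the fact that sizeof applied to a string literal counts the trailing null byte. The sketch below shows the off-by-one and the subtract-1 fix in isolation. The macro and function names are hypothetical and are not taken from the OpenZFS backtrace code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper macros for this sketch. */
#define	LIT_SIZE(s)	(sizeof (s))		/* includes the trailing NUL */
#define	LIT_LEN(s)	(sizeof (s) - 1)	/* bytes that should be written */

/* "foo" occupies four bytes of storage: 'f', 'o', 'o', '\0'. */
static_assert(LIT_SIZE("foo") == 4, "sizeof counts the NUL terminator");
static_assert(LIT_LEN("foo") == 3, "subtracting 1 gives the text length");

/*
 * Write a fixed marker to fd without emitting the NUL byte.
 * Passing LIT_SIZE() here instead would put a stray '\0' in the output.
 */
ssize_t
demo_write_marker(int fd)
{
	return (write(fd, "foo\n", LIT_LEN("foo\n")));
}
```

Calling demo_write_marker(STDOUT_FILENO) writes the four bytes of "foo\n" and nothing after them.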