From c2668b2d1096091291b1c3d0caeda985c0bd80d5 Mon Sep 17 00:00:00 2001
From: Alexander Motin
Date: Tue, 25 Feb 2025 14:26:34 -0500
Subject: [PATCH] Better fill empty metaslabs

Before this change the zfs_metaslab_switch_threshold tunable switched
metaslabs each time a metaslab's index dropped by two (which means its
biggest contiguous chunk shrank to 1/4).  That is a good way to balance
metaslab fragmentation.  But for empty metaslabs (which have power-of-2
sizes) this means switching as soon as they drop just below half of
their capacity.  Inspection with zdb after filling a new pool to half
capacity showed most of its metaslabs filled to half capacity.  I
consider this sub-optimal for pool fragmentation in the long run.

This change blocks metaslab switching if most of the metaslab's free
space (more than 15/16) is represented by a single contiguous range.
Such a metaslab should not be considered fragmented until it actually
fails some big allocation.  More contiguous filling should improve data
locality and increase the time before a previously filled and partially
freed metaslab is touched again, giving it more time to free more
contiguous chunks for lower fragmentation.  It should also slightly
reduce spacemap traffic.

Reviewed-by: Brian Behlendorf
Reviewed-by: Paul Dagnelie
Signed-off-by: Alexander Motin
Sponsored by: iXsystems, Inc.
Closes #17081
---
 module/zfs/metaslab.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/module/zfs/metaslab.c b/module/zfs/metaslab.c
index 35bd968f6..c1424a81b 100644
--- a/module/zfs/metaslab.c
+++ b/module/zfs/metaslab.c
@@ -3545,6 +3545,15 @@ metaslab_segment_may_passivate(metaslab_t *msp)
 	if (WEIGHT_IS_SPACEBASED(msp->ms_weight) || spa_sync_pass(spa) > 1)
 		return;
 
+	/*
+	 * As long as a single largest free segment covers majority of free
+	 * space, don't consider the metaslab fragmented. It should allow
+	 * us to fill new unfragmented metaslabs full before switching.
+	 */
+	if (metaslab_largest_allocatable(msp) >
+	    zfs_range_tree_space(msp->ms_allocatable) * 15 / 16)
+		return;
+
 	/*
 	 * Since we are in the middle of a sync pass, the most accurate
 	 * information that is accessible to us is the in-core range tree
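
Note on the added check (an illustrative sketch, not part of the patch): the new condition can be exercised in isolation with plain integers. The names `sketch_largest_dominates`, `sketch_self_check` and `SKETCH_GIB` are hypothetical; the two arguments stand in for the values that `metaslab_largest_allocatable(msp)` and `zfs_range_tree_space(msp->ms_allocatable)` supply in the patch, and the sketch assumes free space is small enough that multiplying it by 15 does not overflow 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_GIB	(1ULL << 30)

/*
 * Hypothetical stand-in for the condition added by the patch: returns
 * true when the largest free segment is strictly greater than 15/16 of
 * the total free space.  In the patch, a true result makes
 * metaslab_segment_may_passivate() return early.
 */
bool
sketch_largest_dominates(uint64_t largest_free, uint64_t total_free)
{
	return (largest_free > total_free * 15 / 16);
}

/* A few hand-picked cases; the sizes are assumptions for illustration. */
void
sketch_self_check(void)
{
	/* All 8 GiB of free space is one contiguous range. */
	assert(sketch_largest_dominates(8 * SKETCH_GIB, 8 * SKETCH_GIB));

	/* 8 GiB free, but the largest segment is only 1 GiB. */
	assert(!sketch_largest_dominates(1 * SKETCH_GIB, 8 * SKETCH_GIB));

	/* Exactly 15/16 (7.5 GiB of 8 GiB) fails the strict comparison. */
	assert(!sketch_largest_dominates(15 * SKETCH_GIB / 2,
	    8 * SKETCH_GIB));
}
```

Under these assumptions, a metaslab filled contiguously to half capacity still has all of its free space in one range, so the check holds and the function returns before any switching is considered. Once the free space is scattered, the check no longer holds and execution falls through to the pre-existing threshold logic.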