Better fill empty metaslabs

Before this change, the zfs_metaslab_switch_threshold tunable switched
metaslabs each time their index dropped by two (which means the biggest
contiguous chunk shrank to 1/4).  That is a good idea for balancing
metaslab fragmentation.  But for empty metaslabs (which have power-of-2
sizes) it means switching as soon as their free space gets just below
half of their capacity.  Inspection with zdb after filling a new pool to
half capacity showed most of its metaslabs filled to half capacity.
I consider this sub-optimal for pool fragmentation in the long run.
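
The arithmetic above can be sketched with a simplified model.  It
assumes the segment-based weight index is floor(log2) of the largest
free segment, and that a metaslab is switched once the index is at least
the threshold below its value at activation.  seg_index() and
old_should_switch() are hypothetical names used for illustration, not
functions from the ZFS source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the segment-based weight index:
 * floor(log2(size)) of the largest free segment.
 */
static int
seg_index(uint64_t size)
{
	int idx = -1;

	assert(size != 0);	/* the index is undefined for an empty segment */
	while (size != 0) {
		size >>= 1;
		idx++;
	}
	return (idx);
}

/*
 * Simplified model of the old behavior: switch once the index has
 * dropped by `threshold` or more since the metaslab was activated.
 */
static bool
old_should_switch(uint64_t largest_at_activation, uint64_t largest_now,
    int threshold)
{
	return (seg_index(largest_now) <=
	    seg_index(largest_at_activation) - threshold);
}
```

With a threshold of 2 and an empty 16 GiB (2^34) metaslab, the index
starts at 34.  While exactly half (2^33) is still contiguous the index
is 33 and the model keeps the metaslab; one byte less gives index 32 and
the model reports a switch.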

This change blocks metaslab switching if most of the metaslab's free
space (more than 15/16) is represented by a single contiguous range.
Such a metaslab should not be considered fragmented until it actually
fails some big allocation.  More contiguous filling should improve
data locality and increase the time before a previously filled and
partially freed metaslab is touched again, giving it more time to
free more contiguous chunks for lower fragmentation.  It should
also slightly reduce spacemap traffic.
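
The new condition can be modeled in isolation as below.  This is a
sketch only: mostly_contiguous() is a hypothetical helper that takes
plain byte counts, whereas the real check reads them from the metaslab
via metaslab_largest_allocatable() and zfs_range_tree_space().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the new guard: true when the largest free
 * segment is more than 15/16 of all free space, in which case the
 * metaslab is not treated as fragmented and is not switched.
 * Uses the same integer expression as the patch.
 */
static bool
mostly_contiguous(uint64_t largest_free, uint64_t total_free)
{
	assert(largest_free <= total_free);
	return (largest_free > total_free * 15 / 16);
}
```

For an empty metaslab the largest segment equals the total free space,
so the guard holds and filling continues past the half-way point.  Once
freed fragments make up 1/16 or more of the free space, the guard no
longer holds and the existing index-based check decides as before.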

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17081
Alexander Motin 2025-02-25 14:26:34 -05:00 committed by Ameer Hamza
parent b4ce059a76
commit c2668b2d10

@@ -3545,6 +3545,15 @@ metaslab_segment_may_passivate(metaslab_t *msp)
 	if (WEIGHT_IS_SPACEBASED(msp->ms_weight) || spa_sync_pass(spa) > 1)
 		return;
 
+	/*
+	 * As long as a single largest free segment covers majority of free
+	 * space, don't consider the metaslab fragmented. It should allow
+	 * us to fill new unfragmented metaslabs full before switching.
+	 */
+	if (metaslab_largest_allocatable(msp) >
+	    zfs_range_tree_space(msp->ms_allocatable) * 15 / 16)
+		return;
+
 	/*
 	 * Since we are in the middle of a sync pass, the most accurate
 	 * information that is accessible to us is the in-core range tree