mirror of
https://git.proxmox.com/git/mirror_zfs
synced 2025-10-19 00:57:25 +00:00
The performance of `zfs receive` can be bottlenecked on the CPU consumed by the `receive_writer` thread, especially when receiving streams with small compressed block sizes. Much of the CPU is spent creating and destroying dbufs and arc bufs, one for each `WRITE` record in the send stream.

This commit introduces the concept of "lightweight writes", which allow `zfs receive` to write to the DMU by providing an ABD and instantiating only a new type of `dbuf_dirty_record_t`. The dbuf and arc buf for this "dirty leaf block" are not instantiated.

Because there is no dbuf holding the dirty data, this mechanism doesn't support reading from "lightweight-dirty" blocks (a read would see the on-disk state rather than the dirty data). Since the dedup-receive code has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data is no longer cached in the ARC.

Testing a receive of a stream with an average compressed block size of 4KB, this commit improves performance by 50%, while also reducing CPU usage by 50% of a CPU. On a per-block basis, the CPU consumed by `receive_writer()` and `dbuf_evict()` is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in `receive_writer()` 40% + `dbuf_evict()` 35%

New: 670MB/s, CPU in `receive_writer()` 17% + `dbuf_evict()` 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the `dbuf_dirty_record_t`. This simplifies some existing code that no longer needs `DB_DNODE_ENTER()` and related routines. The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record is freed, we have to ensure that `dnode_move()` doesn't relocate the `dnode_t`. To do this we keep a hold on the dnode until its zios have completed. This is already done by the user-accounting code (`userquota_updates_task()`); this commit extends that so that it always keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced. This was not necessary, since its meaning can be "when was this dnode last dirtied". This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
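
The per-block figure quoted in the commit message can be cross-checked from the baseline and new measurements. The following is a minimal sketch, not part of the commit: it assumes both runs used the same average block size, so that throughput in MB/s is proportional to blocks per second, and the function names are hypothetical.

```c
/*
 * Rough cross-check of the numbers quoted in the commit message.
 * Assumption (not from the commit): both runs use the same average
 * block size, so MB/s is proportional to blocks per second.
 * All function names here are illustrative only.
 */

/* CPU percentage spent per MB/s of receive throughput. */
static double
cpu_per_throughput(double cpu_percent, double mb_per_sec)
{
	return (cpu_percent / mb_per_sec);
}

/* New per-block CPU cost divided by the baseline per-block CPU cost. */
static double
per_block_cpu_ratio(void)
{
	/* Baseline: receive_writer() 40% + dbuf_evict() 35% at 450MB/s. */
	double baseline = cpu_per_throughput(40.0 + 35.0, 450.0);
	/* New: receive_writer() 17% + dbuf_evict() 0% at 670MB/s. */
	double updated = cpu_per_throughput(17.0 + 0.0, 670.0);

	return (updated / baseline);
}

/* New throughput divided by baseline throughput. */
static double
throughput_gain(void)
{
	return (670.0 / 450.0);
}
```

Under that assumption, `per_block_cpu_ratio()` evaluates to roughly 0.15 and `throughput_gain()` to roughly 1.49, which is in line with the "1/7th" and "50%" figures quoted above.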
||
---|---|---|
.. | ||
abd.c | ||
aggsum.c | ||
arc.c | ||
blkptr.c | ||
bplist.c | ||
bpobj.c | ||
bptree.c | ||
bqueue.c | ||
btree.c | ||
dataset_kstats.c | ||
dbuf_stats.c | ||
dbuf.c | ||
ddt_zap.c | ||
ddt.c | ||
dmu_diff.c | ||
dmu_object.c | ||
dmu_objset.c | ||
dmu_recv.c | ||
dmu_redact.c | ||
dmu_send.c | ||
dmu_traverse.c | ||
dmu_tx.c | ||
dmu_zfetch.c | ||
dmu.c | ||
dnode_sync.c | ||
dnode.c | ||
dsl_bookmark.c | ||
dsl_crypt.c | ||
dsl_dataset.c | ||
dsl_deadlist.c | ||
dsl_deleg.c | ||
dsl_destroy.c | ||
dsl_dir.c | ||
dsl_pool.c | ||
dsl_prop.c | ||
dsl_scan.c | ||
dsl_synctask.c | ||
dsl_userhold.c | ||
edonr_zfs.c | ||
fm.c | ||
gzip.c | ||
hkdf.c | ||
lz4.c | ||
lzjb.c | ||
Makefile.in | ||
metaslab.c | ||
mmp.c | ||
multilist.c | ||
objlist.c | ||
pathname.c | ||
range_tree.c | ||
refcount.c | ||
rrwlock.c | ||
sa.c | ||
sha256.c | ||
skein_zfs.c | ||
spa_boot.c | ||
spa_checkpoint.c | ||
spa_config.c | ||
spa_errlog.c | ||
spa_history.c | ||
spa_log_spacemap.c | ||
spa_misc.c | ||
spa_stats.c | ||
spa.c | ||
space_map.c | ||
space_reftree.c | ||
THIRDPARTYLICENSE.cityhash | ||
THIRDPARTYLICENSE.cityhash.descrip | ||
txg.c | ||
uberblock.c | ||
unique.c | ||
vdev_cache.c | ||
vdev_draid_rand.c | ||
vdev_draid.c | ||
vdev_indirect_births.c | ||
vdev_indirect_mapping.c | ||
vdev_indirect.c | ||
vdev_initialize.c | ||
vdev_label.c | ||
vdev_mirror.c | ||
vdev_missing.c | ||
vdev_queue.c | ||
vdev_raidz_math_aarch64_neon_common.h | ||
vdev_raidz_math_aarch64_neon.c | ||
vdev_raidz_math_aarch64_neonx2.c | ||
vdev_raidz_math_avx2.c | ||
vdev_raidz_math_avx512bw.c | ||
vdev_raidz_math_avx512f.c | ||
vdev_raidz_math_impl.h | ||
vdev_raidz_math_powerpc_altivec_common.h | ||
vdev_raidz_math_powerpc_altivec.c | ||
vdev_raidz_math_scalar.c | ||
vdev_raidz_math_sse2.c | ||
vdev_raidz_math_ssse3.c | ||
vdev_raidz_math.c | ||
vdev_raidz.c | ||
vdev_rebuild.c | ||
vdev_removal.c | ||
vdev_root.c | ||
vdev_trim.c | ||
vdev.c | ||
zap_leaf.c | ||
zap_micro.c | ||
zap.c | ||
zcp_get.c | ||
zcp_global.c | ||
zcp_iter.c | ||
zcp_set.c | ||
zcp_synctask.c | ||
zcp.c | ||
zfeature.c | ||
zfs_byteswap.c | ||
zfs_fm.c | ||
zfs_fuid.c | ||
zfs_ioctl.c | ||
zfs_log.c | ||
zfs_onexit.c | ||
zfs_quota.c | ||
zfs_ratelimit.c | ||
zfs_replay.c | ||
zfs_rlock.c | ||
zfs_sa.c | ||
zfs_vnops.c | ||
zil.c | ||
zio_checksum.c | ||
zio_compress.c | ||
zio_inject.c | ||
zio.c | ||
zle.c | ||
zrlock.c | ||
zthr.c | ||
zvol.c |