push: reduce initial capacity of known chunks

one million chunks are a bit much: since each chunk represents 1-2MB (dynamic)
to 4MB (fixed) of input data, that would correspond to 1-4TB of re-used input
data in a single snapshot.
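(A quick standalone check of those figures, for illustration only and not part
of the patch:)

    fn main() {
        const MIB: u64 = 1024 * 1024;
        const GIB: u64 = 1024 * MIB;
        for (label, chunks) in [("1Mi chunks (old)", 1024 * 1024u64), ("64Ki chunks (new)", 64 * 1024)] {
            // dynamic chunks are roughly 1-2 MiB, fixed chunks are 4 MiB
            let min_gib = chunks * MIB / GIB;     // assuming 1 MiB per chunk
            let max_gib = chunks * 4 * MIB / GIB; // assuming 4 MiB per chunk
            println!("{label}: ~{min_gib}-{max_gib} GiB of re-used input data");
        }
    }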

64k chunks still represent 64-256GB of input data, which should be plenty (and
for snapshots big enough to exceed that, with lots of re-used chunks, growing
the HashSet's allocation should not be the bottleneck); 64k is also the default
capacity used for pulling.
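For context (not part of the patch): HashSet::with_capacity only pre-allocates,
the set still grows on demand, so the smaller initial capacity merely trades a
few re-allocations in the rare huge-snapshot case for a much smaller up-front
allocation. A minimal sketch:

    use std::collections::HashSet;

    fn main() {
        // Pre-allocate room for 64Ki entries; this is a hint, not a limit.
        let mut known_chunks: HashSet<[u8; 32]> = HashSet::with_capacity(64 * 1024);
        assert!(known_chunks.capacity() >= 64 * 1024);

        // Inserting more entries than the initial capacity simply triggers
        // re-allocation; nothing is dropped or rejected.
        for i in 0u32..100_000 {
            let mut digest = [0u8; 32];
            digest[..4].copy_from_slice(&i.to_le_bytes());
            known_chunks.insert(digest);
        }
        assert_eq!(known_chunks.len(), 100_000);
    }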

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Author: Fabian Grünbichler
Date:   2024-11-20 19:55:04 +01:00
parent: 0bcbb1badd
commit: 2492083e37


@@ -781,7 +781,7 @@ pub(crate) async fn push_snapshot(
     };
     // Avoid double upload penalty by remembering already seen chunks
-    let known_chunks = Arc::new(Mutex::new(HashSet::with_capacity(1024 * 1024)));
+    let known_chunks = Arc::new(Mutex::new(HashSet::with_capacity(64 * 1024)));
     for entry in source_manifest.files() {
         let mut path = backup_dir.full_path();
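
For illustration, a minimal sketch of the dedup-check pattern that the comment
above the hunk describes; the helper names (upload_chunk, push_chunk) and the
digest type are assumptions, not the actual proxmox-backup code:

    use std::collections::HashSet;
    use std::sync::{Arc, Mutex};

    type Digest = [u8; 32];

    // Hypothetical stand-in for the real chunk upload call.
    fn upload_chunk(digest: &Digest, _data: &[u8]) {
        println!("uploading chunk {:02x?}...", &digest[..4]);
    }

    fn push_chunk(known_chunks: &Arc<Mutex<HashSet<Digest>>>, digest: Digest, data: &[u8]) {
        // `insert` returns false if the digest was already present, so a chunk
        // shared by several archives in the snapshot is only uploaded once.
        if known_chunks.lock().unwrap().insert(digest) {
            upload_chunk(&digest, data);
        }
    }

    fn main() {
        let known_chunks = Arc::new(Mutex::new(HashSet::with_capacity(64 * 1024)));
        let digest = [0u8; 32];
        push_chunk(&known_chunks, digest, b"chunk data");
        push_chunk(&known_chunks, digest, b"chunk data"); // no-op: digest already known
    }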