During phase 1 of garbage collection referenced chunks are marked as
in use by iterating over all index files and updating the atime on
the chunks referenced by these.
In an edge case for long running garbage collection jobs, where a
newly added snapshot (created after the start of GC) reused known
chunks from a previous snapshot, but the previous snapshot index
referencing them disappeared before the marking phase could reach
that index (e.g. pruned because only 1 snapshot to be kept by
retention setting), known chunks from that previous index file might
not be marked (given that by none of the other index files it was
marked).
Since commit 74361da8 ("garbage collection: generate index file list
via datastore iterators") this is even less likely as now the
iteration reads also index files added during phase 1, and
therefore either the new or the previous index file will account for
these chunks (the previous backup snapshot can only be pruned after
the new one finished, since locked). There remains however a small
race window between the reading of the snapshots in the backup group
and the reading of the actual index files for marking.
Fix this race by:
1. Checking if the last snapshot of a group disappeared and if so
2. generate the list again, looking for new index files previously
not accounted for
3. To avoid possible endless looping, lock the group if the snapshot
list changed even after the 10th time (which will lead to
concurrent operations to this group failing).
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Acked-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250416105000.270166-3-c.ebner@proxmox.com
Instead of returning a None, fail if the open index reader is called
on a blob file. Blobs cannot be read as index anyways and this allows
to distinguish cases where the index file cannot be read because
vanished.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250416105000.270166-2-c.ebner@proxmox.com
Since commit 74361da8 ("garbage collection: generate index file list
via datastore iterators") not only snapshots present at the start of
the garbage collection run are considered for marking, but also newly
added ones. Take these into account by adapting the total index file
counter used for the progress output.
Further, correctly take into account also index files which have been
pruned during GC, therefore present in the list of still to process
index files but never encountered by the datastore iterators. These
would otherwise be interpreted incorrectly as strange paths and logged
accordingly, causing confusion as reported in the community forum [0].
Fixes: https://forum.proxmox.com/threads/164968/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Allow to control the capacity of the cache used to track recently
touched chunks via the configured value in the datastore tuning
options. Log the configured value to the task log, if an explicit
value is set, allowing the user to confirm the setting and debug.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/pbs-devel/20250404130713.376630-2-c.ebner@proxmox.com
Use the user configured atime cutoff over the default 24h 5m
margin if explicitly set, otherwise fallback to the default.
Move the minimum atime calculation based on the atime cutoff to the
sweep_unused_chunks() callside and pass in the calculated values, as
to have the logic in the same place.
Add log outputs shownig which cutoff and minimum access time is used
by the garbage collection.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Check if the filesystem backing the chunk store actually updates the
atime to avoid potential data loss in phase 2 of garbage collection,
in case the atime update is not honored.
Perform the check before phase 1 of garbage collection, as well as
on datastore creation. The latter to early detect and disallow
datastore creation on filesystem configurations which otherwise most
likely would lead to data losses. To perform the check also when
reusing an existing datastore, open the chunks store also on reuse.
Enable the atime update check by default, but allow to opt-out by
setting a datastore tuning parameter flag for backwards compatibility.
This is honored by both, garbage collection and datastore creation.
The check uses a 4 MiB fixed sized, unencypted and compressed chunk
as test marker, inserted if not present. This all zero-chunk is very
likely anyways for unencrypted backup contents with large all-zero
regions using fixed size chunking (e.g. VMs).
To avoid cases were the timestamp will not be updated because of the
Linux kernels timestamp granularity, sleep in-between chunk insert
(including an atime update if pre-existing) and the subsequent
stating + utimensat for 1 second.
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Inserting a new chunk into the chunk store as process running with
root priviledger currently does not set an explicit ownership on the
chunk file. As a consequence this will lead to permission issues if
the chunk is operated on by a codepath executed in the less
privileged proxy task running as `backup` user.
Therefore, explicitly set the ownership and permissions of the chunk
file upon insert, if the process is executed as `root` user.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
group or namespace removal can fail if the old locking mechanism is
still in use, as it is unsafe to properly clean up in that scenario.
return an error message that explains how to rectify that situation.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
[TL: address simple merge conflict and fine tune message to admins]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this is only needed for removing the group if the last snapshot is
removed, ignore locking failures, as the user can't do anything to
rectify the situation anymore.
log the locking error for debugging purposes, though.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
[TL: line-wrap comment at 100cc and fix bullet-point indentation]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
To reduce the number of atimes updates, keep track of the recently
marked chunks in phase 1 of garbage to avoid multiple atime updates
via expensive utimensat() calls.
Recently touched chunks are tracked by storing the chunk digests in
an LRU cache of fixed capacity. By inserting a digest, the chunk will
be the most recently touched one and if already present in the cache
before insert, the atime update can be skipped. The cache capacity of
1024 * 1024 was chosen as compromise between required memory usage
and the size of an index file referencing a 4 TiB fixed size chunked
image (with 4MiB chunk size).
The previous change to iterate over the datastore contents using the
datastore's iterator helps for increased cache hits, as subsequent
snapshots are most likely to share common chunks.
Basic benchmarking:
Number of utimensat calls shows significatn reduction:
unpatched: 31591944
patched: 1495136
Total GC runtime shows significatn reduction (average of 3 runs):
unpatched: 155.4 ± 3.5 s
patched: 22.8 ± 0.5 s
VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, average of 3 runs):
unpatched before: 1196028 ± 0 kB
unpatched after: 1196028 ± 0 kB
unpatched before: 1163337 ± 28317 kB
unpatched after: 1330906 ± 29280 kB
delta: 167569 kB
Dependence on the cache capacity:
capacity runtime[s] VmPeakDiff[kB]
1*1024 66.221 0
10*1024 36.164 0
100*1024 23.141 0
1024*1024 22.188 101060
10*1024*1024 23.178 689660
100*1024*1024 25.135 5507292
Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special
Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54
Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB
Distribution of snapshots (binned by month):
2023-11 11
2023-12 16
2024-01 30
2024-02 38
2024-03 17
2024-04 37
2024-05 17
2024-06 59
2024-07 99
2024-08 96
2024-09 115
2024-10 35
2024-11 42
2024-12 37
2025-01 162
2025-02 489
2025-03 1884
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Instead of iterating over all index files found in the datastore in
an unstructured manner, use the datastore iterators to logically
iterate over them as other datastore operations will.
This allows to better distinguish index files in unexpected locations
from ones in their expected location, warning the user of unexpected
ones to allow to act on possible missconfigurations. Further, this
will allow to integrate marking of snapshots with missing chunks as
incomplete/corrupt more easily and helps improve cache hits when
introducing LRU caching to avoid multiple atime updates in phase 1 of
garbage collection.
This now iterates twice over the index files, as indices in
unexpected locations are still considered by generating the list of
all index files to be found in the datastore and removing regular
index files from that list, leaving unexpected ones behind.
Further, align terminology by renaming the `list_images` method to
a more fitting `list_index_files` and the variable names accordingly.
This will reduce possible confusion since throughout the codebase and
in the documentation files referencing the data chunks are referred
to as index files. The term image on the other hand is associated
with virtual machine images and other large binary data stored as
fixed-size chunks.
Basic benchmarking:
Total GC runtime shows no significatn change (average of 3 runs):
unpatched: 155.4 ± 2.6 s
patched: 155.4 ± 3.5 s
VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, no changes for all 3 runs):
unpatched before: 1196032 kB
unpatched after: 1196032 kB
patched before: 1196028 kB
patched after: 1196028 kB
List image shows a slight increase due to the switch to a HashSet
(average of 3 runs):
unpatched: 64.2 ± 8.4 ms
patched: 72.8 ± 3.7 ms
Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special
Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54
Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB
Distribution of snapshots (binned by month):
2023-11 11
2023-12 16
2024-01 30
2024-02 38
2024-03 17
2024-04 37
2024-05 17
2024-06 59
2024-07 99
2024-08 96
2024-09 115
2024-10 35
2024-11 42
2024-12 37
2025-01 162
2025-02 489
2025-03 1884
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Refactor the archive type and index file reader opening with its
error handling into a helper method for better reusability.
This allows to use the same logic for both, expected image paths
and unexpected image paths when iterating trough the datastore
in a hierarchical manner.
Improve error handling by switching to anyhow's error context.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Empty backup groups are not visible in the API or GUI. This led to a
confusing issue where users were unable to create a group because it
already existed and was still owned by another user. Resolve this
issue by removing the group if its last snapshot is removed.
Also fixes an issue where removing a group used the non-atomic
`remove_dir_all()` function when destroying a group unconditionally.
This could lead to two different threads suddenly holding a lock to
the same group. Make sure that the new locking mechanism is used,
which prevents that, before removing the group. This is also a bit
more conservative now, as it specifically removes the owner file and
group directory separately to avoid accidentally removing snapshots in
case we made an oversight.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
adds double stat'ing and removes directory hierarchy to bring manifest
locking in-line with other locks used by the BackupDir trait.
if the old locking mechanism is still supposed to be used, this still
falls back to the previous lock file. however, we already add double
stat'ing since it is trivial to do here and should only provide better
safety when it comes to removing locks.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
to avoid issues when removing a group or snapshot directory where two
threads hold a lock to the same directory, move locking to the tmpfs
backed '/run' directory. also adds double stat'ing to make it possible
to remove locks without certain race condition issues.
this new mechanism is only employed when we can be sure, that a reboot
has occured so that all processes are using the new locking mechanism.
otherwise, two separate process could assume they have exclusive
rights to a group or snapshot.
bumps the rust version to 1.81 so we can use `std::fs::exists` without
issue.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[TL: drop unused format_err import]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
to avoid duplicate code, add helpers for locking groups and snapshots
to the BackupGroup and BackupDir traits respectively and refactor
existing code to use them.
this also adapts error handling by adding relevant context to each
locking helper call site. otherwise, we might loose valuable
information useful for debugging. note, however, that users that
relied on specific error messages will break.
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fixes the clippy warning on types T implementing Copy:
```
warning: using `clone` on type `T` which implements the `Copy` trait
```
followed by formatting fixups via `cargo fmt`.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Drop the pub scope for `DataStore`s `list_images` method.
This method is only used to generate a list of index files found in
the datastore for iteration during garbage collection. There are no
other call sites and this is intended to only be used within the
module itself. Allows to be more flexible for future method signature
adaptions.
No functional changes.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
The type `Box<dyn IndexFile + Send>>, usize, Vec<(usize, u64)>` is not
Sync so it makes more sense to use Rc. This is suggested by clippy.
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Use the UTIME_NOW and UTIME_OMIT constants defined in libc crate
instead of redefining them. This improves consistency, as utimesat
and its timespec parameter are also defined via the libc crate.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
As per its documentation [1]:
> If .create_new(true) is set, .create() and .truncate() are ignored.
This gets rid of the "file opened with `create`, but `truncate`
behavior not defined " clippy warnings.
[1] https://doc.rust-lang.org/std/fs/struct.OpenOptions.html#method.create_new
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Fixes the cargo doc lint:
```
warning: unclosed HTML tag `uuid`
--> pbs-datastore/src/datastore.rs:60:41
|
60 | /// - could not stat /dev/disk/by-uuid/<uuid>
| ^^^^^^
|
= note: `#[warn(rustdoc::invalid_html_tags)]` on by default
warning: unclosed HTML tag `uuid`
--> pbs-datastore/src/datastore.rs:61:26
|
61 | /// - /dev/disk/by-uuid/<uuid> is not a block device
```
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Fixes the cargo doc lint:
```
warning: this URL is not a hyperlink
--> pbs-datastore/src/data_blob.rs:555:5
|
555 | /// https://github.com/facebook/zstd/blob/dev/lib/common/error_private.h
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: bare URLs are not automatically turned into clickable links
= note: `#[warn(rustdoc::bare_urls)]` on by default
help: use an automatic link instead
|
555 | /// <https://github.com/facebook/zstd/blob/dev/lib/common/error_private.h>
| + +
```
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Since we don't want to have lingering file descriptors on any fork +
exec, like the reload code from the proxmox-daemon crate we're using
for the rest-server(s) does, as that can have serious side effects and
even cause hangs.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ TL: Reword commit message ]}
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
```
warning: field assignment outside of initializer for an instance created with Default::default()
--> pbs-datastore/src/chunker.rs:431:5
|
431 | ctx.total = buffer.len() as u64;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: consider initializing the variable with `chunker::Context { total: buffer.len() as u64, ..Default::default() }` and removing relevant reassignments
--> pbs-datastore/src/chunker.rs:430:5
|
430 | let mut ctx = Context::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default
= note: `#[warn(clippy::field_reassign_with_default)]` on by default
```
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Fixes the clippy lint:
```
warning: empty line after doc comment
--> src/tape/pool_writer/mod.rs:441:5
|
441 | / /// updated.
442 | |
| |_
...
448 | / pub fn append_snapshot_archive(
449 | | &mut self,
450 | | snapshot_reader: &SnapshotReader,
451 | | ) -> Result<(bool, usize), Error> {
| |_____________________________________- the comment documents this method
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
= help: if the empty line is unintentional remove it
help: if the documentation should include the empty line include it in the comment
|
442 | ///
|
```
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Fixes the question_mark clippy lint:
```
warning: this `let...else` may be rewritten with the `?` operator
--> pbs-datastore/src/datastore.rs:101:5
|
101 | / let Some(ref device_uuid) = config.backing_device else {
102 | | return None;
103 | | };
| |______^ help: replace it with: `let ref device_uuid = config.backing_device?;`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#question_mark
= note: `#[warn(clippy::question_mark)]` on by default
```
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
these are particularly problematic since GC will walk the whole datastore tree
on the file system, and will thus pick up indices (but not chunks!) from nested
directories that are ignored in other code paths that use our regular
iterators..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the current phrase leads to clumsy log messages such as:
> datastore 'store' is in datastore is being unmounted
this commit re-phrases that too:
> datastore 'store' is unavailable: datastore is being unmounted
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Data deletion is only possible if the datastore is mounted, won't attempt
mounting it for the purpose of deleting data.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
... at a specific location. Also adds two additional functions to
get the mount status, and ensuring a removable datastore is mounted.
Co-authored-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
... and add MaintenanceType::Delete to it. We also want to clear any
cach entries if we are deleting the datastore, not just if it is marked
as offline.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
Add helper functions to retrieve the verify_state from the manifest of a
snapshot. Replaced all the manual "verify_state" parsing with the helper
function.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Instead of using the plain String or slices of it for archive names,
use the dedicated api type and its methods to parse and check for
archive type based on archive filename extension.
Thereby, keeping the checks and mappings in the api type and
resticting function parameters by the narrower wrapper type to reduce
potential misuse.
Further, instead of declaring and using the archive name constants
throughout the codebase, use the `BackupArchiveName` helpers to
generate the archive names for manifest, client logs and encryption
keys.
This allows for easy archive name comparisons using the same
`BackupArchiveName` type, at the cost of some extra allocations and
avoids the currently present double constant declaration of
`CATALOG_NAME`.
A positive ergonomic side effect of this is that commands now also
accept the archive type extension optionally, when passing the archive
name.
E.g.
```
proxmox-backup-client restore <snapshot> <name>.pxar.didx <target>
```
is equal to
```
proxmox-backup-client restore <snapshot> <name>.pxar <target>
```
The previously default mapping of any archive name extension to a blob
has been dropped in favor of consistent mapping by the api type
helpers.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
FG: use LazyLock for constant archive names
FG: add missing import
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Moving the `ArchiveType` to avoid crate dependencies on
`pbs-datastore`.
In preparation for introducing a dedicated `BackupArchiveName` api
type, allowing to set the corresponding archive type variant when
parsing the archive name based on it's filename.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Permissions are stored in the lower 9 bits (rwxrwxrwx),
so we have to mask `st_mode` with 0o777.
The datastore root dir is created with 755, the `.chunks` dir and its
contents with 750 and the `.lock` file with 644, this changes the
expected permissions accordingly.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
Fixes: 6e101ff757 ("fix #5439: allow to reuse existing datastore")
Reviewed-By: Gabriel Goller <g.goller@proxmox.com>
Add and optionally expose the backup group delete statistics by adding the
return type to the corresponding REST API endpoints.
Clients can opt-into the new behaviour by setting the new `error-on-protected`
flag to `false` when calling the api endpoints, which results in removal not
erroring out when encountering protected snapshots.
The default value for the flag remains `true` for now, to remain backwards
compatible with older clients expecting this behaviour.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
FG: reworded commit message slightly
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To correctly account also for the number of deleted backup groups, in
preparation to correctly return the delete statistics when removing
contents via the REST API.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
In preparation for the delete stats to be exposed as return type to
the backup group delete api endpoint.
Also, rename the private field `unremoved_protected` to a better
fitting `protected_snapshots` to be in line with the method names.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
removable datastores will have a PBS-managed mountpoint as path, direct
access to the field needs to be replaced with a helper that can account
for this.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
Disallow creating datastores in non-empty directories. Allow adding
existing datastores via a 'reuse-datastore' checkmark. This only checks
if all the necessary directories (.chunks + subdirectories and .lock)
exist and have the correct permissions. Note that the reuse-datastore
path does not open the datastore, so that we don't drop the
ProcessLocker of an existing datastore.
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Perform the conversion from pxar file entries to catalog entry
attributes by implementing `TryFrom<&FileEntry<T>>` for
`DirEntryAttribute` and use that.
Allows the reuse for the catalog shell, when using the split pxar
archive instead of the catalog.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
we currently use the behavior of zstd that is not part of the public
api, so this is at risk to be changed without notice.
There is a public api that we could use, but it's only available
with zstd_sys >= 2.0.9, which at this time, is not yet packaged for/by
us.
Add a comment that we can use the public api for this when the
new version of the crate gets available.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>