Commit Graph

274 Commits

Author SHA1 Message Date
Christian Ebner
ab75d7ac6e config: s3: adapt to new config struct layouts
In order to not return the secret key as part of the s3 endpoint
config, split the config into different struct depending on the
usecase. Either use the plain config without id and secret_key,
the struct with id and plain config or the combined variant with
all 3 fields present.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
22cd2711eb datastore: conditionally upload atime marker chunk to s3 backend
Since commit b18eab64 ("fix #5982: garbage collection: check atime
updates are honored"), the 4 MiB fixed sized, unencypted and
compressed chunk containing all zeros is inserted at datastore
creation if the atime safety check is enabled.

If the datastore is backed by an S3 object store, chunk uploads are
avoided by checking the presence of the chunks in the local cache
store. Therefore, the all zero chunk will however not be uploaded
since already inserted locally.

Fix this by conditionally uploading the chunk before performing the
atime update check for datastores backed by S3.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
b2ffc83627 api/datastore: implement refresh endpoint for stores with s3 backend
Allows to easily refresh the contents on the local cache store for
datastores backed by an S3 object store.

In order to guarantee that no read or write operations are ongoing,
the store is first set into the maintenance mode `S3Refresh`. Objects
are then fetched into a temporary directory to avoid loosing contents
and consistency in case of an error. Once all objects have been
fetched, clears out existing contents and moves the newly fetched
contents in place.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
f8304a3d31 datastore: local chunk reader: get cached chunk from local cache store
Check if a chunk is contained in the local cache and if so prefer
fetching it from the cache instead of pulling it via the S3 api. This
improves performance and reduces number of requests to the backend.

Basic restore performance tests:

Restored a snapshot containing the linux git repository (on-disk size
5.069 GiB, compressed 3.718 GiB) from an AWS S3 backed datastore, with
and without cached contents:
non cached: 691.95 s
all cached:  74.89 s

Signed-off-by: Christian Ebner <c.ebnner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
299276be19 datastore: add local datastore cache for network attached storages
Use a local datastore as cache using LRU cache replacement policy for
operations on a datastore backed by a network, e.g. by an S3 object
store backend. The goal is to reduce number of requests to the
backend and thereby save costs (monetary as well as time).

Cached chunks are stored on the local datastore cache, already
containing the datastore's contents metadata (namespace, group,
snapshot, owner, index files, ecc..), used to perform fast lookups.
The cache itself only stores chunk digests, not the raw data itself.
When payload data is required, contents are looked up and read from
the local datastore cache filesystem, including fallback to fetch from
the backend if the presumably cached entry is not found.

The cacher allows to fetch cache items on cache misses via the access
method.

The capacity of the cache is derived from the local datastore cache
filesystem, or by the user configured value, whichever is smalller.
The capacity is only set on instantiation of the store, and the current
value kept as long as the datastore remains cached in the datastore
cache. To change the value, the store has to be either be set to offline
mode and back, or the services restarted.

Basic performance tests:

Backup and upload of contents of linux git repository to AWS S3,
snapshots removed in-between each backup run to avoid other chunk reuse
optimization of PBS.

no-cache:
    had to backup 5.069 GiB of 5.069 GiB (compressed 3.718 GiB) in 50.76 s (average 102.258 MiB/s)
empty-cache:
    had to backup 5.069 GiB of 5.069 GiB (compressed 3.718 GiB) in 50.42 s (average 102.945 MiB/s)
all-cached:
    had to backup 5.069 GiB of 5.069 GiB (compressed 3.718 GiB) in 43.78 s (average 118.554 MiB/s)

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
8c29e18b8e tools: lru cache: add removed callback for evicted cache nodes
Add a callback function to be executed on evicted cache nodes. The
callback gets the key of the removed node, allowing to externally act
based on that value.

Since the callback might fail, extend the current LRU cache api to
return an error on insert, covering the error for the `removed`
callback.

Async lru cache, callsites and tests are adapted to include the
additional callback parameter accordingly.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
6a880e8a44 datastore: implement garbage collection for s3 backend
Implements the garbage collection for datastores backed by an s3
object store.
Take advantage of the local datastore by placing marker files in the
chunk store during phase 1 of the garbage collection, updating their
atime if already present.
This allows us to avoid making expensive API calls to update object
metadata, which would only be possible via a copy object operation.

The phase 2 is implemented by fetching a list of all the chunks via
the ListObjectsV2 API call, filtered by the chunk folder prefix.
This operation has to be performed in batches of 1000 objects, given
by the APIs response limits.
For each object key, lookup the marker file and decide based on the
marker existence and it's atime if the chunk object needs to be
removed. Deletion happens via the delete objects operation, allowing
to delete multiple chunks by a single request.

This allows to efficiently lookup chunks which are not in use
anymore while being performant and cost effective.

Baseline runtime performance tests:
-----------------------------------

3 garbage collection runs were performed with hot filesystem caches
(by additional GC run before the test runs). The PBS instance was
virtualized, the same virtualized disk using ZFS for all the local
cache stores:

All datastores contained the same encrypted data, with the following
content statistics:
Original data usage: 269.685 GiB
On-Disk usage: 9.018 GiB (3.34%)
On-Disk chunks: 6477
Deduplication factor: 29.90
Average chunk size: 1.426 MiB

The resutlts demonstrate the overhead caused by the additional
ListObjectV2 API calls and their processing, but depending on the
object store backend.

Average garbage collection runtime:
Local datastore:             (2.04 ± 0.01) s
Local RADOS gateway (Squid): (3.05 ± 0.01) s
AWS S3:                      (3.05 ± 0.01) s
Cloudflare R2:               (6.71 ± 0.58) s

After pruning of all datastore contents (therefore including
DeleteObjects requests):
Local datastore:              3.04 s
Local RADOS gateway (Squid): 14.08 s
AWS S3:                      13.06 s
Cloudflare R2:               78.21 s

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
57b47366f7 datastore: get and set owner for s3 store backend
Read or write the ownership information from/to the corresponding
object in the S3 object store. Keep that information available if
the bucket is reused as datastore.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
6ff078a5a0 datastore: prune groups/snapshots from s3 object store backend
When pruning a backup group or a backup snapshot for a datastore with
S3 object store backend, remove the associated objects by removing
them based on the prefix.

In order to exclude protected contents, add a filtering based on the
presence of the protected marker.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
b9a2fa4994 datastore: create/delete protected marker file on s3 storage backend
Commit 8292d3d2 ("api2/admin/datastore: add get/set_protection")
introduced the protected flag for backup snapshots, considering
snapshots as protected based on the presence/absence of the
`.protected` marker file in the corresponding snapshot directory.

To allow independent recovery of a datastore backed by an S3 bucket,
also create/delete the marker file on the object store backend. For
actual checks, still rely on the marker as encountered in the local
cache store.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
5ea28683bb datastore: create namespace marker in s3 backend
The S3 object store only allows to store objects, referenced by their
key. For backup namespaces datastores however use directories, so
they cannot be represented as one to one mapping.

Instead, create an empty marker file for each namespace and operate
based on that.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
e3ca69adb0 datastore: local chunk reader: read chunks based on backend
Get and store the datastore's backend on local chunk reader
instantiantion and fetch chunks based on the variant from either the
filesystem or the s3 object store.

By storing the backend variant, the s3 client is instantiated only
once and reused until the local chunk reader instance is dropped.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
352a206578 api: backup: conditionally upload manifest to s3 object store backend
Reupload the manifest to the S3 object store backend on manifest
updates, if s3 is configured as backend.
This also triggers the initial manifest upload when finishing backup
snapshot in the backup api call handler.
Updates also the locally cached version for fast and efficient
listing of contents without the need to perform expensive (as in
monetary cost and IO latency) requests.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
098ab91bd9 datastore: allow to get the backend for a datastore
Implements an enum with variants Filesystem and S3 to distinguish
between available backends. Filesystem will be used as default, if no
backend is configured in the datastores configuration. If the
datastore has an s3 backend configured, the backend method will
instantiate and s3 client and return it with the S3 variant.

This allows to instantiate the client once, keeping and reusing the
same open connection to the api for the lifetime of task or job, e.g.
in the backup writer/readers runtime environment.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
aeb4ff4992 datastore: add helpers for path/digest to s3 object key conversion
Adds helper methods to generate the s3 object keys given a relative
path and filename for datastore contents or digest in case of chunk
files.

Regular datastore contents are stored by grouping them with a content
prefix in the object key. In order to keep the object key length
small, given the max limit of 1024 bytes [0], `.cnt` is used as
content prefix. Chunks on the other hand are prefixed by `.chunks`,
same as on regular datastores.

The prefix allows for selective listing of either contents or chunks
by providing the prefix to the respective api calls.

[0] https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-22 21:43:43 +02:00
Christian Ebner
ae3994e003 garbage collection: track chunk cache stats and show in task log
Count the chunk cache hits and misses and display the resulting
values and the hit ratio in the garbage collection task log summary.

This allows to investigate possible issues and tune cache capacity,
also by being able to compare to other values in the summary such
as the on disk chunk count.

Exemplary output
```
2025-05-16T22:31:53+02:00: Chunk cache: hits 15817, misses 873 (hit ratio 94.77%)
2025-05-16T22:31:53+02:00: Removed garbage: 0 B
2025-05-16T22:31:53+02:00: Removed chunks: 0
2025-05-16T22:31:53+02:00: Original data usage: 64.961 GiB
2025-05-16T22:31:53+02:00: On-Disk usage: 1.037 GiB (1.60%)
2025-05-16T22:31:53+02:00: On-Disk chunks: 874
2025-05-16T22:31:53+02:00: Deduplication factor: 62.66
2025-05-16T22:31:53+02:00: Average chunk size: 1.215 MiB
```

Sidenote: the discrepancy between cache miss counter and on-disk
chunk count in the output shown above can be attributed to the all
zero chunk, inserted during the atime update check at the start of
garbage collection, however not being referenced by any index file in
this examplary case.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250604153449.482640-3-c.ebner@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-16 01:52:15 +02:00
Christian Ebner
433fc1b73b datastore: ignore missing owner file when removing group directory
Since commit 23be00a4 ("fix #3336: datastore: remove group if the
last snapshot is removed"), a backup group directory is cleaned up
when the new locking mechanism is in use once:
- the group is requested to be destroyed and all the snapshots have
  been deleted
- the last snapshot of a group has been destroyed
Since then, the owner file is also cleaned up separately.

However, the owner file might be already missing due to removal of
the group directory executed when removing the last backup snapshot
of the group, making the subsequent call in the backup group destroy
method fail.

Fix this by ignoring a missing owner file and continue with trying to
emove the group directory itself.

Fixes: 23be00a4 ("fix #3336: datastore: remove group if the last snapshot is removed")
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250703131837.786811-7-c.ebner@proxmox.com
2025-07-04 13:01:52 +02:00
Wolfgang Bumiller
e536da9f80 update pbs-datastore to nix 0.29 and proxmox-base64
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-06-16 13:59:37 +02:00
Christian Ebner
5285a859dc garbage collection: bypass cache if gc-cache-capacity is 0
Since commit 1e7639bf ("fixup minimum lru capacity") the LRU cache
capacity is set to a minimum value of 1 to avoid issues with the edge
case of 0 capacity.

In commit f1a711c8 ("garbage collection: set phase1 LRU cache
capacity by tuning option") this was not taken into account, allowing
to set values in the range [0, 8*1024*1024] via the datastores tuning
parameters.

Bypass the cache by making it optional and do not use it if the cache
capacity is set to 0, which implies it being disabled.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-06-04 14:39:26 +02:00
Christian Ebner
7149ecdacd datastore: pass relative path to group type iterator
`ListGroupsType::new_at` creates a new iterator over all groups of
give backup type with provided parent file descriptor.

The parent directory file descriptor is passed to the `read_subdir`
call, which itself uses it to open the type directory via `openat`.
This call does however ignore the passed file handle if the given
path is absolute [0], which is always the case for the type path
generated via `DataStore::type_path`.

Fix this by passing only the type name as relative path to the
`read_subdir` call, use the absolute path only for
`ListGroupType::new`.

This helps avoiding re-traversing the absolute path in the
`ListGroups` iterator, and since it is then the only callside for
`ListGroupsType::new_at`, inline the instantiation.

[0] https://linux.die.net/man/2/openat

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-06-04 13:47:55 +02:00
Fabian Grünbichler
04eb5010e6 backup info: fully inline protected check into list_backup_files
to avoid to diverging code paths that both want the same result anyway.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2025-06-04 13:45:01 +02:00
Christian Ebner
2745f731e2 backup info: avoid additional stat syscall for protected check
`BackupDir::is_protected` is the general helper method to check the
protected state for a snapshot. This checks for the presence of the
protected marker file, which is performed by stating the file and
requires traversing the full path.

When generating the backup list for a backup group, the snapshot
directory contents are however scanned nevertheless. Take advantage
of this by extending the regex used to filter contents by scandir to
include also the protected marker filename and set the state based on
the presence/absence, thereby avoiding the additional stat syscall
altogether.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-06-04 13:36:41 +02:00
Christian Ebner
a93775e37e fix #6358: remove group note file if present on group destroy
Removing the group directory when forgetting a backup group or
removing the final backup snapshot of a group did not take into
consideration a potentially present group note file, leading for it
to fail.

Further, since the owner file is removed before trying to remove the
(not empty) group directory, the group will not be usable anymore as
the owner check will fail as well.

To fix this, remove the backup group's note file first, if present
and only after that try to cleanup the rest.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6358
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-06-04 13:30:05 +02:00
Christian Ebner
7cf68f3f2f api: datastore: make group notes path helper a DataStore method
Move and make the helper function to get a backup groups notes file
path a `DataStore` method instead. This allows it to be reused when
access to the notes path is required from the datastore itself.

Further, use the plural `notes` wording also in the helper to be
consistent with the rest of the codebase.

In preparation for correctly removing the notes file from the backup
group on destruction.

No functional changes intended.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-06-04 13:30:05 +02:00
Christian Ebner
dc324716a6 datastore: snapshot iterator: avoid unnecessary string conversion
Avoid converting the backup time string to the timestamp and back to
string again. `BackupDir::with_rfc3339` already performs the string
to time conversion, so use it over parsing the timestamp first only
to convert it back to string in `BackupDir::with_group`.

No functional changes intended.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-30 14:26:39 +02:00
Fabian Grünbichler
115942267d run cargo fmt
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2025-04-24 09:54:12 +02:00
Christian Ebner
cb9814e331 garbage collection: fix rare race in chunk marking phase
During phase 1 of garbage collection referenced chunks are marked as
in use by iterating over all index files and updating the atime on
the chunks referenced by these.

In an edge case for long running garbage collection jobs, where a
newly added snapshot (created after the start of GC) reused known
chunks from a previous snapshot, but the previous snapshot index
referencing them disappeared before the marking phase could reach
that index (e.g. pruned because only 1 snapshot to be kept by
retention setting), known chunks from that previous index file might
not be marked (given that by none of the other index files it was
marked).

Since commit 74361da8 ("garbage collection: generate index file list
via datastore iterators") this is even less likely as now the
iteration reads also index files added during phase 1, and
therefore either the new or the previous index file will account for
these chunks (the previous backup snapshot can only be pruned after
the new one finished, since locked). There remains however a small
race window between the reading of the snapshots in the backup group
and the reading of the actual index files for marking.

Fix this race by:
1. Checking if the last snapshot of a group disappeared and if so
2. generate the list again, looking for new index files previously
   not accounted for
3. To avoid possible endless looping, lock the group if the snapshot
   list changed even after the 10th time (which will lead to
   concurrent operations to this group failing).

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Acked-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Link: https://lore.proxmox.com/20250416105000.270166-3-c.ebner@proxmox.com
2025-04-16 14:17:24 +02:00
Christian Ebner
31dbaf69ab garbage collection: fail on ArchiveType::Blob in open index reader
Instead of returning a None, fail if the open index reader is called
on a blob file. Blobs cannot be read as index anyways and this allows
to distinguish cases where the index file cannot be read because
vanished.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250416105000.270166-2-c.ebner@proxmox.com
2025-04-16 14:17:24 +02:00
Christian Ebner
5fc281cd89 garbage collection: fix: account for created/deleted index files
Since commit 74361da8 ("garbage collection: generate index file list
via datastore iterators") not only snapshots present at the start of
the garbage collection run are considered for marking, but also newly
added ones. Take these into account by adapting the total index file
counter used for the progress output.

Further, correctly take into account also index files which have been
pruned during GC, therefore present in the list of still to process
index files but never encountered by the datastore iterators. These
would otherwise be interpreted incorrectly as strange paths and logged
accordingly, causing confusion as reported in the community forum [0].

Fixes: https://forum.proxmox.com/threads/164968/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-15 12:17:21 +02:00
Christian Ebner
f1a711c830 garbage collection: set phase1 LRU cache capacity by tuning option
Allow to control the capacity of the cache used to track recently
touched chunks via the configured value in the datastore tuning
options. Log the configured value to the task log, if an explicit
value is set, allowing the user to confirm the setting and debug.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Link: https://lore.proxmox.com/pbs-devel/20250404130713.376630-2-c.ebner@proxmox.com
2025-04-05 17:40:10 +02:00
Christian Ebner
daa9d0a9d5 datastore: use custom GC atime cutoff if set
Use the user configured atime cutoff over the default 24h 5m
margin if explicitly set, otherwise fallback to the default.

Move the minimum atime calculation based on the atime cutoff to the
sweep_unused_chunks() callside and pass in the calculated values, as
to have the logic in the same place.

Add log outputs shownig which cutoff and minimum access time is used
by the garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-05 13:18:22 +02:00
Christian Ebner
b18eab64a9 fix #5982: garbage collection: check atime updates are honored
Check if the filesystem backing the chunk store actually updates the
atime to avoid potential data loss in phase 2 of garbage collection,
in case the atime update is not honored.

Perform the check before phase 1 of garbage collection, as well as
on datastore creation. The latter to early detect and disallow
datastore creation on filesystem configurations which otherwise most
likely would lead to data losses. To perform the check also when
reusing an existing datastore, open the chunks store also on reuse.

Enable the atime update check by default, but allow to opt-out by
setting a datastore tuning parameter flag for backwards compatibility.
This is honored by both, garbage collection and datastore creation.

The check uses a 4 MiB fixed sized, unencypted and compressed chunk
as test marker, inserted if not present. This all zero-chunk is very
likely anyways for unencrypted backup contents with large all-zero
regions using fixed size chunking (e.g. VMs).

To avoid cases were the timestamp will not be updated because of the
Linux kernels timestamp granularity, sleep in-between chunk insert
(including an atime update if pre-existing) and the subsequent
stating + utimensat for 1 second.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-05 13:18:22 +02:00
Christian Ebner
8f6874391f chunk store: set file ownership on chunk insert as root user
Inserting a new chunk into the chunk store as process running with
root priviledger currently does not set an explicit ownership on the
chunk file. As a consequence this will lead to permission issues if
the chunk is operated on by a codepath executed in the less
privileged proxy task running as `backup` user.

Therefore, explicitly set the ownership and permissions of the chunk
file upon insert, if the process is executed as `root` user.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-05 13:18:22 +02:00
Shannon Sterz
f09f2e0d9e datastore/api: add error message on failed removal due to old locking
group or namespace removal can fail if the old locking mechanism is
still in use, as it is unsafe to properly clean up in that scenario.
return an error message that explains how to rectify that situation.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
 [TL: address simple merge conflict and fine tune message to admins]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 16:10:16 +02:00
Shannon Sterz
d728c2e836 datastore: ignore group locking errors when removing snapshots
this is only needed for removing the group if the last snapshot is
removed, ignore locking failures, as the user can't do anything to
rectify the situation anymore.

log the locking error for debugging purposes, though.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
 [TL: line-wrap comment at 100cc and fix bullet-point indentation]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 13:06:24 +02:00
Christian Ebner
03143eee0a fix #5331: garbage collection: avoid multiple chunk atime updates
To reduce the number of atimes updates, keep track of the recently
marked chunks in phase 1 of garbage to avoid multiple atime updates
via expensive utimensat() calls.

Recently touched chunks are tracked by storing the chunk digests in
an LRU cache of fixed capacity. By inserting a digest, the chunk will
be the most recently touched one and if already present in the cache
before insert, the atime update can be skipped. The cache capacity of
1024 * 1024 was chosen as compromise between required memory usage
and the size of an index file referencing a 4 TiB fixed size chunked
image (with 4MiB chunk size).

The previous change to iterate over the datastore contents using the
datastore's iterator helps for increased cache hits, as subsequent
snapshots are most likely to share common chunks.

Basic benchmarking:

Number of utimensat calls shows significatn reduction:
unpatched: 31591944
patched:    1495136

Total GC runtime shows significatn reduction (average of 3 runs):
unpatched: 155.4 ± 3.5 s
patched:    22.8 ± 0.5 s

VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, average of 3 runs):
unpatched before: 1196028 ± 0 kB
unpatched after:  1196028 ± 0 kB

unpatched before: 1163337 ± 28317 kB
unpatched after:  1330906 ± 29280 kB
delta:             167569 kB

Dependence on the cache capacity:
     capacity runtime[s]  VmPeakDiff[kB]
       1*1024     66.221               0
      10*1024     36.164               0
     100*1024     23.141               0
    1024*1024     22.188          101060
 10*1024*1024     23.178          689660
100*1024*1024     25.135         5507292

Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special

Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54

Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB

Distribution of snapshots (binned by month):
2023-11	11
2023-12	16
2024-01	30
2024-02	38
2024-03	17
2024-04	37
2024-05	17
2024-06	59
2024-07	99
2024-08	96
2024-09	115
2024-10	35
2024-11	42
2024-12	37
2025-01	162
2025-02	489
2025-03	1884

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-02 19:57:51 +02:00
Christian Ebner
74361da855 garbage collection: generate index file list via datastore iterators
Instead of iterating over all index files found in the datastore in
an unstructured manner, use the datastore iterators to logically
iterate over them as other datastore operations will.

This allows to better distinguish index files in unexpected locations
from ones in their expected location, warning the user of unexpected
ones to allow to act on possible missconfigurations. Further, this
will allow to integrate marking of snapshots with missing chunks as
incomplete/corrupt more easily and helps improve cache hits when
introducing LRU caching to avoid multiple atime updates in phase 1 of
garbage collection.

This now iterates twice over the index files, as indices in
unexpected locations are still considered by generating the list of
all index files to be found in the datastore and removing regular
index files from that list, leaving unexpected ones behind.

Further, align terminology by renaming the `list_images` method to
a more fitting `list_index_files` and the variable names accordingly.

This will reduce possible confusion since throughout the codebase and
in the documentation files referencing the data chunks are referred
to as index files. The term image on the other hand is associated
with virtual machine images and other large binary data stored as
fixed-size chunks.

Basic benchmarking:

Total GC runtime shows no significatn change (average of 3 runs):
unpatched: 155.4 ± 2.6 s
patched:   155.4 ± 3.5 s

VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, no changes for all 3 runs):
unpatched before: 1196032 kB
unpatched after:  1196032 kB

patched before: 1196028 kB
patched after:  1196028 kB

List image shows a slight increase due to the switch to a HashSet
(average of 3 runs):
unpatched: 64.2 ± 8.4 ms
patched:   72.8 ± 3.7 ms

Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special

Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54

Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB

Distribution of snapshots (binned by month):
2023-11	11
2023-12	16
2024-01	30
2024-02	38
2024-03	17
2024-04	37
2024-05	17
2024-06	59
2024-07	99
2024-08	96
2024-09	115
2024-10	35
2024-11	42
2024-12	37
2025-01	162
2025-02	489
2025-03	1884

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-02 19:57:51 +02:00
Christian Ebner
c9bd214555 datastore: add helper method to open index reader from path
Refactor the archive type and index file reader opening with its
error handling into a helper method for better reusability.

This allows to use the same logic for both, expected image paths
and unexpected image paths when iterating trough the datastore
in a hierarchical manner.

Improve error handling by switching to anyhow's error context.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-04-02 19:57:51 +02:00
Shannon Sterz
23be00a42c fix #3336: datastore: remove group if the last snapshot is removed
Empty backup groups are not visible in the API or GUI. This led to a
confusing issue where users were unable to create a group because it
already existed and was still owned by another user. Resolve this
issue by removing the group if its last snapshot is removed.

Also fixes an issue where removing a group used the non-atomic
`remove_dir_all()` function when destroying a group unconditionally.
This could lead to two different threads suddenly holding a lock to
the same group. Make sure that the new locking mechanism is used,
which prevents that, before removing the group. This is also a bit
more conservative now, as it specifically removes the owner file and
group directory separately to avoid accidentally removing snapshots in
case we made an oversight.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
2025-04-02 14:42:42 +02:00
Shannon Sterz
52e5d52cbd fix #3935: datastore: move manifest locking to new locking method
adds double stat'ing and removes directory hierarchy to bring manifest
locking in-line with other locks used by the BackupDir trait.

if the old locking mechanism is still supposed to be used, this still
falls back to the previous lock file. however, we already add double
stat'ing since it is trivial to do here and should only provide better
safety when it comes to removing locks.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2025-03-26 16:21:43 +01:00
Shannon Sterz
27dd73777f fix #3935: datastore/api/backup: move datastore locking to '/run'
to avoid issues when removing a group or snapshot directory where two
threads hold a lock to the same directory, move locking to the tmpfs
backed '/run' directory. also adds double stat'ing to make it possible
to remove locks without certain race condition issues.

this new mechanism is only employed when we can be sure, that a reboot
has occured so that all processes are using the new locking mechanism.
otherwise, two separate process could assume they have exclusive
rights to a group or snapshot.

bumps the rust version to 1.81 so we can use `std::fs::exists` without
issue.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
 [TL: drop unused format_err import]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-03-26 16:21:43 +01:00
Shannon Sterz
e2c1866b13 datastore/api/backup: prepare for fix of #3935 by adding lock helpers
to avoid duplicate code, add helpers for locking groups and snapshots
to the BackupGroup and BackupDir traits respectively and refactor
existing code to use them.

this also adapts error handling by adding relevant context to each
locking helper call site. otherwise, we might loose valuable
information useful for debugging. note, however, that users that
relied on specific error messages will break.

Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2025-03-26 16:21:39 +01:00
Christian Ebner
3362a6e049 clippy/fmt: tree wide drop of clone for types implementing copy
fixes the clippy warning on types T implementing Copy:
```
warning: using `clone` on type `T` which implements the `Copy` trait
```

followed by formatting fixups via `cargo fmt`.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-03-20 14:48:31 +01:00
Christian Ebner
54763b39c7 datastore: restrict datastores list_images method scope to module
Drop the pub scope for `DataStore`s `list_images` method.

This method is only used to generate a list of index files found in
the datastore for iteration during garbage collection. There are no
other call sites and this is intended to only be used within the
module itself. Allows to be more flexible for future method signature
adaptions.

No functional changes.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-03-17 14:06:01 +01:00
Fabian Grünbichler
eba172a492 run cargo fmt
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2025-03-13 13:23:17 +01:00
Maximiliano Sandoval
5117a21ec9 snapshot_reader: replace Arc with Rc
The type `Box<dyn IndexFile + Send>>, usize, Vec<(usize, u64)>` is not
Sync so it makes more sense to use Rc. This is suggested by clippy.

Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
2025-03-06 14:55:04 +01:00
Maximiliano Sandoval
858744bf3c run cargo clippy --fix
The actual incantation is:

clippy --all-targets --workspace --all-features --fix

Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
2025-03-06 14:53:47 +01:00
Christian Ebner
f098814876 datastore: use libc's timespec constants instead of redefinition
Use the UTIME_NOW and UTIME_OMIT constants defined in libc crate
instead of redefining them. This improves consistency, as utimesat
and its timespec parameter are also defined via the libc crate.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2025-03-05 10:01:12 +01:00
Maximiliano Sandoval
1cf52c6bb3 remove create & truncate when create_new is used
As per its documentation [1]:

> If .create_new(true) is set, .create() and .truncate() are ignored.

This gets rid of the "file opened with `create`, but `truncate`
behavior not defined " clippy warnings.

[1] https://doc.rust-lang.org/std/fs/struct.OpenOptions.html#method.create_new

Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
2025-01-27 11:53:23 +01:00
Maximiliano Sandoval
f1a5808e67 replace match statements with ? operator
When possible.

Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
2025-01-14 08:57:24 +01:00