Commit Graph

7827 Commits

Author SHA1 Message Date
Fabian Grünbichler
3a76fdc9e7 bump version to 3.2.5-1
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-10 13:45:49 +02:00
Fabian Grünbichler
4354cae7ba bump pxar to 0.11.1
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-10 13:39:33 +02:00
Fabian Grünbichler
481a9515b1 extract: don't interpret prelude as OsStr
that would drop the final byte, and the corresponding code has been removed
from pxar now as well.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-10 13:38:10 +02:00
Christian Ebner
afdc0f0e42 client: pxar: encode prelude based on writer variant
Currently, whether to encode the exlcude patterns passed via cli as
prelude or via the `.pxar-exclude-cli` is based on the presence of
a previous metadata accessor.
That leaves however to the encoding of the file entry instead of the
prelude for split archives in `data` mode and for the first snapshot
in a backup, creating undesired padding in the first payload chunk.

Therefore, use the pxar writer variant to make the decision instead.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 13:11:38 +02:00
Christian Ebner
903ab2e938 client: pxar: json encode cli exclude pattern in prelude
The current encoding is not extensible, so encode the cli exclude
patterns as json instead. By this, the prelude is easily seralized
and deserialized, while remaining human readable.

Originally-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 13:11:29 +02:00
Christian Ebner
3c9a29e5cf file-restore: list: improve pxar v2 performance
Do not attach the payload reader for split pxar archives, as only the
metadata has to be accessed for listing.
This avoids that the decoder performs consistency checks with the
payload stream, which require chunk download and decoding, making the
listing unusable slow.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 10:58:31 +02:00
Christian Ebner
8834153187 docs: add table listing possible change detection modes
Quick and concise listing of the available change detection modes for
reference.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 10:30:10 +02:00
Christian Ebner
04bb256abd client: backup spec: rename change detection mode default
The currently default variant is named `Default`, which is not future
prove since the default might change in the future. So rename it to
`Legacy` instead.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 10:30:05 +02:00
Fabian Grünbichler
fdcae4bd8a api: catalog: improve pxar v2 performance
by skipping the payloader reader entirely, it's not needed for listing contents
and would make accessing larger archives too expensive.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-10 10:10:33 +02:00
Fabian Grünbichler
8f9330b582 run cargo fmt
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-07 14:00:33 +02:00
Fabian Grünbichler
e653ace318 api: catalog/file-restore: use archive-name schema
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-07 14:00:15 +02:00
Christian Ebner
c0302805c4 client: backup: conditionally write catalog for file level backups
Only write the catalog when using the regular backup mode, do not write
it when using the split archive mode.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
4dd816b343 www: content: lookup via metadata archive instead of catalog
In case of pxar archives with split metadata and payload data, the
metadata archive has to be used to lookup entries for navigation
before performing a single file restore.

Decide based on the archive filename extension whether to use the
`catalog` or the `pxar-lookup` api endpoint.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
84981486cb file-restore: fallback to mpxar if catalog not present
The `proxmox-file-restore list` command will uses the provided path to
lookup and list directory entries via the catalog. Fallback to using
the metadata archive if the catalog is not present for fast lookups in
a backup snapshot.

This is in preparation for dropping encoding of the catalog for
snapshots using split archive encoding. Proxmox VE's storage plugin
uses this to allow single file restore for LXCs.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
6cf75f2fe2 file-restore: never list ppxar as archive
Payload data archives cannot be used to navigate the content, so
exclude them from the archive listing, as this is used by
Proxmox VE to list in the file browser.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
364e2d1133 api: datastore: add optional archive-name to file-restore
Allow to pass the archive name as optional api call parameter instead
of having it as prefix to the path.
If this parameter is given, instead of splitting of the archive name
from the path, the parameter itself is used, leaving the path
untouched.

This allows to restore single files from the archive, without having
to artificially construct the path in case of file restores for split
pxar archives, where the response path of the listing does not
include the archive, as opposed to the response provided by lookup
via the catalog.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
cfb2632c9e api: datastore: conditional lookup for catalog endpoint
Add an optional `archive-name` parameter, indicating the metadata
archive to be used for directory content lookups instead of the
catalog. If provided, instead of the catalog reader, a pxar Accessor
instance is created to perform the lookup.

This is in preparation for dropping catalog encoding for snapshots
with split pxar archive encoding.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
8d8142955f client: tools: add helper to lookup ArchiveEntrys via pxar
In preparation to lookup entries via the pxar metadata archive
instead of the catalog, in order to drop encoding the catalog
for snapshots using split pxar archives altogehter.

This helper allows to lookup the directory entries via the provided
accessor instance and formats them to be compatible with the output
as produced by lookups via the catalog.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
3b95f09522 api: datastore: move reusable code out of thread
Move code that can be reused when having to  perform a lookup via the
pxar metadata archive instead of the catalog out of the thread.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
30ea695518 api: datastore: factor out path decoding for catalog
The file path passed to the catalog is base64 encoded, with an exception
for the root.
Factor this check and decoding step out into a helper function to make
it reusable when doing the same for lookups via the metadata archive
instead of the catalog.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 13:56:24 +02:00
Christian Ebner
2baa9f8fb8 client: helper: fix minor formatting issue
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 12:09:21 +02:00
Christian Ebner
6b1110badf client: pxar: fix minor formatting issue
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-07 12:09:21 +02:00
Christian Ebner
1259234488 client: pxar: conditionally skip metadata reference test
The test will fail for all users not having euid/egid set to
1000/1000, as the reference test folder structure cannot be created
with the expected ownership.
Therefore, skip over the test if either euid or egid do not match
this condition.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-06 10:46:05 +02:00
Christian Ebner
bab0645cc6 client: pxar: do not attempt to set uid/gid in test
Setting the uid/gid for the files and folders of the test directory
structure will not work when lacking the permissions.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-06 10:46:05 +02:00
Fabian Grünbichler
766faeb04a bump pxar build-dep to 0.11
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
c51f0d5e8d docs: add section describing change detection mode
Describe the motivation and basic principle of the clients change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
5cff9c6fe8 docs: file formats: describe split pxar archive file layout
Describes the pxar metadata archive and the corresponding pxar payload
file-format layout.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
cf75bc0db5 client: pxar: set cache limit based on nofile rlimit
The lookahead cache size requires the resource limit for open file
handles to be high in order to allow for efficient reuse of unchanged
file payloads.

Increase the nofile soft limit to the hard limit and dynamically adapt
the cache size to the new soft limit minus the half of the previous
soft limit.

The `PxarCreateOptions` and the `Archiver` are therefore extended by
an additional field to store the maximum cache size, with fallback to
a default size of 512 entries.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
7b352cc0cf client: tools: add helper to raise nofile rlimit
The default soft limit for open file handles is rather low, as some
apis (e.g. the POSIX `select(2)` syscall) do not work [0].

The lookahead cache use during the backup clients metadata comparison
to reuse unchanged files however requires much higher limits to work
effectively.

This helper function allows to raise the soft limit to the hard
limit, as provided by the `getrlimit(2)` syscall.

[0] https://0pointer.net/blog/file-descriptor-limits.html

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
992487929a client: pxar: add archive creation with reference test
Add a basic regression test for archive creation with reference
metadata archive and index.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
5a5d454083 client: chunk stream: switch payload stream chunker
Use the dedicated chunker with boundary suggestions for the payload
stream, by attaching the channel sender to the archiver and the
channel receiver to the payload stream chunker.

The archiver sends the file boundaries for the chunker to consume.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
589f510e7d chunk stream: tests: add regression tests for payload chunker
Regression tests to cover suggested and forced boundaries as well as
chunk injection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
e11ee319ce chunker: tests: add regression tests for payload chunker
Test chunking of a payload stream with suggested chunk boundaries.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
88ef759cc4 datastore: chunker: implement chunker for payload stream
Implement the Chunker trait for a dedicated payload stream chunker,
which extends the regular chunker by the option to suggest boundaries
to be used over the hast based boundaries whenever possible.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
e321815635 datastore: chunker: add Chunker trait
Add the Chunker trait and move the current Chunker to ChunkerImpl to
implement the trait instead. This allows to use different chunker
implementations by dynamic dispatch and is in preparation for
implementing a dedicated payload chunker.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:42 +02:00
Christian Ebner
a43399da06 pxar: add optional payload input to mount archive
Allow to pass an optional input path to mount a split pxar archive
with dedicated payload data file.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
108764f95b pxar: bin: support creation of split pxar archives via cli
Add support to create split pxar archives by redirecting the payload
output to a dedicated file.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
f16c5de757 pxar: bin: test pxar list with payload-input
Add a unit test to check for correct listing of pxar archives with
split payload input.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
1bec755b50 pxar: bin: ignore version and prelude entries in listing
Do not list the pxar format version and the prelude entries in the
output of pxar list, these are not regular entries. Do include them
however when dumping with the debug environmet variable set.
Since the prelude is arbitrary in size, only show the content size.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
58bfb6bb49 pxar: bin: show padding in debug output on archive list
In addition to the entries, also show the padding encountered in-between
referenced payloads.

Example invocation: `PXAR_LOG=debug pxar list archive.mpxar`

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
ee478ef1dc client: pxar: allow to restore prelude to optional path
Pxar archives allow to store additional information in a prelude
entry since pxar format version 2.

Add an optional parameter to `pxar` and `proxmox-backup-client` to
specify the path to restore the prelude to and pass this to the
archive extraction by extending the `PxarExtractOptions` by a
corresponding field. If none is given, the prelude is simply skipped
during restore.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
126fe1365d client: pxar: opt encode cli exclude patterns as Prelude
Instead of encoding the pxar cli exclude patterns as regular file
within the root directory of an archive, store this information
directly after the pxar format version entry in the entry of kind
Prelude.

This behavior is however currently exclusive to the archives written
with format version 2 in a split metadata and payload case.

This is a breaking change for the encoding of new cli exclude
parameters. Any new exclude parameter will not be added to an already
present .pxar-cliexclude file, and it will not be created if not
present.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
0cbfaf64b5 client: pxar: add helper to handle optional preludes
Pxar archives with format version 2 allows to store optional
information file format version and prelude entries.

Cover the case for these entries, the file format version entry being
introduced to distinguish between different file formats used for
encoding as well as the prelude entry used to store optional metadata
such as the pxar cli exlude parameters.

Add the logic to accept and decode these prelude entries when
accessing the archive via a decoder instance.

For now simply ignore them.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
7401be7e96 client: backup writer: make backup info output more concise
With the additional output in case of split pxar archives, the upload
statistics logged by the backup writer following a backup are crowded
and hard to read.

Make the output more concise by merging the currenlty 2 lines per
upload stream, shown as e.g.:

```
data.ppxar: had to backup 4 MiB of 10.943 GiB (compressed 159 B) in 49.30s
data.ppxar: average backup speed: 83.09 KiB/s
```

into a single line, shown as e.g.:

```
data.ppxar: had to back up 4 MiB of 10.943 GiB (159 B compressed) in 49.30 s (average 83.09 KiB/s)
```

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
5b91d85150 pxar: create: show chunk injection stats info output
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
64eddfffbe pxar: create: keep track of reused chunks and files
Track and log reused or reencoded files as well as the reused chunks
and their paddings.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
51bb7bc6b0 client: backup writer: add injected chunk count to stats
Track the number of injected chunks and show them in the debug output

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
7dcbe69a87 fix #3174: client: pxar: enable caching and meta comparison
When walking the file system tree, check for each entry if it is
reusable, meaning that the metadata did not change and the payload
chunks can be reindexed instead of reencoding the whole data.

If the metadata matched, the range of the dynamic index entries for
that file are looked up in the previous payload data index.
Use the range and possible padding introduced by partial reuse of
chunks to decide whether to reuse the dynamic entries and encode
the file payloads as payload reference right away or cache the entry
for now and keep looking ahead.

If however a non-reusable (because changed) entry is encountered
before the padding threshold is reached, the entries on the cache are
flushed to the archive by reencoding them, resetting the cached state.

Reusable chunk digests and size as well as reference offsets to the
start of regular files payloads within the payload stream are injected
into the backup stream by sending them to the chunker via a dedicated
channel, forcing a chunk boundary and inserting the chunks.

If the threshold value for reuse is reached, the chunks are injected
in the payload stream and the references with the corresponding
offsets encoded in the metadata stream.

Since multiple files might be contained within a single chunk, it is
assured that the deduplication of chunks is performed, by keeping back
the last chunk, so following files might as well reuse that same
chunk without double indexing it.  It is assured that this chunk is
injected in the stream also in case that the following lookups lead to
a cache clear and reencoding.

Directory boundaries are cached as well, and written as part of the
encoding when flushing.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
3149454511 client: pxar: refactor catalog encoding for directories
Move the catalog directory start and end encoding from `add_entry`
to the `add_directory`, the latter being called by the previous.

By this, the `add_entry` method can be reused to walk the filesystem
tree in the context of an enabled lookahead cache without encoding
anything.

No functional change intended.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00
Christian Ebner
6f23976247 pxar: caching: add look-ahead cache
Add a lookahead cache and the neccessary types to store the required
data and keep track of directory boundaries while traversing the
filesystem tree, in order to postpone a decision if to reuse or
reencode a given regular file with unchanged metadata.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
2024-06-05 16:39:41 +02:00