The local one is specific for `allocate_fleecing_images` and has a
comment stating to use the one from `PVE::QemuConfig` in all other
cases.
The `cleanup` sub already called this, but only if the VM was running.
We do allocate fleecing images for previously-stopped VMs as well,
though, so we also need to do the cleanup.
As for the `detach_fleecing_images()` call: while it could have stayed in
the `vm_running_locally()` branch, it performs this check itself, and this
way the entire fleecing cleanup stays together in one place.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
VirtIO-fs using writeback cache seems very broken at the moment. If a
guest accesses a file (even just using 'touch') that the host is
currently writing to, the guest can permanently end up with a truncated
version of that file. Even subsequent operations like moving the file
will not result in the correct file becoming visible, but merely rename
the truncated one.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Version 5.2.0 of libpve-guest-common-perl is required for the
PVE/Mapping/Dir.pm module, but a transitive dependency on a recent
enough libpve-cluster-perl was missing for tracking the corresponding
file on the cluster file system, so the build would still fail with:
> unknown file 'mapping/directory.cfg' at /usr/share/perl5/PVE/Cluster.pm
Version 5.2.2 of libpve-guest-common-perl depends on a recent enough
libpve-cluster-perl to fix this.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Lukas Wagner <l.wagner@proxmox.com>
Link: https://lore.proxmox.com/20250407134950.265270-6-m.frank@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
add dir mapping checks to check_local_resources
Since the VM needs to be powered off for migration, migration should
work with a directory on shared storage with all caching settings.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Link: https://lore.proxmox.com/20250407134950.265270-5-m.frank@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Add support for sharing directories with a guest VM.
virtio-fs needs virtiofsd to be started. In order to start virtiofsd
as a process (despite being a daemon, it does not run in the
background), a double-fork is used.
virtiofsd should close itself together with QEMU.
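The double-fork is the classic pattern for detaching such a helper
process; roughly (a simplified sketch, not the actual launch code --
error handling, logging and the socket handover are omitted, and the
paths/arguments are only illustrative):
```
use POSIX qw(setsid _exit);

# illustrative arguments only -- the real code builds these from the config
my @virtiofsd_args = ('--socket-path', '/run/virtiofsd.sock', '--shared-dir', '/path/to/dir');

my $pid = fork() // die "fork failed: $!\n";
if ($pid == 0) {
    setsid();                       # new session, no controlling terminal
    my $pid2 = fork() // _exit(1);
    _exit(0) if $pid2 != 0;         # intermediate child exits right away
    exec('/usr/libexec/virtiofsd', @virtiofsd_args) or _exit(1);
}
waitpid($pid, 0);                   # reap the intermediate child, virtiofsd keeps running
```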
There is the required parameter dirid as well as the optional parameters
direct-io, cache and writeback. Additionally, the expose-xattr and
expose-acl parameters can be set to expose xattr and ACL settings from
the shared filesystem to the guest system.
The dirid gets mapped to the path on the current node and is also used
as a mount tag (name used to mount the device on the guest).
example config:
```
virtiofs0: foo,direct-io=1,cache=always,expose-acl=1
virtiofs1: dirid=bar,cache=never,expose-xattr=1,writeback=1
```
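Inside a Linux guest, the mount tag is then used to mount the share; for
the first example entry above that would be e.g.
`mount -t virtiofs foo /mnt/foo`.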
For information on the optional parameters see the corresponding doc
patch and the official GitLab README:
https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md
Also add a permission check for virtiofs directory access.
Add virtiofsd to the Recommends list of the qemu-server Debian
package; this allows users to opt out of installing it, e.g.
for certification reasons.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Link: https://lore.proxmox.com/20250407134950.265270-3-m.frank@proxmox.com
Tested-by: Lukas Wagner <l.wagner@proxmox.com>
[TL: squash d/control change and re-add Lukas' T-b, as nothing
essentially changed from the v16 where his tag applied]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In spirit, this is a revert of 502870a0 ("qmeventd: extract vmid from
cgroup file instead of cmdline"), but instead of relying on the custom
'id' commandline option that's added by a Proxmox VE patch to QEMU,
rely on the standard 'pidfile' option to extract the VM ID.
As reported in the community forum [0], at least during stop mode
backup, it seems to be possible to end up with the VM process having
> 0::/system.slice/pvescheduler.service
as its single cgroup entry. It's not clear what exactly happens, and
attempts to reproduce the issue were unsuccessful. It might be a rare
bug in systemd or in pve-common's enter_systemd_scope() code.
This was not the first time relying on the cgroup entry caused issues,
see d0b58753 ("qmeventd: improve getting VMID from PID in presence of
legacy cgroup entries").
To avoid such edge cases and issues in the future, go back to
extracting the VM ID from the process's commandline.
It's enough to care about the first occurrence of the 'pidfile'
option, because that's the one added by Proxmox VE, so the 'continue's
in the loop turn into 'break's. Even though a later option would
override the first one for QEMU itself, that's not supported
anyway, and the important part, the VM ID, is present in the first
occurrence.
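qmeventd itself is written in C, but the extraction logic boils down to
something like the following sketch (in Perl for brevity; the pidfile
path layout is assumed here for illustration):
```
# $pid is the PID of the QEMU process qmeventd is handling
open(my $fh, '<', "/proc/$pid/cmdline") or die "unable to read cmdline: $!\n";
my @args = split(/\0/, do { local $/; <$fh> });
close($fh);

my $vmid;
for (my $i = 0; $i < scalar(@args) - 1; $i++) {
    next if $args[$i] ne '-pidfile';
    # assumed layout of the pidfile added by Proxmox VE: /var/run/qemu-server/<vmid>.pid
    ($vmid) = $args[$i + 1] =~ m!/qemu-server/(\d+)\.pid$!;
    last; # only the first occurrence matters -- it's the one added by Proxmox VE
}
die "could not extract VM ID from command line\n" if !defined($vmid);
```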
[0]: https://forum.proxmox.com/threads/147409/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20240614092134.18729-1-f.ebner@proxmox.com
This makes it a bit more obvious what happens and provides an actual
error for bogus $PVE_MACHINE_VERSION entries.
Note that there was no auto-vivification before, as we never directly
accessed $PVE_MACHINE_VERSION->{$verstr}->{highest} but used
get_machine_pve_revisions to query a specific QEMU machine version's
PVE revisions and then operated on the return value, and that method
returns undef if there is no entry at all.
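For illustration, the difference between the two access patterns (a
sketch, not the actual code):
```
# direct nested access would auto-vivify $PVE_MACHINE_VERSION->{$verstr} as an
# empty hash even for bogus version strings:
my $highest = $PVE_MACHINE_VERSION->{$verstr}->{highest};

# the accessor returns undef without creating an entry, so bogus versions can
# be turned into an explicit error instead:
my $revisions = get_machine_pve_revisions($verstr)
    or die "unknown machine version '$verstr'\n";
my $highest_revision = $revisions->{highest};
```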
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This should have been in the patch doing the change :(
Fixes: 65b2041 ("vm-network-scripts: move scripts to /usr/libexec")
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Moves the network scripts from /var/lib/qemu-server into
/usr/libexec/qemu-server.
/usr/libexec is described in [FHS 4.7] as the place for binaries run by
other programs and not intended to be executed directly by the user. On
the other hand, /var/lib corresponds to variable state information,
which does not fit the use case here, see [FHS 5.8].
To prevent race conditions during upgrade, we ship both versions until
version 9. This is required because package files are first unpacked,
including the removal of files no longer shipped by the new version,
and only then configured, which triggers the restart of the services.
[FHS 4.7]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html
[FHS 5.8]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s08.html
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Link: https://lore.proxmox.com/20250218133206.318155-1-m.sandoval@proxmox.com
Fiona Ebner <f.ebner@proxmox.com> says:
Record the created fleecing images in the VM configuration to be able
to remove left-overs after hard failures.
Adds a new special configuration section 'fleecing', making special
section handling more generic as preparation, as well as fixing some
corner cases in configuration parsing and adding tests.
Fiona Ebner (16):
migration: remove unused variable
test: avoid duplicate mock module in restore config test
test: add parse config tests
parse config: be precise about section type checks
test: add test case exposing issue with unknown sections
parse config: skip unknown sections and warn about their presence
vzdump: anchor matches for pending and special sections
vzdump: skip all special sections
config: make special section handling generic
test: parse config: test config with duplicate sections
parse config: warn about duplicate sections
check type: require schema as an argument
config: add fleecing section
fix #5440: vzdump: better cleanup fleecing images after hard errors
migration: attempt to clean up potential left-over fleecing images
destroy vm: clean up potential left-over fleecing images
Link: https://lore.proxmox.com/20250127112923.31703-1-f.ebner@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Clean up left-over fleecing images before the guest is migrated to a
different node and they'd really become orphaned.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-16-f.ebner@proxmox.com
By recording the allocated fleecing images in the VM config, they
are not immediately orphaned, should a hard error occur during
backup that prevents cleanup.
Cleanup is then attempted again during the next backup run.
In the cleanup helper, check if fleecing images are still attached in
QEMU and detach them. This allows recovering from more failure
scenarios. However, to avoid a deadlock, a left-over backup job needs
to be canceled first. While canceling a left-over backup already
happens when cleanup is done for a subsequent backup, it is required
for other cases like cleanup before migration (to be added in a
following commit).
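The resulting order of operations in the cleanup helper, sketched (the
QMP command name and helper signatures are assumed here, not taken from
the actual code):
```
# cancel a possibly left-over backup job first to avoid the deadlock
mon_cmd($vmid, 'backup-cancel') if $left_over_backup_job;
# detach fleecing drives that are still attached in QEMU
detach_fleecing_images($disks, $vmid);
# free the volumes and drop the recorded entries from the VM config
cleanup_fleecing_images($vmid, $storecfg);
```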
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-15-f.ebner@proxmox.com
Currently, a duplicate section will quietly override the previous
instance of the section with the same identifier. Keep the current
behavior of preferring later entries, but issue a warning or die when
parsing strictly.
The entry for 'pending' in the result needs to start out as undefined
for the check to also work in the presence of empty sections. Avoid
changing the returned value itself by making sure to initialize the
entry before returning.
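A sketch of the check described above (parser internals and variable
names are simplified, not the actual code):
```
my $duplicate = defined($res->{snapshots}->{$section})
    || ($section eq 'pending' && defined($res->{pending}));
if ($duplicate) {
    my $msg = "duplicate section '$section'";
    $strict ? die "$msg\n" : warn "$msg\n";
}
# later entries still override earlier ones, as before; 'pending' is only
# initialized right before returning, so the check also catches empty sections
$res->{pending} //= {};
```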
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-12-f.ebner@proxmox.com
Collect special sections below a common 'special-sections' key in
preparation to introduce a new special section.
The special 'cloudinit' section was added at the top level of the
configuration structure, but it's cleaner to group special sections
similarly to snapshots.
The 'cloudinit' key was already initialized, so having the new
'special-sections' key be always initialized should not cause issues
after checking and adapting all usages of 'cloudinit', which this patch
attempts to do.
Add compat handling for remote migration which might receive the
configuration hash from a node that does not yet have the changes.
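Illustrated with the section names mentioned in this series, the layout
change is roughly:
```
# before: special sections lived at the top level of the config hash
#   $conf->{cloudinit}
# after: grouped below a common key, similar to how snapshots are grouped
#   $conf->{'special-sections'}->{cloudinit}
#   $conf->{'special-sections'}->{fleecing}
```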
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-10-f.ebner@proxmox.com
Also log an informational message just like for pending and snapshot
sections.
Add a comment about it to parse_vm_config() in the hope that the
behavior is noted when introducing a future special section.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-9-f.ebner@proxmox.com
Otherwise, a snapshot with a name that includes "pending" will be
misinterpreted as the pending section.
This only affects the warning messages, but it is still confusing.
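The idea of the fix, as a simplified sketch (the actual patterns in
vzdump differ slightly):
```
# before: any section header starting with 'pending' matched, so a snapshot
# named e.g. 'pending-test' was misdetected as the pending section
next if $line =~ m/^\[pending/i;

# after: the match is anchored to the complete section header
next if $line =~ m/^\[pending\]\s*$/i;
```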
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-8-f.ebner@proxmox.com
Currently, keys in an unknown section will be interpreted as still
belonging to the last section and might erroneously overwrite values
in that way. Explicitly ignore unknown sections to avoid this while
warning the user.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-7-f.ebner@proxmox.com
While unknown sections do lead to an error in strict mode, in
non-strict mode the section header line is just skipped, meaning that
key-value pairs from the unknown section will override the key-value
pairs of the previous section.
Fixed by the next commit.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-6-f.ebner@proxmox.com
There are checks for custom parsing behavior inside certain sections
relying only on the section name. While the name 'pending' cannot be
used by snapshots, the name 'cloudinit' can. Introduce an associated
section type to make the checks precise.
The test was not added in a separate commit, because it would fail
when writing the config before the fix, and failure in writing is
never expected by the test script. So there is no easy way to
highlight just the difference in behavior together with the fix
while keeping the git history bisectable.
Compare with the verify-snapshot.conf testcase without the actual fix
applied to see the difference.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-5-f.ebner@proxmox.com
Tests for parsing and writing VM configuration files. The parsing part
is already covered by the config2command test too, but that only
focuses on the main section, not on other section types, and it does
not test parsing in strict mode.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-4-f.ebner@proxmox.com
The duplication is there because two independent fixes for a test
failure were applied, namely commits:
75c430ce ("test: unbreak restore_config_test")
cc1cdadb ("test: fix restore config test as unprivileged user")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-3-f.ebner@proxmox.com
This was added by a89fded1 ("migration : enable auto-converge
capability v2"), but migration capabilities are already disabled by
default and there is nothing special about 'rdma-pin-all' compared to
other ones that are not used by Proxmox VE. Moreover, the code
currently doesn't even do anything, because the capability would need
to be set as 'rdma-pin-all' without the experimental 'x-' marker
prefix (at least since QEMU 8.0, maybe longer).
The function to set migration capabilities already queries which ones
are supported by the currently running QEMU and ignores others, so
there was no error about the invalid name.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250407073256.8889-5-f.ebner@proxmox.com
The 'zero-blocks' capability was deprecated in QEMU commit 73581a041e
("migration: Deprecate zero-blocks capability")
> The zero-blocks capability was meant to be used along with the block
> migration, which has been removed already in commit eef0bae3a7
> ("migration: Remove block migration").
> Setting zero-blocks is currently a noop, but the outright removal of
> the capability would cause an error in case some users are still
> setting it. Put the capability through the deprecation process.
The default for the capability already was disabled (checked in QEMU
8.0 source).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250407073256.8889-4-f.ebner@proxmox.com
This was added by commit b62532e4 ("migration: disable compress")
stating:
> it's already disable by default,
> but we want to be sure if it's change in later release
QEMU never did change the default (verified with QEMU 8.0, and doing so
would've been a breaking change from QEMU's side).
The 'compress' capability was removed in QEMU 9.1, with QEMU commit
0222111a22 ("migration: Remove non-multifd compression").
The function to set migration capabilities already queries which ones
are supported by the currently running QEMU and ignores others, so
there was no error about 'compress'.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250407073256.8889-3-f.ebner@proxmox.com
The 'reconnect' option was replaced by 'reconnect-ms' in QEMU commit
c8e2b6b4d7 ("chardev: introduce 'reconnect-ms' and deprecate
'reconnect'").
Makes qemu-server build-depend on QEMU 9.2 for the tests.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250407073256.8889-2-f.ebner@proxmox.com
The TPM state drive is newly attached each time, so it is fully
expected that a bitmap from last time would be missing.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-21-f.ebner@proxmox.com
A new 'missing-recreated' action was added on the QEMU side.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-20-f.ebner@proxmox.com
The features returned by the 'query-proxmox-support' QMP command are
booleans, so just checking for definedness is not enough in principle.
In practice, a feature is currently always true if defined. Still, fix
the checks, should the need to disable support for a feature ever
arise in the future, and to avoid propagating the pattern further.
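The pattern change, sketched (the feature name here is only for
illustration):
```
my $features = mon_cmd($vmid, 'query-proxmox-support');

# before: definedness was treated as "supported"
my $supported = defined($features->{'backup-fleecing'});

# after: the boolean value itself is checked
my $supported_now = $features->{'backup-fleecing'};
```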
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-19-f.ebner@proxmox.com
First, the provider is asked which restore mechanism to use.
Currently, only 'qemu-img' is possible. Then the configuration files
are restored, the provider gives information about volumes contained
in the backup and finally the volumes are restored via
'qemu-img convert'.
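Per volume, the 'qemu-img' mechanism boils down to something along
these lines (a sketch only; the actual command line built by the code
and its options differ, and the variables are placeholders):
```
use PVE::Tools;

PVE::Tools::run_command([
    'qemu-img', 'convert', '-p',
    '-f', 'raw',            # source format as reported by the provider
    '-O', $target_format,   # format of the freshly allocated target volume
    $source_path, $target_path,
]);
```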
The code for the restore_external_archive() function was copied and
adapted from the restore_proxmox_backup_archive() function. Together
with restore_vma_archive() it seems sensible to extract the common
parts and use a dedicated module for restore code.
The parse_restore_archive() helper was renamed, because it's not just
parsing.
While currently the format of the source can only be raw, check the
source as an untrusted image for future-proofing. For now, it still
serves as a basic sanity check.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: fix 'bwlimit' typo]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-18-f.ebner@proxmox.com
In preparation for the restore API for backup providers, which doesn't
want detection based on the file extension but always requires raw.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-17-f.ebner@proxmox.com
In preparation to add another option and to improve style for the
callers.
One of the test cases that specified $is_zero_initialized is for a
non-existent storage, so the option was not added there.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-16-f.ebner@proxmox.com
The state of the VM's disk images at the time the backup is started is
preserved via a snapshot-access block node. Old data is moved to the
fleecing image when new guest writes come in. The snapshot-access
block node, as well as the associated bitmap in case of incremental
backup, will be made available to the external provider. They are
exported via NBD and for 'nbd' mechanism, the NBD socket path is
passed to the provider, while for 'file-handle' mechanism, the NBD
export is made accessible via a file handle and the bitmap information
is made available via a $next_dirty_region->() function. For
'file-handle', the 'nbdinfo' and 'nbdfuse' binaries are required.
The provider can indicate that it wants to do an incremental backup by
returning the bitmap ID that was used for a previous backup and it
will then be told if the bitmap was newly created (either first backup
or old bitmap was invalid) or if the bitmap can be reused.
The provider then reads the parts of the NBD or virtual file it needs,
either the full disk for full backup, or the dirty parts according to
the bitmap for incremental backup. The bitmap has to be respected;
reads to other parts of the image will return an error. After backing
up each part of the disk, it should be discarded in the export to
avoid unnecessary space usage in the fleecing image (requires the
storage underlying the fleecing image to support discard too).
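For the 'file-handle' mechanism, a provider loop might look roughly
like this (the return convention of $next_dirty_region->() and the
discard step are assumed for illustration):
```
while (1) {
    my ($offset, $length) = $next_dirty_region->();
    last if !defined($offset);    # assumed end-of-regions signal

    sysseek($fh, $offset, 0) or die "seek failed: $!\n";
    my $read = sysread($fh, my $data, $length);
    die "read failed: $!\n" if !defined($read);

    # ... write $data to the backup target ...
    # then discard the region in the export to keep the fleecing image small
}
```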
[WB: - instead of backup_vm_available_bitmaps call
backup_vm_query_incremental, which provides a bitmap-mode
instead, pass this along and use just the storage id as bitmap
name]
[FE: Move construction of $devices parameter to after sizes are
available. They were undef before.
Adapt to changed backup-access QMP interface not requiring an
explicit bitmap name anymore.]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-14-f.ebner@proxmox.com
For the external backup API, it will be necessary to add a fleecing
image even for small disks like EFI and TPM, because there is no other
place the old data could be copied to when a new guest write comes in.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-13-f.ebner@proxmox.com
A non-1KiB aligned source image could cause issues when used with
qcow2 fleecing images, e.g. for an image with size 4.5 KiB:
> Size mismatch for 'drive-tpmstate0-backup-fleecing' - sector count 10 != 9
Raw images are attached to QEMU with an explicit 'size' argument, so
rounding up before allocation doesn't matter, but it does for qcow2.
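For the example above: 4.5 KiB is 4608 bytes, i.e. 9 sectors of 512
bytes, while rounding the allocation up to the next KiB yields 5120
bytes, i.e. 10 sectors, hence the reported mismatch.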
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-12-f.ebner@proxmox.com
For fleecing, the size needs to match exactly what QEMU sees. In
particular, EFI disks might be attached with a 'size=' option, meaning
that size can be different from the volume's size. Commit 36377acf
("backup: disk info: also keep track of size") introduced size
tracking and it was used for fleecing since then, but the accurate
size information needs to be queried via QMP.
Should also help with the following issue reported in the community
forum:
https://forum.proxmox.com/threads/152202
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-11-f.ebner@proxmox.com
It's a bit nicer to avoid multiple hashes, and it keeps the information
closer together. That alone wasn't the reason for doing this, though,
because this (hopefully) will never grow that big; rather, it should
encourage adding 'changes' entries until we test for that, and it also
makes handling these entries a tiny bit more natural, as no wrapper
counter for-loops doing integer string interpolation are required.
While at it, expand the comment, which might even be a tad too verbose
now, and add a new helper to get these per-machine revisions.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Avoid pushing our array of downstream revisions onto the array of
upstream machines if we create a completely new array on return anyway;
just let sort iterate over both arrays instead.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This should better denote what this is about; the schema description
called it that already anyway. Avoid using a PVE-specific property name
in case we add some changes for the upstream revisions in the future.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>