This check was added to guard the config format migration to a
dedicated section for cloudinit. The package versions required to
understand that format are guaranteed to be available with
pve-manager 7.2-13 or newer, as that release raised the versioned
dependencies accordingly.
This hedges against a migration from a node with a newer version to
one with an older version; the effect would basically be that the name
argument in a cloudinit section would override the current one, as the
old parser interprets it as belonging to the main section, not the
cloudinit section.
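For context, the new format roughly looks like this in a VM config
(illustrative excerpt with made-up values; only the section header and
the duplicated 'name' key matter here):

    name: vm100
    ide2: local-lvm:vm-100-cloudinit,media=cdrom

    [special:cloudinit]
    name: vm100

An old parser that does not know the section header attributes the
second 'name' line to the main section, as described above.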
We are normally cautious with removing such guards and communicate
stricter requirements than we check, to safeguard users with a certain
ignorance of, or unwillingness to care for, proper and timely periodic
upgrades.
But due to:
- PVE 7 being EOL for a few months now
- PVE 7.2 being EOL for well over a year
- the documented requirement to upgrade to latest PVE 7.4 before an
upgrade to PVE 8
- the relatively harmless effects when this check is voided
we can drop that check more than safely now.
Reported-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The function checks for resources that cannot be migrated, snapshotted,
or suspended.
To run this function while the snapshot lock is active, the
pve-guest-common patch 'AbstractConfig: add abstract method to check for
resources preventing a snapshot.' is required.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
As reported in the community forum [0], when a remote migration
request comes in via an API client, the -T flag for Perl is set, so an
insecure dependency in a call like unlink() in forward_unix_socket()
will fail with:
> failed to write forwarding command - Insecure dependency in unlink while running with -T switch
To fix it, untaint the problematic socket addresses coming from the
remote side. Require that all sockets are below '/run/qemu-server/'
and end with '.migrate', with the main socket being matched more
strictly. This allows extensions in the future while still being quite
strict.
[0]: https://forum.proxmox.com/threads/123048/post-691958
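For illustration, a minimal sketch of the untaint pattern (simplified;
the helper name is made up and the real code matches the main socket
more strictly):

    # a successful regex capture is what untaints the value under -T
    sub untaint_socket_addr {
        my ($addr) = @_;
        ($addr) = $addr =~ m!^(/run/qemu-server/.*\.migrate)$!
            or die "refusing unexpected socket address '$addr'\n";
        return $addr;
    }

A failed match leaves the list assignment empty and triggers the die,
so only the expected addresses get through.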
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The version of the running QEMU binary is not related to the machine
version and so it's a bit confusing to have the helper in the
'Machine' module. It cannot live in the 'Helpers' module, because that
would lead to a cyclic inclusion Helpers <-> Monitor. Thus,
'QMPHelpers' is chosen as the new home.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
There is a possibility that the drive-mirror job is not yet done when
the migration wants to inactivate the source's block drives:
> bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
This can be prevented by using the 'write-blocking' copy mode (also
called active mode) for the mirror. However, with active mode, the
guest write speed is limited by the synchronous writes to the mirror
target. For this reason, a way to start out in the faster 'background'
mode and later switch to active mode was introduced in QEMU 8.2.
The switch is done once the mirror job for all drives is ready to be
completed to reduce the time spent where guest IO is limited.
The loop waiting for actively-synced to become true is not an endless
loop: Once the remaining dirty parts have been mirrored by the
background iteration, the actively-synced flag will be set. Because
the 'block-job-change' QMP command already succeeded, new writes will
be done synchronously to the target and thus not lead to new dirty
parts. If the job fails or vanishes (shouldn't actually happen,
because auto-dismiss is false), the loop will be exited and the error
propagated.
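For illustration, a condensed sketch of the per-drive switch (assuming
mon_cmd() from PVE::QemuServer::Monitor and a $drive_id like 'scsi0';
the real code additionally handles timeouts and logging):

    # switch the ready mirror job to synchronous (active) mode
    mon_cmd($vmid, 'block-job-change',
        id => "drive-$drive_id", type => 'mirror', 'copy-mode' => 'write-blocking');
    # wait until the remaining dirty parts have been mirrored
    while (1) {
        my $jobs = mon_cmd($vmid, 'query-block-jobs');
        my ($job) = grep { $_->{device} eq "drive-$drive_id" } $jobs->@*;
        die "mirror job for drive-$drive_id vanished\n" if !$job;
        last if $job->{'actively-synced'};
        sleep 1;
    }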
Reported rarely, but steadily over the years:
https://forum.proxmox.com/threads/78954/post-353651
https://forum.proxmox.com/threads/78954/post-380015
https://forum.proxmox.com/threads/100020/post-431660
https://forum.proxmox.com/threads/111831/post-482425
https://forum.proxmox.com/threads/111831/post-499807
https://forum.proxmox.com/threads/137849/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In Proxmox VE 8, the oldest supported QEMU version is 8.0, so a
check for version 4.2 is not required anymore. The check was also
wrong, because it checked the installed version and not the currently
running one.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
During migration, the volume names may change if the name is already in
use at the target location. We therefore want to save the original names
so that we can deactivate the original volumes afterwards.
Signed-off-by: Hannes Duerr <h.duerr@proxmox.com>
It is not yet supported for QEMU's vdagent device, which is used for
the VNC clipboard.
The migration precondition API call will now treat the VNC clipboard
as a local resource. Thus the GUI blocks migration and shows:
"Can't migrate VM with local resources: clipboard=vnc"
QemuMigrate's prepare function will also abort live migration early
when using the VNC clipboard.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
[FE: adapt commit message a bit]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Fix races with ACPI-suspended VMs which could wake up during migration
or during a suspend-mode backup.
Revert the prevention of ACPI-suspended VMs automatically resuming
after migration, introduced by 7ba974a682. The commit introduced a
potential problem that causes a suspended VM that wakes up during
migration to remain paused after the migration finishes.
This can be fixed once QEMU preserves the 'suspended' runstate during
migration (current patch on the qemu-devel list [0]) by checking for
the 'suspended' runstate on the target after migration.
Furthermore, the commit increased the race window during the
preparation of a suspend-mode backup, when a suspended VM wakes up
between the vm_is_paused check in PVE::VZDump::QemuServer::prepare and
PVE::VZDump::QemuServer::qga_fs_freeze. This causes the code to skip
fs-freeze even if the VM has woken up, potentially leaving the file
system in an inconsistent state.
To prevent this, do not treat the suspended runstate as paused when
migrating or archiving a VM.
[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg05260.html
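For illustration, a rough sketch of the distinction (the parameter name
and exact shape are assumptions, not necessarily the actual code;
mon_cmd() is the QMP helper from PVE::QemuServer::Monitor):

    # treat 'suspended' as paused only if the caller opted in;
    # migration and suspend-mode backup would not opt in
    sub vm_is_paused {
        my ($vmid, $include_suspended) = @_;
        my $status = eval { mon_cmd($vmid, 'query-status')->{status} } // '';
        return $status eq 'paused'
            || ($include_suspended && $status eq 'suspended');
    }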
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
[ TL: massage in Fiona's extra info into commit message ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In preparation for adding more properties to the memory configuration,
like maximum hotpluggable memory and whether virtio-mem devices should
be used.
This also allows getting rid of the cyclic include of PVE::QemuServer
in the memory module.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[FE: also convert new usage in get_derived_property
remove cyclic include of PVE::QemuServer
add commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Aliased volids can lead to unexpected behavior in a migration.
An aliased volid can happen if we have two storage configurations
pointing to the same place. The resulting 'path' for a disk image
will be the same.
Therefore, stop the migration in such a case.
The check works by comparing the path returned by the storage plugin.
We decided against checking whether the storages themselves are
aliased. It is not possible to infer that reliably from the storage
configuration options alone.
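For illustration, the core of the check (simplified sketch; it assumes
PVE::Storage::path() to resolve a volid, with $storecfg and
@local_volids standing in for the migration state):

    # the same resolved path showing up twice means the volid is aliased
    my %path_to_volid;
    for my $volid (@local_volids) {
        my $path = PVE::Storage::path($storecfg, $volid);
        die "detected aliased volid '$volid', same path as '$path_to_volid{$path}'\n"
            if defined($path_to_volid{$path});
        $path_to_volid{$path} = $volid;
    }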
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Since we don't scan all storages for matching disk images for a
migration anymore, we don't have any images found via storage alone;
they will be
referenced in the config somewhere.
Therefore, there is no need for the 'storage' ref.
The 'referenced_in_config' ref is not really needed and can apply to
both attached and unused disk images.
Therefore, QemuServer::foreach_volid() changes the
'referenced_in_config' attribute to an 'is_attached' one that only
applies to disk images that are in the _main_ config part and are not
unused.
In QemuMigrate::scan_local_volumes() we can then quite easily map the
refs to each state: attached, unused, referenced_in_{pending,snapshot}.
The refs are mostly used informationally, to print out in the logs why
a disk image is part of the migration, except for the 'attached' case.
In the future the extra step of the refs in QemuMigrate could probably
be streamlined even more.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
When scanning all configured storages for disk images belonging to the
VM, the migration could easily fail if a storage is enabled, but not
available. That storage might not even be used by the VM at all.
By not scanning all storages and only looking at the disk images
referenced in the VM config, we can avoid unnecessary failures.
Some information that used to be provided by the storage scanning needs
to be fetched explicitly (size, format).
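Fetching that information per referenced volume could look roughly like
this (sketch, assuming PVE::Storage::volume_size_info() with a short
timeout):

    # size and format used to come from the storage scan, now query them per volume
    my ($size, $format) = PVE::Storage::volume_size_info($storecfg, $volid, 5);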
Behaviorally the biggest change is that unreferenced disk images will
not be migrated anymore. Only images referenced in the config will be
migrated.
The tests have been adapted accordingly.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
All calling sites except for QemuConfig.pm::get_replicatable_volumes()
already enabled it. Making it the non-configurable default results in a
change in the VM replication. Now a disk image only referenced in the
pending section will also be replicated.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Make it possible to optionally iterate over disks in the pending section
of VMs, similar to how snapshots are handled already.
This is, for example, useful for migration if we don't want to rely on
scanning all storages.
All calling sites are adapted and enable it, except for
QemuConfig::get_replicatable_volumes, as that would cause a change for
replication if pending disks were included.
The following lists the calling sites and if they should be fine with
the change (source [0]):
1. QemuMigrate: scan_local_volumes(): needed to include pending disk
images
2. API2/Qemu.pm: check_vm_disks_local() for migration precondition:
related to migration, so more consistent with pending
3. QemuConfig.pm: get_replicatable_volumes(): would change the behavior
of the replication, will not use it for now.
4. QemuServer.pm: get_vm_volumes(): is used multiple times by:
4a. vm_stop_cleanup() to deactivate/unmap: should also be fine with
including pending
4b. QemuMigrate.pm: in prepare(): part of migration, so more consistent
with pending
4c. QemuMigrate.pm: in phase3_cleanup() for deactivation: part of
migration, so more consistent with pending
[0] https://lists.proxmox.com/pipermail/pve-devel/2023-May/056868.html
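A rough sketch of the optional pending handling (flag name and helper
usage are illustrative, not the exact foreach_volid() implementation;
$volhash stands in for the collected volume info):

    # also collect volumes from the pending section when requested
    if ($include_pending && $conf->{pending} && %{$conf->{pending}}) {
        PVE::QemuConfig->foreach_volume($conf->{pending}, sub {
            my ($key, $drive) = @_;
            $volhash->{$drive->{file}}->{referenced_in_pending} = 1;
        });
    }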
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
they can only be migrated to nodes where a mapping exists, and only if
the migration is done offline
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
We use this in a few places. By factoring it into its own function, we
can avoid running slightly different checks in various places.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Since commit 9b6efe43 ("migrate: add live-migration of replicated
disks") live-migration with replicated volumes is possible. When
handling the replication, it is checked that all local volumes
previously detected as replicatable are actually replicated. So the
check whether migration with snapshots is possible can just allow volumes
that are detected as replicatable.
Note that VM state files are also replicated.
If there is an invalid configuration with a non-replicatable volume or
state file and replication is enabled, then replication will fail, and
thus migration will fail early.
Trying to live-migrate to a non-replication target (needs --force)
will still fail if there are snapshots, because they are (correctly)
detected as non-replicated.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
if it is present. Should give more information for some failures, and
even if it's not present, that fact helps to narrow things down.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
was only explained in git history and vm_stop; add comments in other
relevant places to avoid future breakage.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.
the main differences to regular intra-cluster migration are:
- source VM config and disks are only removed upon request via --delete
- shared storages are treated like local storages, since we can't
assume they are shared across clusters (with potential to extend this
by marking storages as shared)
- NBD migrated disks are explicitly pre-allocated on the target node via
tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself are
transformed via a user-defined mapping
- all commands and migration data streams are sent via a WS tunnel proxy
- pending changes and snapshots are discarded on the target side (for
the time being)
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
no semantic changes intended, except for:
- no longer passing the main migration UNIX socket to SSH twice for
forwarding
- dropping the 'unix:' prefix in start_remote_tunnel's timeout error message
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The former to ensure the manager that depends on the newer
qemu-server is actually installed, and the latter to avoid false
positives
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
at the end of a live migration, we need to remove the old MAC entries
on the source host (the VM is not yet stopped) before resuming the VM
on the target host
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[T: resolve conflicts and rework on apply ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case) conditions as
fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
When phase2() is aborted after the migration has already converged,
the VM might be in the POSTMIGRATE state after migrate_cancel.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
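A condensed sketch of the added error-path handling ($vm_was_paused is
an illustrative stand-in for "was the VM paused at the beginning of the
migration"; mon_cmd() is the QMP helper from PVE::QemuServer::Monitor):

    # after migrate_cancel the source VM may have ended up in POSTMIGRATE
    my $status = mon_cmd($vmid, 'query-status')->{status} // '';
    if ($status eq 'postmigrate') {
        if (!$vm_was_paused) {
            mon_cmd($vmid, 'cont'); # resume guest execution on the source again
        } else {
            # there is no direct way back to PAUSED from POSTMIGRATE
            warn "VM was paused before migration and cannot be moved back to the paused state\n";
        }
    }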
Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Also cannot issue a guest agent command in that case.
Reported in the community forum:
https://forum.proxmox.com/threads/106618
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since we are going to reuse the same mechanism/code for network bridge
mapping and pve-container.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The volid may change if local-storage migration is involved, we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.
Reported here: https://forum.proxmox.com/threads/99906/#post-431345
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
node (switched storage, volname already taken, ..)
because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Starts an instance of swtpm per VM in its systemd scope; it will
terminate by itself if the VM exits, or be terminated manually if
startup fails.
Before first use, a TPM state is created via swtpm_setup. State is
stored in a 'tpmstate0' volume, treated much the same way as an efidisk.
It is migrated 'offline', the important part here is the creation of the
target volume, the actual data transfer happens via the QEMU device
state migration process.
Move-disk can only work offline, as the disk is not registered with
QEMU, so 'drive-mirror' wouldn't work. swtpm itself has no method of
moving a backing storage at runtime.
For backups, a bit of a workaround is necessary (this may later be
replaced by NBD support in swtpm): During the backup, we attach the
backing file of the TPM as a read-only drive to QEMU, so our backup
code can detect it as a block device and back it up as such, while
ensuring consistency with the rest of disk state ("snapshot" semantic).
The name for the ephemeral drive is specifically chosen as
'drive-tpmstate0-backup', diverging from our usual naming scheme with
the '-backup' suffix, to avoid it ever being treated as a regular drive
from the rest of the stack in case it gets left over after a backup for
some reason (shouldn't happen).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The '--targetstorage' parameter does not apply to shared storages.
Example of a problem solved with the enabled check: Given a VM with
images only on a shared storage 'storeA', not available on the target
node (i.e. restricted by the nodes property). Then using
'--targetstorage storeB' would make offline migration suddenly
"work", but of course the disks would not be accessible and then
trying to migrate back would fail...
Example of a problem solved with the content type check: if a
VM had a shared ISO image, and there was a '--targetstorage storeA'
option, availability of the 'iso' content type is checked for
'storeA', which is wrong as the ISO would not be moved to that
storage.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The content of the ISO should be the same on both nodes, so offline
migrate the ISO, but don't regenerate it on VM start on the target node.
This way, even with snippets, the content will not change during live
migration.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold images.
Previously, the content type was only enforced when a storage map was present.
Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.
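For reference, the helper essentially boils down to this (sketch based
on the description above, not necessarily the exact pve-storage code):

    sub storage_check_enabled {
        my ($cfg, $storeid, $node, $noerr) = @_;
        my $scfg = PVE::Storage::storage_config($cfg, $storeid);
        if ($scfg->{disable}) {
            return undef if $noerr;
            die "storage '$storeid' is disabled\n";
        }
        return PVE::Storage::storage_check_node($cfg, $storeid, $node, $noerr);
    }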
While not strictly necessary for a second call where only the storage differs,
e.g. in case of clone, it is more future-proof: if support for a target storage
is added at some point, it might be easy to miss adapting the call.
For the migration checks, the situation is improved by now always catching
disabled (target) storages.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This reverts commit ff09c795ed. We wanted to wait
until PVE 7.0 for the change, so as not to break migration new -> old
until then.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>