Aliased volids can lead to unexpected behavior in a migration.
An aliased volid can happen if we have two storage configurations,
pointing to the same place. The resulting 'path' for a disk image
will be the same.
Therefore, stop the migration in such a case.
The check works by comparing the path returned by the storage plugin.
We decided against checking the storages themselves being aliased. It is
not possible to infer that reliably from just the storage configuration
options alone.
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Since we don't scan all storages for matching disk images anymore for a
migration we don't have any images found via storage alone. They will be
referenced in the config somewhere.
Therefore, there is no need for the 'storage' ref.
The 'referenced_in_config' is not really needed and can apply to both,
attached and unused disk images.
Therefore the QemuServer::foreach_volid() will change the
'referenced_in_config' attribute to an 'is_attached' one that only
applies to disk images that are in the _main_ config part and are not
unused.
In QemuMigrate::scan_local_volumes() we can then quite easily map the
refs to each state, attached, unused, referenced_in_{pending,snapshot}.
The refs are mostly used for informational use to print out in the logs
why a disk image is part of the migration. Except for the 'attached' case.
In the future the extra step of the refs in QemuMigrate could probably
be streamlined even more.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
When scanning all configured storages for disk images belonging to the
VM, the migration could easily fail if a storage is not available, but
enabled. That storage might not even be used by the VM at all.
By not scanning all storages and only looking at the disk images
referenced in the VM config, we can avoid unnecessary failures.
Some information that used to be provided by the storage scanning needs
to be fetched explicilty (size, format).
Behaviorally the biggest change is that unreferenced disk images will
not be migrated anymore. Only images referenced in the config will be
migrated.
The tests have been adapted accordingly.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
All calling sites except for QemuConfig.pm::get_replicatable_volumes()
already enabled it. Making it the non-configurable default results in a
change in the VM replication. Now a disk image only referenced in the
pending section will also be replicated.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Make it possible to optionally iterate over disks in the pending section
of VMs, similar as to how snapshots are handled already.
This is for example useful in the migration if we don't want to rely on
the scanning of all storages.
All calling sites are adapted and enable it, except for
QemuConfig::get_replicatable_volumes as that would cause a change for
the replication if pending disks would be included.
The following lists the calling sites and if they should be fine with
the change (source [0]):
1. QemuMigrate: scan_local_volumes(): needed to include pending disk
images
2. API2/Qemu.pm: check_vm_disks_local() for migration precondition:
related to migration, so more consistent with pending
3. QemuConfig.pm: get_replicatable_volumes(): would change the behavior
of the replication, will not use it for now.
4. QemuServer.pm: get_vm_volumes(): is used multiple times by:
4a. vm_stop_cleanup() to deactivate/unmap: should also be fine with
including pending
4b. QemuMigrate.pm: in prepare(): part of migration, so more consistent
with pending
4c. QemuMigrate.pm: in phase3_cleanup() for deactivation: part of
migration, so more consistent with pending
[0] https://lists.proxmox.com/pipermail/pve-devel/2023-May/056868.html
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
they can only be migrated to nodes where there exists a mapping and if
the migration is done offline
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
We use this in a few places. By factoring it into its own function, we
can avoid running slightly different checks in various places.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Since commit 9b6efe43 ("migrate: add live-migration of replicated
disks") live-migration with replicated volumes is possible. When
handling the replication, it is checked that all local volumes
previously detected as replicatable are actually replicated. So the
check if migration with snapshots is possible can just allow volumes
that are detected as replicatable.
Note that VM state files are also replicated.
If there is an invalid configuration with a non-replicatable volume or
state file and replication is enabled, then replication will fail, and
thus migration will fail early.
Trying to live-migrate to a non-replication target (needs --force)
will still fail if there are snapshots, because they are (correctly)
detected as non-replicated.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
if it is present. Should give more information for some failures and
even if it's not present, that fact helps to narrow things down.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
was only explained in git history and vm_stop, add comments in other
relevant places to avoid future breakage.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.
the main differences to regular intra-cluster migration are:
- source VM config and disks are only removed upon request via --delete
- shared storages are treated like local storages, since we can't
assume they are shared across clusters (with potentical to extend this
by marking storages as shared)
- NBD migrated disks are explicitly pre-allocated on the target node via
tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself is
transformed via a user defined mapping
- all commands and migration data streams are sent via a WS tunnel proxy
- pending changes and snapshots are discarded on the target side (for
the time being)
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
no semantic changes intended, except for:
- no longer passing the main migration UNIX socket to SSH twice for
forwarding
- dropping the 'unix:' prefix in start_remote_tunnel's timeout error message
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The former to ensure the manager that depends on the newer
qemu-server is actually installed and the latter to avoid false
positives
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
at the end of a live migration, we need to remove old mac entries
on source host (vm is not yet stopped), before resume vm on target host
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[T: resolve conflicts and rework on apply ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case-)conditions as
fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Also cannot issue a guest agent command in that case.
Reported in the community forum:
https://forum.proxmox.com/threads/106618
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since we are going to reuse the same mechanism/code for network bridge
mapping and pve-container.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The volid may change if local-storage migration is involved, we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.
Reported here: https://forum.proxmox.com/threads/99906/#post-431345
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
node (switched storage, volname already taken, ..)
because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Starts an instance of swtpm per VM in it's systemd scope, it will
terminate by itself if the VM exits, or be terminated manually if
startup fails.
Before first use, a TPM state is created via swtpm_setup. State is
stored in a 'tpmstate0' volume, treated much the same way as an efidisk.
It is migrated 'offline', the important part here is the creation of the
target volume, the actual data transfer happens via the QEMU device
state migration process.
Move-disk can only work offline, as the disk is not registered with
QEMU, so 'drive-mirror' wouldn't work. swtpm itself has no method of
moving a backing storage at runtime.
For backups, a bit of a workaround is necessary (this may later be
replaced by NBD support in swtpm): During the backup, we attach the
backing file of the TPM as a read-only drive to QEMU, so our backup
code can detect it as a block device and back it up as such, while
ensuring consistency with the rest of disk state ("snapshot" semantic).
The name for the ephemeral drive is specifically chosen as
'drive-tpmstate0-backup', diverging from our usual naming scheme with
the '-backup' suffix, to avoid it ever being treated as a regular drive
from the rest of the stack in case it gets left over after a backup for
some reason (shouldn't happen).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The '--targetstorage' parameter does not apply to shared storages.
Example for a problem solved with the enabled check: Given a VM with
images only on a shared storage 'storeA', not available on the target
node (i.e. restricted by the nodes property). Then using
'--targetstorage storeB' would make offline migration suddenly
"work", but of course the disks would not be accessible and then
trying to migrate back would fail...
Example for a problem solved with the content type check: if a
VM had a shared ISO image, and there was a '--targetstorage storeA'
option, availablity of the 'iso' content type is checked for
'storeA', which is wrong as the ISO would not be moved to that
storage.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The content of the ISO should be the same on both nodes, so offline
migrate the ISO, but don't regenerate it on VM start on the target node.
This way even with snippets the content will not change during live
migration.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold images.
Previously, the content type was only enforced when a storage map was present.
Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.
While not strictly necessary for a second call where only the storage differs,
e.g. in case of clone, it is more future-proof: if support for a target storage
is added at some point, it might be easy to miss adapting the call.
For the migration checks, the situation is improved by now always catching
disabled (target) storages.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This reverts commit ff09c795ed. We wanted to wait
until PVE 7.0 for the change to not break migration new -> old until then.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
The default was changed for 5.2, so while it is not 32 MiB/s anymore,
it is still 128 MiB/s which I did not notice on my 1 Gbps (or < 125
MiB/s) setup. For users with links faster than one gigabit it now did
some limiting - so setup a very high limit so than even 100G should
not max this out.
This reverts commit a89bd10084.
The variable is only ever used for calculating the average speed of memory
migration, but it was set before disk mirroring already. But the disk
sizes are not included in the calculation, resulting in (very) wrong values.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Either we're done in a few seconds anyway, or if the VM dirties lots
of pages we need quite a bit of time, and then it does not help to
output roughly the same status 10 times a second...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
* use render_bytes where possible, to get quick to read and grasp
units printed
* xbzrle is only interesting if actually pages/bytes are send using
it, so only log in that case
* log if VM dirties more than we send
* log current speed we get from QEMU
In general there are less lines logged and huge integers are avoided.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the claim that QEMU limits this to 32M otherwise is bogus, at least
with any current QEMU version..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Use an early die so that the rest can loose an indentation level for
the actual migration status reporting code
Extract common used members of the stat hash for shorter code.
use `git show -w --word-diff=color --word-diff-regex='\w+'` for
getting a better view of actual changes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
avoids the possibility to die during phase3_cleanup and instead of needing to
duplicate the cleanup ourselves, benefit from phase2_cleanup doing so.
The duplicate cleanup was also very incomplete: it didn't stop the remote kvm
process (leading to 'VM already running' when trying to migrate again
afterwards), but it removed its disks, and it didn't unlock the config, didn't
close the tunnel and didn't cancel the block-dirty bitmaps.
Since migrate_cancel should do nothing after the (non-storage) migrate process
has completed, even that cleanup step is fine here.
Since phase3 is empty at the moment, the order of operations is still the same.
Also add a test, that would complain about finish_tunnel not being called before
this patch. That test also checks that local disks are not already removed
before finishing the block jobs.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Namely, those migrated with storage_migrate by using the information from
volume_map. Call cleanup_remotedisks in phase1_cleanup as well, because that's
where we end if sync_offline_local_volumes fails, and some disks might already
have been transfered successfully. Note that the local disks are still here, so
this is fine.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This also changes the behavior to remove the local copies of offline migrated
volumes only after the migration has finished successfully (this is relevant
for mixed settings, e.g. online migration with unused/vmstate disks).
local_volumes contains both, the volumes previously in $self->{volumes}
and the volumes in $self->{online_local_volumes}, and hence is the place
to look for which volumes we need to remove. Of course, replicated
volumes still need to be skipped.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>