Either we're done in a few seconds anyway, or if the VM dirties lots
of pages we need quite a bit of time, and then it does not help to
output roughly the same status 10 times a second...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
* use render_bytes where possible, to get quick to read and grasp
units printed
* xbzrle is only interesting if actually pages/bytes are send using
it, so only log in that case
* log if VM dirties more than we send
* log current speed we get from QEMU
In general there are less lines logged and huge integers are avoided.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the claim that QEMU limits this to 32M otherwise is bogus, at least
with any current QEMU version..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Use an early die so that the rest can loose an indentation level for
the actual migration status reporting code
Extract common used members of the stat hash for shorter code.
use `git show -w --word-diff=color --word-diff-regex='\w+'` for
getting a better view of actual changes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
avoids the possibility to die during phase3_cleanup and instead of needing to
duplicate the cleanup ourselves, benefit from phase2_cleanup doing so.
The duplicate cleanup was also very incomplete: it didn't stop the remote kvm
process (leading to 'VM already running' when trying to migrate again
afterwards), but it removed its disks, and it didn't unlock the config, didn't
close the tunnel and didn't cancel the block-dirty bitmaps.
Since migrate_cancel should do nothing after the (non-storage) migrate process
has completed, even that cleanup step is fine here.
Since phase3 is empty at the moment, the order of operations is still the same.
Also add a test, that would complain about finish_tunnel not being called before
this patch. That test also checks that local disks are not already removed
before finishing the block jobs.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Namely, those migrated with storage_migrate by using the information from
volume_map. Call cleanup_remotedisks in phase1_cleanup as well, because that's
where we end if sync_offline_local_volumes fails, and some disks might already
have been transfered successfully. Note that the local disks are still here, so
this is fine.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This also changes the behavior to remove the local copies of offline migrated
volumes only after the migration has finished successfully (this is relevant
for mixed settings, e.g. online migration with unused/vmstate disks).
local_volumes contains both, the volumes previously in $self->{volumes}
and the volumes in $self->{online_local_volumes}, and hence is the place
to look for which volumes we need to remove. Of course, replicated
volumes still need to be skipped.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The case with:
1. no generic 'migration' limit from the storage plugin
2. a migrate_speed limit in the VM config
was broken. It would assign 0 to migrate_speed when picking the minimum value
and then default to the default value. Fix it by checking if bwlimit is 0
before picking the minimum.
Also, make it a bit more readable by avoiding the trick of //-assigning bwlimit
before the units match up and relying on getting back the original bwlimit value
as the minimum. Instead, only ||-assign after the units match up and don't rely
on other things.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by using the information obtained in the first scan. This
also makes sure we only scan local storages.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by making local_volumes class-accessible. One functions is for scanning all local
volumes and one is for actually syncing offline volumes via storage_migrate. The
exception is replicated volumes, this still happens during the scan for now.
Also introduce a filter_local_volumes helper, to makes life easier.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Offline migrated volumes are now activated within storage_migrate.
Online migrated volumes can be assumed to be already active.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
while it didn't actually fail, we probably want to avoid the behavior:
With remove_job=full:
* run_replication called during migration causes the replicated volumes to
be removed
* migration continues by fully copying all volumes
With remove_job=local:
* run_replication called during migration causes the job (and local
replication snapshots) to be removed
* migration continues by fully copying all volumes and renaming them to
avoid collision with the still existing remote volumes
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In some cases $self->{replicated_volumes} will be auto-vivified
to {} by checks like
next if $self->{replicated_volumes}->{$volid}
and then {} would evaluate to true in a boolean context.
Now the replication information is retrieved once in prepare,
and used to decide whether to make the calls or not.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
No need to warn twice, so the warning from the outside check
was removed.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Pass new size directly, so the function doesn't need to know about
how some hash is organized. And return a message directly, instead
of both size-strings. Also dropped the wantarray, because both
existing callers use the message anyways.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
fixing the following two issues:
- the legacy code path was never converted to the new fork_tunnel
signature (which probably means that nothing triggers it in practice
anymore?)
- the NBD Unix socket got forwarded multiple times if more than one disk
was migrated via NBD (this is harmless, but wrong)
for the second issue I opted to keep the code compatible with the
possibility that Qemu starts supporting multiple NBD servers in the
future (and the target node could thus return multiple UNIX socket
paths). currently we can only start one NBD server on one socket, and
each drive-mirror simply starts a new connection over that single
socket.
I took the liberty of renaming the variables/keys since I found
'tunnel_addr' and 'sock_addr' rather confusing.
Reviewed-By: Mira Limbeck <m.limbeck@proxmox.com>
Tested-By: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we really only want to rescan the disk size of the disks we actually
need, and that are only the local disks (for which we have to allocate
the correct size on the target)
also we want to always skip the efidisk, since we get the wanted
size after the loop, and this produced a confusing log line
(for details why we do not want the 'real' size,
see commit 818ce80ec1)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
by avoiding auto-vivification of $self->{online_local_volumes} via
iteration. most code paths don't care whether it's undef or a reference
to an empty list, but this caused the (already) fixed bug of calling
nbd_stop without having started an NBD server in the first place.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
and refactor the test_volid closure. Like this get_replicatable_volumes doesn't
need a separate loop for unused volumes anymore. For get_vm_volumes, which is used
for activation/deactivation of volumes at migration and deactivation in vm_stop_cleanup,
includes those volumes now. For migration it's an improvement, because those volumes
might need to be migrated and for vm_stop_cleanup it shouldn't hurt. The last user
of foreach_volid is check_vm_disks_local used by migrate_vm_precondition,
where information about the additional volumes doesn't hurt either.
Note that replicate is (still) set by default, so the behavior for
get_replicatable_volumes for unused volumes should not change.
Hibernation vmstate files are now also included and recognized as 'is_vmstate'.
The 'size' attribute will not be overwritten by subsequent iterations for the
same volid anymore (a volid may appear both in the config and in snapshots),
so the size from the current config is now preferred.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by counting only local volumes that will be live-migrated via qemu_drive_mirror,
i.e. those listed in $self->{online_local_volumes}.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
If storage_migrate dies, the error message might not include the
volume ID or the target storage ID, but those might be good to know.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This makes it possible to migrate a VM with volumes store1:vm-123-disk-0
store2:vm-123-disk-0 to some targetstorage. Also prevents migration failure
when there is an orphaned disk with the same volid on the target.
To avoid confusion, the name should not change for 'vmstate'-volumes.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It was necessary to move foreach_volid back to QemuServer.pm
In VZDump/QemuServer.pm and QemuMigrate.pm the dependency on
QemuConfig.pm was already there, just the explicit "use" was missing.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This is required to support custom CPU models, since the
"cpu-models.conf" file is not versioned, and can be changed while a VM
using a custom model is running. Changing the file in such a state can
lead to a different "-cpu" argument on the receiving side.
This patch fixes this by passing the entire "-cpu" option (extracted
from /proc/.../cmdline) as a "qm start" parameter. Note that this is
only done if the VM to migrate is using a custom model (which we can
check just fine, since the <vmid>.conf *is* versioned with pending
changes), thus not breaking any live-migration directionality.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
the syntax is backwards compatible, providing a single storage ID or '1'
works like before. the new helper ensures consistent behaviour at all
call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
485449e37 ("qmp: use migrate-set-parameters in favor of deprecated options")
changed the initial "migrate_set_downtime" QMP call to the more recent
"migrate-set-parameters", but forgot to do so for the auto-increase code
further below.
Since the units of the two calls don't match, this would have caused the
auto-increase to increase the limit to absurd levels as soon as it kicked
in (ms treated as s).
Update the second call to the new version as well, and while at it remove
the unnecessary "defined()" check for $migrate_downtime, which is always
initialized from the defaults anyway.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>