In some cases, $self->{replicated_volumes} would be auto-vivified
to {} by checks like
    next if $self->{replicated_volumes}->{$volid}
and the resulting {} then evaluates to true in boolean context.
Now the replication information is retrieved once in prepare,
and used to decide whether to make the calls or not.
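For illustration, a minimal standalone example of the pitfall
(simplified, not the actual code):

    my $self = {};
    my $volid = 'local:vm-100-disk-0';
    # reading through an undef hashref auto-vivifies it to {}:
    print "skip\n" if $self->{replicated_volumes}->{$volid};
    # $self->{replicated_volumes} now exists as {}, so a later check like
    print "have replicated volumes\n" if $self->{replicated_volumes};
    # wrongly fires even though no volume is replicated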
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
No need to warn twice, so the warning from the outside check
was removed.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Pass the new size directly, so the function doesn't need to know
how some hash is organized. And return a message directly, instead
of both size strings. Also drop the wantarray, because both
existing callers use the message anyway.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
fixing the following two issues:
- the legacy code path was never converted to the new fork_tunnel
signature (which probably means that nothing triggers it in practice
anymore?)
- the NBD Unix socket got forwarded multiple times if more than one disk
was migrated via NBD (this is harmless, but wrong)
for the second issue I opted to keep the code compatible with the
possibility that Qemu starts supporting multiple NBD servers in the
future (and the target node could thus return multiple UNIX socket
paths). currently we can only start one NBD server on one socket, and
each drive-mirror simply starts a new connection over that single
socket.
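A minimal sketch of forwarding each socket path only once (variable
names assumed, not taken from the actual diff):

    my %forwarded;
    for my $sock (@{$tunnel->{unix_sockets}}) {
        next if $forwarded{$sock}++;       # skip already-forwarded paths
        push @$forwards, "$sock:$sock";    # ssh -L spec for a unix socket
    }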
I took the liberty of renaming the variables/keys since I found
'tunnel_addr' and 'sock_addr' rather confusing.
Reviewed-By: Mira Limbeck <m.limbeck@proxmox.com>
Tested-By: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we really only want to rescan the disk size of the disks we actually
need, and those are only the local disks (for which we have to allocate
the correct size on the target)
also, we want to always skip the efidisk, since we determine the wanted
size after the loop anyway, and rescanning it produced a confusing log
line (for details on why we do not want the 'real' size,
see commit 818ce80ec1)
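A minimal sketch of the intended loop (names assumed):

    # rescan only local disks; the efidisk gets its wanted size after the loop
    foreach my $volid (sort keys %$local_volumes) {
        next if $volid eq $efidisk_volid;
        my $size = PVE::Storage::volume_size_info($storecfg, $volid);
        $local_volumes->{$volid}->{size} = $size if defined($size);
    }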
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
by avoiding auto-vivification of $self->{online_local_volumes} via
iteration. most code paths don't care whether it's undef or a reference
to an empty list, but this caused the (already) fixed bug of calling
nbd_stop without having started an NBD server in the first place.
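For illustration, the pitfall and a guarded alternative (simplified, not
the actual diff):

    # dereferencing an undef value as an array auto-vivifies it to []:
    foreach my $volid (@{$self->{online_local_volumes}}) { ... }

    # guarding the dereference avoids modifying $self as a side effect:
    foreach my $volid (@{$self->{online_local_volumes} // []}) { ... }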
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
and refactor the test_volid closure. Like this, get_replicatable_volumes
doesn't need a separate loop for unused volumes anymore. get_vm_volumes,
which is used for activation/deactivation of volumes during migration and
for deactivation in vm_stop_cleanup, now includes those volumes as well.
For migration it's an improvement, because those volumes might need to be
migrated, and for vm_stop_cleanup it shouldn't hurt. The last user of
foreach_volid is check_vm_disks_local, used by migrate_vm_precondition,
where information about the additional volumes doesn't hurt either.
Note that 'replicate' is (still) set by default, so the behavior of
get_replicatable_volumes for unused volumes should not change.
Hibernation vmstate files are now also included and recognized as 'is_vmstate'.
The 'size' attribute is no longer overwritten by subsequent iterations for the
same volid (a volid may appear both in the config and in snapshots), so the
size from the current config is now preferred.
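A minimal sketch of that preference (assumed, simplified):

    # the current config is visited first, snapshots afterwards, so keep
    # the first size seen for each volid:
    $volhash->{$volid}->{size} //= $size if defined($size);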
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by counting only local volumes that will be live-migrated via qemu_drive_mirror,
i.e. those listed in $self->{online_local_volumes}.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
If storage_migrate dies, the error message might not include the
volume ID or the target storage ID, but those might be good to know.
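A minimal sketch of wrapping the call to add that context (names assumed):

    eval {
        PVE::Storage::storage_migrate($storecfg, $volid, $ssh_info, $targetsid, $opts);
    };
    if (my $err = $@) {
        die "storage migration for '$volid' to storage '$targetsid' failed - $err";
    }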
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This makes it possible to migrate a VM with volumes store1:vm-123-disk-0
and store2:vm-123-disk-0 to some targetstorage. Also prevents migration failure
when there is an orphaned disk with the same volid on the target.
To avoid confusion, the name should not change for 'vmstate'-volumes.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It was necessary to move foreach_volid back to QemuServer.pm.
In VZDump/QemuServer.pm and QemuMigrate.pm the dependency on
QemuConfig.pm was already there, just the explicit "use" was missing.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This is required to support custom CPU models, since the
"cpu-models.conf" file is not versioned, and can be changed while a VM
using a custom model is running. Changing the file in such a state can
lead to a different "-cpu" argument on the receiving side.
This patch fixes this by passing the entire "-cpu" option (extracted
from /proc/.../cmdline) as a "qm start" parameter. Note that this is
only done if the VM to migrate is using a custom model (which we can
check just fine, since the <vmid>.conf *is* versioned with pending
changes), thus not breaking live migration in either direction.
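A minimal sketch of extracting the current "-cpu" argument from the
running QEMU process (helper name assumed):

    sub get_current_cpu_arg {
        my ($pid) = @_;
        # /proc/<pid>/cmdline separates arguments with NUL bytes
        my @args = split(/\0/, PVE::Tools::file_get_contents("/proc/$pid/cmdline"));
        for my $i (0 .. $#args - 1) {
            return $args[$i + 1] if $args[$i] eq '-cpu';
        }
        return undef;
    }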
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
the syntax is backwards compatible: providing a single storage ID or '1'
works like before. the new helper ensures consistent behaviour at all
call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
485449e37 ("qmp: use migrate-set-parameters in favor of deprecated options")
changed the initial "migrate_set_downtime" QMP call to the more recent
"migrate-set-parameters", but forgot to do so for the auto-increase code
further below.
Since the units of the two calls don't match, this would have caused the
auto-increase to increase the limit to absurd levels as soon as it kicked
in (ms treated as s).
Update the second call to the new version as well, and while at it remove
the unnecessary "defined()" check for $migrate_downtime, which is always
initialized from the defaults anyway.
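A minimal sketch of the corrected call (note the seconds-to-milliseconds
conversion; exact surrounding code assumed):

    # downtime-limit expects milliseconds, $migrate_downtime is in seconds
    mon_cmd($vmid, 'migrate-set-parameters',
        'downtime-limit' => int($migrate_downtime * 1000));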
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
to also handle cases where disk allocation failed in the remote
vm_start, and we only have a bitmap but no target drive information.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
on storages where the minimum size of images is bigger than the real
OVMF_VARS.fd file, images get padded to that minimum size
when using such a padded image, qemu maps it fully into the vm, but the
firmware does not find the vars region and instead creates a file on the
first EFI partition it finds
this breaks some settings in the OVMF, such as the resolution
to fix this, we have to specify the size for the pflash drive, so that
qemu only maps the first n bytes into the vm (this only works for
raw files, not for qcow2)
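For illustration, the resulting pflash drive argument could look like
this (values assumed):

    -drive if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/rpool/data/vm-100-disk-1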
we also have to use the correct size when converting between storages
in 'clone_disk' (used for move disk and cloning VMs) and when
live-migrating to a different storage
if we now expect that the source image is always correctly used/created
(e.g. raw with size=x in the pflash argument), then we always create the
target correctly
when encountering users who have an invalid image (e.g. an efidisk
moved from zfs to qcow2 before this patch), we have to tell them to
recreate the efidisk and the settings on it
we have to version_guard it to 4.1+pve2 (since we haven't bumped yet
since the change to pve2)
also add 2 tests, one for the old version and one for the new
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
[ Thomas: rebased to master ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
since bitmaps are set early on, and 'qm start' potentially has allocated
the disks but still failed. we can only clean up what we know about
anyway, so the disk part is still only best effort.
also use replicated_volumes instead of bitmap existence to check for
replicated volumes, since 'qm start' on an old node that does not
understand replicated volumes might have allocated a new volume that we
DO want to clean up, and not skip.
also cleanup disks after stopping target VM, otherwise we might end up
in a situation where the target VM is still running and using the disks,
thus blocking the disk cleanup.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by only checking for replicatable volumes when a replication job is
defined, and passing only actually replicated volumes to the target node
via STDIN, and back via STDOUT.
otherwise this can pick up theoretically replicatable, but not actually
replicated volumes and treat them wrong.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
We need to set $noerr, because otherwise migration for a
VM with a non-replicatable volume fails with:
    missing replicate feature on volume 'myfs:107/vm-107-disk-2.raw'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
as we need at least pve-qemu 4.2 for this to work, the target side
is implicitly covered by the "too old version" check for migration, or
the mirror will fail anyway.
Just use the simple "qemu binary version check", as we could still
live-migrate an older snapshot with older machine versions if both
sides have a recent enough qemu.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
with incremental drive-mirror and dirty-bitmap tracking.
1.) get the replicated disks that are currently referenced by the running VM
2.) add a block-dirty-bitmap to each of them
3.) replicate ALL replicated disks
4.) pass the bitmaps from 2) to drive-mirror for the disks from 1)
    (see the sketch below)
5.) skip replicated disks when cleaning up volumes on either source or
    target
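A minimal sketch of steps 2) and 4) as QMP calls (names assumed; the
'bitmap' argument to drive-mirror relies on our patched QEMU, and
further arguments are abridged):

    # step 2: one dirty bitmap per replicated, attached drive
    mon_cmd($vmid, 'block-dirty-bitmap-add',
        node => "drive-$drive", name => "repl_$drive");

    # step 4: only blocks dirtied since the bitmap was added get mirrored
    mon_cmd($vmid, 'drive-mirror',
        device => "drive-$drive", target => $nbd_uri, bitmap => "repl_$drive");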
added error handling is just removing the bitmaps if an error occurs at
any point after 2, except when the handover to the target node has
already happened, since the bitmaps are cleaned up together with the
source VM in that case.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
to make migration logs a bit easier to grasp with a quick glance.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
Clarify why a cancel is actually not really canceling here: we're
already finished with the storage migration, the block jobs are all in
ready state, and we (the source) are going to stop soon to hand over to
the target.
> Note that if you issue 'block-job-cancel' after 'drive-mirror' has
> indicated (via the event BLOCK_JOB_READY) that the source and
> destination are synchronized, then the event triggered by this
> command changes to BLOCK_JOB_COMPLETED, to indicate that the
> mirroring has ended and the destination now has a point-in-time
> copy tied to the time of the cancellation
-- qapi/block-core.json (QEMU 4.2)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The change to the prefixed version broke migration from new to old
qemu-server version. This reverts the change and adds a TODO comment for
7.0 to change it to the prefixed version then.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Reusing the tunnel, which we open to communicate with the target node
and to forward the unix socket for the state migration, for the NBD unix
socket as well requires support for forwarding an array of sockets, not
just a single one. We also have to turn the $sock_addr variable into an
array for the cleanup of the socket files, as SSH does not remove them.
To communicate to the target node the support of unix sockets for NBD
storage migration, we're specifying an nbd_protocol_version which is set
to 1. This version is then passed to the target node via STDIN. Because
we don't want to be dependent on the order of arguments being passed
via STDIN, we also prefix the spice ticket with 'spice_ticket: '. The
target side handles both the spice ticket and the nbd protocol version
with a fallback for old source nodes passing the spice ticket without a
prefix.
All arguments are line based and require a newline in between.
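For illustration, the lines passed via STDIN look like this (key names
as described above):

    spice_ticket: <ticket>
    nbd_protocol_version: 1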
When the NBD server on the target node is started with a unix socket, we
get a different line containing all the information required to start
the drive-mirror. This contains the unix socket path used on the target node
which we require for forwarding and cleanup.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
With Qemu 4.2 we encountered a problem with unix sockets and SSH socket
forwarding for drive-mirror. It seems the socket gets reopened again and
again after it closes for some reason. This can be worked around by
specifying 'block-job-cancel' instead of 'block-job-complete' when we're
not interested in swapping the disks again from NBD to their original
protocol. This is always the case when we use drive-mirror for live
migrating a VM.
qemu_drive_mirror is used for migration and for clone_disk, so all in
all we have three cases to handle: the 'skip' case, which skips the
completion of the job; the 'wait' case, which was the default before and
still is when $completion is undefined; and the new 'wait_noswap' case,
which is used for the live migration.
If 'wait_noswap' is specified, we issue a 'block-job-cancel' once the block
job is in 'ready' state. This completes the block job without swapping the
disks.
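A minimal sketch of the resulting dispatch (simplified, assumed
structure, not the actual diff):

    $completion //= 'wait';
    if ($completion eq 'wait') {
        mon_cmd($vmid, 'block-job-complete', device => $job);  # swap disks back
    } elsif ($completion eq 'wait_noswap') {
        mon_cmd($vmid, 'block-job-cancel', device => $job);    # complete without swapping
    }
    # 'skip' issues neither and leaves completion to the caller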
clone_disk always uses 'block-job-cancel' via the qemu_blockjobs_cancel
sub.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
This fixes an issue when migrating a VM with an unused volume of format
qcow2 or vmdk. Since 'snapshots' wasn't set, storage_migrate wanted to
export/import with format raw+size instead. Therefore it used 'qemu-img
convert' (instead of just 'dd'), which fails when its output goes
through a pipe. Upon importing, a second error shows up, because the
format from the volume ID doesn't match the format of the stream and
there is no conversion yet.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
LGTM-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The initialization of the drive keys in $confdesc is changed
to a single for-loop iterating over the keys of $drivedesc_hash, and
the initialization of the unusedN keys is moved to directly below it.
To avoid the need to change all the call sites, functions with more than
a few callers are exported from the submodule and imported into QemuServer.pm.
For callers of the now imported functions within QemuServer.pm, the prefix
PVE::QemuServer is dropped, because it is unnecessary and now even confusing.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
migrate_set_downtime, migrate_set_speed and migrate-set-cachesize have
all been deprecated since 2.8 or 2.11 [0]. They still work, but there is
no reason not to switch to their replacements.
Note that the downtime-limit parameter switched from seconds to
milliseconds, so convert to that. Slightly improve log output with units
while at it.
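The replacements in brief (units per the QMP schema, to the best of my
knowledge):

    migrate_set_downtime  -> migrate-set-parameters downtime-limit    (milliseconds)
    migrate_set_speed     -> migrate-set-parameters max-bandwidth     (bytes/s)
    migrate-set-cachesize -> migrate-set-parameters xbzrle-cache-size (bytes)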
[0] https://qemu.weilnetz.de/doc/qemu-doc.html#Deprecated-features
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
QEMU usually only prints warnings and errors and stays silent otherwise,
so it makes sense to just log all of its output.
Prefix it with '[<target_hostname>]' to indicate that the output is
coming from the remote node, so users know where to search for the
error.
A side effect is that the 'VM start' task created by the migration will
now show the "QEMU:" prefix, but it's still very readable IMHO.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
By default run_command prints the entire commandline executed when an
error occurs, but QEMU and our migrate command are not only
uninteresting to the user[*] but also annoyingly long. Hide them and only
print the exit code.
[*] Especially our migrate command, since it can't be manually executed
anyway. QEMU's commandline *might* contain something interesting, but is
so long that it's tricky to parse anyway, and a user can always call 'qm
showcmd --pretty'.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Split out 'update_disksize' from the renamed 'update_disk_config' to
allow code reuse in QemuMigrate.
Remove dots after messages to keep style consistent for migration log.
After updating in sync_disks (phase1) of migration, write out updated
config. This means that even if migration fails or is aborted in later
stages, we keep the fixed config - this is not an issue, as it would
have been fixed on the next attempt anyway, and it can't hurt to have
the correct size instead of a wrong one either way.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
...PVE::QemuServer::Machine.
qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
QMP and monitor helpers are moved from QemuServer.pm.
By using only vm_running_locally instead of check_running, a cyclic
dependency to QemuConfig is avoided. This also means that the $nocheck
parameter serves no more purpose, and has thus been removed along with
vm_mon_cmd_nocheck.
Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existence was inserted at the
call site.
Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd
mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.
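For illustration, a typical call site change (target module name
assumed):

    PVE::QemuServer::vm_mon_cmd($vmid, 'query-status');        # before
    PVE::QemuServer::Monitor::mon_cmd($vmid, 'query-status');  # after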
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Live migration with a local cloudinit disk was never intended to work.
It did, however, work to the extent that the migration completed, but
the disk on the source node could not be deleted. Now die if a live
migration is started with
a local cloudinit disk.
With the GUI changes live migration is already disabled as it recognizes the
cloudinit disk as a local resource.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
for some reason, not setting a port results in a port of '65535', which
triggers an exception in the AnyEvent-based HTTP server, so we set the
port to 0
also, we have to read the ticket from stdin even for 'unix' type secure
migration
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
used for online local disks via qemu_drive_mirror
Add TODO comment for offline disks, as clone_disk calls `qemu-img
convert`, which does not have a bandwidth limit parameter.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
The 'migrate_speed' can be set in the VM config. Additionally the 'migrate'
bwlimit from datacenter.cfg (storage-specific limits play no role for
memory+state migration) or the parameter provided to the API call can restrict
the speed. Take the lower of the two.
This patch also refactors the setting of migrate_speed and comments for clarity.
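A minimal sketch of picking the effective limit (unit handling assumed):

    # migrate_speed is configured in MB/s, bwlimit in KB/s
    my $migrate_speed = ($conf->{migrate_speed} // 0) * 1024;
    $migrate_speed = $bwlimit
        if $bwlimit && (!$migrate_speed || $bwlimit < $migrate_speed);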
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
disk is not copied to the target node but still deleted on cleanup
(phase3_cleanup).
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
commit 4530494bf9 introduced a
regression for local disk migrations if the VM is online, and thus
needs to be live-migrated, and no target storage was passed as parameter.
We made the hack of writing "1" to the targetstorage option in this
case obsolete, but that option was still used for deciding whether there
are any drives to mirror at all. Here it is enough to check whether there
are any 'online_local_volumes', because that hash only gets filled if we
can, and are told to, live-mirror local disks during migration anyway.
Also, we abort early if local disks are found and the 'with-local-disks'
option is not set.
This was reported at:
https://forum.proxmox.com/threads/livemigration-with-localdisk-doesnt-coppy-and-data-from-the-hdds-anymore.50744/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the check for targetstorage in:
if ($self->{running} && $self->{opts}->{targetstorage} && $local_volumes->{$volid}->{ref} eq 'config') {
was obsolete, as we always set the targetstorage opts variable to '1'
in a broader "use same sid for remote local" check above.
So removing it leads to the same if truth table, but fixes the
check for whether we should fall back to the volume's SID if
targetstorage is not set, as otherwise it seemed to be always set, and
'1' is naturally not a correct storage ID.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
targetsid was not used for unused disks (offline copy)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Some storages like rbd or lvm can't keep thin-provisioning after a
qemu-mirror.
Call qga guest-fstrim after move_disk and migrate, if qga is available
and fstrim_cloned_disks is enabled.
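A minimal sketch for the migration case, trimming on the target node via
the guest agent (exact invocation assumed):

    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
    eval { PVE::Tools::run_command($cmd) };
    $self->log('warn', "fstrim failed - $@") if $@;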
Co-Authored-By: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
qemu 2.11 needs a power-of-two cache size:
    "Parameter 'xbzrle_cache_size' expects is invalid,
    it should be bigger than target page size and a power of two"
round up to the nearest power-of-two value
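A minimal sketch of the round-up (assumed helper, avoiding
floating-point log for exactness):

    sub round_powerof2 {
        my ($size) = @_;
        my $res = 1;
        $res *= 2 while $res < $size;
        return $res;
    }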
With shared=1, (live) migration ignores the disk and assumes it is
present on all target nodes. This works similarly to shared=1 on LXC
mountpoints.
Signed-off-by: Chris Hofstaedtler <chris.hofstaedtler@deduktiva.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This should not be needed, since we already call 'block-job-complete'
in qemu_drive_mirror_monitor(), and benchmarking shows that it neither
is needed nor provides a measurable improvement when shutting down the
source.
and only transfer state and switch direction if there
actually are any replicated volumes.
once we add support for live-migration with replicated
volumes, adding a set-replication-state command to the
tunnel and using that probably makes sense.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
because we want commands to return meaningful errors, and
print them on the client/source side.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
qm mtunnel was deemed deprecated but still in use here.
Switch over to pvecm's mtunnel to allow removing the qm variant in
PVE 5.1.
Also use an absolute path, so we do not depend on the target's
environment variables.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>