Commit Graph

166 Commits

Author SHA1 Message Date
Fabian Grünbichler
b9f44d2773 migrate: add replication info to disk overview
to make migration logs a bit easier to grasp with a quick glance.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-24 11:54:32 +01:00
Thomas Lamprecht
1e0074c437 migrate phase3: add to comment why a blockjob cancel is OK here
Clarify why a cancel is actually not really canceling here, because
we're already finished with storage migration and the block jobs are
all in ready state and we (source) are going to stop soon to hand
over to target.

> Note that if you issue 'block-job-cancel' after 'drive-mirror' has
> indicated (via the event BLOCK_JOB_READY) that the source and
> destination are synchronized, then the event triggered by this
> command changes to BLOCK_JOB_COMPLETED, to indicate that the
> mirroring has ended and the destination now has a point-in-time
> copy tied to the time of the cancellation
-- qapi/block-core.json (QEMU 4.2)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-03-20 11:08:23 +01:00
Mira Limbeck
ff09c795ed revert spice_ticket prefix change in 7827de4
The change to the prefixed version broke migration from new to old
qemu-server version. This reverts the change and adds a TODO comment for
7.0 to change it to the prefixed version then.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2020-03-20 10:37:33 +01:00
Fabian Grünbichler
db1f8b39e1 drive_mirror: rename variables and values
and add some more details to comments.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-03-18 08:21:29 +01:00
Mira Limbeck
7827de41a2 add unix socket support for NBD storage migration
The reuse of the tunnel, which we're opening to communicate with the target
node and to forward the unix socket for the state migration, for the NBD unix
socket requires adding support for an array of sockets to forward, not just a
single one. We also have to change the $sock_addr variable to an array
for the cleanup of the socket file as SSH does not remove the file.

To communicate to the target node the support of unix sockets for NBD
storage migration, we're specifying an nbd_protocol_version which is set
to 1. This version is then passed to the target node via STDIN. Because
we don't want to be dependent on the order of arguments being passed
via STDIN, we also prefix the spice ticket with 'spice_ticket: '. The
target side handles both the spice ticket and the nbd protocol version
with a fallback for old source nodes passing the spice ticket without a
prefix.
All arguments are line based and require a newline in between.

When the NBD server on the target node is started with a unix socket, we
get a different line containing all the information required to start
the drive-mirror. This contains the unix socket path used on the target node
which we require for forwarding and cleanup.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2020-03-18 08:03:44 +01:00
Mira Limbeck
e02fb12620 add qemu_drive_mirror_monitor completion modes
With Qemu 4.2 we encountered a problem with unix sockets and SSH socket
forwarding for drive-mirror. It seems the socket gets reopened again and
again after it closes for some reason. This can be worked around by
specifying 'block-job-cancel' instead of 'block-job-complete' when we're
not interested in swapping the disks again from NBD to their original
protocol. This is always the case when we use drive-mirror for live
migrating a VM.

qemu_drive_mirror is used for migration and for clone_disk. All in all
we have 3 cases to handle. Either the 'skip' case which skips the
completion of the job. The 'wait' case which was the default before and
still is when $completion is undefined. And the new 'wait_noswap' case
which is used for the live migration.
If 'wait_noswap' is specified, we issue a 'block-job-cancel' once the block
job is in 'ready' state. This completes the block job without swapping the
disks.

clone_disk always uses 'block-job-cancel' via the qemu_blockjobs_cancel
sub.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2020-03-18 08:03:44 +01:00
Fabian Ebner
0ad295f9fb Consistently use format determined in 'PVE::Storage::foreach_volid'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
LGTM-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-03-09 19:36:58 +01:00
Fabian Ebner
5eca0c3643 sync_disks: Always set 'snapshots' for qcow2 and vmdk volumes
This fixes an issue when migrating a VM with an unused volume with format
qcow2 or vmdk. Since 'snapshots' wasn't set, storage_migrate wanted to
export/import with format raw+size instead. Therefore it used (instead of
just 'dd') 'qemu-img convert', which fails when its output leaves through
a pipe. Upon importing, a second error is present, because the format from
the volume ID doesn't match the format of the stream and there is no
conversion yet.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
LGTM-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-03-09 19:36:45 +01:00
Fabian Ebner
e0fd2b2f84 Create Drive.pm and move drive-related code there
The initialization for the drive keys in $confdesc is changed
to be a single for-loop iterating over the keys of $drivedesc_hash and
the initialization of the unusedN keys is move to directly below it.

To avoid the need to change all the call sites, functions with more than
a few callers are exported from the submodule and imported into QemuServer.pm.

For callers of the now imported functions within QemuServer.pm, the prefix
PVE::QemuServer is dropped, because it is unnecessary and now even confusing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-07 18:23:57 +01:00
Stefan Reiter
29eb909ee0 fix #2611: use correct operation in get_bandwidth_limit
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-03 11:47:13 +01:00
Stefan Reiter
485449e37b qmp: use migrate-set-parameters in favor of deprecated options
migrate_set_downtime, migrate_set_speed and migrate-set-cachesize have
all been deprecated since 2.8 or 2.11 [0]. They still work, but no
reason not to use the correct version.

Note that the downtime-limit parameter switched from seconds to
milliseconds, so convert to that. Slightly improve log output with units
while at it.

[0] https://qemu.weilnetz.de/doc/qemu-doc.html#Deprecated-features

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-02-06 13:50:33 +01:00
Fabian Grünbichler
683ab65491 migrate: re-order lines to improve readability
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-02-05 09:43:09 +01:00
Fabian Ebner
1764fa05d0 Extract volume ID before calling 'parse_volume_id'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-02-05 08:41:05 +01:00
Fabian Ebner
8b02e56870 rename 'volid' to 'drivestr' where it's not only a volume ID
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-02-05 08:41:05 +01:00
Fabian Ebner
c96173968a Remove unused 'sharedvm' variable
AFAICT this one hasn't been in use since commit
'4530494bf9f3d45c4a405c53ef3688e641f6bd8e'

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-01-09 17:43:51 +01:00
Stefan Reiter
8bf30c2a72 fix #2493: show QEMU errors in migration log
QEMU usually only prints warnings and errors and stays silent otherwise,
so it makes sense to just log all of it's output.

Prefix it with '[<target_hostname>]' to indicate that the output is
coming from the remote node, so users know where to search for the
error.

Side effect is that the 'VM start' task created by the migration will
now show the "QEMU:" prefix, but it's still very readable IMHO.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-12 13:36:19 +01:00
Stefan Reiter
6e0216d862 hide long commandline on vm_start/migrate failure
By default run_command prints the entire commandline executed when an
error occurs, but QEMU and our migrate command are not only
uninteresting to the user[*] but also annoyingly long. Hide them and only
print the exit code.

[*] Especially our migrate command, since it can't be manually executed
anyway. QEMU's commandline *might* contain something interesting, but is
so long that it's tricky to parse anyway, any a user can always call 'qm
showcmd --pretty'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-12 13:35:40 +01:00
Stefan Reiter
68b108ee3a update disk size before local disk migration
Split out 'update_disksize' from the renamed 'update_disk_config' to
allow code reuse in QemuMigrate.

Remove dots after messages to keep style consistent for migration log.

After updating in sync_disks (phase1) of migration, write out updated
config. This means that even if migration fails or is aborted in later
stages, we keep the fixed config - this is not an issue, as it would
have been fixed on the next attempt anyway, and it can't hurt to have
the correct size instead of a wrong one either way.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-11 10:42:56 +01:00
Stefan Reiter
71c58bb7ed remove $vmid param from print_drive
It isn't used in the sub, but suggest it is needed. No users outside
qemu-server found.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-09 11:44:13 +01:00
Thomas Lamprecht
dad06e2068 refactor storage whitelist in sync_disks to regex
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-12-04 18:40:03 +01:00
Thomas Lamprecht
40a572f7e8 migrate phase 3 cleanup: add error into error propagation message
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-30 17:27:14 +01:00
Stefan Reiter
3392d6cacf refactor: extract QEMU machine related helpers to package
...PVE::QemuServer::Machine.

qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
0a13e08ec2 refactor: create QemuServer::Monitor for high-level QMP access
QMP and monitor helpers are moved from QemuServer.pm.

By using only vm_running_locally instead of check_running, a cyclic
dependency to QemuConfig is avoided. This also means that the $nocheck
parameter serves no more purpose, and has thus been removed along with
vm_mon_cmd_nocheck.

Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existance inserted on the
callsite.

Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd

mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Thomas Lamprecht
e85d01f282 migration: fix false-positive log for copying local images
Only log that if we actually have local disks.
Add also an explicit log for replication.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-20 16:01:35 +01:00
Fabian Ebner
9270672e67 fix typo in migration cleanup error message
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-28 11:30:10 +01:00
Mira Limbeck
9860fe4ef9 close #2263: die on live migration with local cloudinit disk
Live migration with a local cloudinit disk was never intended to work. It did
however work to an extent that the migration completed but the disk on the
source node could not be deleted. Now die if a live migration is started with
a local cloudinit disk.

With the GUI changes live migration is already disabled as it recognizes the
cloudinit disk as a local resource.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-08-26 12:13:07 +02:00
Dominik Csapak
ccab68c22c fix remote viewer live migration
for some reason not setting port results in a port of '65535' which
triggers an execption in http-server anyevent, so we set the port to 0

also, we have to read the ticket from stdin even for 'unix' type secure
migration

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-08-20 11:49:24 +02:00
Tim Marx
ca6abacf6b migrate: log which local resource causes error
Signed-off-by: Tim Marx <t.marx@proxmox.com>
2019-05-07 10:22:12 +00:00
Tim Marx
370b05e719 whitespace cleanup
Signed-off-by: Tim Marx <t.marx@proxmox.com>
2019-05-07 10:22:12 +00:00
Stoiko Ivanov
d189e5901b bwlimit: add parameter for QemuMigrate::phase2
used for online local disks via qemu_drive_mirror

Add TODO comment for offline disks, as clone_disk calls `qemu-img
convert`, which does not have a bandwidth limit parameter.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 11:00:28 +02:00
Stoiko Ivanov
15a37695b6 bwlimit: add parameter to QemuMigrate::sync_disks
used for offline migration of local volumes

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 10:58:35 +02:00
Stoiko Ivanov
ddd664d739 bwlimit: honor bwlimit for migrate qmp call
The 'migrate_speed' can be set in the VM config. Additionally the 'migrate'
bwlimit from datacenter.cfg (storage-specific limits play no role for
memory+state migration) or the parameter provided to the API call can restrict
the speed. Take the lower of the two.

This patch also refactors the setting of migrate_speed and comments for clarity.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 10:34:40 +02:00
Mira Limbeck
9e93a63fe4 fix #2100: ignore cloudinit drive on offline migration
disk is not copied to the target node but still deleted on cleanup
(phase3_cleanup).

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-29 18:11:33 +01:00
Thomas Lamprecht
769f187df5 followup whitespace fixes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-02-20 07:34:10 +01:00
Alexandre Derumier
f3a483b682 QemuMigrate : cleanup identation 2019-02-20 07:32:23 +01:00
Thomas Lamprecht
c7789f54ad migrate: fix local disk migration with online VMs
commit 4530494bf9 introduced an
regression with local disk migrations if the VM is online and thus
needs to live migrated and no target storage was passed as parameter.

We made the hack to write "1" to the targetstorage option in this
case obsolete, but it was still used on deciding if there are any
drives to mirror at all. Here it is enough to check if there are any
'online_local_volumes' because that hash gets only filled if we can
and are told to live mirror local disk on migrations anyway. Also,
we abort early if local disks are found and the 'with-local-disks'
option is not set.

This was reported at:
https://forum.proxmox.com/threads/livemigration-with-localdisk-doesnt-coppy-and-data-from-the-hdds-anymore.50744/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-01-17 10:58:50 +01:00
Wolfgang Bumiller
8c58b12d0d cleanup: use a local $override_targetsid variable
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Thomas Lamprecht
4530494bf9 fix local disk migration when no target storage is set
the check for targetstorage in:
if ($self->{running} && $self->{opts}->{targetstorage} && $local_volumes->{$volid}->{ref} eq 'config') {

was obsolete, as we always set the tragetstorage opts variable to '1'
in a broader "use same sid for remote local" check above.
So removing it leads to the same if truthtable but fixes the
check if we should fallback to the volume's SID if targetstorage is
not set, as else it seemed to be always set, and '1' is naturally not
a correct stroage ID.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Alexandre Derumier
d0c671823d fix #1013 : migrate : sync_disk : --targetstorage with offline disk
targetsid was not used, for disk unused (offline copy)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Stoiko Ivanov
ca6621315e Fix #1242 : clone_disk : call qga fstrim after clone
Some storage like rbd or lvm can't keep thin-provising after a qemu-mirror.

Call qga guest-fstrim if qga is available and fstrim_cloned_disks is enabled
after move_disk and migrate.

Co-Authored-By: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2018-08-02 11:35:50 +02:00
Alexandre Derumier
50d8dd5dc7 migrate cache-size : power of 2
qemu 2.11 need a power of 2 cache size.

"
Parameter 'xbzrle_cache_size' expects is invalid,
it should be bigger than target page size and a power of two
"

roundup to near power of 2 value
2018-02-22 16:27:48 +01:00
Herman van Rink
d108cb1eb2 migrate: task log: fix typo
Signed-off-by: Herman van Rink <rink@initfour.nl>
2018-02-22 14:50:00 +01:00
Chris Hofstaedtler
ec82e3eee4 fix #1569: add shared flag to disks
With shared=1, (live) migration ignores the disk and assumes it is
present on all target nodes. This works similar to shared=1 on LXC
mountpoints.

Signed-off-by: Chris Hofstaedtler <chris.hofstaedtler@deduktiva.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-02-15 15:19:29 +01:00
Alexandre Derumier
d296ed08d3 migration : enable mtunnel for insecure migration V2
We only use it to send commands faster like resume

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-09-12 14:15:33 +02:00
Emmanuel Kasper
c2327320a8 Remove unused variable declaration 2017-09-07 11:22:32 +02:00
Emmanuel Kasper
46dd42f70c Fix #1441: Do not unplug controllers when the mirroring is finished
This should not be needed since we call 'block-job-complete' before
in qemu_drive_mirror_monitor(), and after benchmarking it does not
appear to be needed nor provide a measurable improvement when shutting
down the source.
2017-09-07 11:22:32 +02:00
Fabian Grünbichler
4305207d61 migrate: reduce polling intervals
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
4bdd20ab14 migrate: keep track of replication
and only transfer state and switch direction if there
actually are any replicated volumes.

once we add support for live-migration with replicated
volumes, adding a set-replication-state command to the
tunnel and using that probably makes sense.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
2e7fee87df migrate: finish tunnel in phase 3
after resuming the VM over the tunnel.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
1d5aaa1db5 qm mtunnel/migrate: add resume VMID command
and reformat the legacy SSH variant for readability.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00