Commit Graph

254 Commits

Author SHA1 Message Date
Fabian Ebner
1764fa05d0 Extract volume ID before calling 'parse_volume_id'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-02-05 08:41:05 +01:00
Fabian Ebner
8b02e56870 rename 'volid' to 'drivestr' where it's not only a volume ID
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-02-05 08:41:05 +01:00
Fabian Ebner
c96173968a Remove unused 'sharedvm' variable
AFAICT this one hasn't been in use since commit
'4530494bf9f3d45c4a405c53ef3688e641f6bd8e'

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-01-09 17:43:51 +01:00
Stefan Reiter
8bf30c2a72 fix #2493: show QEMU errors in migration log
QEMU usually only prints warnings and errors and stays silent otherwise,
so it makes sense to just log all of it's output.

Prefix it with '[<target_hostname>]' to indicate that the output is
coming from the remote node, so users know where to search for the
error.

Side effect is that the 'VM start' task created by the migration will
now show the "QEMU:" prefix, but it's still very readable IMHO.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-12 13:36:19 +01:00
Stefan Reiter
6e0216d862 hide long commandline on vm_start/migrate failure
By default run_command prints the entire commandline executed when an
error occurs, but QEMU and our migrate command are not only
uninteresting to the user[*] but also annoyingly long. Hide them and only
print the exit code.

[*] Especially our migrate command, since it can't be manually executed
anyway. QEMU's commandline *might* contain something interesting, but is
so long that it's tricky to parse anyway, any a user can always call 'qm
showcmd --pretty'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-12 13:35:40 +01:00
Stefan Reiter
68b108ee3a update disk size before local disk migration
Split out 'update_disksize' from the renamed 'update_disk_config' to
allow code reuse in QemuMigrate.

Remove dots after messages to keep style consistent for migration log.

After updating in sync_disks (phase1) of migration, write out updated
config. This means that even if migration fails or is aborted in later
stages, we keep the fixed config - this is not an issue, as it would
have been fixed on the next attempt anyway, and it can't hurt to have
the correct size instead of a wrong one either way.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-11 10:42:56 +01:00
Stefan Reiter
71c58bb7ed remove $vmid param from print_drive
It isn't used in the sub, but suggest it is needed. No users outside
qemu-server found.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-12-09 11:44:13 +01:00
Thomas Lamprecht
dad06e2068 refactor storage whitelist in sync_disks to regex
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-12-04 18:40:03 +01:00
Thomas Lamprecht
40a572f7e8 migrate phase 3 cleanup: add error into error propagation message
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-30 17:27:14 +01:00
Stefan Reiter
3392d6cacf refactor: extract QEMU machine related helpers to package
...PVE::QemuServer::Machine.

qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
0a13e08ec2 refactor: create QemuServer::Monitor for high-level QMP access
QMP and monitor helpers are moved from QemuServer.pm.

By using only vm_running_locally instead of check_running, a cyclic
dependency to QemuConfig is avoided. This also means that the $nocheck
parameter serves no more purpose, and has thus been removed along with
vm_mon_cmd_nocheck.

Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existance inserted on the
callsite.

Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd

mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Thomas Lamprecht
e85d01f282 migration: fix false-positive log for copying local images
Only log that if we actually have local disks.
Add also an explicit log for replication.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-20 16:01:35 +01:00
Fabian Ebner
9270672e67 fix typo in migration cleanup error message
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-28 11:30:10 +01:00
Mira Limbeck
9860fe4ef9 close #2263: die on live migration with local cloudinit disk
Live migration with a local cloudinit disk was never intended to work. It did
however work to an extent that the migration completed but the disk on the
source node could not be deleted. Now die if a live migration is started with
a local cloudinit disk.

With the GUI changes live migration is already disabled as it recognizes the
cloudinit disk as a local resource.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-08-26 12:13:07 +02:00
Dominik Csapak
ccab68c22c fix remote viewer live migration
for some reason not setting port results in a port of '65535' which
triggers an execption in http-server anyevent, so we set the port to 0

also, we have to read the ticket from stdin even for 'unix' type secure
migration

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-08-20 11:49:24 +02:00
Tim Marx
ca6abacf6b migrate: log which local resource causes error
Signed-off-by: Tim Marx <t.marx@proxmox.com>
2019-05-07 10:22:12 +00:00
Tim Marx
370b05e719 whitespace cleanup
Signed-off-by: Tim Marx <t.marx@proxmox.com>
2019-05-07 10:22:12 +00:00
Stoiko Ivanov
d189e5901b bwlimit: add parameter for QemuMigrate::phase2
used for online local disks via qemu_drive_mirror

Add TODO comment for offline disks, as clone_disk calls `qemu-img
convert`, which does not have a bandwidth limit parameter.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 11:00:28 +02:00
Stoiko Ivanov
15a37695b6 bwlimit: add parameter to QemuMigrate::sync_disks
used for offline migration of local volumes

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 10:58:35 +02:00
Stoiko Ivanov
ddd664d739 bwlimit: honor bwlimit for migrate qmp call
The 'migrate_speed' can be set in the VM config. Additionally the 'migrate'
bwlimit from datacenter.cfg (storage-specific limits play no role for
memory+state migration) or the parameter provided to the API call can restrict
the speed. Take the lower of the two.

This patch also refactors the setting of migrate_speed and comments for clarity.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2019-04-02 10:34:40 +02:00
Mira Limbeck
9e93a63fe4 fix #2100: ignore cloudinit drive on offline migration
disk is not copied to the target node but still deleted on cleanup
(phase3_cleanup).

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-29 18:11:33 +01:00
Thomas Lamprecht
769f187df5 followup whitespace fixes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-02-20 07:34:10 +01:00
Alexandre Derumier
f3a483b682 QemuMigrate : cleanup identation 2019-02-20 07:32:23 +01:00
Thomas Lamprecht
c7789f54ad migrate: fix local disk migration with online VMs
commit 4530494bf9 introduced an
regression with local disk migrations if the VM is online and thus
needs to live migrated and no target storage was passed as parameter.

We made the hack to write "1" to the targetstorage option in this
case obsolete, but it was still used on deciding if there are any
drives to mirror at all. Here it is enough to check if there are any
'online_local_volumes' because that hash gets only filled if we can
and are told to live mirror local disk on migrations anyway. Also,
we abort early if local disks are found and the 'with-local-disks'
option is not set.

This was reported at:
https://forum.proxmox.com/threads/livemigration-with-localdisk-doesnt-coppy-and-data-from-the-hdds-anymore.50744/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-01-17 10:58:50 +01:00
Wolfgang Bumiller
8c58b12d0d cleanup: use a local $override_targetsid variable
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Thomas Lamprecht
4530494bf9 fix local disk migration when no target storage is set
the check for targetstorage in:
if ($self->{running} && $self->{opts}->{targetstorage} && $local_volumes->{$volid}->{ref} eq 'config') {

was obsolete, as we always set the tragetstorage opts variable to '1'
in a broader "use same sid for remote local" check above.
So removing it leads to the same if truthtable but fixes the
check if we should fallback to the volume's SID if targetstorage is
not set, as else it seemed to be always set, and '1' is naturally not
a correct stroage ID.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Alexandre Derumier
d0c671823d fix #1013 : migrate : sync_disk : --targetstorage with offline disk
targetsid was not used, for disk unused (offline copy)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-12-20 10:11:32 +01:00
Stoiko Ivanov
ca6621315e Fix #1242 : clone_disk : call qga fstrim after clone
Some storage like rbd or lvm can't keep thin-provising after a qemu-mirror.

Call qga guest-fstrim if qga is available and fstrim_cloned_disks is enabled
after move_disk and migrate.

Co-Authored-By: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2018-08-02 11:35:50 +02:00
Alexandre Derumier
50d8dd5dc7 migrate cache-size : power of 2
qemu 2.11 need a power of 2 cache size.

"
Parameter 'xbzrle_cache_size' expects is invalid,
it should be bigger than target page size and a power of two
"

roundup to near power of 2 value
2018-02-22 16:27:48 +01:00
Herman van Rink
d108cb1eb2 migrate: task log: fix typo
Signed-off-by: Herman van Rink <rink@initfour.nl>
2018-02-22 14:50:00 +01:00
Chris Hofstaedtler
ec82e3eee4 fix #1569: add shared flag to disks
With shared=1, (live) migration ignores the disk and assumes it is
present on all target nodes. This works similar to shared=1 on LXC
mountpoints.

Signed-off-by: Chris Hofstaedtler <chris.hofstaedtler@deduktiva.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-02-15 15:19:29 +01:00
Alexandre Derumier
d296ed08d3 migration : enable mtunnel for insecure migration V2
We only use it to send commands faster like resume

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-09-12 14:15:33 +02:00
Emmanuel Kasper
c2327320a8 Remove unused variable declaration 2017-09-07 11:22:32 +02:00
Emmanuel Kasper
46dd42f70c Fix #1441: Do not unplug controllers when the mirroring is finished
This should not be needed since we call 'block-job-complete' before
in qemu_drive_mirror_monitor(), and after benchmarking it does not
appear to be needed nor provide a measurable improvement when shutting
down the source.
2017-09-07 11:22:32 +02:00
Fabian Grünbichler
4305207d61 migrate: reduce polling intervals
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
4bdd20ab14 migrate: keep track of replication
and only transfer state and switch direction if there
actually are any replicated volumes.

once we add support for live-migration with replicated
volumes, adding a set-replication-state command to the
tunnel and using that probably makes sense.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
2e7fee87df migrate: finish tunnel in phase 3
after resuming the VM over the tunnel.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
1d5aaa1db5 qm mtunnel/migrate: add resume VMID command
and reformat the legacy SSH variant for readability.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
bcb51ae8f9 mtunnel: add and handle OK/ERR replies
because we want commands to return meaningful errors, and
print them on the client/source side.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
58cbe63901 migrate: read mtunnel version
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
e0eb1f7677 migrate: refactor mtunnel read/write
to make adding new commands and reading replies easier

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Fabian Grünbichler
d7b1b24b6f migrate: switch back to qm mtunnel
to allow adding guest specific commands to the tunnel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-08-07 09:23:56 +02:00
Emmanuel Kasper
171ed95c76 Use default values when memory is not set in vm.conf when migrating
This fixes a "Use of uninitialized value in multiplication (*) "
warning when doing a migration
2017-07-03 14:37:00 +02:00
Thomas Lamprecht
da18cc9300 migrate: use 'mtunnel' from pvecm
qm mtunnel was deemed as deprecated but still in use here.
Switch over to pvecm's mtunnel to allow removing the qm variant in
PVE 5.1

Also use an absolute path so we do not depended on the targets
environment variables

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-06-23 11:00:52 +02:00
Wolfgang Bumiller
6f58fce9ee migrate: pass the with_snapshots parameter 2017-06-22 12:58:14 +02:00
Dietmar Maurer
54d10ab121 PVE::QemuMigrate. do not use JSON - not required here 2017-06-22 08:51:38 +02:00
Dietmar Maurer
d652f7b1ab PVE/QemuMigrate.pm: use new replication job helpers from AbstractMigrate 2017-06-21 12:27:45 +02:00
Dietmar Maurer
f6a17ff5e3 Change target in replication-state when replication direction is switched 2017-06-21 10:59:45 +02:00
Dietmar Maurer
dbc9420b0b PVE/QemuMigrate.pm: use replication job, transfer replication state 2017-06-20 12:42:51 +02:00
Dietmar Maurer
5009a8c755 PVE/QemuMigrate.pm: fix syntax errors 2017-06-13 11:56:26 +02:00
Dietmar Maurer
aee6abe5ba PVE/QemuMigrate.pm - use PVE::QemuServer::foreach_volid 2017-06-13 11:26:47 +02:00
Wolfgang Bumiller
ba5acf88a1 migrate: migration_type setting moved to pve-guest-common 2017-06-09 12:28:28 +02:00
Wolfgang Bumiller
f1c2a53aee migration: implement insecure offline migration 2017-06-01 10:50:28 +02:00
Wolfgang Bumiller
7126e1c9bb migrate: pass ssh_info to storage_migrate 2017-05-23 09:57:17 +02:00
Dietmar Maurer
46883f80f6 Revert "Integrate replica in the qemu migration."
This reverts commit 63d02c7074.

The commit changes the configuration before the VM is actually
migrated, so it is possible to have a wrong configuration when
migration fails for some reason. Also, I am quite unsure if
this automatic target change is really wanted. The patch also
contains wrong refereces to $self->{opts}->{node}.
2017-05-06 10:39:43 +02:00
Dietmar Maurer
b1c12185fb Revert "migrate: cleanup replica volume skip condition"
This reverts commit 6e8044dcea.
2017-05-06 10:38:06 +02:00
Wolfgang Bumiller
6e8044dcea migrate: cleanup replica volume skip condition 2017-04-28 10:34:46 +02:00
Wolfgang Link
63d02c7074 Integrate replica in the qemu migration.
Now it is possible to migrate a VM offline when replica is enabled.
It will reduce replication to an minimal amount.
2017-04-28 10:11:33 +02:00
Alexandre Derumier
d80ad67f9d live storage migration : fix check of target storage availability
if we define a different target storeid for remote node,
and that storage is not available on source node

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-04-21 12:05:36 +02:00
Fabian Grünbichler
877e2ea746 migrate: clarify comment
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
Fabian Grünbichler
28412ae488 migrate: cleanup nbd source disks earlier
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
Fabian Grünbichler
504105c638 fix #1338: migrate: stop nbd before resuming
since Qemu 2.9, block device write access is limited to one
writer unless shared_rw is set to true. there is an
exception for live-migrating local disks via NBD as long as
the VM is suspended.

stop the NBD server before resuming the VM accordingly to
unbreak local disk live-migration.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
Fabian Grünbichler
8b54f4b8db defined() style cleanup 2017-02-28 12:46:47 +01:00
Wolfgang Link
9045f57a27 Check array existed before use.
This triggers if a qemu guest has a local unused disk.
The disk will migrate by offline disk migration, so it is not in the target_drives.
2017-02-28 12:33:27 +01:00
Alexandre Derumier
56af714629 add with-local-disks option for live storage migration
As Fabian as required,
add an extra flag "with-local-disks"  to enable live storage migration with localdisk.

default target storage is same sid than source, this can be overrided with
"targetstorage" option.

I will try improve this later, with optionnal mapping, disk by disk.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-01-06 12:10:25 +01:00
Wolfgang Bumiller
bd2d5fe6ff cleanup: error messages 2017-01-05 10:03:16 +01:00
Wolfgang Bumiller
3b4cf0f0fc cleanup: whitespaces & style 2017-01-05 10:03:10 +01:00
Alexandre Derumier
b74cad8ae3 add live storage migration with vm migration
This allow to migrate disks on local storage  to a remote node storage.

When the target node start, a new volumes are created and exposed through qemu embedded nbd server.

qemu drive-mirror is launch on source vm for each disk with nbd server as target.

when drive-mirror reach 100% of 1 disk, we don't complete the block jobs and begin mirror of next disk.
(mirroring are parralel, but we try to mirroring them 1 by 1 to avoid storage && network overload)

Then we live migrate the vm to destination node. (drive-mirror still occur at the same time).

We the vm is livemigrate (source vm paused, target vm pause), we complete the block jobs mirror.

When is done we stop the source vm and resume the target vm

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-01-05 09:09:46 +01:00
Dominik Csapak
b3205b153e allow migration of local qcow2 snapshots
we can migrate local snapshots when on zfs or dir storage with qcow2,
but the check was incorrect

we checked for if (zfs && !qcow2) instead of  if (zfs || qcow2)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-12-05 12:32:50 +01:00
Thomas Lamprecht
2de2d6f74e allow dedicated migration network, bug #1177
Without this patch we use the network were the cluster traffic runs
for sending migration traffic. This is not ideal as it may hinder
cluster traffic. Further some users have a powerful network which
would be perfect for migrations, with this patch they can run the
migration traffic over such a network without having the corosync
traffic on the same network.

The network is configurable through /etc/pve/datacenter.cfg which
got a new property, namely migration. migration has two
subproperties: type (replaces the old migration_unsecure property)
and network.

For the case of a network failure or that a VM has to be moved over
another network for arbitrary other reasons I added the
migration_type and migration_network parameters to qm migrate (and
respectively vm_start as this gets used on migration).
They allow overwriting the datacenter.cfg settings.

Fixes bug #1177

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-11-03 09:51:23 +01:00
Fabian Grünbichler
3a7bc9e252 forbid migration of template with local base image 2016-09-15 14:15:09 +02:00
Fabian Grünbichler
5bf7f0f1a8 collect errors from all local volumes
and then die with more meaningful output, instead of on the
first encountered error.
2016-06-30 11:55:21 +02:00
Fabian Grünbichler
dabf24736c add comments and rename volhash 2016-06-30 11:55:21 +02:00
Fabian Grünbichler
4abdd867df switch order of disk checks
to make log message more meaningful.
'storage' < 'snapshot' < 'config'
2016-06-30 11:55:21 +02:00
Fabian Grünbichler
d62fcf74a7 collect and log origin of found local volumes
just knowing that local disks prevent a migration is not
very helpful, so be a bit more verbose here.
2016-06-30 11:55:21 +02:00
Fabian Grünbichler
2a2127bd6d drop unncessary cdromhash 2016-06-17 16:28:07 +02:00
Fabian Grünbichler
98d80cb67b use foreach_drive instead of foreach_volid
foreach_volid recurses over snapshots as well, resulting in
lots of repeated checks (especially for VMs with lots of
snapshots and disks).

a potential vmstate volume must be checked explicitly,
because foreach_drive does not care about those.
2016-06-17 16:27:25 +02:00
Fabian Grünbichler
86638cc2dc fix whitespace/indent 2016-06-17 16:24:16 +02:00
Fabian Grünbichler
89719f9887 don't repeat storage check for each volid 2016-06-17 16:23:49 +02:00
Wolfgang Link
b6adff3385 fix perl scope issues
Add parameter array to foreach_volid to use is in the functions.
correct typos.
2016-06-16 11:26:37 +02:00
Dietmar Maurer
3629c19d23 add check for snapshots at migration
We cannot migrate snapshots on local disks, for example lvmthin snapshots.
2016-06-16 10:21:57 +02:00
Wolfgang Link
c4d2d6c15c Add LVM and LVMThin to QemuMigration
Offline migration on LVM and LVMThin are possible offline.
2016-06-16 08:14:33 +02:00
Thomas Lamprecht
e858e9d241 do not open forward tunnel on insecure migrations
Restore previous behaviour and do not request a forward tunnel on
insecure migrations.

For the migrations of all kind this has no direct impact, they all
worked, but an port to much requested from an limited pool is still
not ideal. Also an open tunnel, if not needed.

This is a light regression introduced from commit 1c9d54b.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-06 14:51:34 +02:00
Thomas Lamprecht
54323eed5f migrate: unlink unix socket before starting migration
Just to be sure nobody else has (wrongfully) left that file here.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 16:02:25 +02:00
Thomas Lamprecht
f34d146679 migrate: add some more log output
Output all errors - if any - and add some log outputs on what we qmp
commands we do with which parameters, may be helpful when debugging
or analyzing a users problem.

Also check if the queried status is defined, as on a error this may
not be.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 12:00:50 +02:00
Thomas Lamprecht
92437b8de0 migrate: close tunnel after dest. VM stopped on error
On error let phase2_cleanup close the tunnel as it stops the for
incoming migration waiting VM on the destination first, to be safe.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 12:00:25 +02:00
Thomas Lamprecht
1c9d54bfd0 migrate: use ssh forwarded UNIX socket tunnel
We cannot guarantee when the SSH forward Tunnel really becomes
ready. The check with the mtunnel API call did not help for this
prolem as it only checked that the SSH connection itself works and
that the destination node has quorum but the forwarded tunnel itself
was not checked.

The Forward tunnel is a different channel in the SSH connection,
independent of the SSH `qm mtunnel` channel, so only if that works
it does not guarantees that our migration tunnel is up and ready.

When the node(s) where under load, or when we did parallel
migrations (migrateall), the migrate command was often started
before a tunnel was open and ready to receive data. This led to
a direct abortion of the migration and is the main cause in why
parallel migrations often leave two thirds or more VMs on the
source node.
The issue was tracked down to SSH after debugging the QEMU
process and enabling debug logging showed that the tunnel became
often to late available and ready, or not at all.

Fixing the TCP forward tunnel is quirky and not straight ahead, the
only way SSH gives as a possibility is to use -N (no command)
-f (background) and -o "ExitOnForwardFailure=yes", then it would
wait in the foreground until the tunnel is ready and only then
background itself. This is not quite the nicest way for our special
use case and our code base.
Waiting for the local port to become open and ready (through
/proc/net/tcp[6]] as a proof of concept is not enough, even if the
port is in the listening state and should theoretically accept
connections this still failed often as the tunnel was not yet fully
ready.

Further another problem would still be open if we tried to patch the
SSH Forward method we currently use - which we solve for free with
the approach of this patch - namely the problem that the method
to get an available port (next_migration_port) has a serious race
condition which could lead to multiple use of the same port on a
parallel migration (I observed this on my many test, seldom but if
it happens its really bad).

So lets now use UNIX sockets, which ssh supports since version 5.7.
The end points are UNIX socket bound to the VMID - thus no port so
no race and also no limitation of available ports (we reserved 50 for
migration).

The endpoints get created in /run/qemu-server/VMID.migrate and as
KVM/QEMU in current versions is able to use UNIX socket just as well
as TCP we have not to change much on the interaction with QEMU.
QEMU is started with the migrate_incoming url at the local
destination endpoint and creates the socket file, we then create
a listening socket on the source side and connect over SSH to the
destination.
Now the migration can be started by issuing the migrate qmp command
with an updated uri.

This breaks live migration from new to old, but *not* from old to
new, so there is a upgrade path.
If a live migration from new to old must be made (for whatever
reason), use the unsecure_migration setting (man datacenter.conf)
to allow this, although that should only be done in trusted network.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 11:51:46 +02:00
Thomas Lamprecht
61b04c6d5a migrate: collect migration tunnel child process
use waitpid with WNO_HANG to check if the ssh tunnel child process
is still running and collect at the same time if it exited.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 11:47:13 +02:00
Wolfgang Link
674051dcac fix typo 2016-06-02 09:59:51 +02:00
Fabian Grünbichler
e1fc368d6b fix typos 2016-05-04 10:47:23 +02:00
Fabian Grünbichler
73f5ee92af fix #971: don't activate shared storage in offline migration
instead, just print a warning if the connection check fails.
as long as the storage is online on the target node, the VM
will start fine after migration.
2016-05-04 10:47:15 +02:00
Fabian Grünbichler
29701766ae migrate: check if storage is available 2016-05-04 10:47:04 +02:00
Fabian Grünbichler
ffda963f46 Refactor basic config-related methods
Drop load_config, write_config, lock_config[_xx],
check_lock, check_protection, is_template and config_file
in favour of implementions in PVE::AbstractConfig.

Implement guest_type, __config_max_unused_disks,
config_file_lock and cfs_config_path from
PVE::AbstractConfig in PVE::QemuConfig.
2016-03-08 11:41:59 +01:00
Fabian Grünbichler
8317c759bf Drop skiplock from write_config
Since write_config was always called with skiplock=1 except
once, it makes sense to drop this parameter like in
PVE::LXC::write_config . If needed in the future, the
caller can use check_lock before write_config anyway.
2016-02-12 12:16:57 +01:00
Fabian Grünbichler
63be43a947 Refactor update_config_nolock -> write_config
The method update_config wrapped update_config_nolock
using lock_config, but to prevent update races the whole
"read config", "do something", "write config" flow was
always protected by lock_config anyway, and update_config
was never called.

Thus, we can safely drop update_config and rename
update_config_nolock to write_config like in PVE::LXC .
2016-02-12 12:14:52 +01:00
Wolfgang Link
386c6ba7f5 close tunnel after migration is finish.
if we do not close it, there is a change that the tunnel stays open and the next migration will not work.
2016-02-02 18:16:18 +01:00
Alexandre Derumier
42dbd2ee30 add qemu_machine_pxe
return machinename with .pxe suffix if a nic with pxe romfile exist

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-11-06 10:51:14 +01:00
Alexandre Derumier
7bac824e19 use qom-get to check if pxe file are used V2
fix qemu 2.4 pxe -> qemu 2.4 efi

Changelog : forget to add a check on qom-get result

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-11-06 07:55:07 +01:00
Wolfgang Bumiller
407e0b8bef migration: improve ipv6 case
Qemu parses hostnames in brackets correctly but sets an ipv6
flag for them as if they were ipv6 addresses, only insert
brackets for ipv6 addresses.
2015-11-06 07:53:03 +01:00
Alexandre Derumier
289e0b8564 migrate : add nocheck for resume
Users have reported resume bug when HA is used.

They seem to have a little race (bench show >0s < 1s) between the vm conf file move on source node and replication to,
and resume on target node.

I don't known why this is only with HA, maybe this occur will standard migration too.

Anyway, we don't need to read the vm config file to resume the vm on target host,
as we are sure that the vm is migrated, and config file move action is correct in the cluster.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-10-15 12:41:13 +02:00
Wolfgang Bumiller
2fbd27eabc migration: put the source address in brackets
Always adding brackets around the address works. They're required for
ipv6 and qemu also accepts them for ipv4 and hostnames.
2015-05-21 17:30:30 +02:00
Wolfgang Bumiller
af0eba7e35 pass port family to next_*_port() calls 2015-05-12 12:28:56 +02:00
Wolfgang Link
adf8ac08c8 implement offline migration on zfs
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
2015-04-27 10:42:59 +02:00
Wolfgang Link
37a6dc7809 fix bug #618: correct typo
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
2015-04-27 10:42:49 +02:00
Alexandre Derumier
985a5f483d migration : add setup state
since qemu 1.5, they are a new migration state : "setup"

it's mainly use for rdma migration, but slow vm can it see and hang on migration

http://git.qemu.org/?p=qemu.git;a=commit;h=3b6959506831193f37cc830c8e111b437c0d1380

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2014-06-17 08:57:31 +02:00
Stefan Priebe
2e787b1892 QemuMigrate: print migration xbzrle if enabled (has xbzrlecachesize) for whatever reason (bug qemu, bug pve, ...)
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
2014-02-10 12:29:17 +01:00
Alexandre Derumier
a89fded11f migration : enable auto-converge capability v2
This reduce guest cpu speed if dirtied bytes is 50% more than the approx.amount of bytes that just got transferred since the last time we were in this routine.

qemu commit :
http://git.qemu.org/?p=qemu.git;a=commit;h=bde1e2ec2176c363c1783bf8887b6b1beb08dfee

tested with "stress -m 2 -c 2" under debian

without autoconvergence : downtime 12s - duration 12min
with autoconvergence : downtime 2s - duration 4min

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2014-01-10 13:01:55 +01:00
Dietmar Maurer
dd25eecf62 code cleanup
Use new helper methods.
2013-12-10 10:46:50 +01:00
Alexandre Derumier
fd8469f7de qemu migrate : only wait for spice server online + eval
Currently offline migration fail ,because we are trying to check with qmp the spiceserver status.
This should be done online only.

I also add eval, to avoid migration lock if qmp query fail.

Fix :http://forum.proxmox.com/threads/16093-VM-is-locked-after-offline-migration?p=82852

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2013-09-19 06:28:17 +02:00
Dietmar Maurer
033a0b366d fix migration port (wrong quote) 2013-08-12 09:48:13 +02:00
Stefan Priebe
5bc1e0397e qemu-server: add support for unsecure migration (setting in datacenter.cfg)
This patch adds support for unsecure migration using a direct tcp connection
KVM <=> KVM instead of an extra SSH tunnel. Without ssh the limit is just the
bandwith and no longer the CPU / one single core.

You can enable this by adding:
migration_unsecure: 1
to datacenter.cfg

Examples using qemu 1.4 as migration with qemu 1.3 still does not work for me:

current default with SSH Tunnel VM uses 2GB mem:
Dec 27 21:10:32 starting migration of VM 105 to node 'cloud1-1202' (10.255.0.20)
Dec 27 21:10:32 copying disk images
Dec 27 21:10:32 starting VM 105 on remote node 'cloud1-1202'
Dec 27 21:10:35 starting ssh migration tunnel
Dec 27 21:10:36 starting online/live migration on localhost:60000
Dec 27 21:10:36 migrate_set_speed: 8589934592
Dec 27 21:10:36 migrate_set_downtime: 1
Dec 27 21:10:38 migration status: active (transferred 152481002, remaining 1938546688), total 2156396544) , expected downtime 0
Dec 27 21:10:40 migration status: active (transferred 279836995, remaining 1811140608), total 2156396544) , expected downtime 0
Dec 27 21:10:42 migration status: active (transferred 421265271, remaining 1669840896), total 2156396544) , expected downtime 0
Dec 27 21:10:44 migration status: active (transferred 570987974, remaining 1520152576), total 2156396544) , expected downtime 0
Dec 27 21:10:46 migration status: active (transferred 721469404, remaining 1369939968), total 2156396544) , expected downtime 0
Dec 27 21:10:48 migration status: active (transferred 875595258, remaining 1216057344), total 2156396544) , expected downtime 0
Dec 27 21:10:50 migration status: active (transferred 1034654822, remaining 1056931840), total 2156396544) , expected downtime 0
Dec 27 21:10:54 migration status: active (transferred 1176288424, remaining 915369984), total 2156396544) , expected downtime 0
Dec 27 21:10:56 migration status: active (transferred 1339734759, remaining 752050176), total 2156396544) , expected downtime 0
Dec 27 21:10:58 migration status: active (transferred 1503743261, remaining 588206080), total 2156396544) , expected downtime 0
Dec 27 21:11:02 migration status: active (transferred 1645097827, remaining 446906368), total 2156396544) , expected downtime 0
Dec 27 21:11:04 migration status: active (transferred 1810562934, remaining 281751552), total 2156396544) , expected downtime 0
Dec 27 21:11:06 migration status: active (transferred 1964377505, remaining 126033920), total 2156396544) , expected downtime 0
Dec 27 21:11:08 migration status: active (transferred 2077930417, remaining 0), total 2156396544) , expected downtime 0
Dec 27 21:11:09 migration speed: 62.06 MB/s - downtime 37 ms
Dec 27 21:11:09 migration status: completed
Dec 27 21:11:13 migration finished successfuly (duration 00:00:41)
TASK OK

with unsecure migration without SSH Tunnel:
Dec 27 22:43:14 starting migration of VM 105 to node 'cloud1-1203' (10.255.0.22)
Dec 27 22:43:14 copying disk images
Dec 27 22:43:14 starting VM 105 on remote node 'cloud1-1203'
Dec 27 22:43:17 starting online/live migration on 10.255.0.22:60000
Dec 27 22:43:17 migrate_set_speed: 8589934592
Dec 27 22:43:17 migrate_set_downtime: 1
Dec 27 22:43:19 migration speed: 1024.00 MB/s - downtime 1100 ms
Dec 27 22:43:19 migration status: completed
Dec 27 22:43:22 migration finished successfuly (duration 00:00:09)
TASK OK
2013-07-26 11:23:49 +02:00
Dietmar Maurer
7c14dcae1f use STDIN to pass spice ticket 2013-07-24 12:19:51 +02:00
Dietmar Maurer
86b8228b59 new vga_conf_has_spice() helper
code cleanups
2013-07-24 12:01:03 +02:00
Alexandre Derumier
95a4b4a98b add spice migration
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2013-07-24 10:54:20 +02:00
Dietmar Maurer
42668529e6 migrate: pass --machine parameter to remote 'qm start' command 2013-06-05 10:24:39 +02:00
Dietmar Maurer
f9a971e0ee fix bug #381: use PVE::Tools::next_migrate_port() 2013-05-13 07:30:50 +02:00
Dietmar Maurer
b7b1ac9d04 fix check if a backing file exist 2013-02-28 06:36:46 +01:00
Alexandre Derumier
d5f315fda5 migration : display qm resume error in task log
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2013-02-25 06:15:45 +01:00
Alexandre Derumier
d560409207 forbid offline migration of a non shared volume if it's a clone
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2013-02-15 07:52:53 +01:00
Dietmar Maurer
0302101cf1 remove expected_downtime from migration status 2013-02-13 10:47:54 +01:00
Stefan Priebe
19168b91ae QemuMigrate: phase2_cleanup misses migrate_cancel
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
2013-01-02 06:36:53 +01:00
Stefan Priebe
865ef13278 implement dynamic migration_downtime
changelog:

- increment counter also if remaining memory equal 0 (qemu 1.4 migration code)
- only increment coutner and set down_time if memory transfert have occured. (to avoid too fast downtime increment)

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-12-31 07:20:56 +01:00
Alexandre Derumier
135007c099 add downtime && expected_downtime query-migrate info
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-12-27 12:45:56 +01:00
Alexandre Derumier
ab399b7c5d add error log for qm start of the target vm.
Can be usefull to see what's wrong if target vm doesn't start (missing storage, missing bridge,...)

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-12-27 12:44:34 +01:00
Alexandre Derumier
3beb415bd7 move qmp migrate_set_down && migrate_set_speed to qemumigrate
so we can set the values when the vm is running
also use int() to get json working

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-12-27 12:43:39 +01:00
Dietmar Maurer
d5769dc253 migrate volumes used inside snapshots including vmstate
Introduce new helper function foreach_volid()
2012-09-25 08:09:50 +02:00
Dietmar Maurer
a06c7f7ec4 fix check for non-shared disks 2012-09-25 07:26:34 +02:00
Dietmar Maurer
972511a06a migrate: disable xbzrle for now.
This is not stable, and sometimes cause endless migration (migration never stops).
2012-08-31 11:02:47 +02:00
Dietmar Maurer
94235c592c avoid warning about uninitialized value 2012-08-30 12:15:07 +02:00
Dietmar Maurer
b0b756c14d migrate: tolerate query-migrate errors 2012-08-30 09:28:24 +02:00
Alexandre Derumier
e18b0b9964 livemigrate : activate xbzrle cache
This help migrate for vm with of lot of memory access (like database)

live migration tests working:

kvm 1.2 -> kvm 1.2  (xbzrle set on both side)
kvm 1.1 -> kvm 1.2 (xbzrle on target)
kvm 1.1 -> kvm 1.1 (xbzrle not set, qmp command try to set xbzrle but fail)

failing migration

kvm 1.2 -> kvm 1.1 fail, but this is expected.

I tested with a memory benchmark running on the vm with 4GB ram

without xbzrle : migration take 10min, with many network hang
with xbzrle : migration take 1min, no hang

I display xbzrle counters for debug purpose, we can remove them later

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-29 07:55:21 +02:00
Dietmar Maurer
af30308f36 we call vm_stop to target host,
to be sure that kvm process is killed (but it should kill itself),
and deactivate volumes

I slightly modified this patch (orig. from Alexandre) so that it apply cleanly.
2012-08-23 10:28:41 +02:00
Alexandre Derumier
e52bd94c7e live migration: reduce sleep when remaining memory is low
Reduce sleep to 0.3s when remaining memory is lower than the average transfert in 1 iteration.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-23 07:37:59 +02:00
Dietmar Maurer
f5eb281ad3 cleanup: detete trailing whitespace 2012-08-23 07:36:48 +02:00
Alexandre Derumier
b67900f17a put target vm in singlestep mode and resume it only when config is moved
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-23 07:32:21 +02:00
Alexandre Derumier
7e8dcf2cb0 add migratedfrom param to start vm with conf file an another node
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-23 07:18:36 +02:00
Alexandre Derumier
c04b5b04de implement phase2_cleanup
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-23 07:17:15 +02:00
Alexandre Derumier
b8d208023b move config file in phase3, when live migration is finished
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-08-23 07:16:45 +02:00
Dietmar Maurer
373ea5798a migrate: only scan available storages 2012-07-16 10:20:36 +02:00
Dietmar Maurer
522c8f97d7 code cleanup, bump version to 2.0-44 2012-07-16 07:00:28 +02:00
Alexandre Derumier
80b2cbd1b9 migrate: syncdisk : avoid scanning shared storage
Currently we get list from PVE::Storage (for unused volumes), from all storage.
If something goes wrong with the network on host and thenwe can't communicate with a network shared storage(sheepdog,rbd,..),
the vdisk_list die (timeout) and we cannot migrate the vm on another kvm host.(online or offline).

We don't need to scan shared storage, as they are no disk to sync.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-07-16 06:52:35 +02:00
Dietmar Maurer
a05b47a8a8 migrate: fix warning about uninitialized values
And display acurate byte values instead of KB
2012-07-13 12:37:19 +02:00
Alexandre Derumier
5a7835f572 convert migrate monitor commands to qmp
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2012-06-26 06:38:34 +02:00
Dietmar Maurer
1858638fe3 replace change_config_nolock with update_config_nolock
We now use cfs_file_write() in order to avoid race conditions between
file IO and cfs operations (read after write works now).
2012-02-02 14:18:41 +01:00
Dietmar Maurer
97439670bc online migration fix: close tunnel later, wait for connection close 2012-01-17 11:25:44 +01:00
Dietmar Maurer
17eed025b3 use PVE::Tools::run_with_timeout 2011-12-15 11:29:01 +01:00
Dietmar Maurer
d68afb26bf improve error message 2011-12-08 10:07:19 +01:00
Dietmar Maurer
72afda82a1 fix migration tunnel 2011-12-08 09:32:09 +01:00
Dietmar Maurer
46a84fd400 replace logmsg() with $self->log() 2011-12-07 11:25:20 +01:00
Dietmar Maurer
16e903f2dc use new AbstractMigrate.pm 2011-12-07 06:36:20 +01:00