qemu-server

mirror of https://git.proxmox.com/git/qemu-server synced 2025-10-24 08:51:24 +00:00

Author	SHA1	Message	Date
Fabian Ebner	1764fa05d0	Extract volume ID before calling 'parse_volume_id' Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>	2020-02-05 08:41:05 +01:00
Fabian Ebner	8b02e56870	rename 'volid' to 'drivestr' where it's not only a volume ID Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>	2020-02-05 08:41:05 +01:00
Fabian Ebner	c96173968a	Remove unused 'sharedvm' variable AFAICT this one hasn't been in use since commit '4530494bf9f3d45c4a405c53ef3688e641f6bd8e' Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>	2020-01-09 17:43:51 +01:00
Stefan Reiter	8bf30c2a72	fix #2493: show QEMU errors in migration log QEMU usually only prints warnings and errors and stays silent otherwise, so it makes sense to just log all of its output. Prefix it with '[<target_hostname>]' to indicate that the output is coming from the remote node, so users know where to search for the error. A side effect is that the 'VM start' task created by the migration will now show the "QEMU:" prefix, but it's still very readable IMHO. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-12-12 13:36:19 +01:00
Stefan Reiter	6e0216d862	hide long commandline on vm_start/migrate failure By default run_command prints the entire commandline executed when an error occurs, but QEMU and our migrate command are not only uninteresting to the user[*] but also annoyingly long. Hide them and only print the exit code. [*] Especially our migrate command, since it can't be manually executed anyway. QEMU's commandline might contain something interesting, but is so long that it's tricky to parse anyway, and a user can always call 'qm showcmd --pretty'. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-12-12 13:35:40 +01:00
Stefan Reiter	68b108ee3a	update disk size before local disk migration Split out 'update_disksize' from the renamed 'update_disk_config' to allow code reuse in QemuMigrate. Remove dots after messages to keep style consistent for migration log. After updating in sync_disks (phase1) of migration, write out updated config. This means that even if migration fails or is aborted in later stages, we keep the fixed config - this is not an issue, as it would have been fixed on the next attempt anyway, and it can't hurt to have the correct size instead of a wrong one either way. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-12-11 10:42:56 +01:00
Stefan Reiter	71c58bb7ed	remove $vmid param from print_drive It isn't used in the sub, but suggests it is needed. No users outside qemu-server found. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-12-09 11:44:13 +01:00
Thomas Lamprecht	dad06e2068	refactor storage whitelist in sync_disks to regex Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-12-04 18:40:03 +01:00
Thomas Lamprecht	40a572f7e8	migrate phase 3 cleanup: add error into error propagation message Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-11-30 17:27:14 +01:00
Stefan Reiter	3392d6cacf	refactor: extract QEMU machine related helpers to package ...PVE::QemuServer::Machine. qemu_machine_feature_enabled is exported since it has a lot of users in PVE::QemuServer and a long enough name as it is. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-11-20 16:29:23 +01:00
Stefan Reiter	0a13e08ec2	refactor: create QemuServer::Monitor for high-level QMP access QMP and monitor helpers are moved from QemuServer.pm. By using only vm_running_locally instead of check_running, a cyclic dependency to QemuConfig is avoided. This also means that the $nocheck parameter no longer serves any purpose, and has thus been removed along with vm_mon_cmd_nocheck. Care has been taken to avoid errors resulting from this, and occasionally a manual check for a VM's existence was inserted at the callsite. Methods have been renamed to avoid redundant naming: * vm_qmp_command -> qmp_cmd * vm_mon_cmd -> mon_cmd * vm_human_monitor_command -> hmp_cmd mon_cmd is exported since it has many users. This patch also changes all non-package users of vm_qmp_command to use the mon_cmd helper. Includes mocking for tests. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2019-11-20 16:29:23 +01:00
Thomas Lamprecht	e85d01f282	migration: fix false-positive log for copying local images Only log that if we actually have local disks. Add also an explicit log for replication. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-11-20 16:01:35 +01:00
Fabian Ebner	9270672e67	fix typo in migration cleanup error message Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-10-28 11:30:10 +01:00
Mira Limbeck	9860fe4ef9	close #2263: die on live migration with local cloudinit disk Live migration with a local cloudinit disk was never intended to work. It did, however, work to the extent that the migration completed, but the disk on the source node could not be deleted. Now die if a live migration is started with a local cloudinit disk. With the GUI changes, live migration is already disabled, as it recognizes the cloudinit disk as a local resource. Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>	2019-08-26 12:13:07 +02:00
Dominik Csapak	ccab68c22c	fix remote viewer live migration for some reason, not setting the port results in a port of '65535', which triggers an exception in http-server anyevent, so we set the port to 0; also, we have to read the ticket from stdin even for 'unix' type secure migration Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2019-08-20 11:49:24 +02:00
Tim Marx	ca6abacf6b	migrate: log which local resource causes error Signed-off-by: Tim Marx <t.marx@proxmox.com>	2019-05-07 10:22:12 +00:00
Tim Marx	370b05e719	whitespace cleanup Signed-off-by: Tim Marx <t.marx@proxmox.com>	2019-05-07 10:22:12 +00:00
Stoiko Ivanov	d189e5901b	bwlimit: add parameter for QemuMigrate::phase2 used for online local disks via qemu_drive_mirror Add TODO comment for offline disks, as clone_disk calls `qemu-img convert`, which does not have a bandwidth limit parameter. Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>	2019-04-02 11:00:28 +02:00
Stoiko Ivanov	15a37695b6	bwlimit: add parameter to QemuMigrate::sync_disks used for offline migration of local volumes Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>	2019-04-02 10:58:35 +02:00
Stoiko Ivanov	ddd664d739	bwlimit: honor bwlimit for migrate qmp call The 'migrate_speed' can be set in the VM config. Additionally the 'migrate' bwlimit from datacenter.cfg (storage-specific limits play no role for memory+state migration) or the parameter provided to the API call can restrict the speed. Take the lower of the two. This patch also refactors the setting of migrate_speed and comments for clarity. Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>	2019-04-02 10:34:40 +02:00
Mira Limbeck	9e93a63fe4	fix #2100: ignore cloudinit drive on offline migration disk is not copied to the target node but is still deleted on cleanup (phase3_cleanup). Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2019-03-29 18:11:33 +01:00
Thomas Lamprecht	769f187df5	followup whitespace fixes Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-02-20 07:34:10 +01:00
Alexandre Derumier	f3a483b682	QemuMigrate : cleanup indentation	2019-02-20 07:32:23 +01:00
Thomas Lamprecht	c7789f54ad	migrate: fix local disk migration with online VMs commit 4530494bf9 introduced a regression with local disk migrations if the VM is online, and thus needs to be live migrated, and no target storage was passed as a parameter. We made the hack of writing "1" to the targetstorage option in this case obsolete, but it was still used when deciding if there are any drives to mirror at all. Here it is enough to check if there are any 'online_local_volumes', because that hash only gets filled if we can, and are told to, live mirror local disks on migrations anyway. Also, we abort early if local disks are found and the 'with-local-disks' option is not set. This was reported at: https://forum.proxmox.com/threads/livemigration-with-localdisk-doesnt-coppy-and-data-from-the-hdds-anymore.50744/ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-17 10:58:50 +01:00
Wolfgang Bumiller	8c58b12d0d	cleanup: use a local $override_targetsid variable Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2018-12-20 10:11:32 +01:00
Thomas Lamprecht	4530494bf9	fix local disk migration when no target storage is set the check for targetstorage in: if ($self->{running} && $self->{opts}->{targetstorage} && $local_volumes->{$volid}->{ref} eq 'config') { was obsolete, as we always set the targetstorage opts variable to '1' in a broader "use same sid for remote local" check above. So removing it leads to the same truth table for the if, but fixes the check of whether we should fall back to the volume's SID if targetstorage is not set, as otherwise it seemed to be always set, and '1' is naturally not a correct storage ID. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2018-12-20 10:11:32 +01:00
Alexandre Derumier	d0c671823d	fix #1013 : migrate : sync_disk : --targetstorage with offline disk targetsid was not used for unused disks (offline copy) Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2018-12-20 10:11:32 +01:00
Stoiko Ivanov	ca6621315e	Fix #1242 : clone_disk : call qga fstrim after clone Some storage types like rbd or lvm can't keep thin-provisioning after a qemu-mirror. Call qga guest-fstrim if qga is available and fstrim_cloned_disks is enabled after move_disk and migrate. Co-Authored-By: Alexandre Derumier <aderumier@odiso.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>	2018-08-02 11:35:50 +02:00
Alexandre Derumier	50d8dd5dc7	migrate cache-size : power of 2 qemu 2.11 needs a power of 2 cache size. " Parameter 'xbzrle_cache_size' expects is invalid, it should be bigger than target page size and a power of two " Round up to the next power of 2 value.	2018-02-22 16:27:48 +01:00
Herman van Rink	d108cb1eb2	migrate: task log: fix typo Signed-off-by: Herman van Rink <rink@initfour.nl>	2018-02-22 14:50:00 +01:00
Chris Hofstaedtler	ec82e3eee4	fix #1569: add shared flag to disks With shared=1, (live) migration ignores the disk and assumes it is present on all target nodes. This works similarly to shared=1 on LXC mountpoints. Signed-off-by: Chris Hofstaedtler <chris.hofstaedtler@deduktiva.com> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-02-15 15:19:29 +01:00
Alexandre Derumier	d296ed08d3	migration : enable mtunnel for insecure migration V2 We only use it to send commands faster like resume Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2017-09-12 14:15:33 +02:00
Emmanuel Kasper	c2327320a8	Remove unused variable declaration	2017-09-07 11:22:32 +02:00
Emmanuel Kasper	46dd42f70c	Fix #1441: Do not unplug controllers when the mirroring is finished This should not be needed since we call 'block-job-complete' before in qemu_drive_mirror_monitor(), and after benchmarking it does not appear to be needed nor provide a measurable improvement when shutting down the source.	2017-09-07 11:22:32 +02:00
Fabian Grünbichler	4305207d61	migrate: reduce polling intervals Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	4bdd20ab14	migrate: keep track of replication and only transfer state and switch direction if there actually are any replicated volumes. once we add support for live-migration with replicated volumes, adding a set-replication-state command to the tunnel and using that probably makes sense. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	2e7fee87df	migrate: finish tunnel in phase 3 after resuming the VM over the tunnel. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	1d5aaa1db5	qm mtunnel/migrate: add resume VMID command and reformat the legacy SSH variant for readability. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	bcb51ae8f9	mtunnel: add and handle OK/ERR replies because we want commands to return meaningful errors, and print them on the client/source side. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	58cbe63901	migrate: read mtunnel version Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	e0eb1f7677	migrate: refactor mtunnel read/write to make adding new commands and reading replies easier Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Fabian Grünbichler	d7b1b24b6f	migrate: switch back to qm mtunnel to allow adding guest specific commands to the tunnel Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-08-07 09:23:56 +02:00
Emmanuel Kasper	171ed95c76	Use default values when memory is not set in vm.conf when migrating This fixes a "Use of uninitialized value in multiplication (*) " warning when doing a migration	2017-07-03 14:37:00 +02:00
Thomas Lamprecht	da18cc9300	migrate: use 'mtunnel' from pvecm qm mtunnel was deemed deprecated but was still in use here. Switch over to pvecm's mtunnel to allow removing the qm variant in PVE 5.1. Also use an absolute path so we do not depend on the target's environment variables. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-06-23 11:00:52 +02:00
Wolfgang Bumiller	6f58fce9ee	migrate: pass the with_snapshots parameter	2017-06-22 12:58:14 +02:00
Dietmar Maurer	54d10ab121	PVE::QemuMigrate. do not use JSON - not required here	2017-06-22 08:51:38 +02:00
Dietmar Maurer	d652f7b1ab	PVE/QemuMigrate.pm: use new replication job helpers from AbstractMigrate	2017-06-21 12:27:45 +02:00
Dietmar Maurer	f6a17ff5e3	Change target in replication-state when replication direction is switched	2017-06-21 10:59:45 +02:00
Dietmar Maurer	dbc9420b0b	PVE/QemuMigrate.pm: use replication job, transfer replication state	2017-06-20 12:42:51 +02:00
Dietmar Maurer	5009a8c755	PVE/QemuMigrate.pm: fix syntax errors	2017-06-13 11:56:26 +02:00
Dietmar Maurer	aee6abe5ba	PVE/QemuMigrate.pm - use PVE::QemuServer::foreach_volid	2017-06-13 11:26:47 +02:00
Wolfgang Bumiller	ba5acf88a1	migrate: migration_type setting moved to pve-guest-common	2017-06-09 12:28:28 +02:00
Wolfgang Bumiller	f1c2a53aee	migration: implement insecure offline migration	2017-06-01 10:50:28 +02:00
Wolfgang Bumiller	7126e1c9bb	migrate: pass ssh_info to storage_migrate	2017-05-23 09:57:17 +02:00
Dietmar Maurer	46883f80f6	Revert "Integrate replica in the qemu migration." This reverts commit 63d02c7074. The commit changes the configuration before the VM is actually migrated, so it is possible to end up with a wrong configuration when migration fails for some reason. Also, I am quite unsure if this automatic target change is really wanted. The patch also contains wrong references to $self->{opts}->{node}.	2017-05-06 10:39:43 +02:00
Dietmar Maurer	b1c12185fb	Revert "migrate: cleanup replica volume skip condition" This reverts commit 6e8044dcea.	2017-05-06 10:38:06 +02:00
Wolfgang Bumiller	6e8044dcea	migrate: cleanup replica volume skip condition	2017-04-28 10:34:46 +02:00
Wolfgang Link	63d02c7074	Integrate replica in the qemu migration. Now it is possible to migrate a VM offline when replica is enabled. It will reduce replication to a minimal amount.	2017-04-28 10:11:33 +02:00
Alexandre Derumier	d80ad67f9d	live storage migration : fix check of target storage availability if we define a different target storeid for remote node, and that storage is not available on source node Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2017-04-21 12:05:36 +02:00
Fabian Grünbichler	877e2ea746	migrate: clarify comment Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-04-21 11:43:29 +02:00
Fabian Grünbichler	28412ae488	migrate: cleanup nbd source disks earlier Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-04-21 11:43:29 +02:00
Fabian Grünbichler	504105c638	fix #1338: migrate: stop nbd before resuming since Qemu 2.9, block device write access is limited to one writer unless shared_rw is set to true. there is an exception for live-migrating local disks via NBD as long as the VM is suspended. stop the NBD server before resuming the VM accordingly to unbreak local disk live-migration. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2017-04-21 11:43:29 +02:00
Fabian Grünbichler	8b54f4b8db	defined() style cleanup	2017-02-28 12:46:47 +01:00
Wolfgang Link	9045f57a27	Check array existed before use. This triggers if a qemu guest has a local unused disk. The disk will migrate by offline disk migration, so it is not in the target_drives.	2017-02-28 12:33:27 +01:00
Alexandre Derumier	56af714629	add with-local-disks option for live storage migration As Fabian has requested, add an extra flag "with-local-disks" to enable live storage migration with local disks. The default target storage is the same sid as the source; this can be overridden with the "targetstorage" option. I will try to improve this later, with optional mapping, disk by disk. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2017-01-06 12:10:25 +01:00
Wolfgang Bumiller	bd2d5fe6ff	cleanup: error messages	2017-01-05 10:03:16 +01:00
Wolfgang Bumiller	3b4cf0f0fc	cleanup: whitespaces & style	2017-01-05 10:03:10 +01:00
Alexandre Derumier	b74cad8ae3	add live storage migration with vm migration This allows migrating disks on local storage to a remote node's storage. When the target node starts, new volumes are created and exposed through the qemu embedded nbd server. qemu drive-mirror is launched on the source vm for each disk, with the nbd server as target. When drive-mirror reaches 100% for one disk, we don't complete the block job and begin mirroring the next disk. (Mirrors run in parallel, but we try to mirror them one by one to avoid storage and network overload.) Then we live migrate the vm to the destination node (drive-mirror still occurs at the same time). When the vm is live migrated (source vm paused, target vm paused), we complete the mirror block jobs. When this is done, we stop the source vm and resume the target vm. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2017-01-05 09:09:46 +01:00
Dominik Csapak	b3205b153e	allow migration of local qcow2 snapshots we can migrate local snapshots when on zfs or dir storage with qcow2, but the check was incorrect: we checked for if (zfs && !qcow2) instead of if (zfs || qcow2) Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2016-12-05 12:32:50 +01:00
Thomas Lamprecht	2de2d6f74e	allow dedicated migration network, bug #1177 Without this patch we use the network where the cluster traffic runs for sending migration traffic. This is not ideal, as it may hinder cluster traffic. Further, some users have a powerful network which would be perfect for migrations; with this patch they can run the migration traffic over such a network without having the corosync traffic on the same network. The network is configurable through /etc/pve/datacenter.cfg, which got a new property, namely migration. migration has two subproperties: type (replaces the old migration_unsecure property) and network. For the case of a network failure, or if a VM has to be moved over another network for other reasons, I added the migration_type and migration_network parameters to qm migrate (and respectively vm_start, as this gets used on migration). They allow overriding the datacenter.cfg settings. Fixes bug #1177 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-11-03 09:51:23 +01:00
Fabian Grünbichler	3a7bc9e252	forbid migration of template with local base image	2016-09-15 14:15:09 +02:00
Fabian Grünbichler	5bf7f0f1a8	collect errors from all local volumes and then die with more meaningful output, instead of on the first encountered error.	2016-06-30 11:55:21 +02:00
Fabian Grünbichler	dabf24736c	add comments and rename volhash	2016-06-30 11:55:21 +02:00
Fabian Grünbichler	4abdd867df	switch order of disk checks to make log message more meaningful. 'storage' < 'snapshot' < 'config'	2016-06-30 11:55:21 +02:00
Fabian Grünbichler	d62fcf74a7	collect and log origin of found local volumes just knowing that local disks prevent a migration is not very helpful, so be a bit more verbose here.	2016-06-30 11:55:21 +02:00
Fabian Grünbichler	2a2127bd6d	drop unnecessary cdromhash	2016-06-17 16:28:07 +02:00
Fabian Grünbichler	98d80cb67b	use foreach_drive instead of foreach_volid foreach_volid recurses over snapshots as well, resulting in lots of repeated checks (especially for VMs with lots of snapshots and disks). a potential vmstate volume must be checked explicitly, because foreach_drive does not care about those.	2016-06-17 16:27:25 +02:00
Fabian Grünbichler	86638cc2dc	fix whitespace/indent	2016-06-17 16:24:16 +02:00
Fabian Grünbichler	89719f9887	don't repeat storage check for each volid	2016-06-17 16:23:49 +02:00
Wolfgang Link	b6adff3385	fix perl scope issues Add parameter array to foreach_volid to use it in the functions. correct typos.	2016-06-16 11:26:37 +02:00
Dietmar Maurer	3629c19d23	add check for snapshots at migration We cannot migrate snapshots on local disks, for example lvmthin snapshots.	2016-06-16 10:21:57 +02:00
Wolfgang Link	c4d2d6c15c	Add LVM and LVMThin to QemuMigration Offline migration on LVM and LVMThin is now possible.	2016-06-16 08:14:33 +02:00
Thomas Lamprecht	e858e9d241	do not open forward tunnel on insecure migrations Restore previous behaviour and do not request a forward tunnel on insecure migrations. For migrations of all kinds this has no direct impact, they all worked, but requesting one port too many from a limited pool is still not ideal, and neither is an open tunnel that is not needed. This is a light regression introduced by commit 1c9d54b. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-06 14:51:34 +02:00
Thomas Lamprecht	54323eed5f	migrate: unlink unix socket before starting migration Just to be sure nobody else has (wrongfully) left that file here. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-03 16:02:25 +02:00
Thomas Lamprecht	f34d146679	migrate: add some more log output Output all errors - if any - and add some log output on which qmp commands we issue with which parameters; this may be helpful when debugging or analyzing a user's problem. Also check if the queried status is defined, as on an error it may not be. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-03 12:00:50 +02:00
Thomas Lamprecht	92437b8de0	migrate: close tunnel after dest. VM stopped on error On error let phase2_cleanup close the tunnel, as it first stops the VM waiting for the incoming migration on the destination, to be safe. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-03 12:00:25 +02:00
Thomas Lamprecht	1c9d54bfd0	migrate: use ssh forwarded UNIX socket tunnel We cannot guarantee when the SSH forward tunnel really becomes ready. The check with the mtunnel API call did not help for this problem, as it only checked that the SSH connection itself works and that the destination node has quorum, but the forwarded tunnel itself was not checked. The forward tunnel is a different channel in the SSH connection, independent of the SSH `qm mtunnel` channel, so the fact that the latter works does not guarantee that our migration tunnel is up and ready. When the node(s) were under load, or when we did parallel migrations (migrateall), the migrate command was often started before a tunnel was open and ready to receive data. This led to a direct abortion of the migration and is the main reason why parallel migrations often leave two thirds or more VMs on the source node. The issue was tracked down to SSH after debugging the QEMU process; enabling debug logging showed that the tunnel often became available and ready too late, or not at all. Fixing the TCP forward tunnel is quirky and not straightforward: the only way SSH offers is to use -N (no command), -f (background) and -o "ExitOnForwardFailure=yes", so that it waits in the foreground until the tunnel is ready and only then backgrounds itself. This is not quite the nicest way for our special use case and our code base. Waiting for the local port to become open and ready (through /proc/net/tcp[6]) as a proof of concept is not enough: even if the port is in the listening state and should theoretically accept connections, this still failed often, as the tunnel was not yet fully ready.
Further, another problem would still be open if we tried to patch the SSH forward method we currently use, which we solve for free with the approach of this patch: the method to get an available port (next_migration_port) has a serious race condition which could lead to multiple use of the same port on a parallel migration (I observed this in my many tests; it is seldom, but if it happens it's really bad). So let's now use UNIX sockets, which ssh supports since version 5.7. The endpoints are UNIX sockets bound to the VMID, thus no port, so no race and also no limitation of available ports (we reserved 50 for migration). The endpoints get created in /run/qemu-server/VMID.migrate, and as KVM/QEMU in current versions is able to use UNIX sockets just as well as TCP, we do not have to change much in the interaction with QEMU. QEMU is started with the migrate_incoming url at the local destination endpoint and creates the socket file; we then create a listening socket on the source side and connect over SSH to the destination. Now the migration can be started by issuing the migrate qmp command with an updated uri. This breaks live migration from new to old, but not from old to new, so there is an upgrade path. If a live migration from new to old must be made (for whatever reason), use the migration_unsecure setting (man datacenter.cfg) to allow this, although that should only be done in a trusted network. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-03 11:51:46 +02:00
Thomas Lamprecht	61b04c6d5a	migrate: collect migration tunnel child process use waitpid with WNOHANG to check if the ssh tunnel child process is still running and, at the same time, collect it if it exited. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2016-06-03 11:47:13 +02:00
Wolfgang Link	674051dcac	fix typo	2016-06-02 09:59:51 +02:00
Fabian Grünbichler	e1fc368d6b	fix typos	2016-05-04 10:47:23 +02:00
Fabian Grünbichler	73f5ee92af	fix #971: don't activate shared storage in offline migration instead, just print a warning if the connection check fails. as long as the storage is online on the target node, the VM will start fine after migration.	2016-05-04 10:47:15 +02:00
Fabian Grünbichler	29701766ae	migrate: check if storage is available	2016-05-04 10:47:04 +02:00
Fabian Grünbichler	ffda963f46	Refactor basic config-related methods Drop load_config, write_config, lock_config[_xx], check_lock, check_protection, is_template and config_file in favour of implementations in PVE::AbstractConfig. Implement guest_type, __config_max_unused_disks, config_file_lock and cfs_config_path from PVE::AbstractConfig in PVE::QemuConfig.	2016-03-08 11:41:59 +01:00
Fabian Grünbichler	8317c759bf	Drop skiplock from write_config Since write_config was always called with skiplock=1 except once, it makes sense to drop this parameter like in PVE::LXC::write_config. If needed in the future, the caller can use check_lock before write_config anyway.	2016-02-12 12:16:57 +01:00
Fabian Grünbichler	63be43a947	Refactor update_config_nolock -> write_config The method update_config wrapped update_config_nolock using lock_config, but to prevent update races the whole "read config", "do something", "write config" flow was always protected by lock_config anyway, and update_config was never called. Thus, we can safely drop update_config and rename update_config_nolock to write_config like in PVE::LXC.	2016-02-12 12:14:52 +01:00
Wolfgang Link	386c6ba7f5	close tunnel after migration is finished. If we do not close it, there is a chance that the tunnel stays open and the next migration will not work.	2016-02-02 18:16:18 +01:00
Alexandre Derumier	42dbd2ee30	add qemu_machine_pxe return machinename with .pxe suffix if a nic with pxe romfile exists Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2015-11-06 10:51:14 +01:00
Alexandre Derumier	7bac824e19	use qom-get to check if pxe files are used V2 fix qemu 2.4 pxe -> qemu 2.4 efi Changelog : forgot to add a check on qom-get result Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2015-11-06 07:55:07 +01:00
Wolfgang Bumiller	407e0b8bef	migration: improve ipv6 case Qemu parses hostnames in brackets correctly but sets an ipv6 flag for them as if they were ipv6 addresses, so only insert brackets for ipv6 addresses.	2015-11-06 07:53:03 +01:00
Alexandre Derumier	289e0b8564	migrate : add nocheck for resume Users have reported a resume bug when HA is used. There seems to be a small race (benchmarks show >0s < 1s) between the vm conf file move on the source node and its replication to, and resume on, the target node. I don't know why this only happens with HA; maybe it occurs with standard migration too. Anyway, we don't need to read the vm config file to resume the vm on the target host, as we are sure that the vm is migrated, and the config file move action is correct in the cluster. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2015-10-15 12:41:13 +02:00
Wolfgang Bumiller	2fbd27eabc	migration: put the source address in brackets Always adding brackets around the address works. They're required for ipv6 and qemu also accepts them for ipv4 and hostnames.	2015-05-21 17:30:30 +02:00
Wolfgang Bumiller	af0eba7e35	pass port family to next_*_port() calls	2015-05-12 12:28:56 +02:00
Wolfgang Link	adf8ac08c8	implement offline migration on zfs Signed-off-by: Wolfgang Link <w.link@proxmox.com>	2015-04-27 10:42:59 +02:00
Wolfgang Link	37a6dc7809	fix bug #618: correct typo Signed-off-by: Wolfgang Link <w.link@proxmox.com>	2015-04-27 10:42:49 +02:00
Alexandre Derumier	985a5f483d	migration : add setup state Since qemu 1.5, there is a new migration state: "setup". It's mainly used for rdma migration, but slow vms can see it too and hang on migration http://git.qemu.org/?p=qemu.git;a=commit;h=3b6959506831193f37cc830c8e111b437c0d1380 Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2014-06-17 08:57:31 +02:00
Stefan Priebe	2e787b1892	QemuMigrate: print migration xbzrle if enabled (has xbzrlecachesize) for whatever reason (bug qemu, bug pve, ...) Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>	2014-02-10 12:29:17 +01:00
Alexandre Derumier	a89fded11f	migration : enable auto-converge capability v2 This reduces guest cpu speed if the dirtied bytes are 50% more than the approx. amount of bytes that just got transferred since the last time we were in this routine. qemu commit : http://git.qemu.org/?p=qemu.git;a=commit;h=bde1e2ec2176c363c1783bf8887b6b1beb08dfee tested with "stress -m 2 -c 2" under debian: without autoconvergence: downtime 12s, duration 12min; with autoconvergence: downtime 2s, duration 4min Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2014-01-10 13:01:55 +01:00
Dietmar Maurer	dd25eecf62	code cleanup Use new helper methods.	2013-12-10 10:46:50 +01:00
Alexandre Derumier	fd8469f7de	qemu migrate : only wait for spice server online + eval Currently offline migration fails, because we are trying to check the spice server status with qmp. This should be done online only. I also added eval, to avoid a migration lock if the qmp query fails. Fix: http://forum.proxmox.com/threads/16093-VM-is-locked-after-offline-migration?p=82852 Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2013-09-19 06:28:17 +02:00
Dietmar Maurer	033a0b366d	fix migration port (wrong quote)	2013-08-12 09:48:13 +02:00
Stefan Priebe	5bc1e0397e	qemu-server: add support for unsecure migration (setting in datacenter.cfg) This patch adds support for unsecure migration using a direct TCP connection KVM <=> KVM instead of an extra SSH tunnel. Without SSH, the limit is just the bandwidth and no longer the CPU (a single core). You can enable this by adding: migration_unsecure: 1 to datacenter.cfg. Examples use QEMU 1.4, as migration with QEMU 1.3 still does not work for me. Current default with SSH tunnel, VM uses 2GB mem: Dec 27 21:10:32 starting migration of VM 105 to node 'cloud1-1202' (10.255.0.20) Dec 27 21:10:32 copying disk images Dec 27 21:10:32 starting VM 105 on remote node 'cloud1-1202' Dec 27 21:10:35 starting ssh migration tunnel Dec 27 21:10:36 starting online/live migration on localhost:60000 Dec 27 21:10:36 migrate_set_speed: 8589934592 Dec 27 21:10:36 migrate_set_downtime: 1 Dec 27 21:10:38 migration status: active (transferred 152481002, remaining 1938546688), total 2156396544) , expected downtime 0 Dec 27 21:10:40 migration status: active (transferred 279836995, remaining 1811140608), total 2156396544) , expected downtime 0 Dec 27 21:10:42 migration status: active (transferred 421265271, remaining 1669840896), total 2156396544) , expected downtime 0 Dec 27 21:10:44 migration status: active (transferred 570987974, remaining 1520152576), total 2156396544) , expected downtime 0 Dec 27 21:10:46 migration status: active (transferred 721469404, remaining 1369939968), total 2156396544) , expected downtime 0 Dec 27 21:10:48 migration status: active (transferred 875595258, remaining 1216057344), total 2156396544) , expected downtime 0 Dec 27 21:10:50 migration status: active (transferred 1034654822, remaining 1056931840), total 2156396544) , expected downtime 0 Dec 27 21:10:54 migration status: active (transferred 1176288424, remaining 915369984), total 2156396544) , expected downtime 0 Dec 27 21:10:56 migration status: active (transferred 1339734759, remaining 752050176), total 2156396544) , expected downtime 0 Dec 27 21:10:58 migration status: active (transferred 1503743261, remaining 588206080), total 2156396544) , expected downtime 0 Dec 27 21:11:02 migration status: active (transferred 1645097827, remaining 446906368), total 2156396544) , expected downtime 0 Dec 27 21:11:04 migration status: active (transferred 1810562934, remaining 281751552), total 2156396544) , expected downtime 0 Dec 27 21:11:06 migration status: active (transferred 1964377505, remaining 126033920), total 2156396544) , expected downtime 0 Dec 27 21:11:08 migration status: active (transferred 2077930417, remaining 0), total 2156396544) , expected downtime 0 Dec 27 21:11:09 migration speed: 62.06 MB/s - downtime 37 ms Dec 27 21:11:09 migration status: completed Dec 27 21:11:13 migration finished successfuly (duration 00:00:41) TASK OK With unsecure migration, without SSH tunnel: Dec 27 22:43:14 starting migration of VM 105 to node 'cloud1-1203' (10.255.0.22) Dec 27 22:43:14 copying disk images Dec 27 22:43:14 starting VM 105 on remote node 'cloud1-1203' Dec 27 22:43:17 starting online/live migration on 10.255.0.22:60000 Dec 27 22:43:17 migrate_set_speed: 8589934592 Dec 27 22:43:17 migrate_set_downtime: 1 Dec 27 22:43:19 migration speed: 1024.00 MB/s - downtime 1100 ms Dec 27 22:43:19 migration status: completed Dec 27 22:43:22 migration finished successfuly (duration 00:00:09) TASK OK	2013-07-26 11:23:49 +02:00
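The two logs in the unsecure-migration commit differ mainly in the migration endpoint. The SSH-tunnelled run migrates to `localhost:60000`, and the unsecure run migrates directly to `10.255.0.22:60000`. The Python sketch below reproduces that choice from the `migration_unsecure: 1` setting. It assumes datacenter.cfg consists of simple `key: value` lines. Both helper names are hypothetical.

```python
def parse_datacenter_cfg(text: str) -> dict:
    """Parse simple 'key: value' lines (assumed format of datacenter.cfg)."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def migration_endpoint(cfg: dict, target_ip: str, port: int) -> str:
    """Return the host:port that the source QEMU should send migration data to.

    With migration_unsecure enabled, connect straight to the target node;
    otherwise connect to the local end of an SSH tunnel (hypothetical helper).
    """
    if cfg.get("migration_unsecure") == "1":
        return f"{target_ip}:{port}"
    return f"localhost:{port}"


print(migration_endpoint(parse_datacenter_cfg("migration_unsecure: 1\n"), "10.255.0.22", 60000))
print(migration_endpoint(parse_datacenter_cfg(""), "10.255.0.22", 60000))
```

The tunnelled variant trades throughput for encryption. The example log shows 62.06 MB/s through SSH versus 1024.00 MB/s direct.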
Dietmar Maurer	7c14dcae1f	use STDIN to pass spice ticket	2013-07-24 12:19:51 +02:00
Dietmar Maurer	86b8228b59	new vga_conf_has_spice() helper code cleanups	2013-07-24 12:01:03 +02:00
Alexandre Derumier	95a4b4a98b	add spice migration Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2013-07-24 10:54:20 +02:00
Dietmar Maurer	42668529e6	migrate: pass --machine parameter to remote 'qm start' command	2013-06-05 10:24:39 +02:00
Dietmar Maurer	f9a971e0ee	fix bug #381 : use PVE::Tools::next_migrate_port()	2013-05-13 07:30:50 +02:00
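Fix #381 delegates port selection to `PVE::Tools::next_migrate_port()`. The following is a hypothetical Python analogue, not that function's actual implementation. It returns the first TCP port in a range that can currently be bound. The 60000-60050 range is an assumption, chosen because the logs in this listing show port 60000.

```python
import socket


def next_free_port(start: int = 60000, end: int = 60050, host: str = "127.0.0.1") -> int:
    """Return the first TCP port in [start, end] that can currently be bound."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port  # socket is closed when the with-block exits
    raise RuntimeError(f"no free port in range {start}-{end}")


print(next_free_port())
```

The port is released as soon as the function returns, so two concurrent migrations could still pick the same port. A real allocator needs some form of reservation or locking around the check.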
Dietmar Maurer	b7b1ac9d04	fix check if a backing file exists	2013-02-28 06:36:46 +01:00
Alexandre Derumier	d5f315fda5	migration : display qm resume error in task log Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2013-02-25 06:15:45 +01:00
Alexandre Derumier	d560409207	forbid offline migration of a non-shared volume if it's a clone Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2013-02-15 07:52:53 +01:00
Dietmar Maurer	0302101cf1	remove expected_downtime from migration status	2013-02-13 10:47:54 +01:00
Stefan Priebe	19168b91ae	QemuMigrate: phase2_cleanup misses migrate_cancel Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>	2013-01-02 06:36:53 +01:00
Stefan Priebe	865ef13278	implement dynamic migration_downtime changelog: - increment counter also if remaining memory equals 0 (qemu 1.4 migration code) - only increment counter and set down_time if a memory transfer has occurred (to avoid increasing the downtime too fast) Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-12-31 07:20:56 +01:00
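The changelog for the dynamic downtime commit states two rules. The counter advances only when memory was actually transferred, and it also advances when the remaining memory is 0. This Python sketch illustrates those two rules only. The message does not say when or by how much the downtime is raised, so the threshold of 5 polls and the doubling are hypothetical placeholders. The class name is also hypothetical.

```python
class DowntimeTuner:
    """Raise the allowed migration downtime after repeated polls (hypothetical policy)."""

    def __init__(self, downtime_s: float = 1.0, threshold: int = 5):
        self.downtime_s = downtime_s
        self.threshold = threshold  # hypothetical: polls counted before raising
        self.counter = 0
        self.last_transferred = None

    def update(self, transferred: int) -> float:
        """Feed one query-migrate sample; return the downtime to request."""
        moved = self.last_transferred is not None and transferred > self.last_transferred
        self.last_transferred = transferred
        # Count only polls in which memory was transferred; the remaining
        # amount is not checked, so a poll with remaining == 0 also counts.
        if moved:
            self.counter += 1
            if self.counter > self.threshold:
                self.counter = 0
                self.downtime_s *= 2  # hypothetical: double the allowed downtime
        return self.downtime_s


tuner = DowntimeTuner(downtime_s=1.0, threshold=2)
for sample in (0, 10, 20, 30, 30):
    print(sample, tuner.update(sample))
```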
Alexandre Derumier	135007c099	add downtime && expected_downtime query-migrate info Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-12-27 12:45:56 +01:00
Alexandre Derumier	ab399b7c5d	add error log for qm start of the target vm. Can be useful to see what's wrong if the target VM doesn't start (missing storage, missing bridge, ...) Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-12-27 12:44:34 +01:00
Alexandre Derumier	3beb415bd7	move qmp migrate_set_down && migrate_set_speed to qemumigrate, so we can set the values when the VM is running. Also use int() to get JSON working. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-12-27 12:43:39 +01:00
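The "use int() to get JSON working" note is about value types on the wire. A value read from a text config is a string, and a Perl scalar that came from a string may be encoded as a JSON string. QMP's `migrate_set_speed`, however, expects an integer. The Python sketch below shows the difference with an explicit string. The `qmp_command` helper is hypothetical. The value 8589934592 is the speed shown in the migration logs in this listing.

```python
import json


def qmp_command(name: str, **arguments) -> str:
    """Serialize a QMP command as a JSON object."""
    return json.dumps({"execute": name, "arguments": arguments})


speed_from_config = "8589934592"  # values read from a text config arrive as strings
print(qmp_command("migrate_set_speed", value=speed_from_config))       # "value" is a JSON string
print(qmp_command("migrate_set_speed", value=int(speed_from_config)))  # "value" is a JSON integer
```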
Dietmar Maurer	d5769dc253	migrate volumes used inside snapshots including vmstate Introduce new helper function foreach_volid()	2012-09-25 08:09:50 +02:00
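The `foreach_volid()` helper from this commit visits every volume a VM uses, including volumes referenced only from snapshots and their saved vmstate. Below is a Python sketch of that idea, not the Perl helper's actual code. The config layout is hypothetical: a dict with a "snapshots" sub-dict. So are the drive-key pattern and the rule that the volume ID is the part of a drive string before the first comma.

```python
import re

# Hypothetical drive-key pattern; real configs support more bus types.
DRIVE_KEY = re.compile(r"^(ide|sata|scsi|virtio)\d+$")


def volid_of(drivestr: str) -> str:
    """Extract the volume ID: the part of a drive string before the first comma."""
    return drivestr.split(",", 1)[0]


def foreach_volid(conf: dict, func) -> None:
    """Call func(volid) once per volume used by the config or any of its snapshots."""
    seen = set()

    def visit(section: dict) -> None:
        for key, value in section.items():
            if DRIVE_KEY.match(key):
                volid = volid_of(value)
            elif key == "vmstate":
                volid = value  # saved RAM state of a snapshot
            else:
                continue
            if volid == "none":  # assumption: empty CD-ROM drive, no volume
                continue
            if volid not in seen:
                seen.add(volid)
                func(volid)

    visit(conf)
    for snap in conf.get("snapshots", {}).values():
        visit(snap)


conf = {
    "virtio0": "local:100/vm-100-disk-1.qcow2,size=32G",
    "ide2": "none,media=cdrom",
    "snapshots": {
        "snap1": {
            "virtio0": "local:100/vm-100-disk-1.qcow2,size=32G",
            "vmstate": "local:100/vm-100-state-snap1.raw",
        }
    },
}
foreach_volid(conf, print)
```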
Dietmar Maurer	a06c7f7ec4	fix check for non-shared disks	2012-09-25 07:26:34 +02:00
Dietmar Maurer	972511a06a	migrate: disable xbzrle for now. This is not stable, and sometimes causes endless migration (migration never stops).	2012-08-31 11:02:47 +02:00
Dietmar Maurer	94235c592c	avoid warning about uninitialized value	2012-08-30 12:15:07 +02:00
Dietmar Maurer	b0b756c14d	migrate: tolerate query-migrate errors	2012-08-30 09:28:24 +02:00
Alexandre Derumier	e18b0b9964	livemigrate : activate xbzrle cache This helps migration of VMs with a lot of memory access (like databases). Live migration tests working: kvm 1.2 -> kvm 1.2 (xbzrle set on both sides), kvm 1.1 -> kvm 1.2 (xbzrle on target), kvm 1.1 -> kvm 1.1 (xbzrle not set; the qmp command tries to set xbzrle but fails). Failing migration: kvm 1.2 -> kvm 1.1 fails, but this is expected. I tested with a memory benchmark running on a VM with 4GB RAM: without xbzrle, migration takes 10min, with many network hangs; with xbzrle, migration takes 1min, no hang. I display xbzrle counters for debugging purposes; we can remove them later. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-29 07:55:21 +02:00
Dietmar Maurer	af30308f36	we call vm_stop on the target host, to be sure that the kvm process is killed (but it should kill itself), and deactivate volumes. I slightly modified this patch (orig. from Alexandre) so that it applies cleanly.	2012-08-23 10:28:41 +02:00
Alexandre Derumier	e52bd94c7e	live migration: reduce sleep when remaining memory is low Reduce sleep to 0.3s when remaining memory is lower than the average transfer in 1 iteration. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-23 07:37:59 +02:00
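The polling rule from the reduced-sleep commit can be sketched in Python as follows. The 0.3 s short interval comes from the commit message. The 2 s default is an assumption based on the roughly 2-second spacing of status lines in the 2013 migration log in this listing. The function name is hypothetical.

```python
def poll_interval(remaining: int, avg_per_iteration: float,
                  default_s: float = 2.0, short_s: float = 0.3) -> float:
    """Seconds to sleep before the next query-migrate poll.

    Poll more often once the remaining memory is smaller than what one
    iteration transfers on average, since the migration is about to finish.
    short_s comes from the commit message; default_s is an assumption.
    """
    if avg_per_iteration > 0 and remaining < avg_per_iteration:
        return short_s
    return default_s


print(poll_interval(remaining=446906368, avg_per_iteration=150_000_000))
print(poll_interval(remaining=126033920, avg_per_iteration=150_000_000))
```

Guarding on `avg_per_iteration > 0` keeps the default interval before any transfer average exists, for example on the first poll.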
Dietmar Maurer	f5eb281ad3	cleanup: delete trailing whitespace	2012-08-23 07:36:48 +02:00
Alexandre Derumier	b67900f17a	put target vm in singlestep mode and resume it only when config is moved Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-23 07:32:21 +02:00
Alexandre Derumier	7e8dcf2cb0	add migratedfrom param to start vm with conf file an another node Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-23 07:18:36 +02:00
Alexandre Derumier	c04b5b04de	implement phase2_cleanup Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-23 07:17:15 +02:00
Alexandre Derumier	b8d208023b	move config file in phase3, when live migration is finished Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-08-23 07:16:45 +02:00
Dietmar Maurer	373ea5798a	migrate: only scan available storages	2012-07-16 10:20:36 +02:00
Dietmar Maurer	522c8f97d7	code cleanup, bump version to 2.0-44	2012-07-16 07:00:28 +02:00
Alexandre Derumier	80b2cbd1b9	migrate: syncdisk : avoid scanning shared storage Currently we get the volume list (for unused volumes) from PVE::Storage, for all storages. If something goes wrong with the network on the host and we can't communicate with a network-shared storage (sheepdog, rbd, ...), vdisk_list dies (timeout) and we cannot migrate the VM to another KVM host (online or offline). We don't need to scan shared storage, as there are no disks to sync. Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-07-16 06:52:35 +02:00
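The syncdisk change here, together with "only scan available storages" (373ea5798a), comes down to filtering the storage list before calling `vdisk_list`. In this Python sketch the storage configuration is modelled as a hypothetical dict of storage ID to options. The `shared` and `disable` option names are assumed to mirror storage.cfg flags.

```python
def storages_to_scan(storage_cfg: dict) -> list:
    """Return IDs of enabled, non-shared storages, the only ones that can hold disks to sync."""
    return sorted(
        sid for sid, opts in storage_cfg.items()
        if not opts.get("shared") and not opts.get("disable")
    )


cfg = {
    "local": {"type": "dir"},
    "ceph": {"type": "rbd", "shared": True},    # network storage: skipped
    "old": {"type": "dir", "disable": True},    # disabled: skipped
}
print(storages_to_scan(cfg))
```

Skipping shared storages means an unreachable rbd or sheepdog cluster can no longer make the volume listing time out and block the migration.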
Dietmar Maurer	a05b47a8a8	migrate: fix warning about uninitialized values And display accurate byte values instead of KB	2012-07-13 12:37:19 +02:00
Alexandre Derumier	5a7835f572	convert migrate monitor commands to qmp Signed-off-by: Alexandre Derumier <aderumier@odiso.com>	2012-06-26 06:38:34 +02:00
Dietmar Maurer	1858638fe3	replace change_config_nolock with update_config_nolock We now use cfs_file_write() in order to avoid race conditions between file IO and cfs operations (read after write works now).	2012-02-02 14:18:41 +01:00
Dietmar Maurer	97439670bc	online migration fix: close tunnel later, wait for connection close	2012-01-17 11:25:44 +01:00
Dietmar Maurer	17eed025b3	use PVE::Tools::run_with_timeout	2011-12-15 11:29:01 +01:00
Dietmar Maurer	d68afb26bf	improve error message	2011-12-08 10:07:19 +01:00
Dietmar Maurer	72afda82a1	fix migration tunnel	2011-12-08 09:32:09 +01:00
Dietmar Maurer	46a84fd400	replace logmsg() with $self->log()	2011-12-07 11:25:20 +01:00
Dietmar Maurer	16e903f2dc	use new AbstractMigrate.pm	2011-12-07 06:36:20 +01:00

254 Commits