Commit Graph

254 Commits

Fabian Grünbichler
a20dc58a1b explain 'nocheck' in more places
it was only explained in git history and in vm_stop; add comments in
other relevant places to avoid future breakage.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-11-21 13:42:52 +01:00
Fabian Grünbichler
eef93bc590 migrate: add remote migration handling
remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.

the main differences to regular intra-cluster migration are:
- source VM config and disks are only removed upon request via --delete
- shared storages are treated like local storages, since we can't
  assume they are shared across clusters (with potential to extend this
  by marking storages as shared)
- NBD migrated disks are explicitly pre-allocated on the target node via
  tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself are
  transformed via a user-defined mapping
- all commands and migration data streams are sent via a WS tunnel proxy
- pending changes and snapshots are discarded on the target side (for
  the time being)

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-11-17 15:21:39 +01:00
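
The storage/bridge/VMID mapping described in eef93bc590 can be pictured roughly as below. This is an illustrative sketch only; the mapping layout, field names and helper are assumptions, not the actual PVE::QemuMigrate code.

    use strict;
    use warnings;

    # Remap the VMID and every bridge referenced by the netX lines according
    # to a user-defined mapping, falling back to the source value when no
    # entry exists.
    sub apply_remote_mappings {
        my ($vmid, $conf, $mapping) = @_;

        my $target_vmid = $mapping->{vmid}->{$vmid} // $vmid;

        for my $netkey (grep { /^net\d+$/ } sort keys %$conf) {
            if ($conf->{$netkey} =~ /bridge=([^,]+)/) {
                my $bridge = $1;
                my $target = $mapping->{bridge}->{$bridge} // $bridge;
                $conf->{$netkey} =~ s/bridge=\Q$bridge\E/bridge=$target/;
            }
        }
        return $target_vmid;
    }

    my $conf = { net0 => 'virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,firewall=1' };
    my $mapping = { vmid => { 100 => 2100 }, bridge => { vmbr0 => 'vmbr1' } };
    print apply_remote_mappings(100, $conf, $mapping), ": $conf->{net0}\n";
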
Fabian Grünbichler
05b2a4ae9c migrate: refactor remote VM/tunnel start
no semantic changes intended, except for:
- no longer passing the main migration UNIX socket to SSH twice for
  forwarding
- dropping the 'unix:' prefix in start_remote_tunnel's timeout error message

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-11-17 15:21:39 +01:00
Thomas Lamprecht
71cc2c4177 migration: cloudinit check: bump manager dependency and guard with cloudinit drive
The former ensures that the manager which depends on the newer
qemu-server is actually installed, and the latter avoids false
positives.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-16 12:47:43 +01:00
Alexandre Derumier
73ed64967e migration : add del_nets_bridge_fdb
At the end of a live migration, we need to remove the old MAC entries
on the source host (the VM is not yet stopped) before resuming the VM
on the target host.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
 [T: resolve conflicts and rework on apply ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-13 14:56:57 +01:00
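
For illustration only: one way to drop a learned MAC from a Linux bridge forwarding database is iproute2's 'bridge' tool. The device and MAC below are made up, and qemu-server goes through its own network helpers rather than literally this call.

    use strict;
    use warnings;

    # Hypothetical tap device and guest MAC; on a real host these come from
    # the VM's netX configuration.
    my $mac = '52:54:00:12:34:56';
    my $dev = 'tap100i0';

    # 'master' addresses the entry learned by the bridge the port is enslaved to.
    system('bridge', 'fdb', 'del', $mac, 'dev', $dev, 'master') == 0
        or warn "removing fdb entry for $mac failed\n";
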
Alexandre Derumier
9c88e85446 migration: test targetnode min version for cloudinit section
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-08 17:23:30 +01:00
Fabian Ebner
13d121d79b fix #3861: migrate: fix live migration when cloud-init changes storage
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.

Breaks migration new -> old under similar (edge-case-)conditions as
fd95d780 did.

Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.

Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-04-28 18:29:12 +02:00
Fabian Ebner
8a0d269b75 migrate: resume initially running VM when failing after convergence
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.

(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).

Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.

Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-04-26 11:42:25 +02:00
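
A minimal sketch of the recovery path described in 8a0d269b75, with a generic $mon_cmd callback standing in for qemu-server's QMP helper. 'query-status' and 'cont' are standard QMP commands; the real error handling is more involved.

    use strict;
    use warnings;

    sub resume_if_postmigrate {
        my ($vmid, $was_paused_before, $mon_cmd) = @_;

        my $status = $mon_cmd->($vmid, 'query-status')->{status} // '';
        return if $status ne 'postmigrate';

        if ($was_paused_before) {
            # no direct way back to PAUSED, so just report it
            warn "VM $vmid was paused before migration, not resuming\n";
            return;
        }
        $mon_cmd->($vmid, 'cont');  # resume the initially running VM
    }
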
Fabian Ebner
0028391f95 migrate: add log for guest fstrim
and make a failure noticeable.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-04-26 11:42:25 +02:00
Fabian Ebner
a183576e30 migrate: keep VM paused after migration if it was before
We also cannot issue a guest agent command in that case.

Reported in the community forum:
https://forum.proxmox.com/threads/106618

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-04-21 08:57:11 +02:00
Fabian Grünbichler
e594231bf1 migrate: move tunnel-helpers to pve-guest-common
besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-02-09 18:49:55 +01:00
Fabian Grünbichler
82a0367149 move map_storage to PVE::JSONSchema::map_id
since we are going to reuse the same mechanism/code for network bridge
mapping and pve-container.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-02-09 18:46:20 +01:00
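
A minimal sketch of map_id-style semantics, assuming an optional hash map with identity fallback; the real PVE::JSONSchema::map_id may differ in signature and validation.

    use strict;
    use warnings;

    # Return the mapped ID if the map has an entry for it, otherwise pass the
    # source ID through unchanged.
    sub map_id {
        my ($map, $source) = @_;
        return $source if !defined($map);
        return $map->{$source} // $source;
    }

    my $storagemap = { 'local-lvm' => 'remote-lvm' };
    print map_id($storagemap, 'local-lvm'), "\n";   # remote-lvm
    print map_id($storagemap, 'ceph-pool'), "\n";   # ceph-pool (identity)
    print map_id(undef, 'vmbr0'), "\n";             # vmbr0 (no map at all)
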
Fabian Grünbichler
fd95d780a2 migrate: send updated TPM state volid to target node
The volid may change if local-storage migration is involved; we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.

Reported here: https://forum.proxmox.com/threads/99906/#post-431345

this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
  later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
  node (switched storage, volname already taken, ..)

because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-11-22 16:55:17 +01:00
Stefan Reiter
f9dde219f2 fix #3075: add TPM v1.2 and v2.0 support via swtpm
Starts an instance of swtpm per VM in its systemd scope; it will
terminate by itself when the VM exits, or be terminated manually if
startup fails.

Before first use, a TPM state is created via swtpm_setup. State is
stored in a 'tpmstate0' volume, treated much the same way as an efidisk.

It is migrated 'offline'; the important part here is the creation of the
target volume, while the actual data transfer happens via the QEMU device
state migration process.

Move-disk can only work offline, as the disk is not registered with
QEMU, so 'drive-mirror' wouldn't work. swtpm itself has no method of
moving a backing storage at runtime.

For backups, a bit of a workaround is necessary (this may later be
replaced by NBD support in swtpm): During the backup, we attach the
backing file of the TPM as a read-only drive to QEMU, so our backup
code can detect it as a block device and back it up as such, while
ensuring consistency with the rest of disk state ("snapshot" semantic).

The name for the ephemeral drive is specifically chosen as
'drive-tpmstate0-backup', diverging from our usual naming scheme with
the '-backup' suffix, to avoid it ever being treated as a regular drive
from the rest of the stack in case it gets left over after a backup for
some reason (shouldn't happen).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-10-05 06:51:02 +02:00
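
For illustration, starting a per-VM swtpm instance that QEMU can attach to looks roughly like this. The paths are made up, the flags come from the upstream swtpm tooling as documented, and the actual qemu-server code derives everything from the 'tpmstate0' volume and the VM's systemd scope.

    use strict;
    use warnings;

    my $vmid      = 100;                                   # hypothetical VM
    my $state_dir = "/var/lib/vz/images/$vmid/tpmstate";   # hypothetical path
    my $ctrl_sock = "/var/run/qemu-server/$vmid.swtpm";

    my @cmd = (
        'swtpm', 'socket',
        '--tpmstate', "dir=$state_dir",
        '--ctrl', "type=unixio,path=$ctrl_sock",
        '--tpm2',    # drop this flag for a TPM v1.2 instance
    );
    # QEMU then attaches via something like:
    #   -chardev socket,id=chrtpm,path=<ctrl socket>
    #   -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0
    system(@cmd) == 0 or die "starting swtpm failed\n";
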
Thomas Lamprecht
f8830c4d6e migrate: code style, use up to 100cc if it helps to reduce line-bloat
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-22 09:26:18 +02:00
Thomas Lamprecht
95b3583b5e migrate: simplify code and add comment
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-22 09:25:53 +02:00
Fabian Ebner
d213ba299d migrate: use correct target storage id for checks
The '--targetstorage' parameter does not apply to shared storages.

Example for a problem solved with the enabled check: given a VM with
images only on a shared storage 'storeA' that is not available on the
target node (i.e. restricted by the nodes property), using
'--targetstorage storeB' would make offline migration suddenly
"work", but of course the disks would not be accessible, and trying
to migrate back would then fail...

Example for a problem solved with the content type check: if a
VM had a shared ISO image, and there was a '--targetstorage storeA'
option, availability of the 'iso' content type was checked for
'storeA', which is wrong, as the ISO would not be moved to that
storage.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-09-22 08:57:35 +02:00
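
Sketched below is the corrected choice of target storage id, assuming a simplified $scfg hash and storage map; the point is only that '--targetstorage' never applies to shared storages.

    use strict;
    use warnings;

    sub pick_targetsid {
        my ($sid, $scfg, $storagemap) = @_;
        return $sid if $scfg->{shared};          # shared: keep the original id
        return $storagemap->{$sid} // $sid;      # local: apply the mapping
    }

    my $map = { 'local-lvm' => 'storeB' };
    print pick_targetsid('storeA', { shared => 1 }, $map), "\n";  # storeA
    print pick_targetsid('local-lvm', {}, $map), "\n";            # storeB
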
Mira Limbeck
104f47a9f8 fix #2563: allow live migration with local cloud-init disk
The content of the ISO should be the same on both nodes, so offline
migrate the ISO, but don't regenerate it on VM start on the target node.

This way even with snippets the content will not change during live
migration.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2021-07-23 11:04:22 +02:00
Wolfgang Bumiller
205dbf39b1 allow migrating raw btrfs volumes
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-06-23 12:26:40 +02:00
Thomas Lamprecht
db861a4617 migrate prepare: make content type check generic
to avoid false positives, e.g., from an ISO on an ISO-only storage.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-23 12:15:43 +02:00
Thomas Lamprecht
8a5bd88907 migrate prepare: use also explicit variable for storecfg
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-23 12:15:16 +02:00
Fabian Ebner
24b84b4766 migrate: enforce that image content type is available
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold images.

Previously, the content type was only enforced when a storage map was present.

Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-21 11:17:48 +02:00
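
A minimal sketch of enforcing the 'images' content type, assuming a flattened view of the storage configuration; the real check goes through PVE::Storage and is not literally this code.

    use strict;
    use warnings;

    sub assert_images_content_type {
        my ($storeid, $scfg) = @_;
        my %content = map { $_ => 1 } split(/,/, $scfg->{content} // '');
        die "storage '$storeid' does not support content type 'images'\n"
            if !$content{images};
    }

    assert_images_content_type('local-lvm', { content => 'images,rootdir' }); # ok
    assert_images_content_type('local-iso', { content => 'iso,vztmpl' });     # dies
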
Fabian Ebner
0d2db08414 prefer storage_check_enabled over storage_check_node
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.

While not strictly necessary for a second call where only the storage differs,
e.g. in case of clone, it is more future-proof: if support for a target storage
is added at some point, it might be easy to miss adapting the call.

For the migration checks, the situation is improved by now always catching
disabled (target) storages.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-21 11:17:48 +02:00
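
The layering described above can be pictured as below; the signatures are simplified stand-ins, the real helpers live in PVE::Storage.

    use strict;
    use warnings;

    sub storage_check_node {
        my ($scfg, $storeid, $node) = @_;
        my $nodes = $scfg->{nodes};
        die "storage '$storeid' is not available on node '$node'\n"
            if $nodes && !$nodes->{$node};
        return $scfg;
    }

    sub storage_check_enabled {
        my ($scfg, $storeid, $node) = @_;
        die "storage '$storeid' is disabled\n" if $scfg->{disable};
        return storage_check_node($scfg, $storeid, $node);
    }

    my $scfg = { nodes => { node1 => 1 } };
    storage_check_enabled($scfg, 'storeA', 'node1');   # passes
    storage_check_enabled($scfg, 'storeA', 'node2');   # dies: not available
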
Fabian Ebner
692f604bb0 Revert "revert spice_ticket prefix change in 7827de4"
This reverts commit ff09c795ed. We wanted to wait
until PVE 7.0 for the change to not break migration new -> old until then.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 14:56:10 +02:00
Thomas Lamprecht
8f43ac4893 Revert "migration: do not set default speed limit"
The default was changed for 5.2, so while it is not 32 MiB/s anymore,
it is still 128 MiB/s, which I did not notice on my 1 Gbps (or < 125
MiB/s) setup. For users with links faster than one gigabit it now did
some limiting - so set up a very high limit so that even 100G should
not max this out.

This reverts commit a89bd10084.
2021-04-29 15:48:21 +02:00
Fabian Ebner
9938d24df2 migrate: fix memory migration start time
The variable is only ever used for calculating the average speed of the
memory migration, but it was already set before disk mirroring. Since the
disk sizes are not included in the calculation, this resulted in (very)
wrong values.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-23 15:00:44 +02:00
Thomas Lamprecht
b68a957b2e migration: keep log rate steady if polling gets more frequent
Either we're done in a few seconds anyway, or if the VM dirties lots
of pages we need quite a bit of time, and then it does not help to
output roughly the same status 10 times a second...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 22:08:19 +02:00
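
A minimal sketch of the steady log rate, assuming a one-line-per-second target regardless of how often the migration status is polled.

    use strict;
    use warnings;

    my $last_log = 0;

    # Print a status line at most once per second, no matter how frequently
    # the caller polls QEMU for migration statistics.
    sub maybe_log_status {
        my ($line) = @_;
        my $now = time();
        return if $now - $last_log < 1;
        $last_log = $now;
        print "$line\n";
    }

    maybe_log_status("transferred 1.0 GiB of 8.0 GiB");   # printed
    maybe_log_status("transferred 1.1 GiB of 8.0 GiB");   # suppressed (< 1s later)
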
Thomas Lamprecht
0fca250af0 migration: rework logging to be more human friendly, less spammy
* use render_bytes where possible, so the printed units are quick to
  read and grasp
* xbzrle is only interesting if pages/bytes are actually sent using
  it, so only log in that case
* log if the VM dirties more than we send
* log the current speed we get from QEMU

In general, fewer lines are logged and huge integers are avoided.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 21:54:37 +02:00
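
A rough, assumed equivalent of a render_bytes-style helper (as I understand it, the real one lives in pve-common): pick the largest binary unit that keeps the number readable.

    use strict;
    use warnings;

    sub render_bytes {
        my ($bytes) = @_;
        my @units = ('B', 'KiB', 'MiB', 'GiB', 'TiB');
        my $i = 0;
        while ($bytes >= 1024 && $i < $#units) {
            $bytes /= 1024;
            $i++;
        }
        return sprintf("%.1f %s", $bytes, $units[$i]);
    }

    print render_bytes(4096), "\n";          # 4.0 KiB
    print render_bytes(3 * 1024**3), "\n";   # 3.0 GiB
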
Thomas Lamprecht
e693c49190 migration: factor out variable + code cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 21:51:21 +02:00
Thomas Lamprecht
7de328c629 migration: log: s/migration_caps/migration capabilities/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 21:48:31 +02:00
Thomas Lamprecht
a89bd10084 migration: do not set default speed limit
the claim that QEMU limits this to 32M otherwise is bogus, at least
with any current QEMU version...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 21:46:52 +02:00
Thomas Lamprecht
6539865a9d migration: refactor and tidy-up code
Use an early die so that the rest can lose an indentation level for
the actual migration status reporting code.

Extract commonly used members of the stat hash for shorter code.

Use `git show -w --word-diff=color --word-diff-regex='\w+'` to get
a better view of the actual changes.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-19 14:59:54 +02:00
Fabian Ebner
0783c3c271 migration: move finishing block jobs to phase2 for better/uniform error handling
This avoids the possibility of dying during phase3_cleanup; instead of
needing to duplicate the cleanup ourselves, we benefit from phase2_cleanup
doing so.

The duplicate cleanup was also very incomplete: it didn't stop the remote kvm
process (leading to 'VM already running' when trying to migrate again
afterwards), but it removed its disks, and it didn't unlock the config, didn't
close the tunnel and didn't cancel the block-dirty bitmaps.

Since migrate_cancel should do nothing after the (non-storage) migrate process
has completed, even that cleanup step is fine here.

Since phase3 is empty at the moment, the order of operations is still the same.

Also add a test that would have complained about finish_tunnel not being
called before this patch. That test also checks that local disks are not
already removed before finishing the block jobs.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
a6be63ac9b migration: split out replication from scan_local_volumes
and avoid one loop over the config, by extending foreach_volid to include the
drivename.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
4b26ffbfa5 migration: keep track of replicated volumes via local_volumes
by extending filter_local_volumes.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
efe0d457c6 migration: use storage_migration for checks instead of online_local_volumes
Like this we don't need to worry about auto-vivification.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
eb5751ba02 migration: cleanup_remotedisks: simplify and include more disks
Namely, those migrated with storage_migrate, by using the information from
volume_map. Call cleanup_remotedisks in phase1_cleanup as well, because that's
where we end up if sync_offline_local_volumes fails, and some disks might
already have been transferred successfully. Note that the local disks are
still here, so this is fine.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
ad8b9d5e2d migration: simplify removal of local volumes and get rid of self->{volumes}
This also changes the behavior to remove the local copies of offline migrated
volumes only after the migration has finished successfully (this is relevant
for mixed settings, e.g. online migration with unused/vmstate disks).

local_volumes contains both the volumes previously in $self->{volumes}
and the volumes in $self->{online_local_volumes}, and hence is the place
to look for which volumes we need to remove. Of course, replicated
volumes still need to be skipped.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
efbbe59da4 migration: add nbd migrated volumes to volume_map earlier
and avoid a little bit of duplication by creating a helper.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
c3417e3b6e migration: save targetstorage and bwlimit in local_volumes hash and re-use information
It is enough to call get_bandwidth_limit once for each source_storage.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
2c4ba4c3ee migration: fix calculation of bandwidth limit for non-disk migration
The case with:
1. no generic 'migration' limit from the storage plugin
2. a migrate_speed limit in the VM config
was broken. It would assign 0 to migrate_speed when picking the minimum value
and then fall back to the default value. Fix it by checking if bwlimit is 0
before picking the minimum.

Also, make it a bit more readable by avoiding the trick of //-assigning bwlimit
before the units match up and relying on getting back the original bwlimit value
as the minimum. Instead, only ||-assign after the units match up and don't rely
on other things.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
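
A minimal sketch of the corrected minimum-picking, assuming the units already match; 0 (or undef) stands for "unlimited" and must not win the minimum as it did before the fix. The function name is made up for illustration.

    use strict;
    use warnings;

    sub effective_bwlimit {
        my ($storage_limit, $vm_migrate_speed) = @_;

        my $limit = $storage_limit;   # may be undef or 0, i.e. unlimited
        if ($vm_migrate_speed) {
            $limit = $limit
                ? ($limit < $vm_migrate_speed ? $limit : $vm_migrate_speed)
                : $vm_migrate_speed;
        }
        return $limit || 0;           # 0 = no limit at all
    }

    print effective_bwlimit(0, 100), "\n";    # 100 (the broken code ended up with 0)
    print effective_bwlimit(50, 100), "\n";   # 50
    print effective_bwlimit(0, 0), "\n";      # 0, truly unlimited
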
Fabian Ebner
3276a43470 migration: split out config_update_local_disksizes from scan_local_volumes
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
62a4c963b8 migration: avoid re-scanning all volumes
by using the information obtained in the first scan. This
also makes sure we only scan local storages.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
Fabian Ebner
d10b78f4d2 migration: split sync_disks into two functions
by making local_volumes class-accessible. One function is for scanning all
local volumes and one is for actually syncing offline volumes via
storage_migrate. The exception is replicated volumes; their sync still
happens during the scan for now.

Also introduce a filter_local_volumes helper to make life easier.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-04-18 18:30:41 +02:00
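
A sketch of a filter_local_volumes-style helper, assuming local_volumes maps volids to a hash carrying a migration mode; the real structure in QemuMigrate.pm is richer.

    use strict;
    use warnings;

    sub filter_local_volumes {
        my ($local_volumes, $mode) = @_;
        return sort grep {
            !defined($mode) || $local_volumes->{$_}->{migration_mode} eq $mode
        } keys %$local_volumes;
    }

    my $local_volumes = {
        'local-lvm:vm-100-disk-0'    => { migration_mode => 'online' },
        'local:100/vm-100-state.raw' => { migration_mode => 'offline' },
    };
    print join(', ', filter_local_volumes($local_volumes, 'offline')), "\n";
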
Fabian Ebner
eb3acec88a migration: sort volumes migrated with storage_migrate
Having a deterministic order here is useful for testing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
7d730f953c migration: factor out starting remote tunnel
so it can be mocked when testing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
27fa645e66 use new move_config_to_node method
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
e219712561 deactivate volumes after storage_migrate
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-24 16:19:35 +01:00
Fabian Ebner
78bd57d9c3 adapt to new storage_migrate activation behavior
Offline migrated volumes are now activated within storage_migrate.
Online migrated volumes can be assumed to be already active.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-24 16:19:29 +01:00
Fabian Ebner
19ff368213 don't migrate replicated VM whose replication job is marked for removal
While it didn't actually fail, we probably want to avoid the following behavior:

With remove_job=full:
    * run_replication called during migration causes the replicated volumes to
      be removed
    * migration continues by fully copying all volumes

With remove_job=local:
    * run_replication called during migration causes the job (and local
      replication snapshots) to be removed
    * migration continues by fully copying all volumes and renaming them to
      avoid collision with the still existing remote volumes

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-09 10:08:22 +01:00