Commit Graph

Dietmar Maurer
46883f80f6 Revert "Integrate replica in the qemu migration."
This reverts commit 63d02c7074.

The commit changes the configuration before the VM is actually
migrated, so it is possible to end up with a wrong configuration
when migration fails for some reason. Also, I am quite unsure if
this automatic target change is really wanted. The patch also
contains wrong references to $self->{opts}->{node}.
2017-05-06 10:39:43 +02:00
Dietmar Maurer
b1c12185fb Revert "migrate: cleanup replica volume skip condition"
This reverts commit 6e8044dcea.
2017-05-06 10:38:06 +02:00
Wolfgang Bumiller
6e8044dcea migrate: cleanup replica volume skip condition 2017-04-28 10:34:46 +02:00
Wolfgang Link
63d02c7074 Integrate replica in the qemu migration.
Now it is possible to migrate a VM offline when replica is enabled.
This reduces the amount of data to replicate to a minimum.
2017-04-28 10:11:33 +02:00
Alexandre Derumier
d80ad67f9d live storage migration : fix check of target storage availability
if we define a different target storeid for the remote node,
that storage may not be available on the source node

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-04-21 12:05:36 +02:00
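A minimal sketch of the corrected check, assuming a storage_check_node-style helper that dies when a storage is unavailable on a given node (helper and variable names are illustrative, not the exact patch):

```perl
# The source volume's storage must exist on the source node, but a
# different targetstorage only has to exist on the *target* node.
my $targetsid = $opts->{targetstorage} // $sid;

PVE::Storage::storage_check_node($storecfg, $sid, $source_node);
PVE::Storage::storage_check_node($storecfg, $targetsid, $target_node);
```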
Fabian Grünbichler
877e2ea746 migrate: clarify comment
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
Fabian Grünbichler
28412ae488 migrate: cleanup nbd source disks earlier
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
Fabian Grünbichler
504105c638 fix #1338: migrate: stop nbd before resuming
Since Qemu 2.9, block device write access is limited to one
writer unless shared_rw is set to true. There is an
exception for live-migrating local disks via NBD as long as
the VM is suspended.

Accordingly, stop the NBD server before resuming the VM to
unbreak local disk live-migration.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-04-21 11:43:29 +02:00
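A minimal sketch of the fixed ordering, assuming a generic qmp_cmd() monitor helper; nbd-server-stop and cont are standard QMP commands:

```perl
# Qemu >= 2.9 allows only one writer per block device, and the NBD
# exception holds only while the VM is suspended. So tear down the NBD
# server on the target first ...
qmp_cmd($vmid, 'nbd-server-stop') if $self->{storage_migration};

# ... and only then resume the target VM.
qmp_cmd($vmid, 'cont');
```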
Fabian Grünbichler
8b54f4b8db defined() style cleanup 2017-02-28 12:46:47 +01:00
Wolfgang Link
9045f57a27 Check that the array exists before use.
This triggers if a qemu guest has a local unused disk.
Such a disk is migrated by offline disk migration, so it is not in target_drives.
2017-02-28 12:33:27 +01:00
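The guard amounts to something like this sketch (target_drives as in the message; the surrounding loop and field names are assumed):

```perl
# A local unused disk is moved by offline disk migration, so it never
# gets an entry in target_drives; guard before dereferencing.
if (defined($self->{target_drives}) && defined($self->{target_drives}->{$ds})) {
    my $target = $self->{target_drives}->{$ds};
    # ... live-mirror handling for $ds ...
}
```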
Alexandre Derumier
56af714629 add with-local-disks option for live storage migration
As Fabian requested, add an extra flag "with-local-disks" to enable
live storage migration with local disks.

The default target storage is the same sid as the source; this can be
overridden with the "targetstorage" option.

I will try to improve this later with optional per-disk mapping.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-01-06 12:10:25 +01:00
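A sketch of the resulting opt-in guard and storage defaulting (variable layout assumed; option names as in the message):

```perl
# Refuse live migration with local disks unless explicitly requested.
if ($self->{running} && scalar(@local_disks) && !$self->{opts}->{'with-local-disks'}) {
    die "can't live migrate VM with local disks without the 'with-local-disks' option\n";
}

# Default target storage is the same sid as the source, unless overridden.
my $targetsid = $self->{opts}->{targetstorage} // $sid;
```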
Wolfgang Bumiller
bd2d5fe6ff cleanup: error messages 2017-01-05 10:03:16 +01:00
Wolfgang Bumiller
3b4cf0f0fc cleanup: whitespaces & style 2017-01-05 10:03:10 +01:00
Alexandre Derumier
b74cad8ae3 add live storage migration with vm migration
This allows migrating disks on local storage to a remote node's storage.

When the target VM starts, new volumes are created and exposed through
qemu's embedded NBD server.

qemu drive-mirror is launched on the source VM for each disk, with the
NBD server as target.

When drive-mirror reaches 100% for one disk, we do not complete the
block job yet and begin mirroring the next disk. (The mirror jobs run
in parallel, but we start them one by one to avoid storage and network
overload.)

Then we live migrate the VM to the destination node, while the
drive-mirror jobs keep running.

Once the VM is live migrated (source VM paused, target VM paused), we
complete the block-job mirrors.

When that is done, we stop the source VM and resume the target VM.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2017-01-05 09:09:46 +01:00
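A condensed sketch of the flow described above, with an assumed qmp_cmd() helper and simplified bookkeeping; the real implementation in QemuMigrate.pm differs in detail:

```perl
# Start one drive-mirror job per local disk, each targeting the NBD
# export the destination VM created for it. Jobs are left in the
# "ready" state (not completed) while the next disk starts.
foreach my $ds (sort keys %$target_drives) {
    qmp_cmd($vmid, 'drive-mirror',
        device => "drive-$ds",
        target => $target_drives->{$ds}->{nbd_uri},
        sync   => 'full',
        format => 'raw',
        mode   => 'existing',
    );
    wait_until_ready($vmid, "drive-$ds");  # poll query-block-jobs (assumed helper)
}

# RAM migration runs while the mirrors keep the disks in sync.
live_migrate($vmid, $migrate_uri);         # assumed helper

# With source and target paused, complete all mirrors, then stop the
# source VM and resume the target.
qmp_cmd($vmid, 'block-job-complete', device => "drive-$_")
    for sort keys %$target_drives;
```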
Dominik Csapak
b3205b153e allow migration of local qcow2 snapshots
we can migrate local snapshots when on zfs, or on dir storage with
qcow2, but the check was incorrect

we checked for if (zfs && !qcow2) instead of if (zfs || qcow2)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-12-05 12:32:50 +01:00
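In essence, with $is_zfs/$is_qcow2 as stand-ins for the real storage-type and format checks:

```perl
# before (wrong): only ZFS volumes that are not qcow2 passed
# my $migratable = $is_zfs && !$is_qcow2;

# after: snapshots are migratable on ZFS, or as qcow2 on dir storage
my $migratable = $is_zfs || $is_qcow2;
die "cannot migrate local snapshot(s)\n" if !$migratable;
```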
Thomas Lamprecht
2de2d6f74e allow dedicated migration network, bug #1177
Without this patch we use the network where the cluster traffic runs
for sending migration traffic. This is not ideal as it may hinder
cluster traffic. Further, some users have a powerful network which
would be perfect for migrations; with this patch they can run the
migration traffic over such a network without having the corosync
traffic on the same network.

The network is configurable through /etc/pve/datacenter.cfg, which
got a new property, namely migration. migration has two
subproperties: type (replaces the old migration_unsecure property)
and network.

For the case of a network failure, or when a VM has to be moved over
another network for arbitrary other reasons, I added the
migration_type and migration_network parameters to qm migrate (and
respectively to vm_start, as this gets used on migration).
They allow overriding the datacenter.cfg settings.

Fixes bug #1177

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-11-03 09:51:23 +01:00
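For illustration, such a datacenter.cfg entry could look like the following; the CIDR is an example, and the exact property syntax is documented in datacenter.cfg(5):

```
# /etc/pve/datacenter.cfg
migration: type=secure,network=10.20.30.0/24
```

A per-migration override would then use the new parameters, e.g. qm migrate <vmid> <node> --migration_type insecure (parameter names per the message; exact CLI form assumed).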
Fabian Grünbichler
3a7bc9e252 forbid migration of template with local base image 2016-09-15 14:15:09 +02:00
Fabian Grünbichler
5bf7f0f1a8 collect errors from all local volumes
and then die with more meaningful output, instead of on the
first encountered error.
2016-06-30 11:55:21 +02:00
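The accumulate-then-die pattern looks roughly like this sketch (check_volume() is an assumed per-volume check):

```perl
# Collect one message per problematic volume instead of dying on the
# first hit, then report everything at once.
my $errors = {};
foreach my $volid (sort keys %$local_volumes) {
    eval { check_volume($volid) };
    $errors->{$volid} = $@ if $@;
}
if (scalar(keys %$errors)) {
    my $msg = join("", map { "  $_: $errors->{$_}" } sort keys %$errors);
    die "can't migrate VM - checks failed:\n$msg";
}
```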
Fabian Grünbichler
dabf24736c add comments and rename volhash 2016-06-30 11:55:21 +02:00
Fabian Grünbichler
4abdd867df switch order of disk checks
to make log message more meaningful.
'storage' < 'snapshot' < 'config'
2016-06-30 11:55:21 +02:00
Fabian Grünbichler
d62fcf74a7 collect and log origin of found local volumes
just knowing that local disks prevent a migration is not
very helpful, so be a bit more verbose here.
2016-06-30 11:55:21 +02:00
Fabian Grünbichler
2a2127bd6d drop unnecessary cdromhash 2016-06-17 16:28:07 +02:00
Fabian Grünbichler
98d80cb67b use foreach_drive instead of foreach_volid
foreach_volid recurses over snapshots as well, resulting in
lots of repeated checks (especially for VMs with lots of
snapshots and disks).

a potential vmstate volume must be checked explicitly,
because foreach_drive does not care about those.
2016-06-17 16:27:25 +02:00
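The shape of the change, sketched with an assumed per-volume check; foreach_drive visits only the drives of the current config, so snapshot entries are not re-checked:

```perl
# Walk only the current config's drives (no snapshot recursion) ...
PVE::QemuServer::foreach_drive($conf, sub {
    my ($ds, $drive) = @_;
    check_volume($ds, $drive->{file});   # assumed helper
});

# ... and handle a vmstate volume explicitly, since foreach_drive
# does not visit it.
check_volume('vmstate', $conf->{vmstate}) if $conf->{vmstate};
```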
Fabian Grünbichler
86638cc2dc fix whitespace/indent 2016-06-17 16:24:16 +02:00
Fabian Grünbichler
89719f9887 don't repeat storage check for each volid 2016-06-17 16:23:49 +02:00
Wolfgang Link
b6adff3385 fix perl scope issues
Add a parameter array to foreach_volid so it can be used in the functions.
Correct typos.
2016-06-16 11:26:37 +02:00
Dietmar Maurer
3629c19d23 add check for snapshots at migration
We cannot migrate snapshots on local disks, for example lvmthin snapshots.
2016-06-16 10:21:57 +02:00
Wolfgang Link
c4d2d6c15c Add LVM and LVMThin to QemuMigration
Offline migration on LVM and LVMThin is now possible.
2016-06-16 08:14:33 +02:00
Thomas Lamprecht
e858e9d241 do not open forward tunnel on insecure migrations
Restore previous behaviour and do not request a forward tunnel on
insecure migrations.

For migrations of all kinds this has no direct impact, they all
worked, but requesting one port too many from a limited pool is still
not ideal, as is keeping a tunnel open when it is not needed.

This is a slight regression introduced by commit 1c9d54b.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-06 14:51:34 +02:00
Thomas Lamprecht
54323eed5f migrate: unlink unix socket before starting migration
Just to be sure nobody else has (wrongfully) left that file here.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 16:02:25 +02:00
Thomas Lamprecht
f34d146679 migrate: add some more log output
Output all errors - if any - and add some log output on which QMP
commands we issue with which parameters; this may be helpful when
debugging or analyzing a user's problem.

Also check if the queried status is defined, as on an error it may
not be.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 12:00:50 +02:00
Thomas Lamprecht
92437b8de0 migrate: close tunnel after dest. VM stopped on error
On error, let phase2_cleanup close the tunnel, as it first stops the
VM waiting on the destination for the incoming migration, to be safe.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 12:00:25 +02:00
Thomas Lamprecht
1c9d54bfd0 migrate: use ssh forwarded UNIX socket tunnel
We cannot guarantee when the SSH forward tunnel really becomes
ready. The check with the mtunnel API call did not help with this
problem, as it only checked that the SSH connection itself works and
that the destination node has quorum, but the forwarded tunnel itself
was not checked.

The forward tunnel is a different channel in the SSH connection,
independent of the SSH `qm mtunnel` channel, so even if the latter
works, that does not guarantee that our migration tunnel is up and
ready.

When the node(s) were under load, or when we did parallel
migrations (migrateall), the migrate command was often started
before a tunnel was open and ready to receive data. This led to
a direct abortion of the migration and is the main reason why
parallel migrations often leave two thirds or more of the VMs on the
source node.
The issue was tracked down to SSH after debugging the QEMU
process; enabling debug logging showed that the tunnel often became
available and ready too late, or not at all.

Fixing the TCP forward tunnel is quirky and not straightforward; the
only possibility SSH offers is to use -N (no command),
-f (background) and -o "ExitOnForwardFailure=yes", so that it
waits in the foreground until the tunnel is ready and only then
backgrounds itself. This is not quite the nicest way for our special
use case and our code base.
Waiting for the local port to become open and ready (through
/proc/net/tcp[6]) as a proof of concept is not enough: even if the
port is in the listening state and should theoretically accept
connections, this still failed often, as the tunnel was not yet fully
ready.

Further, another problem would remain even if we patched the
SSH forward method we currently use - which we solve for free with
the approach of this patch - namely that the method
for getting an available port (next_migration_port) has a serious race
condition which could lead to multiple uses of the same port in a
parallel migration (I observed this in my many tests; seldom, but when
it happens it is really bad).

So let's now use UNIX sockets, which ssh supports since version 6.7.
The endpoints are UNIX sockets bound to the VMID - thus no port, so
no race, and also no limitation of available ports (we reserved 50 for
migration).

The endpoints get created in /run/qemu-server/VMID.migrate, and as
KVM/QEMU in current versions can use UNIX sockets just as well
as TCP, we do not have to change much in the interaction with QEMU.
QEMU is started with the migrate_incoming url at the local
destination endpoint and creates the socket file; we then create
a listening socket on the source side and connect over SSH to the
destination.
Now the migration can be started by issuing the migrate qmp command
with an updated uri.

This breaks live migration from new to old, but *not* from old to
new, so there is an upgrade path.
If a live migration from new to old must be made (for whatever
reason), use the migration_unsecure setting (see datacenter.cfg)
to allow this, although that should only be done in a trusted network.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 11:51:46 +02:00
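The forwarding itself boils down to an ssh -L with socket paths instead of ports (OpenSSH with UNIX domain socket forwarding required). A rough sketch of the command assembly; exact options in the real code may differ:

```perl
# Same socket path on both sides, derived from the VMID: no TCP port
# allocation, hence no port race and no 50-port limit.
my $sock = "/run/qemu-server/$vmid.migrate";
unlink $sock;  # make sure no stale socket is in the way

# Destination QEMU listens on unix:$sock (migrate_incoming); the source
# side gets a listening socket forwarded to it over SSH.
my @tunnel_cmd = ('/usr/bin/ssh', '-o', 'ExitOnForwardFailure=yes',
                  '-L', "$sock:$sock", "root\@$target_ip", 'qm', 'mtunnel');

# The migration is then started with the qmp command:
#   migrate unix:$sock
```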
Thomas Lamprecht
61b04c6d5a migrate: collect migration tunnel child process
use waitpid with WNOHANG to check if the ssh tunnel child process
is still running, and collect it at the same time if it exited.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 11:47:13 +02:00
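A minimal sketch of the non-blocking collection, assuming the tunnel child PID is kept in the migration state:

```perl
use POSIX qw(:sys_wait_h);

# waitpid() with WNOHANG returns the PID once the child exited (and
# reaps it), and 0 while it is still running.
my $res = waitpid($tunnel->{pid}, WNOHANG);
if ($res == $tunnel->{pid}) {
    delete $tunnel->{pid};  # exited and collected
}
```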
Wolfgang Link
674051dcac fix typo 2016-06-02 09:59:51 +02:00
Fabian Grünbichler
e1fc368d6b fix typos 2016-05-04 10:47:23 +02:00
Fabian Grünbichler
73f5ee92af fix #971: don't activate shared storage in offline migration
instead, just print a warning if the connection check fails.
as long as the storage is online on the target node, the VM
will start fine after migration.
2016-05-04 10:47:15 +02:00
Fabian Grünbichler
29701766ae migrate: check if storage is available 2016-05-04 10:47:04 +02:00
Fabian Grünbichler
ffda963f46 Refactor basic config-related methods
Drop load_config, write_config, lock_config[_xx],
check_lock, check_protection, is_template and config_file
in favour of implementations in PVE::AbstractConfig.

Implement guest_type, __config_max_unused_disks,
config_file_lock and cfs_config_path from
PVE::AbstractConfig in PVE::QemuConfig.
2016-03-08 11:41:59 +01:00
Fabian Grünbichler
8317c759bf Drop skiplock from write_config
Since write_config was always called with skiplock=1 except
once, it makes sense to drop this parameter like in
PVE::LXC::write_config . If needed in the future, the
caller can use check_lock before write_config anyway.
2016-02-12 12:16:57 +01:00
Fabian Grünbichler
63be43a947 Refactor update_config_nolock -> write_config
The method update_config wrapped update_config_nolock
using lock_config, but to prevent update races the whole
"read config", "do something", "write config" flow was
always protected by lock_config anyway, and update_config
was never called.

Thus, we can safely drop update_config and rename
update_config_nolock to write_config like in PVE::LXC .
2016-02-12 12:14:52 +01:00
Wolfgang Link
386c6ba7f5 close tunnel after migration is finished.
If we do not close it, there is a chance that the tunnel stays open and the next migration will not work.
2016-02-02 18:16:18 +01:00
Alexandre Derumier
42dbd2ee30 add qemu_machine_pxe
return the machine name with a .pxe suffix if a NIC with a pxe romfile exists

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-11-06 10:51:14 +01:00
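Conceptually the helper does something like this sketch (the real code may also query the current machine type via QMP first):

```perl
# Append ".pxe" to the machine type when any NIC is configured with a
# pxe romfile, so the target starts with matching ROMs.
sub qemu_machine_pxe {
    my ($vmid, $conf, $machine) = @_;

    foreach my $opt (keys %$conf) {
        next if $opt !~ m/^net\d+$/;
        if ($conf->{$opt} =~ m/romfile=/) {
            $machine .= '.pxe';
            last;
        }
    }
    return $machine;
}
```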
Alexandre Derumier
7bac824e19 use qom-get to check if pxe files are used V2
fix qemu 2.4 pxe -> qemu 2.4 efi

Changelog: forgot to add a check on the qom-get result

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-11-06 07:55:07 +01:00
Wolfgang Bumiller
407e0b8bef migration: improve ipv6 case
Qemu parses hostnames in brackets correctly but sets an ipv6
flag for them as if they were ipv6 addresses, so only insert
brackets for actual ipv6 addresses.
2015-11-06 07:53:03 +01:00
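The distinction can be as simple as this sketch; a raw IPv6 address always contains a colon, while hostnames and IPv4 addresses do not:

```perl
# Bracket only literal IPv6 addresses; bracketed hostnames would be
# flagged as IPv6 by qemu.
my $migrate_addr = ($ip =~ /:/) ? "[$ip]" : $ip;
```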
Alexandre Derumier
289e0b8564 migrate : add nocheck for resume
Users have reported a resume bug when HA is used.

There seems to be a small race (benchmarks show >0s, <1s) between the VM config file
move on the source node, its replication to the target, and the resume on the target node.

I don't know why this happens only with HA; maybe it occurs with standard migration too.

Anyway, we don't need to read the VM config file to resume the VM on the target host,
as we are sure that the VM is migrated and that the config file move was handled correctly in the cluster.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-10-15 12:41:13 +02:00
Wolfgang Bumiller
2fbd27eabc migration: put the source address in brackets
Always adding brackets around the address works. They're required for
ipv6 and qemu also accepts them for ipv4 and hostnames.
2015-05-21 17:30:30 +02:00
Wolfgang Bumiller
af0eba7e35 pass port family to next_*_port() calls 2015-05-12 12:28:56 +02:00
Wolfgang Link
adf8ac08c8 implement offline migration on zfs
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
2015-04-27 10:42:59 +02:00
Wolfgang Link
37a6dc7809 fix bug #618: correct typo
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
2015-04-27 10:42:49 +02:00