remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.
the main differences to regular intra-cluster migration are:
- source VM config and disks are only removed upon request via --delete
- shared storages are treated like local storages, since we can't
assume they are shared across clusters (with potential to extend this
by marking storages as shared)
- NBD migrated disks are explicitly pre-allocated on the target node via
tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself are
transformed via a user-defined mapping (see the sketch after this list)
- all commands and migration data streams are sent via a WS tunnel proxy
- pending changes and snapshots are discarded on the target side (for
the time being)
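for illustration only, such a user-defined mapping can be thought of as a
simple lookup table; the structure and key names below are made up and do
not reflect the actual parameter format:

    # purely illustrative sketch of such a mapping; the actual parameter
    # names and format are defined by the API, not here
    use strict;
    use warnings;

    my $mapping = {
        vmid     => { 100 => 123 },                  # source VMID => target VMID
        storages => { 'local-lvm' => 'other-lvm' },  # source storage => target storage
        bridges  => { 'vmbr0' => 'vmbr1' },          # source bridge => target bridge
    };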
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the following two endpoints are used for migration on the remote side
POST /nodes/NODE/qemu/VMID/mtunnel
which creates and locks an empty VM config, and spawns the main qmtunnel
worker which binds to a VM-specific UNIX socket.
this worker handles JSON-encoded migration commands coming in via this
UNIX socket:
- config (set target VM config)
-- checks permissions for updating config
-- strips pending changes and snapshots
-- sets (optional) firewall config
- disk (allocate disk for NBD migration)
-- checks permission for target storage
-- returns drive string for allocated volume
- disk-import, query-disk-import, bwlimit
-- handled by PVE::StorageTunnel
- start (returning migration info)
- fstrim (via agent)
- ticket (creates a ticket for a WS connection to a specific socket)
- resume
- stop
- nbdstop
- unlock
- quit (+ cleanup)
this worker serves as a replacement for both 'qm mtunnel' and various
manual calls via SSH. the API call will return a ticket valid for
connecting to the worker's UNIX socket via a websocket connection.
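as a rough, hypothetical sketch of the command protocol (socket path,
message framing and key names are assumptions, and in practice the socket
is only reachable via the mtunnelwebsocket endpoint rather than directly):

    use strict;
    use warnings;
    use IO::Socket::UNIX;
    use JSON;

    # connect to the (assumed) worker socket and request a ticket for a
    # further websocket-forwarded connection
    my $sock = IO::Socket::UNIX->new(Peer => '/run/qemu-server/123.mtunnel')
        or die "connect failed: $!\n";
    print {$sock} encode_json({ cmd => 'ticket', path => '/run/qemu-server/123.migrate' }), "\n";
    my $reply = decode_json(scalar <$sock>);
    print "got ticket: $reply->{ticket}\n" if $reply->{ticket};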
GET+WebSocket upgrade /nodes/NODE/qemu/VMID/mtunnelwebsocket
gets called for connecting to a UNIX socket via websocket forwarding,
i.e. once for the main command mtunnel, and once each for the memory
migration and each NBD drive-mirror/storage migration.
access is guarded by a short-lived ticket binding the authenticated user
to the socket path. such tickets can be requested over the main mtunnel,
which keeps track of socket paths currently used by that
mtunnel/migration instance.
each command handler should check privileges for the requested action if
necessary.
both the mtunnel and mtunnelwebsocket endpoints are not proxied; the
client/caller is responsible for ensuring that the passed 'node' parameter
and the endpoint handling the call match.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
from GuestHelpers. This function checks all necessary permissions and
raises an exception if the user does not have the correct ones.
This is necessary for the new 'privileged' tags and 'user-tag-access'
permissions to work.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The permission checks for editing Cloud-init drives were inconsistent:
adding a drive required the VM.Config.Disk permission, while removing
one required VM.Config.CDROM. The regex in drive_is_cloudinit needed to
be adapted since the drive names have different formats before/after
they are actually generated.
Because the regex previously let names fall through, Cloud-init drives
were being checked as disks, even though they are actually treated as
CDROM drives. It therefore makes more sense to check for
VM.Config.CDROM instead, while also requiring VM.Config.Cloudinit, since
generating a Cloud-init drive already generates default values that are
passed to the VM.
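A sketch of the resulting check (the helper name, surrounding setup and
module path of drive_is_cloudinit are illustrative, not the verbatim code):

    use strict;
    use warnings;
    use PVE::QemuServer::Drive;

    # illustrative helper, not the actual implementation: require both
    # VM.Config.CDROM and VM.Config.Cloudinit for cloud-init drives
    sub check_cloudinit_drive_perm {
        my ($rpcenv, $authuser, $vmid, $drive) = @_;
        return if !PVE::QemuServer::Drive::drive_is_cloudinit($drive);
        $rpcenv->check($authuser, "/vms/$vmid", ['VM.Config.CDROM', 'VM.Config.Cloudinit']);
    }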
Signed-off-by: Leo Nunner <l.nunner@proxmox.com>
- The forced-remove flag wasn't really used AFAICT and makes
no sense IMO.
- Whether or not we care about non-MAC changes does not
belong here, but should instead be taken into account in the
actual hotplug path recording the cloud-init state (iow.
into $cloudinit_record_changed().)
(This is not done here atm.)
- It seems much simpler to just have (sketched below):
* 'old' = the old value if it's not a new value
* 'new' = the new value unless it's being deleted
* If only one of them is set it's an addition or removal.
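A sketch of the resulting per-key structure (variable name, keys and
values chosen purely for illustration):

    # illustration only: one entry per changed cloud-init relevant key
    my $cloudinit_pending = {
        net0 => {
            old => 'virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0',
            new => 'virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr1',
        },
        ipconfig0 => {
            new => 'ip=dhcp',         # only 'new' set: an addition
        },
        sshkeys => {
            old => 'ssh-ed25519 ...', # only 'old' set: a removal
        },
    };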
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This allows regenerating the config drive with one API call.
It also avoids having to delete the drive first and recreate it again.
As it's a read-only drive, we can simply live-update it,
and eject/replace it via the QEMU monitor.
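A rough sketch of the monitor side (the QMP commands exist, but whether
these exact commands, drive id, path and format are what the
implementation uses is an assumption):

    use PVE::QemuServer::Monitor qw(mon_cmd);
    use JSON;

    # sketch only: force-eject the old cloud-init image and insert the
    # regenerated one; drive id and path are assumptions
    my ($vmid, $new_image) = (123, '/var/lib/vz/images/123/vm-123-cloudinit.qcow2');
    mon_cmd($vmid, 'eject', id => 'drive-ide2', force => JSON::true);
    mon_cmd($vmid, 'blockdev-change-medium',
        id       => 'drive-ide2',
        filename => $new_image,
        format   => 'qcow2',
    );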
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
While the clamping already happens before setting the actual systemd
CPU{Shares, Weight}, it can be done here too, to avoid writing new
out-of-range values into the config.
Can't use a validator enforcing this because existing out-of-range
values should not become errors upon parsing the config.
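The clamping itself is a one-liner; sketched here (the bounds shown are
the cgroup v2 CPUWeight range and only an example):

    use strict;
    use warnings;

    # sketch: clamp a configured value into [$min, $max] instead of
    # rejecting it
    sub clamp {
        my ($value, $min, $max) = @_;
        return $min if $value < $min;
        return $max if $value > $max;
        return $value;
    }

    print clamp(500000, 1, 10000), "\n";    # prints 10000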
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
This possibly breaks existing workflows like
1. add second cloud-init
2. remove first cloud-init
to change the cloud-init storage.
On the other hand, it avoids the unintended misconfiguration of having
multiple cloud-init drives with potentially different settings.
Also in preparation for adding cloud-init-related API calls, where
not being able to assume that there's only one cloud-init drive/state
would complicate things quite a bit.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Prevent the user from suspending the VM at all: while the suspension
itself may finish, the saved state is incomplete, as we can neither
save nor restore PCIe device state in any generic fashion, so
resuming will almost certainly break.
The single case where it could work is when the guest OS didn't use
the passed-through device at all, so there is no state, but that's
really odd (why bother passing it through then?), and the user should
rather remove the hostpci entry in that case.
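Sketch of such a check (helper name and error message are illustrative,
not the actual code):

    use strict;
    use warnings;

    # hypothetical helper: refuse suspend-to-disk if any hostpci device
    # is configured
    sub assert_no_hostpci_for_suspend {
        my ($conf) = @_;
        for my $key (keys %$conf) {
            die "cannot suspend VM with passed-through PCI device(s)\n"
                if $key =~ /^hostpci\d+$/;
        }
    }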
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ T: reword commit message slightly ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
if the configured display hardware has the (optional) default type, but
some other attribute is set, this would match against `undef` and spew
lots of warnings in the logs.
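the fix essentially amounts to not comparing a possibly-undefined type;
a minimal sketch (hash layout and default value are assumptions):

    use strict;
    use warnings;

    my $display = { memory => 32 };          # 'type' not set => default applies
    my $type = $display->{type} // 'std';    # fall back instead of matching undef
    print "vga type: $type\n";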
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case) conditions as
fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In preparation for allowing certain parameters to be passed along
together with 'archive'. Move the parameter checks to after the
conflicts-with-'archive' check, to ensure that the more telling error
will trigger first.
All check helpers should handle empty params fine, but check first
just to make sure and to avoid all the superfluous function calls.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In the spirit of c75bf16 ("qm importdisk: tell user to what VM disk we
actually imported"), and so that the information is not lost once qm
importdisk switches to re-using the API call.
Added for cloudinit too, because a new disk is allocated.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
via the special syntax <storeid>:<size>.
Not worth it by itself, but this is anticipating a new 'import-from'
parameter which is only used upon import/allocation, but shouldn't be
part of the schema for the config or other API endpoints.
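for reference, a simplified sketch of how such a value splits into
storage and size (not the actual schema/parsing code):

    use strict;
    use warnings;

    # simplified sketch: "<storeid>:<size>" => storage id and size in GiB
    my $spec = 'local-lvm:32';
    if ($spec =~ m/^([^:]+):(\d+(?:\.\d+)?)$/) {
        my ($storeid, $size_gib) = ($1, $2);
        print "allocate ${size_gib} GiB on storage '$storeid'\n";
    }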
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Drive keys are sorted when cloning and 'tpmstate0' comes late, so it
was likely that potentially large disks were already copied just to be
removed again, because of the TPM state restriction at the end.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
otherwise users might get confused if they just get a message about a
migrate lock not being available..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) VM. Simply wait a few seconds for any ongoing
replication.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
and also when source and target drivename are different. In those
cases, it is done via qemu-img convert/dd.
In preparation to allow import from existing PVE-managed disks.
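as an illustration of that fallback (a sketch, not the actual helper;
formats and paths are made up):

    use PVE::Tools qw(run_command);

    # sketch: convert the source image into the freshly allocated target
    # volume when the volume can't simply be renamed/moved
    run_command([
        'qemu-img', 'convert', '-p',
        '-f', 'qcow2', '-O', 'raw',
        '/path/to/source.qcow2',
        '/path/to/target.raw',
    ]);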
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It's confusing that the config associated to the destination is
actually a reference to the source config for both existing callers.
Also, disk import will need to base the calculation on the passed-in
drive parameters and not just the current config, so this change is in
preparation for that too.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
While the new options should be written to the pending config, the
decisions (currently only one) in create_disks need to be made based
on the current config.
Seems to fix EFI disk creation, but actually, it's only
future-proofing, because, currently, the same OVMF_VARS file is
used independently of $smm.
The correct config is also needed to determine the correct size for
the EFI disk for the upcoming import-from feature.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
For creation, activation and size update never triggered, because the
passed-in $conf is essentially the same as the creation $settings, so
the disk was always detected to be the same as the "existing" one. But
actually, all disks are new, so it makes sense to do it.
For update, activation and size update nearly always triggered,
because only the pending changes are passed in as $conf. The case
where they didn't trigger is when the same pending change was made
twice (there are cases where hotplug isn't done, but that makes it
even more unlikely).
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
'force-cpu' parameter was introduced to allow live-migration of VMs with
custom CPU models; it does not need to be allowed for general use on
vm_start for regular users, since they would be able to set arbitrary
cpu types or cpuid parameters that aren't supported.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
else this fails if we check 'boot' before the device was put into
the config or pending section.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
using the familiar early+repeated checks pattern from other API calls.
Only intended functional changes are with regard to locking/forking.
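for reference, a condensed sketch of that pattern (the PVE::QemuConfig
calls are real, the check body and surrounding flow are illustrative):

    use PVE::QemuConfig;

    my $vmid = 100;    # illustrative

    my $run_checks = sub {
        my ($conf) = @_;
        die "VM is locked\n" if $conf->{lock};
    };

    # early check without the lock, for a fast and friendly error
    $run_checks->(PVE::QemuConfig->load_config($vmid));

    # ... fork worker ...

    # repeat the checks with the config lock held, since the config may
    # have changed in the meantime
    PVE::QemuConfig->lock_config($vmid, sub {
        my $conf = PVE::QemuConfig->load_config($vmid);
        $run_checks->($conf);
        # actual work would happen here
    });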
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
using the familiar early+repeated checks pattern from other API calls.
Only intended functional changes are with regard to locking/forking.
For a full clone of a running VM without guest agent, this also fixes
issuing vm_{resume,suspend} calls for drive mirror completion.
Previously, those just timed out, because of not getting the lock:
> create full clone of drive scsi0 (rbdkvm:vm-104-disk-0)
> Formatting '/var/lib/vz/images/105/vm-105-disk-0.raw', fmt=raw
> size=4294967296 preallocation=off
> drive mirror is starting for drive-scsi0
> drive-scsi0: transferred 2.0 MiB of 4.0 GiB (0.05%) in 0s
> drive-scsi0: transferred 635.0 MiB of 4.0 GiB (15.50%) in 1s
> drive-scsi0: transferred 1.6 GiB of 4.0 GiB (40.50%) in 2s
> drive-scsi0: transferred 3.6 GiB of 4.0 GiB (90.23%) in 3s
> drive-scsi0: transferred 4.0 GiB of 4.0 GiB (100.00%) in 4s, ready
> all 'mirror' jobs are ready
> suspend vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
> drive-scsi0: Cancelling block job
> drive-scsi0: Done.
> resume vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The volid may change if local-storage migration is involved; we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.
Reported here: https://forum.proxmox.com/threads/99906/#post-431345
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
node (switched storage, volname already taken, ..)
because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
only do the compat fallback if no explicit spice ticket was given, and
warn on unknown parameters on STDIN.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this error path is mostly used for re-attaching disks and the like,
and the "check if task is already done" part uses a method to read
the task status that will never include a trailing newline, so add it
ourselves to avoid "... at /usr/share/perl5/PVE/API2/Qemu.pm line
1480. (500)"
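for background, perl only appends the "at <file> line <n>." suffix when
the die message does not already end in a newline:

    use strict;
    use warnings;

    eval { die "task failed" };
    print $@;      # e.g. "task failed at example.pl line 4."

    eval { die "task failed\n" };
    print $@;      # "task failed" - passed through as-is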
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
to re-use them for incoming remote migrations.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Using $update_vm_api for unused disks will cause them to end up as a
pending change if the VM is running.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
this broke with the previous simplification.
Tested-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>