Commit Graph

2784 Commits

Author SHA1 Message Date
Thomas Lamprecht
7585466269 vm destroy: allow opt-out of purging unreferenced disks
Since an old change released with a version bump on 2009-09-07, we
search all enabled storages for VMID maching volumes on VM removal
and purge those too.

This has multiple pitfalls and may be quite unexpected for some
users.

It can make problems when:
* on recovery a VM is created, before disks are reattached the admin
  notices some settings issues and chooses to just recreate the VM;
  but during destroying the dummy VM all related disks get destroyed
  unconditionally which may result in data loss. This actually
  happened and is the original reason for the decision to change
  this.

* a storage is shared between PVE instance (between a set of clusters
  and/or single nodes), while this is against our rules it may still
  come as a surprise if destroying a VM on node A may destroy
  unrelated and unreferenced disks on the unrelated node B without
  asking or allowing to avoid that.

As this the removal of matching but unreferenced disks can result in
permanent data loss (up to the last backup) and may be to subtle and
unforgiving, allow to opt-out of it.

In the long run we want to make this opt-in, but that is an API
change and so needs to wait for next major release. But, we can adapt
the GUI already to make it opt-in there, catching most of the cases.

side-note: CT do not have this behavior at all

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-01-25 15:22:46 +01:00
Thomas Lamprecht
4405703d89 qm: import ovf: removed all imported disks on error
While the do_import method cleans up the current disk it was
importing on any error the following cases are not handled:
* multiple disks, first few succeed then one fails, only the last
  failed one was taken care of before this patch
* error after the import disk loop was not handled

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-01-25 14:58:57 +01:00
Mira Limbeck
1b485263b3 fix drive-mirror completion with cloudinit
On clone_vm when cloning the disks while the VM is running, we use
drive-mirror. We skip completion until the last disk, but with a
cloudinit disk there's no drive-mirror and so no completion done. If it
is the last disk in the hash, we never complete the drive-mirror jobs
and no further cloning is possible as there are already active jobs
using the disks.

To fix it we have to call qemu_drive_mirror_monitor directly in the case
of cloudinit when completion is requested and there are jobs defined.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2021-01-25 14:30:35 +01:00
Gilles Pietri
211785ee50 audio: add the none audio backend
Signed-off-by: Gilles Pietri <contact+dev@gilouweb.com>
2021-01-12 12:30:31 +01:00
Thomas Lamprecht
0435f8798c buildsys: clean: remove migration test runtime files
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-01-12 12:01:41 +01:00
Fabian Ebner
c97a9c6ed8 tests: mock storage locking for migration tests
by doing it in a local directory instead of /var/lock/pve-manager, which is
used by the installed/non-test PVE code. This also covers the shared case,
which will become relevant after fixing #3229 (currently migration doesn't
touch disks on shared storages).

Reported-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-01-12 11:56:55 +01:00
Aaron Lauterer
0a4aff09bd improve description of fstrim_cloned_disks
The phrasing left some room for speculation when this would be triggered.
E.g. after cloning a full VM?

Currently the only instances where it is used is when a disk is moved or
a VM migrated.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-12-18 17:55:29 +01:00
Fabian Ebner
0494299f57 tests: allow running migration tests in parallel
It's not easily possible to use separate JSON files for the test configuration,
because part of it is generated with perl code. While this could be encoded too,
it seems cleaner to use the "run a single test by specifing the name"
functionality while adding a way for make to obtain a list of the test names.

Each test has (and needs) its own directory now, meaning the log files do not
need to be renamed anymore.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-18 17:47:27 +01:00
Thomas Lamprecht
16782e5fdf bump version to 6.3-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-12-16 08:35:37 +01:00
Mira Limbeck
c997e24af5 fix cloning/restoring of cloudinit disks in raw format
We only added the format extension when it was not 'raw'. But on file level
storages we always require it. To fix this, always add the format
extension if the storage provides the 'path' property.
This is the same logic we use in create_disks for cloudinit disks.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2020-12-15 16:17:32 +01:00
Thomas Lamprecht
eabd73492a d/control: bump versioned dependency on libpve-guest-common-perl (>= 3.1-3)
for new move VM config helper

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-12-15 15:56:49 +01:00
Fabian Ebner
48831384b8 create test environment for migration
and the associated parts for 'qm start'.

Each test will first populate the MigrationTest/run directory
with the relevant configuration files and files keeping track of the
state of everything necessary. Second, the mock-script for migration
is executed, which in turn will execute the 'qm start' mock-script
(if it's an online test that gets far enough). The scripts will simulate
a migration and update the relevant files in the MigrationTest/run directory.
Finally, the main test script will evaluate the state.

The main checks are the volume IDs on the source and target and the VM
configuration itself. Additional checks are the vm_status and expected_calls,
keeping track if certain calls have been made.

The rationale behind creating two mock-scripts is two-fold:
1. It removes the need to hard code responses for the tunnel
   and to recycle logic for determining and allocating migration volumes.
   Some of that logic already happens in the API part, so it was necessary
   to mock the whole CLI-Handler.
2. It allows testing the code relevant for migration in 'qm start' as well,
   and it should even be possible to test different versions of the
   mock-scripts against each other. With a bit of extra work and things
   like 'git worktree', it might even be possible to automate this.

The helper get_patched config is useful to change pre-defined configuration
files on the fly, avoiding the new to explicitly define whole configurations to
test for something in many cases.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
eb3acec88a migration: sort volumes migrated with storage_migrate
Having a deterministic order here is useful for testing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
7d730f953c migration: factor out starting remote tunnel
so it can be mocked when testing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
27fa645e66 use new move_config_to_node method
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 15:21:37 +01:00
Fabian Ebner
b5688f69a0 clone_disk: fix offline clone of efidisk
by partially reverting 4df98f2f14 and fixing the
line-length issue differently. The commit didn't update two later usages of
$size, breaking copying the efidisk. The other usage as a parameter to
qemu_img_convert() is luckily only cosmetic, for progress output.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-12-15 14:50:06 +01:00
Thomas Lamprecht
e00319af4d api: adapt VM destroy description
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-12-15 14:49:41 +01:00
Dominic Jäger
3eecc92525 qm destroy: Extend --purge description
Add replication jobs & HA. This makes the enumeration complete.

Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
2020-12-15 14:46:14 +01:00
Thomas Lamprecht
6711e689c6 bump version to 6.3-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-12-03 18:08:04 +01:00
Thomas Lamprecht
82d30d5acf d/control: bump versioned dependency for libpve-common-perl
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-12-03 17:26:24 +01:00
Dominik Csapak
fbec3f894a use get_repository from PVE::PBSClient
this fixes the issue that we did not generate the correct repository
url for pbs storages that contained an ipv6 address or a port

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-12-03 17:25:32 +01:00
Thomas Lamprecht
012b520e5b bump version to 6.3-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-25 14:33:38 +01:00
Dominik Csapak
e8e0fd93bf qmeventd: flush after verbose printing
if one would try to use -v in a systemd service, systemd would disable
line buffering for stdout and no output would happen (until the buffer
is full)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-11-25 14:33:38 +01:00
Thomas Lamprecht
3bae384f75 clone disk: avoid errors after disk was moved by QEMU
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-25 14:18:23 +01:00
Thomas Lamprecht
1b987638a8 api: cleanup code format of clone_disk call
showing off it's monstrosity of a method signature, needs to be
cleaned up in a followup commit

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-25 14:18:23 +01:00
Thomas Lamprecht
a2af1bbe89 add and use get_qga_key
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-25 14:18:23 +01:00
Fabian Grünbichler
e5b18771b8 status: skip query-proxmox-support if VM is offline
otherwise pvestatd will print lots of warnings..

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-25 11:26:37 +01:00
Stefan Reiter
6891fd70ed print query-proxmox-support result in 'full' status
Extends print_recursive_hash for the CLI to handle JSON booleans so the
result will actually show up in 'qm status --verbose'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-24 17:20:56 +01:00
Fabian Grünbichler
1b535ca9f9 d/control: bump versioned dependency on pve-storage
for 'activate_volumes in storage_migrate', which we now rely on in
migration code

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-24 16:28:08 +01:00
Fabian Ebner
e219712561 deactivate volumes after storage_migrate
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-24 16:19:35 +01:00
Fabian Ebner
78bd57d9c3 adapt to new storage_migrate activation behavior
Offline migrated volumes are now activated within storage_migrate.
Online migrated volumes can be assumed to be already active.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-24 16:19:29 +01:00
Alexandre Derumier
6cbd3eb82c systemd scope: add CPUWeight for cgroupv2 2020-11-24 12:00:38 +01:00
Alexandre Derumier
5b65b00d04 replace cgroups_write by cgroup change_cpu_shares && change_cpu_quota 2020-11-24 12:00:38 +01:00
Wolfgang Bumiller
114d2e765a add PVE::QemuServer::Cgroup
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-11-24 12:00:33 +01:00
Thomas Lamprecht
ff0721517d bump version to 6.2-20
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-12 17:09:44 +01:00
Thomas Lamprecht
c0d096321a stop qmeventd after pve-guests and pve-ha-lrm services
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-developed-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-12 17:06:33 +01:00
Fabian Ebner
19ff368213 don't migrate replicated VM whose replication job is marked for removal
while it didn't actually fail, we probably want to avoid the behavior:

With remove_job=full:
    * run_replication called during migration causes the replicated volumes to
      be removed
    * migration continues by fully copying all volumes

With remove_job=local:
    * run_replication called during migration causes the job (and local
      replication snapshots) to be removed
    * migration continues by fully copying all volumes and renaming them to
      avoid collision with the still existing remote volumes

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-09 10:08:22 +01:00
Fabian Ebner
c2c96d7378 fix checks for transfering replication state/switching job target
In some cases $self->{replicated_volumes} will be auto-vivified
to {} by checks like
next if $self->{replicated_volumes}->{$volid}
and then {} would evaluate to true in a boolean context.

Now the replication information is retrieved once in prepare,
and used to decide whether to make the calls or not.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-09 10:08:22 +01:00
Fabian Ebner
68980d6626 Repeat check for replication target in locked section
No need to warn twice, so the warning from the outside check
was removed.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-09 10:08:22 +01:00
Thomas Lamprecht
1749a376dc bump version to 6.2-19
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 13:37:03 +01:00
Stefan Reiter
8e0c97bbbf fix vm_resume and allow vm_start with QMP status 'shutdown'
When the VM is in status 'shutdown', i.e. after the guest issues a
powerdown while a backup is running, QEMU requires a 'system_reset' to
be issued before 'cont' can boot the guest again.

Additionally, when the VM has been powered down during a backup, the
logically correct call would be a 'vm_start', so automatically vm_resume
from vm_start in case this situation occurs. This also means the GUI can
cope with this almost unchanged.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
27b25d037e config_to_command: use -no-shutdown option
Ignore shutdowns triggered from within the guest in favor of detecting
them via qmeventd and stopping the QEMU process that way.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
962d4d647d vzdump: use dirty bitmap for not running VMs too
Now that VMs can be started during a backup, it makes sense to create a
dirty bitmap in these cases too, since the VM might be resumed and thus
continue running normally even after the backup is done.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
4ac842cbab vzdump: connect to qmeventd for duration of backup
Connect and send the vmid of the VM being backed up. This prevents
qmeventd from SIGTERMing the underlying QEMU instance, even if the guest
shuts itself down, until we close the socket connection (in cleanup,
which happens on success and abort, or if we crash the file handle will
be closed as well).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
4c500f1696 qmeventd: add last-ditch effort SIGKILL cleanup
'alarm' is used to schedule an additionaly cleanup round 5 seconds after
sending SIGTERM via terminate_client. This then sends SIGKILL via a
pidfd (if supported by the kernel) or directly via kill, making sure
that the QEMU process is *really* dead and won't be left behind in an
undetermined state.

This shouldn't be an issue under normal circumstances, but can help
avoid dead processes lying around if QEMU hangs after SIGTERM.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
3ff8500175 qmeventd: add handling for -no-shutdown QEMU instances
We take care of killing QEMU processes when a guest shuts down manually.
QEMU will not exit itself, if started with -no-shutdown, but it will
still emit a "SHUTDOWN" event, which we await and then send SIGTERM.

This additionally allows us to handle backups in such situations. A
vzdump instance will connect to our socket and identify itself as such
in the handshake, sending along a VMID which will be marked as backing
up until the file handle is closed.

When a SHUTDOWN event is received while the VM is backing up, we do not
kill the VM. And when the vzdump handle is closed, we check if the
guest has started up since, and only if it's determined to still be
turned off, we then finally kill QEMU.

We cannot wait for QEMU directly to finish the backup (i.e. with
query-backup), as that would kill the VM too fast for vzdump to send the
last 'query-backup' to mark the backup as successful and done.

For handling 'query-status' messages sent to QEMU, a state-machine-esque
protocol is implemented into the Client struct (ClientState). This is
necessary, since QMP works asynchronously, and results arrive on the
same channel as events and even the handshake.

For referencing QEMU Clients from vzdump messages, they are kept in a
hash table. This requires linking against glib.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Fabian Grünbichler
acfc6ef8e0 fix #3113: unbreak drive hotplug
by adding the missing argument (otherwise all the other ones are shifted
one slot to the left, which is of course bogus).

this has been broken since 2018 (d559309), but was only made
visible/caused a failure with the recent changes adding

use strict;
use warnings;

to PVE::QemuServer::PCI

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-05 10:29:21 +01:00
Thomas Lamprecht
8c9021cd69 bump version to 6.2-18
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-29 18:23:23 +01:00
Dominik Csapak
edae17185b partially fix #3056: try to cancel backup without uuid
if the 'backup' qmp call itself times out or fails, we still want to
try to cancel the backup, else it can happen that there is still
a backup running even when vzdump thinks it was canceled

qapi docs says that backup cancel always returns success, even
if no backup is running

since we hold a global and a per vm lock for the backup, this should be
ok, since we should not reach this code without that lock

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-10-29 18:19:06 +01:00
Stefan Reiter
acc10e5159 migrate: enable dirty-bitmap migration
We query QEMU if it's safe before enabling it, as on versions without
the necessary patches it not only would be useless, but can actually
lead to hangs.

PBS state is always migrated, as it's a small amount of data anyway, so
we don't need to set a specific flag for it.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-29 18:18:02 +01:00