Since an old change released with a version bump on 2009-09-07, we
search all enabled storages for VMID-matching volumes on VM removal
and purge those too.
This has multiple pitfalls and may be quite unexpected for some
users.
It can cause problems when:
* on recovery a VM is created and, before the disks are reattached, the
  admin notices some settings issue and chooses to just recreate the VM;
  but while destroying the dummy VM, all related disks get destroyed
  unconditionally, which may result in data loss. This actually
  happened and is the original reason for the decision to change
  this.
* a storage is shared between PVE instances (between a set of clusters
  and/or single nodes); while this is against our rules, it may still
  come as a surprise that destroying a VM on node A can destroy
  unrelated and unreferenced disks on the unrelated node B, without
  asking or allowing one to avoid that.
As this removal of matching but unreferenced disks can result in
permanent data loss (up to the last backup) and may be too subtle and
unforgiving, allow opting out of it.
In the long run we want to make this opt-in, but that is an API
change and so needs to wait for the next major release. However, we can
already adapt the GUI to make it opt-in there, catching most of the cases.
side-note: CTs do not have this behavior at all
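For illustration, the opt-out boils down to gating the extra storage scan
in destroy_vm behind a flag; the parameter and helper names below are
placeholders, not the actual API:

    # Sketch only: flag and helper names are assumptions.
    sub destroy_vm_volumes {
        my ($storecfg, $vmid, $conf, $opts) = @_;

        # disks referenced in the VM config are always removed
        destroy_referenced_volumes($storecfg, $vmid, $conf);

        # scanning all enabled storages for stray vm-<vmid>-* volumes only
        # happens when explicitly requested
        destroy_unreferenced_volumes($storecfg, $vmid)
            if $opts->{'destroy-unreferenced-disks'};
    }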
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
While the do_import method cleans up the current disk it was
importing on any error, the following cases are not handled:
* multiple disks: the first few succeed, then one fails; only the last,
  failed one was taken care of before this patch
* an error after the import-disk loop was not handled
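A sketch of the intended cleanup, tracking every disk imported so far and
freeing all of them on any later error (import_one_disk and finalize_import
are placeholders; PVE::Storage::vdisk_free is the existing free helper):

    my @imported_volids;
    eval {
        for my $disk (@disks_to_import) {
            my $volid = import_one_disk($disk);   # placeholder for the per-disk import
            push @imported_volids, $volid;
        }
        finalize_import();                        # errors after the loop are covered too
    };
    if (my $err = $@) {
        for my $volid (@imported_volids) {
            eval { PVE::Storage::vdisk_free($storecfg, $volid) };
            warn "cleanup of '$volid' failed - $@" if $@;
        }
        die $err;
    }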
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
On clone_vm, when cloning the disks while the VM is running, we use
drive-mirror. We skip completion until the last disk, but with a
cloudinit disk there's no drive-mirror and so no completion done. If it
is the last disk in the hash, we never complete the drive-mirror jobs
and no further cloning is possible as there are already active jobs
using the disks.
To fix it we have to call qemu_drive_mirror_monitor directly in the case
of cloudinit when completion is requested and there are jobs defined.
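Roughly, the clone loop gains a cloudinit branch like the following (helper
names and argument lists are simplified here and not authoritative):

    my $total = scalar(keys %$drives);
    my $i = 0;
    for my $opt (sort keys %$drives) {
        my $drive = $drives->{$opt};
        my $completion = (++$i == $total);  # only complete with the last disk

        if (drive_is_cloudinit($drive)) {
            # no drive-mirror job for cloudinit; if this happens to be the
            # last disk, complete the already running mirror jobs explicitly
            copy_cloudinit_disk($drive);    # placeholder for the actual copy
            qemu_drive_mirror_monitor($vmid, $newvmid, $jobs)  # args simplified
                if $completion && scalar(keys %$jobs);
            next;
        }

        # args simplified; the real call passes more parameters
        qemu_drive_mirror($vmid, $opt, $drive, $newvmid, $jobs, $completion);
    }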
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
by doing it in a local directory instead of /var/lock/pve-manager, which is
used by the installed/non-test PVE code. This also covers the shared case,
which will become relevant after fixing #3229 (currently migration doesn't
touch disks on shared storages).
Reported-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The phrasing left some room for speculation about when this would be
triggered. E.g. after cloning a full VM?
Currently the only instances where it is used are when a disk is moved or
a VM is migrated.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
It's not easily possible to use separate JSON files for the test configuration,
because part of it is generated with Perl code. While this could be encoded too,
it seems cleaner to use the "run a single test by specifying the name"
functionality while adding a way for make to obtain a list of the test names.
Each test has (and needs) its own directory now, meaning the log files do not
need to be renamed anymore.
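For illustration, the listing side can be as simple as printing one test
name per line so make can collect them (the option name and variable layout
here are assumptions):

    # Sketch: with e.g. `./run_tests.pl --list`, make could generate one
    # target per test and give each test its own log/run directory.
    if (($ARGV[0] // '') eq '--list') {
        print "$_->{name}\n" for @$tests;
        exit 0;
    }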
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
We only added the format extension when it was not 'raw'. But on file-level
storages we always require it. To fix this, always add the format
extension if the storage provides the 'path' property.
This is the same logic we use in create_disks for cloudinit disks.
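A rough sketch of that check (surrounding variable names are assumptions;
storage_config and the 'path' property are the existing PVE::Storage bits):

    # Sketch: append the format suffix whenever the target storage is
    # path/file based, mirroring what create_disks does for cloudinit disks.
    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
    my $name = "vm-$vmid-cloudinit";
    $name .= ".$fmt" if $scfg->{path};   # also for 'raw' on directory storages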
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
and the associated parts for 'qm start'.
Each test will first populate the MigrationTest/run directory
with the relevant configuration files and files keeping track of the
state of everything necessary. Second, the mock-script for migration
is executed, which in turn will execute the 'qm start' mock-script
(if it's an online test that gets far enough). The scripts will simulate
a migration and update the relevant files in the MigrationTest/run directory.
Finally, the main test script will evaluate the state.
The main checks are the volume IDs on the source and target and the VM
configuration itself. Additional checks are the vm_status and expected_calls,
keeping track of whether certain calls have been made.
The rationale behind creating two mock-scripts is two-fold:
1. It removes the need to hard code responses for the tunnel
and to recycle logic for determining and allocating migration volumes.
Some of that logic already happens in the API part, so it was necessary
to mock the whole CLI-Handler.
2. It allows testing the code relevant for migration in 'qm start' as well,
and it should even be possible to test different versions of the
mock-scripts against each other. With a bit of extra work and things
like 'git worktree', it might even be possible to automate this.
The helper get_patched_config is useful to change pre-defined configuration
files on the fly, avoiding the need to explicitly define whole configurations to
test for something in many cases.
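A minimal sketch of what such a helper can look like (the actual
implementation in the test code may differ):

    # Sketch: start from a pre-defined configuration and override or add
    # only the keys a single test cares about.
    sub get_patched_config {
        my ($vmid, $patch) = @_;

        my $config = { %{$base_configs->{$vmid}} };   # shallow copy of the pre-defined config
        for my $key (keys %{$patch // {}}) {
            if (defined($patch->{$key})) {
                $config->{$key} = $patch->{$key};
            } else {
                delete $config->{$key};               # undef in the patch removes a key
            }
        }
        return $config;
    }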
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by partially reverting 4df98f2f14 and fixing the
line-length issue differently. The commit didn't update two later usages of
$size, breaking the copying of the efidisk. The other usage, as a parameter to
qemu_img_convert(), is luckily only cosmetic, as it only affects the progress output.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
this fixes the issue that we did not generate the correct repository
URL for PBS storages that contained an IPv6 address or a port
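For illustration, the repository string needs to bracket raw IPv6 addresses
so their colons are not mistaken for the port/datastore separators (field
names follow the PBS storage configuration; the exact helper differs):

    # Sketch: PBS repositories look like "user@host:datastore" or
    # "user@host:port:datastore".
    my $server = $scfg->{server};
    $server = "[$server]" if $server =~ /:/ && $server !~ /^\[.*\]$/;   # raw IPv6

    my $repo = "$scfg->{username}\@$server";
    $repo .= ":$scfg->{port}" if $scfg->{port};
    $repo .= ":$scfg->{datastore}";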
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if one tries to use -v in a systemd service, stdout is no longer
line-buffered (it is not connected to a terminal there) and no output
happens until the buffer is full
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
showing off its monstrosity of a method signature; needs to be
cleaned up in a followup commit
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Extends print_recursive_hash for the CLI to handle JSON booleans so the
result will actually show up in 'qm status --verbose'.
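The boolean handling boils down to something like this (sketch; the real
print_recursive_hash handles more cases):

    use JSON;   # QMP results decode booleans into JSON::PP::Boolean objects

    # Sketch: render JSON booleans as 1/0 instead of skipping them, so
    # e.g. the 'running' flag actually shows up in the verbose output.
    sub format_scalar {
        my ($value) = @_;
        return $value ? 1 : 0 if JSON::is_bool($value);
        return $value;
    }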
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Offline migrated volumes are now activated within storage_migrate.
Online migrated volumes can be assumed to be already active.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
while it didn't actually fail, we probably want to avoid the behavior:
With remove_job=full:
* run_replication called during migration causes the replicated volumes to
be removed
* migration continues by fully copying all volumes
With remove_job=local:
* run_replication called during migration causes the job (and local
replication snapshots) to be removed
* migration continues by fully copying all volumes and renaming them to
avoid collision with the still existing remote volumes
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In some cases $self->{replicated_volumes} will be auto-vivified
to {} by checks like
    next if $self->{replicated_volumes}->{$volid}
and such an empty hash reference evaluates to true in a boolean context.
Now the replication information is retrieved once in prepare,
and used to decide whether to make the calls or not.
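A self-contained demonstration of the pitfall:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $self = {};   # e.g. no replication configured at all
    my $volid = 'local-zfs:vm-100-disk-0';

    # merely *reading* a nested key auto-vivifies the parent hash ...
    print "skip\n" if $self->{replicated_volumes}->{$volid};

    # ... so replicated_volumes now exists as {} ...
    print "vivified\n" if exists $self->{replicated_volumes};

    # ... and a hash reference, even an empty one, is true in boolean context
    print "looks replicated\n" if $self->{replicated_volumes};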
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
No need to warn twice, so the warning from the outside check
was removed.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
When the VM is in status 'shutdown', i.e. after the guest issues a
powerdown while a backup is running, QEMU requires a 'system_reset' to
be issued before 'cont' can boot the guest again.
Additionally, when the VM has been powered down during a backup, the
logically correct call would be a 'vm_start', so automatically call
vm_resume from vm_start in case this situation occurs. This also means
the GUI can cope with this almost unchanged.
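Sketched in terms of the monitor calls (simplified; the real vm_resume has
more checks and locking):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Sketch: a VM left in 'shutdown' state (guest powered down during a
    # backup) needs a system_reset before 'cont' boots the guest again.
    my $res = mon_cmd($vmid, 'query-status');
    if (($res->{status} // '') eq 'shutdown') {
        mon_cmd($vmid, 'system_reset');
    }
    mon_cmd($vmid, 'cont');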
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Ignore shutdowns triggered from within the guest in favor of detecting
them via qmeventd and stopping the QEMU process that way.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Now that VMs can be started during a backup, it makes sense to create a
dirty bitmap in these cases too, since the VM might be resumed and thus
continue running normally even after the backup is done.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Connect and send the vmid of the VM being backed up. This prevents
qmeventd from SIGTERMing the underlying QEMU instance, even if the guest
shuts itself down, until we close the socket connection (in cleanup,
which happens on success and abort; if we crash, the file handle is
closed as well).
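Schematically, the vzdump side does something like the following; the socket
path and the handshake format shown here are placeholders, not the actual
wire protocol:

    use IO::Socket::UNIX;
    use Socket qw(SOCK_STREAM);

    # Sketch: announce the VMID being backed up; keeping $sock open keeps
    # the "backing up" mark set, closing it (explicitly in cleanup, or
    # implicitly on a crash) clears it again.
    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_STREAM,
        Peer => '/var/run/qmeventd.sock',   # assumed socket path
    ) or die "unable to connect to qmeventd socket - $!\n";

    print {$sock} "vzdump $vmid\n";         # placeholder handshake format
    # ... run the backup, then close $sock in cleanup ...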
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
'alarm' is used to schedule an additional cleanup round 5 seconds after
sending SIGTERM via terminate_client. This then sends SIGKILL via a
pidfd (if supported by the kernel) or directly via kill, making sure
that the QEMU process is *really* dead and won't be left behind in an
undetermined state.
This shouldn't be an issue under normal circumstances, but can help
avoid dead processes lying around if QEMU hangs after SIGTERM.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
We take care of killing QEMU processes when a guest shuts down manually.
QEMU will not exit by itself if started with -no-shutdown, but it will
still emit a "SHUTDOWN" event, which we await and then send SIGTERM.
This additionally allows us to handle backups in such situations. A
vzdump instance will connect to our socket and identify itself as such
in the handshake, sending along a VMID which will be marked as backing
up until the file handle is closed.
When a SHUTDOWN event is received while the VM is backing up, we do not
kill the VM. When the vzdump handle is closed afterwards, we check whether
the guest has started up since, and only if it is determined to still be
turned off do we finally kill QEMU.
We cannot wait directly for QEMU to finish the backup (i.e. with
query-backup), as that would kill the VM too fast for vzdump to send the
last 'query-backup' to mark the backup as successful and done.
For handling 'query-status' messages sent to QEMU, a state-machine-esque
protocol is implemented into the Client struct (ClientState). This is
necessary, since QMP works asynchronously, and results arrive on the
same channel as events and even the handshake.
For referencing QEMU Clients from vzdump messages, they are kept in a
hash table. This requires linking against glib.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
by adding the missing argument (otherwise all the other ones are shifted
one slot to the left, which is of course bogus).
this has been broken since 2018 (d559309), but was only made
visible/caused a failure with the recent changes adding
use strict;
use warnings;
to PVE::QemuServer::PCI
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if the 'backup' QMP call itself times out or fails, we still want to
try to cancel the backup, else it can happen that there is still
a backup running even when vzdump thinks it was canceled.
The QAPI docs say that the backup cancel call always returns success,
even if no backup is running.
Since we hold a global and a per-VM lock for the backup, this should be
fine, as we should not reach this code without holding those locks.
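The resulting error handling is roughly (sketch; the QMP command names
follow the Proxmox-specific backup commands, and the 'backup' arguments are
represented by a placeholder hash):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Sketch: if starting the backup fails or times out, still try to
    # cancel whatever QEMU may have started; the cancel call is documented
    # to succeed even if no backup is running.
    my $res = eval { mon_cmd($vmid, 'backup', %$backup_opts) };
    if (my $err = $@) {
        eval { mon_cmd($vmid, 'backup-cancel') };
        warn "backup cancel failed - $@" if $@;
        die $err;
    }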
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
We query QEMU whether it's safe before enabling it, as on versions
without the necessary patches it would not only be useless, but could
actually lead to hangs.
PBS state is always migrated, as it's a small amount of data anyway, so
we don't need to set a specific flag for it.
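Schematically (treat the exact capability key reported by
'query-proxmox-support' as an assumption here):

    use JSON;
    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Sketch: only enable dirty-bitmap migration when QEMU reports that it
    # is safe; on unpatched versions it could lead to hangs.
    my $support = eval { mon_cmd($vmid, 'query-proxmox-support') } // {};
    if ($support->{'pbs-dirty-bitmap-migration'}) {
        mon_cmd($vmid, 'migrate-set-capabilities', capabilities => [
            { capability => 'dirty-bitmaps', state => JSON::true },
        ]);
    }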
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>