Commit Graph

69 Commits

Author SHA1 Message Date
Thomas Lamprecht
5f8a64ae59 Merge patch series "more robust handling of fleecing images"
Fiona Ebner <f.ebner@proxmox.com> says:

Record the created fleecing images in the VM configuration to be able
to remove left-overs after hard failures.

Adds a new special configuration section 'fleecing', making special
section handling more generic as preparation, as well as fixing some
corner cases in configuration parsing and adding tests.

Fiona Ebner (16):
  migration: remove unused variable
  test: avoid duplicate mock module in restore config test
  test: add parse config tests
  parse config: be precise about section type checks
  test: add test case exposing issue with unknown sections
  parse config: skip unknown sections and warn about their presence
  vzdump: anchor matches for pending and special sections
  vzdump: skip all special sections
  config: make special section handling generic
  test: parse config: test config with duplicate sections
  parse config: warn about duplicate sections
  check type: require schema as an argument
  config: add fleecing section
  fix #5440: vzdump: better cleanup fleecing images after hard errors
  migration: attempt to clean up potential left-over fleecing images
  destroy vm: clean up potential left-over fleecing images

Link: https://lore.proxmox.com/20250127112923.31703-1-f.ebner@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-07 14:13:56 +02:00
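
As a rough illustration of the series above, a left-over fleecing image recorded in the VM configuration would then live in its own special section, for example (the section header style follows the existing 'cloudinit' special section; the key name and value are assumptions, not verbatim from the patches):

    [special:fleecing]
    fleecing-images: local-lvm:vm-104-fleece-0
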
Fiona Ebner
a39866732f fix #5440: vzdump: better cleanup fleecing images after hard errors
By recording the allocated fleecing images in the VM config, they
are not immediately orphaned, should a hard error occur during
backup that prevents cleanup.

Cleanup of these left-over images is then attempted during the next
backup run.

In the cleanup helper, check whether fleecing images are still attached
in QEMU and detach them. This allows recovering from more failure
scenarios. However, to avoid a deadlock, a left-over backup job needs
to be canceled first. While a left-over backup is already canceled when
cleanup runs for a subsequent backup, this is required for other cases,
like cleanup before migration (to be added in a following commit).

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-15-f.ebner@proxmox.com
2025-04-07 14:13:01 +02:00
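
A rough sketch of the cleanup approach described in the commit above; the config key layout and helper names are illustrative, not the actual qemu-server code:

    # illustrative sketch only -- assumes PVE::Storage, PVE::QemuConfig and
    # PVE::QemuServer::Monitor::mon_cmd are available
    sub cleanup_fleecing_images {
        my ($vmid, $storecfg) = @_;

        my $conf = PVE::QemuConfig->load_config($vmid);
        my $fleecing = $conf->{'special-sections'}->{fleecing} // return;

        # a left-over backup job keeps the fleecing images busy, cancel it first
        eval { mon_cmd($vmid, 'backup-cancel') };

        for my $volid (split(/,/, $fleecing->{images} // '')) {
            eval {
                detach_fleecing_image($vmid, $volid);   # hypothetical helper: detach from QEMU if still attached
                PVE::Storage::vdisk_free($storecfg, $volid);
            };
            warn "failed to clean up fleecing image '$volid' - $@" if $@;
        }

        delete $conf->{'special-sections'}->{fleecing};
        PVE::QemuConfig->write_config($vmid, $conf);
    }
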
Fiona Ebner
63567c0c42 config: make special section handling generic
Collect special sections below a common 'special-sections' key in
preparation for introducing a new special section.

The special 'cloudinit' section was added at the top level of the
configuration structure, but it is cleaner to group special sections
together, similar to how snapshots are handled.

The 'cloudinit' key was already always initialized, so always
initializing the new 'special-sections' key should not cause issues,
provided all usages of 'cloudinit' are checked and adapted, which this
patch attempts to do.

Add compat handling for remote migration which might receive the
configuration hash from a node that does not yet have the changes.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-10-f.ebner@proxmox.com
2025-04-07 14:13:01 +02:00
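
In terms of the in-memory config hash, the change described above amounts to roughly the following (sketch; the exact key layout is inferred from the commit message):

    # before: the special 'cloudinit' section lived at the top level of the hash
    my $ci_old = $conf->{cloudinit};

    # after: special sections are grouped under one always-initialized key,
    # similar to how snapshots are grouped under $conf->{snapshots}
    my $ci_new   = $conf->{'special-sections'}->{cloudinit};
    my $fleecing = $conf->{'special-sections'}->{fleecing};   # added by a later patch in the series
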
Fiona Ebner
1afbefb2d0 config: add special class that prevents writing the configuration
To be used in the context of template backups, where a minimized
temporary configuration is created to start the VM in 'prelaunch'
mode. Issue a warning so that code paths where this happens are
noted and can be evaluated and adapted.

Since the code currently doesn't use blessed config objects, special
dispatching is needed to potentially defer to the new child class in
the write_config() method.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-02-18 15:06:04 +01:00
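
A minimal sketch of such a class (the package name and exact behavior are assumptions based on the commit message above):

    package PVE::QemuConfig::NoWrite;    # illustrative name

    use strict;
    use warnings;

    use base qw(PVE::QemuConfig);

    sub write_config {
        my ($class, $vmid, $conf) = @_;

        # refuse to persist the minimized temporary configuration and make
        # the offending code path visible
        warn "refusing to write temporary configuration of VM $vmid to disk\n";
        return;
    }

    1;
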
Markus Frank
5fc635cc6d migration: add check_non_migratable_resources function
The function checks for resources that cannot be migrated, snapshotted,
or suspended.

To run this function while the snapshot lock is active, the
pve-guest-common patch 'AbstractConfig: add abstract method to check for
resources preventing a snapshot.' is required.

Signed-off-by: Markus Frank <m.frank@proxmox.com>
2024-11-18 21:26:39 +01:00
Markus Frank
8082eb8ca1 config: define machine schema as property-string
Convert the machine parameter to a property-string and use the machine
type as the default key for backward compatibility.

Signed-off-by: Markus Frank <m.frank@proxmox.com>
2024-04-11 10:18:27 +02:00
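
Roughly, the schema change above looks like this sketch (the format name and the non-default fields are illustrative):

    use PVE::JSONSchema;

    # property-string format with the machine type as default key, so that
    # existing plain values like 'q35' or 'pc-i440fx-8.0' keep working
    my $machine_fmt = {
        type => {
            default_key => 1,
            type => 'string',
            format_description => 'machine type',
            description => 'Specifies the QEMU machine type.',
            optional => 1,
        },
        # further sub-properties can now be added here
    };
    PVE::JSONSchema::register_format('pve-qemu-machine-fmt', $machine_fmt);
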
Fabian Grünbichler
c1b2092d5e qemu_volume_snapshot_delete: drop (now) unused parameter
since we always determine the deviceid, passing in a possibly wrong value makes
no sense and could actually re-introduce bugs.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
2024-01-09 10:25:11 +01:00
Alexandre Derumier
7f8c808772 add memory parser
In preparation for adding more properties to the memory configuration,
like maximum hotpluggable memory and whether virtio-mem devices should
be used.

This also allows getting rid of the cyclic include of PVE::QemuServer
in the memory module.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[FE: also convert new usage in get_derived_property
     remove cyclic include of PVE::QemuServer
     add commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-18 17:08:48 +02:00
Aaron Lauterer
0b7a0b78db qemuserver: foreach_volid: always include pending disks
All calling sites except for QemuConfig.pm::get_replicatable_volumes()
already enabled it. Making it the non-configurable default results in a
change to VM replication: a disk image referenced only in the pending
section will now also be replicated.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2023-06-21 12:48:11 +02:00
Aaron Lauterer
6328c554c1 qemuserver: foreach_volid: include pending volumes
Make it possible to optionally iterate over disks in the pending section
of VMs, similar to how snapshots are already handled.

This is useful, for example, during migration if we don't want to rely
on scanning all storages.

All calling sites are adapted and enable it, except for
QemuConfig::get_replicatable_volumes, as including pending disks there
would change the replication behavior.

The following lists the calling sites and whether they should be fine
with the change (source [0]):

1. QemuMigrate: scan_local_volumes(): needed to include pending disk
   images
2. API2/Qemu.pm: check_vm_disks_local() for migration precondition:
   related to migration, so more consistent with pending
3. QemuConfig.pm: get_replicatable_volumes(): would change the
   replication behavior, so it will not use it for now.
4. QemuServer.pm: get_vm_volumes(): is used multiple times by:
4a. vm_stop_cleanup() to deactivate/unmap: should also be fine with
    including pending
4b. QemuMigrate.pm: in prepare(): part of migration, so more consistent
    with pending
4c. QemuMigrate.pm: in phase3_cleanup() for deactivation: part of
    migration, so more consistent with pending

[0] https://lists.proxmox.com/pipermail/pve-devel/2023-May/056868.html

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2023-06-21 12:48:11 +02:00
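
Usage then looks roughly like the following sketch (how exactly the pending option is passed is an assumption):

    use PVE::QemuServer;

    my @volumes;
    PVE::QemuServer::foreach_volid(
        $conf,
        sub {
            my ($volid, $attr) = @_;
            push @volumes, $volid;   # $attr carries the per-volume flags collected by the iterator
        },
        { 'include-pending' => 1 },  # option name/position are illustrative
    );
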
Fiona Ebner
3dd16f4a6c config: implement method to calculate derived properties from a config
See the corresponding commit in guest-common for more information.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-08 17:47:53 +02:00
Mira Limbeck
f9408af460 fix #4201: delete cloud-init disk on rollback
If the config no longer contains the cloud-init disk after the
rollback, we have to clean it up, since otherwise no further cloud-init
disk can be attached unless the one still existing on the storage is
deleted.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Reviewed-by: Stefan Hanreich <s.hanreich@proxmox.com>
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
2022-11-11 19:26:16 +01:00
Alexandre Derumier
ae37db2423 qemuconfig: load_current_config: delete cloudinit value
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-08 17:23:30 +01:00
Fabian Ebner
39b56b169a api: create disks: avoid adding secondary cloud-init drives
This will possibly break existing workflows like
1. add second cloud-init
2. remove first cloud-init
to change the cloud-init storage.

On the other hand, it avoids the unintended misconfiguration of having
multiple cloud-init drives with potentially different settings.

It is also in preparation for adding cloud-init-related API calls, where
not being able to assume that there's only one cloud-init drive/state
would complicate things quite a bit.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-09-12 18:04:59 +02:00
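
Conceptually, the guard described above boils down to something like this sketch (it reuses the existing drive helpers; the real check also has to consider pending changes):

    use PVE::QemuServer::Drive;

    # refuse to add a second cloud-init drive (sketch)
    for my $opt (keys %$conf) {
        next if !PVE::QemuServer::Drive::is_valid_drivename($opt);
        my $drive = PVE::QemuServer::Drive::parse_drive($opt, $conf->{$opt});
        die "only one cloud-init drive is allowed\n"
            if $drive && PVE::QemuServer::Drive::drive_is_cloudinit($drive);
    }
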
Fiona Ebner
d064764a70 snapshot: save VM state: propagate error from QEMU
So that there is a better chance to debug issues like the one in [0].
For suspending, which uses the same QMP calls, this is already done.

[0]: https://forum.proxmox.com/threads/114203/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-08-30 12:45:40 +02:00
Fabian Ebner
fe2c506926 snapshot: implement __snapshot_activate_storages
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-08 10:43:03 +01:00
Fabian Ebner
7a16336dd2 config: rollback is possible: add blockers parameter
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-08 10:34:00 +01:00
Stefan Reiter
27a5be5376 snapshot: set migration caps before savevm-start
A "savevm" call (both our async variant and the upstream sync one) use
migration code internally. As such, they both expect migration
capabilities to be set.

This is usually not a problem, as the default set of capabilities is ok,
however, it leads to differing snapshot settings if one does a snapshot
after a machine has been live-migrated (as the capabilities will persist
from that), which could potentially lead to discrepencies in snapshots
(currently it seems to be fine, but it still makes sense to set them to
safeguard against future changes).

Note that we do set the "dirty-bitmaps" capability now (if
query-proxmox-support reports true), which has three effects:

1) PBS dirty-bitmaps are preserved in snapshots, enabling
   fast-incremental backups to work after rollback (as long as no newer
   backups exist), including for hibernate/resume
2) snapshots taken from now on, with a QEMU version supporting bitmap
   migration, *might* lead to incompatibility of these snapshots with
   QEMU versions that don't know about bitmaps at all (i.e. < 5.0 IIRC?)
   - forward compatibility is still given, and all other capabilities we
   set go back to very old versions
3) since we now explicitly disable bitmap saving if the version doesn't
   report support, we avoid crashes even with not-updated QEMU versions

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-16 20:44:51 +01:00
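
The capability handling above corresponds to a QMP call along these lines (sketch; the field checked in query-proxmox-support is an assumption):

    use JSON;                                  # for JSON::true / JSON::false
    use PVE::QemuServer::Monitor qw(mon_cmd);

    # only enable bitmap migration for the state save if QEMU reports support
    my $support = eval { mon_cmd($vmid, 'query-proxmox-support') } // {};
    my $has_bitmap_support = $support->{'pbs-dirty-bitmap-savevm'};   # field name is illustrative

    mon_cmd($vmid, 'migrate-set-capabilities', capabilities => [
        { capability => 'dirty-bitmaps', state => $has_bitmap_support ? JSON::true : JSON::false },
    ]);
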
Thomas Lamprecht
5c3f782554 snapshot: clear up log messages
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-11 14:06:02 +01:00
Thomas Lamprecht
983088730b snapshot: reduce logging rate after one minute
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-11 14:05:20 +01:00
Thomas Lamprecht
f97224b1ef snapshot: log storage where VM state is saved too
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-11 14:04:19 +01:00
Stefan Reiter
8828460b1d savevm: show information about drives during snapshot
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-08 16:35:24 +01:00
Stefan Reiter
969eb0b84d savevm: periodically print progress
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-08 16:35:24 +01:00
Thomas Lamprecht
d1c1af4b02 tree wide cleanup of s/return undef/return/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-16 16:20:05 +02:00
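
The motivation behind the cleanup above, as a small Perl example: a bare 'return' yields an empty list in list context, while 'return undef' yields a one-element list, which subtly changes truthiness at call sites:

    sub find_item {
        # lookup failed
        return;           # empty list in list context, undef in scalar context
        # return undef;   # would be the one-element list (undef) in list context
    }

    my @result = find_item();
    print "found something\n" if @result;   # with 'return undef' this would wrongly fire
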
Aaron Lauterer
185df962a5 vzdump: move include logic for volumes to method
Move the logic that decides which volumes are included in the backup
job into its own method and adapt the VZDump code accordingly. This
makes it possible to develop other features around backup jobs.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-06-24 10:56:57 +02:00
Fabian Ebner
ae180b8f08 Include vmstate and unused volumes in foreach_volid
and refactor the test_volid closure. This way, get_replicatable_volumes
no longer needs a separate loop for unused volumes. get_vm_volumes, which
is used for activation/deactivation of volumes at migration and for
deactivation in vm_stop_cleanup, now includes those volumes too. For
migration it's an improvement, because those volumes might need to be
migrated, and for vm_stop_cleanup it shouldn't hurt. The last user of
foreach_volid is check_vm_disks_local, used by migrate_vm_precondition,
where information about the additional volumes doesn't hurt either.

Note that replicate is (still) set by default, so the behavior of
get_replicatable_volumes for unused volumes should not change.

Hibernation vmstate files are now also included and recognized as
'is_vmstate'. The 'size' attribute is no longer overwritten by subsequent
iterations for the same volid (a volid may appear both in the config and
in snapshots), so the size from the current config is now preferred.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-04-29 12:14:40 +02:00
Fabian Ebner
092538833e Allow parsing vmstate entries
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-04-08 22:11:54 +02:00
Fabian Ebner
912792e245 Switch to using foreach_volume instead of foreach_drive
It was necessary to move foreach_volid back to QemuServer.pm

In VZDump/QemuServer.pm and QemuMigrate.pm the dependency on
QemuConfig.pm was already there, just the explicit "use" was missing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-04-08 22:11:54 +02:00
Stefan Reiter
ea1c21108f Include "-cpu" parameter with snapshots/suspend
Just like with live-migration, custom CPU models might change after a
snapshot has been taken (or a VM suspended), which would lead to a
different QEMU invocation on rollback/resume.

Save the "-cpu" argument as a new "runningcpu" option into the VM conf
akin to "runningmachine" and use as override during rollback/resume.

No functional change with non-custom CPU types intended.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-04-07 17:27:58 +02:00
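
Conceptually, building the command line then prefers the recorded value, roughly like this sketch (the fallback helper name is a stand-in, not the real function):

    # use the '-cpu' string recorded at snapshot/suspend time, if present,
    # instead of re-deriving it from the (possibly changed) custom CPU model
    my $cpu_arg = $conf->{runningcpu} // derive_cpu_argument($conf);   # derive_cpu_argument is hypothetical
    push @$cmd, '-cpu', $cpu_arg if $cpu_arg;
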
Fabian Grünbichler
0c498cca36 vm_start: condense signature
as preparation for refactoring it further. Remote migration will add
another 1-2 parameters, and it is already unwieldy enough as it is.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-04-01 17:42:15 +02:00
Fabian Ebner
beed0e3b0c Implement volume-related helpers and use new foreach_volume
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-30 09:59:48 +02:00
Fabian Ebner
a3597c9448 Use MAX_UNUSED_DISKS from Drive.pm to avoid duplication
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-07 18:23:57 +01:00
Dominik Csapak
66cebc4671 refactor finding of vmstate storage
we need that in another place, so refactor it into its own sub

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-12-11 09:02:55 +01:00
Stefan Reiter
3392d6cacf refactor: extract QEMU machine related helpers to package
...PVE::QemuServer::Machine.

qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
0a13e08ec2 refactor: create QemuServer::Monitor for high-level QMP access
QMP and monitor helpers are moved from QemuServer.pm.

By using only vm_running_locally instead of check_running, a cyclic
dependency on QemuConfig is avoided. This also means that the $nocheck
parameter no longer serves a purpose and has thus been removed, along
with vm_mon_cmd_nocheck.

Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existence was inserted at the
call site.

Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd

mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
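
Call sites then look like this (sketch using a standard QMP command):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # replaces the old vm_mon_cmd()/vm_qmp_command() style calls
    my $status = mon_cmd($vmid, 'query-status');
    print "VM $vmid status: $status->{status}\n";
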
Stefan Reiter
babf613a08 refactor: split check_running into _exists_ and _running_
vm_exists_on_node in PVE::QemuConfig checks if a config file for a vmid
exists

vm_running_locally in PVE::QemuServer::Helpers checks if a VM is running
on the local machine by probing its pidfile and checking /proc/.../cmdline

check_running is left in QemuServer for compatibility, but changed to
simply call the two new helper functions.

Both methods are also correctly mocked for testing snapshots.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
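
The running check described above roughly boils down to the following sketch (simplified; the real helper lives in PVE::QemuServer::Helpers):

    use PVE::Tools;

    sub vm_running_locally {
        my ($vmid) = @_;

        my $pidfile = "/var/run/qemu-server/$vmid.pid";

        my $pid = PVE::Tools::file_read_firstline($pidfile);
        return if !defined($pid) || $pid !~ /^(\d+)$/;
        $pid = $1;

        # guard against a recycled PID: the process' command line must
        # reference our pidfile to count as this VM's QEMU process
        my $cmdline = eval { PVE::Tools::file_get_contents("/proc/$pid/cmdline") } // '';
        return if $cmdline !~ /\Q$pidfile\E/;

        return $pid;
    }
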
Stefan Reiter
d036e418a8 refactor: create QemuServer::Helpers and move file/dir code
Also remove unused $confdir variable in QemuConfig, but leave it and
$lock_dir there, since those paths should only be used with
cfs_config_path anyway.

nodename() is still called in multiple places, but since it's cached by
INotify it doesn't really matter.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Thomas Lamprecht
5c1d42b7f8 Fix #2171: vm_start: volid based statefiles were not activated
So, while we could just make this a special case before the
config_to_command call and set $conf->{vmstate} to the statefile
for the case where it's a valid volume ID, the special case handling
gets much easier when we do this outside of that method.

So it's basically a trade-off, and after looking far too long at all
the nice revisions Alwin made for me and Fabian's requests, and even
trying out different approaches, it was never perfect.

But having slight code duplication rather than the movement mess I
proposed (as I did not have the full picture then, sorry Alwin) felt
like the slightly nicer trade-off. As all approaches worked, I just use
this one now: it has very clear semantics, it is easy to understand,
and that three lines are now duplicated is IMO irrelevant.

Co-developed-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-17 19:20:32 +02:00
Aaron Lauterer
a022e3fdab tree-wide trailing whitespace cleanup
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-25 16:55:53 +02:00
Alwin Antreich
22ea69ca65 fix: qemu: uninitialized value in multiplication
A VM created with default values does not set the memory key in the
configuration. Hence, the size of the state file for a suspend was only
the default size of the state itself, without additionally reserving
twice the configured memory.

The patch uses the static defaults from the JSON schema if the memory
key is not set.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2019-07-24 18:41:03 +02:00
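
The fix corresponds to something like this sketch (constants and the exact size formula are simplified):

    use PVE::QemuServer;

    my $defaults = PVE::QemuServer::load_defaults();   # static defaults from the JSON schema

    # fall back to the schema default when 'memory' is not set in the config
    my $memory_mb = $conf->{memory} // $defaults->{memory};

    # reserve room for the device state itself plus roughly twice the configured RAM
    my $driver_state_mb   = 32;                        # illustrative constant
    my $statefile_size_mb = $driver_state_mb + 2 * $memory_mb;
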
Dominik Csapak
b3983fa1dc add correct comment about savevm-start
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-19 09:26:56 +01:00
Dominik Csapak
48b4cdc210 add statestorage parameter to suspend API
this makes it possible to specify a storage for state saving, if one
wants to use a different storage than for snapshots, or does not
want to save this info in the config

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-19 08:53:14 +01:00
Dominik Csapak
159719e55b implement suspend to disk for running vms
the idea is to have the same logic as with snapshots, but without
snapshotting the disks; after saving the VM state (incl. memory),
we hard shut off the guest.

this way the disks will not be touched by the guest anymore

to prevent any alteration of the VM (incl. migration, hw changes, etc.)
we add a config lock 'suspend'

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-19 08:53:14 +01:00
Dominik Csapak
566caaa4f1 correct comment about size
and remove comment that makes no sense

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-03-19 08:53:14 +01:00
David Limbeck
c9db224015 fix #1969: increase max unused disks
increase to 256, same as containers

Signed-off-by: David Limbeck <d.limbeck@proxmox.com>
2018-11-09 16:23:07 +01:00
Thomas Lamprecht
4f4d9772a5 rollback: vmgenid: comment why we regenerate vmgenid
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-09-19 14:01:05 +02:00
Dominik Csapak
6ee499fff8 fix #1908: add vmgenid config/device
this adds a VM Generation ID device, used by Windows (Server) to detect
specific actions that may have happened to the VM,
such as rollback, restore, etc.

see:

https://docs.microsoft.com/en-us/windows/desktop/hyperv_v2/virtual-machine-generation-identifier

for details on how it works and when it should change

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-09-19 12:23:24 +02:00
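
On the QEMU command line this maps to the stock 'vmgenid' device, roughly as follows (sketch):

    # the generation ID is kept as a UUID in the VM config; regenerating it
    # (e.g. on rollback or restore) tells the guest it may have moved back in time
    if (my $guid = $conf->{vmgenid}) {
        push @$devices, '-device', "vmgenid,guid=$guid";
    }
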
Thomas Lamprecht
e6d35c710c fixup comment formatting
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-09-17 15:49:31 +02:00
Dominik Csapak
c6737ef18b ensure correct machine type gets saved on snapshot
instead of overwriting the 'machine' config in the snapshot,
use its own 'runningmachine' config only for the snapshot

this way, we do not lose the machine type if it was
explicitly set during the snapshot, but deleted afterwards

we also have to adapt the tests for this

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-09-17 15:48:05 +02:00
Dominik Csapak
58b1a8d75d use new snapshot rollback hook to handle machine type
this was in guest-common; it makes more sense to do it here - it's not
guest-agnostic after all.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-09-17 15:43:37 +02:00