The drive_is_read_only() helper only applies to '-drive', but not
'-blockdev' and is only used in a single place. Inline it to avoid
accidental usages popping up in the future.
This also gets rid of a hidden dependency from Drive to QemuConfig.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-5-f.ebner@proxmox.com
With ide-hd, the inserted block node needs to be marked as writable
too, but -blockdev will complain if it's marked as writable but the
actual backing device is read-only (e.g. read-only base LV).
IDE/SATA do not support being configured as read-only. The closest
alternative is using ide-cd instead of ide-hd, as the two share most of
their code and configuration in QEMU.
Since a template is never actually started, the front-end device is
never accessed. The backup only accesses the inserted block node, so
it does not matter for the backup if the type is 'ide-cd' instead.
The same issue did not manifest for '-drive', because the '-snapshot'
option is used for template backups. The '-snapshot' option does not
affect '-blockdev', from 'man kvm':
> snapshot is incompatible with -blockdev
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-4-f.ebner@proxmox.com
This is in preparation to remove the hidden dependency from the Drive
module to QemuConfig.
Note that drive_is_read_only() can be replaced with $is_template
for OVMF, because the helper only behaves differently for IDE and
SATA, but not for EFI disks.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812143900.138723-2-f.ebner@proxmox.com
Re-using the detach() helper has the side effect of avoiding logging
errors to syslog for automatically removed child nodes. This should be
the case for all file nodes here. None are explicitly added via
blockdev-add and thus QEMU already auto-removes them.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812115652.79330-4-f.ebner@proxmox.com
Without passing 'noerr' to mon_cmd(), errors are logged to the system
journal. In attach() and detach(), there are two mon_cmd() calls that
are expected to fail in some scenarios for which the errors should not
be logged.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250812115652.79330-3-f.ebner@proxmox.com
Require that snapshot-as-volume-chain qcow2 images are always used in
combination with '-blockdev', rather than '-drive'. With '-drive', the
'discard-no-unref' option is not set and the fragmentation can lead to
the same issue that, for '-blockdev', was solved by commit a3a9a2ab
("fix #6543: use qcow2 'discard-no-unref' option when using
snapshot-as-volume-chain").
While it would be possible to set the flag for '-drive' too, the
snapshot-as-volume-chain feature already only works with machine type
>= 10.0, see commit 6b2b45fd ("snapshot create/delete: die early for
snapshot-as-volume-chain for pre-10.0 machine version") and it's only
tested for those. Avoid accidents and other unknown issues by being
strict and prohibiting usage without '-blockdev'.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/all/20250811135154.253817-1-f.ebner@proxmox.com
As reported in the community forum [0], a running VM with pre-10.0
machine version using a storage with snapshot-as-volume-chain will run
into issues when creating a snapshot. Similarly, deleting the snapshot
of such a VM would fail. Having '-blockdev' is a hard requirement for
the implementation of the snapshot-as-volume-chain feature for running
VMs, so die and suggest upgrading the machine version.
[0]: https://forum.proxmox.com/threads/lvm-thick-with-iscsi-pve-9-0-3.169319/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Hannes Laimer <h.laimer@proxmox.com>
Link: https://lore.proxmox.com/20250807104832.51784-1-f.ebner@proxmox.com
At this point, the dbus-vmstate helper is not expected to be running
anymore.
Using $noerr here didn't really make sense - as it never should be
running anymore at this point, plus the VM should also be stopped - thus
the "happy" path here is to fail removing the dbus-vmstate helper.
It resulted in another spurious warning _after_ a migration on the
source node.
Fixes: 067a0f55 ("vmstate: improve cleaning up dbus-vmstate and avoid spurious warning")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250805095828.301188-1-c.heiss@proxmox.com
First, moving to vm_stop_cleanup(), which is a better fit for this.
It gets called by the cleanup API method in case of unclean shutdown or
from inside the guest.
In every case, the dbus-vmstate daemon should _never_ be running at this
point, as it is started only before migration and stopped directly after
migration, before vm_stop_cleanup() is even called. So it should only be
left running in case of a crash during migration.
Calling it anyway here ensures that the daemon is always (cleanly) shut
down. As the dbus-vmstate is part of the VM scope unit, that would
tear it down too as a last resort.
Fixes the following spurious warning when a VM was shut down from inside
the guest:
`failed to retrieve org.qemu.VMState1 owners: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get owners of name 'org.qemu.VMState1': no such name`
Reported-by: Hannes Duerr <h.duerr@proxmox.com>
Reported-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250804133002.1625925-1-c.heiss@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We actually query whether any guests share a positive affinity rule
with the to-be-migrated VM. While that normally means that they will be
migrated too, it doesn't have to be the case (e.g., node constraints
might interfere here). Also, "comigrated" is not used as much as
"dependencies", so the latter might be easier to understand for
non-native speakers or users (vs. devs; these details tend to leak).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
By returning an object instead of an array for the default, the frontend
can get confused if it's using iterator code that expects arrays.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Removing the first snapshot in a snapshot-as-volume-chain is done via
block-commit for performance reasons, rather than stream, because the
snapshot volume, being the base, is usually larger than the delta
since the snapshot.
When a drive has the 'ro' flag set in the virtual machine
configuration, all three nodes in the throttle->fmt->file chain are
opened with the read-only flag set and thus the format node could not
serve as the target for the stream operation.
Fix this by temporarily re-opening the format node as writable. Note
that from the guest perspective, nothing changes, because the
read-only flag for the top throttle node is preserved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250731104935.53039-1-f.ebner@proxmox.com
This is a further improvement after commit 8e671e79 ("block job:
mirror: always detach the target node upon error"). It might be
that a cancelled job ends up in concluded state without an error
being set in the result of the 'query-block-jobs' QMP command and
the target node would not be detached. To fix it, also detach the
target node when cancelling the job. This is correct even when the job
was cancelled after completion, as in that case, the drive is not
switched over to use the target node.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Link: https://lore.proxmox.com/20250731090956.23443-1-f.ebner@proxmox.com
Add information about the positive and negative HA resource affinity
rules that the VM is part of to the migration precondition API endpoint.
These inform callers about any comigrated resources or blocking
resources that are caused by the resource affinity rules.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Link: https://lore.proxmox.com/20250730181428.392906-21-d.kral@proxmox.com
Instead of RSS, let's use the same PSS values as for the specific host
view as default, in case this value is not overwritten by the balloon
info.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Link: https://lore.proxmox.com/20250726010626.1496866-29-a.lauterer@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
No point in bloating the rather big vmstatus sub even further; this
way one can also add a descriptive name and comment.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The mem field itself will switch from the outside view to the "inside"
view if the VM is reporting detailed memory usage information via the
ballooning device.
Since sometimes other processes belong to a VM too, for example swtpm,
we collect all PIDs belonging to the VM cgroup and fetch their PSS data
to account for shared libraries used.
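A minimal Python sketch of the accounting described above, not the
actual (Perl) implementation: it sums the `Pss:` line of
`/proc/<pid>/smaps_rollup` for every PID listed in a cgroup's
`cgroup.procs` file. The function names and the cgroup directory passed
in are assumptions made for illustration.

```python
from pathlib import Path


def parse_pss_kib(smaps_rollup_text: str) -> int:
    """Return the Pss value in KiB from smaps_rollup content, 0 if absent."""
    for line in smaps_rollup_text.splitlines():
        # Match only the aggregate "Pss:" line, not Pss_Anon/Pss_File/...
        if line.startswith("Pss:"):
            return int(line.split()[1])
    return 0


def cgroup_pss_kib(cgroup_dir: str, proc_root: str = "/proc") -> int:
    """Sum the PSS of all processes listed in <cgroup_dir>/cgroup.procs."""
    total = 0
    for pid in Path(cgroup_dir, "cgroup.procs").read_text().split():
        try:
            text = Path(proc_root, pid, "smaps_rollup").read_text()
        except OSError:
            # Process exited in the meantime or is not readable; skip it.
            continue
        total += parse_pss_kib(text)
    return total
```

Because PSS divides each shared page among the processes mapping it,
summing it over the main QEMU process and helpers such as swtpm does not
count shared libraries multiple times, which a plain RSS sum would.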
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Link: https://lore.proxmox.com/20250726010626.1496866-28-a.lauterer@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[AL:
* rebased on current master
* switch to new, more generic read_cgroup_pressure function
* add pressures to return properties
]
Originally-by: Folke Gleumes <f.gleumes@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Link: https://lore.proxmox.com/20250726010626.1496866-27-a.lauterer@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
To ensure we got the relevant rules for conntrack migration available.
Only do a suggests on the newer nft based proxmox-firewall, we do not
have any hard-dependency on it anywhere currently.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This avoids having the handling for 'discard-no-unref' in two places.
In the tests, rename the relevant target images with a '-target'
suffix to test for them in the mocked volume_snapshot_info() helper.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250730150325.138087-3-f.ebner@proxmox.com
Would fail with an error
> Block format 'qcow2' does not support the option 'zeroinit:driver'
for a qcow2 target on a directory storage with
'snapshot-as-volume-chain'.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250730150325.138087-2-f.ebner@proxmox.com
After a successful live-migration, the old VM-specific conntrack entries
are not needed anymore on the source node and can thus be flushed.
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250730094549.263805-9-c.heiss@proxmox.com
Fixes #5180 [0].
This implements for live-migration:
a) the dbus-vmstate is started on the target side, together with the VM
b) the dbus-vmstate helper is started on the source side
c) everything is cleaned up properly, in any case
It is currently off-by-default and must be enabled via the optional
`with-conntrack-state` migration parameter.
The conntrack entry migration is done in such a way that it can
soft-fail, w/o impacting the actual migration, i.e. considering it
"best-effort".
A failed conntrack entry migration does not have any real impact on
functionality, other than that it might exhibit the problems outlined in
the issue [0].
For remote migrations, only a warning is thrown for now. Cross-cluster
migration has stricter requirements and is not "one-size-fits-all".
E.g., the most prominent issue is a differing network segmentation,
which would make the conntrack entries useless or require careful
rewriting.
[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250730094549.263805-8-c.heiss@proxmox.com
First part to fixing #5180 [0].
Adds a simple D-Bus server which implements the `org.qemu.VMState1`
interface as specified in the QEMU documentation [1].
Using the built-in QEMU VMState machinery saves us from having to worry
about transfer and convergence of the data and lets QEMU take care of
it.
Any object on the D-Bus path `/org/qemu/VMState1` implementing that
interface will be called by QEMU during live-migration if and only if
the `Id` property is registered within the `dbus-vmstate` QEMU object
for a specific VM.
The actual state loading/restoring is done via the conntrack(8) tool, a
small tool which already implements hard parts of interacting with the
conntrack subsystem via netlink.
Filtering is done on CONNMARK, which is set to the specific VMID for all
packets by the firewall.
Additionally, a custom `com.proxmox.VMStateHelper` interface is
implemented by the object, adding a small `Quit` method for cleanly
shutting down the daemon via the D-Bus API.
For all to work, D-Bus needs a policy describing who is allowed to
access the interface. [2]
Currently, there is a hard-limit of 1 MiB of state enforced by QEMU.
Typical conntrack state entries as dumped by conntrack(8) in the `save`
output format are plain ASCII text lines, mostly around 150-200
characters long. That translates to roughly 5200 entries that can be
migrated.
Such a typical line looks like:
-A -t 431974 -u SEEN_REPLY,ASSURED -s 10.1.0.1 -d 10.1.1.20 \
-r 10.1.1.20 -q 10.1.0.1 -p tcp --sport 48550 --dport 22 \
--reply-port-src 22 --reply-port-dst 48550 --state ESTABLISHED
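The entry estimate above can be checked with a small Python calculation.
This is only a back-of-the-envelope sketch: it assumes one byte per
ASCII character and ignores line terminators and any framing overhead;
the helper name is made up for illustration.

```python
STATE_LIMIT_BYTES = 1024 * 1024  # 1 MiB hard limit enforced by QEMU


def max_entries(bytes_per_entry: int, limit: int = STATE_LIMIT_BYTES) -> int:
    """Number of whole entries of the given size that fit into the limit."""
    return limit // bytes_per_entry


# Worst case (200 characters per line) and best case (150 characters).
print(max_entries(200))  # 5242
print(max_entries(150))  # 6990
```

The figure of about 5200 entries thus corresponds to the conservative
end of the typical line length.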
In the future, compression could be implemented for these before sending
them to QEMU, which should increase the above number quite a bit - since
these entries are nicely compressible.
[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5180
[1] https://www.qemu.org/docs/master/interop/dbus-vmstate.html
[2] https://dbus.freedesktop.org/doc/dbus-daemon.1.html#configuration_file
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250730094549.263805-7-c.heiss@proxmox.com
Similar to the already existing ones for CPU and QEMU machine support.
Very simple for now, it only provides one property:
'has-dbus-vmstate' - Whether the dbus-vmstate is available/installed
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250730094549.263805-6-c.heiss@proxmox.com
Die if the list of snapshots returned by `volume_snapshot_info` does
not contain the snapshot we are trying to delete.
Previously it was just assumed that the snapshot would be present,
leading to the entry in the list being autovivified by the following
lines of code. Hence, we tried to commit nonexistent snapshot states
here, as both `$parentsnap` and `$childsnap` would be undefined.
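The guard can be illustrated with a short Python sketch. This is an
analogy rather than the actual Perl code: Python dictionaries do not
autovivify, so the explicit membership check stands in for the early
die. The function name and the 'parent'/'child' keys of the snapshot
info are assumptions made for illustration.

```python
def find_snapshot_neighbours(snapshots: dict, snap: str, volid: str):
    """Return (parent, child) of snap, failing early if snap is unknown."""
    if snap not in snapshots:
        # Fail before any lookup could silently yield undefined neighbours.
        raise LookupError(f"snapshot '{snap}' of volume '{volid}' not found")
    entry = snapshots[snap]
    return entry.get("parent"), entry.get("child")


info = {
    "snap1": {"parent": None, "child": "snap2"},
    "snap2": {"parent": "snap1", "child": "current"},
}
print(find_snapshot_neighbours(info, "snap2", "vm-100-disk-0"))
# ('snap1', 'current')
```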
Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
Link: https://lore.proxmox.com/20250730103705.98313-3-s.sterz@proxmox.com
[FE: also include volid in error message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>