For a Proxmox VE managed volume, prefer the format from the storage
layer rather than the 'format' option set on the drive. Fail if there
is a mismatch between the detected and configured format, because this
is not expected for managed volumes. Having this early hard failure
protects against undesirable issues with live migration and reboot
where the format of a drive would suddenly be different.
For a not Proxmox VE managed volume, use the same logic as before,
i.e. use the 'format' option for the drive with 'raw' as a fallback:
Only root can configure such devices.
Both also apply to the case where the 'cdrom' flag is set to avoid
autodetection by QEMU.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
FG: typo fix in comment
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Since there are certain checks that depend on the QEMU binary version,
tests with a fixed QEMU binary version make it less likely to catch
issues on current setups, because for those, the QEMU binary version
will always be higher than in the tests.
Some of the affected tests explicitly mention the version, so set the
machine version for those. For the others, there's no real requirement
to test for a specific machine version either, so just use the latest.
This completes the transition for using machine version for tests
instead of QEMU binary version. The three remaining tests that set the
binary version explicitly want to test for it.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Since there are certain checks that depend on the QEMU binary version,
tests with a fixed QEMU binary version make it less likely to catch
issues on current setups, because for those, the QEMU binary version
will always be higher than in the tests.
Two of the affected tests explicitly mention the version, so set the
machine version for those. For the other two, there's no real
requirement to test for a specific machine version either, so just use
the latest.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Since there are certain checks that depend on the QEMU binary version,
tests with a fixed QEMU binary version make it less likely to catch
issues on current setups, because for those, the QEMU binary version
will always be higher than in the tests.
For all but one of the affected tests, there's no real requirement to
test for a specific machine version either, so just use the latest.
The exception is the 'netdev' test, because it should stay distinct
from the 'netdev-7.1' test.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
The minimum supported version for a Proxmox VE 8 node is QEMU 8.0.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Since there are certain checks that depend on the QEMU binary version,
tests with a fixed QEMU binary version make it less likely to catch
issues on current setups, because for those, the QEMU binary version
will always be higher than in the tests.
Set the machine version, because these tests depend on that. The machine
version is what should actually be tested for rather than an old QEMU
binary version.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
The minimum supported version for a Proxmox VE 8 node is QEMU 8.0.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Since there are certain checks that depend on the QEMU binary version,
tests with a fixed QEMU binary version make it less likely to catch
issues on current setups, because for those, the QEMU binary version
will always be higher than in the tests.
For the affected tests, there's no real requirement to test for a
specific machine version either, so just use the latest.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
This is necessary to bump the minimum required version in
config_to_command() beyond 4.1.1, because otherwise there will be an
error message mismatch making the test fail.
Bump the tests all the way to 9.0.0, because that is the current version
and because then the test doesn't have to be touched again too soon
(only when wishing to bump the minimum required version beyond 9.0.0
in config_to_command()).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
Since kernel 6.8, NVIDIAs vGPU driver does not use the generic mdev
interface anymore, since they relied on a feature there which is not
available anymore. IIUC the kernel [0] recommends drivers to implement
their own device specific features since putting all in the generic one
does not make sense.
They now have an 'nvidia' folder in the device sysfs path, which
contains the files `creatable_vgpu_types`/`current_vgpu_type` to
control the virtual functions model, and then the whole virtual function
has to be passed through (although without resetting and changing to the
vfio-pci driver).
This patch implements changes so that from a config perspective, it
still is an mediated device, and we map the functionality iff the device
has no mediated devices but the new NVIDIAs sysfsapi and the model name
is 'nvidia-<..>'
It behaves a bit different than mdevs and normal pci passthrough, as we
have to choose the correct device immediately since it's bound to the
pciid, but we must not bind the device to vfio-pci as the NVIDIA driver
implements this functionality itself.
When cleaning up, we iterate over all reserved devices (since for a
mapping we can't know at this point which was chosen besides looking at
the reservations) and reset the vgpu model to '0', so it frees up the
reservation from NVIDIAs side. (We also do that in a loop, since it's
not always immediately ready after QEMU closes)
A general problem (but that was previously also the case) is that a
showcmd (for a not running guest) reserves the pciids, which might block
an execution of a different real vm. This is now a bit more problematic
as we (temporarily) set the vgpu type then.
0: https://docs.kernel.org/driver-api/vfio-pci-device-specific-driver-acceptance.html
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Christoph Heiss <c.heiss@proxmox.com>
Reviewed-by: Christoph Heiss <c.heiss@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Since the only way this could happen is when we're being called
from 'qm showcmd' and there we don't want to reserve or create anything.
In case the VM was not running, we actually reserve the devices, so we
want to call 'cleanup_pci_devices' after to remove those again. This
minimizes the timespan where those devices are not available for real vm
starts.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Christoph Heiss <c.heiss@proxmox.com>
Reviewed-by: Christoph Heiss <c.heiss@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The version of the running QEMU binary is not related to the machine
version and so it's a bit confusing to have the helper in the
'Machine' module. It cannot live in the 'Helpers' module, because that
would lead to a cyclic inclusion Helpers <-> Monitor. Thus,
'QMPHelpers' is chosen as the new home.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
There is a possibility that the drive-mirror job is not yet done when
the migration wants to inactivate the source's blockdrives:
> bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
This can be prevented by using the 'write-blocking' copy mode (also
called active mode) for the mirror. However, with active mode, the
guest write speed is limited by the synchronous writes to the mirror
target. For this reason, a way to start out in the faster 'background'
mode and later switch to active mode was introduced in QEMU 8.2.
The switch is done once the mirror job for all drives is ready to be
completed to reduce the time spent where guest IO is limited.
The loop waiting for actively-synced to become true is not an endless
loop: Once the remaining dirty parts have been mirrored by the
background iteration, the actively-synced flag will be set. Because
the 'block-job-change' QMP command already succeeded, new writes will
be done synchronously to the target and thus not lead to new dirty
parts. If the job fails or vanishes (shouldn't actually happen,
because auto-dismiss is false), the loop will be exited and the error
propagated.
Reported rarely, but steadily over the years:
https://forum.proxmox.com/threads/78954/post-353651https://forum.proxmox.com/threads/78954/post-380015https://forum.proxmox.com/threads/100020/post-431660https://forum.proxmox.com/threads/111831/post-482425https://forum.proxmox.com/threads/111831/post-499807https://forum.proxmox.com/threads/137849/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
templates can only be started in context of a pbs backup, and there we
don't need or want to use most of the config, since they cannot be
started normally anyway.
We minimize the config by copying some specific relevant options (see
the comments for why the options were chosen) and all disk
configurations.
Since we change the qemu commandline for templates, we now have to adapt
the tests involving templates.
Without this, users can get into a situation where the template cannot
be backed up when there are some resources not available (such as cpu
cores, kvm, pci devices, etc.) even if the backup process does not need
them.
This change has some nice side effects, such as we don't need to
allocate the full amount of memory anymore for templates that have a
hostpci device configured, the configured bridges don't have to exist,
etc.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
during pbs backups, we need to start templates, so add a test for that.
We already have some tests for templates, but none with hostpci,tpm,
etc.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
vIOMMU enables the option to passthrough pci devices to L2 VMs in L1
VMs via Nested Virtualisation and adds an extra isolation.
Uses the new property-string from the "config: define machine schema
as property-string"-commit to add the viommu option to the machine
parameter.
Currently there are two vIOMMU implementation in QEMU to choose:
intel or virtio
Virtio-iommu is more recent but less used in production than intel-iommu.
The assert_valid_machine_property function prevents using intel-iommu with
i440fx.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
[ TL: tiny coding style fix to extract variable inside if expr ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
add one test case for a spice display and one for std
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
by remembering the 'forcemachine' parameter that's passed along when
starting the target instance.
In preparation to introduce a call to get_current_qemu_machine after
starting a VM to check for machine version deprecation.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In preparation to turn the 'machine' parameter into a property string.
parse_property_string checks for the regex, therefore the test-cases
with 'somemachine' and 'someothermachine' would fail.
To avoid that, replace 'somemachine' and 'someothermachine' with 'q35'
and 'pc' with sed:
sed -i 's/somemachine/q35/g'
sed -i 's/someothermachine/pc/g'
Signed-off-by: Markus Frank <m.frank@proxmox.com>
[FE: improve wording in commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Quoting from QEMU commit 4271f40383 ("virtio-net: correctly report
maximum tx_queue_size value"):
> Maximum value for tx_queue_size depends on the backend type.
> 1024 for vDPA/vhost-user, 256 for all the others.
> So the parameter is silently ignored and ethtool reports a different
> value than the one provided by the user.
Indeed, for a non-vDPA/vhost-user netdev, the guest will see TX: 256
instead of the specified 1024 here. With the mentioned QEMU commit (in
master and will be part of 8.1), using 1024 will be a hard error:
> Invalid tx_queue_size (= 1024), must be a power of 2 between 256 and 256
Since neither vhost-user, nor vhost-vdpa netdev types are exposed by
Proxmox VE, just changing the limit to the correct 256 should be fine.
No obvious issue during live-migration found.
Fixes: 620d6b32 ("virtio-net: increase defaults rx|tx-queue-size to 1024")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Only unit 0 for IDE is supported with machine type q35. Currently,
QEMU will fail startup with machine type q35 with an error like
> Can't create IDE unit 1, bus supports only 1 units
when ide1 or ide3 is configured.
Make sure to keep backwards compat form migration by leaving ide0 and
ide2 fixed. Since starting with ide1 or ide3 never worked, they can be
moved to a controller with a higher ID without issue.
Reported in the community forum:
https://forum.proxmox.com/threads/124615/post-543127https://forum.proxmox.com/threads/130815/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
None of the configured test storages support the content type iso
right now, just add it to cifs-store.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
When scanning all configured storages for disk images belonging to the
VM, the migration could easily fail if a storage is not available, but
enabled. That storage might not even be used by the VM at all.
By not scanning all storages and only looking at the disk images
referenced in the VM config, we can avoid unnecessary failures.
Some information that used to be provided by the storage scanning needs
to be fetched explicilty (size, format).
Behaviorally the biggest change is that unreferenced disk images will
not be migrated anymore. Only images referenced in the config will be
migrated.
The tests have been adapted accordingly.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
by adding them to their own list, saving the nodes where they are not
allowed, and return those on 'wantarray' so we don't break existing
callers that don't expect it.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
this patch allows configuring pci devices that are mapped via cluster
resource mapping when the user has 'Resource.Use' on the ACL path
'/mapping/pci/{ID}' (in addition to the usual required vm config
privileges)
When given multiple mappings in the config, we use them as alternatives
for the passthrough, and will select the first free one on startup.
It is using our regular pci reservation mechanism for regular devices and
we introduce a selection mechanism for mediated devices.
A few changes to the inner workings were required to make this work well:
* parse_hostpci now returns a different structure where we have a list
of lists (first level is for the different alternatives and second
level is for the different devices that should be passed through
together)
* factor out the 'parse_hostpci_devices' which parses each device from
the config and does some precondition checks
* reserve_pci_usage now behaves slightly different when trying to
reserve an device with the same VMID that's already reserved for,
since for checking which alternative we can use, we already must
reserve one (this means that qm showcmd can actually reserve devices,
albeit only for up to 10 seconds)
* configuring a mediated device on a multifunction device is not
supported anymore, and results in failure to start (previously, it
just chose the first device to do it). This is a breaking change
* configuring a single pci device twice on different hostpci slots now
fails during commandline generation instead on qemu start, so we had
to adapt one test where this occurred (it could never have worked
anyway)
* we now check permissions during clone/restore, meaning raw/real
devices can only be cloned/restored by root@pam from now on.
this is a breaking change.
Fixes#3574: Improve SR-IOV usability
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
The commands snapshot-drive and delete-drive-snapshot have been unused
by qemu-server since commit eba2b721 ("use qemu's blockdev-snapshot
functions") and are now going to be dropped in our QEMU builds too, so
get rid of these left-overs.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
like the deprecation message printed by QEMU suggests.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Even if between single quotes, the dollar sign needs to be escaped
here. Otherwise, there will be an error
> Search pattern not terminated at -e line 1.
and no migration tests would be run. The error did not lead to
aborting though, making it harder to notice.
Fixes: aac89f6c ("tests: avoid calling test script to get target names")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
As otherwise we couple *all* Makefile targets to the dependencies of
the test script, even for a simple make call (e.g., done on building
the source), so use a much simpler heuristic that just depends on
perl, which is essential in Debian.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Previously, cloning a stopped VM didn't respect bwlimit. Passing the -r
(ratelimit) parameter to qemu-img convert fixes this issue.
Signed-off-by: Leo Nunner <l.nunner@proxmox.com>
[ T: reword subject line slightly ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Since commit 9b6efe43 ("migrate: add live-migration of replicated
disks") live-migration with replicated volumes is possible. When
handling the replication, it is checked that all local volumes
previously detected as replicatable are actually replicated. So the
check if migration with snapshots is possible can just allow volumes
that are detected as replicatable.
Note that VM state files are also replicated.
If there is an invalid configuration with a non-replicatable volume or
state file and replication is enabled, then replication will fail, and
thus migration will fail early.
Trying to live-migrate to a non-replication target (needs --force)
will still fail if there are snapshots, because they are (correctly)
detected as non-replicated.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The 'nbd-server-add' QMP command has been deprecated since QEMU 5.2 in
favor of a more general 'block-export-add'.
When using 'nbd-server-add', QEMU internally converts the parameters
and calls blk_exp_add() which is also used by 'block-export-add'. It
does one more thing, namely calling nbd_export_set_on_eject_blk() to
auto-remove the export from the server when the backing drive goes
away. But that behavior is not needed in our case, stopping the NBD
server removes the exports anyways.
It was checked with a debugger that the parameters to blk_exp_add()
are still the same after this change. Well, the block node names are
autogenerated and not consistent across invocations.
The alternative to using 'query-block' would be specifying a
predictable 'node-name' for our '-drive' commandline. It's not that
difficult for this use case, but in general one needs to be careful
(e.g. it can't be specified for an empty CD drive, but would need to
be set when inserting a CD later). Querying the actual 'node-name'
seemed a bit more future-proof.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
in preparation of reworking the new separate method for OVMF cmd
assembly, do this in a separate very targeted commit to make it more
clear that the next reworking-commit doesn't messes with our tests at
all.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This is reducing packet drop on high pps, and also needed for dpdk.
Redhat already have use it by default in rhev and his openstack platform too
since 2019.
I'm using it in production since 6 months, I don't have seen performance regression.
fix: (which ask for custom option, but setting it by default seem fine for me)
https://bugzilla.proxmox.com/show_bug.cgi?id=1546https://bugzilla.proxmox.com/show_bug.cgi?id=2349
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
at the end of a live migration, we need to remove old mac entries
on source host (vm is not yet stopped), before resume vm on target host
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[T: resolve conflicts and rework on apply ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>