Commit Graph

17 Commits

Author SHA1 Message Date
Fiona Ebner
089aed811d cfg2cmd: netdev: fix value for tx_queue_size
Quoting from QEMU commit 4271f40383 ("virtio-net: correctly report
maximum tx_queue_size value"):

> Maximum value for tx_queue_size depends on the backend type.
> 1024 for vDPA/vhost-user, 256 for all the others.

> So the parameter is silently ignored and ethtool reports a different
> value than the one provided by the user.

Indeed, for a non-vDPA/vhost-user netdev, the guest will see TX: 256
instead of the specified 1024 here. With the mentioned QEMU commit (in
master and will be part of 8.1), using 1024 will be a hard error:

> Invalid tx_queue_size (= 1024), must be a power of 2 between 256 and 256

Since neither vhost-user, nor vhost-vdpa netdev types are exposed by
Proxmox VE, just changing the limit to the correct 256 should be fine.
No obvious issue during live-migration found.

Fixes: 620d6b32 ("virtio-net: increase defaults rx|tx-queue-size to 1024")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-07-27 13:14:12 +02:00
Dominik Csapak
9b71c34d61 enable cluster mapped PCI devices for guests
this patch allows configuring pci devices that are mapped via cluster
resource mapping when the user has 'Resource.Use' on the ACL path
'/mapping/pci/{ID}' (in  addition to the usual required vm config
privileges)

When given multiple mappings in the config, we use them as alternatives
for the passthrough, and will select the first free one on startup.
It is using our regular pci reservation mechanism for regular devices and
we introduce a selection mechanism for mediated devices.

A few changes to the inner workings were required to make this work well:
* parse_hostpci now returns a different structure where we have a list
  of lists (first level is for the different alternatives and second
  level is for the different devices that should be passed through
  together)
* factor out the 'parse_hostpci_devices' which parses each device from
  the config and does some precondition checks
* reserve_pci_usage now behaves slightly different when trying to
  reserve an device with the same VMID that's already reserved for,
  since for checking which alternative we can use, we already must
  reserve one (this means that qm showcmd can actually reserve devices,
  albeit only for up to 10 seconds)
* configuring a mediated device on a multifunction device is not
  supported anymore, and results in failure to start (previously, it
  just chose the first device to do it). This is a breaking change
* configuring a single pci device twice on different hostpci slots now
  fails during commandline generation instead on qemu start, so we had
  to adapt one test where this occurred (it could never have worked
  anyway)
* we now check permissions during clone/restore, meaning raw/real
  devices can only be cloned/restored by root@pam from now on.
  this is a breaking change.

Fixes #3574: Improve SR-IOV usability
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By:  Markus Frank <m.frank@proxmox.com>
2023-06-16 16:24:02 +02:00
Thomas Lamprecht
2ceb59d4b1 ovmf cmd assembly: reorder arguments
in preparation of reworking the new separate method for OVMF cmd
assembly, do this in a separate very targeted commit to make it more
clear that the next reworking-commit doesn't messes with our tests at
all.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-12-12 11:41:50 +01:00
Alexandre Derumier
620d6b328f virtio-net: increase defaults rx|tx-queue-size to 1024
This is reducing packet drop on high pps, and also needed for dpdk.

Redhat already have use it by default in rhev and his openstack platform too
since 2019.

I'm using it in production since 6 months, I don't have seen performance regression.

fix: (which ask for custom option, but setting it by default seem fine for me)

https://bugzilla.proxmox.com/show_bug.cgi?id=1546
https://bugzilla.proxmox.com/show_bug.cgi?id=2349
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2022-11-13 16:42:23 +01:00
Thomas Lamprecht
6884a7d7fa fix #4115: enable option to name QEMU threads after their main purpose
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-17 14:25:49 +02:00
Alexandre Derumier
c70e4ec397 memory: enable balloon free-page-reporting for auto-memory reclaim
Allow balloon device  driver to report hints of guest free pages to
the host, for auto memory reclaim

https://lwn.net/Articles/759413/
https://events19.linuxfoundation.org/wp-content/uploads/2017/12/KVMForum2018.pdf

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[ T: fixup tests ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-27 11:08:50 +02:00
Thomas Lamprecht
cc18103635 cfg2cmd: switch off ACPI hotplug on bridges for q35 VMs
See commit 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default
on Q35)[0] in upstream QEMU repository for details about why the change
was made.

As that change affects systemds predictable interface naming[1],
e.g., by going from a previously `ens18` name to `enp6s18`, it may
have rather bad effects for users that did not setup some .link files
to enforce a specific naming by an more stable information like the
NIC's MAC-Address

The alternative would be making the preferred mode of hotplug an
option like `hotplug-mode=<acpi|pcie>`, but it does not seems like
one would like to change that much in the first place...

Note the changes to the tests and especially the tests with q35
machines that did not change.

[0]: https://gitlab.com/qemu-project/qemu/-/commit/17858a1695
[1]: https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html#Naming

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-04 15:30:30 +01:00
Stefan Reiter
378ad769dd cfg2cmd: use long form QEMU parameters to avoid warning in 6.0
QEMU warns us about this:

kvm: -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait: warning: short-form boolean option 'server' deprecated
Please use server=on instead
kvm: -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait: warning: short-form boolean option 'nowait' deprecated
Please use wait=off instead
kvm: -vnc unix:/var/run/qemu-server/100.vnc,password: warning: short-form boolean option 'password' deprecated
Please use password=on instead

The new syntax is backwards compatible to at least QEMU 4.0.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:31:15 +02:00
Stefan Reiter
27b25d037e config_to_command: use -no-shutdown option
Ignore shutdowns triggered from within the guest in favor of detecting
them via qmeventd and stopping the QEMU process that way.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-05 11:22:47 +01:00
Stefan Reiter
c4581b9cc5 Rework get_cpu_options and allow custom CPU models
If a cputype is custom (check via prefix), try to load options from the
custom CPU model config, and set values accordingly.

While at it, extract currently hardcoded values into seperate sub and add
reasonings.

Since the new flag resolving outputs flags in sorted order for
consistency, adapt the test cases to not break. Only the order is
changed, not which flags are present.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-By: Fabian Ebner <f.ebner@proxmox.com>
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
2020-04-07 17:27:58 +02:00
Stefan Reiter
ac0077cc33 Use 'QEMU version' -> '+pve-version' mapping for machine types
The previously introduced approach can fail for pinned versions when a
new QEMU release is introduced. The saner approach is to use a mapping
that gives one pve-version for each QEMU release.

Fortunately, the old system has not been bumped yet, so we can still
change it without too much effort.

QEMU versions without a mapping are assumed to be pve0, 4.1 is mapped to
pve1 since thats what we had as our default previously.

Pinned machine versions (i.e. pc-i440fx-4.1) are always assumed to be
pve0, for specific pve-versions they'd have to be pinned as well (i.e.
pc-i440fx-4.1+pve1).

The new logic also makes the pve-version dynamic, and starts VMs with
the lowest possible 'feature-level', i.e. if a feature is only available
with 4.1+pve2, but the VM isn't using it, we still start it with
4.1+pve0.

We die if we don't support a version that is requested from us. This
allows us to use the pve-version as live-migration blocks (i.e. bumping
the version and then live-migrating a VM which uses the new feature (so
is running with the bumped version) to an outdated node will present the
user with a helpful error message and fail instead of silently modifying
the config and only failing *after* the migration).

$version_guard is introduced in config_to_command to use for features
that need to check pve-version, it automatically handles selecting the
newest necessary pve-version for the VM.

Tests have to be adjusted, since all of them now resolve to pve0 instead
of pve1. EXPECT_ERROR matching is changed to use 'eq' instead of regex
to allow special characters in error messages.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-02-12 10:32:57 +01:00
Dominik Csapak
844d8fa628 move the vmgenid device after readconfig on q35
and adapt the tests

this does not impact live migration, since the order here does not
change the device layout

we want this to consistently have the readconfig first

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-01-31 20:26:26 +01:00
Dominik Csapak
844b55fb89 fix #2510: hostpci: always check if device exists
if the user set a device as hostpci with the 'shorthand' syntax:

hostpciX: 00:12

we ignored it on starting and showcmd and continued.
Since the user explicitly wanted to passthrough a device, we now check
if there is actually a device with that id

for explicitly configured devices (00:12.1), we did not check if it exists,
but the kvm call failed with a non-obvious error message

now we always call 'lspci' from SysFSTools to check if it actually exists,
and fail if not. With this, we can drop the workaround for adding
'0000' if no domain was given, since lspci does it already for us

this fixes #2510, an issue with using mediated devices where the users did not have
the domain in the config, since we forgot to add the default domain there

the only issue with this patch is that it changes the behaviour of
'showcmd' slightly, as in now, we die if the device was explicitly
given, but did not exists (we showed the commandline, now we fail)

this also slightly changes the commandline for qemu (adding always
the domain), which is not a problem since we cannot live migrate
or snapshot such vms, but we have to adapt the tests

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-12-09 11:30:14 +01:00
Thomas Lamprecht
9471e48bf9 implement PVE Version addition for QEMU machine
With our QEMU 4.1.1 package we can pass a additional internal version
to QEMU's machine, it will be split out there and ignored, but
returned on a QMP 'query-machines' call.

This allows us to use it for increasing the granularity with which we
can roll-out HW layout changes/additions for VMs. Until now we
required a machine version bump, happening normally every major
release of QEMU, with seldom, for us irrelevant, exceptions.
This often delays rolling out a feature, which would break
live-migration, by several months. That can now be avoided, the new
"pve-version" component of the machine can be bumped at will, and
thus we are much more flexible.

That versions orders after the ($major, $minor) version components
from an stable release - it can thus also be reset on the next
release.

The implementation extends the qemu-machine REGEX, remembers
"pve-version" when doing a "query-machines" and integrates support
into the min_version and extract_version helpers.

We start out with a version of 1.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-25 16:43:38 +01:00
Thomas Lamprecht
05a427dfbc cfg2cmd test: hostpci, also specify exact PCIe devices
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-09 08:40:07 +02:00
Thomas Lamprecht
f66bff004c cfg2cmd test: fix hostpci tests, specify exact PCI device
as a hack to avoid that we hit a code path where the current ones
from the hosts are checked, and the command line option is added
depending if it exists or not.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-09 08:26:47 +02:00
Thomas Lamprecht
d192bd4d5b cfg2cmd tests: improve hostpci test by marking some as PCIe
and add a linux based one, which acts a bit different

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-06 19:45:40 +02:00