This reduces packet drops at high pps and is also needed for DPDK.
Red Hat has been using it by default in RHEV and in its OpenStack
platform since 2019.
I have been using it in production for 6 months and have not seen any performance regression.
Fixes (these ask for a custom option, but setting it by default seems fine to me):
https://bugzilla.proxmox.com/show_bug.cgi?id=1546
https://bugzilla.proxmox.com/show_bug.cgi?id=2349
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
At the end of a live migration, we need to remove the old MAC entries
on the source host (the VM is not yet stopped) before resuming the VM
on the target host.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[T: resolve conflicts and rework on apply ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In theory we can have a config with netX records that do not specify
a `macaddr` property, we just auto-generate one in config2cmd for
startup transitively, but don't save that explicitly back to the
config; so while we could parse the /proc/$pid/cmdline or try to get
the info from QMP (not fully straightforward), it seems rather a
hassle; especially if one has in mind that this cannot happen via the
API FWICT; as there a "deletion" *saves* a newly auto generated value
out to the config, same with clone of a VM and restore of a backup.
So, in basically all reasonable cases we got the `macaddr` available,
but if we don't it makes no sense to add a FDB variable for a *newly*
generated one by the parse_net call, as the VM won't use that (well,
at least if one doesn't get "lucky" and it randomly re-generates the
same as on startup), so allow telling parse_net to skip auto
generating MACs and use that in the add-fdb-entries helper
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
On a plain VM start (no live migration), we can simply add the MAC
address to the FDB. In case of a live migration, we add the MAC address
just before the resume.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
If the config doesn't contain the cloud-init disk anymore after the
rollback, we have to clean it up since otherwise no further disk can be
attached unless the one still existing on the storage is deleted.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Reviewed-by: Stefan Hanreich <s.hanreich@proxmox.com>
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
same as with the extended support for more usb devices, allow
hotplugging for guests that can use the qemu-xhci controller, which
requires a machine type >= 7.1 and an ostype of l26 or windows > 7
if no usb device was passed through on startup, dynamically add
the xhci controller (and remove if the last usb device is unplugged)
so that live migration is still possible
much of the usb hotplug code was already there, but it still needed
a few adaptions, for example we have to add a chardev when adding
a spice redir port (that gets automatically removed when the
usb-redir device gets removed)
since the spice devices use the id 'usbredirdevX' instead of 'usbX', we
have to map between the two manually
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
for machine versions >= 7.1 and ostype linux or windows > 7, we use the
qemu-xhci controller where we have up to 14 usable ports, so make them
available to the user
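
The condition above can be sketched as a small predicate. This is only
an illustration in Python, not the actual qemu-server code; the function
name and the set of ostype values counted as "windows > 7" (win8, win10,
win11) are assumptions.

```python
# ostype values assumed to mean "Windows newer than 7" (an assumption).
WINDOWS_NEWER_THAN_7 = {"win8", "win10", "win11"}


def can_use_qemu_xhci(machine_version, ostype):
    """Return True if the guest qualifies for the qemu-xhci controller.

    machine_version is a (major, minor) tuple, e.g. (7, 1).
    """
    new_enough = tuple(machine_version) >= (7, 1)
    supported_os = ostype == "l26" or ostype in WINDOWS_NEWER_THAN_7
    return new_enough and supported_os
```

Guarding on the machine version keeps guests with a pinned, older
machine type on their previous controller.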
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
going by reports in the forum (e.g. [0]) and semi-official qemu
information[1], we should prefer qemu-xhci over nec-usb-xhci
for compatibility purposes, we guard that behind the machine version,
so that guests with a fixed version don't suddenly have a different usb
controller after a reboot (which could potentially break some hardcoded
guest configs)
0: https://forum.proxmox.com/threads/proxmox-usb-connect-disconnect-loop.117063/
1: https://www.kraxel.org/blog/2018/08/qemu-usb-tips/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
should not happen normally, but an inattentive user of that function
may forget to check the validity of the parsed device, so err
on the safe side here
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
just outside of context, we already save the result from
machine_type_is_q35 into the $q35 variable, but never use it.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reuse the PVE::CpuSet to validate cpuset formatting.
Add new qemu property called 'affinity' to store the cpuset.
Push taskset command in front of kvm if 'affinity' is set.
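
A rough Python sketch of the last step, for illustration only (the real
code is Perl and validates via PVE::CpuSet). The helper names and the
simplified cpuset grammar are assumptions; `taskset --cpu-list` is the
existing util-linux option for passing a CPU list.

```python
import re

# Simplified cpuset grammar (an assumption): comma-separated CPU ids or
# ranges such as "0,2-3". The real validation lives in PVE::CpuSet.
_CPUSET_RE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def validate_cpuset(value):
    """Return value unchanged if it looks like a valid cpuset, else raise."""
    if not _CPUSET_RE.fullmatch(value):
        raise ValueError(f"invalid cpuset: {value!r}")
    for part in value.split(","):
        if "-" in part:
            first, last = (int(n) for n in part.split("-"))
            if first > last:
                raise ValueError(f"descending range in cpuset: {part!r}")
    return value


def build_command(kvm_cmd, affinity=None):
    """Prefix the kvm argv with taskset when an affinity is configured."""
    if not affinity:
        return list(kvm_cmd)
    return ["taskset", "--cpu-list", validate_cpuset(affinity)] + list(kvm_cmd)
```

For example, `build_command(["kvm", "-id", "100"], "0,2-3")` yields an
argv that starts with `taskset --cpu-list 0,2-3`.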
Signed-off-by: Daniel Bowder <daniel@bowdernet.com>
mdevs have a host-unique UUID they are indexed with in the PCI-id
independent `/sys/bus/mdev/devices/<uuid>` path, so there is no need
to go through the PCI id for them.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if the preparing of PCI devices or the start of the VM fails, we need
to cleanup the PCI devices (reservations *and* mdevs), or else it
might happen that there are leftovers which must be manually removed.
to include also mdevs now, refactor the cleanup code from
'vm_stop_cleanup' into its own function, and call that instead of
only 'remove_pci_reservation'
also simplifies the code, such that it now removes all PCI ids
reserved for that VMID, since we cannot have multiple VMs with the
same VMID anyway
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Previously, only a plaintext line in the task log showed something was off.
Now, the GUI will show it as a warning.
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
This allows regenerating the config drive if pending values exist
when we change VM options.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
This allows regenerating the config drive with a single API call.
It also avoids having to delete the drive first and recreate it.
As it's a read-only drive, we can simply live-update it
and eject/replace it via the QEMU monitor.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Instead of using the VM pending options for the pending cloud-init generated
config, write the currently generated cloud-init config to a new
[special:cloudinit] section.
Currently, some options like the VM name or a NIC MAC address can be hotplugged,
so there is no way to know if the cloud-init disk is already updated.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
While the clamping already happens before setting the actual systemd
CPU{Shares, Weight}, it can be done here too, to avoid writing new
out-of-range values into the config.
Can't use a validator enforcing this because existing out-of-range
values should not become errors upon parsing the config.
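
A minimal sketch of such clamping on the write path, in Python for
illustration (the real code is Perl). The function name is hypothetical;
the ranges are the kernel's documented ones for cgroup v1 `cpu.shares`
(2 to 262144) and cgroup v2 `cpu.weight` (1 to 10000).

```python
# Valid ranges per cgroup version: v1 cpu.shares, v2 cpu.weight.
CGROUP_RANGES = {1: (2, 262144), 2: (1, 10000)}


def clamp_cpu_value(value, cgroup_version):
    """Clamp a CPU shares/weight value into the range of the cgroup version."""
    lowest, highest = CGROUP_RANGES[cgroup_version]
    return max(lowest, min(highest, value))
```

Clamping only when writing keeps existing out-of-range values parseable
while preventing new ones from being stored.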
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
while making it take the value directly instead of the config.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
set aliases for the previous ones for backward compat.
There's still cleanup potential, e.g., for snapshots, but to do that
nicely we may need (or want) to extend CLIHandler to accept commands
without fixed params also on the command group itself.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Since kernel 5.15, there is an issue with io_uring when used in
combination with CIFS [0]. Unfortunately, the kernel developers did
not suggest any way to resolve the issue and didn't comment on my
proposed one. So for now, just disable io_uring when the storage is
CIFS, as is done for other storage types that had problematic
interactions.
It is rather easy to reproduce when writing large amounts of data
within the VM. I used
    dd if=/dev/urandom of=file bs=1M count=1000
to reproduce it consistently, but your mileage may vary.
Some forum reports about users running into the issue [1][2][3].
[0]: https://www.spinics.net/lists/linux-cifs/msg26734.html
[1]: https://forum.proxmox.com/threads/109848/
[2]: https://forum.proxmox.com/threads/110464/
[3]: https://forum.proxmox.com/threads/111382/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
if the preparing of pci devices or the start of the vm fails, we need
to cleanup the pci devices (reservations *and* mdevs), or else
it might happen that there are leftovers which must be manually removed.
to include also mdevs now, refactor the cleanup code from 'vm_stop_cleanup'
into its own function, and call that instead of only 'remove_pci_reservation'
also print the errors of the cleanup steps with 'warn', otherwise we
might discard important errors
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
allowing slower or overloaded systems a higher chance to finish
commands while not being too long to be problematic for sync api calls
with their 30s total budget
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the volume path could contain escaped ":" or ",", which means their '\'
needs to be escaped another time for passing to HMP.
the same approach is used for hotplugging regular drives in
PVE::QemuServer, and is needed (at least) for RBD storages with IPv6
monhosts or an explicit monhost port.
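
The extra escaping step amounts to doubling every backslash before the
string is handed to HMP. A minimal Python sketch, with a hypothetical
function name and a made-up example path (the real code is Perl):

```python
def escape_for_hmp(volume_path):
    """Double each backslash so that escaped ':' and ',' survive HMP parsing."""
    return volume_path.replace("\\", "\\\\")


# Made-up RBD-style path with an escaped IPv6 monhost, for illustration only.
example = r"rbd:pool/vm-100-disk-0:mon_host=fd00\:\:1"
escaped = escape_for_hmp(example)
```

Paths without any backslash pass through unchanged.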
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
This will break possibly existing workflows like
1. add second cloud-init
2. remove first cloud-init
to change the cloud-init storage.
On the other hand, it avoids unintended misconfiguration of having
multiple cloud-init drives with potentially different settings.
Also in preparation for adding cloud-init-related API calls, where
not being able to assume that there's only one cloud-init drive/state
would complicate things quite a bit.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
So that there is a better chance to debug issues like in [0]. For
suspending, which uses the same QMP calls, this is already done.
[0]: https://forum.proxmox.com/threads/114203/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
we need to do this independently of is_custom_model to ensure the
reported model is understood by QEMU
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reported-by: Fiona Ebner <f.ebner@proxmox.com>
the former CPU type never existed on the market and will be dropped
by QEMU 7.1, so map it to the server variant as they're pretty much
identical anyway FWICT.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Prevent the user from suspending the vm at all, as while suspension
itself may finish, the saved state is incomplete as we can neither
save nor restore PCIe device state in any generic fashion, so
resuming will almost certainly break.
The single case when it could work is when the guest OS doesn't use
the passed through device at all, so there's no state, but that's
really odd (as why bother passing through then), and the user should
rather remove the hostpci entry in that case.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ T: reword commit message slightly ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
like for other drive-related operations. The default of 3 seconds is
just not enough for large (or slow) disks.
Reported in the community forum:
https://forum.proxmox.com/threads/49543/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
the created backups are encrypted, but are not restorable with the
master key in case the original PVE system is lost.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if the key file doesn't exist (anymore), but the storage.cfg references
one, die when starting a backup that should use encryption instead of
falling back to plain-text operations.
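
The intended behaviour can be sketched as follows, in Python for
illustration. The function name, the `encryption-key` property name and
the way the key path is passed in are assumptions, not the actual
implementation.

```python
import os


def encryption_key_for_backup(storage_cfg, key_path):
    """Return the key file to use, None if encryption is not configured.

    Raises instead of silently falling back to plain text when the
    storage config references a key but the key file is missing.
    """
    if not storage_cfg.get("encryption-key"):
        return None  # encryption not configured for this storage
    if not os.path.exists(key_path):
        raise RuntimeError(f"encryption configured, but no key at {key_path}")
    return key_path
```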
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
When passing through an NVIDIA vGPU via mediated devices, their
software needs the qemu process to have the 'uuid' parameter set to the
one of the vGPU. Since it's currently not possible to pass through multiple
vGPUs to one VM (seems to be an NVIDIA driver limitation at the moment),
we don't have to take care about that.
Sadly, because of where we do this, it does not show up in 'qm showcmd', as we
don't (want to) query the PCI devices in that case, and then we don't
have a way of knowing if it's an NVIDIA card or not. But since this
is informational with QEMU anyway, I'd say we can ignore that.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Before, the two strings were one single string each, rather than multiple
separated by newlines.
In the docs, this looked very strange as there were linebreaks and the
dots were shown. Can be seen e.g. in api-viewer /nodes/{node}/qemu/{vmid}/config.
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
we forgot to give the namespace parameter here, so do that.
while we're at it, give the pbs options as a hash instead of adding
another parameter.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
if the configured display hardware has the (optional) default type, but
some other attribute is set, this would match against `undef` and spew
lots of warnings in the logs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since this output is printed to the command line it should
be encoded to avoid the wide character warnings
Signed-off-by: Stefan Hrdlicka <s.hrdlicka@proxmox.com>
and exit early if they are not met.
The necessary libraries were taken from Thomas' post in our community
forum:
https://forum.proxmox.com/threads/.61801/post-466767 (ff)
The /dev/dri/renderD.* check is based on util/drm.c in the current
qemu source code.
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case-)conditions as
fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In preparation to allow passing along certain parameters together with
'archive'. Moving the parameter checks to after the
conflicts-with-'archive' to ensure that the more telling error will
trigger first.
All check helpers should handle empty params fine, but check first
just to make sure and to avoid all the superfluous function calls.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
To be consistent with PBS's implementation of multi-line comments
remove "\s*" here too. Since the regex isn't lazy, .* matches
everything \s* would anyway. (Note that newlines occur after "$".)
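
The equivalence can be checked with a small experiment. The patterns
below are hypothetical stand-ins (the actual regex is not quoted here),
and Python's `re` is used instead of Perl, but greedy `.*` and `$`
before a trailing newline behave the same way for this purpose.

```python
import re

# Hypothetical comment-line patterns, with and without the trailing \s*.
WITH_WS = re.compile(r"^#(.*)\s*$")
WITHOUT_WS = re.compile(r"^#(.*)$")


def captured(pattern, line):
    """Return what the capture group matched, or None if no match."""
    match = pattern.match(line)
    return match.group(1) if match else None


SAMPLES = ["#note", "#note  ", "#note  \n", "#", "# two words\t\n"]
SAME = all(captured(WITH_WS, s) == captured(WITHOUT_WS, s) for s in SAMPLES)
```

Because `.*` is greedy, it already swallows trailing spaces and tabs,
leaving `\s*` nothing to match except the final newline, which `$`
tolerates on its own.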
Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
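
The decision described above can be sketched as a small function, in
Python for illustration. The function name and return values are
hypothetical; "postmigrate" is the QMP run state name.

```python
def action_after_cancel(run_state, was_paused_at_start):
    """Decide what to do with the source VM after migrate_cancel.

    Returns "resume", "error" (cannot go back to paused directly),
    or "none" when the VM is not in the postmigrate state.
    """
    if run_state != "postmigrate":
        return "none"
    if was_paused_at_start:
        return "error"
    return "resume"
```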
Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by re-using the same hash that's used when allocating/activating the
disks in the helpers doing the opposite.
Also in preparation to allow skipping certain disks upon restore.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Also cannot issue a guest agent command in that case.
Reported in the community forum:
https://forum.proxmox.com/threads/106618
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It's only available since QEMU 6.2 and doing a check here rather than
bumping the package dependency allows for easy downgrades.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
In the spirit of c75bf16 ("qm importdisk: tell user to what VM disk we
actually imported"), and so that the information is not lost once qm
importdisk switches to re-using the API call.
Added for cloudinit too, because a new disk is allocated.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
via the special syntax <storeid>:<size>.
Not worth it by itself, but this is anticipating a new 'import-from'
parameter which is only used upon import/allocation, but shouldn't be
part of the schema for the config or other API endpoints.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Co-developed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
[split into its own patch + minor improvements/style fixes]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[renamed API handler, since it's not an index]
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Drive keys are sorted when cloning and 'tpmstate0' comes late, so it
was likely that potentially large disks were already copied just to be
removed again, because of the TPM state restriction at the end.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
because when the VM ID of target and source are the same,
qemu_drive_mirror_monitor() switches the QEMU device node over to the
new backing image. The planned import-from functionality makes it
possible to run into this, although for an a bit unusual use case.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
otherwise users might get confused if they just get a message about a
migrate lock not being available.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) VM. Simply wait a few seconds for any ongoing
replication.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Necessary to import from an existing storage using block-device
volumes like ZFS.
Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
[split into its own patch]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
For disk import, it should be based on the disk properties that are
passed in rather than on those of a possibly pre-existing disk in the
config.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
and also when source and target drivename are different. In those
cases, it is done via qemu-img convert/dd.
In preparation to allow import from existing PVE-managed disks.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It's confusing that the config associated to the destination is
actually a reference to the source config for both existing callers.
Also, disk import will need to base the calculation on the passed-in
drive parameters and not just the current config, so this change is in
preparation for that too.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
While the new options should be written to the pending config, the
decisions (currently only one) in create_disks need to be made for
the current config.
Seems to fix EFI disk creation, but actually, it's only
future-proofing, because, currently, the same OVMF_VARS file is
used independently of $smm.
The correct config is also needed to determine the correct size for
the EFI disk for the upcoming import-from feature.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
For creation, activation and size update never triggered, because the
passed in $conf is essentially the same as the creation $settings, so
the disk was always detected to be the same as the "existing" one. But
actually, all disks are new, so it makes sense to do it.
For update, activation and size update nearly always triggered,
because only the pending changes are passed in as $conf. The case
where it didn't trigger is when the same pending change was made twice
(there are cases where hotplug isn't done, but that makes it even more
unlikely).
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Avoids the error
adding drive failed: Duplicate ID 'drive-scsi1' for drive
that could happen when switching over to a new disk (e.g. via qm set),
if unplugging wasn't fast enough.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
'force-cpu' parameter was introduced to allow live-migration of VMs with
custom CPU models; it does not need to be allowed for general use on
vm_start for regular users, since they would be able to set arbitrary
cpu types or cpuid parameters that aren't supported.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Using a loop of freeze, sleep 5, thaw, sleep 5, an idling Windows 11
VM with 4 cores and 8GiB RAM once took 54 seconds for thawing. It took
less than a second about 90% of the time and a maximum of a few seconds
for the majority of other cases, but there can be outliers where 10
seconds is not enough.
And there can be hookscripts executed upon thaw, which might also not
complete instantly.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The refactoring in 36d4bdcb86 missed
this. The check is already done as part of the following check_storage
call.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
When restoring a backup and the storage the disks would be created on
doesn't allow 'images', the process errors without cleanup.
This is the same behaviour we currently have when the storage is
disabled.
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
preparation for also clamping on hotplug and lower the minimum in the
schema so that the full v2 range can be used.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
it's potentially expensive to check and the user already needs to
explicitly turn auto-encoding off, besides QEMU/QGA should handle
that and just error out gracefully on bogus base64 values.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
by adding an optional parameter 'encode' (enabled by default). When it
is disabled, the content must be base64 encoded already. This
way, users can send a binary file to the vm by base64 encoding it
themselves
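
From the client side, sending binary data with encoding turned off could
look roughly like this Python sketch. The helper name and the `content`
parameter name are assumptions for illustration; only the `encode`
option is taken from the description above.

```python
import base64


def prepare_file_write_params(data):
    """Build request parameters for binary data, pre-encoded as base64.

    'encode' is set to 0 so the server does not encode the content again.
    """
    return {
        "content": base64.b64encode(data).decode("ascii"),
        "encode": 0,
    }


params = prepare_file_write_params(b"\x00\x01binary\xff")
```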
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
else this fails if we check 'boot' before the device was put into
the config or pending section.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
when passing a config from one cluster to another, we want to be strict
when parsing - it's better to fail the migration early and upgrade the
target node instead of failing the migration later (when significant
work for transferring disks and/or state has already been done) or not
at all, but silently lose config settings that the target doesn't
understand.
this also might be helpful in other cases - e.g. when restoring from a
backup.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since we are going to reuse the same mechanism/code for network bridge
mapping and pve-container.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
While existing callers are not using the parameter after the call,
the modification is rather unexpected and could lead to bugs quickly.
Also avoid setting an undef value in the hash, but use delete instead.
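
The same idea in a Python sketch (the real code is Perl, where this
corresponds to working on a copy of the hash and using `delete` rather
than assigning `undef`). The function name is hypothetical.

```python
def without_option(params, key):
    """Return a copy of params with key removed; the caller's dict is untouched."""
    result = dict(params)   # shallow copy, so the argument is not modified
    result.pop(key, None)   # remove the key instead of storing a None value
    return result


original = {"vmid": 100, "skiplock": 1}
cleaned = without_option(original, "skiplock")
```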
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>