vIOMMU enables the option to passthrough pci devices to L2 VMs in L1
VMs via Nested Virtualisation and adds an extra isolation.
Uses the new property-string from the "config: define machine schema
as property-string"-commit to add the viommu option to the machine
parameter.
Currently there are two vIOMMU implementation in QEMU to choose:
intel or virtio
Virtio-iommu is more recent but less used in production than intel-iommu.
The assert_valid_machine_property function prevents using intel-iommu with
i440fx.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
[ TL: tiny coding style fix to extract variable inside if expr ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Convert the machine parameter to a property-string and use the machine
type as the default key for backward compatibility.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Upon obtaining the device type, a check is performed to determine if it
is a CD drive. It is important to note that Cloudinit drives are always
assigned as CD drives. If the drive has not yet been allocated, the test
will fail due to the unset cd attribute.
To avoid this, an explicit check is now performed to determine if it is
a Cloudinit drive that has not yet been assigned.
Fixes: d1feab4 ("fix #4957: add vendor and product information passthrough for SCSI-Disks")
Signed-off-by: Hannes Duerr <h.duerr@proxmox.com>
When attempting a CPU hotplug on an architecture other than x86_64, die
with a clean error instead of attempting a hotplug with a known
non-working device command line. Also move the corresponding FIXME up to
the error.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
could be a better fit in PVE::Tools, like proposed by Filip, but OTOH.
Tools is already crowded as is, so wait if we need it on more places
outside of qemu-server.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Make the default value for 'kvm' consistent, taking into account
whether the VM will run on the same CPU architecture as the host.
This would be a breaking change to CPU hotplug for VMs with a
different CPU architecture running on an x86_64 host, as in this case
the default CPU type for CPU hotplug changes from 'kvm64' to 'qemu64'.
However, CPU hotplug of non x86_64 architectures is not supported
anyway, so this is not a breaking change after all.
It should be noted that this change does alter the CPU hotplug
behaviour when emulating an x86_64 CPU on a non-x86_64 host. This is
however not officially supported in Proxmox VE.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Instead of starting a VM with a 32-bit CPU type and a 64-bit OVMF image,
throw an error before starting the VM telling the user that OVMF is not
supported on 32-bit CPU types.
To obtain a list of 32-bit CPU types, refer to the builtin_x86_defs in
target/i386/cpu.c of QEMU. Exclude any entries that have the long mode
feature (CPUID_EXT2_LM).
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
PVE::Storage::path() neither activates the storage of the passed-in volume, nor
does it ensure that the returned value is actually a file or block device, so
this actually fixes two issues. PVE::Storage::abs_filesystem_path() actually
takes care of both, while still calling path() under the hood (since $volid
here is always a proper volid, unless we change the cicustom schema at some
point in the future).
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
adds vendor and product information for SCSI devices to the json schema
and checks in the VM create/update API call if it is possible to add
these to QEMU as a device option
Signed-off-by: Hannes Duerr <h.duerr@proxmox.com>
[FE: add missing space to exception message
use config option for exception e.g. scsi0 rather than 'product'
style fixes]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Encapsulation of the functionality for determining the scsi device type
in a new function for reusability in QemuServer/Drive.pm
Signed-off-by: Hannes Duerr <h.duerr@proxmox.com>
by adding a comment and grouping the code better. See the PVE QEMU
patch "PVE: Allow version code in machine type" for reference. The way
the code was written previously made it look like a bug where
$pve_version might be overwritten multiple times.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The default VM startup timeout is `max(30, VM memory in GiB)` seconds.
Multiple reports in the forum [0] [1] and the bug tracker [2] suggest
this is too short when using PCI passthrough with a large amount of VM
memory, since QEMU needs to map the whole memory during startup (see
comment #2 in [2]). As a result, VM startup fails with "got timeout".
To work around this, set a larger default timeout if at least one PCI
device is passed through. The question remains how to choose an
appropriate timeout. Users reported the following startup times:
ref | RAM | time | ratio (s/GiB)
---------------------------------
[1] | 60G | 135s | 2.25
[1] | 70G | 157s | 2.24
[1] | 80G | 277s | 3.46
[2] | 65G | 213s | 3.28
[2] | 96G | >290s | >3.02
The data does not really indicate any simple (e.g. linear)
relationship between RAM and startup time (even data from the same
source). However, to keep the heuristic simple, assume linear growth
and multiply the default timeout by 4 if at least one `hostpci[n]`
option is present, obtaining `4 * max(30, VM memory in GiB)`. This
covers all cases above, and should still leave some headroom.
[0]: https://forum.proxmox.com/threads/83765/post-552071
[1]: https://forum.proxmox.com/threads/126398/post-592826
[2]: https://bugzilla.proxmox.com/show_bug.cgi?id=3502
Suggested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
In preparation to add more properties to the memory configuration like
maximum hotpluggable memory and whether virtio-mem devices should be
used.
This also allows to get rid of the cyclic include of PVE::QemuServer
in the memory module.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[FE: also convert new usage in get_derived_property
remove cyclic include of PVE::QemuServer
add commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
moving qemu_{device,object}{add,del} helpers there for now.
In preparation to remove the cyclic include of PVE::QemuServer in the
memory module and generally for better modularity in the future.
No functional change intended.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
PVE::QemuServer::check_running() does both
PVE::QemuConfig::assert_config_exists_on_node()
PVE::QemuServer::Helpers::vm_running_locally()
The former one isn't needed here when doing hotplug, because the API
already assert that the VM config exists. It also would introduce a
new cyclic dependency between PVE::QemuServer::Memory <->
PVE::QemuConfig with the proposed virtio-mem patch set.
In preparation to remove the cyclic include of PVE::QemuServer in the
memory module.
No functional change intended.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
which is the only user of the parse_numa() helper. While at it, avoid
the duplication of MAX_NUMA.
In preparation to remove the cyclic include of PVE::QemuServer in the
memory module.
No functional change intended.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Commit efa3355d ("fix #3428: cloudinit: add parameter for upgrade on
boot") changed the default, but this is a breaking change. The bug
report was only about making the option configurable.
The commit doesn't give an explicit reason for why, and arguably,
doing the upgrade is not an issue for most users. It also leads to a
different cloud-init instance ID, because of the different setting,
which in turn leads to ssh host key regeneration within the VM.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
this patch allows configuring pci devices that are mapped via cluster
resource mapping when the user has 'Resource.Use' on the ACL path
'/mapping/pci/{ID}' (in addition to the usual required vm config
privileges)
When given multiple mappings in the config, we use them as alternatives
for the passthrough, and will select the first free one on startup.
It is using our regular pci reservation mechanism for regular devices and
we introduce a selection mechanism for mediated devices.
A few changes to the inner workings were required to make this work well:
* parse_hostpci now returns a different structure where we have a list
of lists (first level is for the different alternatives and second
level is for the different devices that should be passed through
together)
* factor out the 'parse_hostpci_devices' which parses each device from
the config and does some precondition checks
* reserve_pci_usage now behaves slightly different when trying to
reserve an device with the same VMID that's already reserved for,
since for checking which alternative we can use, we already must
reserve one (this means that qm showcmd can actually reserve devices,
albeit only for up to 10 seconds)
* configuring a mediated device on a multifunction device is not
supported anymore, and results in failure to start (previously, it
just chose the first device to do it). This is a breaking change
* configuring a single pci device twice on different hostpci slots now
fails during commandline generation instead on qemu start, so we had
to adapt one test where this occurred (it could never have worked
anyway)
* we now check permissions during clone/restore, meaning raw/real
devices can only be cloned/restored by root@pam from now on.
this is a breaking change.
Fixes#3574: Improve SR-IOV usability
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
this patch allows configuring usb devices that are mapped via
cluster resource mapping when the user has 'Mapping.Use' on the ACL
path '/mapping/usb/{ID}' (in addition to the usual required vm config
privileges)
for now, this is only valid if there is exactly one mapping for the
host, since we don't track passed through usb devices yet
This now also checks permissions on clone/restore, meaning a
'non-mapped' device can only be cloned/restored as root@pam user.
That is a breaking change.
Refactor the checks for restoring into a sub, so we have central place
where we can add such checks
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
similar to how we handle the PCI module and format. This makes the
'verify_usb_device' method and format unnecessary since
we simply check the format with a regex.
while doing tihs, i noticed that we don't correctly check for the
case-insensitive variant for 'spice' during hotplug, so fix that too
With this we can also remove some parameters from the get_usb_devices
and get_usb_controllers functions
while were at it, refactor the permission checks for the usb config too
and use the new 'my sub' style for the functions
also make print_usbdevice_full parse the device itself, so we don't have
to do it in multiple places (especially in places where we don't see
that this is needed)
No functional change intended
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Markus Frank <m.frank@proxmox.com>
https://gitlab.com/x86-psABIs/x86-64-ABI/https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01592.html
"
In 2020, AMD, Intel, Red Hat, and SUSE worked together to define
three microarchitecture levels on top of the historical x86-64
baseline:
* x86-64: original x86_64 baseline instruction set
* x86-64-v2: vector instructions up to Streaming SIMD
Extensions 4.2 (SSE4.2) and Supplemental
Streaming SIMD Extensions 3 (SSSE3), the
POPCNT instruction, and CMPXCHG16B
* x86-64-v3: vector instructions up to AVX2, MOVBE,
and additional bit-manipulation instructions.
* x86-64-v4: vector instructions from some of the
AVX-512 variants.
"
This patch add new builtin model derivated from qemu64 model,
to be compatible between intel/amd.
mandatory flags from qemu-doc generator:
https://gitlab.com/qemu/qemu/-/blob/master/scripts/cpu-x86-uarch-abi.py
levels = [
[ # x86-64 baseline
"cmov",
"cx8",
"fpu",
"fxsr",
"mmx",
"syscall",
"sse",
"sse2",
],
[ # x86-64-v2
"cx16",
"lahf-lm",
"popcnt",
"pni",
"sse4.1",
"sse4.2",
"ssse3",
],
[ # x86-64-v3
"avx",
"avx2",
"bmi1",
"bmi2",
"f16c",
"fma",
"abm",
"movbe",
"xsave" #missing from qemu doc currently
],
[ # x86-64-v4
"avx512f",
"avx512bw",
"avx512cd",
"avx512dq",
"avx512vl",
],
]
x86-64-v1 : I'm skipping it, as it's basicaly qemu64|kvm64 -vme,-cx16 for compat Opteron_G1 from 2004
so will use it as qemu64|kvm64 is higher are not working on opteron_g1 anyway
x86-64-v2 : Derived from qemu, +popcnt;+pni;+sse4.1;+sse4.2;+ssse3
min intel: Nehalem
min amd : Opteron_G3
x86-64-v2-AES : Derived from qemu, +aes;+popcnt;+pni;+sse4.1;+sse4.2;+ssse3
min intel: Westmere
min amd : Opteron_G3
x86-64-v3 : Derived from qemu64 +aes;+popcnt;+pni;+sse4.1;+sse4.2;+ssse3;+avx;+avx2;+bmi1;+bmi2;+f16c;+fma;+abm;+movbe+xsave
min intel: Haswell
min amd : EPYC_v1
x86-64-v4 : Derived from qemu64 +aes;+popcnt;+pni;+sse4.1;+sse4.2;+ssse3;+avx;+avx2;+bmi1;+bmi2;+f16c;+fma;+abm;+movbe;+xsave;+avx512f;+avx512bw;+avx512cd;+avx512dq;+avx512vl
min intel: Skylake
min amd : EPYC_v4
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
If no FQDN is provided, we simply set it to the current hostname. This
ensures that the hostname *really* gets set, since we encountered an
issue on Fedora and CentOS based systems where no hostname got set at
all.
When there's no FQDN set in the cloudinit config, this leads to the
following entry:
127.0.1.1 <hostname> <hostname>
Which doesn't seem to cause any issues.
Tested on:
- Ubuntu 23.04
- CentOS 8
- Fedora 38
- Debian 11
- SUSE 15.4
Signed-off-by: Leo Nunner <l.nunner@proxmox.com>
While, usually, the slot should match the ID, it's not explicitly
guaranteed and relies on QEMU internals. Using the numerical ID is
more future-proof and more consistent with plugging, where no slot
information (except the maximum limit) is relied upon.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
current qemu_dimm_list can return any kind of memory devices.
make it more generic, with an optionnal type device
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Required because there's one single efi for ARM, and the 2m/4m
difference doesn't seem to apply.
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
[ T: move description to format and reword subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Mira found out that 41 phys-bits the limit is pretty much the same as
with 40, as such odd sizes are a bit unexpected anyway lets mask the
LSB and use that as base, that way we're good again.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>