mdevs have a host-unique UUID and are indexed by it in the PCI-ID
independent `/sys/bus/mdev/devices/<uuid>` path, so there is no need
to go through the PCI ID for them.
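For illustration, the lookup then boils down to a direct path check (a
minimal sketch; variable names are not from the patch):

    my $path = "/sys/bus/mdev/devices/$uuid";
    die "mediated device '$uuid' does not exist\n" if !-e $path;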
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if preparing the PCI devices or starting the VM fails, we need to
clean up the PCI devices (reservations *and* mdevs), or else leftovers
may remain that have to be removed manually.
to now also include mdevs, refactor the cleanup code from
'vm_stop_cleanup' into its own function, and call that instead of
only 'remove_pci_reservation'
this also simplifies the code: it now removes all PCI IDs reserved
for that VMID, since we cannot have multiple VMs with the same VMID
anyway
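The result is roughly the following helper (a minimal sketch; the name
cleanup_pci_devices and the mdev_uuid_for helper are assumptions for
illustration, only remove_pci_reservation is given above):

    sub cleanup_pci_devices {
        my ($vmid, $conf) = @_;

        for my $opt (keys %$conf) {
            next if $opt !~ m/^hostpci\d+$/;
            # if an mdev instance was created for this entry, remove it via sysfs
            my $uuid = mdev_uuid_for($vmid, $opt); # assumed helper
            my $path = "/sys/bus/mdev/devices/$uuid";
            PVE::SysFSTools::file_write("$path/remove", "1") if -e $path;
        }
        # drop *all* PCI IDs reserved for this VMID
        remove_pci_reservation($vmid);
    }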
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When passing through an NVIDIA vGPU via mediated devices, the NVIDIA
software needs the QEMU process to have its 'uuid' parameter set to the
UUID of the vGPU. Since it's currently not possible to pass through
multiple vGPUs to one VM (this seems to be an NVIDIA driver limitation
at the moment), we don't have to take care of that case.
Sadly, because of where we do this, it does not show up in 'qm showcmd',
as we don't (want to) query the PCI devices in that case, and then we
have no way of knowing whether it's an NVIDIA card or not. But since
this parameter is purely informational for QEMU anyway, I'd say we can
ignore that.
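In effect, the generated command line only gains QEMU's top-level
option, e.g. (the UUID value here is just a placeholder):

    -uuid 00000000-0000-0000-0000-000000000100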
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
This allows mobile- and vGPUs to be presented to the guest as if they
were the original desktop variants of the card. It also allows
device-ID variants that guests don't know about to be renamed to
match compatible sibling devices the guest does have drivers for
(e.g. to remove manufacturer-specific vendor ID variants that prevent
the use of a device which would otherwise have a supported chipset)
e.g. hostpci0: 03:00,vendor-id=0x8086,device-id=0x10f6
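On the QEMU side, this maps to the override properties of the vfio-pci
device, roughly like this (a sketch, reusing the example above):

    -device vfio-pci,host=0000:03:00.0,x-pci-vendor-id=0x8086,x-pci-device-id=0x10f6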
Signed-off-by: Nicholas Sherlock <n.sherlock@gmail.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
both style and readability are naturally subjective to a certain
degree...
Also, this patch mixes a bit much into one thing, but splitting that
up would mean lots of work I just wanted to avoid, sorry about that.
Among other things:
- avoid a level of indentation in the reserve loop
- rename pciids to reservation_list where it was a better fit
- make reserve set either pid or time to avoid suggesting that we
save both
- rename parameters to requested/dropped IDs for easier understanding
what's going on in the code
- avoid old_pid/pid, use running_pid and reserver_pid instead to
clarify what they actually mean
- drop useless returns to avoid suggesting the return value has any
use, and to save some lines
- use a hash slice to delete all dropped IDs at once, which is shorter
and faster (see the sketch below)
- use a 5 second timeout for the reservation lock; this does nothing
intensive nor waits for anything, so the critical section should be
really short and 5s is more than long enough to wait
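For reference, the hash-slice deletion mentioned above (variable names
assumed for illustration):

    # instead of a loop of single deletes:
    #   delete $reservations->{$_} for @$dropped_ids;
    # one hash slice does it all at once:
    delete @$reservations{@$dropped_ids};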
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
lck needs to die, the days of 8.3 file naming schemes are long gone
(in the server space, that is ;)
/var/run is /run, so use the shorter one, and while /var/lock is an OK
place for locks, nowadays we try to keep the lock and the locked object
together. The qemu-server sub-directory avoids further cluttering the
already crowded top-level /run dir.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
saves a list of PCI ID <-> VMID mappings in /var/run
that we can check when we start a VM
if we're not given a PID but a timeout, we save the time at which the
reservation will run out (current time + timeout + 5s), since the
duration of each VM start (until we can save the PID) varies from
config to config
reserve_pci_usage and remove_pci_reservation always expect a list of
IDs, so that we can update the reservation for a VM all at once
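Callers then look roughly like this (a sketch; the exact signatures are
an assumption based on the description above):

    # while starting: reserve with a timeout, the QEMU PID is not known yet
    reserve_pci_usage($ids, $vmid, $start_timeout);
    # once QEMU runs: renew the reservation with the actual PID
    reserve_pci_usage($ids, $vmid, undef, $pid);
    # on error or stop: drop the reservations again
    remove_pci_reservation($ids);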
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we do not need this group, but want to use the regex in places where
we have multiple groups, so make it a non-capturing group
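i.e. (a generic illustration, not the actual regex from the patch):

    # '(?:0000:)?' matches an optional domain prefix without occupying $1,
    # so the groups for bus/slot/function keep their numbers
    $addr =~ m/^(?:0000:)?([0-9a-f]{2}):([0-9a-f]{2})\.([0-9a-f])$/;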
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
fixes commit 74c17b7a23, which moved
this code here but forgot to pass the $vga ref; as the module was using
neither warnings nor strict mode, this was not caught.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(also fixes #3011)
Deprecates the old-style 'boot' and 'bootdisk' options by adding a new
'order=' subproperty to 'boot'.
This allows a user to specify more than one disk in the boot order,
helping with newer versions of SeaBIOS/OVMF where disks without a
bootindex won't be initialized at all (breaks soft-raid and some LVM
setups).
This also allows specifying a bootindex for USB and hostpci devices,
which was not possible before. Booting from floppy is not supported in
the new model, but I doubt that will be a problem (AFAICT we can't even
attach floppy disks to a VM?).
Default behaviour is intended to stay the same, i.e. while new VMs will
receive the new 'order' property, it will be set so the VM starts the
same as before (using get_default_bootorder).
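For example (device names purely illustrative):

    boot: order=scsi0;net0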
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
during refactoring, the vmid got lost, but it is necessary to get the
correct mdev id
Fixes commit 74c17b7a23
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ reference fixed commit ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Legacy IGD passthrough requires address 00:1f.0 to not be assigned to
anything on QEMU startup (currently it's assigned to bridge pci.2).
Changing this in general would break live-migration, so introduce a new
hostpci parameter "legacy-igd", which if set to 1 will move that bridge
to be nested under bridge 1.
This is safe because:
* Bridge 1 is unconditionally created on i440fx, so nesting is ok
* Defaults are not changed, i.e. PCI layout only changes when the new
parameter is specified manually
* hostpci forbids migration anyway
Additionally, the PT device has to be assigned address 00:02.0 in the
guest as well, which is usually used for VGA assignment. Luckily, IGD PT
requires vga=none, so that is not an issue either.
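Enabled per device, e.g.:

    hostpci0: 00:02.0,legacy-igd=1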
See https://git.qemu.org/?p=qemu.git;a=blob;f=docs/igd-assign.txt
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Allow a user to add a virtio-rng-pci (an emulated hardware random
number generator) to a VM with the rng0 setting. The setting is
version_guard()-ed.
Limit the selection of entropy source to one of three:
/dev/urandom (preferred): Non-blocking kernel entropy source
/dev/random: Blocking kernel source
/dev/hwrng: Hardware RNG on the host for passthrough
QEMU itself defaults to /dev/urandom (or the equivalent getrandom()
call) if no source file is given, but I don't fully trust that
behaviour to stay constant, considering the documentation [0] already
disagrees with the code [1], so let's always specify the file ourselves.
/dev/urandom is preferred, since it prevents host entropy starvation.
The quality of randomness is still good enough to emulate a hwrng, since
a) it's still seeded from the kernel's true entropy pool periodically
and b) it's mixed with true entropy in the guest as well.
Additionally, all sources about entropy prediction attacks I could find
mention that to predict /dev/urandom results, /dev/random has to be
accessed or manipulated in one way or the other - this is not possible
from a VM, however, as the entropy we're talking about comes from the
*host's* blocking pool.
More about the entropy and security implications of the non-blocking
interface in [2] and [3].
Note further that only one /dev/hwrng exists at any given time; if
multiple RNGs are available, only the one selected in
'/sys/devices/virtual/misc/hw_random/rng_current' will feed the file.
Selecting this is left as an exercise to the user, if at all required.
We limit the available entropy to 1 KiB/s by default, but allow the
user to override this. It is worth noting that the limiter does not
work linearly, i.e. max_bytes=1024/period=1000 means that up to 1 KiB
of data becomes available on a 1000 millisecond timer, not that 1 KiB
is streamed to the guest over the course of one second - hence the
configurable period.
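With the defaults, the setting then looks like this:

    rng0: source=/dev/urandom,max_bytes=1024,period=1000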
The default used here is the same as given in the QEMU documentation [0]
and has been verified to affect entropy availability in a guest by
measuring /dev/random throughput. 1 KiB/s is enough to avoid any
early-boot entropy shortages, and already has a significant impact on
/dev/random availability in the guest.
[0] https://wiki.qemu.org/Features/VirtIORNG
[1] https://git.qemu.org/?p=qemu.git;a=blob;f=crypto/random-platform.c;h=f92f96987d7d262047c7604b169a7fdf11236107;hb=HEAD
[2] https://lwn.net/Articles/261804/
[3] https://lwn.net/Articles/808575/
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
to achieve this we have to add 3 new scsihw addresses, since LSI
controllers can only hold 7 SCSI drives each
we go up to 31, since this is the limit of addresses we have reserved
for virtio-scsi-single devices (we can increase this in the future)
to make this more future-proof, we add a new PCI bridge under PCI
bridge 1, so we have to adapt the bridge-adding code (we did not need
this for q35 previously)
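a config can then contain entries like (volume name purely
illustrative):

    scsi30: local-lvm:vm-100-disk-1,size=4G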
impact on live migration:
since older versions of qemu-server do not have those config settings,
there is no problem from old -> new
new -> old is not supported anyway, and here it breaks in such a way
that the VM crashes and loses the configs for scsi15-30
(same behaviour as e.g. with audio0 and migration from new -> old)
tested with 31 SCSI disks on
i440fx + virtio-scsi
i440fx + lsi
q35 + virtio-scsi
q35 + lsi
each with both OVMF and SeaBIOS
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
For non-PCI-Express passthrough, additional addresses are reserved.
For PCIe passthrough, PCIe root ports are needed (unless the guest is
Windows 7-like).
The first 4 PCIe root ports are defined by default in the pve-q35*.cfg
files. If more than 4 PCIe devices are passed through, the needed root
ports are created on demand. This helps to keep live migration possible
without adding a new pve-q35*.cfg file.
For the Windows 7-like guests, additional addresses are reserved as
well.
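An on-demand root port then looks roughly like this (a sketch; id,
chassis and address are assumptions):

    -device pcie-root-port,id=pcieport4,bus=pcie.0,chassis=4,addr=0x10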
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
with such a shared memory device, a VM can share data with other
VMs or with the host via memory
one of the use cases is looking-glass[1] with PCI passthrough, which
copies the guest framebuffer to the host, giving you a high-speed,
low-latency display client for the VM
on VM stop we delete the file again
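configured for example as (size is in MiB, the name is illustrative):

    ivshmem: size=32,name=foo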
1: https://looking-glass.hostfission.com/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Win7 is very picky about PCIe assignments and fails with
'error 12' with the way we add hostpci devices.
To combat that, we simply give the hostpci device a normal PCI port
instead.
Start with address 0x10, so that we have space before those devices,
and between them and the ones configured in pve-q35.cfg, should we
need it in the future.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
On arm we start off with a pcie bridge pcie.0. We need a
keyboard in addition to the tablet device, and we need to
connect both to an 'ehci' controller.
To do all this, we also pass the $arch variable through a
whole lot of function calls to ultimately also adapt the
hotplug code to take care of the new keyboard device.
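The resulting input device part of the command line is roughly (a
sketch):

    -device usb-ehci,id=ehci
    -device usb-kbd,bus=ehci.0,port=2
    -device usb-tablet,bus=ehci.0,port=1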
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
we change 'vga' to a property string and add a 'memory' property
with this, the user can better control the memory given to the virtual
GPU; this is especially useful for SPICE/qxl, since high resolutions
need more memory
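e.g.:

    vga: qxl,memory=32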
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>