Commit Graph

36 Commits

Author SHA1 Message Date
Dominik Csapak
6fa358a334 pci: make mediated device sysfs path independent of PCI id
mdevs have a host-unique UUID they are indexed with in the PCI-id
independent `/sys/bus/mdev/devices/<uuid>` path, so there is no need
to go through the PCI id for them.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2022-11-09 09:06:19 +01:00
Thomas Lamprecht
2fa64dbddd pci: add/improve HW reservation comments
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-09 08:55:55 +01:00
Dominik Csapak
1b189121fc vm start/stop: cleanup passed-through pci devices in more situations
if the preparing of PCI devices or the start of the VM fails, we need
to cleanup the PCI devices (reservations *and* mdevs), or else it
might happen that there are leftovers which must be manually removed.

to include also mdevs now, refactor the cleanup code from
'vm_stop_cleanup' into it's own function, and call that instead of
only 'remove_pci_reservation'

also simplifies the code, such that it now removes all PCI ids
reserved for that VMID, since we cannot have multiple VMs with the
same VMID anyway

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-09 08:49:45 +01:00
Dominik Csapak
bbf96e0f1e automatically add 'uuid' parameter when passing through NVIDIA vGPU
When passing through an NVIDIA vGPU via mediated devices, their
software needs the qemu process to have the 'uuid' parameter set to the
one of the vGPU. Since it's currently not possible to pass through multiple
vGPUs to one VM (seems to be an NVIDIA driver limitation at the moment),
we don't have to take care about that.

Sadly, the place we do this, it does not show up in 'qm showcmd' as we
don't (want to) query the pci devices in that case, and then we don't
have a way of knowing if it's an NVIDIA card or not. But since this
is informational with QEMU anyway, i'd say we can ignore that.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2022-08-12 13:42:33 +02:00
Dominik Csapak
d8a7e9e881 PCI: allow longer pci domains
some systems[0] have pci domains longer than the default ('0000') of 4
characters, so change the regex to allow at least 4.

0: https://forum.proxmox.com/threads/problem-with-gpu-passthrough-in-a-virtual-machine.105720/

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2022-03-16 18:03:35 +01:00
Nicholas Sherlock
d806b017ac pci: allow override of PCI vendor/device ids
This allows mobile- and vGPUs to be presented to the guest as if they
were the original desktop variants of the card. It also allows
device-ID variants that guests don't know about to be renamed to
match compatible sibling devices the guest does have drivers for
(e.g. to remove manufacturer-specific vendor ID variants that prevent
the use of a device which would otherwise have a supported chipset)

e.g. hostpci0: 03:00,vendor-id=0x8086,device-id=0x10f6

Signed-off-by: Nicholas Sherlock <n.sherlock@gmail.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
2022-01-25 10:59:23 +01:00
Thomas Lamprecht
d01de38cb6 pci: prepare: improve no-IOMMU error message
give some context

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-15 19:58:16 +02:00
Thomas Lamprecht
a01593676c pci reservation: rework helpers style and readability wise
both style and readability are naturally subjective to a certain
degree...

Also, this patch mixes a bit much into one thing, but splitting that
up would mean lots of work I just wanted to avoid, sorry about that.

Among other things:

- avoid a level of indentation in the reserve loop
- rename pciids to reservation_list where it was a better fit
- make reserve set either pid or time to avoid suggesting that we
  save both
- rename parameters to requested/dropped IDs for easier understanding
  what's going on in the code
- avoid old_pid/pid, use running_pid and reserver_pid instead to
  clarify what they actually mean
- drop useless returns to avoid suggesting the return value has any
  use and save some lnes
- use a hash slice to delete all dropped IDs at once, shorter and
  faster
- use 5 second timeout for reservation, this does nothing intensive
  nor does it wait for anything, so the critical section should be
  really short, 5s is really long enough for a wait..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-15 19:58:16 +02:00
Thomas Lamprecht
bda0ebff2d pci reservation: move lock/reservation file into /run/qemu-server
lck needs to die, the days of any 8.3 file naming schemes are long
gone (in the server space that is ;)

/var/run is /run so use the shorter, and while /var/lock is a OK
place for the locks we try to keep lock and lock-object together
nowadays. The qemu-server sub-directory avoids overly cluttering the
already crowded top-level /run dir

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-15 18:17:34 +02:00
Thomas Lamprecht
cda95d5223 pci reservation: encode locklessness of parsers in name
to avoid that they're misused

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-15 14:44:50 +02:00
Dominik Csapak
3bfee796f4 pci: add helpers to (un)reserve pciids for a vm
saves a list of pciid <-> vmid mappings in /var/run
that we can check when we start a vm

if we're not given a pid but a timeout, we save the time when the
reservation will run out (current time + timeout + 5s) since each
vm start (until we can save the pid) varies from config to config

reserve_pci_usage and remove_pci_reservation always expect a list of ids
so that we can update the reservation for a vm all at once

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-10-11 09:07:52 +02:00
Thomas Lamprecht
71cb8e0f87 pci related code cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-11 08:39:28 +02:00
Thomas Lamprecht
e2b42bee6d pci: use local helper to generated generate_mdev_uuid
avoid (API) leaking qemu-server specific stuff into pve-common

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-11 08:38:28 +02:00
Thomas Lamprecht
82712fcd3c pci: prepare_pci_device: fixup parameter name
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-11 08:37:35 +02:00
Dominik Csapak
acd4b77745 pci: refactor pci device preparation
makes the vm start a bit less crowded

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-10-08 06:27:19 +02:00
Dominik Csapak
a4d5b84c9c pci: to not capture first group in PCIRE
we do not need this group, but want to use the regex where we have
multiple groups, so make it a non-capture group

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-10-05 16:14:42 +02:00
Thomas Lamprecht
41af2dfc25 PCI: use warnings/strict and fix setting $vga from config2command
fixes commit 74c17b7a23 which moved
this code here, but forgot to pass $vga ref, as the module was not
using warning nor strict mode this was not caught..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-16 18:03:32 +02:00
Thomas Lamprecht
f7d1505b0c tree wide cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-16 18:03:32 +02:00
Thomas Lamprecht
d1c1af4b02 tree wide cleanup of s/return undef/return/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-16 16:20:05 +02:00
Stefan Reiter
2141a802b8 fix #3010: add 'bootorder' parameter for better control of boot devices
(also fixes #3011)

Deprecates the old-style 'boot' and 'bootdisk' options by adding a new
'order=' subproperty to 'boot'.

This allows a user to specify more than one disk in the boot order,
helping with newer versions of SeaBIOS/OVMF where disks without a
bootindex won't be initialized at all (breaks soft-raid and some LVM
setups).

This also allows specifying a bootindex for USB and hostpci devices,
which was not possible before. Floppy boot support is not supported in
the new model, but I doubt that will be a problem (AFAICT we can't even
attach floppy disks to a VM?).

Default behaviour is intended to stay the same, i.e. while new VMs will
receive the new 'order' property, it will be set so the VM starts the
same as before (using get_default_bootorder).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-14 12:30:50 +02:00
Dominik Csapak
7de7f675c2 fix mdev cmdline generation
during refactoring, the vmid got lost, but is necessary to get
the correct mdev id

Fixes commit 74c17b7a23
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ reference fixed commit ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-13 10:29:25 +02:00
Thomas Lamprecht
1fac3a0b31 pci: whitespace, indentation and formating fixes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-06-25 13:33:26 +02:00
Stefan Reiter
13d689792e fix #2794: allow legacy IGD passthrough
Legacy IGD passthrough requires address 00:1f.0 to not be assigned to
anything on QEMU startup (currently it's assigned to bridge pci.2).
Changing this in general would break live-migration, so introduce a new
hostpci parameter "legacy-igd", which if set to 1 will move that bridge
to be nested under bridge 1.

This is safe because:
* Bridge 1 is unconditionally created on i440fx, so nesting is ok
* Defaults are not changed, i.e. PCI layout only changes when the new
parameter is specified manually
* hostpci forbids migration anyway

Additionally, the PT device has to be assigned address 00:02.0 in the
guest as well, which is usually used for VGA assignment. Luckily, IGD PT
requires vga=none, so that is not an issue either.

See https://git.qemu.org/?p=qemu.git;a=blob;f=docs/igd-assign.txt

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-06-25 13:25:35 +02:00
Stefan Reiter
74c17b7a23 cfg2cmd: hostpci: move code to PCI.pm
To avoid further cluttering config_to_command with subsequent changes.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-06-25 13:25:35 +02:00
Stefan Reiter
2cf61f33d9 fix #2264: add virtio-rng device
Allow a user to add a virtio-rng-pci (an emulated hardware random
number generator) to a VM with the rng0 setting. The setting is
version_guard()-ed.

Limit the selection of entropy source to one of three:
/dev/urandom (preferred): Non-blocking kernel entropy source
/dev/random: Blocking kernel source
/dev/hwrng: Hardware RNG on the host for passthrough

QEMU itself defaults to /dev/urandom (or the equivalent getrandom()
call) if no source file is given, but I don't fully trust that
behaviour to stay constant, considering the documentation [0] already
disagrees with the code [1], so let's always specify the file ourselves.

/dev/urandom is preferred, since it prevents host entropy starvation.
The quality of randomness is still good enough to emulate a hwrng, since
a) it's still seeded from the kernel's true entropy pool periodically
and b) it's mixed with true entropy in the guest as well.

Additionally, all sources about entropy predicition attacks I could find
mention that to predict /dev/urandom results, /dev/random has to be
accessed or manipulated in one way or the other - this is not possible
from a VM however, as the entropy we're talking about comes from the
*hosts* blocking pool.

More about the entropy and security implications of the non-blocking
interface in [2] and [3].

Note further that only one /dev/hwrng exists at any given time, if
multiple RNGs are available, only the one selected in
'/sys/devices/virtual/misc/hw_random/rng_current' will feed the file.
Selecting this is left as an exercise to the user, if at all required.

We limit the available entropy to 1 KiB/s by default, but allow the user
to override this. Interesting to note is that the limiter does not work
linearly, i.e. max_bytes=1024/period=1000 means that up to 1 KiB of data
becomes available on a 1000 millisecond timer, not that 1 KiB is
streamed to the guest over the course of one second - hence the
configurable period.

The default used here is the same as given in the QEMU documentation [0]
and has been verified to affect entropy availability in a guest by
measuring /dev/random throughput. 1 KiB/s is enough to avoid any
early-boot entropy shortages, and already has a significant impact on
/dev/random availability in the guest.

[0] https://wiki.qemu.org/Features/VirtIORNG
[1] https://git.qemu.org/?p=qemu.git;a=blob;f=crypto/random-platform.c;h=f92f96987d7d262047c7604b169a7fdf11236107;hb=HEAD
[2] https://lwn.net/Articles/261804/
[3] https://lwn.net/Articles/808575/

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-06 18:09:04 +01:00
Dominik Csapak
2513b862e6 fix #2566: increase scsi limit to 31
to achieve this we have to add 3 new scsihw addresses since lsi
controllers can only hold 7 scsi drives

we go up to 31, since this is the limit for virtio-scsi-single devices
we have reserved (we can increase this in the future)

to make it more future proof, we add a new pci bridge under pci
bridge 1, so we have to adapt the bridge adding code (we did not
need this for q35 previously)

impact on live migration:
since on older versions of qemu-server we do not have those config
settings, there is no problem from old -> new

new->old is not supported anyway and this breaks so that
the vm crashes and loses the configs for scsi15-30
(same behaviour as e.g. with audio0 and migration from new->old)

tested with 31 scsi disk on
i440fx + virtio-scsi
i440fx + lsi
q35 + virtio-scsi
q35 + lsi
with ovmf + seabios

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-01-31 20:26:26 +01:00
Thomas Lamprecht
e2b0d85dda PCIe passthrough: fixup: avoid addr conflict and cleanup a bit
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-06 19:27:30 +02:00
Thomas Lamprecht
d7d698f60c pci: add conflict tests
best viewed with: git show -w

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-06 19:27:30 +02:00
Aaron Lauterer
c4e1638148 Add support for up to 16 PCI(e) devices
For non pci express passthrough additional addresses are reserved.
For pcie passthrough pcie root ports are needed (unless guest is like
windows 7).

The first 4 pcie root ports are defined by default in the pve-q35*.cfg
files. If more than 4 pcie devices are passed through the needed root
ports are created on demand. This helps to keep live migration possible
without adding a new pve-q35*.cfg file.

For the windows 7 like guests additional addresses are reserved as well.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-06 19:27:30 +02:00
Aaron Lauterer
d438e06028 Add PCI address for audio device
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-07-18 08:24:39 +02:00
Dominik Csapak
6dbcb07367 add ivshmem device to config
with such a shared memory device, a vm can share data with other
vms or with the host via memory

one of the use cases is looking-glass[1] with pci-passthrough, which copies
the guest fb to the host and you get a high-speed, low-latency
display client for the vm

on vm stop we delete the file again

1: https://looking-glass.hostfission.com/

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-02-26 08:01:12 +01:00
Dominik Csapak
739ba34024 add win7 pcie quirk
Win7 is very picky about pcie assignments and fails with
'error 12' the way we add hospci devices.

To combat that, we simply give the hostpci device a normal port
instead.

Start with address 0x10, so that we have space before those devices,
and between them and the ones configured in pve-q35.cfg should we
need it in the future.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-12-17 14:00:23 +01:00
Dominik Csapak
b71351a7ed QemuServer: remove PCI sysfs helpers
and use them from PVE::SysFSTools, where they got moved to

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-11-19 14:06:11 +01:00
Wolfgang Bumiller
d559309fcf arm: pci addressing, keyboard and ehci controller
On arm we start off with a pcie bridge pcie.0. We need a
keyboard in addition to the tablet device, and we need to
connect both to an 'ehci' controller.

To do all this, we also pass the $arch variable through a
whole lot of function calls to ultimately also adapt the
hotplug code to take care of the new keyboard device.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-11-13 14:44:28 +01:00
Dominik Csapak
55655ebc32 fix #1952: make vga memory configurable
we change 'vga' to a property string and add a 'memory' property
with this, the user can better control the memory given to the virtual
gpu, this is especially useful for spice/qxl since high resolutions need
more memory

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-11-09 13:45:07 +01:00
Dominik Csapak
de9768f002 refactor PCI into own file
to reduce QemuServer.pm size
also move the $device hash out of any function

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-06-22 09:13:16 +02:00