This function is no longer used, as all its callers have been converted
to use pci_init_nic_devices() instead.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The loop over nd_table[] to add PCI NICs is repeated in quite a few
places. Add a helper function to do it.
Some platforms also try to instantiate a specific model in a specific
slot, to match the real hardware. Add pci_init_nic_in_slot() for that
purpose.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Switch the PCI bus from using BusClass::reset to the Resettable
interface.
This has no behavioural change, because the BusClass code to support
subclasses that use the legacy BusClass::reset will call that method
in the hold phase of 3-phase reset.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-id: 20240119163512.3810301-2-peter.maydell@linaro.org
This patch modifies pci_setup_iommu() to set PCIIOMMUOps
instead of setting PCIIOMMUFunc. PCIIOMMUFunc is used to
get an address space for a PCI device in vendor specific
way. The PCIIOMMUOps still offers this functionality. But
using PCIIOMMUOps leaves space to add more iommu related
vendor specific operations.
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hervé Poussineau <hpoussin@reactos.org>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Cc: Jagannathan Raman <jag.raman@oracle.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
[ clg: - refreshed on latest QEMU
- included hw/remote/iommu.c
- documentation update
- asserts in pci_setup_iommu()
- removed checks on iommu_bus->iommu_ops->get_address_space
- included Elroy PCI host (PA-RISC) ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This are the easiest cases, where we were already using
VMSTATE_INSTANCE_ID_ANY.
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-3-quintela@redhat.com>
Fix:
hw/pci/pci.c:504:54: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
^
hw/pci/pci.c:533:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
^
hw/pci/pci.c:543:40: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
^
hw/pci/pci.c:590:45: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
^
include/exec/address-spaces.h:35:21: note: previous declaration is here
extern AddressSpace address_space_io;
^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231010115048.11856-6-philmd@linaro.org>
current code sets PCI_SEC_LATENCY_TIMER to RW, but for
pcie to pcie bridges it must be RO 0 according to
pci express spec which says:
This register does not apply to PCI Express. It must be read-only
and hardwired to 00h. For PCI Express to PCI/PCI-X Bridges, refer to the
[PCIe-to-PCI-PCI-X-Bridge] for requirements for this register.
also, fix typo in comment where it's made writeable - this typo
is likely what prevented us noticing we violate this requirement
in the 1st place.
Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Message-Id: <de9d05366a70172e1789d10591dbe59e39c3849c.1693432039.git.mst@redhat.com>
Tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The PCI_COMMAND register is located at offset 4 within
the PCI configuration space and occupies 2 bytes. The
interrupt disable bit is at the 10th bit, which corresponds
to the byte at offset 5 in the PCI configuration space.
In our testing environment, the guest driver may directly
updates the byte at offset 5 in the PCI configuration space.
The backtrace looks like as following:
at hw/pci/pci.c:1442
at hw/virtio/virtio-pci.c:605
val=5, len=1) at hw/pci/pci_host.c:81
In this situation, the range_covers_byte function called
by the pci_default_write_config function will return false,
resulting in the inability to handle the interrupt disable
update event.
To fix this issue, we can use the ranges_overlap function
instead of range_covers_byte to determine whether the interrupt
bit has been updated.
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
Signed-off-by: yuanminghao <yuanmh12@chinatelecom.cn>
Message-Id: <ce2d0437-8faa-4d61-b536-4668f645a959@chinatelecom.cn>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: b6981cb57b ("pci: interrupt disable bit support")
This change is cosmetic. A comment is added explaining why we need to check for
the availability of function 0 when we hotplug a device.
CC: mst@redhat.com
CC: mjt@tls.msk.ru
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The current implementers of ARI are all SR-IOV devices. The ARI next
function number field is undefined for VF according to PCI Express Base
Specification Revision 5.0 Version 1.0 section 9.3.7.7. The PF still
requires some defined value so end the linked list formed with the field
by specifying 0 as required for any ARI implementation according to
section 7.8.7.2.
For migration, the field will keep having 1 as its value on the old
QEMU machine versions.
Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt")
Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230710153838.33917-3-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCIe downstream ports only have a single device 0, so PCI Express devices can
only be plugged into slot 0 on a PCIe port. Add a warning to let users know
when the invalid configuration is used. We may enforce this more strongly later
once we get more clarity on whether we are introducing a bad regression for
users currently using the wrong configuration.
The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the igb device supports). The vfs are instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.
CC: jusual@redhat.com
CC: imammedo@redhat.com
CC: mst@redhat.com
CC: akihiko.odaki@daynix.com
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20230705115925.5339-6-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
There is also pci_new() which creates non-multifunction PCI devices.
Accordingly the parameter is always set to true when a multi function PCI
device is to be created.
The reason for the parameter's existence seems to be that it is used in the
internal PCI code as well which is the only location where it gets set to
false. This one usage can be resolved by factoring out an internal helper
function.
Remove this redundant, error-prone parameter.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230304114043.121024-6-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is also pci_create_simple() which creates non-multifunction PCI
devices. Accordingly the parameter is always set to true when a multi
function PCI device is to be created.
The reason for the parameter's existence seems to be that it is used in the
internal PCI code as well which is the only location where it gets set to
false. This one usage can be replaced by trivial code.
Remove this redundant, error-prone parameter.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20230304114043.121024-5-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On incoming migration we have the following sequence to load option
ROM:
1. On device realize we do normal load ROM from the file
2. Than, on incoming migration we rewrite ROM from the incoming RAM
block. If sizes mismatch we fail, like this:
Size mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x40000 != 0x80000: Invalid argument
This is not ideal when we migrate to updated distribution: we have to
keep old ROM files in new distribution and be careful around romfile
property to load correct ROM file. Which is loaded actually just to
allocate the ROM with correct length.
Note, that romsize property doesn't really help: if we try to specify
it when default romfile is larger, it fails with something like:
romfile "efi-virtio.rom" (160768 bytes) is too large for ROM size 65536
Let's just ignore ROM file when romsize is specified and we are in
incoming migration state. In other words, we need only to preallocate
ROM of specified size, local ROM file is unrelated.
This way:
If romsize was specified on source, we just use same commandline as on
source, and migration will work independently of local ROM files on
target.
If romsize was not specified on source (and we have mismatching local
ROM file on target host), we have to specify romsize on target to match
source romsize. romfile parameter may be kept same as on source or may
be dropped, the file is not loaded anyway.
As a bonus we avoid extra reading from ROM file on target.
Note: when we don't have romsize parameter on source command line and
need it for target, it may be calculated as aligned up to power of two
size of ROM file on source (if we know, which file is it) or,
alternatively it may be retrieved from source QEMU by QMP qom-get
command, like
{ "execute": "qom-get",
"arguments": {
"path": "/machine/peripheral/CARD_ID/virtio-net-pci.rom[0]",
"property": "size" } }
Note: we have extra initialization of size variable to zero in
pci_add_option_rom to avoid false-positive
"error: ‘size’ may be used uninitialized"
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230522201740.88960-2-vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When calling pci_bus_irqs() multiple times on the same object without calling
pci_bus_irqs_cleanup() in between PCIBus::irq_count[] is currently leaked.
Let's fix this because Xen will do just that in a few commits, and because
calling pci_bus_irqs_cleanup() in between seems fragile and cumbersome.
Note that pci_bus_irqs_cleanup() now has to NULL irq_count such that
pci_bus_irqs() doesn't do a double free.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Chuck Zmudzinski <brchuckz@aol.com>
Message-Id: <20230403074124.3925-3-shentey@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Since it's implementation on v8.0.0-rc0, having the PCI_ERR_UNCOR_MASK
set for machine types < 8.0 will cause migration to fail if the target
QEMU version is < 8.0.0 :
qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load e1000e:parent_obj
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:02.0/e1000e'
qemu-system-x86_64: load of migration failed: Invalid argument
The above test migrated a 7.2 machine type from QEMU master to QEMU 7.2.0,
with this cmdline:
./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
In order to fix this, property x-pcie-err-unc-mask was introduced to
control when PCI_ERR_UNCOR_MASK is enabled. This property is enabled by
default, but is disabled if machine type <= 7.2.
Fixes: 010746ae1d ("hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20230503002701.854329-1-leobras@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1576
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230515125229.44836-3-vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Fix over-80 lines and missing curly brackets for if-operators, which
are required by QEMU coding style.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230515125229.44836-2-vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
This patch provides accessor functions as replacements for direct
access to slot_reserved_mask according to the comment at the top
of include/hw/pci/pci_bus.h which advises that data structures for
PCIBus should not be directly accessed but instead be accessed using
accessor functions in pci.h.
Three accessor functions can conveniently replace all direct accesses
of slot_reserved_mask. With this patch, the new accessor functions are
used in hw/sparc64/sun4u.c and hw/xen/xen_pt.c and pci_bus.h is removed
from the included header files of the same two files.
No functional change intended.
Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
Message-Id: <b1b7f134883cbc83e455abbe5ee225c71aa0e8d0.1678888385.git.brchuckz@aol.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [sun4u]
acpi-index is now working with non-hotpluggable buses
(pci/q35 machine hostbridge), it can be used even if
ACPI PCI hotplug is disabled and as result acpi-index
uniqueness check will be omitted (since the check is
done by ACPI PCI hotplug handler, which isn't wired
when ACPI PCI hotplug is disabled).
Move check and related code to generic PCIDevice so it
would be independent of ACPI PCI hotplug.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230302161543.286002-30-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost-user support without ioeventfd
word replacements in vhost user spec
shpc improvements
cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQBO8QPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpMUMH/3/FVp4qaF4CDwCHn7xWFRJpOREIhX/iWfUu
lGkwxnB7Lfyqdg7i4CAfgMf2emWKZchEE2DamfCo5bIX0IgRU3DWcOdR9ePvJ29J
cKwIYpxZcB4RYSoWL5OUakQLCT3JOu4XWaXeVjyHABjQhf3lGpwN4KmIOBGOy/N6
0YHOQScW2eW62wIOwhAEuYQceMt6KU32Uw3tLnMbJliiBf3a/hPctVNM9TFY9pcd
UYHGfBx/zD45owf1lTVEQFDg0eqPZKWW29g5haiOd5oAyXHHolzu+bt3bU7lH46b
f7iP12LqDudyrgoF5YWv3NJ4HaGm5V3kPqNqLLF/mjF7alxG+N8=
=hN3h
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, cleanups, fixes
vhost-user support without ioeventfd
word replacements in vhost user spec
shpc improvements
cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQBO8QPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpMUMH/3/FVp4qaF4CDwCHn7xWFRJpOREIhX/iWfUu
# lGkwxnB7Lfyqdg7i4CAfgMf2emWKZchEE2DamfCo5bIX0IgRU3DWcOdR9ePvJ29J
# cKwIYpxZcB4RYSoWL5OUakQLCT3JOu4XWaXeVjyHABjQhf3lGpwN4KmIOBGOy/N6
# 0YHOQScW2eW62wIOwhAEuYQceMt6KU32Uw3tLnMbJliiBf3a/hPctVNM9TFY9pcd
# UYHGfBx/zD45owf1lTVEQFDg0eqPZKWW29g5haiOd5oAyXHHolzu+bt3bU7lH46b
# f7iP12LqDudyrgoF5YWv3NJ4HaGm5V3kPqNqLLF/mjF7alxG+N8=
# =hN3h
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 03 Mar 2023 00:13:56 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
tests/data/acpi/virt: drop (most) duplicate files.
hw/cxl/mailbox: Use new UUID network order define for cel_uuid
qemu/uuid: Add UUID static initializer
qemu/bswap: Add const_le64()
tests: acpi: Update q35/DSDT.cxl for removed duplicate UID
hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge
tests/acpi: Allow update of q35/DSDT.cxl
hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
hw/pci-bridge/cxl_downstream: Fix type naming mismatch
hw/mem/cxl_type3: Improve error handling in realize()
MAINTAINERS: Add Fan Ni as Compute eXpress Link QEMU reviewer
intel-iommu: send UNMAP notifications for domain or global inv desc
smmu: switch to use memory_region_unmap_iommu_notifier_range()
memory: introduce memory_region_unmap_iommu_notifier_range()
intel-iommu: fail DEVIOTLB_UNMAP without dt mode
intel-iommu: fail MAP notifier without caching mode
memory: Optimize replay of guest mapping
chardev/char-socket: set s->listener = NULL in char_socket_finalize
hw/pci: Trace IRQ routing on PCI topology
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Trace how IRQ are rooted from EP to RC.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230211152239.88106-3-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The way that Xen handles MSI PIRQs is kind of awful.
There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.
Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.
So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.
That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When a MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.
When the guest makes a hypercall to bind that PIRQ# (snooped from a
marked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.
Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.
Tested with passthrough igb and emulated e1000e + AHCI.
CPU0 CPU1
0: 65 0 IO-APIC 2-edge timer
1: 0 14 xen-pirq 1-ioapic-edge i8042
4: 0 846 xen-pirq 4-ioapic-edge ttyS0
8: 1 0 xen-pirq 8-ioapic-edge rtc0
9: 0 0 xen-pirq 9-ioapic-level acpi
12: 257 0 xen-pirq 12-ioapic-edge i8042
24: 9600 0 xen-percpu -virq timer0
25: 2758 0 xen-percpu -ipi resched0
26: 0 0 xen-percpu -ipi callfunc0
27: 0 0 xen-percpu -virq debug0
28: 1526 0 xen-percpu -ipi callfuncsingle0
29: 0 0 xen-percpu -ipi spinlock0
30: 0 8608 xen-percpu -virq timer1
31: 0 874 xen-percpu -ipi resched1
32: 0 0 xen-percpu -ipi callfunc1
33: 0 0 xen-percpu -virq debug1
34: 0 1617 xen-percpu -ipi callfuncsingle1
35: 0 0 xen-percpu -ipi spinlock1
36: 8 0 xen-dyn -event xenbus
37: 0 6046 xen-pirq -msi ahci[0000:00:03.0]
38: 1 0 xen-pirq -msi-x ens4
39: 0 73 xen-pirq -msi-x ens4-rx-0
40: 14 0 xen-pirq -msi-x ens4-rx-1
41: 0 32 xen-pirq -msi-x ens4-tx-0
42: 47 0 xen-pirq -msi-x ens4-tx-1
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
The code that collects the available NIC models is not really specific
to PCI anymore and will be required in the next patch, too, so let's
move this into a new separate function in net.c instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
function doesn't need RW aceess to passed in bus pointer,
make it const.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-31-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
pci_bus_irqs() coupled together the assignment of pci_set_irq_fn and
pci_map_irq_fn to a PCI bus. This coupling gets in the way when the
pci_map_irq_fn is board-specific while the pci_set_irq_fn is device-
specific.
For example, both of QEMU's PIIX south bridge models have different
pci_map_irq_fn implementations which are board-specific rather than
device-specific. These implementations should therefore reside in board
code. The pci_set_irq_fn's, however, should stay in the device models
because they access memory internal to the model.
Factoring out pci_bus_map_irqs() from pci_bus_irqs() allows the
assignments to be decoupled, resolving the problem described above.
Note also how pci_vpb_realize() which gets touched in this commit
assigns different pci_map_irq_fn's depending on the board.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230109172347.1830-5-shentey@gmail.com>
[PMD: Factor out in vfu_object_set_bus_irq()]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
make TCO watchdog work by default
part of generic vdpa support
asid interrupt for vhost-vdpa
added flex bus port DVSEC for cxl
misc fixes, cleanups, documentation
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmOi/OQPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpGCkH/j06y7PEDHfG1MnPoFQIEWKHPyU/FMUe1RCW
dRsfVmHZ8Jc1Jy4wVch461QpcIC+WL/Fshzh92G0hVDI2AWzJOxzpWQESmCphJJG
Olk/H/ort4ZIrwOynAHDKLzgltoTI91uao3UT7w67NumAgVYYW4Q9ObHm2G3Wmwc
fe763NmlObrNYYCIbJw/KiBLrk7M5LaMLPeoRGJefD4MYUAPXy/sUQt61VyuZpuG
xFAeDB7/76MXFKJVjccSnZfa8lihOJ5AlvCBTjjY5PbGl8+U1usdd3hOVComYb02
LW4sKLkxe5sycg/bFQdBLpz2lZVlMjpY9nd9YiumIrgLBv70Uf0=
=WyvK
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, cleanups, fixes
make TCO watchdog work by default
part of generic vdpa support
asid interrupt for vhost-vdpa
added flex bus port DVSEC for cxl
misc fixes, cleanups, documentation
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 21 Dec 2022 12:32:36 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
contrib/vhost-user-blk: Replace lseek64 with lseek
libvhost-user: Switch to unsigned int for inuse field in struct VuVirtq
hw/virtio: Extract QMP related code virtio-qmp.c
hw/virtio: Extract config read/write accessors to virtio-config-io.c
hw/virtio: Constify qmp_virtio_feature_map_t[]
hw/virtio: Guard and restrict scope of qmp_virtio_feature_map_t[]
hw/virtio: Rename virtio_ss[] -> specific_virtio_ss[]
hw/virtio: Add missing "hw/core/cpu.h" include
hw/cxl/device: Add Flex Bus Port DVSEC
hw/acpi: Rename tco.c -> ich9_tco.c
acpi/tests/avocado/bits: add mformat as one of the dependencies
docs/acpi/bits: document BITS_DEBUG environment variable
pci: drop redundant PCIDeviceClass::is_bridge field
remove DEC 21154 PCI bridge
vhost: fix vq dirty bitmap syncing when vIOMMU is enabled
acpi/tests/avocado/bits: add SPDX license identifiers for bios bits tests
include/hw: attempt to document VirtIO feature variables
vhost-user: send set log base message only once
vdpa: always start CVQ in SVQ mode if possible
vdpa: add shadow_data to vhost_vdpa
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
and use cast to TYPE_PCI_BRIDGE instead.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221129101341.185621-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
This method is for HMP command "info qtree".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221201121133.3813857-8-armbru@redhat.com>
pcibus_dev_print() contains a copy of get_class_desc(). Call the
function instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221201121133.3813857-7-armbru@redhat.com>
This moves these commands from MAINTAINERS section "QMP" to "PCI".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221201121133.3813857-3-armbru@redhat.com>
[Commit message improved]
Fix a few style violations so that checkpatch.pl won't complain when I
move this code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221201121133.3813857-2-armbru@redhat.com>
In the PCI subsystem we currently use the legacy function
qdev_reset_all() and qbus_reset_all(). These perform a recursive
reset, starting from either a qbus or a qdev. However they do not
permit any of the devices in the tree to use three-phase reset,
because device reset goes through the device_legacy_reset() function
that only calls the single DeviceClass::reset method.
Switch to using the device_cold_reset() and bus_cold_reset()
functions. These also perform a recursive reset, where first the
children are reset and then finally the parent, but they use the new
(...in 2020...) Resettable mechanism, which supports both the old
style single-reset method and also the new 3-phase reset handling.
This should be a no-behaviour-change commit which just reduces the
use of a deprecated API.
Commit created with:
sed -i -e 's/qdev_reset_all/device_cold_reset/g;s/qbus_reset_all/bus_cold_reset/g' hw/pci/*.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/pci.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221104160712.3005652-20-armbru@redhat.com>
Forward remote device's interrupts to the guest
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Message-id: 9523479eaafe050677f4de2af5dd0df18c27cfd9.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
the code in pcibus_get_fw_dev_path contained the potential for a
stack buffer overflow of 1 byte, potentially writing to the stack an
extra NUL byte.
This overflow could happen if the PCI slot is >= 0x10000000,
and the PCI function is >= 0x10000000, due to the size parameter
of snprintf being incorrectly calculated in the call:
if (PCI_FUNC(d->devfn))
snprintf(path + off, sizeof(path) + off, ",%x", PCI_FUNC(d->devfn));
since the off obtained from a previous call to snprintf is added
instead of subtracted from the total available size of the buffer.
Without the accurate size guard from snprintf, we end up writing in the
worst case:
name (32) + "@" (1) + SLOT (8) + "," (1) + FUNC (8) + term NUL (1) = 51 bytes
In order to provide something more robust, replace all of the code in
pcibus_get_fw_dev_path with a single call to g_strdup_printf,
so there is no need to rely on manual calculations.
Found by compiling QEMU with FORTIFY_SOURCE=3 as the error:
*** buffer overflow detected ***: terminated
Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff642c380 (LWP 121307)]
0x00007ffff71ff55c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff71ff55c in __pthread_kill_implementation () at /lib64/libc.so.6
#1 0x00007ffff71ac6f6 in raise () at /lib64/libc.so.6
#2 0x00007ffff7195814 in abort () at /lib64/libc.so.6
#3 0x00007ffff71f279e in __libc_message () at /lib64/libc.so.6
#4 0x00007ffff729767a in __fortify_fail () at /lib64/libc.so.6
#5 0x00007ffff7295c36 in () at /lib64/libc.so.6
#6 0x00007ffff72957f5 in __snprintf_chk () at /lib64/libc.so.6
#7 0x0000555555b1c1fd in pcibus_get_fw_dev_path ()
#8 0x0000555555f2bde4 in qdev_get_fw_dev_path_helper.constprop ()
#9 0x0000555555f2bd86 in qdev_get_fw_dev_path_helper.constprop ()
#10 0x00005555559a6e5d in get_boot_device_path ()
#11 0x00005555559a712c in get_boot_devices_list ()
#12 0x0000555555b1a3d0 in fw_cfg_machine_reset ()
#13 0x0000555555bf4c2d in pc_machine_reset ()
#14 0x0000555555c66988 in qemu_system_reset ()
#15 0x0000555555a6dff6 in qdev_machine_creation_done ()
#16 0x0000555555c79186 in qmp_x_exit_preconfig.part ()
#17 0x0000555555c7b459 in qemu_init ()
#18 0x0000555555960a29 in main ()
Found-by: Dario Faggioli <Dario Faggioli <dfaggioli@suse.com>
Found-by: Martin Liška <martin.liska@suse.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220531114707.18830-1-cfontana@suse.de>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO nor the ability to write some of the DVSEC entries.
This can be added with the qemu commandline by adding a rootport to a
specific CXL host bridge. For example:
-device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4
Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-17-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
-device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1
A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.
One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.
Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.
Also necessary is to add an exception to scripts/device-crash-test
similar to that for exiting pxb as both must created on a PCIexpress
host bus.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan.Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-15-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)
A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
does here, and will continue to add on to existing PCI code paths.
Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Message-Id: <20220429144110.25167-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A global boolean variable "vga_interface_created"(declared in softmmu/globals.c)
has been used to track the creation of vga interface. If the vga flag is passed
in the command line "default_vga"(declared in softmmu/vl.c) variable is set to 0.
To warn user, the condition checks if vga_interface_created is false
and default_vga is equal to 0. If "-vga none" is passed, this patch will not warn the
user regarding the creation of VGA device.
The warning "A -vga option was passed but this
machine type does not use that option; no VGA device has been created"
is logged if vga flag is passed but no vga device is created.
This patch has been tested for x86_64, i386, sparc, sparc64 and arm boards.
Signed-off-by: Gautam Agrawal <gautamnagrawal@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
Message-Id: <20220501122505.29202-1-gautamnagrawal@gmail.com>
[thuth: Fix wrong warning with "-device" in some cases as reported by Paolo]
Signed-off-by: Thomas Huth <thuth@redhat.com>
During qemu init stage, when there is pci BDF conflicts, qemu print
a warning but not showing which device the BDF is occupied by. E.x:
"PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-scsi-pci"
To facilitate user knowing the offending device and fixing it, showing
the id info in the warning.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220223094435.64495-1-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.
Signed-off-by: Knut Omang <knuto@ifi.uio.no>
Message-Id: <20220217174504.1051716-2-lukasz.maniak@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move the pci_intx() definition to the PCI header file, so that it can
be called from other PCI files. It is used by the next patch.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-Id: <20211116170133.724751-3-fbarrat@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Unify format used by trace_pci_update_mappings_del(),
trace_pci_update_mappings_add(), trace_pci_cfg_write() and
trace_pci_cfg_read() to print the device name and bus number,
slot number and function number.
For instance:
pci_cfg_read virtio-net-pci 00:0 @0x20 -> 0xffffc00c
pci_cfg_write virtio-net-pci 00:0 @0x20 <- 0xfea0000c
pci_update_mappings_del d=0x555810b92330 01:00.0 4,0xffffc000+0x4000
pci_update_mappings_add d=0x555810b92330 01:00.0 4,0xfea00000+0x4000
becomes
pci_cfg_read virtio-net-pci 01:00.0 @0x20 -> 0xffffc00c
pci_cfg_write virtio-net-pci 01:00.0 @0x20 <- 0xfea0000c
pci_update_mappings_del virtio-net-pci 01:00.0 4,0xffffc000+0x4000
pci_update_mappings_add virtio-net-pci 01:00.0 4,0xfea00000+0x4000
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20211105192541.655831-1-lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows to power off pci devices. In "off" state the devices will
not be visible. No pci config space access, no pci bar access, no dma.
Default state is "on", so this patch (alone) should not change behavior.
Use case: Allows hotplug controllers implement slot power. Hotplug
controllers doing so should set the inital power state for devices in
the ->plug callback.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-2-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand. After exporting we can switch a lot
of the call sites to use these two helpers.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
They're used in quite a few places of pci.[ch] and also in the rest of the code
base. Define them so that it doesn't need to be defined all over the places.
The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
opaque. The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
void* pointer.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-2-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rename the "allocate and return" qbus creation function to
qbus_new(), to bring it into line with our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
Rename qbus_create_inplace() to qbus_init(); this is more in line
with our usual naming convention for functions that in-place
initialize objects.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
Rename the pci_root_bus_new_inplace() function to
pci_root_bus_init(); this brings the bus type in to line with a
"_init for in-place init, _new for allocate-and-return" convention.
To do this we need to rename the implementation-internal function
that was using the pci_root_bus_init() name to
pci_root_bus_internal_init().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-4-peter.maydell@linaro.org
This helps to get the min and max bus number of a PCI bus hierarchy.
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <1625748919-52456-6-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a new bypass_iommu property for PCI host and use it to check
whether devices attached to the PCI root bus will bypass iommu.
In pci_device_iommu_address_space(), check the property and
avoid getting iommu address space for devices bypass iommu.
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <1625748919-52456-2-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
If a device model
(a) doesn't set the value to a correct interrupt number and then
(b) triggers an interrupt for itself,
it's device model bug. Add assert on interrupt pin number to catch
this kind of bug more obviously.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-Id: <9cf8ac3b17e162daac0971d7be32deb6a33ae6ec.1616532563.git.isaku.yamahata@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In x86/ACPI world, linux distros are using predictable
network interface naming since systemd v197. Which on
QEMU based VMs results into path based naming scheme,
that names network interfaces based on PCI topology.
With itm on has to plug NIC in exactly the same bus/slot,
which was used when disk image was first provisioned/configured
or one risks to loose network configuration due to NIC being
renamed to actually used topology.
That also restricts freedom to reshape PCI configuration of
VM without need to reconfigure used guest image.
systemd also offers "onboard" naming scheme which is
preferred over PCI slot/topology one, provided that
firmware implements:
"
PCI Firmware Specification 3.1
4.6.7. DSM for Naming a PCI or PCI Express Device Under
Operating Systems
"
that allows to assign user defined index to PCI device,
which systemd will use to name NIC. For example, using
-device e1000,acpi-index=100
guest will rename NIC to 'eno100', where 'eno' is default
prefix for "onboard" naming scheme. This doesn't require
any advance configuration on guest side to com in effect
at 'onboard' scheme takes priority over path based naming.
Hope is that 'acpi-index' it will be easier to consume by
management layer, compared to forcing specific PCI topology
and/or having several disk image templates for different
topologies and will help to simplify process of spawning
VM from the same template without need to reconfigure
guest NIC.
This patch adds, 'acpi-index'* property and wires up
a 32bit register on top of pci hotplug register block
to pass index value to AML code at runtime.
Following patch will add corresponding _DSM code and
wire it up to PCI devices described in ACPI.
*) name comes from linux kernel terminology
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While pci_bus_realize() currently does not use the Error* argument,
it would be an error to leave pcie_bus_realize() setting bus->flags
if pci_bus_realize() had failed.
Fix by using a local Error* and return early (propagating the error)
if pci_bus_realize() failed.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210201153700.618946-1-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit a1190ab628 has added a "allow_unplug_during_migration = true" at
the end of the main "if" block, so it is not needed to set it anymore
in the previous checking.
Remove it, to have only sub-ifs that check for needed conditions and exit
if one fails.
Fixes: 4f5b6a05a4 ("pci: add option for net failover")
Fixes: a1190ab628 ("migration: allow unplug during migration for failover devices")
Cc: jfreimann@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210212135250.2738750-2-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This property can be useful for distros to set up known-good ROM sizes for
migration purposes. The VM will fail to start if the ROM is too large,
and migration compatibility will not be broken if the ROM is too small.
Note that even though romsize is a uint32_t, it has to be between 1
(because empty ROM files are not accepted, and romsize must be greater
than the file) and 2^31 (because values above are not powers of two and
are rejected).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201218182736.1634344-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210203131828.156467-3-pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
get_image_size() returns an int64_t, which pci_add_option_rom() assigns
to an "int" without any range checking. A 32-bit BAR could be up to
2 GiB in size, so reject anything above it. In order to accomodate
a rounded-up size of 2 GiB, change pci_patch_ids's size argument
to unsigned.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210203131828.156467-2-pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Commit 8118f0950f "migration: Append JSON description of migration
stream" needs a JSON writer. The existing qobject_to_json() wasn't a
good fit, because it requires building a QObject to convert. Instead,
migration got its very own JSON writer, in commit 190c882ce2 "QJSON:
Add JSON writer". It tacitly limits numbers to int64_t, and strings
contents to characters that don't need escaping, unlike
qobject_to_json().
The previous commit factored the JSON writer out of qobject_to_json().
Replace migration's JSON writer by it.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-17-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Generalize the qdev_hotplug variable to the different phases of
machine initialization. We would like to allow different
monitor commands depending on the phase.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prevent future developers working on root complexes, root ports, or
bridges that also wish to implement a BAR for those, from shooting
themselves in the foot. PCI type 1 headers only support 2 base address
registers. It is incorrect and difficult to figure out what is wrong
with the device when this mistake is made. With this, it is immediate
and obvious what has gone wrong.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Message-Id: <20201015181411.89104-2-ben.widawsky@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asserts are used for developer bugs. As registering a bar of the wrong
size is not something that should be possible for a user to achieve,
this is a developer bug.
While here, use the more obvious helper function.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Message-Id: <20201015181411.89104-1-ben.widawsky@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
These assertions similar to those in the adjacent pci_bus_get_irq_level()
function ensure that irqnum lies within the valid PCI bus IRQ range.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20201011082022.3016-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201024203900.3619498-3-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extract pci_bus_change_irq_level() from pci_change_irq_level() to
make it clearer it operates on the bus.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201024203900.3619498-2-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
'occupied' is spelled like 'ocuppied' in the message.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201006133958.600932-1-jusual@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Only qemu-system-FOO and qemu-storage-daemon provide QMP
monitors, therefore such declarations and definitions are
irrelevant for user-mode emulation.
Extracting the PCI commands to their own schema reduces the size of
the qapi-misc* headers generated, and pulls less QAPI-generated code
into user-mode.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200913195348.1064154-9-philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When debugging QEMU it is often useful to put a breakpoint on the
error_setg_internal method impl.
Unfortunately the object_property_add / object_class_property_add
methods call object_property_find / object_class_property_find methods
to check if a property exists already before adding the new property.
As a result there are a huge number of calls to error_setg_internal
on startup of most QEMU commands, making it very painful to set a
breakpoint on this method.
Most callers of object_find_property and object_class_find_property,
however, pass in a NULL for the Error parameter. This simplifies the
methods to remove the Error parameter entirely, and then adds some
new wrapper methods that are able to raise an Error when needed.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200914135617.1493072-1-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-18-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I'm converting from qdev_create()/qdev_init_nofail() to
qdev_new()/qdev_realize_and_unref(); recent commit "qdev: New
qdev_new(), qdev_realize(), etc." explains why.
PCI devices use qdev_create() through pci_create() and
pci_create_multifunction().
Provide pci_new(), pci_new_multifunction(), and
pci_realize_and_unref() for converting PCI devices.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-14-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I'm going to convert device realization to qdev_realize() with the
help of Coccinelle. Convert bus realization to qbus_realize() first,
to get it out of Coccinelle's way. Readability improves.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-7-armbru@redhat.com>
Sometimes it would be good to be able to read the pin number along
with the IRQ number allocated. Since we'll dump the IRQ number, no
reason to not dump the pin information. For example, the vfio-pci
device will overwrite the pin with the hardware pin number. It would
be nice to know the pin number of one assigned device from QMP/HMP.
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
CC: Julia Suvorova <jusual@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200317195908.283800-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
QEMU currently aborts when being started with "-nic model=rocker" or with
"-net nic,model=rocker". This happens because the "rocker" device is not
a normal NIC but a switch, which has different properties. Thus we should
only consider real NIC devices for "-nic" and "-net". These devices can
be identified by the "netdev" property, so check for this property before
adding the device to the list.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Fixes: 52310c3fa7 ("net: allow using any PCI NICs in -net or -nic")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200527153152.9211-1-thuth@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While accessing PCI configuration bytes, assert that
'address + len' is within PCI configuration space.
Generally it is within bounds. This is more of a defensive
assert, in case a buggy device was to send 'address' which
may go out of bounds.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200604113525.58898-1-ppandit@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Devices may have component devices and buses.
Device realization may fail. Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).
When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far. If any of these unrealizes
failed, the device would be left in an inconsistent state. Must not
happen.
device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.
Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back? We'd have to
re-realize, which can fail. This design is fundamentally broken.
device_set_realized() does not roll back at all. Instead, it keeps
unrealizing, ignoring further errors.
It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.
bus_set_realized() does not roll back either. Instead, it stops
unrealizing.
Fortunately, no unrealize method can fail, as we'll see below.
To fix the design error, drop parameter @errp from all the unrealize
methods.
Any unrealize method that uses @errp now needs an update. This leads
us to unrealize() methods that can fail. Merely passing it to another
unrealize method cannot cause failure, though. Here are the ones that
do other things with @errp:
* virtio_serial_device_unrealize()
Fails when qbus_set_hotplug_handler() fails, but still does all the
other work. On failure, the device would stay realized with its
resources completely gone. Oops. Can't happen, because
qbus_set_hotplug_handler() can't actually fail here. Pass
&error_abort to qbus_set_hotplug_handler() instead.
* hw/ppc/spapr_drc.c's unrealize()
Fails when object_property_del() fails, but all the other work is
already done. On failure, the device would stay realized with its
vmstate registration gone. Oops. Can't happen, because
object_property_del() can't actually fail here. Pass &error_abort
to object_property_del() instead.
* spapr_phb_unrealize()
Fails and bails out when remove_drcs() fails, but other work is
already done. On failure, the device would stay realized with some
of its resources gone. Oops. remove_drcs() fails only when
chassis_from_bus()'s object_property_get_uint() fails, and it can't
here. Pass &error_abort to remove_drcs() instead.
Therefore, no unrealize method can fail before this patch.
device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool(). Can't drop @errp there, so pass
&error_abort.
We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors. Pass &error_abort instead.
Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.
One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().
Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
The pci_do_device_reset() function (called from pci_device_reset)
clears the PCI_INTERRUPT_LINE config reg of devices on the bus but did
this without taking wmask into account. We'll have a device model now
that needs to set a constant value for this reg and this patch allows
to do that without additional workaround in device emulation to
reverse the effect of this PCI bus reset function.
Suggested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 20200313082444.2439-4-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to
auto-generate the vmstate instance ID. Previously it was hard coded
as -1 instead of this macro. It helps to change this default value in
the follow up patches. No functional change.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now that the old pc-0.x machine types have been removed, this config
knob is not required anymore.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20191209125248.5849-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PCIe requester IDs are used by modern IOMMUs to differentiate devices
in order to provide a unique IOVA address space per device. These
requester IDs are composed of the bus/device/function (BDF) of the
requesting device. Conventional PCI pre-dates this concept and is
simply a shared parallel bus where transactions are claimed by
decoding target ranges rather than the packetized, point-to-point
mechanisms of PCI-express. In order to interface conventional PCI
to PCIe, the PCIe-to-PCI bridge creates and accepts packetized
transactions on behalf of all downstream devices, using one of two
potential forms of a requester ID relating to the bridge itself or its
subordinate bus. All downstream devices are therefore aliased by the
bridge's requester ID and it's not possible for the IOMMU to create
unique IOVA spaces for devices downstream of such buses.
At least that's how it works on bare metal. Until now point we've
ignored this nuance of vIOMMU support in QEMU, creating a unique
AddressSpace per device regardless of the virtual bus topology.
Aside from simply being true to bare metal behavior, there are aspects
of a shared address space that we can use to our advantage when
designing a VM. For instance, a PCI device assignment scenario where
we have the following IOMMU group on the host system:
$ ls /sys/kernel/iommu_groups/1/devices/
0000:00:01.0 0000:01:00.0 0000:01:00.1
An IOMMU group is considered the smallest set of devices which are
fully DMA isolated from other devices by the IOMMU. In this case the
root port at 00:01.0 does not guarantee that it prevents peer to peer
traffic between the endpoints on bus 01: and the devices are therefore
grouped together. VFIO considers an IOMMU group to be the smallest
unit of device ownership and allows only a single shared IOVA space
per group due to the limitations of the isolation.
Therefore, if we attempt to create the following VM, we get an error:
qemu-system-x86_64 -machine q35... \
-device intel-iommu,intremap=on \
-device pcie-root-port,addr=1e.0,id=pcie.1 \
-device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
-device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
0000:01:00.1: group 1 used in multiple address spaces
VFIO only allows a single IOVA space (AddressSpace) for both devices,
but we've placed them into a topology where the vIOMMU expects a
separate AddressSpace for each device. On bare metal we know that
a conventional PCI bus would provide the sort of aliasing we need
here, forcing the IOMMU to consider these devices to be part of a
single shared IOVA space. The support provided here does the same
for QEMU, such that we can create a conventional PCI topology to
expose equivalent AddressSpace sharing requirements to the VM:
qemu-system-x86_64 -machine q35... \
-device intel-iommu,intremap=on \
-device pcie-pci-bridge,addr=1e.0,id=pci.1 \
-device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
-device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
There are pros and cons to this configuration; it's not necessarily
recommended, it's simply a tool we can use to create configurations
which may provide additional functionality in spite of host hardware
limitations or as a benefit to the guest configuration or resource
usage. An incomplete list of pros and cons:
Cons:
a) Extended PCI configuration space is unavailable to devices
downstream of a conventional PCI bus. The degree to which this
is a drawback depends on the device and guest drivers.
b) Applying this topology to devices which are already isolated by
the host IOMMU (singleton IOMMU groups) will result in devices
which appear to be non-isolated to the VM (non-singleton groups).
This can limit configurations within the guest, such as userspace
drivers or nested device assignment.
Pros:
a) QEMU better emulates bare metal.
b) Configurations as above are now possible.
c) Host IOMMU resources and VM locked memory requirements are reduced
in vIOMMU configurations due to shared IOMMU domains on the host
and avoidance of duplicate locked memory accounting.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <157187083548.5439.14747141504058604843.stgit@gimli.home>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.
This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-8-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds a failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.
It only supports ethernet devices. Also currently it only supports
PCIe devices. The requirement for PCIe is because it doesn't support
other hotplug controllers at the moment. The failover functionality can
be added to other hotplug controllers like ACPI, SHCP,... later on.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-3-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a
"qdev: add qdev_add_vm_change_state_handler()". This is a bad idea:
hw/qdev-core.h is widely included.
Move the declaration of qdev_add_vm_change_state_handler() to
sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h.
Touching sysemu/sysemu.h now recompiles some 1800 objects.
qemu/uuid.h also drops from 5400 to 1800. A few more headers show
smaller improvement: qemu/notify.h drops from 5600 to 5200,
qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from
5500 to 5000.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190812052359.30071-28-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Commit e35704ba9c "numa: Move NUMA declarations from sysemu.h to
numa.h" left a few NUMA-related macros behind. Move them now.
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-26-armbru@redhat.com>
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h. Include hw/qdev-core.h there
instead.
hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.
While there, delete a few superfluous inclusions of hw/qdev-core.h.
Touching hw/qdev-properties.h now recompiles some 1200 objects.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
The previous commits have left only the declaration of hw_error() in
hw/hw.h. This permits dropping most of its inclusions. Touching it
now recompiles less than 200 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get VMStateDescription. The previous commit made
that unnecessary.
Include migration/vmstate.h only where it's still needed. Touching it
now recompiles only some 1600 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.
Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed. Touching it now recompiles only some 500 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-13-armbru@redhat.com>
In my "build everything" tree, changing migration/qemu-file-types.h
triggers a recompile of some 2600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).
The culprit is again hw/hw.h, which supposedly includes it for
convenience.
Include migration/qemu-file-types.h only where it's needed. Touching
it now recompiles less than 200 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>