Commit Graph

16 Commits

Author SHA1 Message Date
Steve Sistare
322ee16824 vfio/pci: preserve pending interrupts
cpr-transfer may lose a VFIO interrupt because the KVM instance is
destroyed and recreated.  If an interrupt arrives in the middle, it is
dropped.  To fix, stop pending new interrupts during cpr save, and pick
up the pieces.  In more detail:

Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD to
deassign the irqfd gsi that routes interrupts directly to the VCPU and KVM.
After this call, interrupts fall back to the kernel vfio_msihandler, which
writes to QEMU's kvm_interrupt eventfd.  CPR already preserves that
eventfd.  When the route is re-established in new QEMU, the kernel tests
the eventfd and injects an interrupt to KVM if necessary.

Deassign INTx in a similar manner.  For both MSI and INTx, remove the
eventfd handler so old QEMU does not consume an event.

If an interrupt was already pended to KVM prior to the completion of
kvm_irqchip_remove_irqfd_notifier_gsi, it will be recovered by the
subsequent call to cpu_synchronize_all_states, which pulls KVM interrupt
state to userland prior to saving it in vmstate.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1752689169-233452-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-08-09 00:06:48 +02:00
Steve Sistare
76cfb87f5f vfio/pci: augment set_handler
Extend vfio_pci_msi_set_handler() so it can set or clear the handler.
Add a similar accessor for INTx.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1752689169-233452-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-08-09 00:06:48 +02:00
Steve Sistare
9751377c3a vfio: fix sub-page bar after cpr
Regions for sub-page BARs are normally mapped here, in response to the
guest writing to PCI config space:

  vfio_pci_write_config()
    pci_default_write_config()
      pci_update_mappings()
        memory_region_add_subregion()
    vfio_sub_page_bar_update_mapping()
      ... vfio_dma_map()

However, after CPR, the guest does not reconfigure the device and the
code path above is not taken.  To fix, in vfio_cpr_pci_post_load, call
vfio_sub_page_bar_update_mapping for each sub-page BAR with a valid
address.

Fixes: 7e9f214113 ("vfio/container: restore DMA vaddr")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1752520890-223356-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-28 17:52:34 +02:00
Steve Sistare
99cedd5d55 vfio/container: delete old cpr register
vfio_cpr_[un]register_container is no longer used since they were
subsumed by container type-specific registration.  Delete them.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-21-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
7ed0919119 migration: close kvm after cpr
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
  irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY.  This occurs because KVM descriptors are still open in
the old QEMU process.  Close them.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
87aeaead5c vfio-pci: preserve INTx
Preserve vfio INTx state across cpr-transfer.  Preserve VFIOINTx fields as
follows:
  pin : Recover this from the vfio config in kernel space
  interrupt : Preserve its eventfd descriptor across exec.
  unmask : Ditto
  route.irq : This could perhaps be recovered in vfio_pci_post_load by
    calling pci_device_route_intx_to_irq(pin), whose implementation reads
    config space for a bridge device such as ich9.  However, there is no
    guarantee that the bridge vmstate is read before vfio vmstate.  Rather
    than fiddling with MigrationPriority for vmstate handlers, explicitly
    save route.irq in vfio vmstate.
  pending : save in vfio vmstate.
  mmap_timeout, mmap_timer : Re-initialize
  bool kvm_accel : Re-initialize

In vfio_realize, defer calling vfio_intx_enable until the vmstate
is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
vfio_intx_kvm_enable to skip vfio initialization, but still perform
kvm initialization.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
30edcb4d4e vfio-pci: preserve MSI
Save the MSI message area as part of vfio-pci vmstate, and preserve the
interrupt and notifier eventfd's.  migrate_incoming loads the MSI data,
then the vfio-pci post_load handler finds the eventfds in CPR state,
rebuilds vector data structures, and attaches the interrupts to the new
KVM instance.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
031fbb7110 vfio-pci: skip reset during cpr
Do not reset a vfio-pci device during CPR, and do not complain if the
kernel's PCI config space changes for non-emulated bits between the
vmstate save and load, which can happen due to ongoing interrupt activity.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
54857b0816 vfio/container: register container for cpr
Register a legacy container for cpr-transfer, replacing the generic CPR
register call with a more specific legacy container register call.  Add a
blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.

This is mostly boiler plate.  The fields to to saved and restored are added
in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
3ed34463a2 vfio: move vfio-cpr.h
Move vfio-cpr.h to include/hw/vfio, because it will need to be included by
other files there.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05 10:40:38 +02:00
Cédric Le Goater
11b8b9d53d vfio: Rename vfio-common.h to vfio-device.h
"hw/vfio/vfio-common.h" has been emptied of most of its declarations
by the previous changes and the only declarations left are related to
VFIODevice. Rename it to "hw/vfio/vfio-device.h" and make the
necessary adjustments.

Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-36-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-04-25 09:01:37 +02:00
Cédric Le Goater
8140d45b10 vfio: Introduce new files for CPR definitions and declarations
Gather all CPR related declarations into "vfio-cpr.h" to reduce exposure
of VFIO internals in "hw/vfio/vfio-common.h". These were introduced in
commit d9fa4223b3 ("vfio: register container for cpr").

Order file list in meson.build while at it.

Cc: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-22-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-04-25 09:01:37 +02:00
Philippe Mathieu-Daudé
32cad1ffb8 include: Rename sysemu/ -> system/
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.

Files renamed manually then mechanical change using sed tool.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-20 17:44:56 +01:00
Zhenzhong Duan
f38f5dd1d4 vfio/cpr: Make vfio_cpr_register_container() return bool
This is to follow the coding standand to return bool if 'Error **'
is used to pass error.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16 16:59:20 +02:00
Steve Sistare
0cb51c183a vfio: allow cpr-reboot migration if suspended
Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.  The
user is responsible for suspending the guest before initiating cpr, such as
by issuing guest-suspend-ram to the qemu guest agent.

Relax the vfio blocker so it does not apply to cpr, and add a notifier that
verifies the guest is suspended.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-03-08 22:10:13 +01:00
Steve Sistare
d9fa4223b3 vfio: register container for cpr
Define entry points to perform per-container cpr-specific initialization
and teardown.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-03-08 22:10:13 +01:00