cpr-transfer may lose a VFIO interrupt because the KVM instance is
destroyed and recreated. If an interrupt arrives in the middle, it is
dropped. To fix, stop pending new interrupts during cpr save, and pick
up the pieces. In more detail:
Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD to
deassign the irqfd gsi that routes interrupts directly to the VCPU and KVM.
After this call, interrupts fall back to the kernel vfio_msihandler, which
writes to QEMU's kvm_interrupt eventfd. CPR already preserves that
eventfd. When the route is re-established in new QEMU, the kernel tests
the eventfd and injects an interrupt to KVM if necessary.
Deassign INTx in a similar manner. For both MSI and INTx, remove the
eventfd handler so old QEMU does not consume an event.
If an interrupt was already pended to KVM prior to the completion of
kvm_irqchip_remove_irqfd_notifier_gsi, it will be recovered by the
subsequent call to cpu_synchronize_all_states, which pulls KVM interrupt
state to userland prior to saving it in vmstate.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1752689169-233452-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
We provide to sdbus_do_command() a pointer to a buffer to be
filled with a varying number of bytes. By not providing the
buffer size, the callee can not check the buffer is big enough.
Pass the buffer size as argument to follow good practices.
sdbus_do_command() doesn't return any error, only the size filled
in the buffer. Convert the returned type to unsigned and remove
the few unreachable lines in callers.
This allow to check for possible overflow in sd_do_command().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250804133406.17456-4-philmd@linaro.org>
* Various bug fixes around lost interrupts particularly.
* Major group interrupt work, in particular around redistributing
interrupts. Upstream group support is not in a complete or usable
state as it is.
* Significant context push/pull improvements, particularly pool and
phys context handling was quite incomplete beyond trivial OPAL
case that pushes at boot.
* Improved tracing and checking for unimp and guest error situations.
* Various other missing feature support.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmh951cACgkQUaNDx8/7
7KFK6w//SAmZpNmE380UN4OxMBcjsT5m5Cf2hy+Wq9pSEcwWckBFT03HyR86JAv3
QLR1d6yx7dY0aVWAHtFC24vlU2jpv0Io97wfX9VbgG7e4TY/i1vRMSXYYehXuU/Y
gLrwuJGxAMKWrd+4ymvHOyXHRAq3LMGQQYfqLCB77b8UJ18JyCL8FwAl/D6EsZ1y
nUW8WlDy6qQ/SJQHZZ664kyJEv7Qw4xd81ZnmoPsy3xVd7c4ASNBWvDTjRoUn2EN
sfJW76UqqFn3EqASaKsqoNPHu3kklQ/AX3KlE1wFCBjYoXwl/051wIX4RIb+b2S4
SLtc/YSAie1n2Pp1sghfLRFiRpjrmnqaLlw04Buw1TXY2OaQbFc9zTkc9rvFSez1
cNjdJcvm3myAWy2Pg//Nt3FgCqfMlrrdTlyGsdqmrEaplBy6pHnas+82o5tPGC3t
SBMgTDqNMq0v/V/gOIsmHc5/9f+FS5s+v/nvm0xJDfLkY39qP73W+YZllYyyuTHY
HiLVjD7x5BSGZAsP9EN6EnL7DPXKPIIQSfNwo2564tAhe3/IyJo8hpGhMeiZ83Hf
G9oPiLa4YljsHzP0UPRNhID5IYyngEDoh2j3AXnew1tkikHd5LIpNCdbtW5x52RR
kik4hBmqJU6sYpO0O9yCd6YWv/Bpm4bDs6tQOSWMc6uWqP0qN8M=
=65BL
-----END PGP SIGNATURE-----
Merge tag 'pull-ppc-20250721' of https://github.com/legoater/qemu into staging
ppc/xive queue:
* Various bug fixes around lost interrupts particularly.
* Major group interrupt work, in particular around redistributing
interrupts. Upstream group support is not in a complete or usable
state as it is.
* Significant context push/pull improvements, particularly pool and
phys context handling was quite incomplete beyond trivial OPAL
case that pushes at boot.
* Improved tracing and checking for unimp and guest error situations.
* Various other missing feature support.
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmh951cACgkQUaNDx8/7
# 7KFK6w//SAmZpNmE380UN4OxMBcjsT5m5Cf2hy+Wq9pSEcwWckBFT03HyR86JAv3
# QLR1d6yx7dY0aVWAHtFC24vlU2jpv0Io97wfX9VbgG7e4TY/i1vRMSXYYehXuU/Y
# gLrwuJGxAMKWrd+4ymvHOyXHRAq3LMGQQYfqLCB77b8UJ18JyCL8FwAl/D6EsZ1y
# nUW8WlDy6qQ/SJQHZZ664kyJEv7Qw4xd81ZnmoPsy3xVd7c4ASNBWvDTjRoUn2EN
# sfJW76UqqFn3EqASaKsqoNPHu3kklQ/AX3KlE1wFCBjYoXwl/051wIX4RIb+b2S4
# SLtc/YSAie1n2Pp1sghfLRFiRpjrmnqaLlw04Buw1TXY2OaQbFc9zTkc9rvFSez1
# cNjdJcvm3myAWy2Pg//Nt3FgCqfMlrrdTlyGsdqmrEaplBy6pHnas+82o5tPGC3t
# SBMgTDqNMq0v/V/gOIsmHc5/9f+FS5s+v/nvm0xJDfLkY39qP73W+YZllYyyuTHY
# HiLVjD7x5BSGZAsP9EN6EnL7DPXKPIIQSfNwo2564tAhe3/IyJo8hpGhMeiZ83Hf
# G9oPiLa4YljsHzP0UPRNhID5IYyngEDoh2j3AXnew1tkikHd5LIpNCdbtW5x52RR
# kik4hBmqJU6sYpO0O9yCd6YWv/Bpm4bDs6tQOSWMc6uWqP0qN8M=
# =65BL
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 21 Jul 2025 03:08:07 EDT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-ppc-20250721' of https://github.com/legoater/qemu: (50 commits)
ppc/xive2: Enable lower level contexts on VP push
ppc/xive: Split need_resend into restore_nvp
ppc/xive2: Implement PHYS ring VP push TIMA op
ppc/xive2: Implement POOL LGS push TIMA op
ppc/xive2: Implement set_os_pending TIMA op
ppc/xive2: redistribute group interrupts on context push
ppc/xive2: Implement pool context push TIMA op
ppc/xive: Check TIMA operations validity
ppc/xive: Redistribute phys after pulling of pool context
ppc/xive2: Prevent pulling of pool context losing phys interrupt
ppc/xive2: implement NVP context save restore for POOL ring
ppc/xive: Assert group interrupts were redistributed
ppc/xive2: Avoid needless interrupt re-check on CPPR set
ppc/xive2: Consolidate presentation processing in context push
ppc/xive2: split tctx presentation processing from set CPPR
ppc/xive: Add xive_tctx_pipr_set() helper function
ppc/xive: tctx_accept only lower irq line if an interrupt was presented
ppc/xive: tctx signaling registers rework
ppc/xive: Split xive recompute from IPB function
ppc/xive: Fix high prio group interrupt being preempted by low prio VP
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
xive2 must take into account redistribution of group interrupts if
the VP directed priority exceeds the group interrupt priority after
this operation. The xive1 code is not group aware so implement this
for xive2.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-47-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Certain TIMA operations should only be performed when a ring is valid,
others when the ring is invalid, and they are considered undefined if
used incorrectly. Add checks for this condition.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-44-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
In preparation to implement POOL context push, add support for POOL
NVP context save/restore.
The NVP p bit is defined in the spec as follows:
If TRUE, the CPPR of a Pool VP in the NVP is updated during store of
the context with the CPPR of the Hard context it was running under.
It's not clear whether non-pool VPs always or never get CPPR updated.
Before this patch, OS contexts always save CPPR, so we will assume that
is the behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-41-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Have xive_tctx_notify() also set the new PIPR value and rename it to
xive_tctx_pipr_set(). This can replace the last xive_tctx_pipr_update()
caller because it does not need to update IPB (it already sets it).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-36-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The tctx "signaling" registers (PIPR, CPPR, NSR) raise an interrupt on
the target CPU thread. The POOL and PHYS rings both raise hypervisor
interrupts, so they both share one set of signaling registers in the
PHYS ring. The PHYS NSR register contains a field that indicates which
ring has presented the interrupt being signaled to the CPU.
This sharing results in all the "alt_regs" throughout the code. alt_regs
is not very descriptive, and worse is that the name is used for
conversions in both directions, i.e., to find the presenting ring from
the signaling ring, and the signaling ring from the presenting ring.
Instead of alt_regs, use the names sig_regs and sig_ring, and regs and
ring for the presenting ring being worked on. Add a helper function to
get the sign_regs, and add some asserts to ensure the POOL regs are
never used to signal interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-34-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
xive_tctx_pipr_update() is used for multiple things. In an effort
to make things simpler and less overloaded, split out the function
that is used to present a new interrupt to the tctx.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-31-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Have the match_nvt method only perform a TCTX match but don't present
the interrupt, the caller presents. This has no functional change, but
allows for more complicated presentation logic after matching.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-29-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When disabling (pulling) an xive interrupt context, we need
to redistribute any active group interrupts to other threads
that can handle the interrupt if possible. This support had
already been added for the OS context but had not yet been
added to the pool or physical context.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-28-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Booting AIX in a PowerVM partition requires the use of the "Acknowledge
O/S Interrupt to even O/S reporting line" special operation provided by
the IBM XIVE interrupt controller. This operation is invoked by writing
a byte (data is irrelevant) to offset 0xC10 of the Thread Interrupt
Management Area (TIMA). It can be used by software to notify the XIVE
logic that the interrupt was received.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-26-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When an XIVE context is pulled while it has an active, unacknowledged
group interrupt, XIVE will check to see if a context on another thread
can handle the interrupt and, if so, notify that context. If there
are no contexts that can handle the interrupt, then the interrupt is
added to a backlog and XIVE will attempt to escalate the interrupt,
if configured to do so, allowing the higher privileged handler to
activate a context that can handle the original interrupt.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-23-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Adds support for extracting additional configuration flags from
the XIVE configuration register that are needed for redistribution
of group interrupts.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-22-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add support for XIVE ESB Interrupt Escalation.
Suggested-by: Michael Kowal <kowal@linux.ibm.com>
[This change was taken from a patch provided by Michael Kowal.]
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-18-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Rather than functions to return masks to test NSR bits, have functions
to test those bits directly. This should be no functional change, it
just makes the code more readable.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-16-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The queue size of an Event Notification Descriptor (END)
is determined by the 'cl' and QsZ fields of the END.
If the cl field is 1, then the queue size (in bytes) will
be the size of a cache line 128B * 2^QsZ and QsZ is limited
to 4. Otherwise, it will be 4096B * 2^QsZ with QsZ limited
to 12.
Fixes: f8a233dedf ("ppc/xive2: Introduce a XIVE2 core framework")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-4-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Currently the ramfb device loads the vgabios-ramfb.bin unconditionally,
but only the x86 need the vgabios-ramfb.bin, this can cause that when
use the release package on arm64 it can't find the vgabios-ramfb.bin.
Because only seabios will use the vgabios-ramfb.bin, load the rom logic
is x86-specific. For other !x86 platforms, the edk2 ships an EFI driver
for ramfb, so they don't need to load the romfile.
So add a new property use-legacy-x86-rom in both ramfb and vfio_pci
device, because the vfio display also use the ramfb_setup() to load
the vgabios-ramfb.bin file.
After have this property, the machine type can set the compatibility to
not load the vgabios-ramfb.bin if the arch doesn't need it.
For now the default value is true but it will be turned off by default
in subsequent patch when compats get properly handled.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Message-ID: <20250717100941.2230408-2-shahuang@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug
fixes, cleanups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
=sktm
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci,pc: features, fixes, tests
SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug
fixes, cleanups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
# pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
# joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
# TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
# h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
# ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
# =sktm
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits)
hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release
hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add
hw/cxl: Create helper function to create DC Event Records from extents
hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists
hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config
hw/mem: cxl_type3: Add DC Region bitmap lock
hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header
hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config
hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct
hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info
hw/cxl: fix DC extent capacity tracking
tests: virt: Update expected ACPI tables for virt test
hw/acpi/aml-build: Build a root node in the PPTT table
hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes
tests: virt: Allow changes to PPTT test table
qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp
qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex
tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test
tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests
hw/arm/virt: Let virt support pci hotplug/unplug GED event
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Conflicts:
net/vhost-vdpa.c
vhost_vdpa_set_steering_ebpf() was removed, resolve the context
conflict.
As each target declares the same prototypes, we can
use a single header, removing the TARGET_XXX uses.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-Id: <20250513171737.74386-1-philmd@linaro.org>
Allow capping the maximum total size of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.
Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable this limit by default by setting it to UINT64_MAX.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/4f7cad490988288f58e36b162d7a888ed7e7fd17.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This property allows configuring whether to start the config load only
after all iterables were loaded, during non-iterables loading phase.
Such interlocking is required for ARM64 due to this platform VFIO
dependency on interrupt controller being loaded first.
The property defaults to AUTO, which means ON for ARM, OFF for other
platforms.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/0e03c60dbc91f9a9ba2516929574df605b7dfcb4.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
FM DCD Management command 0x5604 implemented per CXL r3.2 Spec Section 7.6.7.6.5
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-11-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Prepatory patch for following FMAPI Add/Release Patches. Refactors part
of qmp_cxl_process_dynamic_capacity_prescriptive() into a helper
function to create DC Event Records and insert in the event log.
Moves definition for CXL_NUM_EXTENTS_SUPPORTED to cxl.h so it can be
accessed by cxl-mailbox-utils.c and cxl-events.c, where the helper
function is defined.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-10-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5602 implemented per CXL r3.2 Spec Section 7.6.7.6.3
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-8-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a lock on the bitmap of each CXLDCRegion in preparation for the next
patch which implements FMAPI Set DC Region Configuration. This command
can modify the block size, which means the region's bitmap must be updated
accordingly.
The lock becomes necessary when commands that add/release extents
(meaning they update the bitmap too) are enabled on a different CCI than
the CCI on which the FMAPI commands are enabled.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move definition/enum to cxl_events.h for shared use in next patch
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add booleans to DC Region struct to represent dsmas flags (defined in CDAT) in
preparation for the next command, which returns the flags in the next mailbox
command 0x5601.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5600 implemented per CXL 3.2 Spec Section 7.6.7.6.1.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per cxl r3.2 Section 9.13.3.3, extent capacity tracking should include
extents in different states including added, pending, etc.
Before the change, for the in-device extent number tracking purpose, we only
have "total_extent_count" defined, which only tracks the number of
extents accepted. However, we need to track number of extents in other
states also, for now it is extents pending-to-add.
To fix that, we introduce a new counter for dynamic capacity
"nr_extents_accepted" which explicitly tracks number of the extents
accepted by the hosts, and fix "total_extent_count" to include
both accepted and pending extents counting.
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use a local SysBusDevice handle. Also use the newly introduced
sysbus_mmio_map_name which brings better readability about the region
being mapped. GED device has regions which exist depending on some
external properties and it becomes difficult to guess the index of
a region. Better refer to a region by its name.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-32-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some sysbus devices have conditional mmio regions. This
happens for instance with the hw/acpi/ged device. In that case
it becomes difficult to predict which index a specific MMIO
region corresponds to when one needs to mmio map the region.
Introduce a new helper that takes the name of the region instead
of its index. If the region is not found this returns -1.
Otherwise it maps the corresponding index and returns this latter.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-31-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
QEMU will notify the OS about PCI hotplug/hotunplug events through
GED interrupts. Let the GED device handle a new PCI hotplug event.
On its occurrence it calls the \\_SB.PCI0.PCNT method with the BLCK
mutex held.
The GED device uses a dedicated MMIO region that will be mapped
by the machine code.
At this point the GED still does not support PCI device hotplug in
its TYPE_HOTPLUG_HANDLER implementation. This will come in a
subsequent patch.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-29-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let pass the root bus to ich9 and piix4 through a property link
instead of through an argument passed to acpi_pcihp_init().
Also make sure the root bus is set at the entry of acpi_pcihp_init().
The rationale of that change is to be consistent with the forecoming ARM
implementation where the machine passes the root bus (steming from GPEX)
to the GED device through a link property.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-28-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Modify the DSDT ACPI table to enable ACPI PCI hotplug.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-24-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move aml_pci_edsm to pci-bridge.c since we want to reuse that for
ARM and acpi-index support. Also rename it into build_pci_bridge_edsm.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-17-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We intend to reuse build_append_pci_bus_devices and build_append_pcihp_slots
on ARM. So let's move them to hw/acpi/pcihp.c as well as all static
helpers they use.
No functional change intended.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20250714080639.2525563-15-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We plan to reuse build_append_notification_callback() on ARM
so let's move it to pcihp.c.
No functional change intended.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20250714080639.2525563-14-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On ARM we will put the operation regions in AML_SYSTEM_MEMORY.
So let's allow this configuration.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20250714080639.2525563-13-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extract the code that reserves resources for ACPI PCI hotplug
into a new helper named build_append_pcihp_resources() and
move it to pcihp.c. We will reuse it on ARM.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20250714080639.2525563-12-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
GPEX acpi_dsdt_add_pci_osc() does basically the same as
build_q35_osc_method().
Rename build_q35_osc_method() into build_pci_host_bridge_osc_method()
and move it into hw/acpi/pci.c. In a subsequent patch we will
use this later in place of acpi_dsdt_add_pci_osc().
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-9-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Retrieve the acpi pcihp property value from the ged. In case this latter
is not set, PCI native hotplug is used on pci0. For expander bridges we
keep pci native hotplug, as done on x86 q35.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20250714080639.2525563-8-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A new boolean property is introduced. This will be used to turn
ACPI PCI hotplug support. By default it is unset.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250714080639.2525563-7-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>