Commit Graph

121351 Commits

Author SHA1 Message Date
Philippe Mathieu-Daudé
5731baee6c hw/vfio: Compile some common objects once
Some files don't rely on any target-specific knowledge
and can be compiled once:

 - helpers.c
 - container-base.c
 - migration.c (removing unnecessary "exec/ram_addr.h")
 - migration-multifd.c
 - cpr.c

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-4-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-6-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:15 +01:00
Philippe Mathieu-Daudé
80ce7bb5cf hw/vfio/common: Get target page size using runtime helpers
Prefer runtime helpers to get target page size.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305153929.43687-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-5-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Philippe Mathieu-Daudé
514c296787 hw/vfio/common: Include missing 'system/tcg.h' header
Always include necessary headers explicitly, to avoid
when refactoring unrelated ones:

  hw/vfio/common.c:1176:45: error: implicit declaration of function ‘tcg_enabled’;
   1176 |                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
        |                                             ^~~~~~~~~~~

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-2-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-4-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Philippe Mathieu-Daudé
a19ce97ed1 hw/vfio/spapr: Do not include <linux/kvm.h>
<linux/kvm.h> is already included by "system/kvm.h" in the next line.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-3-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Philippe Mathieu-Daudé
c6cd30fead system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'
Both qemu_minrampagesize() and qemu_maxrampagesize() are
related to host memory backends, having the following call
stack:

  qemu_minrampagesize()
     -> find_min_backend_pagesize()
         -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)

  qemu_maxrampagesize()
     -> find_max_backend_pagesize()
        -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)

Having TYPE_MEMORY_BACKEND defined in "system/hostmem.h":

  include/system/hostmem.h:23:#define TYPE_MEMORY_BACKEND "memory-backend"

Move their prototype declaration to "system/hostmem.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-7-philmd@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-2-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Maciej S. Szmigiero
cbb2e10526 vfio/migration: Use BE byte order for device state wire packets
Wire data commonly use BE byte order (including in the existing migration
protocol), use it also for for VFIO device state packets.

This will allow VFIO multifd device state transfer between hosts with
different endianness.
Although currently there is no such use case, it's good to have it now
for completeness.

Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/dcfc04cc1a50655650dbac8398e2742ada84ee39.1741611079.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
e46b7af508 vfio/igd: Fix broken KVMGT OpRegion support
The KVMGT/GVT-g vGPU also exposes OpRegion. But unlike IGD passthrough,
it only needs the OpRegion quirk. A previous change moved x-igd-opregion
handling to config quirk breaks KVMGT functionality as it brings extra
checks and applied other quirks. Here we check if the device is mdev
(KVMGT) or not (passthrough), and then applies corresponding quirks.

As before, users must manually specify x-igd-opregion=on to enable it
on KVMGT devices. In the future, we may check the VID/DID and enable
OpRegion automatically.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
674f61117d vfio/igd: Introduce x-igd-lpc option for LPC bridge ID quirk
The LPC bridge/Host bridge IDs quirk is also not dependent on legacy
mode. Recent Windows driver no longer depends on these IDs, as well as
Linux i915 driver, while UEFI GOP seems still needs them. Make it an
option to allow users enabling and disabling it as needed.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-10-tomitamoeko@gmail.com
[ clg: - Fixed spelling in vfio_probe_igd_config_quirk() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
434ac62ef2 vfio/igd: Handle x-igd-opregion option in config quirk
Both enable OpRegion option (x-igd-opregion) and legacy mode require
setting up OpRegion copy for IGD devices. As the config quirk no longer
depends on legacy mode, we can now handle x-igd-opregion option there
instead of in vfio_realize.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-9-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
2fedccf03c vfio/igd: Decouple common quirks from legacy mode
So far, IGD-specific quirks all require enabling legacy mode, which is
toggled by assigning IGD to 00:02.0. However, some quirks, like the BDSM
and GGC register quirks, should be applied to all supported IGD devices.
A new config option, x-igd-legacy-mode=[on|off|auto], is introduced to
control the legacy mode only quirks. The default value is "auto", which
keeps current behavior that enables legacy mode implicitly and continues
on error when all following conditions are met.
* Machine type is i440fx
* IGD device is at guest BDF 00:02.0

If any one of the conditions above is not met, the default behavior is
equivalent to "off", QEMU will fail immediately if any error occurs.

Users can also use "on" to force enabling legacy mode. It checks if all
the conditions above are met and set up legacy mode. QEMU will also fail
immediately on error in this case.

Additionally, the hotplug check in legacy mode is removed as hotplugging
IGD device is never supported, and it will be checked when enabling the
OpRegion quirk.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-8-tomitamoeko@gmail.com
[ clg: - Changed warn_report() by info_report() in
         vfio_probe_igd_config_quirk() as suggested by Alex W.
       - Fixed spelling in vfio_probe_igd_config_quirk () ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
9267f96ad6 vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirk
The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk was removed
in previous change, leaving the function not matching its name, so move
it into the newly introduced vfio_config_quirk_setup. There is no
functional change in this commit.

For now, to align with current legacy mode behavior, it returns and
proceeds on error. Later it will fail on error after decoupling the
quirks from legacy mode.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-7-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
b22ab580d2 vfio/pci: Add placeholder for device-specific config space quirks
IGD devices require device-specific quirk to be applied to their PCI
config space. Currently, it is put in the BAR4 quirk that does nothing
to BAR4 itself. Add a placeholder for PCI config space quirks to hold
that quirk later.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-6-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
eabaa7b468 vfio/igd: Move LPC bridge initialization to a separate function
A new option will soon be introduced to decouple the LPC bridge/Host
bridge ID quirk from legacy mode. To prepare for this, move the LPC
bridge initialization into a separate function.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-5-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
ae9d9ec4a6 vfio/igd: Consolidate OpRegion initialization into a single function
Both x-igd-opregion option and legacy mode require identical steps to
set up OpRegion for IGD devices. Consolidate these steps into a single
vfio_pci_igd_setup_opregion function.

The function call in pci.c is wrapped with ifdef temporarily to prevent
build error for non-x86 archs, it will be removed after we decouple it
from legacy mode.

Additionally, move vfio_pci_igd_opregion_init to igd.c to prevent it
from being compiled in non-x86 builds.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-4-tomitamoeko@gmail.com
[ clg: Fixed spelling in vfio_pci_igd_setup_opregion() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
5faec6a5e2 vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-size
Though GTT Stolen Memory (GSM) is right below Data Stolen Memory (DSM)
in host address space, direct access to GSM is prohibited, and it is
not mapped to guest address space. Both host and guest accesses GSM
indirectly through the second half of MMIO BAR0 (GTTMMADR).

Guest firmware only need to reserve a memory region for DSM and program
the BDSM register with the base address of that region, that's actually
what both SeaBIOS[1] and IgdAssignmentDxe does now.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-3-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Tomita Moeko
5aed8b0f0b vfio/igd: Remove GTT write quirk in IO BAR 4
The IO BAR4 of IGD devices contains a pair of 32-bit address/data
registers, MMIO_Index (0x0) and MMIO_Data (0x4), which provide access
to the MMIO BAR0 (GTTMMADR) from IO space. These registers are probably
only used by the VBIOS, and are not documented by intel. The observed
layout of MMIO_Index register is:
 31                                                   2   1      0
+-------------------------------------------------------------------+
|                        Offset                        | Rsvd | Sel |
+-------------------------------------------------------------------+
- Offset: Byte offset in specified region, 4-byte aligned.
- Sel: Region selector
       0: MMIO register region (first half of MMIO BAR0)
       1: GTT region (second half of MMIO BAR0). Pre Gen11 only.

Currently, QEMU implements a quirk that adjusts the guest Data Stolen
Memory (DSM) region address to be (addr - host BDSM + guest BDSM) when
programming GTT entries via IO BAR4, assuming guest still programs GTT
with host DSM address, which is not the case. Guest's BDSM register is
emulated and initialized to 0 at startup by QEMU, then SeaBIOS programs
its value[1]. As result, the address programmed to GTT entries by VBIOS
running in guest are valid GPA, and this unnecessary adjustment brings
inconsistency.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-2-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11 17:01:14 +01:00
Kevin Wolf
b75c5f9879 block: Zero block driver state before reopening
Block drivers assume in their .bdrv_open() implementation that their
state in bs->opaque has been zeroed; it is initially allocated with
g_malloc0() in bdrv_open_driver().

bdrv_snapshot_goto() needs to make sure that it is zeroed again before
calling drv->bdrv_open() to avoid that block drivers use stale values.

One symptom of this bug is VMDK running into a double free when the user
tries to apply an internal snapshot like 'qemu-img snapshot -a test
test.vmdk'. This should be a graceful error because VMDK doesn't support
internal snapshots.

==25507== Invalid free() / delete / delete[] / realloc()
==25507==    at 0x484B347: realloc (vg_replace_malloc.c:1801)
==25507==    by 0x54B592A: g_realloc (gmem.c:171)
==25507==    by 0x1B221D: vmdk_add_extent (../block/vmdk.c:570)
==25507==    by 0x1B1084: vmdk_open_sparse (../block/vmdk.c:1059)
==25507==    by 0x1AF3D8: vmdk_open (../block/vmdk.c:1371)
==25507==    by 0x1A2AE0: bdrv_snapshot_goto (../block/snapshot.c:299)
==25507==    by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507==    by 0x58FA087: (below main) (libc_start_call_main.h:58)
==25507==  Address 0x832f3e0 is 0 bytes inside a block of size 272 free'd
==25507==    at 0x4846B83: free (vg_replace_malloc.c:989)
==25507==    by 0x54AEAC4: g_free (gmem.c:208)
==25507==    by 0x1AF629: vmdk_close (../block/vmdk.c:2889)
==25507==    by 0x1A2A9C: bdrv_snapshot_goto (../block/snapshot.c:290)
==25507==    by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507==    by 0x58FA087: (below main) (libc_start_call_main.h:58)

This error was discovered by fuzzing qemu-img.

Cc: qemu-stable@nongnu.org
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2853
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2851
Reported-by: Denis Rastyogin <gerben@altlinux.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250310104858.28221-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-03-11 15:49:14 +01:00
Kevin Wolf
000a41b69c block: Remove unused blk_op_is_blocked()
Commit fc4e394b28 removed the last caller of blk_op_is_blocked(). Remove
the now unused function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250206165331.379033-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-03-11 15:49:14 +01:00
BALATON Zoltan
0f17ae24b5 docs/system/ppc/amigang.rst: Update for NVRAM emulation
Add NVRAM and hint on how to make it persistent. Also update Linux
boot section which should now boot automatically with the new NVRAM
defaults so manual settings in menu may not be needed normally.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250304205926.87E364E6010@zero.eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
BALATON Zoltan
e6521e41ba ppc/amigaone: Add #defines for memory map constants
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <3b8e54ad9220d57e7b0a33f3570e880f26677ce8.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
BALATON Zoltan
34f053d86b ppc/amigaone: Add kernel and initrd support
Add support for -kernel, -initrd and -append command line options.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <489b1be5d95d5153e924c95b0691b8b53f9ffb9e.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
BALATON Zoltan
c2bed9957a ppc/amigaone: Add default environment
Initialise empty NVRAM with default values. This also enables IDE UDMA
mode in AmigaOS that is faster but has to be enabled in environment
due to problems with real hardware but that does not affect emulation
so we can use faster defaults here.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <4d63f88191612329e0ca8102c7c0d4fc626dc372.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
BALATON Zoltan
addff513a1 ppc/amigaone: Implement NVRAM emulation
The board has a battery backed NVRAM where U-Boot environment is
stored which is also accessed by AmigaOS and e.g. C:NVGetVar command
crashes without it having at least a valid checksum.

[npiggin: 32-bit compile fix]
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <7e4c0107ef6bdc2b20fb1e780a188275c7dc1e49.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
BALATON Zoltan
222d37d389 ppc/amigaone: Simplify replacement dummy_fw
There's no need to do shift in a loop, doing it in one instruction
works just as well, only the result is used.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <446bf740cbb99422be2cc5a31e51a1034eddded7.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Nicholas Piggin
d91b101da1 spapr: Generate random HASHPKEYR for spapr machines
The hypervisor is expected to create a value for the HASHPKEY SPR for
each partition. Currently it uses zero for all partitions, use a
random number instead, which in theory might make kernel ROP protection
more secure.

Signed-of-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241219034035.1826173-4-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Nicholas Piggin
b4aa82dc3a target/ppc: Avoid warning message for zero process table entries
A translation that encounters a process table entry that is zero is
something that Linux does to cause certain kernel NULL pointer
dereferences to fault. It is not itself a programming error, so avoid
the guest error log.

Message-ID: <20241219034035.1826173-5-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Nicholas Piggin
d8a624515a target/ppc: Wire up BookE ATB registers for e500 family
From the Freescale PowerPC Architecture Primer:

  Alternate time base APU. This APU, implemented on the e500v2, defines
  a 64-bit time base counter that differs from the PowerPC defined time
  base in that it is not writable and counts at a different, and
  typically much higher, frequency. The alternate time base always
  counts up, wrapping when the 64-bit count overflows.

This implementation of ATB uses the same frequency as the TB. The
existing spr_read_atbu/l functions are unused without this patch
to wire them into the SPR.

RTEMS uses this SPR on the e6500, though this hasn't been tested.

Message-ID: <20241219034035.1826173-6-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Nicholas Piggin
e8291ec16d target/ppc: fix timebase register reset state
(H)DEC and PURR get reset before icount does, which causes them to be
skewed and not match the init state. This can cause replay to not
match the recorded trace exactly. For DEC and HDEC this is usually not
noticable since they tend to get programmed before affecting the
target machine. PURR has been observed to cause replay bugs when
running Linux.

Fix this by resetting using a time of 0.

Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Vaibhav Jain
5f7d861e65 spapr: nested: Add support for reporting Hostwide state counter
Add support for reporting Hostwide state counters for nested KVM pseries
guests running with 'cap-nested-papr' on Qemu-TCG acting as
L0-hypervisor. The Hostwide state counters are statistics about state that
L0-hypervisor maintains for the L2-guests and represent the state of all
L2-guests, not just a specific one.

These stats counters are exposed to L1-Hypervisor by the L0-Hypervisor via a
new bit-flag named 'getHostWideState' for the H_GUEST_GET_STATE hcall which
is documented at [1]. Once this flag is set the hcall should populate the
Guest-State-Elements in the requested GSB with the stat counter
values. Currently following five counters are supported:

* l0_guest_heap_size_inuse
* l0_guest_heap_size_max
* l0_guest_pagetable_size_inuse
* l0_guest_pagetable_size_max
* l0_guest_pagetable_reclaimed

At the moment '0' is being reported for all these counters as these
counters doesn't align with how L0-Qemu manages Guest memory.

The patch implements support for these counters by adding new members to
the 'struct SpaprMachineStateNested'. These new members are then plugged
into the existing 'guest_state_element_types[]' with the help of a new
macro 'GSBE_NESTED_MACHINE_DW' together with a new helper
'get_machine_ptr()'. guest_state_request_check() is updated to ensure
correctness of the requested GSB and finally h_guest_getset_state() is
updated to handle the newly introduced flag
'GUEST_STATE_REQUEST_HOST_WIDE'.

This patch is tested with the proposed linux-kernel implementation to
expose these stat-counter as perf-events at [2].

[1]
https://lore.kernel.org/all/20241222140247.174998-2-vaibhav@linux.ibm.com

[2]
https://lore.kernel.org/all/20241222140247.174998-1-vaibhav@linux.ibm.com

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250221155449.530645-1-vaibhav@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Shivaprasad G Bhat
5f361ea187 ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine
As per the PAPR, bit 0 of byte 64 in pa-features property
indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to find
whether kvm supports 2nd DAWR or not. If it's supported, allow user to set
the pa-feature bit in guest DT using cap-dawr1 machine capability.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-ID: <173708681866.1678.11128625982438367069.stgit@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Shivaprasad G Bhat
7ea6e12529 ppc: Enable 2nd DAWR support on Power10 PowerNV machine
Extend the existing watchpoint facility from TCG DAWR0 emulation
to DAWR1 on POWER10.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-ID: <173708680684.1678.13237334676438770057.stgit@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Philippe Mathieu-Daudé
0829b6f0a8 hw/ppc/epapr: Do not swap ePAPR magic value
The ePAPR magic value in $r6 doesn't need to be byte swapped.

See ePAPR-v1.1.pdf chapter 5.4.1 "Boot CPU Initial Register State"
and the following mailing-list threads:
https://lore.kernel.org/qemu-devel/CAFEAcA_NR4XW5DNL4nq7vnH4XRH5UWbhQCxuLyKqYk6_FCBrAA@mail.gmail.com/
https://lore.kernel.org/qemu-devel/D6F93NM6OW2L.2FDO88L38PABR@gmail.com/

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-7-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Philippe Mathieu-Daudé
c2ac9f4c29 hw/ppc/spapr: Convert DIRTY_HPTE() macro as hpte_set_dirty() method
Convert DIRTY_HPTE() macro as hpte_set_dirty() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-6-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:32 +10:00
Philippe Mathieu-Daudé
735f9c878a hw/ppc/spapr: Convert CLEAN_HPTE() macro as hpte_set_clean() method
Convert CLEAN_HPTE() macro as hpte_set_clean() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-5-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
9087929887 hw/ppc/spapr: Convert HPTE_DIRTY() macro as hpte_is_dirty() method
Convert HPTE_DIRTY() macro as hpte_is_dirty() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-4-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
c5411a0653 hw/ppc/spapr: Convert HPTE_VALID() macro as hpte_is_valid() method
Convert HPTE_VALID() macro as hpte_is_valid() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-3-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
c894bdf78b hw/ppc/spapr: Convert HPTE() macro as hpte_get_ptr() method
Convert HPTE() macro as hpte_get_ptr() method.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-2-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
c2c687013d target/ppc: Restrict ATTN / SCV / PMINSN helpers to TCG
Move helper_attn(), helper_scv() and helper_pminsn() to
tcg-excp_helper.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-15-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
92c787de34 target/ppc: Make powerpc_excp() prototype public
In order to move TCG specific code dependent on powerpc_excp()
in the next commit, expose its prototype in "internal.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-14-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
b8d6a858fe target/ppc: Fix style in excp_helper.c
Fix style in do_rfi() before moving the code around.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-13-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
ad8ad893a3 target/ppc: Restrict various common helpers to TCG
Move helpers common to system/user emulation to tcg-excp_helper.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-12-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
1d0b82f86d target/ppc: Restrict exception helpers to TCG
Move exception helpers to tcg-excp_helper.c so they are
only built when TCG is selected. Preprocessor guards
are added for some helpers unused when CONFIG_USER_ONLY.

[npiggin: mention USER_ONLY change]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-10-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
94a37684a5 target/ppc: Remove raise_exception_ra()
Introduced in commit db789c6cd3 ("ppc: Provide basic
raise_exception_* functions"), raise_exception_ra() has
never been used. Remove as dead code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-9-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
2f96c00b61 target/ppc: Restrict powerpc_checkstop() to TCG
Expose powerpc_checkstop() prototype, and move it to
tcg-excp_helper.c, only built when TCG is available.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-8-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
30de74bda7 target/ppc: Ensure powerpc_mcheck_checkstop() is only called under TCG
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-7-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
720c2f2d53 target/ppc: Move ppc_ldl_code() to tcg-excp_helper.c
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-6-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
0fc76338fe target/ppc: Move TCG specific exception handlers to tcg-excp_helper.c
Move the TCGCPUOps handlers to a new unit: tcg-excp_helper.c,
only built when TCG is selected.

See in target/ppc/cpu_init.c:

    #ifdef CONFIG_TCG
    static const TCGCPUOps ppc_tcg_ops = {
      ...
      .do_unaligned_access = ppc_cpu_do_unaligned_access,
      .do_transaction_failed = ppc_cpu_do_transaction_failed,
      .debug_excp_handler = ppc_cpu_debug_excp_handler,
      .debug_check_breakpoint = ppc_cpu_debug_check_breakpoint,
      .debug_check_watchpoint = ppc_cpu_debug_check_watchpoint,
    };
    #endif /* CONFIG_TCG */

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-5-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
215b2ee8f1 target/ppc: Make ppc_ldl_code() declaration public
We are going to move code calling ppc_ldl_code() out of
excp_helper.c where it is defined. Expose its declaration
for few commits, until eventually making it static again
once everything is moved.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-4-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
dan tan
ffb6440cc5 ppc/pnv: Add new PowerPC Special Purpose Registers (RWMR)
Register RWMR - Region Weighted Mode Register
for privileged access in Power9 and Power10

It controls what the SPURR register produces.

Specs:
 - Power10: https://files.openpower.foundation/s/EgCy7C43p2NSRfR

TCG does not model SMT priority, timing, resource controls
and status so this register has no effect for now.

[npiggin: adjust changelog]
Signed-off-by: dan tan <dantan@linux.ibm.com>
Message-ID: <20250116154226.13376-1-dantan@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00
Philippe Mathieu-Daudé
3e84d03815 hw/ppc/spapr: Restrict CONFER hypercall to TCG
KVM handles H_CONFER and does not pass it along to QEMU, so
only vhyp (as used by TCG spapr) needs to handle it.

[npiggin: Add changelog]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-2-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-11 22:43:31 +10:00