mirror of
https://github.com/qemu/qemu.git
synced 2025-10-23 17:45:11 +00:00

Put in a more accessible place the reasoning behind our decision to officially drop KVM support in the powernv machine. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20211130133153.444601-3-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
207 lines
7.3 KiB
ReStructuredText
207 lines
7.3 KiB
ReStructuredText
PowerNV family boards (``powernv8``, ``powernv9``, ``powernv10``)
|
|
==================================================================
|
|
|
|
PowerNV (as Non-Virtualized) is the "bare metal" platform using the
|
|
OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can
|
|
be used as an hypervisor OS, running KVM guests, or simply as a host
|
|
OS.
|
|
|
|
The PowerNV QEMU machine tries to emulate a PowerNV system at the
|
|
level of the skiboot firmware, which loads the OS and provides some
|
|
runtime services. Power Systems have a lower firmware (HostBoot) that
|
|
does low level system initialization, like DRAM training. This is
|
|
beyond the scope of what QEMU addresses today.
|
|
|
|
Supported devices
|
|
-----------------
|
|
|
|
* Multi processor support for POWER8, POWER8NVL and POWER9.
|
|
* XSCOM, serial communication sideband bus to configure chiplets.
|
|
* Simple LPC Controller.
|
|
* Processor Service Interface (PSI) Controller.
|
|
* Interrupt Controller, XICS (POWER8) and XIVE (POWER9) and XIVE2 (Power10).
|
|
* POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge.
|
|
* Simple OCC is an on-chip micro-controller used for power management tasks.
|
|
* iBT device to handle BMC communication, with the internal BMC simulator
|
|
provided by QEMU or an external BMC such as an Aspeed QEMU machine.
|
|
* PNOR containing the different firmware partitions.
|
|
|
|
Missing devices
|
|
---------------
|
|
|
|
A lot is missing, among which :
|
|
|
|
* I2C controllers (yet to be merged).
|
|
* NPU/NPU2/NPU3 controllers.
|
|
* EEH support for PCIe Host bridge controllers.
|
|
* NX controller.
|
|
* VAS controller.
|
|
* chipTOD (Time Of Day).
|
|
* Self Boot Engine (SBE).
|
|
* FSI bus.
|
|
|
|
Firmware
|
|
--------
|
|
|
|
The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems
|
|
includes the runtime services ``skiboot`` and the bootloader kernel and
|
|
initramfs ``skiroot``. Source code can be found on the `OpenPOWER account at
|
|
GitHub <https://github.com/open-power>`_.
|
|
|
|
Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the
|
|
`OpenPOWER <https://github.com/open-power/op-build/releases/>`__ site.
|
|
|
|
QEMU includes a prebuilt image of ``skiboot`` which is updated when a
|
|
more recent version is required by the models.
|
|
|
|
Current acceleration status
|
|
---------------------------
|
|
|
|
KVM acceleration in Linux Power hosts is provided by the kvm-hv and
|
|
kvm-pr modules. kvm-hv is adherent to PAPR and it's not compliant with
|
|
powernv. kvm-pr in theory could be used as a valid accel option but
|
|
this isn't supported by kvm-pr at this moment.
|
|
|
|
To spare users from dealing with not so informative errors when attempting
|
|
to use accel=kvm, the powernv machine will throw an error informing that
|
|
KVM is not supported. This can be revisited in the future if kvm-pr (or
|
|
any other KVM alternative) is usable as KVM accel for this machine.
|
|
|
|
Boot options
|
|
------------
|
|
|
|
Here is a simple setup with one e1000e NIC :
|
|
|
|
.. code-block:: bash
|
|
|
|
$ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \
|
|
-accel tcg,thread=single \
|
|
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \
|
|
-netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
|
|
-kernel ./zImage.epapr \
|
|
-initrd ./rootfs.cpio.xz \
|
|
-nographic
|
|
|
|
and a SATA disk :
|
|
|
|
.. code-block:: bash
|
|
|
|
-device ich9-ahci,id=sata0,bus=pcie.1,addr=0x0 \
|
|
-drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
|
|
-device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
|
|
|
|
Complex PCIe configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Six PHBs are defined per chip (POWER9) but no default PCI layout is
|
|
provided (to be compatible with libvirt). One PCI device can be added
|
|
on any of the available PCIe slots using command line options such as:
|
|
|
|
.. code-block:: bash
|
|
|
|
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
|
|
-netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
|
|
|
|
-device megasas,id=scsi0,bus=pcie.0,addr=0x0
|
|
-drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
|
|
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
|
|
|
|
Here is a full example with two different storage controllers on
|
|
different PHBs, each with a disk, the second PHB is empty :
|
|
|
|
.. code-block:: bash
|
|
|
|
$ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 -accel tcg,thread=single \
|
|
-kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
|
|
\
|
|
-device megasas,id=scsi0,bus=pcie.0,addr=0x0 \
|
|
-drive file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
|
|
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \
|
|
\
|
|
-device pcie-pci-bridge,id=bridge1,bus=pcie.1,addr=0x0 \
|
|
\
|
|
-device ich9-ahci,id=sata0,bus=bridge1,addr=0x1 \
|
|
-drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
|
|
-device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
|
|
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=bridge1,addr=0x2 \
|
|
-netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
|
|
-device nec-usb-xhci,bus=bridge1,addr=0x7 \
|
|
\
|
|
-serial mon:stdio -nographic
|
|
|
|
You can also use VIRTIO devices :
|
|
|
|
.. code-block:: bash
|
|
|
|
-drive file=./fedora-ppc64le.qcow2,if=none,snapshot=on,id=drive0 \
|
|
-device virtio-blk-pci,drive=drive0,id=blk0,bus=pcie.0 \
|
|
\
|
|
-netdev tap,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,id=netdev0 \
|
|
-device virtio-net-pci,netdev=netdev0,id=net0,bus=pcie.1 \
|
|
\
|
|
-fsdev local,id=fsdev0,path=$HOME,security_model=passthrough \
|
|
-device virtio-9p-pci,fsdev=fsdev0,mount_tag=host,bus=pcie.2
|
|
|
|
Multi sockets
|
|
~~~~~~~~~~~~~
|
|
|
|
The number of sockets is deduced from the number of CPUs and the
|
|
number of cores. ``-smp 2,cores=1`` will define a machine with 2
|
|
sockets of 1 core, whereas ``-smp 2,cores=2`` will define a machine
|
|
with 1 socket of 2 cores. ``-smp 8,cores=2``, 4 sockets of 2 cores.
|
|
|
|
BMC configuration
|
|
~~~~~~~~~~~~~~~~~
|
|
|
|
OpenPOWER systems negotiate the shutdown and reboot with their
|
|
BMC. The QEMU PowerNV machine embeds an IPMI BMC simulator using the
|
|
iBT interface and should offer the same power features.
|
|
|
|
If you want to define your own BMC, use ``-nodefaults`` and specify
|
|
one on the command line :
|
|
|
|
.. code-block:: bash
|
|
|
|
-device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10
|
|
|
|
The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__
|
|
and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__
|
|
define a Sensor Data Record repository and a Field Replaceable Unit
|
|
inventory for a Palmetto BMC. They can be used to extend the QEMU BMC
|
|
simulator.
|
|
|
|
.. code-block:: bash
|
|
|
|
-device ipmi-bmc-sim,sdrfile=./palmetto-SDR.bin,fruareasize=256,frudatafile=./palmetto-FRU.bin,id=bmc0 \
|
|
-device isa-ipmi-bt,bmc=bmc0,irq=10
|
|
|
|
The PowerNV machine can also be run with an external IPMI BMC device
|
|
connected to a remote QEMU machine acting as BMC, using these options
|
|
:
|
|
|
|
.. code-block:: bash
|
|
|
|
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
|
|
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
|
|
-device isa-ipmi-bt,bmc=bmc0,irq=10 \
|
|
-nodefaults
|
|
|
|
NVRAM
|
|
~~~~~
|
|
|
|
Use a MTD drive to add a PNOR to the machine, and get a NVRAM :
|
|
|
|
.. code-block:: bash
|
|
|
|
-drive file=./witherspoon.pnor,format=raw,if=mtd
|
|
|
|
CAVEATS
|
|
-------
|
|
|
|
* No support for multiple HW threads (SMT=1). Same as pseries.
|
|
|
|
Maintainer contact information
|
|
------------------------------
|
|
|
|
Cédric Le Goater <clg@kaod.org>
|