qemu/hw
David Gibson 8897ea5a9f spapr: Don't attempt to clamp RMA to VRMA constraint
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT.  So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
    of Node 0 memory size and the VRMA limit.  That will *often* work the
    same as clamping, but there can be other constraints on RMA size which
    supersede Node 0 memory size.  We have real bugs caused by this
    (currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
    we're past the point that we can actually advertise an RMA limit to the
    guest
 3) But most fundamentally, the VRMA limit depends on host configuration
    (page size) which shouldn't be visible to the guest, but this partially
    exposes it.  This can cause problems with migration in certain edge
    cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
    limits before the RMA limit.  This is relatively easy on POWER8 which
    has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Use a guest NUMA configuration which artificially constrains the RMA
    within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 09:41:15 +11:00
..
9pfs 9p/proxy: Fix export_flags 2020-03-10 16:12:49 +01:00
acpi hw/acpi: Include "hw/mem/nvdimm.h" 2020-03-09 15:59:31 +01:00
adc hw/*/Makefile.objs: Move many .o files to common-objs 2020-02-04 09:00:57 +01:00
alpha hw/alpha/dp264: Include "net/net.h" 2020-03-09 15:59:31 +01:00
arm hw/arm/virt: kvm: allow gicv3 by default if v2 cannot work 2020-03-12 16:27:33 +00:00
audio hw/audio/fmopl: Fix a typo twice 2020-03-09 15:59:31 +01:00
block qapi: Flatten object-add 2020-03-06 17:21:27 +01:00
char hw/char/exynos4210_uart: Fix memleaks in exynos4210_uart_init 2020-02-13 14:14:55 +00:00
core core/qdev: fix memleak in qdev_get_gpio_out_connector() 2020-03-09 15:59:31 +01:00
cpu cpu/arm11mpcore: Set number of GIC priority bits to 4 2020-02-28 16:14:57 +00:00
cris hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
display stdvga+bochs-display: add dummy mmio handler 2020-03-16 12:40:47 +01:00
dma dma/xlnx-zdma: Remove redundant statement in zdma_write_dst() 2020-03-09 15:59:31 +01:00
gpio hw/*/Makefile.objs: Move many .o files to common-objs 2020-02-04 09:00:57 +01:00
hppa hw/hppa/machine: Include "net/net.h" 2020-03-09 15:59:31 +01:00
hyperv add device_legacy_reset function to prepare for reset api change 2020-01-30 16:02:03 +00:00
i2c hw/i2c/smbus_ich9: Include "qemu/range.h" 2020-03-09 15:59:31 +01:00
i386 hw/i386: Include "hw/mem/nvdimm.h" 2020-03-09 15:59:31 +01:00
ide hw/ide: Let the DMAIntFunc prototype use a boolean 'is_write' argument 2020-02-20 14:47:08 +01:00
input hw/input: Do not enable CONFIG_PCKBD by default 2020-02-04 09:01:31 +01:00
intc hw/intc/armv7m_nvic: Rebuild hflags on reset 2020-03-12 16:01:37 +00:00
ipack qdev: set properties with device_class_set_props() 2020-01-24 20:59:15 +01:00
ipmi qdev: set properties with device_class_set_props() 2020-01-24 20:59:15 +01:00
isa hw/southbridge/ich9: Removed unused headers 2020-03-09 15:59:31 +01:00
lm32 hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
m68k hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
mem spapr: Add NVDIMM device support 2020-02-21 09:15:04 +11:00
microblaze hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
mips hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
misc target-arm queue: 2020-03-12 17:34:34 +00:00
moxie hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
net hw/arm/allwinner-h3: add EMAC ethernet device 2020-03-12 16:27:33 +00:00
nios2 hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
nubus hw/m68k: add Nubus support 2019-10-28 19:06:47 +01:00
nvram Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument 2020-02-20 14:47:08 +01:00
openrisc hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
pci pcie_root_port: Add hotplug disabling option 2020-03-08 09:18:29 -04:00
pci-bridge pcie_root_port: Add hotplug disabling option 2020-03-08 09:18:29 -04:00
pci-host hw/pci-host/q35: Remove unused includes 2020-03-09 15:59:31 +01:00
pcmcia hw/*/Makefile.objs: Move many .o files to common-objs 2020-02-04 09:00:57 +01:00
ppc spapr: Don't attempt to clamp RMA to VRMA constraint 2020-03-17 09:41:15 +11:00
rdma qdev: set properties with device_class_set_props() 2020-01-24 20:59:15 +01:00
riscv RISC-V Patches for the 5.0 Soft Freeze, Part 3 2020-03-03 11:06:39 +00:00
rtc hw/arm/allwinner: add RTC device support 2020-03-12 16:27:33 +00:00
s390x s390x: ipl: Consolidate iplb validity check into one function 2020-03-10 10:18:20 +01:00
scsi scsi/scsi-disk: Remove redundant statement in scsi_disk_emulate_command() 2020-03-09 15:59:31 +01:00
sd hw/arm/allwinner: add SD/MMC host controller 2020-03-12 16:27:33 +00:00
semihosting semihosting: add qemu_semihosting_console_inc for SYS_READC 2020-01-09 11:41:29 +00:00
sh4 hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
smbios hw/smbios/smbios: Remove unused include 2020-02-06 10:38:57 +01:00
sparc hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
sparc64 hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
ssi aspeed/smc: Fix User mode select/unselect scheme 2020-03-12 16:01:37 +00:00
timer hw/timer/hpet: Include "exec/address-spaces.h" 2020-03-09 15:59:31 +01:00
tpm tpm: Add the SysBus TPM TIS device 2020-03-05 12:18:08 -05:00
tricore hw: Do not initialize MachineClass::is_default to 0 2020-02-28 14:57:19 -05:00
unicore32 hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
usb hw/arm/allwinner-h3: add USB host controller 2020-03-12 16:27:33 +00:00
vfio hw/vfio/display: Remove superfluous semicolon 2020-02-18 20:20:49 +01:00
virtio vhost-vsock: fix error message output 2020-03-08 09:27:09 -04:00
watchdog qdev: set properties with device_class_set_props() 2020-01-24 20:59:15 +01:00
xen xen-bus/block: explicitly assign event channels to an AioContext 2020-02-27 11:50:30 +00:00
xenpv trivial: Remove xenfb_enabled from sysemu.h 2020-02-04 09:00:57 +01:00
xtensa hw/xtensa/xtfpga:fix leak of fdevice tree blob 2020-02-19 10:33:38 +01:00
Kconfig Remove the core bluetooth code 2019-12-17 09:01:14 +01:00
Makefile.objs Remove the core bluetooth code 2019-12-17 09:01:14 +01:00