qemu/hw
David Gibson dbe1a27745 virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
discard RAM pages inserted into the balloon.  This is basically a Linux-only
interface (MADV_DONTNEED exists on some other platforms, but doesn't
always have the same semantics).  It also doesn't work on hugepages and has
some other limitations.

It turns out that postcopy also needs to discard chunks of memory, and uses
a better interface for it: ram_block_discard_range().  It doesn't cover
every case, but it covers more than going direct to madvise() and this
gives us a single place to update for more possibilities in future.

There are some subtleties here to maintain the current balloon behaviour:

* For now, we just ignore requests to balloon in a hugepage backed region.
  That matches current behaviour, because MADV_DONTNEED on a hugepage would
  simply fail, and we ignore the error.

* If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
  non-host-page-aligned addresses.  These would also fail in madvise(),
  which we then ignored.  ram_block_discard_range() error_report()s calls
  on unaligned addresses, so we explicitly check that case to avoid
  spamming the logs.

* We now call ram_block_discard_range() with the *host* page size, whereas
  we previously called madvise() with BALLOON_PAGE_SIZE.  Surprisingly,
  this also matches existing behaviour.  Although the kernel fails madvise
  on unaligned addresses, it will round unaligned sizes *up* to the host
  page size.  Yes, this means that if BALLOON_PAGE_SIZE < host page size
  we can incorrectly discard more memory than the guest asked us to.  I'm
  planning to address that soon.
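
The three cases above can be sketched as a small decision function.  This is
only an illustration of the checks described in this message, not the code in
the patch: balloon_classify(), balloon_discard_len() and the BalloonAction
names are hypothetical, and the call to ram_block_discard_range() is left out
so that the sketch stands alone.  BALLOON_PAGE_SIZE is assumed to be 4 KiB and
host page sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: balloon requests arrive in 4 KiB units. */
#define BALLOON_PAGE_SIZE ((size_t)1 << 12)

/* Hypothetical outcome names, used only in this sketch. */
typedef enum {
    BALLOON_SKIP_HUGEPAGE,   /* block page size differs from host page size */
    BALLOON_SKIP_UNALIGNED,  /* offset is not host-page aligned */
    BALLOON_DISCARD          /* request may be passed to the discard helper */
} BalloonAction;

/*
 * Decide what to do with one balloon request, given the page size of the
 * backing RAM block, the host page size, and the offset into the block.
 */
static BalloonAction balloon_classify(size_t block_page_size,
                                      size_t host_page_size,
                                      uint64_t offset)
{
    /* The alignment mask below is only valid for a power-of-two page size. */
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);

    /* Hugepage-backed region: silently ignore, as madvise() would fail. */
    if (block_page_size != host_page_size) {
        return BALLOON_SKIP_HUGEPAGE;
    }
    /* Unaligned request: check here so the discard helper never logs it. */
    if (offset & (host_page_size - 1)) {
        return BALLOON_SKIP_UNALIGNED;
    }
    return BALLOON_DISCARD;
}

/*
 * Length that is effectively discarded for one balloon page:
 * BALLOON_PAGE_SIZE rounded up to a multiple of the host page size.
 */
static size_t balloon_discard_len(size_t host_page_size)
{
    return (BALLOON_PAGE_SIZE + host_page_size - 1) & ~(host_page_size - 1);
}
```

With a 64 KiB host page size, balloon_discard_len() yields 64 KiB for a single
4 KiB balloon page, which is the over-discard described in the last point.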

Errors other than the ones discussed above will now be reported by
ram_block_discard_range(), rather than silently ignored, which means we
have a much better chance of seeing when something is going wrong.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-5-david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-21 12:28:41 -05:00
..
9pfs xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
acpi qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
adc Include qapi/error.h exactly where needed 2018-02-09 13:50:17 +01:00
alpha hw/alpha/Makefile.objs: Create CONFIG_* for alpha 2019-02-05 16:50:20 +01:00
arm hw/arm/armsse: Fix miswiring of expansion IRQs 2019-02-15 09:56:39 +00:00
audio audio: fix pc speaker init 2019-01-24 13:10:19 +01:00
block virtio-blk: set correct config size for the host driver 2019-02-13 16:18:17 +08:00
bt char: allow specifying a GMainContext at opening time 2019-02-13 14:23:39 +01:00
char qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
core qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
cpu hw/cpu/cluster: Mark the cpu-cluster device with user_creatable = false 2019-02-06 15:55:56 +01:00
cris hw/cris/Makefile.objs: Create CONFIG_* for cris 2019-02-05 16:50:20 +01:00
display hw/display/milkymist-tmu2: Move inlined code from header to source 2019-02-01 11:58:50 +01:00
dma hw/dma/i8257: Use qemu_log_mask(UNIMP) instead of fprintf 2019-02-14 11:46:30 +01:00
gpio trace: enforce that every trace-events file has a final newline 2019-01-24 14:16:56 +00:00
hppa hw/hppa: forward requests to CPU HPA 2019-02-12 08:59:21 -08:00
hyperv hw/hyperv: fix NULL dereference with pure-kvm SynIC 2018-11-26 14:14:38 -02:00
i2c hw/i2c/Makefile.objs: Create new CONFIG_* variables for EEPROM and ACPI controller 2019-02-05 16:50:21 +01:00
i386 * cpu-exec fixes (Emilio, Laurent) 2019-02-05 19:39:22 +00:00
ide ide: split ioport registration to a separate file 2019-02-05 16:50:19 +01:00
input pckbd: Convert DPRINTF->trace 2019-02-14 11:46:30 +01:00
intc xics: Drop the KVM ICS class 2019-02-18 10:52:08 +11:00
ipack hw/ipack: Use the IEC binary prefix definitions 2018-07-02 15:41:12 +02:00
ipmi ipmi: Use proper struct reference for BT vmstate 2018-08-23 18:46:25 +02:00
isa char: allow specifying a GMainContext at opening time 2019-02-13 14:23:39 +01:00
lm32 hw/lm32/Makefile.objs: Conditionally build lm32 and milkmyst 2019-02-05 16:50:20 +01:00
m68k hw/m68k/Makefile.objs: Conditionally build boards 2019-02-05 16:50:19 +01:00
mem pc-dimm: use same mechanism for [get|set]_addr 2019-02-21 12:28:41 -05:00
microblaze hw/microblaze/Makefile.objs: Create configs for petalogix and xilinx boards 2019-02-05 16:50:19 +01:00
mips hw/mips_int: hold BQL for all interrupt requests 2019-02-14 17:47:28 +01:00
misc cuda: decrease time delay before raising VIA SR interrupt and remove fast path 2019-02-17 21:54:02 +11:00
moxie hw/moxie/Makefile.objs: Conditionally build moxie 2019-02-05 16:50:20 +01:00
net vhost-net: compile it on all targets that have virtio-net. 2019-02-21 12:28:01 -05:00
nios2 hw/nios2/Makefile.objs: Conditionally build nios2 2019-02-05 16:50:20 +01:00
nvram fw_cfg: fix the life cycle and the name of "qemu_extra_params_fw" 2019-02-05 10:58:33 -05:00
openrisc hw/openrisc/Makefile.objs: Create CONFIG_* for openrisc 2019-02-05 16:50:21 +01:00
pci qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
pci-bridge pci/shpc: perform unplug via the hotplug handler 2018-12-20 11:19:12 -05:00
pci-host build: actually use CONFIG_PAM 2019-02-05 16:50:19 +01:00
pcmcia
ppc ppc patch queue 2019-02-19 2019-02-18 16:20:13 +00:00
rdma hw/rdma: modify struct initialization 2019-01-19 11:01:33 +02:00
riscv riscv: Ensure the kernel start address is correctly cast 2019-02-11 15:56:22 -08:00
s390x ppc patch queue 2019-02-19 2019-02-18 16:20:13 +00:00
scsi qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
sd hw: sd: set category of the sd memory card 2019-01-30 10:24:20 +01:00
sh4 * cpu-exec fixes (Emilio, Laurent) 2019-02-05 19:39:22 +00:00
smbios hw/smbios: Move to the hw/firmware/ subdirectory 2018-12-19 16:48:16 -05:00
sparc qemu-sparc queue 2019-02-07 16:49:30 +00:00
sparc64 hw/sparc64: Explicitly set default_display = "std" 2019-02-14 11:46:30 +01:00
ssi aspeed/smc: snoop SPI transfers to fake dummy cycles 2019-01-29 11:46:05 +00:00
timer qapi: move RTC_CHANGE to the target schema 2019-02-18 14:44:05 +01:00
tpm tpm: clear RAM when "memory overwrite" requested 2019-01-17 21:10:57 -05:00
tricore hw/tricore/Makefile.objs: Create CONFIG_* for tricore 2019-02-05 16:50:21 +01:00
unicore32 hw/unicore32/puv3: Drop useless inclusion of "hw/i386/pc.h" 2019-02-06 15:54:12 +01:00
usb usb: remove unnecessary NULL device check from usb_ep_get() 2019-02-20 09:41:23 +01:00
vfio hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCI 2019-02-05 16:50:21 +01:00
virtio virtio-balloon: Use ram_block_discard_range() instead of raw madvise() 2019-02-21 12:28:41 -05:00
watchdog hw/watchdog/wdt_i6300esb: remove a unnecessary comment 2019-01-11 15:46:55 +01:00
xen xen: fix xen-bus state model to allow frontend re-connection 2019-02-04 11:04:49 +00:00
xenpv xen: Replace few mentions of xend by libxl 2019-01-14 13:45:40 +00:00
xtensa hw/xtensa/Makefile.objs: Build xtensa_sim and xtensa_fpga conditionally 2019-02-05 16:50:20 +01:00
Makefile.objs hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCI 2019-02-05 16:50:21 +01:00