In x86, module is the topology level above core, which contains a set
of cores that share certain resources (in current products, the resource
usually includes L2 cache, as well as module scoped features and MSRs).
Though smp.clusters could also share the L2 cache resource [1], there
are following reasons that drive us to introduce the new smp.modules:
* As the CPU topology abstraction in device tree [2], cluster supports
nesting (though currently QEMU hasn't support that). In contrast,
(x86) module does not support nesting.
* Due to nesting, there is great flexibility in sharing resources
on cluster, rather than narrowing cluster down to sharing L2 (and
L3 tags) as the lowest topology level that contains cores.
* Flexible nesting of cluster allows it to correspond to any level
between the x86 package and core.
* In Linux kernel, x86's cluster only represents the L2 cache domain
but QEMU's smp.clusters is the CPU topology level. Linux kernel will
also expose module level topology information in sysfs for x86. To
avoid cluster ambiguity and keep a consistent CPU topology naming
style with the Linux kernel, we introduce module level for x86.
The module is, in existing hardware practice, the lowest layer that
contains the core, while the cluster is able to have a higher
topological scope than the module due to its nesting.
Therefore, place the module between the cluster and the core:
drawer/book/socket/die/cluster/module/core/thread
With the above topological hierarchy order, introduce module level
support in MachineState and MachineClass.
[1]: https://lore.kernel.org/qemu-devel/c3d68005-54e0-b8fe-8dc1-5989fe3c7e69@huawei.com/
[2]: https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-2-zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add a new member "guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.
Memory backend's @guest_memfd is wired with @require_guest_memfd
field of MachineState. It avoid looking up the machine in phymem.c.
MachineState::require_guest_memfd is supposed to be set by any VMs
that requires KVM guest memfd as private memory, e.g., TDX VM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-8-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
more memslots support in libvhost-user
support PCIe Gen5/Gen6 link speeds in pcie
more traces in vdpa
network simulation devices support in vdpa
SMBIOS type 9 descriptor implementation
Bump max_cpus to 4096 vcpus in q35
aw-bits and granule options in VIRTIO-IOMMU
Support report NUMA nodes for device memory using GI in acpi
Beginning of shutdown event support in pvpanic
fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/
1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt
+jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ
uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr
0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G
6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg=
=C0UU
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, cleanups, fixes
more memslots support in libvhost-user
support PCIe Gen5/Gen6 link speeds in pcie
more traces in vdpa
network simulation devices support in vdpa
SMBIOS type 9 descriptor implementation
Bump max_cpus to 4096 vcpus in q35
aw-bits and granule options in VIRTIO-IOMMU
Support report NUMA nodes for device memory using GI in acpi
Beginning of shutdown event support in pvpanic
fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/
# 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt
# +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ
# uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr
# 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G
# 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg=
# =C0UU
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 22:03:31 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (68 commits)
docs/specs/pvpanic: document shutdown event
hw/cxl: Fix missing reserved data in CXL Device DVSEC
hmat acpi: Fix out of bounds access due to missing use of indirection
hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory.
qemu-options.hx: Document the virtio-iommu-pci aw-bits option
hw/arm/virt: Set virtio-iommu aw-bits default value to 48
hw/i386/q35: Set virtio-iommu aw-bits default value to 39
virtio-iommu: Add an option to define the input range width
virtio-iommu: Trace domain range limits as unsigned int
qemu-options.hx: Document the virtio-iommu-pci granule option
virtio-iommu: Change the default granule to the host page size
virtio-iommu: Add a granule property
hw/i386/acpi-build: Add support for SRAT Generic Initiator structures
hw/acpi: Implement the SRAT GI affinity structure
qom: new object to associate device to NUMA node
hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it
hw/i386/pc: Set "normal" boot device order in pc_basic_device_init()
hw/i386/pc: Avoid one use of the current_machine global
hw/i386/pc: Remove "rtc_state" link again
Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/core/machine.c
Currently the default input range can extend to 64 bits. On x86,
when the virtio-iommu protects vfio devices, the physical iommu
may support only 39 bits. Let's set the default to 39, as done
for the intel-iommu.
We use hw_compat_8_2 to handle the compatibility for machines
before 9.0 which used to have a virtio-iommu default input range
of 64 bits.
Of course if aw-bits is set from the command line, the default
is overriden.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240307134445.92296-8-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
We used to set the default granule to 4KB but with VFIO assignment
it makes more sense to use the actual host page size.
Indeed when hotplugging a VFIO device protected by a virtio-iommu
on a 64kB/64kB host/guest config, we current get a qemu crash:
"vfio: DMA mapping failed, unable to continue"
This is due to the hot-attached VFIO device calling
memory_region_iommu_set_page_size_mask() with 64kB granule
whereas the virtio-iommu granule was already frozen to 4KB on
machine init done.
Set the granule property to "host" and introduce a new compat.
The page size mask used before 9.0 was qemu_target_page_mask().
Since the virtio-iommu currently only supports x86_64 and aarch64,
this matched a 4KB granule.
Note that the new default will prevent 4kB guest on 64kB host
because the granule will be set to 64kB which would be larger
than the guest page size. In that situation, the virtio-iommu
driver fails on viommu_domain_finalise() with
"granule 0x10000 larger than system page size 0x1000".
In that case the workaround is to request 4K granule.
The current limitation of global granule in the virtio-iommu
should be removed and turned into per domain granule. But
until we get this upgraded, this new default is probably
better because I don't think anyone is currently interested in
running a 4KB page size guest with virtio-iommu on a 64KB host.
However supporting 64kB guest on 64kB host with virtio-iommu and
VFIO looks a more important feature.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20240307134445.92296-4-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Prefer fast cpu_env() over slower CPU QOM cast macro
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmXwPhYRHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbWHvBAAgKx5LHFjz3xREVA+LkDTQ49mz0lK3s32
SGvNlIHjiaDGVttVYhVC4sinBWUruG4Lyv/2QN72OJBzn6WUsEUQE3KPH1d7Y3/s
wS9X7mj70n4kugWJqeIJP5AXSRasHmWoQ4QJLVQRJd6+Eb9jqwep0x7bYkI1de6D
bL1Q7bIfkFeNQBXaiPWAm2i+hqmT4C1r8HEAGZIjAsMFrjy/hzBEjNV+pnh6ZSq9
Vp8BsPWRfLU2XHm4WX0o8d89WUMAfUGbVkddEl/XjIHDrUD+Zbd1HAhLyfhsmrnE
jXIwSzm+ML1KX4MoF5ilGtg8Oo0gQDEBy9/xck6G0HCm9lIoLKlgTxK9glr2vdT8
yxZmrM9Hder7F9hKKxmb127xgU6AmL7rYmVqsoQMNAq22D6Xr4UDpgFRXNk2/wO6
zZZBkfZ4H4MpZXbd/KJpXvYH5mQA4IpkOy8LJdE+dbcHX7Szy9ksZdPA+Z10hqqf
zqS13qTs3abxymy2Q/tO3hPKSJCk1+vCGUkN60Wm+9VoLWGoU43qMc7gnY/pCS7m
0rFKtvfwFHhokX1orK0lP/ppVzPv/5oFIeK8YDY9if+N+dU2LCwVZHIuf2/VJPRq
wmgH2vAn3JDoRKPxTGX9ly6AMxuZaeP92qBTOPap0gDhihYzIpaCq9ecEBoTakI7
tdFhV0iRr08=
=NiP4
-----END PGP SIGNATURE-----
Merge tag 'pull-request-2024-03-12' of https://gitlab.com/thuth/qemu into staging
* Add missing ERRP_GUARD() statements in functions that need it
* Prefer fast cpu_env() over slower CPU QOM cast macro
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmXwPhYRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbWHvBAAgKx5LHFjz3xREVA+LkDTQ49mz0lK3s32
# SGvNlIHjiaDGVttVYhVC4sinBWUruG4Lyv/2QN72OJBzn6WUsEUQE3KPH1d7Y3/s
# wS9X7mj70n4kugWJqeIJP5AXSRasHmWoQ4QJLVQRJd6+Eb9jqwep0x7bYkI1de6D
# bL1Q7bIfkFeNQBXaiPWAm2i+hqmT4C1r8HEAGZIjAsMFrjy/hzBEjNV+pnh6ZSq9
# Vp8BsPWRfLU2XHm4WX0o8d89WUMAfUGbVkddEl/XjIHDrUD+Zbd1HAhLyfhsmrnE
# jXIwSzm+ML1KX4MoF5ilGtg8Oo0gQDEBy9/xck6G0HCm9lIoLKlgTxK9glr2vdT8
# yxZmrM9Hder7F9hKKxmb127xgU6AmL7rYmVqsoQMNAq22D6Xr4UDpgFRXNk2/wO6
# zZZBkfZ4H4MpZXbd/KJpXvYH5mQA4IpkOy8LJdE+dbcHX7Szy9ksZdPA+Z10hqqf
# zqS13qTs3abxymy2Q/tO3hPKSJCk1+vCGUkN60Wm+9VoLWGoU43qMc7gnY/pCS7m
# 0rFKtvfwFHhokX1orK0lP/ppVzPv/5oFIeK8YDY9if+N+dU2LCwVZHIuf2/VJPRq
# wmgH2vAn3JDoRKPxTGX9ly6AMxuZaeP92qBTOPap0gDhihYzIpaCq9ecEBoTakI7
# tdFhV0iRr08=
# =NiP4
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 11:35:50 GMT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2024-03-12' of https://gitlab.com/thuth/qemu: (55 commits)
user: Prefer fast cpu_env() over slower CPU QOM cast macro
target/xtensa: Prefer fast cpu_env() over slower CPU QOM cast macro
target/tricore: Prefer fast cpu_env() over slower CPU QOM cast macro
target/sparc: Prefer fast cpu_env() over slower CPU QOM cast macro
target/sh4: Prefer fast cpu_env() over slower CPU QOM cast macro
target/rx: Prefer fast cpu_env() over slower CPU QOM cast macro
target/ppc: Prefer fast cpu_env() over slower CPU QOM cast macro
target/openrisc: Prefer fast cpu_env() over slower CPU QOM cast macro
target/nios2: Prefer fast cpu_env() over slower CPU QOM cast macro
target/mips: Prefer fast cpu_env() over slower CPU QOM cast macro
target/microblaze: Prefer fast cpu_env() over slower CPU QOM cast macro
target/m68k: Prefer fast cpu_env() over slower CPU QOM cast macro
target/loongarch: Prefer fast cpu_env() over slower CPU QOM cast macro
target/i386/hvf: Use CPUState typedef
target/hexagon: Prefer fast cpu_env() over slower CPU QOM cast macro
target/cris: Prefer fast cpu_env() over slower CPU QOM cast macro
target/avr: Prefer fast cpu_env() over slower CPU QOM cast macro
target/alpha: Prefer fast cpu_env() over slower CPU QOM cast macro
target: Replace CPU_GET_CLASS(cpu -> obj) in cpu_reset_hold() handler
bulk: Call in place single use cpu_env()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
moved the MSI-X table and PBA to BAR 0 to make room for enabling CMR and
PMR at the same time. As reported by Julien Grall in #2184, this breaks
migration through system hibernation.
Add a machine compatibility parameter and set it on machines pre 6.0 to
enable the old behavior automatically, restoring the hibernation
migration support.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2184
Fixes: 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
Reported-by: Julien Grall julien@xen.org
Tested-by: Julien Grall julien@xen.org
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Do not accept any Object for CPUArchId::cpu field,
restrict it to CPUState type.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240129164514.73104-3-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
1. Set default "zero-page-detection" option to "multifd". Now
zero page checking can be done in the multifd threads and this
becomes the default configuration.
2. Handle migration QEMU9.0 -> QEMU8.2 compatibility. We provide
backward compatibility where zero page checking is done from the
migration main thread.
Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-7-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
Move the reset of the sysbus (and thus all devices and buses anywhere
on the qbus tree) from qemu_register_reset() to qemu_register_resettable().
This is a behaviour change: because qemu_register_resettable() is
aware of three-phase reset, this now means that:
* 'enter' phase reset methods of devices and buses are called
before any legacy reset callbacks registered with qemu_register_reset()
* 'exit' phase reset methods of devices and buses are called
after any legacy qemu_register_reset() callbacks
Put another way, a qemu_register_reset() callback is now correctly
ordered in the 'hold' phase along with any other 'hold' phase methods.
The motivation for doing this is that we will now be able to resolve
some reset-ordering issues using the three-phase mechanism, because
the 'exit' phase is always after the 'hold' phase, even when the
'hold' phase function was registered with qemu_register_reset().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20240220160622.114437-10-peter.maydell@linaro.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
There are warning messages printed from tests/qtest/numa-test.c,
to complain the CPU cluster and NUMA node boundary is broken. Since
the broken boundary is expected, we don't want to see the warning
messages.
# cd /home/gavin/sandbox/qemu.main/build
# MALLOC_PERTURB_=255 QTEST_QEMU_BINARY=./qemu-system-aarch64 \
G_TEST_DBUS_DAEMON=../tests/dbus-vmstate-daemon.sh \
QTEST_QEMU_IMG=./qemu-img \
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
tests/qtest/numa-test --tap -k
:
qemu-system-aarch64: warning: CPU-0 and CPU-4 in socket-0-cluster-0 \
have been associated with node-0 and node-1 respectively. \
It can cause OSes like Linux to misbehave
:
Skip the invalidation of CPU cluster and NUMA node boundary when
qtest is enabled, to avoid the warning messages.
Fixes: a494fdb715 ("numa: Validate cluster and NUMA node boundary if required")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The names of supported CPU models instead of CPU types should be
printed when the user specified CPU type isn't supported, to be
consistent with the output from '-cpu ?'.
Correct the error messages to print CPU model names instead of CPU
type names.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231204004726.483558-5-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
It's no sense to check the CPU type when mc->valid_cpu_types[0] is
NULL, which is a program error. Raise an assert on this.
A precise hint for the error message is given when mc->valid_cpu_types[0]
is the only valid entry. Besides, enumeration on mc->valid_cpu_types[0]
when we have mutiple valid entries there is avoided to increase the code
readability, as suggested by Philippe Mathieu-Daudé.
Besides, @cc comes from machine->cpu_type or mc->default_cpu_type. For
the later case, it can be NULL and it's also a program error. We should
use assert() in this case.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-ID: <20231204004726.483558-4-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The logic, to check if the specified CPU type is supported in
machine_run_board_init(), is independent enough. Factor it out into
helper is_cpu_type_supported(). machine_run_board_init() looks a bit
clean with this. Since we're here, @machine_class is renamed to @mc to
avoid multiple line spanning of code. The comments are tweaked a bit
either.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231204004726.483558-3-gshan@redhat.com>
[PMD: Only call new helper if machine->cpu_type is not NULL]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. The principle is violated by machine_run_board_init() because
it calls error_report(), error_printf(), and exit(1) when the machine
doesn't support the requested CPU type.
Clean this up by using error_setg() and error_append_hint() instead.
No functional change, as the only caller passes &error_fatal.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231204004726.483558-2-gshan@redhat.com>
[PMD: Correct error_append_hint() argument]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add a helper to return a machine default CPU type.
If this machine is restricted to a single CPU type,
use it as default, obviously.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231116163726.28952-1-philmd@linaro.org>
The Intel 82576EB GbE Controller say that the Physical and Virtual
Functions support Function Level Reset. Add the capability to the PF
device model using device property "x-pcie-flr-init" which is "on" by
default and "off" for machines <= 8.1 to preserve compatibility.
The FLR capability of the VF model is defined according to the FLR
property of the PF, this to avoid adding an extra compatibility
property.
Cc: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
S390 adds two new SMP levels, drawers and books to the CPU
topology.
S390 CPUs have specific topology features like dedication and
entitlement. These indicate to the guest information on host
vCPU scheduling and help the guest make better scheduling decisions.
Add the new levels to the relevant QAPI structs.
Add all the supported topology levels, dedication and entitlement
as properties to S390 CPUs.
Create machine-common.json so we can later include it in
machine-target.json also.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Message-ID: <20231016183925.2384704-3-nsg@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
Turn it off by default on machines <= 8.1 for compatibility reasons.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
[ clg: - checkpatch fixes
- improved warn_report() in vfio_realize() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add a "ramfb-dev" section whenever "x-migrate" is turned on. Turn it off
by default on machines <= 8.1 for compatibility reasons.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
vdpa:
shadow vq vlan support
net migration with cvq
cxl:
support emulating 4 HDM decoders
serial number extended capability
virtio:
hared dma-buf
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmUd4/YPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpyM8H/02cRbJcQOjYt7j68zPW6GaDXxBI/UmdWDyG
15LZZbGNOPjyjNd3Vz1M7stQ5rhoKcgo/RdI+0E60a78svgW5JvpXoXR3pksc3Dx
v28B/akXwHUErYFSZQ+2VHNc8OhCd0v2ehxZxbwPEAYIOAj3hcCIVoPGXTnKJmAJ
imr5hjH0wZUc0+xdsmn8Vfdv5NTzpwfVObbGiMZejeJsaoh0y6Rt8RANBMY67KQD
S7/HPlVuDYf/y43t4ZEHNYuV9RaCdZZYlLWwV1scdKaYcofgmtJOKbOdCjHRXgj+
004Afb3rggIoCfnCzOFzhGx+MLDtLjvEn2N4oLEWCLi+k/3huaA=
=GAvH
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci: features, cleanups
vdpa:
shadow vq vlan support
net migration with cvq
cxl:
support emulating 4 HDM decoders
serial number extended capability
virtio:
hared dma-buf
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
libvhost-user: handle shared_object msg
vhost-user: add shared_object msg
hw/display: introduce virtio-dmabuf
util/uuid: add a hash function
virtio: remove unused next argument from virtqueue_split_read_next_desc()
virtio: remove unnecessary thread fence while reading next descriptor
virtio: use shadow_avail_idx while checking number of heads
libvhost-user.c: add assertion to vu_message_read_default
pcie_sriov: unregister_vfs(): fix error path
hw/i386/pc: improve physical address space bound check for 32-bit x86 systems
amd_iommu: Fix APIC address check
vdpa net: follow VirtIO initialization properly at cvq isolation probing
vdpa net: stop probing if cannot set features
vdpa net: fix error message setting virtio status
hw/pci-bridge/cxl-upstream: Add serial number extended capability support
hw/cxl: Support 4 HDM decoders at all levels of topology
hw/cxl: Fix and use same calculation for HDM decoder block size everywhere
hw/cxl: Add utility functions decoder interleave ways and target count.
hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c
vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
...
Conflicts:
hw/core/machine.c
Context conflict with commit 314e0a84cd ("hw/core: remove needless
includes") because it removed an adjacent #include.
current code sets PCI_SEC_LATENCY_TIMER to RW, but for
pcie to pcie bridges it must be RO 0 according to
pci express spec which says:
This register does not apply to PCI Express. It must be read-only
and hardwired to 00h. For PCI Express to PCI/PCI-X Bridges, refer to the
[PCIe-to-PCI-PCI-X-Bridge] for requirements for this register.
also, fix typo in comment where it's made writeable - this typo
is likely what prevented us noticing we violate this requirement
in the 1st place.
Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Message-Id: <de9d05366a70172e1789d10591dbe59e39c3849c.1693432039.git.mst@redhat.com>
Tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* fix for KVM on Apple M2
* introduce machine property "audiodev"
* ui/vnc: Require audiodev= to enable audio
* audio: remove QEMU_AUDIO_* and -audio-help support
* audio: forbid using default audiodev backend with -audiodev and -nodefaults
* remove compatibility code for old machine types
* make-release: do not ship dtc sources
* build system cleanups
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmUb0QgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOpnAf9EFXfGkXpqQ5Q8ZbVlVc5GQKofMHW
OZwamTBlp/c07+QcQiMxwLhIW0iyDhrfdCjoFSUaTA8O10FM1YrFv4SkUryYb9B3
bmoTl4NeLvmkxpC47GEeaaBfjyM0G/9Ip9Zsuqx3u+gSzwTbkEstA2u7gcsN0tL9
VlhMSiV82uHhRC/DJYLxr+8bRYSIm1AeuI8K/O1yags85Kztf3UiQUhePIKLznMH
BdORjD+i46xM1dE8ifpdsunm462cDWz/faAnIH0YVKBlshnQHXKTO+GDA/Fbfl51
wFfupZXo93wwgawS7elAUzI+gwaKCPRHA8NDcukeO91hTzk6i14y04u5SQ==
=nv64
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* fix from optionrom build
* fix for KVM on Apple M2
* introduce machine property "audiodev"
* ui/vnc: Require audiodev= to enable audio
* audio: remove QEMU_AUDIO_* and -audio-help support
* audio: forbid using default audiodev backend with -audiodev and -nodefaults
* remove compatibility code for old machine types
* make-release: do not ship dtc sources
* build system cleanups
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmUb0QgUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroOpnAf9EFXfGkXpqQ5Q8ZbVlVc5GQKofMHW
# OZwamTBlp/c07+QcQiMxwLhIW0iyDhrfdCjoFSUaTA8O10FM1YrFv4SkUryYb9B3
# bmoTl4NeLvmkxpC47GEeaaBfjyM0G/9Ip9Zsuqx3u+gSzwTbkEstA2u7gcsN0tL9
# VlhMSiV82uHhRC/DJYLxr+8bRYSIm1AeuI8K/O1yags85Kztf3UiQUhePIKLznMH
# BdORjD+i46xM1dE8ifpdsunm462cDWz/faAnIH0YVKBlshnQHXKTO+GDA/Fbfl51
# wFfupZXo93wwgawS7elAUzI+gwaKCPRHA8NDcukeO91hTzk6i14y04u5SQ==
# =nv64
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 03 Oct 2023 04:30:00 EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (24 commits)
audio: forbid default audiodev backend with -nodefaults
audio: propagate Error * out of audio_init
vt82c686 machines: Support machine-default audiodev with fallback
hw/ppc: Support machine-default audiodev with fallback
hw/arm: Support machine-default audiodev with fallback
Introduce machine property "audiodev"
audio: remove QEMU_AUDIO_* and -audio-help support
audio: simplify flow in audio_init
audio: commonize voice initialization
audio: return Error ** from audio_state_by_name
audio: allow returning an error from the driver init
audio: Require AudioState in AUD_add_capture
ui/vnc: Require audiodev= to enable audio
crypto: only include tls-cipher-suites in emulators
scsi-disk: ensure that FORMAT UNIT commands are terminated
esp: restrict non-DMA transfer length to that of available data
esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()
Makefile: build plugins before running TCG tests
meson: clean up static_library keyword arguments
make-release: do not ship dtc sources
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Many machine types have default audio devices with no way to set the underlying
audiodev. Instead of adding an option for each and every one of them, this new
property can be used as a default during machine initialisation when creating
such devices.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
[Make the property optional, instead of including it in all machines. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Support and document VM templating with R/O files using a new "rom"
parameter for memory-backend-file
- Some cleanups and fixes around NVDIMMs and R/O file handling for guest
RAM
- Optimize ioeventfd updates by skipping address spaces that are not
applicable
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUJdykRHGRhdmlkQHJl
ZGhhdC5jb20ACgkQTd4Q9wD/g1pf2w//akOUoYMuamySGjXtKLVyMKZkjIys+Ama
k2C0xzsWAHBP572ezwHi8uxf5j9kzAjsw6GxDZ7FAamD9MhiohkEvkecloBx6f/c
q3fVHblBNkG7v2urtf4+6PJtJvhzOST2SFXfWeYhO/vaA04AYCDgexv82JN3gA6B
OS8WyOX62b8wILPSY2GLZ8IqpE9XnOYZwzVBn6YB1yo7ZkYEfXO6cA8nykNuNcOE
vppqDo7uVIX6317FWj8ygxmzFfOaj0WT2MT2XFzEIDfg8BInQN8HC4mTn0hcVKMa
N1y+eZH733CQKT+uNBRZ5YOeljOi4d6gEEyvkkA/L7e5D3Qg9hIdvHb4uryCFSWX
Vt07OP1XLBwCZFobOC6sg+2gtTZJxxYK89e6ZzEd0454S24w5bnEteRAaCGOP0XL
ww9xYULqhtZs55UC4rvZHJwdUAk1fIY4VqynwkeQXegvz6BxedNeEkJiiEU0Tizx
N2VpsxAJ7H/LLSFeZoCRESo4azrH6U4n7S/eS1tkCniFqibfe2yIQCDoJVfb42ec
gfg/vThCrDwHkIHzkMmoV8NndA7Q7SIkyMfYeEEBeZMeg8JzYll4DJEw/jQCacxh
KRUa+AZvGlTJUq0mkvyOVfLki+iaehoIUuY1yvMrmdWijPO8n3YybmP9Ljhr8VdR
9MSYZe+I2v8=
=iraT
-----END PGP SIGNATURE-----
Merge tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu into staging
Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Support and document VM templating with R/O files using a new "rom"
parameter for memory-backend-file
- Some cleanups and fixes around NVDIMMs and R/O file handling for guest
RAM
- Optimize ioeventfd updates by skipping address spaces that are not
applicable
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUJdykRHGRhdmlkQHJl
# ZGhhdC5jb20ACgkQTd4Q9wD/g1pf2w//akOUoYMuamySGjXtKLVyMKZkjIys+Ama
# k2C0xzsWAHBP572ezwHi8uxf5j9kzAjsw6GxDZ7FAamD9MhiohkEvkecloBx6f/c
# q3fVHblBNkG7v2urtf4+6PJtJvhzOST2SFXfWeYhO/vaA04AYCDgexv82JN3gA6B
# OS8WyOX62b8wILPSY2GLZ8IqpE9XnOYZwzVBn6YB1yo7ZkYEfXO6cA8nykNuNcOE
# vppqDo7uVIX6317FWj8ygxmzFfOaj0WT2MT2XFzEIDfg8BInQN8HC4mTn0hcVKMa
# N1y+eZH733CQKT+uNBRZ5YOeljOi4d6gEEyvkkA/L7e5D3Qg9hIdvHb4uryCFSWX
# Vt07OP1XLBwCZFobOC6sg+2gtTZJxxYK89e6ZzEd0454S24w5bnEteRAaCGOP0XL
# ww9xYULqhtZs55UC4rvZHJwdUAk1fIY4VqynwkeQXegvz6BxedNeEkJiiEU0Tizx
# N2VpsxAJ7H/LLSFeZoCRESo4azrH6U4n7S/eS1tkCniFqibfe2yIQCDoJVfb42ec
# gfg/vThCrDwHkIHzkMmoV8NndA7Q7SIkyMfYeEEBeZMeg8JzYll4DJEw/jQCacxh
# KRUa+AZvGlTJUq0mkvyOVfLki+iaehoIUuY1yvMrmdWijPO8n3YybmP9Ljhr8VdR
# 9MSYZe+I2v8=
# =iraT
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Sep 2023 06:25:45 EDT
# gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
# gpg: issuer "david@redhat.com"
# gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
# gpg: aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
# gpg: aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A
* tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu:
memory: avoid updating ioeventfds for some address_space
machine: Improve error message when using default RAM backend id
softmmu/physmem: Hint that "readonly=on,rom=off" exists when opening file R/W for private mapping fails
docs: Start documenting VM templating
docs: Don't mention "-mem-path" in multi-process.rst
softmmu/physmem: Never return directories from file_ram_open()
softmmu/physmem: Fail creation of new files in file_ram_open() with readonly=true
softmmu/physmem: Bail out early in ram_block_discard_range() with readonly files
softmmu/physmem: Remap with proper protection in qemu_ram_remap()
backends/hostmem-file: Add "rom" property to support VM templating with R/O files
softmmu/physmem: Distinguish between file access mode and mmap protection
nvdimm: Reject writing label data to ROM instead of crashing QEMU
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For migration purposes, users might want to reuse the default RAM
backend id, but specify a different memory backend.
For example, to reuse "pc.ram" on q35, one has to set
-machine q35,memory-backend=pc.ram
Only then, can a memory backend with the id "pc.ram" be created
manually.
Let's improve the error message by improving the hint. Use
error_append_hint() -- which in turn requires ERRP_GUARD().
Message-ID: <20230906120503.359863-12-david@redhat.com>
Suggested-by: ThinerLogoer <logoerthiner1@163.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
USO features of virtio-net device depend on kernel ability
to support them, for backward compatibility by default the
features are disabled on 8.0 and earlier.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The current implementers of ARI are all SR-IOV devices. The ARI next
function number field is undefined for VF according to PCI Express Base
Specification Revision 5.0 Version 1.0 section 9.3.7.7. The PF still
requires some defined value so end the linked list formed with the field
by specifying 0 as required for any ARI implementation according to
section 7.8.7.2.
For migration, the field will keep having 1 as its value on the old
QEMU machine versions.
Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt")
Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230710153838.33917-3-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For some architectures like ARM64, multiple CPUs in one cluster can be
associated with different NUMA nodes, which is irregular configuration
because we shouldn't have this in baremetal environment. The irregular
configuration causes Linux guest to misbehave, as the following warning
messages indicate.
-smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
-numa node,nodeid=0,cpus=0-1,memdev=ram0 \
-numa node,nodeid=1,cpus=2-3,memdev=ram1 \
-numa node,nodeid=2,cpus=4-5,memdev=ram2 \
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : build_sched_domains+0x284/0x910
lr : build_sched_domains+0x184/0x910
sp : ffff80000804bd50
x29: ffff80000804bd50 x28: 0000000000000002 x27: 0000000000000000
x26: ffff800009cf9a80 x25: 0000000000000000 x24: ffff800009cbf840
x23: ffff000080325000 x22: ffff0000005df800 x21: ffff80000a4ce508
x20: 0000000000000000 x19: ffff000080324440 x18: 0000000000000014
x17: 00000000388925c0 x16: 000000005386a066 x15: 000000009c10cc2e
x14: 00000000000001c0 x13: 0000000000000001 x12: ffff00007fffb1a0
x11: ffff00007fffb180 x10: ffff80000a4ce508 x9 : 0000000000000041
x8 : ffff80000a4ce500 x7 : ffff80000a4cf920 x6 : 0000000000000001
x5 : 0000000000000001 x4 : 0000000000000007 x3 : 0000000000000002
x2 : 0000000000001000 x1 : ffff80000a4cf928 x0 : 0000000000000001
Call trace:
build_sched_domains+0x284/0x910
sched_init_domains+0xac/0xe0
sched_init_smp+0x48/0xc8
kernel_init_freeable+0x140/0x1ac
kernel_init+0x28/0x140
ret_from_fork+0x10/0x20
Improve the situation to warn when multiple CPUs in one cluster have
been associated with different NUMA nodes. However, one NUMA node is
allowed to be associated with different clusters.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230509002739.18388-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU aborts when default RAM backend should be used (i.e. no
explicit '-machine memory-backend=' specified) but user
has created an object which 'id' equals to default RAM backend
name used by board.
$QEMU -machine pc \
-object memory-backend-ram,id=pc.ram,size=4294967296
Actual results:
QEMU 7.2.0 monitor - type 'help' for more information
(qemu) Unexpected error in object_property_try_add() at ../qom/object.c:1239:
qemu-kvm: attempt to add duplicate property 'pc.ram' to object (type 'container')
Aborted (core dumped)
Instead of abort, check for the conflicting 'id' and exit with
an error, suggesting how to remedy the issue.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2207886
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230522131717.3780533-1-imammedo@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Since it's implementation on v8.0.0-rc0, having the PCI_ERR_UNCOR_MASK
set for machine types < 8.0 will cause migration to fail if the target
QEMU version is < 8.0.0 :
qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load e1000e:parent_obj
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:02.0/e1000e'
qemu-system-x86_64: load of migration failed: Invalid argument
The above test migrated a 7.2 machine type from QEMU master to QEMU 7.2.0,
with this cmdline:
./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
In order to fix this, property x-pcie-err-unc-mask was introduced to
control when PCI_ERR_UNCOR_MASK is enabled. This property is enabled by
default, but is disabled if machine type <= 7.2.
Fixes: 010746ae1d ("hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20230503002701.854329-1-leobras@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1576
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We used to flush all channels at the end of each RAM section
sent. That is not needed, so preparing to only flush after a full
iteration through all the RAM.
Default value of the property is false. But we return "true" in
migrate_multifd_flush_after_each_section() until we implement the code
in following patches.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Rename each-iteration to after-each-section
Rename multifd-sync-after-each-section to
multifd-flush-after-each-section
Move to machine-8.0 (peter)
postcopy_qemufile_src object should be owned by one thread, either the main
thread (e.g. when at the beginning, or at the end of migration), or by the
return path thread (when during a preempt enabled postcopy migration). If
that's not the case the access to the object might be racy.
postcopy_preempt_shutdown_file() can be potentially racy, because it's
called at the end phase of migration on the main thread, however during
which the return path thread hasn't yet been recycled; the recycle happens
in await_return_path_close_on_source() which is after this point.
It means, logically it's posslbe the main thread and the return path thread
are both operating on the same qemufile. While I don't think qemufile is
thread safe at all.
postcopy_preempt_shutdown_file() used to be needed because that's where we
send EOS to dest so that dest can safely shutdown the preempt thread.
To avoid the possible race, remove this only place that a race can happen.
Instead we figure out another way to safely close the preempt thread on
dest.
The core idea during postcopy on deciding "when to stop" is that dest will
send a postcopy SHUT message to src, telling src that all data is there.
Hence to shut the dest preempt thread maybe better to do it directly on
dest node.
This patch proposed such a way that we change postcopy_prio_thread_created
into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
by a sequence of:
mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
qemu_file_shutdown(mis->postcopy_qemufile_dst);
While here shutdown() is probably so far the easiest way to kick preempt
thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status
to make sure it's not a network failure but a willingness to quit the
thread.
We could have avoided that extra status but just rely on migration status.
The problem is postcopy_ram_incoming_cleanup() is just called early enough
so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
simple to have the status introduced.
One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
postcopy preempt.
Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The system clock is necessary to implement PTP features. While we are
not implementing PTP features for e1000e yet, we do have a plan to
implement them for igb, a new network device derived from e1000e,
so add system clock to the common base first.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In bad9c5a516 ("virtio-rng-pci: fix migration compat for vectors") I
fixed the virtio-rng-pci migration compatibility, but it was discovered
that we also need to fix the other aliases of the device for the
transitional cases.
Fixes: 9ea02e8f1 ('virtio-rng-pci: Allow setting nvectors, so we can use MSI-X')
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2162569
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20230207174944.138255-1-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Tracked down with the help of scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.
Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.
In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.
Use a new device property to keep behavior of compat machines
unmodified.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Fixup the migration compatibility for existing machine types
so that they do not enable msi-x.
Symptom:
(qemu) qemu: get_pci_config_device: Bad config data: i=0x34 read: 84 device: 98 cmask: ff wmask: 0 w1cmask:0
qemu: Failed to load PCIDevice:config
qemu: Failed to load virtio-rng:virtio
qemu: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-rng'
qemu: load of migration failed: Invalid argument
Note: This fix will break migration from 7.2->7.2-fixed with this patch
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2155749
Fixes: 9ea02e8f1 ("virtio-rng-pci: Allow setting nvectors, so we can use MSI-X")
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20230109105809.163975-1-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Daney <david.daney@fungible.com>
Fixes: 9ea02e8f1 ("virtio-rng-pci: Allow setting nvectors, so we can use MSI-X")<br>
Signed-off-by: Dr. David Alan Gilbert <<a href="mailto:dgilbert@redhat.com" target="_blank">dgilbert@redhat.com</a>><br>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Move "qemu/accel.h" include from the heavily included
"hw/boards.h" to hw/core/machine.c, the single file using
the AccelState definition.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20221130135641.85328-3-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add 8.0 machine types for arm/i440fx/m68k/q35/s390x/spapr.
Reviewed-by: Cédric Le Goater <clg@kaod.org> [ppc]
Reviewed-by: Thomas Huth <thuth@redhat.com> [s390x]
Reviewed-by: Greg Kurz <groug@kaod.org> [ppc]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221212152145.124317-2-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmOZ6lYSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT6VEQAKynjWh3AIZ4/qOgrVqsP0oRspevLmfH
BbuGoldjYpEE7RbwuCaZalZ7iy7TcSySxnPfUDVsFHd7NWffJVjwKHifGC0D/Ez0
+Ggyb1CBebN+mS7t+BNFUHdMM+wxFIlHwg4f4aTFbn2o0HKgj2a8tcNzNRonZbfa
xURnvbD4G4u0VZEc3Jak+x193xbOJFsuuWq0BZnDuNk+XqjyW2RwfpXLPJVk+82a
4uy/YgYuqXUqBeULwcJj+shBL4SXR9GyajTFMS64przSUle0ADUmXkPtaS2agV7e
Pym/UQuAcxvNyw34fJsiMZxx6rZI9YU30jQUMRLoYcPRR/Q/aiPeiiHtiD6Kaid7
IfOeH/EArXaQRFpD89xj4YcaTnRLQOEj0NXgXvAbQf6eD8JYyao/S/0lCsPZEoA2
nibLqEQ25ncDNXoSomuwtfjVff3w68lODFbhwqfA0gf3cPtCgVZ6xQ8P/McNY6K6
wqFHXMWTDHk1LOCTucjYz1z2TGzTnSG4iWi5Yt6FSxAc958AO+v5ALn/1pcYun+E
azM/MF0AInKj2aJCT530zT0tpCs/Jo07YKC8k6ubi77S0ZdmGS1XLeXkRXfk1+yI
OhuUgiVlSTHxD69DagT2vbnx1mDMM9X+OBIMvEi5nwvD9A/ghaCgkDeGFvbA1ud0
t0mxPBZJ+tiZ
=JJjG
-----END PGP SIGNATURE-----
Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into staging
Miscellaneous patches for 2022-12-14
# gpg: Signature made Wed 14 Dec 2022 15:23:02 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru:
ppc4xx_sdram: Simplify sdram_ddr_size() to return
block/vmdk: Simplify vmdk_co_create() to return directly
cleanup: Tweak and re-run return_directly.cocci
io: Tidy up fat-fingered parameter name
qapi: Use returned bool to check for failure (again)
sockets: Use ERRP_GUARD() where obviously appropriate
qemu-config: Use ERRP_GUARD() where obviously appropriate
qemu-config: Make config_parse_qdict() return bool
monitor: Use ERRP_GUARD() in monitor_init()
monitor: Simplify monitor_fd_param()'s error handling
error: Move ERRP_GUARD() to the beginning of the function
error: Drop a few superfluous ERRP_GUARD()
error: Drop some obviously superfluous error_propagate()
Drop more useless casts from void * to pointer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/machine*.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221104160712.3005652-16-armbru@redhat.com>
include/qapi/error.h advises to put ERRP_GUARD() right at the
beginning of the function, because only then can it guard the whole
function. Clean up the few spots disregarding the advice.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
A a new command line parameter "queue_reset" is added.
Meanwhile, the vq reset feature is disabled for pre-7.2 machines.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-5-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This removes the last of the CXL code from the MachineState where it
is visible to all Machines to only those that support CXL (currently i386/pc)
As i386/pc always support CXL now, stop allocating the state independently.
Note the pxb register hookup code runs even if cxl=off in order to detect
pxb_cxl host bridges and fail to start if any are present as they won't
have the control registers available.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-8-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Paolo Bonzini requested this change to simplify the ongoing
effort to allow machine setup entirely via RPC.
Includes shortening the command line form cxl-fixed-memory-window
to cxl-fmw as the command lines are extremely long even with this
change.
The json change is needed to ensure that there is
a CXLFixedMemoryWindowOptionsList even though the actual
element in the json is never used. Similar to existing
SgxEpcProperties.
Update qemu-options.hx to reflect that this is now a -machine
parameter. The bulk of -M / -machine parameters are documented
under machine, so use that in preference to M.
Update cxl-test and bios-tables-test to reflect new parameters.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Message-Id: <20220608145440.26106-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We cannot provide auto-generated unique or persistent namespace
identifiers (EUI64, NGUID, UUID) easily. Since 6.1, namespaces have been
assigned a generated EUI64 of the form "52:54:00:<namespace counter>".
This is will be unique within a QEMU instance, but not globally.
Revert that this is assigned automatically and immediately deprecate the
compatibility parameter. Users can opt-in to this with the
`eui64-default=on` device parameter or set it explicitly with
`eui64=UINT64`.
Cc: libvir-list@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Make the GICv3 set its number of bits of physical priority from the
implementation-specific value provided in the CPU state struct, in
the same way we already do for virtual priority bits. Because this
would be a migration compatibility break, we provide a property
force-8-bit-prio which is enabled for 7.0 and earlier versioned board
models to retain the legacy "always use 8 bits" behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220512151457.3899052-6-peter.maydell@linaro.org
Message-id: 20220506162129.2896966-5-peter.maydell@linaro.org
There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-14-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows setting memory properties without going through
vl.c, and have them validated just the same.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Handle HostMemoryBackend creation and setting of ms->ram entirely in
machine_run_board_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-5-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make -m syntactic sugar for a compound property "-machine
mem.{size,max-size,slots}". The new property does not have
the magic conversion to megabytes of unsuffixed arguments,
and also does not understand that "0" means the default size
(you have to leave it out to get the default). This means
that we need to convert the QemuOpts by hand to a QDict.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make -boot syntactic sugar for a compound property "-machine boot.{order,menu,...}".
machine_boot_parse is replaced by the setter for the property.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As part of converting -boot to a property with a QAPI type, define
the struct and use it throughout QEMU to access boot configuration.
machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by
hand, for now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds cluster-id in CPU instance properties, which will be used
by arm/virt machine. Besides, the cluster-id is also verified or
dumped in various spots:
* hw/core/machine.c::machine_set_cpu_numa_node() to associate
CPU with its NUMA node.
* hw/core/machine.c::machine_numa_finish_cpu_init() to record
CPU slots with no NUMA mapping set.
* hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump
cluster-id.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20220503140304.855514-2-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vmstate_acpi_pcihp_use_acpi_index() was expecting AcpiPciHpState
as state but it actually received PIIX4PMState, because
VMSTATE_PCI_HOTPLUG is a macro and not another struct.
So it ended up accessing random pointer, which resulted
in 'false' return value and acpi_index field wasn't ever
sent.
However in 7.0 that pointer de-references to value > 0, and
destination QEMU starts to expect the field which isn't
sent in migratioon stream from older QEMU (6.2 and older).
As result migration fails with:
qemu-system-x86_64: Missing section footer for 0000:00:01.3/piix4_pm
qemu-system-x86_64: load of migration failed: Invalid argument
In addition with QEMU-6.2, destination due to not expected
state, also never expects the acpi_index field in migration
stream.
Q35 is not affected as it always sends/expects the field as
long as acpi based PCI hotplug is enabled.
Fix issue by introducing compat knob to never send/expect
acpi_index in migration stream for 6.2 and older PC machine
types and always send it for 7.0 and newer PC machine types.
Diagnosed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: b32bd76 ("pci: introduce acpi-index property for PCI device")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/932
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
memory_region_is_mapped() is the wrong check, we actually want to check
whether the backend is already marked mapped.
For example, memory regions mapped via an alias, such as NVDIMMs,
currently don't make memory_region_is_mapped() return "true". As the
machine is initialized before any memory devices (and thereby before
NVDIMMs are initialized), this isn't a fix but merely a cleanup.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211102164317.45658-2-david@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211217143948.289995-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The new Cluster-Aware Scheduling support has landed in Linux 5.16,
which has been proved to benefit the scheduling performance (e.g.
load balance and wake_affine strategy) on both x86_64 and AArch64.
So now in Linux 5.16 we have four-level arch-neutral CPU topology
definition like below and a new scheduler level for clusters.
struct cpu_topology {
int thread_id;
int core_id;
int cluster_id;
int package_id;
int llc_id;
cpumask_t thread_sibling;
cpumask_t core_sibling;
cpumask_t cluster_sibling;
cpumask_t llc_sibling;
}
A cluster generally means a group of CPU cores which share L2 cache
or other mid-level resources, and it is the shared resources that
is used to improve scheduler's behavior. From the point of view of
the size range, it's between CPU die and CPU core. For example, on
some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
and 4 CPU cores in each cluster. The 4 CPU cores share a separate
L2 cache and a L3 cache tag, which brings cache affinity advantage.
In virtualization, on the Hosts which have pClusters (physical
clusters), if we can design a vCPU topology with cluster level for
guest kernel and have a dedicated vCPU pinning. A Cluster-Aware
Guest kernel can also make use of the cache affinity of CPU clusters
to gain similar scheduling performance.
This patch adds infrastructure for CPU cluster level topology
configuration and parsing, so that the user can specify cluster
parameter if their machines support it.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[PMD: Added '(since 7.0)' to @clusters in qapi/machine.json]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
All methods related to MachineState are prefixed with "machine_".
smp_parse() does not need to be an exception. Rename it and
const'ify the SMPConfiguration argument, since it doesn't need
to be modified.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211216132015.815493-9-philmd@redhat.com>
Change namespaces to be shared namespaces by default (parameter
shared=on). Keep shared=off for older machine types.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
hw_compat_5_2 has an issue: it affects only "virtio-net-pci"
but not "virtio-net-pci-transitional" and
"virtio-net-pci-non-transitional". The solution is to use the
"virtio-net-pci-base" type in compat_props.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999141
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Now that we check sysbus device types during device creation, we
can remove the check in the machine init done notifier.
This was the only thing done by this notifier, so we remove the
whole sysbus_notifier structure of the MachineState.
Note: This notifier was checking all /peripheral and /peripheral-anon
sysbus devices. Now we only check those added by -device cli option or
device_add qmp command when handling the command/option. So if there
are some devices added in one of these containers manually (eg in
machine C code), these will not be checked anymore.
This use case does not seem to appear apart from
hw/xen/xen-legacy-backend.c (it uses qdev_set_id() and in this case,
not for a sysbus device, so it's ok).
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20211029142258.484907-4-damien.hedde@greensocs.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Right now the allowance check for adding a sysbus device using
-device cli option (or device_add qmp command) is done well after
the device has been created. It is done during the machine init done
notifier: machine_init_notify() in hw/core/machine.c
This new function will allow us to do the check at the right time and
issue an error if it fails.
Also make device_is_dynamic_sysbus() use the new function.
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20211029142258.484907-2-damien.hedde@greensocs.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
We are going to introduce an unit test for the parser smp_parse()
in hw/core/machine.c, but now machine.c is only built in softmmu.
In order to solve the build dependency on the smp parsing code and
avoid building unrelated stuff for the unit tests, move the tested
code from machine.c into a separate file, i.e., machine-smp.c and
build it in common field.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211026034659.22040-2-wangyanan55@huawei.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The expected output string from cpu_slot_to_string() ought to be
like "socket-id: *, die-id: *, core-id: *, thread-id: *", so add
the missing ", " before "die-id". This affects the readability
of the error message.
Fixes: 176d2cda0d ("i386/cpu: Consolidate die-id validity in smp context")
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20211008075040.18028-1-wangyanan55@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
virtio-vsock features, like VIRTIO_VSOCK_F_SEQPACKET, can be handled
by vhost-vsock-common parent class. In this way, we can reuse the
same code for all virtio-vsock backends (i.e. vhost-vsock,
vhost-user-vsock).
Let's move `seqpacket` property to vhost-vsock-common class, add
vhost_vsock_common_get_features() used by children, and disable
`seqpacket` for vhost-user-vsock device for machine types < 6.2.
The behavior of vhost-vsock device doesn't change; vhost-user-vsock
device now supports `seqpacket` property.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210921161642.206461-3-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
enabled the SEQPACKET feature bit.
This commit is released with QEMU 6.1, so if we try to migrate a VM where
the host kernel supports SEQPACKET but machine type version is less than
6.1, we get the following errors:
Features 0x130000002 unsupported. Allowed features: 0x179000000
Failed to load virtio-vhost_vsock:virtio
error while loading state for instance 0x0 of device '0000:00:05.0/virtio-vhost_vsock'
load of migration failed: Operation not permitted
Let's disable the feature bit for machine types < 6.1.
We add a new OnOffAuto property for this, called `seqpacket`.
When it is `auto` (default), QEMU behaves as before, trying to enable the
feature, when it is `on` QEMU will fail if the backend (vhost-vsock
kernel module) doesn't support it.
Fixes: 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
Cc: qemu-stable@nongnu.org
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210921161642.206461-2-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Put both sanity-check of the input SMP configuration and sanity-check
of the output SMP configuration uniformly in the generic parser. Then
machine_set_smp() will become cleaner, also all the invalid scenarios
can be tested only by calling the parser.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-16-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.
No functional change intended.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we have a generic smp parser for all arches, and there will
not be any other arch specific ones, so let's remove the callback
from MachineClass and call the parser directly.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-14-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently the only difference between smp_parse and pc_smp_parse
is the support of dies parameter and the related error reporting.
With some arch compat variables like "bool dies_supported", we can
make smp_parse generic enough for all arches and the PC specific
one can be removed.
Making smp_parse() generic enough can reduce code duplication and
ease the code maintenance, and also allows extending the topology
with more arch specific members (e.g., clusters) in the future.
Suggested-by: Andrew Jones <drjones@redhat.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-13-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all the possible topology parameters are integrated in struct
CpuTopology, tweak the order of topology members to be "cpus/sockets/
dies/cores/threads/maxcpus" for readability and consistency. We also
tweak the comment by adding explanation of dies parameter.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210929025816.21076-12-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the sanity-check of smp_cpus and max_cpus against mc in function
machine_set_smp(), we are now using ms->smp.max_cpus for the check
but using current_machine->smp.max_cpus in the error message.
Tweak this by uniformly using the local ms.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210929025816.21076-11-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.
Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.
In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have two requirements for a valid SMP configuration:
the product of "sockets * cores * threads" must represent all the
possible cpus, i.e., max_cpus, and then must include the initially
present cpus, i.e., smp_cpus.
So we only need to ensure 1) "sockets * cores * threads == maxcpus"
at first and then ensure 2) "maxcpus >= cpus". With a reasonable
order of the sanity check, we can simplify the error reporting code.
When reporting an error message we also report the exact value of
each topology member to make users easily see what's going on.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210929025816.21076-7-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently we directly calculate the omitted cpus based on the given
incomplete collection of parameters. This makes some cmdlines like:
-smp maxcpus=16
-smp sockets=2,maxcpus=16
-smp sockets=2,dies=2,maxcpus=16
-smp sockets=2,cores=4,maxcpus=16
not work. We should probably set the value of cpus to match maxcpus
if it's omitted, which will make above configs start to work.
So the calculation logic of cpus/maxcpus after this patch will be:
When both maxcpus and cpus are omitted, maxcpus will be calculated
from the given parameters and cpus will be set equal to maxcpus.
When only one of maxcpus and cpus is given then the omitted one
will be set to its counterpart's value. Both maxcpus and cpus may
be specified, but maxcpus must be equal to or greater than cpus.
Note: change in this patch won't affect any existing working cmdlines
but allows more incomplete configs to be valid.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-6-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are currently using maxcpus to calculate the omitted sockets
but using cpus to calculate the omitted cores/threads. This makes
cmdlines like:
-smp cpus=8,maxcpus=16
-smp cpus=8,cores=4,maxcpus=16
-smp cpus=8,threads=2,maxcpus=16
work fine but the ones like:
-smp cpus=8,sockets=2,maxcpus=16
-smp cpus=8,sockets=2,cores=4,maxcpus=16
-smp cpus=8,sockets=2,threads=2,maxcpus=16
break the sanity check.
Since we require for a valid config that the product of "sockets * cores
* threads" should equal to the maxcpus, we should uniformly use maxcpus
to calculate their omitted values.
Also the if-branch of "cpus == 0 || sockets == 0" was split into two
branches of "cpus == 0" and "sockets == 0" so that we can clearly read
that we are parsing the configuration with a preference on cpus over
sockets over cores over threads.
Note: change in this patch won't affect any existing working cmdlines
but improves consistency and allows more incomplete configs to be valid.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-5-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To pave the way for the functional improvement in later patches,
make some refactor/cleanup for the smp parsers, including using
local maxcpus instead of ms->smp.max_cpus in the calculation,
defaulting dies to 0 initially like other members, cleanup the
sanity check for dies.
We actually also fix a hidden defect by avoiding directly using
the provided *zero value* in the calculation, which could cause
a segment fault (e.g. using dies=0 in the calculation).
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-4-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the SMP configuration, we should either provide a topology
parameter with a reasonable value (greater than zero) or just
omit it and QEMU will compute the missing value.
The users shouldn't provide a configuration with any parameter
of it specified as zero (e.g. -smp 8,sockets=0) which could
possibly cause unexpected results in the -smp parsing. So we
deprecate this kind of configurations since 6.2 by adding the
explicit sanity check.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-3-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add 6.2 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
machine_set_smp() mistakenly checks 'errp' not '*errp',
and so thinks there is an error every single time it runs.
This causes it to jump to the end of the method, skipping
the max CPUs checks. The caller meanwhile sees no error
and so carries on execution. The result of all this is:
$ qemu-system-x86_64 -smp -1
qemu-system-x86_64: GLib: ../glib/gmem.c:142: failed to allocate 481036337048 bytes
instead of
$ qemu-system-x86_64 -smp -1
qemu-system-x86_64: Invalid SMP CPUs -1. The max CPUs supported by machine 'pc-i440fx-6.1' is 255
This is a regression from
commit fe68090e8f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu May 13 09:03:48 2021 -0400
machine: add smp compound property
Closes: https://gitlab.com/qemu-project/qemu/-/issues/524
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210812175353.4128471-1-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If dies is not supported by this machine's CPU topology, don't
keep processing options and return directly.
Fixes: 0aebebb561 ("machine: reject -smp dies!=1 for non-PC machines")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210813112608.1452541-2-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The initial value of VLAN Ether Type (VET) register is 0x8100, as per
the manual and real hardware.
While Linux e1000e driver always writes VET register to 0x8100, it is
not always the case for everyone. Drivers relying on the reset value
of VET won't be able to transmit and receive VLAN frames in QEMU.
Unlike e1000 in QEMU, e1000e uses a field 'vet' in "struct E1000Core"
to cache the value of VET register, but the cache only gets updated
when VET register is written. To always get a consistent VET value
no matter VET is written or remains its reset value, drop the 'vet'
field and use 'core->mac[VET]' directly.
Reported-by: Markus Carlstedt <markus.carlstedt@windriver.com>
Signed-off-by: Christina Wang <christina.wang@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The initial value of VLAN Ether Type (VET) register is 0x8100, as per
the manual and real hardware.
While Linux e1000 driver always writes VET register to 0x8100, it is
not always the case for everyone. Drivers relying on the reset value
of VET won't be able to transmit and receive VLAN frames in QEMU.
Reported-by: Markus Carlstedt <markus.carlstedt@windriver.com>
Signed-off-by: Christina Wang <christina.wang@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>