Proxmox-Port/qemu - qemu - Gitea: Git with a cup of tea

mirror of https://github.com/qemu/qemu.git synced 2025-08-18 00:13:15 +00:00

Author	SHA1	Message	Date
Zhao Liu	225aad5a7b	i386/cpu: Mark CPUID[0x80000005] as reserved for Intel Per the SDM, the 0x80000005 leaf is reserved for Intel CPUs, and its current assert() check blocks adding new cache models for non-AMD CPUs. Please note that although Zhaoxin mostly follows Intel behavior, this leaf is an exception [1]. So, with the compat property "x-vendor-cpuid-only-v2", for machines since v10.1, check the vendor and encode this leaf as all-0 only for Intel CPUs. In addition, drop the lines_per_tag assertion in encode_cache_cpuid80000005(), since Zhaoxin will use the legacy Intel cache model in this leaf, which doesn't have this field. This fix also resolves 2 FIXMEs of legacy_l1d_cache_amd and legacy_l1i_cache_amd: /* FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */ In addition, per AMD's APM, update the comment of CPUID[0x80000005]. [1]: https://lore.kernel.org/qemu-devel/fa16f7a8-4917-4731-9d9f-7d4c10977168@zhaoxin.com/ Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-9-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
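As a rough illustration of the leaf layout involved, the sketch below packs an L1 cache description into the ECX (L1D) and EDX (L1I) format that AMD's APM documents for CPUID Fn8000_0005: size in KB in bits 31:24, associativity in 23:16, lines per tag in 15:8, and line size in 7:0. It returns all zeros when the vendor check selects Intel. The struct and function names are hypothetical and do not match QEMU's encode_cache_cpuid80000005(). The TLB registers (EAX/EBX) and the x-vendor-cpuid-only-v2 plumbing are left out, and `vendor_is_intel` stands in for "compat property on and vendor is Intel".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cache description (not QEMU's CPUCacheInfo). */
typedef struct {
    uint32_t size;          /* total size in bytes */
    uint32_t associativity; /* number of ways */
    uint32_t lines_per_tag;
    uint32_t line_size;     /* bytes */
} L1CacheSketch;

/* Pack one L1 cache into the Fn8000_0005 ECX/EDX layout:
 * size in KB [31:24], associativity [23:16],
 * lines per tag [15:8], line size [7:0]. */
static uint32_t encode_l1_cache_80000005(const L1CacheSketch *c)
{
    assert(c->size % 1024 == 0);
    return ((c->size / 1024) << 24) |
           (c->associativity << 16) |
           (c->lines_per_tag << 8) |
           c->line_size;
}

/* ECX describes the L1D cache and EDX the L1I cache.
 * Intel reserves this leaf, so both registers read as 0 there. */
static void leaf_80000005_cache_regs(bool vendor_is_intel,
                                     const L1CacheSketch *l1d,
                                     const L1CacheSketch *l1i,
                                     uint32_t *ecx, uint32_t *edx)
{
    if (vendor_is_intel) {
        *ecx = 0;
        *edx = 0;
        return;
    }
    *ecx = encode_l1_cache_80000005(l1d);
    *edx = encode_l1_cache_80000005(l1i);
}
```

For a 64 KB, 2-way cache with 1 line per tag and 64-byte lines, the packed value is 0x40020140.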
Zhao Liu	216d9bb6d7	i386/cpu: Add x-vendor-cpuid-only-v2 option for compatibility Add a compat property "x-vendor-cpuid-only-v2" (for PC machines v10.0 and older) to keep the original behavior. This property will be used to adjust vendor-specific CPUID fields. Make x-vendor-cpuid-only-v2 depend on x-vendor-cpuid-only. Although x-vendor-cpuid-only and v2 should be internal only, QEMU doesn't support "internal" properties. To avoid any other unexpected issues, check the dependency. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-8-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Zhao Liu	c416411c28	i386/cpu: Drop CPUID 0x2 specific cache info in X86CPUState With the pre-defined cache model legacy_intel_cpuid2_cache_info, for X86CPUState there's no need to cache special cache information for CPUID 0x2 leaf. Drop the cache_info_cpuid2 field of X86CPUState and use the legacy_intel_cpuid2_cache_info directly. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-7-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Zhao Liu	fe77a78149	i386/cpu: Consolidate CPUID 0x4 leaf Modern Intel CPUs use the CPUID 0x4 leaf to describe cache information and leave space in 0x2 for prefetch and TLBs (even though TLBs have their own leaf, CPUID 0x18). The 0x2 leaf also provides a descriptor, 0xFF, that instructs software to check the 0x4 leaf for cache information instead. Therefore, follow this behavior and encode 0xFF when an Intel CPU has the 0x4 leaf, gated by "x-consistent-cache=true" for compatibility. In addition, for older CPUs without the 0x4 leaf, still enumerate the cache descriptors in the 0x2 leaf, except when no descriptor matches the cache model, in which case encode 0xFF directly in the 0x2 leaf. This makes sense, because in the 0x2 leaf era every supported cache should have had a corresponding descriptor. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-6-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
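Here is a minimal sketch of the selection rule described above. It assumes a small excerpt of Intel's CPUID 0x2 descriptor table: 0x2C, 0x30, 0x49 and 0x7D, with the meanings the SDM gives them. Descriptor 0x49 is the one the 0x49 commit says has multiple meanings, and only its L2 reading is modeled here. The function and type names are hypothetical, and `consistent_cache` stands in for the "x-consistent-cache" compat option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHE_L1D, CACHE_L1I, CACHE_L2 };  /* hypothetical cache kinds */

typedef struct {
    uint8_t  descriptor;  /* CPUID 0x2 descriptor byte */
    int      kind;
    uint32_t size_kb;
    uint32_t ways;
    uint32_t line_size;   /* bytes */
} Cpuid2DescSketch;

/* A few entries from Intel's CPUID 0x2 descriptor table. */
static const Cpuid2DescSketch cpuid2_table[] = {
    { 0x2c, CACHE_L1D,   32,  8, 64 },
    { 0x30, CACHE_L1I,   32,  8, 64 },
    { 0x49, CACHE_L2,  4096, 16, 64 },
    { 0x7d, CACHE_L2,  2048,  8, 64 },
};

#define CPUID2_USE_LEAF4 0xff  /* "no cache info here, see CPUID 0x4" */

/* Pick the descriptor byte reported for one cache in CPUID 0x2. */
static uint8_t cpuid2_descriptor(bool has_leaf4, bool consistent_cache,
                                 int kind, uint32_t size_kb,
                                 uint32_t ways, uint32_t line_size)
{
    assert(kind >= CACHE_L1D && kind <= CACHE_L2);
    if (has_leaf4 && consistent_cache) {
        return CPUID2_USE_LEAF4;
    }
    for (size_t i = 0; i < sizeof(cpuid2_table) / sizeof(cpuid2_table[0]); i++) {
        const Cpuid2DescSketch *d = &cpuid2_table[i];
        if (d->kind == kind && d->size_kb == size_kb &&
            d->ways == ways && d->line_size == line_size) {
            return d->descriptor;
        }
    }
    /* No matching descriptor: fall back to 0xFF as well. */
    return CPUID2_USE_LEAF4;
}
```

When `has_leaf4` is true but `consistent_cache` is false, the sketch falls through to the table lookup. This mirrors the older machine types, which keep enumerating descriptors.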
Zhao Liu	6699e5dc86	i386/cpu: Present same cache model in CPUID 0x2 & 0x4 For a long time, the default cache models used in CPUID 0x2 and 0x4 were inconsistent and carried a FIXME note from Eduardo at commit `5e891bf8fd` ("target-i386: Use #defines instead of magic numbers for CPUID cache info"): "/* FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */". This difference is wrong: in principle, both 0x2 and 0x4 are used for Intel's cache description. The 0x2 leaf is used for ancient machines while the 0x4 leaf is a later addition, and both should be based on the same cache model. Furthermore, on real hardware, the 0x4 leaf should be used in preference to 0x2 when it is available. Revisiting the git history, the difference arose much earlier. The current legacy_l2_cache_cpuid2 (hardcoded: "0x2c307d"), which is used for the CPUID 0x2 leaf, was introduced in commit `d8134d91d9` ("Intel cache info, by Filip Navara."). Its commit message didn't say anything, but its patch [1] mentioned that the chosen cache model is "closest to the ones reported in the AMD registers". It is no longer possible to check which AMD generation this cache model is based on (unfortunately, AMD does not use the 0x2 leaf), but at least it is close to the Pentium 4. In fact, the patch description of commit `d8134d91d9` is also a bit wrong: the original cache model in leaf 2 is from the Pentium Pro, and its cache descriptor specified a 32-byte line size by default, while the updated cache model in commit `d8134d91d9` has a 64-byte line size. But after so many years, such judgments are no longer meaningful. On the other hand, legacy_l2_cache, which is used in the CPUID 0x4 leaf, is based on Intel Core Duo (patch [2]) and Core 2 Duo (commit `e737b32a36` ("Core 2 Duo specification (Alexander Graf).")). The Core Duo and Core 2 Duo patches added the cache model for CPUID 0x4 but did not update the CPUID 0x2 encoding. This is why Intel guests have always used two different cache models in 0x2 and 0x4.
Of course, although no Core Duo or Core 2 Duo machines could be found for double-checking, it still makes no sense to encode different cache models on a single machine. According to the SDM and the real hardware available, the 0x2 leaf can directly encode 0xFF to instruct software to go to the 0x4 leaf for cache information when 0x4 is available. Therefore, it's time to clean up Intel's default cache models. As the first step, add the "x-consistent-cache" compat option to allow newer machines (v10.1 and newer) to have a consistent cache model in the CPUID 0x2 and 0x4 leaves. This doesn't affect the CPU models with CPUID level < 4 ("486", "pentium", "pentium2" and "pentium3"), because they already have a special default cache model, legacy_intel_cpuid2_cache_info. [1]: https://lore.kernel.org/qemu-devel/5b31733c0709081227w3e5f1036odbc649edfdc8c79b@mail.gmail.com/ [2]: https://lore.kernel.org/qemu-devel/478B65C8.2080602@csgraf.de/ Cc: Alexander Graf <agraf@csgraf.de> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-5-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
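To make the inconsistency concrete, the hardcoded value "0x2c307d" packs three one-byte CPUID 0x2 descriptors:

- 0x2C: L1D, 32 KB, 8-way.
- 0x30: L1I, 32 KB, 8-way.
- 0x7D: L2, 2 MB, 8-way.

By contrast, legacy_l2_cache, which is used for leaf 0x4, describes a 4 MB, 16-way L2 (descriptor 0x49). The sketch below uses hypothetical helper names. It splits a CPUID 0x2 register into its descriptor bytes and skips null (0x00) bytes. Following the SDM, it treats a register with bit 31 set as carrying no valid descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Describe a few descriptor bytes from Intel's CPUID 0x2 table. */
static const char *cpuid2_desc_name(uint8_t d)
{
    switch (d) {
    case 0x2c: return "L1D 32 KB, 8-way, 64 B lines";
    case 0x30: return "L1I 32 KB, 8-way, 64 B lines";
    case 0x49: return "L2 4 MB, 16-way, 64 B lines";
    case 0x7d: return "L2 2 MB, 8-way, 64 B lines";
    case 0xff: return "no cache info here, use CPUID 0x4";
    default:   return "other";
    }
}

/* Split one CPUID 0x2 register into descriptor bytes, most significant
 * first, skipping null (0x00) bytes. Returns how many were written.
 * If bit 31 is set, the register holds no valid descriptors.
 * (For EAX, the caller must also ignore the low byte, which is the
 * iteration count rather than a descriptor.) */
static size_t cpuid2_reg_descriptors(uint32_t reg, uint8_t out[4])
{
    size_t n = 0;

    assert(out != NULL);
    if (reg & 0x80000000u) {
        return 0;
    }
    for (int shift = 24; shift >= 0; shift -= 8) {
        uint8_t b = (uint8_t)(reg >> shift);
        if (b != 0) {
            out[n++] = b;
        }
    }
    return n;
}
```

Decoding 0x2c307d yields 0x2C, 0x30 and 0x7D. The 0x7D L2 is the one that disagrees with the 4 MB, 16-way L2 reported through leaf 0x4.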
Zhao Liu	2c70a50c8c	i386/cpu: Add default cache model for Intel CPUs with level < 4 Old Intel CPUs with CPUID level < 4 use the CPUID 0x2 leaf (if available) to encode cache information. Introduce a cache model "legacy_intel_cpuid2_cache_info" for the CPUs with CPUID level < 4, based on legacy_l1d_cache, legacy_l1i_cache, legacy_l2_cache_cpuid2 and legacy_l3_cache. For the L2 cache, however, this cache model also fills in the self_init, sets, partitions, no_invd_sharing and share_level fields, following legacy_l2_cache, so that someone who manually raises the CPUID level doesn't hit an assert() failure. The cache information presented in the CPUID 0x2 leaf doesn't change. This new cache model makes it possible to remove legacy_l2_cache_cpuid2 from X86CPUState and helps clarify the historical cache inconsistency issue. Furthermore, apply this legacy cache model to all Intel CPUs with CPUID level < 4. This includes not only "pentium2" and "pentium3" (which have the 0x2 leaf), but also "486" and "pentium" (which only have the 0x1 leaf, so the cache model won't be presented; this is just for simplicity). The legacy_intel_cpuid2_cache_info cache model doesn't change the cache information of the above CPUs, because they only depend on the 0x2 leaf. Only when someone raises min-level to >= 4 will the cache information in CPUID leaf 4 differ from before: previously, the L2 cache information in CPUID leaves 0x2 and 0x4 was different, but now, with legacy_intel_cpuid2_cache_info, the information they present will be consistent. This case almost never happens; emulating a CPUID leaf that the "ancient" hardware does not support is itself meaningless behavior. 
Therefore, even though there's the above difference (in a really rare case), and considering that these old CPUs ("486", "pentium", "pentium2" and "pentium3") won't be used for migration, there's no need to add new versioned CPU models. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-4-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Zhao Liu	67d54014af	i386/cpu: Add descriptor 0x49 for CPUID 0x2 encoding The legacy_l2_cache (2nd-level cache: 4 MByte, 16-way set associative, 64 byte line size) corresponds to descriptor 0x49, but at present cpuid2_cache_descriptors doesn't support descriptor 0x49 because it has multiple meanings. Descriptor 0x49 is needed when the CPUID 0x2 and 0x4 leaves share a consistent cache model and use legacy_l2_cache as the default L2 cache. Therefore, add descriptor 0x49 to represent a general L2 cache. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-3-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Zhao Liu	1034b94fe9	i386/cpu: Refine comment of CPUID2CacheDescriptorInfo Refer to SDM vol.3 table 1-21, add the notes about the missing descriptor, and fix the typo and comment format. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-2-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	7ff24fb657	i386/tdx: Don't mask off CPUID_EXT_PDCM The following warning appears when booting TDX VMs: warning: TDX forcibly sets the feature: CPUID[eax=01h].ECX.pdcm [bit 15] This is because CPUID_EXT_PDCM is fixed1 for TDX, and MSR_IA32_PERF_CAPABILITIES is supported for TDX guests unconditionally. Don't mask off CPUID_EXT_PDCM for TDX. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250625035710.2770679-1-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	50fd57418c	i386/tdx: Remove task->watch only when it's valid In some cases (e.g., failure to connect to the QGS socket), tdx_generate_quote_cleanup() is called with task->watch invalid. This triggers an assertion: qemu-system-x86_64: GLib: g_source_remove: assertion 'tag > 0' failed Fix it by checking task->watch. Fixes: `40da501d89` ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250625035505.2770580-1-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	b6c5b41ba2	i386/cpu: Unify family, model and stepping calculation for x86 CPU There are multiple places where CPUID family/model/stepping info is retrieved from env->cpuid_version. Besides, the calculation of family and model inside host_cpu_vendor_fms() doesn't comply with what Intel and AMD define. For family, both Intel and AMD define that Extended Family ID needs to be counted only when (base) Family is 0xF. For model, Intel counts Extended Model when (base) Family is 0x6 or 0xF, while AMD counts Extended Model only when (base) Family is 0xF. Introduce generic helper functions to get family, model and stepping from the EAX value of CPUID leaf 1, with the correct calculation formula. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250630080610.3151956-5-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
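The rules above can be written as small bit-field helpers. This is a sketch with hypothetical names, not the helpers the commit adds. It decodes CPUID.01H:EAX, where stepping is in bits 3:0, base model in 7:4, base family in 11:8, extended model in 19:16 and extended family in 27:20.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { VENDOR_INTEL, VENDOR_AMD } VendorSketch;  /* hypothetical */

/* Raw fields of CPUID.01H:EAX. */
static uint32_t cpuid1_stepping(uint32_t eax)    { return eax & 0xf; }
static uint32_t cpuid1_base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static uint32_t cpuid1_base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static uint32_t cpuid1_ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }
static uint32_t cpuid1_ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }

/* Both vendors add Extended Family only when the base family is 0xF. */
static uint32_t cpuid1_family(uint32_t eax)
{
    uint32_t family = cpuid1_base_family(eax);
    return family == 0xf ? family + cpuid1_ext_family(eax) : family;
}

/* Intel prepends Extended Model for base family 0x6 or 0xF;
 * AMD does so only for base family 0xF. */
static uint32_t cpuid1_model(uint32_t eax, VendorSketch vendor)
{
    uint32_t family = cpuid1_base_family(eax);
    uint32_t model = cpuid1_base_model(eax);

    assert(vendor == VENDOR_INTEL || vendor == VENDOR_AMD);
    if (family == 0xf || (vendor == VENDOR_INTEL && family == 0x6)) {
        model |= cpuid1_ext_model(eax) << 4;
    }
    return model;
}
```

Two worked values:

- EAX = 0x000906EA gives family 0x6, stepping 0xA, and model 0x9E under the Intel rule but 0xE under the AMD rule.
- EAX = 0x00800F11 gives family 0xF + 0x8 = 0x17 and model 0x1.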
Xiaoyao Li	3359b588f1	i386/kvm-cpu: Fix the indentation inside kvm_cpu_realizefn() The indentation of one of the } inside kvm_cpu_realizefn() isn't correct. Fix it. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250630080610.3151956-4-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	58f17c4a4f	i386: Cleanup the usage of CPUID_VENDOR_INTEL_1 Some code uses "env->cpuid_vendor1 == CPUID_VENDOR_INTEL_1" to check whether it is an Intel vCPU. Clean it up to just use IS_INTEL_CPU(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250630080610.3151956-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
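For reference, the vendor check compares the three 32-bit words that CPUID leaf 0 returns in EBX, EDX and ECX. Each word holds four ASCII bytes of "GenuineIntel" in little-endian order. The sketch below uses hypothetical constant and function names and compares all three words. That is the kind of check a macro like IS_INTEL_CPU() is expected to perform; testing only the first word ("Genu") is weaker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "GenuineIntel" split across CPUID leaf 0 EBX, EDX, ECX
 * (four ASCII bytes per register, little-endian). */
#define VENDOR_INTEL_EBX 0x756e6547u /* "Genu" */
#define VENDOR_INTEL_EDX 0x49656e69u /* "ineI" */
#define VENDOR_INTEL_ECX 0x6c65746eu /* "ntel" */

/* Compare all three vendor words, not just the first one. */
static bool vendor_is_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    return ebx == VENDOR_INTEL_EBX && edx == VENDOR_INTEL_EDX &&
           ecx == VENDOR_INTEL_ECX;
}

/* Rebuild the 12-character vendor string in EBX, EDX, ECX order. */
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                          char out[13])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (char)(ebx >> (8 * i));
        out[4 + i] = (char)(edx >> (8 * i));
        out[8 + i] = (char)(ecx >> (8 * i));
    }
    out[12] = '\0';
}
```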
Xiaoyao Li	7e862c3526	i386/cpu: Use CPUID_MODEL_ID_SZ instead of hardcoded 48 The macro CPUID_MODEL_ID_SZ is already defined in QEMU. Use it to replace all the hardcoded 48s. Opportunistically fix the indentation of CPUID_VENDOR_SZ. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250630080610.3151956-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	2393fb93ed	i386/cpu: Move the implementation of is_host_cpu_intel() to host-cpu.c It's more appropriate to put is_host_cpu_intel() in host-cpu.c instead of vmsr_energy.c. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250701075738.3451873-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	d60238b4c1	sev: Provide sev_features flags from IGVM VMSA to KVM_SEV_INIT2 IGVM files can contain an initial VMSA that should be applied to each vcpu as part of the initial guest state. The sev_features flags are provided as part of the VMSA structure. However, KVM only allows sev_features to be set during initialization and not as the guest is being prepared for launch. This patch queries KVM for the supported set of sev_features flags and processes the VP context entries in the IGVM file during kvm_init to determine any sev_features flags set in the IGVM file. These are then provided in the call to KVM_SEV_INIT2 to ensure the guest state matches that specified in the IGVM file. The igvm process() function is modified to allow a partial processing of the file during initialization, with only the IGVM_VHT_VP_CONTEXT fields being processed. This means the function is called twice, firstly to extract the sev_features then secondly to actually configure the guest. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/b2f986aae04e1da2aee530c9be22a54c0c59a560.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	2ff75825cc	i386/sev: Add implementation of CGS set_guest_policy() The new cgs_set_guest_policy() function is provided to receive the guest policy flags, SNP ID block and SNP ID authentication from guest configuration such as an IGVM file and apply it to the platform prior to launching the guest. The policy is used to populate values for the existing 'policy', 'id_block' and 'id_auth' parameters. When provided, the guest policy is applied and the ID block configuration is used to verify the launch measurement and signatures. The guest is only successfully started if the expected launch measurements match the actual measurements and the signatures are valid. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/99e82ddec4ad2970c790db8bea16ea3f57eb0e53.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	915b47078d	backends/igvm: Handle policy for SEV guests Adds a handler for the guest policy initialization IGVM section and builds an SEV policy based on this information and the ID block directive if present. The policy is applied by calling 'set_guest_policy()' on the ConfidentialGuestSupport object. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/57707230bef331b53e9366ce6a23ed25cd6f1293.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	9de40d7df3	backends/igvm: Process initialization sections in IGVM file The initialization sections in IGVM files contain configuration that should be applied to the guest platform before it is started. This includes guest policy and other information that can affect the security level and the startup measurement of a guest. This commit introduces handling of the initialization sections during processing of the IGVM file. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/9de24fb5df402024b40cbe02de0b13faa7cb4d84.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	96a3088f5e	backends/confidential-guest-support: Add set_guest_policy() function For confidential guests a policy can be provided that defines the security level, debug status, expected launch measurement and other parameters that define the configuration of the confidential platform. This commit adds a new function named set_guest_policy() that can be implemented by each confidential platform, such as AMD SEV to set the policy. This will allow configuration of the policy from a multi-platform resource such as an IGVM file without the IGVM processor requiring specific implementation details for each platform. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Link: https://lore.kernel.org/r/d3888a2eb170c8d8c85a1c4b7e99accf3a15589c.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	596c330b19	docs/interop/firmware.json: Add igvm to FirmwareDevice Create an enum entry within FirmwareDevice for 'igvm' to describe that an IGVM file can be used to map firmware into memory as an alternative to pre-existing firmware devices. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/2eca2611d372facbffa65ee8244cf2d321eb9d17.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	565d591f71	docs/system: Add documentation on support for IGVM IGVM support has been implemented for Confidential Guests that support AMD SEV and AMD SEV-ES. Add some documentation that gives some background on the IGVM format and how to use it to configure a confidential guest. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/b4dc920a30717e19cd79bbbe2cc769f3b9ff3d37.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	86e85eb1d4	i386/sev: Implement ConfidentialGuestSupport functions for SEV The ConfidentialGuestSupport object defines a number of virtual functions that are called during processing of IGVM directives to query or configure initial guest state. In order to support processing of IGVM files, these functions need to be implemented by relevant isolation hardware support code such as SEV. This commit implements the required functions for SEV-ES and adds support for processing IGVM files for configuring the guest. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/7145835f729e6195f2fbda308aa90e089a96ae6e.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	4c7f0976b0	i386/sev: Refactor setting of reset vector and initial CPU state When an SEV guest is started, the reset vector and state are extracted from metadata that is contained in the firmware volume. In preparation for using IGVM to set up the initial CPU state, the code has been refactored to populate vmcb_save_area for each CPU, which is then applied during guest startup and CPU reset. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/d3c2debca496c4366a278b135f951908f3b9c341.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	b0e8986668	target/i386: Allow setting of R_LDTR and R_TR with cpu_x86_load_seg_cache() The x86 segment registers are identified by the X86Seg enumeration which includes LDTR and TR as well as the normal segment registers. The function 'cpu_x86_load_seg_cache()' uses the enum to determine which segment to set. However, specifying R_LDTR or R_TR results in an out-of-bounds access of the segment array. Possibly by coincidence, the function does correctly set LDTR or TR in this case as the structures for these registers immediately follow the array which is accessed out of bounds. This patch adds correct handling for R_LDTR and R_TR in the function. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/95c69253ea4f91107625872d5e3f0c586376771d.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
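This is a simplified sketch of the fixed lookup. It assumes a state layout like the one described above: a six-entry segment array immediately followed by the LDTR and TR caches. All type and function names here are hypothetical. LDTR and TR are routed to their own fields instead of indexing past the end of `segs[]`.

```c
#include <assert.h>

typedef struct { unsigned selector, base, limit, flags; } SegSketch;

/* Order mirrors the X86Seg enum: six ordinary segments, then LDTR, TR. */
typedef enum { R_ES, R_CS, R_SS, R_DS, R_FS, R_GS, R_LDTR, R_TR } X86SegSketch;

typedef struct {
    SegSketch segs[6];  /* valid indices: R_ES..R_GS only */
    SegSketch ldt;
    SegSketch tr;
} CpuStateSketch;

/* Map a segment register to its storage without overrunning segs[]. */
static SegSketch *seg_cache(CpuStateSketch *env, X86SegSketch reg)
{
    switch (reg) {
    case R_LDTR:
        return &env->ldt;
    case R_TR:
        return &env->tr;
    default:
        assert(reg <= R_GS);
        return &env->segs[reg];
    }
}

static void load_seg_cache(CpuStateSketch *env, X86SegSketch reg,
                           unsigned selector, unsigned base,
                           unsigned limit, unsigned flags)
{
    SegSketch *sc = seg_cache(env, reg);

    sc->selector = selector;
    sc->base = base;
    sc->limit = limit;
    sc->flags = flags;
}
```

With the old approach, `segs[R_TR]` happened to land on `tr` only because of how the fields were laid out in memory. The explicit switch makes the mapping independent of layout.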
Roy Hopkins	224e807f90	sev: Update launch_update_data functions to use Error handling The class function and implementations for updating launch data return a code in case of error. In some cases an error message is generated and in other cases just the error return value is used. This small refactor adds an 'Error **errp' parameter to all of these functions, which now consistently set an error condition whenever a non-zero value is returned. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/d59721f7b99cfc87aab71f8f551937e98e983615.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
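The convention this refactor aims for can be sketched as follows. A non-zero return is always paired with a populated error object, and a caller that passes NULL opts out of the details. `ErrorSketch` and its helpers are hypothetical stand-ins, not QEMU's Error API (which provides error_setg() and related functions). The assert that an existing error is not overwritten follows the usual QEMU rule that an error must not be set twice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct {
    int code;
    char msg[128];
} ErrorSketch;

/* Record an error for the caller, if it asked for one (errp != NULL). */
static void error_set_sketch(ErrorSketch **errp, int code, const char *what)
{
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL);  /* never overwrite an existing error */
    *errp = malloc(sizeof(**errp));
    if (*errp == NULL) {
        abort();
    }
    (*errp)->code = code;
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s failed: %d", what, code);
}

/* Every non-zero return comes with *errp populated. */
static int launch_update_data_sketch(int simulated_fw_error, ErrorSketch **errp)
{
    if (simulated_fw_error != 0) {
        error_set_sketch(errp, simulated_fw_error, "LAUNCH_UPDATE_DATA");
        return -1;
    }
    return 0;
}
```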
Roy Hopkins	170beda3cf	i386/pc_sysfw: Ensure sysfw flash configuration does not conflict with IGVM When using an IGVM file the configuration of the system firmware is defined by IGVM directives contained in the file. In this case the user should not configure any pflash devices. This commit skips initialization of the ROM mode when pflash0 is not set then checks to ensure no pflash devices have been configured when using IGVM, exiting with an error message if this is not the case. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/c6166cfe128933b04003a9288566b7affe170dfe.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	28e5ef4a65	hw/i386: Add igvm-cfg object and processing for IGVM files An IGVM file contains configuration of guest state that should be applied during configuration of the guest, before the guest is started. This patch allows the user to add an igvm-cfg object to an X86 machine configuration that allows an IGVM file to be configured that will be applied to the guest before it is started. If an IGVM configuration is provided then the IGVM file is processed at the end of the board initialization, before the state transition to PHASE_MACHINE_INITIALIZED. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/23bc66ae4504ba5cf2134826e055b25df3fc9cd9.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	c1d466d267	backends/igvm: Add IGVM loader and configuration Adds an IGVM loader to QEMU which processes a given IGVM file and applies the directives within the file to the current guest configuration. The IGVM loader can be used to configure both confidential and non-confidential guests. For confidential guests, the ConfidentialGuestSupport object for the system is used to encrypt memory, apply the initial CPU state and perform other confidential guest operations. The loader is configured via a new IgvmCfg QOM object which allows the user to provide a path to the IGVM file to process. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/ae3a07d8f514d93845a9c16bb155c847cb567b0d.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	e7ed19507b	backends/confidential-guest-support: Add functions to support IGVM In preparation for supporting the processing of IGVM files to configure guests, this adds a set of functions to ConfidentialGuestSupport allowing configuration of secure virtual machines that can be implemented for each supported isolation platform type such as Intel TDX or AMD SEV-SNP. These functions will be called by IGVM processing code in subsequent patches. This commit provides a default implementation of the functions that either perform no action or generate an error when they are called. Targets that support ConfidentialGuestSupport should override these implementations. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/23e34a106da87427899f93178102e4a6ef50c966.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Roy Hopkins	84fe49d94a	meson: Add optional dependency on IGVM library The IGVM library allows Independent Guest Virtual Machine files to be parsed and processed. IGVM files are used to configure guest memory layout, initial processor state and other configuration pertaining to secure virtual machines. This adds the --enable-igvm configure option, enabled by default, which attempts to locate and link against the IGVM library via pkgconfig and sets CONFIG_IGVM if found. The library is added to the system_ss target in backends/meson.build where the IGVM parsing will be performed by the ConfidentialGuestSupport object. Signed-off-by: Roy Hopkins <roy.hopkins@randomman.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffman <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/45945a83a638c3f08e68c025f378e7b7f4f6d593.1751554099.git.roy.hopkins@randomman.co.uk Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Zhenzhong Duan	b28f6d5c16	i386/tdx: Fix the report of gpa in QAPI The gpa field is defined in QAPI but never reported to the monitor, because has_gpa is never set to true. Fix it by setting has_gpa to true when TDX_REPORT_FATAL_ERROR_GPA_VALID is set in error_code. Fixes: `6e250463b0` ("i386/tdx: Wire TDX_REPORT_FATAL_ERROR with GuestPanic facility") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/20250710035538.303136-1-zhenzhong.duan@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:21 +02:00
Xiaoyao Li	efa742b23e	i386/tdx: handle TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT Record the interrupt vector and the APIC ID of the vCPU that calls TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT. Inject the interrupt into the TD guest to notify it of the completion of <GetQuote> when the notify interrupt vector is valid. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250703024021.3559286-5-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Xiaoyao Li	55be385b10	i386/tdx: Set value of <GetTdVmCallInfo> based on capabilities of both KVM and QEMU KVM reports the supported TDVMCALL sub-leaves in the TDX capabilities as two sets: one for kernel-supported TDVMCALLs (userspace can set those blindly) and one for user-supported TDVMCALLs (userspace can set those if it knows how to handle them). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250703024021.3559286-4-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
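Read as bitmasks, the rule above comes down to one line of logic. In this sketch, the function and parameter names are hypothetical, with one bit per sub-leaf. Kernel-supported sub-leaves are reported as-is, while user-supported ones are reported only if QEMU itself can handle them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical inputs, one bit per TDVMCALL sub-leaf:
 *   kernel_bits  - handled entirely by KVM; safe to report blindly
 *   user_bits    - forwarded by KVM to userspace
 *   qemu_handled - sub-leaves this QEMU build knows how to service
 */
static uint64_t tdvmcall_info_sketch(uint64_t kernel_bits, uint64_t user_bits,
                                     uint64_t qemu_handled)
{
    /* Never advertise a user-handled call that QEMU cannot service. */
    return kernel_bits | (user_bits & qemu_handled);
}
```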
Xiaoyao Li	25c98a1350	update Linux headers to KVM tree master To fetch the TDX updates. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250703024021.3559286-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Xiaoyao Li	b57999bb25	i386/tdx: Remove enumeration of GetQuote in tdx_handle_get_tdvmcall_info() The GHCI is finalized with <GetQuote> as one of the base VMCALLs, so it is not enumerated via <GetTdVmCallInfo>. Adjust tdx_handle_get_tdvmcall_info() to match the GHCI. Opportunistically fix the wrong indentation and explicitly set ret to TDG_VP_VMCALL_SUCCESS (in case KVM leaves an unexpected value). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250703024021.3559286-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Paolo Bonzini	29f1ba338b	target/i386: merge host_cpu_instance_init() and host_cpu_max_instance_init() Simplify the accelerators' cpu_instance_init callbacks by doing all host-cpu setup in a single function. Based-on: <20250711000603.438312-1-pbonzini@redhat.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Paolo Bonzini	5f158abef4	target/i386: move accel_cpu_instance_init to .instance_init With the reordering of instance_post_init callbacks that is new in 10.1, accel_cpu_instance_init must execute in .instance_init as is already the case for RISC-V. Otherwise, for example, setting the vendor property is broken when using KVM or Hypervisor.framework, because KVM sets it after the user's value is set by DeviceState's instance_post_init callback. Reported-by: Like Xu <like.xu.linux@gmail.com> Reported-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Paolo Bonzini	810fcc41fc	target/i386: allow reordering max_x86_cpu_initfn vs accel CPU init The PMU feature is only supported by KVM, so move it there. And since all accelerators other than TCG overwrite the vendor, set it in max_x86_cpu_initfn only if it has not been initialized by the superclass. This makes it possible to run max_x86_cpu_initfn after accelerator init. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:20 +02:00
Paolo Bonzini	d93972d88b	target/i386: nvmm, whpx: add accel/CPU class that sets host vendor NVMM and WHPX are virtualizers, and therefore they need to use (at least by default) the host vendor for the guest CPUID. Add a cpu_instance_init implementation to these accelerators. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:19 +02:00
Peter Maydell	d6390204c6	linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC In the linux-user do_fork() function we try to set the FD_CLOEXEC flag on a pidfd like this: fcntl(pid_fd, F_SETFD, fcntl(pid_fd, F_GETFL) | FD_CLOEXEC); This has two problems: (1) it doesn't check errors, which Coverity complains about (2) we use F_GETFL when we mean F_GETFD Deal with both of these problems by using qemu_set_cloexec() instead. That function will assert() if the fcntls fail, which is fine (we are inside fork_start()/fork_end() so we know nothing can mess around with our file descriptors here, and we just got this one from pidfd_open()). (As we are touching the if() statement here, we correct the indentation.) Coverity: CID 1508111 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>	2025-07-11 10:45:14 -06:00
Richard Henderson	c86da2b1dd	tcg: Use uintptr_t in tcg_malloc implementation Avoid ubsan failure with clang-20, tcg.h:715:19: runtime error: applying non-zero offset 64 to null pointer by not using pointers. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-07-11 10:43:47 -06:00
Juraj Marcin	beeac2df5f	migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	3345fb3b6d	migration/postcopy: Add latency distribution report for blocktime Add the latency distribution too for blocktime, using order-of-two buckets. It accounts for all the faults, from either vCPU or non-vCPU threads. With prior rework, it's very easy to achieve by adding an array to account for faults in each bucket. Sample output for HMP (while for QMP it's simply an array): Postcopy Latency Distribution: [ 1 us - 2 us ]: 0 [ 2 us - 4 us ]: 0 [ 4 us - 8 us ]: 1 [ 8 us - 16 us ]: 2 [ 16 us - 32 us ]: 2 [ 32 us - 64 us ]: 3 [ 64 us - 128 us ]: 10169 [ 128 us - 256 us ]: 50151 [ 256 us - 512 us ]: 12876 [ 512 us - 1 ms ]: 97 [ 1 ms - 2 ms ]: 42 [ 2 ms - 4 ms ]: 44 [ 4 ms - 8 ms ]: 93 [ 8 ms - 16 ms ]: 138 [ 16 ms - 32 ms ]: 0 [ 32 ms - 65 ms ]: 0 [ 65 ms - 131 ms ]: 0 [ 131 ms - 262 ms ]: 0 [ 262 ms - 524 ms ]: 0 [ 524 ms - 1 sec ]: 0 [ 1 sec - 2 sec ]: 0 [ 2 sec - 4 sec ]: 0 [ 4 sec - 8 sec ]: 0 [ 8 sec - 16 sec ]: 0 Cc: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	ed23a15976	migration/postcopy: blocktime allows track / report non-vCPU faults When used to report page fault latencies, the blocktime feature can be almost useless when KVM async page fault is enabled, because in most cases such remote fault will kick off async page faults, then it's not trackable from blocktime layer. After all these recent rewrites to blocktime layer, it's finally so easy to also support tracking non-vCPU faults. It'll be even faster if we could always index fault records with TIDs; unfortunately we need to maintain the blocktime API which reports things in vCPU indexes. Of course this can work not only for kworkers, but also any guest accesses that may reach a missing page, for example, very likely when in the QEMU main thread too (and all other threads whenever applicable). In this case, we don't care about "how long the threads are blocked", but we only care about "how long the fault will be resolved". Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	b63a2e9e4b	migration/postcopy: Optimize blocktime fault tracking with hashtable Currently, the postcopy blocktime feature maintains vCPU fault information using an array (vcpu_addr[]). It has two issues. Issue 1: Performance Concern ============================ The old algorithm was almost OK and fast on inserts, except that the lookup is slow and won't scale if there are a lot of vCPUs: when a page is copied during postcopy, mark_postcopy_blocktime_end() will walk the whole array trying to find which vCPUs are blocked by the address. So it needs constant O(N) walk for each page resolution. Alexey (the author of postcopy blocktime) mentioned the perf issue and how to optimize it in a piece of comment in the page resolution path. The comment was (interestingly..) not complete, but it's relatively clear what he wanted to say about this perf issue. Issue 2: Wrong Accounting on re-entrancies ========================================== People might think that each vCPU should only and always get one fault at a time, so that when the blocktime layer captured one fault on one vCPU, we should never see another fault message on this vCPU. It's almost correct, except in some extremely rare cases. Case 1: it's possible the fault thread processes the userfaultfd messages too fast so it can see >1 messages on one vCPU before the previous one was resolved. Case 2: it's theoretically also possible one vCPU can get even more than one message on the same fault address if a fault is retried by the kernel (e.g., handle_userfault() got interrupted before page resolution). As this info might be important, instead of using commit message, I put more details into the code as comment, when introducing an array maintaining concurrent faults on one vCPU. Please refer to the comments for details on both cases, especially case 1 which can be tricky. Case 1 sounds rare, but it can be easily reproduced locally for me when we run blocktime together with the migration-test on the vanilla postcopy. New Design ========== This patch should do almost what Alexey mentioned, but slightly differently: instead of having an array to maintain vCPU fault addresses, for each of the fault message we push a message into a hash, indexed by the fault address. With the hash, it can replace the old two structs: both the vcpu_addr[] array, and also the array to store the start time of the fault. However due to above we need one more counter array to account concurrent faults on the same vCPU - that should even be needed in the old code, it's just that the old code was buggy and it will blindly overwrite an existing entry.. now we'll start to really track everything. The hash structure might be more efficient than tree to maintain such addr->(cpu, fault_time) information, so that the insert() and lookup() paths should ideally both be ~O(1). After all, we do not need to sort. Here we need to do one remove() though after the lookup(). It could be slow but only if many vCPUs faulted exactly on the same address (so when the list of cpu entries is long), which should be unlikely. Even with that, it's still a worst case O(N) (consider 400 vCPUs faulted on the same address and how likely is it..) rather than a constant O(N) complexity. While at it, touch up the tracepoints to make them slightly more useful. One tracepoint is added when walking all the fault entries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-13-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	4c8a119485	migration/postcopy: Cleanup the total blocktime accounting The variable vcpu_total_blocktime isn't easy to follow. In reality, it wants to capture the case where all vCPUs are stopped, and now there will be some vCPUs that start running. The name now starts to conflict with vcpu_blocktime_total[], meanwhile it's actually not necessary to have the variable at all: since nobody is touching smp_cpus_down except ourselves, we can safely do the calculation at the end before decrementing smp_cpus_down. Hopefully this makes the logic easier to read, side benefit is we drop one temp var. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	28a185204e	migration/postcopy: Cache the tid->vcpu mapping for blocktime Looking up the vCPU index for each fault can be expensive when there're hundreds of vCPUs. Provide a cache for tid->vcpu instead with a hash table, then lookup from there. While at it, add another counter to record how many non-vCPU faults it gets. For example, the main thread can also access a guest page that was missing. These kinds of faults are not accounted by blocktime so far. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	f07f2a3092	migration/postcopy: Initialize blocktime context only until listen Before this patch, the blocktime context can be created very early, because postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during migration object init. The trick here is the blocktime context needs system vCPU information, which seems to be possible to change after that point. I didn't verify it, but it doesn't sound right. Now move it out and initialize the context only when postcopy listen starts. That is already during a migration so it should be guaranteed the vCPU topology can never change on both sides. While at it, assert that the ctx isn't created instead this time; the old "if" trick isn't needed when we're sure it will only happen once now. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	b4c82b4288	migration/postcopy: Report fault latencies in blocktime Blocktime so far only cares about the time one vcpu (or the whole system) got blocked. It would also be helpful if it can also report the latency of page requests, which could be very sensitive during postcopy. Blocktime itself is sometimes not very important, especially when one thinks about KVM async PF support, which means vCPUs are literally almost not blocked at all because the guest OS is smart enough to switch to another task when a remote fault is needed. However, latency is still sensitive and important because even if the guest vCPU is running on threads that do not need a remote fault, the workload that accesses some missing page is still affected. Add two entries to the report, showing how long it takes to resolve a remote fault. Mention in the QAPI doc that this is not the real average fault latency, but only the ones that were requested for a remote fault. Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice, meanwhile add the entry checks in qtests for all postcopy tests. Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
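
The pidfd fix in d6390204c6 hinges on the difference between descriptor flags (read with F_GETFD, where FD_CLOEXEC lives) and file status flags (read with F_GETFL, e.g. O_NONBLOCK). A minimal standalone sketch of a checked version follows; try_set_cloexec() and set_cloexec_or_die() are hypothetical names for illustration, not QEMU's qemu_set_cloexec(), whose body is not reproduced here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: set FD_CLOEXEC on fd, returning false on error. */
static bool try_set_cloexec(int fd)
{
    /* F_GETFD returns the descriptor flags, which is where FD_CLOEXEC
     * lives. F_GETFL would return file status flags instead, so OR-ing
     * FD_CLOEXEC into that value and passing it to F_SETFD is wrong. */
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return false;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

/* Hypothetical variant mirroring the behaviour the commit describes:
 * a failure here indicates a bug, so assert instead of returning. */
static void set_cloexec_or_die(int fd)
{
    bool ok = try_set_cloexec(fd);  /* called outside assert() on purpose */
    assert(ok);
    (void)ok;                       /* avoid an unused warning with NDEBUG */
}
```

Keeping the fcntl() calls outside assert() means the flag is still set in NDEBUG builds; only the check disappears.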
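
The order-of-two buckets in 3345fb3b6d can be computed with a simple shift loop. The sketch below assumes latencies in microseconds and 24 buckets (matching the sample output, from [ 1 us - 2 us ] to [ 8 sec - 16 sec ]); placing sub-microsecond values in bucket 0 and clamping larger values into the last bucket are assumptions, and the names are hypothetical rather than QEMU's.

```c
#include <stdint.h>

/* Hypothetical: 24 buckets, [2^i us, 2^(i+1) us) for i = 0..23. */
#define LATENCY_BUCKETS 24

/* Return floor(log2(latency_us)), clamped to [0, LATENCY_BUCKETS - 1]. */
static unsigned latency_bucket(uint64_t latency_us)
{
    unsigned idx = 0;

    while (latency_us >= 2 && idx < LATENCY_BUCKETS - 1) {
        latency_us >>= 1;
        idx++;
    }
    return idx;
}

/* Count one resolved fault in the histogram. */
static void latency_account(uint64_t hist[LATENCY_BUCKETS],
                            uint64_t latency_us)
{
    hist[latency_bucket(latency_us)]++;
}
```

For example, a 100 us fault lands in bucket 6, which the sample output prints as [ 64 us - 128 us ]. The "65 ms" style labels in that output are 2^16 us (65.536 ms) and so on, rounded down.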
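
The design described in b63a2e9e4b (fault records indexed by address in a hash, plus a per-vCPU counter of concurrent faults) can be sketched as below. This is a hypothetical, dependency-free illustration using a small chained hash table and a fixed MAX_VCPUS; it is not QEMU's implementation, and treating a vCPU as blocked for as long as its counter is non-zero is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_HASH_BUCKETS 64  /* hypothetical table size */
#define MAX_VCPUS 8         /* hypothetical vCPU limit for the sketch */

/* One outstanding fault: which vCPU faulted on which address, and when. */
struct fault_entry {
    uint64_t addr;
    int cpu;
    uint64_t fault_time;
    struct fault_entry *next;  /* chain of entries in the same bucket */
};

struct fault_tracker {
    struct fault_entry *buckets[NR_HASH_BUCKETS];
    unsigned pending[MAX_VCPUS];        /* concurrent faults per vCPU */
    uint64_t blocked_since[MAX_VCPUS];  /* set when pending goes 0 -> 1 */
    uint64_t blocktime[MAX_VCPUS];      /* total time with pending > 0 */
    uint64_t latency_total;             /* sum of per-fault resolve times */
};

/* Hash by page number, so all faults on one address share one chain. */
static unsigned hash_addr(uint64_t addr)
{
    return (unsigned)((addr >> 12) % NR_HASH_BUCKETS);
}

/* Record a fault: O(1) push onto the chain for this address. */
static int fault_begin(struct fault_tracker *t, uint64_t addr, int cpu,
                       uint64_t now)
{
    assert(cpu >= 0 && cpu < MAX_VCPUS);
    struct fault_entry *e = malloc(sizeof(*e));
    if (!e) {
        return -1;
    }
    unsigned h = hash_addr(addr);
    e->addr = addr;
    e->cpu = cpu;
    e->fault_time = now;
    e->next = t->buckets[h];
    t->buckets[h] = e;
    /* The vCPU may already have a fault in flight; only the first one
     * starts its blocked period, instead of overwriting a prior entry. */
    if (t->pending[cpu]++ == 0) {
        t->blocked_since[cpu] = now;
    }
    return 0;
}

/* Page at addr resolved: walk one chain and account for and remove every
 * entry waiting on addr. Returns the number of entries removed. */
static unsigned fault_end(struct fault_tracker *t, uint64_t addr, uint64_t now)
{
    unsigned removed = 0;
    struct fault_entry **pp = &t->buckets[hash_addr(addr)];

    while (*pp) {
        struct fault_entry *e = *pp;
        if (e->addr != addr) {
            pp = &e->next;
            continue;
        }
        t->latency_total += now - e->fault_time;
        if (--t->pending[e->cpu] == 0) {
            t->blocktime[e->cpu] += now - t->blocked_since[e->cpu];
        }
        *pp = e->next;
        free(e);
        removed++;
    }
    return removed;
}
```

Insert is O(1), and resolving a page walks only one chain. That walk becomes O(N) only when many vCPUs fault on the same address, which matches the worst case the commit message accepts.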

122720 Commits