Commit Graph

678 Commits

Author SHA1 Message Date
Oliver Upton
6836e1f30f Documentation: KVM: Use unordered list for pre-init VGIC registers
The intent was to create a single column table, however the markup used
was actually for a header which led to docs build failures:

  Sphinx parallel build error:
  docutils.utils.SystemMessage: Documentation/virt/kvm/devices/arm-vgic-v3.rst:128: (SEVERE/4) Unexpected section title or transition.

Fix the issue by converting the attempted table to an unordered list.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250729142217.0d4e64cd@canb.auug.org.au/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-ID: <20250729152242.3232229-1-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-29 13:43:50 -04:00
Paolo Bonzini
314b40b3b6 KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
    arm64, including support for interrupt routing, MSIs, interrupt
    translation and wired interrupts.
 
  - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
    GICv5 hardware, leveraging the legacy VGIC interface.
 
  - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
    userspace to disable support for SGIs w/o an active state on hardware
    that previously advertised it unconditionally.
 
  - Map supporting endpoints with cacheable memory attributes on systems
    with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
    maintenance on the address range.
 
  - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
    hypervisor to inject external aborts into an L2 VM and take traps of
    masked external aborts to the hypervisor.
 
  - Convert more system register sanitization to the config-driven
    implementation.
 
  - Fixes to the visibility of EL2 registers, namely making VGICv3 system
    registers accessible through the VGIC device instead of the ONE_REG
    vCPU ioctls.
 
  - Various cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou
 pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk=
 =r+sp
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Paolo Bonzini
1a14928e2e Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.17

 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
   guest.  Failure to honor FREEZE_IN_SMM can bleed host state into the guest.

 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
   prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.

 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.

 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.

 - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
   drop the dedicated "shadow" bitmaps (and their awful "max" size defines).

 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR.

 - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
   loosely related to FRED, but is supported and enumerated independently.

 - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
   a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON).  Use
   the same approach KVM uses for dealing with "impossible" emulation when
   running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
   has architecturally impossible state.

 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
   APERF/MPERF (with many caveats).

 - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
   frequency is unsupported for VMs with a "secure" TSC, and there's no known
   use case for changing the default frequency for other VM types.
2025-07-29 08:36:43 -04:00
Paolo Bonzini
65164fd0f6 KVM/riscv changes for 6.17
- Enabled ring-based dirty memory tracking
 - Improved perf kvm stat to report interrupt events
 - Delegate illegal instruction trap to VS-mode
 - MMU related improvements for KVM RISC-V for upcoming
   nested virtualization
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmiIr0QACgkQrUjsVaLH
 LAf4yA/+KkfQCgMFhwpml6tIzFaa9yS+C9oqemnlWVT/wTADg18/+hraomqUnYhY
 HqOPdWo6O3vmH6E0jR6+AQXz5f4whYl8X8m+HdBHz8c1rF6ozoLMF4Qh3lsDDuZ0
 7pdIEzkNLCA3umBxXUzy7yexKWSzcGD3521toFMPADQHaZTwT8Om5/KLblHB+he3
 1DzsvErPWknWW1pynvSUJoD4zB41Qn364sJvyq4tAW6i8DmxLAmM/+Reh7GBP83r
 t7nAYVdnYikFj0oCb60NcFHqOQpk88mZTqCPMeZD1BoazEDXCPkdx0J44NsBRjun
 BhEpgBLIZDIwOF1A/DDIPrNuNOjSeeUAsAY1sK/yVEOkVZ1HPndyCil5SkE1FJHT
 dmsJBXq96dTlXYo9jBfExFUaUCI1mivLbX7uziIT1876IgLr5NlEJSbwk+TQ8VmR
 IS1PISi7yp5LeZcJEh6PBYgo02UE9gQ/C3tvvcaHbxXyQjVacB6Dw7EnaBArYMwv
 dbEPkxXem90Vup90ixgdLBW3nGCDckpogbsmlqoxV5m3MOknE5L8IuWxKXKB56Tp
 pxp7o1JywBV+Ym5w1BkpTvyEL/a2VDGFOfjryq/h8NAVxXPfwk2plyNhrwLt+Naz
 dYF7RprQL3XTtPnVOND6mMHyl2Sz9y0dc2tj6mpGamSBWPxlP/w=
 =ifud
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.17-2' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.17

- Enabled ring-based dirty memory tracking
- Improved perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU related improvements for KVM RISC-V for upcoming
  nested virtualization
2025-07-29 08:33:04 -04:00
Quan Zhou
f55ffaf896 RISC-V: KVM: Enable ring-based dirty memory tracking
Enable ring-based dirty memory tracking on riscv:

- Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as riscv is weakly
  ordered.
- Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
  offset.
- Add a check to kvm_vcpu_kvm_riscv_check_vcpu_requests for checking
  whether the dirty ring is soft full.

To handle vCPU requests that cause exits to userspace, modified the
`kvm_riscv_check_vcpu_requests` to return a value (currently only
returns 0 or 1).

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20e116efb1f7aff211dd8e3cf8990c5521ed5f34.1749810735.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28 22:28:22 +05:30
Oliver Upton
0d46e324c0 Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next
* kvm-arm64/vgic-v4-ctl:
  : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
  :
  : Allow userspace to decide if support for SGIs without an active state is
  : advertised to the guest, allowing VMs from GICv3-only hardware to be
  : migrated to to GICv4.1 capable machines.
  Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
  KVM: arm64: selftests: Add test for nASSGIcap attribute
  KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
  KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
  KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
  KVM: arm64: Disambiguate support for vSGIs v. vLPIs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:11:38 -07:00
Oliver Upton
a7f49a9bf4 Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next
* kvm-arm64/el2-reg-visibility:
  : Fixes to EL2 register visibility, courtesy of Marc Zyngier
  :
  :  - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
  :    KVM_{GET,SET}_ONE_REG ioctls
  :
  :  - Condition visibility of FGT registers on the presence of FEAT_FGT in
  :    the VM
  KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
  KVM: arm64: Enforce the sorting of the GICv3 system register table
  KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
  KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
  KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: selftests: get-reg-list: Add base EL2 registers
  KVM: arm64: selftests: get-reg-list: Simplify feature dependency
  KVM: arm64: Advertise FGT2 registers to userspace
  KVM: arm64: Condition FGT registers on feature availability
  KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: Let GICv3 save/restore honor visibility attribute
  KVM: arm64: Define helper for ICH_VTR_EL2
  KVM: arm64: Define constant value for ICC_SRE_EL2
  KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
  KVM: arm64: Make RVBAR_EL2 accesses UNDEF

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:06:38 -07:00
Oliver Upton
eed9b14209 Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
KVM allows userspace to control GICD_IIDR.Revision and
GICD_TYPER2.nASSGIcap prior to initialization for the sake of
provisioning the guest-visible feature set. Document the userspace
expectations surrounding accesses to these registers.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:45:52 -07:00
Paolo Bonzini
4b7d440de2 KVM TDX fixes for 6.16
- Fix a formatting goof in the TDX documentation.
 
  - Reject KVM_SET_TSC_KHZ for guests with a protected TSC (currently only TDX).
 
  - Ensure struct kvm_tdx_capabilities fields that are not explicitly set by KVM
    are zeroed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmh5C9oACgkQOlYIJqCj
 N/1IlRAArAgX2C69qzTWougXFbwl0qiJhY3RpM+jCkiQsSkgICs8kTBSp5OFDRKE
 njtMvdCrhDSXGabRmOlTOjlbauZUU7Ady9Z9GIevVskTu/feknU3I33CQJMBUjCI
 49NG3o3p1vUWluK+WzzclHeojgAFVrPdoWZ3SLhgZf/N9s/hElI46bqpNN/zZOYd
 x2oDC1YJiRDoQmAF/hGDZxvQSZB7ZWvufUEi/lFomjSbWMecn26ynNaFe8jMdjyj
 cBXmsIn41qLMOdVZt+C5h7btcSar1uf0CZ4BfgZbKiAwZn6YH76rzBrAARP15rsI
 iKRfnOTaENx2SkKEX17Xlh1mOFKIMFP5wl2oQbW0pHMEObdUgIcwUlzOtKHmN/Nz
 dCN3W1sYat0fGec4lcx7emNmDDW8EfhmznS9eGo67okGKEbtUnzsDR01Ob335oVN
 F8ORqT1vthn2d3Na1FyAo3a5NKhL2/J67lNThDKzfdiQ7UxYeTjsFW4ys3Li6jDM
 H+LIcXpxLfEwbesbuCHAoyPbAcROFLuNgbTBr1CNm3zsfeqHzxeZAk7aUX6wbBV1
 +F7C7ANfvhae8OBkZoAgJJ3aJEbeloXboQUiijzM12l6Qz+shcpfqmIy5ZrZHQeI
 SKJhHlKDa11vy488sT23Pz1go23sQU83ZbB0x7sdcyOH+1dqCYQ=
 =V3cQ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.16-rc7' of https://github.com/kvm-x86/linux into HEAD

KVM TDX fixes for 6.16

 - Fix a formatting goof in the TDX documentation.

 - Reject KVM_SET_TSC_KHZ for guests with a protected TSC (currently only TDX).

 - Ensure struct kvm_tdx_capabilities fields that are not explicitly set by KVM
   are zeroed.
2025-07-17 17:06:13 +02:00
Marc Zyngier
f68df3aee7 KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
We never documented which GICv3 registers are available for save/restore
via the KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS interface.

Let's take the opportunity of adding the EL2 registers to document the whole
thing in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-12-maz@kernel.org
[ oliver: fix trailing whitespace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:25:40 -07:00
Paolo Bonzini
8a73c8dbb2 KVM: Documentation: document how KVM is tested
Proper testing greatly simplifies both patch development and review,
but it can be unclear what kind of userspace or guest support
should accompany new features. Clarify maintainer expectations
in terms of testing expectations; additionally, list the cases in
which open-source userspace support is pretty much a necessity and
its absence can only be mitigated by selftests.

While these ideas have long been followed implicitly by KVM contributors
and maintainers, formalize them in writing to provide consistent (though
not universal) guidelines.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-15 19:32:18 +02:00
Paolo Bonzini
83131d8469 KVM: Documentation: minimal updates to review-checklist.rst
While the file could stand a larger update, these are the bare minimum changes
needed to make it more widely applicable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-15 19:32:18 +02:00
Kai Huang
b24bbb534c KVM: x86: Reject KVM_SET_TSC_KHZ vCPU ioctl for TSC protected guest
Reject KVM_SET_TSC_KHZ vCPU ioctl if guest's TSC is protected and not
changeable by KVM, and update the documentation to reflect it.

For such TSC protected guests, e.g. TDX guests, typically the TSC is
configured once at VM level before any vCPU are created and remains
unchanged during VM's lifetime.  KVM provides the KVM_SET_TSC_KHZ VM
scope ioctl to allow the userspace VMM to configure the TSC of such VM.
After that the userspace VMM is not supposed to call the KVM_SET_TSC_KHZ
vCPU scope ioctl anymore when creating the vCPU.

The de facto userspace VMM Qemu does this for TDX guests.  The upcoming
SEV-SNP guests with Secure TSC should follow.

Note, TDX support hasn't been fully released as of the "buggy" commit,
i.e. there is no established ABI to break.

Fixes: adafea1106 ("KVM: x86: Add infrastructure for secure TSC")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/r/71bbdf87fdd423e3ba3a45b57642c119ee2dd98c.1752444335.git.kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-15 07:05:13 -07:00
Kai Huang
dcbe5a466c KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
Reject the KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created and
update the documentation to reflect it.

The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC
frequency that all subsequently created vCPUs can use.  It is only
intended to be called before any vCPU is created.  Allowing it to be
called after that only results in confusion but nothing good.

Note this is an ABI change.  But currently in Qemu (the de facto
userspace VMM) only TDX uses this VM ioctl, and it is only called once
before creating any vCPU, therefore the risk of breaking userspace is
pretty low.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/r/135a35223ce8d01cea06b6cef30bfe494ec85827.1752444335.git.kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-14 15:29:33 -07:00
Binbin Wu
073b3eca08 Documentation: KVM: Fix unexpected unindent warning
Add proper indentations to bullet list item to resolve the warning:
"Bullet list ends without a blank line; unexpected unindent."

Closes:https://lore.kernel.org/kvm/20250623162110.6e2f4241@canb.auug.org.au

Fixes: 4580dbef5c ("KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250701012536.1281367-1-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:47:23 -07:00
Jim Mattson
a7cec20845 KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
without interception.

The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
handled at all. The MSR values are not zeroed on vCPU creation, saved
on suspend, or restored on resume. No accommodation is made for
processor migration or for sharing a logical processor with other
tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
do not account for time the same way as the comparable PMU events,
whether the PMU is virtualized by the traditional emulation method or
the new mediated pass-through approach.

Nonetheless, in a properly constrained environment, this capability
can be combined with a guest CPUID table that advertises support for
CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
no performance cost for this capability.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09 09:33:37 -07:00
Ankit Agrawal
f55ce5a6cd KVM: arm64: Expose new KVM cap for cacheable PFNMAP
Introduce a new KVM capability to expose to the userspace whether
cacheable mapping of PFNMAP is supported.

The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the D cache, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cap would be false if
those requirements are missing and is checked by making use of
kvm_arch_supports_cacheable_pfnmap.

This capability would allow userspace to discover the support.
It could for instance be used by userspace to prevent live-migration
across FWB and non-FWB hosts.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-07 16:54:52 -07:00
Binbin Wu
0c84b53404 Documentation: KVM: Fix unexpected unindent warnings
Add proper indentations to bullet list items to resolve the warning:
"Bullet list ends without a blank line; unexpected unindent."

Closes:https://lore.kernel.org/kvm/20250623162110.6e2f4241@canb.auug.org.au/

Fixes: cf207eac06 ("KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>")
Fixes: 25e8b1dd48 ("KVM: TDX: Exit to userspace for GetTdVmCallInfo")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250625014829.82289-1-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25 06:29:45 -07:00
Paolo Bonzini
28224ef02b KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
Allow userspace to advertise TDG.VP.VMCALL subfunctions that the
kernel also supports.  For each output register of GetTdVmCallInfo's
leaf 1, add two fields to KVM_TDX_CAPABILITIES: one for kernel-supported
TDVMCALLs (userspace can set those blindly) and one for user-supported
TDVMCALLs (userspace can set those if it knows how to handle them).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 14:20:20 -04:00
Paolo Bonzini
4580dbef5c KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 14:09:50 -04:00
Binbin Wu
25e8b1dd48 KVM: TDX: Exit to userspace for GetTdVmCallInfo
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX,
to allow userspace to provide information about the support of
TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API.

GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>,
<ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>,
<Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and
<Instruction.WRMSR>. They must be supported by VMM to support TDX guests.

For GetTdVmCallInfo
- When leaf (r12) to enumerate TDVMCALL functionality is set to 0,
  successful execution indicates all GHCI base TDVMCALLs listed above are
  supported.

  Update the KVM TDX document with the set of the GHCI base APIs.

- When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it
  indicates the TDX guest is querying the supported TDVMCALLs beyond
  the GHCI base TDVMCALLs.
  Exit to userspace to let userspace set the TDVMCALL sub-function bit(s)
  accordingly to the leaf outputs.  KVM could set the TDVMCALL bit(s)
  supported by itself when the TDVMCALLs don't need support from userspace
  after returning from userspace and before entering guest. Currently, no
  such TDVMCALLs implemented, KVM just sets the values returned from
  userspace.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 13:55:47 -04:00
Binbin Wu
cf207eac06 KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
Handle TDVMCALL for GetQuote to generate a TD-Quote.

GetQuote is a doorbell-like interface used by TDX guests to request VMM
to generate a TD-Quote signed by a service hosting TD-Quoting Enclave
operating on the host.  A TDX guest passes a TD Report (TDREPORT_STRUCT) in
a shared-memory area as parameter.  Host VMM can access it and queue the
operation for a service hosting TD-Quoting enclave.  When completed, the
Quote is returned via the same shared-memory area.

KVM only checks the GPA from the TDX guest has the shared-bit set and drops
the shared-bit before exiting to userspace to avoid bleeding the shared-bit
into KVM's exit ABI.  KVM forwards the request to userspace VMM (e.g. QEMU)
and userspace VMM queues the operation asynchronously.  KVM sets the return
code according to the 'ret' field set by userspace to notify the TDX guest
whether the request has been queued successfully or not.  When the request
has been queued successfully, the TDX guest can poll the status field in
the shared-memory area to check whether the Quote generation is completed
or not.  When completed, the generated Quote is returned via the same
buffer.

Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is
required to handle the KVM exit reason as the initial support for TDX,
by reentering KVM to ensure that the TDVMCALL is complete.  While at it,
add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 13:09:32 -04:00
Linus Torvalds
cfc4ca8986 Notable changes:
- remove obsolete network transports
 
  - remove PCI IO port support
 
  - start adding seccomp-based process handling
    instead of ptrace
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhBYTIACgkQ10qiO8sP
 aACobQ//ZggBPinLNWXep4pcfK0/x1mx76cKVIpf1TSI6BpG1kQmkpOIxYDE6JTv
 yo1Ydoy7CMs+xxkDRpsm85qcq8BHhK4Ebfg/jYmRSCSKxtWEeNHJv3RmauQGAxym
 iGLR4Wd7dju0ywiOSAr66cZ0OYHKUbT2j4Vxybb8YG5sJ2s3YVJBYsiGJDtmjF9q
 ezySizAhW8KLScSiqWDruHUq7yEGWa8fp2RNPKT5WhOobZRAJI5upFNwHh0dINaK
 8Qntui4IgG922toBVS26g8ZwV6iJlBUsDttpWZEW1xFBxvxhWI5temW1LVBTvs8M
 mTCiKRd/oGwgtzNmWwXPzW7oJbBA/IlYtGognmaPgjwomyeGmWbnIWsB/1VV1QL4
 5+1+zGQzs8xnN2TsOkIQSiWEEkolreG8NFFY2PZPxiSH6lvkYvlin76DbA+HbmWR
 oU8GBKAwJmn15yxPuRRaCtUaVr4M+siIfBVp5NCgvlnc6scCWVdGlT9e59D6T886
 ZCY4O3UOzhzi9f0xCMx8+XVGjCPntlqLJJQCnSTrtS0+E7B78CxYNZRSLQ83HLa/
 ivDA3fu/rvBON/gRYqd1YDOy0NkRddDZLQEwiedRkRSI5TZdEDQZMnOFdqSDEd/D
 doWw8M3m6g5o2zTOF6XkU9Se1VhkkRDUgxQ+AqLCoMIoM3WVby8=
 =iHzS
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Johannes Berg:
 "The only really new thing is the long-standing seccomp work
  (originally from 2021!). Wven if it still isn't enabled by default due
  to security concerns it can still be used e.g. for tests.

   - remove obsolete network transports

   - remove PCI IO port support

   - start adding seccomp-based process handling instead of ptrace"

* tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits)
  um: remove "extern" from implementation of sigchld_handler
  um: fix unused variable warning
  um: fix SECCOMP 32bit xstate register restore
  um: pass FD for memory operations when needed
  um: Add SECCOMP support detection and initialization
  um: Implement kernel side of SECCOMP based process handling
  um: Track userspace children dying in SECCOMP mode
  um: Add helper functions to get/set state for SECCOMP
  um: Add stub side of SECCOMP/futex based process handling
  um: Move faultinfo extraction into userspace routine
  um: vector: Use mac_pton() for MAC address parsing
  um: vector: Clean up and modernize log messages
  um: chan_kern: use raw spinlock for irqs_to_free_lock
  MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER
  um: Fix tgkill compile error on old host OSes
  um: stop using PCI port I/O
  um: Remove legacy network transport infrastructure
  um: vector: Eliminate the dependency on uml_net
  um: Remove obsolete legacy network transports
  um/asm: Replace "REP; NOP" with PAUSE mnemonic
  ...
2025-06-05 11:45:33 -07:00
Linus Torvalds
c00b285024 hyperv-next for v6.16
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmg+jmETHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXiWaCACjYSQcCXW2nnZuWUnGMJq8HD5XGBAH
 tNYzOyp2Y4bXEJzfmbHv8UpJynGr3IFKybCnhm0uAQZCmiR5k4CfMvjPQXcJu9LK
 7yUI/dTGrRGG7f3NClWK2vXg7ATqzRGiPuPDk2lDcP04aQQWaUMDYe5SXIgcqKyZ
 cm2OVHapHGbQ7wA+xXGQcUBb6VJ5+BrQUVOqaEQyl4LURvjaQcn7rVDS0SmEi8gq
 42+KDVd8uWYos5dT57HIq9UI5og3PeTvAvHsx26eX8JWNqwXLgvxRH83kstK+GWY
 uG3sOm5yRbJvErLpJHnyBOlXDFNw2EBeLC1VyhdJXBR8RabgI+H/mrY3
 =4bTC
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel)

 - Fixes for Hyper-V UIO driver (Long Li)

 - Fixes for Hyper-V PCI driver (Michael Kelley)

 - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley)

 - Documentation updates for Hyper-V VMBus (Michael Kelley)

 - Enhance logging for hv_kvp_daemon (Shradha Gupta)

* tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits)
  Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests
  Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir
  Documentation: hyperv: Update VMBus doc with new features and info
  PCI: hv: Remove unnecessary flex array in struct pci_packet
  Drivers: hv: Remove hv_alloc/free_* helpers
  Drivers: hv: Use kzalloc for panic page allocation
  uio_hv_generic: Align ring size to system page
  uio_hv_generic: Use correct size for interrupt and monitor pages
  Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary
  arch/x86: Provide the CPU number in the wakeup AP callback
  x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()
  PCI: hv: Get vPCI MSI IRQ domain from DeviceTree
  ACPI: irq: Introduce acpi_get_gsi_dispatcher()
  Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
  Drivers: hv: vmbus: Get the IRQ number from DeviceTree
  dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties
  arm64, x86: hyperv: Report the VTL the system boots in
  arm64: hyperv: Initialize the Virtual Trust Level field
  Drivers: hv: Provide arch-neutral implementation of get_vtl()
  Drivers: hv: Enable VTL mode for arm64
  ...
2025-06-03 08:39:20 -07:00
Paolo Bonzini
4e02d4f973 KVM SVM changes for 6.16:
- Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to
    fix a race between AP destroy and VMRUN.
 
  - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM.
 
  - Add support for ALLOWED_SEV_FEATURES.
 
  - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y.
 
  - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features
    that utilize those bits.
 
  - Don't account temporary allocations in sev_send_update_data().
 
  - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmgwmwAACgkQOlYIJqCj
 N/1pHw//edW/x838POMeeCN8j39NBKErW9yZoQLhMbzogttRvfoba+xYY9zXyRFx
 8AXB8+2iLtb7pXUohc0eYN0mNqgD0SnoMLqGfn7nrkJafJSUAJHAoZn1Mdom1M1y
 jHvBPbHCMMsgdLV8wpDRqCNWTH+d5W0kcN5WjKwOswVLj1rybVfK7bSLMhvkk1e5
 RrOR4Ewf95/Ag2b36L4SvS1yG9fTClmKeGArMXhEXjy2INVSpBYyZMjVtjHiNzU9
 TjtB2RSM45O+Zl0T2fZdVW8LFhA6kVeL1v+Oo433CjOQE0LQff3Vl14GCANIlPJU
 PiWN/RIKdWkuxStIP3vw02eHzONCcg2GnNHzEyKQ1xW8lmrwzVRdXZzVsc2Dmowb
 7qGykBQ+wzoE0sMeZPA0k/QOSqg2vGxUQHjR7720loLV9m9Tu/mJnS9e179GJKgI
 e1ArSLwKmHpjwKZqU44IQVTZaxSC4Sg2kI670i21ChPgx8+oVkA6I0LFQXymx7uS
 2lbH+ovTlJSlP9fbaJhMwAU2wpSHAyXif/HPjdw2LTH3NdgXzfEnZfTlAWiP65LQ
 hnz5HvmUalW3x9kmzRmeDIAkDnAXhyt3ZQMvbNzqlO5AfS+Tqh4Ed5EFP3IrQAzK
 HQ+Gi0ip+B84t9Tbi6rfQwzTZEbSSOfYksC7TXqRGhNo/DvHumE=
 =k6rK
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.16:

 - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to
   fix a race between AP destroy and VMRUN.

 - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM.

 - Add support for ALLOWED_SEV_FEATURES.

 - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y.

 - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features
   that utilize those bits.

 - Don't account temporary allocations in sev_send_update_data().

 - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27 12:15:49 -04:00
Paolo Bonzini
1f7c9d52b1 KVM/riscv changes for 6.16
- Add vector registers to get-reg-list selftest
 - VCPU reset related improvements
 - Remove scounteren initialization from VCPU reset
 - Support VCPU reset from userspace using set_mpstate() ioctl
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmgx8nkACgkQrUjsVaLH
 LAdmYw//aoxlXA8OKFMWFzEC4yZHfq502X1p95Z9h1i8IYKoRcqNxX8O5MJt0y8Z
 dyhgqKLT5q5bpUf/yxJjuJdZQJtCTC0o1bx89sSiM83M2J+AZymcIS5yFVIcxetv
 JRsaWh8GDc/XYUFBnjYCj5zmA5IObbd/QCVnkATjVRcRS56LnRv98P9YPxN7zhHF
 OhDRWELtUowk6FMgiHo75R9vYhOO1ywzzjLFnK5ZHLSYeQ7bhRX/jY5n1BShsWjp
 k+zVRTpJobw9AU76EPQaEKuqiynQz35ZPkxAiuWYac6SDKwmztvGJ1fCnSNAkJLk
 0kN/eAxv61F5nCRJxOkxK0Uy3U94zA6zX53VdnLoRN4rYpA8CYrE4mNAddN31IBC
 NHlBS59w2EsUxIRL1FiCUrEKKgeSWqJY1NuqsmB4ogeo4MKm8n1OhiYSU0l7NnQ6
 3h77ccnHN95cah2C9XDX3GeZ+on6z1t6FjZ4Enki1w3CCfGdKMsfphwUckTNzSvw
 hTeXhYHcP4VKsYkCcitdLR/VFwO4a3HlnjAtHdJLh0qfJ5SergifZ5/eQXhHu7f1
 1uyfk/6nKculr/8yzUVkOR7kerMjBuBx8jir89ceKD8qeA+4MUI1rdwEvzbzaEY9
 9aDwEVbQ1qMjpeONfKUvbyLJ7uN5Lae1/X4Kafmo2TWNAvtLPLo=
 =+S8x
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.16-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.16

- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
2025-05-26 16:27:00 -04:00
Paolo Bonzini
4d526b02df KVM/arm64 updates for 6.16
* New features:
 
   - Add large stage-2 mapping support for non-protected pKVM guests,
     clawing back some performance.
 
   - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
     protected modes.
 
   - Enable nested virtualisation support on systems that support it
     (yes, it has been a long time coming), though it is disabled by
     default.
 
 * Improvements, fixes and cleanups:
 
   - Large rework of the way KVM tracks architecture features and links
     them with the effects of control bits. This ensures correctness of
     emulation (the data is automatically extracted from the published
     JSON files), and helps dealing with the evolution of the
     architecture.
 
   - Significant changes to the way pKVM tracks ownership of pages,
     avoiding page table walks by storing the state in the hypervisor's
     vmemmap. This in turn enables the THP support described above.
 
   - New selftest checking the pKVM ownership transition rules
 
   - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
     even if the host didn't have it.
 
   - Fixes for the address translation emulation, which happened to be
     rather buggy in some specific contexts.
 
   - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
     from the number of counters exposed to a guest and addressing a
     number of issues in the process.
 
   - Add a new selftest for the SVE host state being corrupted by a
     guest.
 
   - Keep HCR_EL2.xMO set at all times for systems running with the
     kernel at EL2, ensuring that the window for interrupts is slightly
     bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
 
   - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
     from a pretty bad case of TLB corruption unless accesses to HCR_EL2
     are heavily synchronised.
 
   - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
     tables in a human-friendly fashion.
 
   - and the usual random cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmgwU7UACgkQI9DQutE9
 ekN93g//fNnejxf01dBFIbuylzYEyHZSEH0iTGLeM+ES9zvntCzciTYVzb27oqNG
 RDLShlQYp3w4rAe6ORzyePyHptOmKXCxfj/VXUFp3A7H9QYOxt1nacD3WxI9fCOo
 LzaSLquvgwFBaeTdDE0KdeTUKQHluId+w1Azh0lnHGeUP+lOHNZ8FqoP1/la0q04
 GvVL+l3wz/IhPP8r1YA0Q1bzJ5SLfSpjIw/0F5H/xgI4lyYdHzgFL8sKuSyFeCyM
 2STQi+ZnTCsAs4bkXkw2Pp9CFYrfQgZi+sf7Om+noAKhbJo3vb7/RHpgjv+QCjJy
 Kx4g9CbxHfaM03cH6uSLBoFzsACR1iAuUz8BCSRvvVNH4RVT6H+34nzjLZXLncrP
 gm1uYs9aMTLr91caeAx0aYIMWGYa1uqV0rum3WxyIHezN9Q/NuQoZyfprUufr8oX
 wCYE+ot4VT3DwG0UFZKKwj0BiCbYcbph9nBLVyZJsg8OKxpvspkCtPriFp1kb6BP
 dTTGSXd9JJqwSgP9qJLxijcv6Nfgp2gT42TWwh/dJRZXhnTCvr9IyclFIhoIIq3G
 Q2BkFCXOoEoNQhBA1tiWzJ9nDHf52P72Z2K1gPyyMZwF49HGa2BZBCJGkqX06wSs
 Riolf1/cjFhDno1ThiHKsHT0sG1D4oc9k/1NLq5dyNAEGcgATIA=
 =Jju3
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.16

* New features:

  - Add large stage-2 mapping support for non-protected pKVM guests,
    clawing back some performance.

  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
    protected modes.

  - Enable nested virtualisation support on systems that support it
    (yes, it has been a long time coming), though it is disabled by
    default.

* Improvements, fixes and cleanups:

  - Large rework of the way KVM tracks architecture features and links
    them with the effects of control bits. This ensures correctness of
    emulation (the data is automatically extracted from the published
    JSON files), and helps dealing with the evolution of the
    architecture.

  - Significant changes to the way pKVM tracks ownership of pages,
    avoiding page table walks by storing the state in the hypervisor's
    vmemmap. This in turn enables the THP support described above.

  - New selftest checking the pKVM ownership transition rules

  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
    even if the host didn't have it.

  - Fixes for the address translation emulation, which happened to be
    rather buggy in some specific contexts.

  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
    from the number of counters exposed to a guest and addressing a
    number of issues in the process.

  - Add a new selftest for the SVE host state being corrupted by a
    guest.

  - Keep HCR_EL2.xMO set at all times for systems running with the
    kernel at EL2, ensuring that the window for interrupts is slightly
    bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
    from a pretty bad case of TLB corruption unless accesses to HCR_EL2
    are heavily synchronised.

  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
    tables in a human-friendly fashion.

  - and the usual random cleanups.
2025-05-26 16:19:46 -04:00
Paolo Bonzini
2bb0e39885 Documentation: virt/kvm: remove unreferenced footnote
Replace it with just the URL.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-26 16:12:01 -04:00
Michael Kelley
f77276d1c9 Documentation: hyperv: Update VMBus doc with new features and info
Starting in the 6.15 kernel, VMBus interrupts are automatically
assigned away from a CPU that is being taken offline. Add documentation
describing this case.

Also add details of Hyper-V behavior when the primary channel of
a VMBus device is closed as the result of unbinding the device's
driver. This behavior has not changed, but it was not previously
documented.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250520044435.7734-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250520044435.7734-1-mhklinux@outlook.com>
2025-05-23 16:30:56 +00:00
Marc Zyngier
7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Radim Krčmář
5b9db9c16f RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
Add a toggleable VM capability to reset the VCPU from userspace by
setting MP_STATE_INIT_RECEIVED through IOCTL.

Reset through a mp_state to avoid adding a new IOCTL.
Do not reset on a transition from STOPPED to RUNNABLE, because it's
better to avoid side effects that would complicate userspace adoption.
The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
the VCPU while preserving the original mp_state -- because we wouldn't
gain much from having a new state it in the rest of KVM, but it's a very
non-standard use of the IOCTL.

Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21 09:34:57 +05:30
Manali Shukla
89f9edf4c6 KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs
Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs with Bus Lock
Threshold, which is close enough to VMX's Bus Lock Detection VM-Exit to
allow reusing KVM_CAP_X86_BUS_LOCK_EXIT.

The biggest difference between the two features is that Threshold is
fault-like, whereas Detection is trap-like.  To allow the guest to make
forward progress, Threshold provides a per-VMCB counter which is
decremented every time a bus lock occurs, and a VM-Exit is triggered if
and only if the counter is '0'.

To provide Detection-like semantics, initialize the counter to '0', i.e.
exit on every bus lock, and when re-executing the guilty instruction, set
the counter to '1' to effectively step past the instruction.

Note, in the unlikely scenario that re-executing the instruction doesn't
trigger a bus lock, e.g. because the guest has changed memory types or
patched the guilty instruction, the bus lock counter will be left at '1',
i.e. the guest will be able to do a bus lock on a different instruction.
In a perfect world, KVM would ensure the counter is '0' if the guest has
made forward progress, e.g. if RIP has changed.  But trying to close that
hole would incur non-trivial complexity, for marginal benefit; the intent
of KVM_CAP_X86_BUS_LOCK_EXIT is to allow userspace rate-limit bus locks,
not to allow for precise detection of problematic guest code.  And, it's
simply not feasible to fully close the hole, e.g. if an interrupt arrives
before the original instruction can re-execute, the guest could step past
a different bus lock.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
Link: https://lore.kernel.org/r/20250502050346.14274-5-manali.shukla@amd.com
[sean: fix typo in comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-19 11:05:10 -07:00
Marc Zyngier
29d1697c8c KVM: arm64: Document NV caps and vcpu flags
Describe the two new vcpu flags that control NV, together with
the capabilities that advertise them.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Tiwei Bie
e619e18ed4 um: Remove legacy network transport infrastructure
All legacy network transports have been removed. Vector transports
provide the same capabilities with significantly higher network
throughput. There is no reason to keep the legacy network transport
infrastructure anymore. Remove it to reduce the maintenance burden.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20250503051710.3286595-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-05 10:26:59 +02:00
Tiwei Bie
65eaac591b um: Remove obsolete legacy network transports
These legacy network transports were marked as obsolete in commit
40814b98a5 ("um: Mark non-vector net transports as obsolete").
More than five years have passed since then. Remove these network
transports to reduce the maintenance burden.

Suggested-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://patch.msgid.link/20250503051710.3286595-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-05 10:26:59 +02:00
Marc Zyngier
67bd641517 Merge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/next
* kvm-arm64/nv-pmu-fixes:
  : .
  : Fixes for NV PMU emulation. From the cover letter:
  :
  : "Joey reports that some of his PMU tests do not behave quite as
  : expected:
  :
  : - MDCR_EL2.HPMN is set to 0 out of reset
  :
  : - PMCR_EL0.P should reset all the counters when written from EL2
  :
  : Oliver points out that setting PMCR_EL0.N from userspace by writing to
  : the register is silly with NV, and that we need a new PMU attribute
  : instead.
  :
  : On top of that, I figured out that we had a number of little gotchas:
  :
  : - It is possible for a guest to write an HPMN value that is out of
  :   bound, and it seems valuable to limit it
  :
  : - PMCR_EL0.N should be the maximum number of counters when read from
  :   EL2, and MDCR_EL2.HPMN when read from EL0/EL1
  :
  : - Prevent userspace from updating PMCR_EL0.N when EL2 is available"
  : .
  KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N
  KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN
  KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2
  KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs
  KVM: arm64: Contextualise the handling of PMCR_EL0.P writes
  KVM: arm64: Fix MDCR_EL2.HPMN reset value
  KVM: arm64: Repaint pmcr_n into nr_pmu_counters

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-29 13:38:21 +01:00
Marc Zyngier
b7628c7973 KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs
As long as we had purely EL1 VMs, we could easily update the number
of guest-visible counters by letting userspace write to PMCR_EL0.N.

With VMs started at EL2, PMCR_EL1.N only reflects MDCR_EL2.HPMN,
and we don't have a good way to limit it.

For this purpose, introduce a new PMUv3 attribute that allows
limiting the maximum number of counters. This requires the explicit
selection of a PMU.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-11 13:08:23 +01:00
Paolo Bonzini
fd02aa45bd Merge branch 'kvm-tdx-initial' into HEAD
This large commit contains the initial support for TDX in KVM.  All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
  ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
  different EPT roots, and the private half is managed by the TDX module.
  Using the support that was added to the generic MMU code in 6.14,
  add support for TDX's secure page tables to the Intel side of KVM.
  Generic KVM code takes care of maintaining a mirror of the secure page
  tables so that they can be queried efficiently, and ensuring that changes
  are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
  vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
  of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
  userspace.  These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
  but are triggered through a different mechanism, similar to VMGEXIT for
  SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
  handling VM-Exits that are caused by vectored events.  Exclusive to
  TDX are machine-check SMIs, which the kernel already knows how to
  handle through the kernel machine check handler (commit 7911f145de,
  "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
  EPT violation/misconfig and several TDVMCALL leaves that are handled in
  the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
  an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-07 07:36:33 -04:00
Paolo Bonzini
269a2c3663 Documentation: kvm: remove KVM_CAP_MIPS_TE
Trap and emulate virtualization is not available anymore for MIPS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:32:17 -04:00
Paolo Bonzini
af339282e2 Documentation: kvm: organize capabilities in the right section
Categorize the capabilities correctly.  Section 6 is for enabled vCPU
capabilities; section 7 is for enabled VM capabilities; section 8 is
for informational ones.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:32:17 -04:00
Paolo Bonzini
ed7974fd59 Documentation: kvm: fix some definition lists
Ensure that they have a ":" in front of the defined item.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:32:17 -04:00
Paolo Bonzini
2f313018de Documentation: kvm: drop "Capability" heading from capabilities
It is redundant, and sometimes wrong.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:32:17 -04:00
Paolo Bonzini
26cb30f22f Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE
The capability is incorrectly called KVM_CAP_PPC_MULTITCE in the documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:32:17 -04:00
Paolo Bonzini
f3e555ba45 Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE
TSC_DEADLINE is now advertised unconditionally by KVM_GET_SUPPORTED_CPUID,
since commit 9be4ec35d6 ("KVM: x86: Advertise TSC_DEADLINE_TIMER in
KVM_GET_SUPPORTED_CPUID", 2024-12-18).  Adjust the documentation to
reflect the new behavior.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250331150550.510320-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:24:05 -04:00
Paolo Bonzini
0afd104fb3 KVM/arm64 updates for 6.15
- Nested virtualization support for VGICv3, giving the nested
    hypervisor control of the VGIC hardware when running an L2 VM
 
  - Removal of 'late' nested virtualization feature register masking,
    making the supported feature set directly visible to userspace
 
  - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
    of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
 
  - Paravirtual interface for discovering the set of CPU implementations
    where a VM may run, addressing a longstanding issue of guest CPU
    errata awareness in big-little systems and cross-implementation VM
    migration
 
  - Userspace control of the registers responsible for identifying a
    particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
    allowing VMs to be migrated cross-implementation
 
  - pKVM updates, including support for tracking stage-2 page table
    allocations in the protected hypervisor in the 'SecPageTable' stat
 
  - Fixes to vPMU, ensuring that userspace updates to the vPMU after
    KVM_RUN are reflected into the backing perf events
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ9s9gBccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFp6LAQCOQ1Fidp8RT1NdhLLAhW5D4gLe
 MNT619R4qfqu64ZpeQEAidHMAYaGRk5KDNBq6Jn+awcJnwCcMnh2ok0vTOjz3gY=
 =RC6A
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.15

 - Nested virtualization support for VGICv3, giving the nested
   hypervisor control of the VGIC hardware when running an L2 VM

 - Removal of 'late' nested virtualization feature register masking,
   making the supported feature set directly visible to userspace

 - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
   of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

 - Paravirtual interface for discovering the set of CPU implementations
   where a VM may run, addressing a longstanding issue of guest CPU
   errata awareness in big-little systems and cross-implementation VM
   migration

 - Userspace control of the registers responsible for identifying a
   particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
   allowing VMs to be migrated cross-implementation

 - pKVM updates, including support for tracking stage-2 page table
   allocations in the protected hypervisor in the 'SecPageTable' stat

 - Fixes to vPMU, ensuring that userspace updates to the vPMU after
   KVM_RUN are reflected into the backing perf events
2025-03-20 12:54:12 -04:00
Oliver Upton
4f2774c57a Merge branch 'kvm-arm64/writable-midr' into kvmarm/next
* kvm-arm64/writable-midr:
  : Writable implementation ID registers, courtesy of Sebastian Ott
  :
  : Introduce a new capability that allows userspace to set the
  : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
  : and AIDR_EL1. Also plug a hole in KVM's trap configuration where
  : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
  : support SME.
  KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
  KVM: arm64: Copy guest CTR_EL0 into hyp VM
  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
  KVM: arm64: Allow userspace to change the implementation ID registers
  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
  KVM: arm64: Maintain per-VM copy of implementation ID regs
  KVM: arm64: Set HCR_EL2.TID1 unconditionally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:32 -07:00
Oliver Upton
d300b0168e Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/next
* kvm-arm64/pv-cpuid:
  : Paravirtualized implementation ID, courtesy of Shameer Kolothum
  :
  : Big-little has historically been a pain in the ass to virtualize. The
  : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim
  : of vCPU scheduling. This can be particularly annoying when the guest
  : needs to know the underlying implementation to mitigate errata.
  :
  : "Hyperscalers" face a similar scheduling problem, where VMs may freely
  : migrate between hosts in a pool of heterogenous hardware. And yes, our
  : server-class friends are equally riddled with errata too.
  :
  : In absence of an architected solution to this wart on the ecosystem,
  : introduce support for paravirtualizing the implementation exposed
  : to a VM, allowing the VMM to describe the pool of implementations that a
  : VM may be exposed to due to scheduling/migration.
  :
  : Userspace is expected to intercept and handle these hypercalls using the
  : SMCCC filter UAPI, should it choose to do so.
  smccc: kvm_guest: Fix kernel builds for 32 bit arm
  KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
  smccc/kvm_guest: Enable errata based on implementation CPUs
  arm64: Make  _midr_in_range_list() an exported function
  KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
  KVM: arm64: Specify hypercall ABI for retrieving target implementations
  arm64: Modify _midr_range() functions to read MIDR/REVIDR internally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:53:16 -07:00
Oliver Upton
56e3e5c8f7 Merge branch 'kvm-arm64/nv-vgic' into kvmarm/next
* kvm-arm64/nv-vgic:
  : NV VGICv3 support, courtesy of Marc Zyngier
  :
  : Support for emulating the GIC hypervisor controls and managing shadow
  : VGICv3 state for the L1 hypervisor. As part of it, bring in support for
  : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
  KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
  KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
  KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
  KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
  KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
  KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
  KVM: arm64: nv: Add Maintenance Interrupt emulation
  KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
  KVM: arm64: nv: Nested GICv3 emulation
  KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
  KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
  KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
  KVM: arm64: nv: Load timer before the GIC
  arm64: sysreg: Add layout for ICH_MISR_EL2
  arm64: sysreg: Add layout for ICH_VTR_EL2
  arm64: sysreg: Add layout for ICH_HCR_EL2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:51:43 -07:00
Paolo Bonzini
3ecf162a31 KVM Xen changes for 6.15
- Don't write to the Xen hypercall page on MSR writes that are initiated by
    the host (userspace or KVM) to fix a class of bugs where KVM can write to
    guest memory at unexpected times, e.g. during vCPU creation if userspace has
    set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
 
  - Restrict the Xen hypercall MSR indx to the unofficial synthetic range to
    reduce the set of possible collisions with MSRs that are emulated by KVM
    (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
    in the synthetic range).
 
  - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
 
  - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
    entries when updating PV clocks, as there is no guarantee PV clocks will be
    updated between TSC frequency changes and CPUID emulation, and guest reads
    of Xen TSC should be rare, i.e. are not a hot path.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZpO4ACgkQOlYIJqCj
 N/3AMQ/+J4+yOslekq4DHYhZaTvJFf0MqhPgTuf2s6I5p449JWn9rebqK2w0M9Xj
 fJy7/rboQA4QflBuhTiWcC3Dl1lYtxUqqtcCH9608eqKhbeay87OfV0/vgMwWBRs
 FhcOcp1587esJj5gz5M5R9i3S5Yq7Q4fp1+DmS23X41Zz5nTb2q80MY5UklMgI9I
 Ydaw1liB8rRHWbdt9yM4UsI8k4fMuj0PE8pEapoTSfsZm8J4cG9qHKrvuWjuFSCF
 l18Hyl11nWq8eZ5Vg2E2UIz0EgtWIHKu1/fi4av20/JTuA8Mon15WC5q4BBmDDdD
 keR9OJLYclVBh8KweiJSTUE6PcD9A8pWmoWyp6aGRiyyUVhbwysYTzT7uytwQz6w
 RH/vVHe0o/m19SnD9rqsRVObc7dOGorFXScMcf4Qxoq9yQm2p0lJDvq6c9uECLMV
 RIfZrXe9HS67RB9INybS+1fVlLcd0bLgGfG7q9lWLEABD45HpM5daQ4Mlf8+MIE0
 V7egx9t69/WALbJka8pWNISeFRKkB1LRjite+XXasqJ0iFeneM8UKFVB4OMtXL9g
 M0m8ovvySySMkoCq3yMlKxXh4rJ1/D556/bAaJBukMPWFWX9FQaP33U3FuzId7jH
 ztZVugViQMNiIbQVgUSAcgpuJvgpttAciACODlaw2u2Bk1Txmn0=
 =c3Wt
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-xen-6.15' of https://github.com/kvm-x86/linux into HEAD

KVM Xen changes for 6.15

 - Don't write to the Xen hypercall page on MSR writes that are initiated by
   the host (userspace or KVM) to fix a class of bugs where KVM can write to
   guest memory at unexpected times, e.g. during vCPU creation if userspace has
   set the Xen hypercall MSR index to collide with an MSR that KVM emulates.

 - Restrict the Xen hypercall MSR indx to the unofficial synthetic range to
   reduce the set of possible collisions with MSRs that are emulated by KVM
   (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
   in the synthetic range).

 - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.

 - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
   entries when updating PV clocks, as there is no guarantee PV clocks will be
   updated between TSC frequency changes and CPUID emulation, and guest reads
   of Xen TSC should be rare, i.e. are not a hot path.
2025-03-19 09:14:59 -04:00
Isaku Yamahata
52f52ea79a Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)
Add documentation to Intel Trusted Domain Extensions (TDX) support.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-21-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-14 14:20:58 -04:00