mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-07 04:49:18 +00:00
Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
switch a memory area between guest_memfd and regular anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new guest_memfd
and page attributes infrastructure. This is mostly useful for testing,
since there is no pKVM-like infrastructure to provide a meaningfully
reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually care
about whether the caller is a reader or a writer.
- Let Xen guests opt out of having PV clock reported as "based on a stable TSC",
because some of them don't expect the "TSC stable" bit (added to the pvclock
ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- On AMD machines with vNMI, always rely on hardware instead of intercepting
IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fails to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
in which KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertently enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargeting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is deleted/moved due to a missing flag
in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix the
various bugs that were lurking due to lack of said annotation.
There are two non-KVM patches buried in the middle of guest_memfd support:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be
resized. guest_memfd files do however support PUNCH_HOLE, which can
be used to switch a memory area between guest_memfd and regular
anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that
guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new
guest_memfd and page attributes infrastructure. This is mostly
useful for testing, since there is no pKVM-like infrastructure to
provide a meaningfully reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages
during CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in
non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually
care about whether the caller is a reader or a writer.
- Let Xen guests opt out of having PV clock reported as "based on a
stable TSC", because some of them don't expect the "TSC stable" bit
(added to the pvclock ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for
TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM
always flushes on nested transitions, i.e. always satisfies flush
requests. This allows running bleeding edge versions of VMware
Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV
support.
- On AMD machines with vNMI, always rely on hardware instead of
intercepting IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fails to stop/reset counters
and other state prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events
using a dedicated field instead of snapshotting the "previous"
counter. If the hardware PMC count triggers overflow that is
recognized in the same VM-Exit in which KVM manually bumps an event
count, KVM would pend PMIs for both the hardware-triggered overflow
and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertently enabled by non-KVM developers, which can be
problematic for subsystems that require no regressions for W=1
builds.
- Advertise all of the host-supported CPUID bits that enumerate
IA32_SPEC_CTRL "features".
- Don't force a masterclock update when a vCPU synchronizes to the
current TSC generation, as updating the masterclock can cause
kvmclock's time to "jump" unexpectedly, e.g. when userspace
hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter
fault paths, partly as a super minor optimization, but mostly to
make KVM play nice with position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the
code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
"emulation" at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with a prefix
branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargeting the NV support to
that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list
selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is deleted/moved due to a missing
flag in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix
the various bugs that were lurking due to lack of said annotation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
x86/kvm: Do not try to disable kvmclock if it was not enabled
KVM: x86: add missing "depends on KVM"
KVM: fix direction of dependency on MMU notifiers
KVM: introduce CONFIG_KVM_COMMON
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
RISC-V: KVM: selftests: Add get-reg-list test for STA registers
RISC-V: KVM: selftests: Add steal_time test support
RISC-V: KVM: selftests: Add guest_sbi_probe_extension
RISC-V: KVM: selftests: Move sbi_ecall to processor.c
RISC-V: KVM: Implement SBI STA extension
RISC-V: KVM: Add support for SBI STA registers
RISC-V: KVM: Add support for SBI extension registers
RISC-V: KVM: Add SBI STA info to vcpu_arch
RISC-V: KVM: Add steal-update vcpu request
RISC-V: KVM: Add SBI STA extension skeleton
RISC-V: paravirt: Implement steal-time support
RISC-V: Add SBI STA extension definitions
RISC-V: paravirt: Add skeleton for pv-time support
RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
...
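
The first Generic item above ("Use memdup_array_user() to harden against overflow") concerns copying an array whose element count comes from an untrusted caller: if `n * size` wraps around, the allocation ends up smaller than the data the caller expects to fit. As a rough userspace illustration of that idea only, not the kernel helper itself, the sketch below checks the multiplication before allocating. The function name `dup_array_checked` is hypothetical, and `__builtin_mul_overflow` is a GCC/Clang builtin assumed to be available.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of an overflow-checked array copy.
 * Returns a heap copy of n elements of `size` bytes each, or NULL with
 * errno set to EOVERFLOW if n * size does not fit in a size_t.
 */
static void *dup_array_checked(const void *src, size_t n, size_t size)
{
	size_t bytes;
	void *copy;

	/* GCC/Clang builtin: returns true if the multiplication wrapped. */
	if (__builtin_mul_overflow(n, size, &bytes)) {
		errno = EOVERFLOW;
		return NULL;
	}

	/* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
	copy = malloc(bytes ? bytes : 1);
	if (!copy) {
		errno = ENOMEM;
		return NULL;
	}

	memcpy(copy, src, bytes);
	return copy;
}
```

A caller would pass the element count and element size separately, e.g. `dup_array_checked(entries, nr_entries, sizeof(*entries))`, rather than multiplying them at the call site, so that the overflow check cannot be skipped.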
| | | |
|---|---|---|
| e820 | ||
| fpu | ||
| numachip | ||
| shared | ||
| trace | ||
| uv | ||
| vdso | ||
| xen | ||
| acenv.h | ||
| acpi.h | ||
| acrn.h | ||
| agp.h | ||
| alternative.h | ||
| amd_hsmp.h | ||
| amd_nb.h | ||
| amd-ibs.h | ||
| apic.h | ||
| apicdef.h | ||
| apm.h | ||
| arch_hweight.h | ||
| archrandom.h | ||
| asm-offsets.h | ||
| asm-prototypes.h | ||
| asm.h | ||
| atomic64_32.h | ||
| atomic64_64.h | ||
| atomic.h | ||
| audit.h | ||
| barrier.h | ||
| bios_ebda.h | ||
| bitops.h | ||
| boot.h | ||
| bootparam_utils.h | ||
| bug.h | ||
| bugs.h | ||
| cache.h | ||
| cacheflush.h | ||
| cacheinfo.h | ||
| ce4100.h | ||
| cfi.h | ||
| checksum_32.h | ||
| checksum_64.h | ||
| checksum.h | ||
| clocksource.h | ||
| cmdline.h | ||
| cmpxchg_32.h | ||
| cmpxchg_64.h | ||
| cmpxchg.h | ||
| coco.h | ||
| compat.h | ||
| cpu_device_id.h | ||
| cpu_entry_area.h | ||
| cpu.h | ||
| cpufeature.h | ||
| cpufeatures.h | ||
| cpuid.h | ||
| cpuidle_haltpoll.h | ||
| cpumask.h | ||
| crash_core.h | ||
| crash.h | ||
| current.h | ||
| debugreg.h | ||
| delay.h | ||
| desc_defs.h | ||
| desc.h | ||
| device.h | ||
| disabled-features.h | ||
| div64.h | ||
| dma-mapping.h | ||
| dma.h | ||
| dmi.h | ||
| doublefault.h | ||
| dwarf2.h | ||
| edac.h | ||
| efi.h | ||
| elf.h | ||
| elfcore-compat.h | ||
| emergency-restart.h | ||
| emulate_prefix.h | ||
| enclu.h | ||
| entry-common.h | ||
| espfix.h | ||
| exec.h | ||
| extable_fixup_types.h | ||
| extable.h | ||
| fb.h | ||
| fixmap.h | ||
| floppy.h | ||
| frame.h | ||
| fsgsbase.h | ||
| ftrace.h | ||
| futex.h | ||
| gart.h | ||
| GEN-for-each-reg.h | ||
| genapic.h | ||
| geode.h | ||
| gsseg.h | ||
| hardirq.h | ||
| highmem.h | ||
| hpet.h | ||
| hugetlb.h | ||
| hw_breakpoint.h | ||
| hw_irq.h | ||
| hyperv_timer.h | ||
| hyperv-tlfs.h | ||
| hypervisor.h | ||
| i8259.h | ||
| ia32_unistd.h | ||
| ia32.h | ||
| ibt.h | ||
| idtentry.h | ||
| imr.h | ||
| inat_types.h | ||
| inat.h | ||
| init.h | ||
| insn-eval.h | ||
| insn.h | ||
| inst.h | ||
| intel_ds.h | ||
| intel_pconfig.h | ||
| intel_pt.h | ||
| intel_punit_ipc.h | ||
| intel_scu_ipc.h | ||
| intel_telemetry.h | ||
| intel-family.h | ||
| intel-mid.h | ||
| invpcid.h | ||
| io_apic.h | ||
| io_bitmap.h | ||
| io.h | ||
| iomap.h | ||
| iommu.h | ||
| iosf_mbi.h | ||
| irq_remapping.h | ||
| irq_stack.h | ||
| irq_vectors.h | ||
| irq_work.h | ||
| irq.h | ||
| irqdomain.h | ||
| irqflags.h | ||
| ist.h | ||
| jailhouse_para.h | ||
| jump_label.h | ||
| kasan.h | ||
| kaslr.h | ||
| kbdleds.h | ||
| Kbuild | ||
| kdebug.h | ||
| kexec-bzimage64.h | ||
| kexec.h | ||
| kfence.h | ||
| kgdb.h | ||
| kmsan.h | ||
| kprobes.h | ||
| kvm_host.h | ||
| kvm_page_track.h | ||
| kvm_para.h | ||
| kvm_types.h | ||
| kvm_vcpu_regs.h | ||
| kvm-x86-ops.h | ||
| kvm-x86-pmu-ops.h | ||
| kvmclock.h | ||
| linkage.h | ||
| local.h | ||
| mach_timer.h | ||
| mach_traps.h | ||
| math_emu.h | ||
| mc146818rtc.h | ||
| mce.h | ||
| mem_encrypt.h | ||
| memtype.h | ||
| microcode.h | ||
| misc.h | ||
| mman.h | ||
| mmconfig.h | ||
| mmu_context.h | ||
| mmu.h | ||
| mmzone_32.h | ||
| mmzone_64.h | ||
| mmzone.h | ||
| module.h | ||
| mpspec_def.h | ||
| mpspec.h | ||
| mshyperv.h | ||
| msi.h | ||
| msr-index.h | ||
| msr-trace.h | ||
| msr.h | ||
| mtrr.h | ||
| mwait.h | ||
| nmi.h | ||
| nops.h | ||
| nospec-branch.h | ||
| numa_32.h | ||
| numa.h | ||
| olpc_ofw.h | ||
| olpc.h | ||
| orc_header.h | ||
| orc_lookup.h | ||
| orc_types.h | ||
| page_32_types.h | ||
| page_32.h | ||
| page_64_types.h | ||
| page_64.h | ||
| page_types.h | ||
| page.h | ||
| paravirt_api_clock.h | ||
| paravirt_types.h | ||
| paravirt.h | ||
| parport.h | ||
| pc-conf-reg.h | ||
| pci_x86.h | ||
| pci-direct.h | ||
| pci-functions.h | ||
| pci.h | ||
| percpu.h | ||
| perf_event_p4.h | ||
| perf_event.h | ||
| pgalloc.h | ||
| pgtable_32_areas.h | ||
| pgtable_32_types.h | ||
| pgtable_32.h | ||
| pgtable_64_types.h | ||
| pgtable_64.h | ||
| pgtable_areas.h | ||
| pgtable_types.h | ||
| pgtable-2level_types.h | ||
| pgtable-2level.h | ||
| pgtable-3level_types.h | ||
| pgtable-3level.h | ||
| pgtable-invert.h | ||
| pgtable.h | ||
| pkeys.h | ||
| pkru.h | ||
| platform_sst_audio.h | ||
| pm-trace.h | ||
| posix_types.h | ||
| preempt.h | ||
| probe_roms.h | ||
| processor-cyrix.h | ||
| processor-flags.h | ||
| processor.h | ||
| prom.h | ||
| proto.h | ||
| pti.h | ||
| ptrace.h | ||
| purgatory.h | ||
| pvclock-abi.h | ||
| pvclock.h | ||
| qrwlock.h | ||
| qspinlock_paravirt.h | ||
| qspinlock.h | ||
| realmode.h | ||
| reboot_fixups.h | ||
| reboot.h | ||
| required-features.h | ||
| resctrl.h | ||
| rmwcc.h | ||
| seccomp.h | ||
| sections.h | ||
| segment.h | ||
| serial.h | ||
| set_memory.h | ||
| setup_arch.h | ||
| setup.h | ||
| sev-common.h | ||
| sev.h | ||
| sgx.h | ||
| shmparam.h | ||
| shstk.h | ||
| sigcontext.h | ||
| sigframe.h | ||
| sighandling.h | ||
| signal.h | ||
| simd.h | ||
| smap.h | ||
| smp.h | ||
| softirq_stack.h | ||
| sparsemem.h | ||
| spec-ctrl.h | ||
| special_insns.h | ||
| spinlock_types.h | ||
| spinlock.h | ||
| sta2x11.h | ||
| stackprotector.h | ||
| stacktrace.h | ||
| static_call.h | ||
| string_32.h | ||
| string_64.h | ||
| string.h | ||
| suspend_32.h | ||
| suspend_64.h | ||
| suspend.h | ||
| svm.h | ||
| switch_to.h | ||
| sync_bitops.h | ||
| sync_core.h | ||
| syscall_wrapper.h | ||
| syscall.h | ||
| syscalls.h | ||
| tdx.h | ||
| text-patching.h | ||
| thermal.h | ||
| thread_info.h | ||
| time.h | ||
| timer.h | ||
| timex.h | ||
| tlb.h | ||
| tlbbatch.h | ||
| tlbflush.h | ||
| topology.h | ||
| trace_clock.h | ||
| trap_pf.h | ||
| trapnr.h | ||
| traps.h | ||
| tsc.h | ||
| uaccess_32.h | ||
| uaccess_64.h | ||
| uaccess.h | ||
| umip.h | ||
| unaccepted_memory.h | ||
| unistd.h | ||
| unwind_hints.h | ||
| unwind.h | ||
| uprobes.h | ||
| user32.h | ||
| user_32.h | ||
| user_64.h | ||
| user.h | ||
| vdso.h | ||
| vermagic.h | ||
| vga.h | ||
| vgtod.h | ||
| vm86.h | ||
| vmalloc.h | ||
| vmware.h | ||
| vmx.h | ||
| vmxfeatures.h | ||
| vsyscall.h | ||
| vvar.h | ||
| word-at-a-time.h | ||
| x86_init.h | ||
| xor_32.h | ||
| xor_64.h | ||
| xor_avx.h | ||
| xor.h | ||