mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-24 04:40:49 +00:00
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its
guest
- Optimization for vSGI injection, opportunistically compressing
MPIDR to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems,
reducing the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
LoongArch:
- New architecture for kvm.
The hardware uses the same model as x86, s390 and RISC-V, where
guest/host mode is orthogonal to supervisor/user mode. The
virtualization extensions are very similar to MIPS, therefore the
code also has some similarities but it's been cleaned up to avoid
some of the historical bogosities that are found in arch/mips. The
kernel emulates MMU, timer and CSR accesses, while interrupt
controllers are only emulated in userspace, at least for now.
RISC-V:
- Support for the Smstateen and Zicond extensions
- Support for virtualizing senvcfg
- Support for virtualized SBI debug console (DBCN)
S390:
- Nested page table management can be monitored through tracepoints
and statistics
x86:
- Fix incorrect handling of VMX posted interrupt descriptor in
KVM_SET_LAPIC, which could result in a dropped timer IRQ
- Avoid WARN on systems with Intel IPI virtualization
- Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
without forcing more common use cases to eat the extra memory
overhead.
- Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
SBPB, aka Selective Branch Predictor Barrier).
- Fix a bug where restoring a vCPU snapshot that was taken within 1
second of creating the original vCPU would cause KVM to try to
synchronize the vCPU's TSC and thus clobber the correct TSC being
set by userspace.
- Compute guest wall clock using a single TSC read to avoid
generating an inaccurate time, e.g. if the vCPU is preempted
between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
complain about a "Firmware Bug" if the bit isn't set for select
F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
appease Windows Server 2022.
- Don't apply side effects to Hyper-V's synthetic timer on writes
from userspace to fix an issue where the auto-enable behavior can
trigger spurious interrupts, i.e. do auto-enabling only for guest
writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the
dirty log without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as
appropriate.
- Harden the fast page fault path to guard against encountering an
invalid root when walking SPTEs.
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
- Use the fast path directly from the timer callback when delivering
Xen timer events, instead of waiting for the next iteration of the
run loop. This was not done so far because previously proposed code
had races, but now care is taken to stop the hrtimer at critical
points such as restarting the timer or saving the timer information
for userspace.
- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
flag.
- Optimize injection of PMU interrupts that are simultaneous with
NMIs.
- Usual handful of fixes for typos and other warts.
x86 - MTRR/PAT fixes and optimizations:
- Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
- Zap EPT entries when non-coherent DMA assignment stops/starts to
prevent using stale entries with the wrong memtype.
- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
This was done as a workaround for virtual machine BIOSes that did
not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
to set it, in turn), and there's zero reason to extend the quirk to
also ignore guest PAT.
x86 - SEV fixes:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
SHUTDOWN while running an SEV-ES guest.
- Clean up the recognition of emulation failures on SEV guests, when
KVM would like to "skip" the instruction but it had already been
partially emulated. This makes it possible to drop a hack that
second guessed the (insufficient) information provided by the
emulator, and just do the right thing.
Documentation:
- Various updates and fixes, mostly for x86
- MTRR and PAT fixes and optimizations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
KVM: arm64: selftest: Add the missing .guest_prepare()
KVM: arm64: Always invalidate TLB for stage-2 permission faults
KVM: x86: Service NMI requests after PMI requests in VM-Enter path
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
KVM: arm64: Refine _EL2 system register list that require trap reinjection
arm64: Add missing _EL2 encodings
arm64: Add missing _EL12 encodings
KVM: selftests: aarch64: vPMU test for validating user accesses
KVM: selftests: aarch64: vPMU register test for unimplemented counters
KVM: selftests: aarch64: vPMU register test for implemented counters
KVM: selftests: aarch64: Introduce vpmu_counter_access test
tools: Import arm_pmuv3.h
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
...
344 lines
10 KiB
C
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Copyright (C) 2012,2013 - ARM Ltd
 * Author: Marc Zyngier <marc.zyngier@arm.com>
 */

#ifndef __ARM64_KVM_MMU_H__
#define __ARM64_KVM_MMU_H__

#include <asm/page.h>
#include <asm/memory.h>
#include <asm/mmu.h>
#include <asm/cpufeature.h>

/*
 * As ARMv8.0 only has the TTBR0_EL2 register, we cannot express
 * "negative" addresses. This makes it impossible to directly share
 * mappings with the kernel.
 *
 * Instead, give the HYP mode its own VA region at a fixed offset from
 * the kernel by just masking the top bits (which are all ones for a
 * kernel address). We need to find out how many bits to mask.
 *
 * We want to build a set of page tables that cover both parts of the
 * idmap (the trampoline page used to initialize EL2), and our normal
 * runtime VA space, at the same time.
 *
 * Given that the kernel uses VA_BITS for its entire address space,
 * and that half of that space (VA_BITS - 1) is used for the linear
 * mapping, we can also limit the EL2 space to (VA_BITS - 1).
 *
 * The main question is "Within the VA_BITS space, does EL2 use the
 * top or the bottom half of that space to shadow the kernel's linear
 * mapping?". As we need to idmap the trampoline page, this is
 * determined by the range in which this page lives.
 *
 * If the page is in the bottom half, we have to use the top half. If
 * the page is in the top half, we have to use the bottom half:
 *
 * T = __pa_symbol(__hyp_idmap_text_start)
 * if (T & BIT(VA_BITS - 1))
 *	HYP_VA_MIN = 0  //idmap in upper half
 * else
 *	HYP_VA_MIN = 1 << (VA_BITS - 1)
 * HYP_VA_MAX = HYP_VA_MIN + (1 << (VA_BITS - 1)) - 1
 *
 * When using VHE, there are no separate hyp mappings and all KVM
 * functionality is already mapped as part of the main kernel
 * mappings, and none of this applies in that case.
 */

#ifdef __ASSEMBLY__

#include <asm/alternative.h>

/*
 * Convert a kernel VA into a HYP VA.
 * reg: VA to be converted.
 *
 * The actual code generation takes place in kvm_update_va_mask, and
 * the instructions below are only there to reserve the space and
 * perform the register allocation (kvm_update_va_mask uses the
 * specific registers encoded in the instructions).
 */
.macro kern_hyp_va	reg
#ifndef __KVM_VHE_HYPERVISOR__
alternative_cb ARM64_ALWAYS_SYSTEM, kvm_update_va_mask
	and	\reg, \reg, #1		/* mask with va_mask */
	ror	\reg, \reg, #1		/* rotate to the first tag bit */
	add	\reg, \reg, #0		/* insert the low 12 bits of the tag */
	add	\reg, \reg, #0, lsl 12	/* insert the top 12 bits of the tag */
	ror	\reg, \reg, #63		/* rotate back */
alternative_cb_end
#endif
.endm

/*
 * Convert a hypervisor VA to a PA
 * reg: hypervisor address to be converted in place
 * tmp: temporary register
 */
.macro hyp_pa reg, tmp
	ldr_l	\tmp, hyp_physvirt_offset
	add	\reg, \reg, \tmp
.endm

/*
 * Convert a hypervisor VA to a kernel image address
 * reg: hypervisor address to be converted in place
 * tmp: temporary register
 *
 * The actual code generation takes place in kvm_get_kimage_voffset, and
 * the instructions below are only there to reserve the space and
 * perform the register allocation (kvm_get_kimage_voffset uses the
 * specific registers encoded in the instructions).
 */
.macro hyp_kimg_va reg, tmp
	/* Convert hyp VA -> PA. */
	hyp_pa	\reg, \tmp

	/* Load kimage_voffset. */
alternative_cb ARM64_ALWAYS_SYSTEM, kvm_get_kimage_voffset
	movz	\tmp, #0
	movk	\tmp, #0, lsl #16
	movk	\tmp, #0, lsl #32
	movk	\tmp, #0, lsl #48
alternative_cb_end

	/* Convert PA -> kimg VA. */
	add	\reg, \reg, \tmp
.endm

#else

#include <linux/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/cache.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>

void kvm_update_va_mask(struct alt_instr *alt,
			__le32 *origptr, __le32 *updptr, int nr_inst);
void kvm_compute_layout(void);
void kvm_apply_hyp_relocations(void);

#define __hyp_pa(x) (((phys_addr_t)(x)) + hyp_physvirt_offset)

static __always_inline unsigned long __kern_hyp_va(unsigned long v)
{
#ifndef __KVM_VHE_HYPERVISOR__
	asm volatile(ALTERNATIVE_CB("and %0, %0, #1\n"
				    "ror %0, %0, #1\n"
				    "add %0, %0, #0\n"
				    "add %0, %0, #0, lsl 12\n"
				    "ror %0, %0, #63\n",
				    ARM64_ALWAYS_SYSTEM,
				    kvm_update_va_mask)
		     : "+r" (v));
#endif
	return v;
}

#define kern_hyp_va(v)	((typeof(v))(__kern_hyp_va((unsigned long)(v))))

/*
 * We currently support using a VM-specified IPA size. For backward
 * compatibility, the default IPA size is fixed to 40bits.
 */
#define KVM_PHYS_SHIFT	(40)

#define kvm_phys_shift(mmu)		VTCR_EL2_IPA((mmu)->vtcr)
#define kvm_phys_size(mmu)		(_AC(1, ULL) << kvm_phys_shift(mmu))
#define kvm_phys_mask(mmu)		(kvm_phys_size(mmu) - _AC(1, ULL))

#include <asm/kvm_pgtable.h>
#include <asm/stage2_pgtable.h>

int kvm_share_hyp(void *from, void *to);
void kvm_unshare_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int __create_hyp_mappings(unsigned long start, unsigned long size,
			  unsigned long phys, enum kvm_pgtable_prot prot);
int hyp_alloc_private_va_range(size_t size, unsigned long *haddr);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
			   void __iomem **kaddr,
			   void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
			     void **haddr);
int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr);
void __init free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
void kvm_uninit_stage2_mmu(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
			  phys_addr_t pa, unsigned long size, bool writable);

int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);

phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
int __init kvm_mmu_init(u32 *hyp_va_bits);

static inline void *__kvm_vector_slot2addr(void *base,
					   enum arm64_hyp_spectre_vector slot)
{
	int idx = slot - (slot != HYP_VECTOR_DIRECT);

	return base + (idx * SZ_2K);
}

struct kvm;

#define kvm_flush_dcache_to_poc(a,l)	\
	dcache_clean_inval_poc((unsigned long)(a), (unsigned long)(a)+(l))

static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
	u64 cache_bits = SCTLR_ELx_M | SCTLR_ELx_C;
	int reg;

	if (vcpu_is_el2(vcpu))
		reg = SCTLR_EL2;
	else
		reg = SCTLR_EL1;

	return (vcpu_read_sys_reg(vcpu, reg) & cache_bits) == cache_bits;
}

static inline void __clean_dcache_guest_page(void *va, size_t size)
{
	/*
	 * With FWB, we ensure that the guest always accesses memory using
	 * cacheable attributes, and we don't have to clean to PoC when
	 * faulting in pages. Furthermore, FWB implies IDC, so cleaning to
	 * PoU is not required either in this case.
	 */
	if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
		return;

	kvm_flush_dcache_to_poc(va, size);
}

static inline size_t __invalidate_icache_max_range(void)
{
	u8 iminline;
	u64 ctr;

	asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
				    "movk %0, #0, lsl #16\n"
				    "movk %0, #0, lsl #32\n"
				    "movk %0, #0, lsl #48\n",
				    ARM64_ALWAYS_SYSTEM,
				    kvm_compute_final_ctr_el0)
		     : "=r" (ctr));

	iminline = SYS_FIELD_GET(CTR_EL0, IminLine, ctr) + 2;
	return MAX_DVM_OPS << iminline;
}

static inline void __invalidate_icache_guest_page(void *va, size_t size)
{
	/*
	 * VPIPT I-cache maintenance must be done from EL2. See comment in the
	 * nVHE flavor of __kvm_tlb_flush_vmid_ipa().
	 */
	if (icache_is_vpipt() && read_sysreg(CurrentEL) != CurrentEL_EL2)
		return;

	/*
	 * Blow the whole I-cache if it is aliasing (i.e. VIPT) or the
	 * invalidation range exceeds our arbitrary limit on invalidations by
	 * cache line.
	 */
	if (icache_is_aliasing() || size > __invalidate_icache_max_range())
		icache_inval_all_pou();
	else
		icache_inval_pou((unsigned long)va, (unsigned long)va + size);
}

void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

static inline unsigned int kvm_get_vmid_bits(void)
{
	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);

	return get_vmid_bits(reg);
}

/*
 * We are not in the kvm->srcu critical section most of the time, so we take
 * the SRCU read lock here. Since we copy the data from the user page, we
 * can immediately drop the lock again.
 */
static inline int kvm_read_guest_lock(struct kvm *kvm,
				      gpa_t gpa, void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_read_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);

	return ret;
}

static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa,
				       const void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_write_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);

	return ret;
}

#define kvm_phys_to_vttbr(addr)		phys_to_ttbr(addr)

/*
 * When this is (directly or indirectly) used on the TLB invalidation
 * path, we rely on a previously issued DSB so that page table updates
 * and VMID reads are correctly ordered.
 */
static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
{
	struct kvm_vmid *vmid = &mmu->vmid;
	u64 vmid_field, baddr;
	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;

	baddr = mmu->pgd_phys;
	vmid_field = atomic64_read(&vmid->id) << VTTBR_VMID_SHIFT;
	vmid_field &= VTTBR_VMID_MASK(kvm_arm_vmid_bits);
	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
}

/*
 * Must be called from hyp code running at EL2 with an updated VTTBR
 * and interrupts disabled.
 */
static __always_inline void __load_stage2(struct kvm_s2_mmu *mmu,
					  struct kvm_arch *arch)
{
	write_sysreg(mmu->vtcr, vtcr_el2);
	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);

	/*
	 * ARM errata 1165522 and 1530923 require the actual execution of the
	 * above before we can switch to the EL1/EL0 translation regime used by
	 * the guest.
	 */
	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
}

static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
{
	return container_of(mmu->arch, struct kvm, arch);
}
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
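
Note on the HYP VA layout comment near the top of this header: the rule it describes (EL2 takes whichever half of the VA_BITS space does not contain the idmap trampoline page) can be tried out with a small standalone userspace sketch. This is not kernel code. `struct hyp_va_range`, `hyp_va_range_for()` and the sample addresses are hypothetical names chosen for illustration, and the kernel's real layout computation (see kvm_compute_layout(), declared in this header) may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
struct hyp_va_range {
	uint64_t min;
	uint64_t max;
};

/*
 * Sketch of the rule from the layout comment: if the idmap page's
 * physical address has bit (va_bits - 1) set, the page lives in the
 * upper half, so HYP uses the lower half, and vice versa.
 */
static struct hyp_va_range hyp_va_range_for(uint64_t idmap_pa,
					    unsigned int va_bits)
{
	struct hyp_va_range r;
	uint64_t half;

	/* Keep the shift below well defined. */
	assert(va_bits >= 2 && va_bits <= 63);

	half = UINT64_C(1) << (va_bits - 1);
	r.min = (idmap_pa & half) ? 0 : half;
	r.max = r.min + half - 1;
	return r;
}
```

With `va_bits = 48`, an idmap page at `0x40000000` (bottom half) yields the range `0x800000000000` to `0xffffffffffff`, while an idmap page whose address has bit 47 set yields `0` to `0x7fffffffffff`.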
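
Note on kvm_get_vttbr() above: it builds the VTTBR value by OR-ing the stage-2 page table base, the VMID field (shifted, then masked to the supported VMID width) and the CnP bit. The following userspace sketch mirrors that arithmetic. The `SKETCH_*` constants (VMID field starting at bit 48, CnP in bit 0) and the `sketch_*` function names are assumptions made for illustration; the kernel takes the real values from its own headers, and this sketch treats phys_to_ttbr() as the identity, which is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_VTTBR_VMID_SHIFT	48
#define SKETCH_VTTBR_CNP_BIT	UINT64_C(1)

/* Mask covering a VMID field of the given width. */
static uint64_t sketch_vmid_mask(unsigned int vmid_bits)
{
	return ((UINT64_C(1) << vmid_bits) - 1) << SKETCH_VTTBR_VMID_SHIFT;
}

/* Combine base address, VMID and CnP the way kvm_get_vttbr() does. */
static uint64_t sketch_vttbr(uint64_t pgd_phys, uint64_t vmid,
			     unsigned int vmid_bits, int has_cnp)
{
	uint64_t vmid_field;

	/* This sketch only models 8-bit and 16-bit VMIDs. */
	assert(vmid_bits == 8 || vmid_bits == 16);

	vmid_field = vmid << SKETCH_VTTBR_VMID_SHIFT;
	vmid_field &= sketch_vmid_mask(vmid_bits);
	return pgd_phys | vmid_field | (has_cnp ? SKETCH_VTTBR_CNP_BIT : 0);
}
```

For example, `sketch_vttbr(0x80000000, 0x12, 8, 1)` gives `0x0012000080000001`. Passing a VMID wider than `vmid_bits` shows the effect of the mask: the excess high bits are dropped instead of spilling into the rest of the register.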