Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its
guest
- Optimization for vSGI injection, opportunistically compressing
MPIDR to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems,
reducing the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
LoongArch:
- New architecture for kvm.
The hardware uses the same model as x86, s390 and RISC-V, where
guest/host mode is orthogonal to supervisor/user mode. The
virtualization extensions are very similar to MIPS, therefore the
code also has some similarities but it's been cleaned up to avoid
some of the historical bogosities that are found in arch/mips. The
kernel emulates MMU, timer and CSR accesses, while interrupt
controllers are only emulated in userspace, at least for now.
RISC-V:
- Support for the Smstateen and Zicond extensions
- Support for virtualizing senvcfg
- Support for virtualized SBI debug console (DBCN)
S390:
- Nested page table management can be monitored through tracepoints
and statistics
x86:
- Fix incorrect handling of VMX posted interrupt descriptor in
KVM_SET_LAPIC, which could result in a dropped timer IRQ
- Avoid WARN on systems with Intel IPI virtualization
- Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
without forcing more common use cases to eat the extra memory
overhead.
- Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
SBPB, aka Selective Branch Predictor Barrier).
- Fix a bug where restoring a vCPU snapshot that was taken within 1
second of creating the original vCPU would cause KVM to try to
synchronize the vCPU's TSC and thus clobber the correct TSC being
set by userspace.
- Compute guest wall clock using a single TSC read to avoid
generating an inaccurate time, e.g. if the vCPU is preempted
between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
complain about a "Firmware Bug" if the bit isn't set for select
F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
appease Windows Server 2022.
- Don't apply side effects to Hyper-V's synthetic timer on writes
from userspace to fix an issue where the auto-enable behavior can
trigger spurious interrupts, i.e. do auto-enabling only for guest
writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the
dirty log without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as
appropriate.
- Harden the fast page fault path to guard against encountering an
invalid root when walking SPTEs.
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
- Use the fast path directly from the timer callback when delivering
Xen timer events, instead of waiting for the next iteration of the
run loop. This was not done so far because previously proposed code
had races, but now care is taken to stop the hrtimer at critical
points such as restarting the timer or saving the timer information
for userspace.
- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
flag.
- Optimize injection of PMU interrupts that are simultaneous with
NMIs.
- Usual handful of fixes for typos and other warts.
x86 - MTRR/PAT fixes and optimizations:
- Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
- Zap EPT entries when non-coherent DMA assignment stops/starts to
prevent using stale entries with the wrong memtype.
- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
This was done as a workaround for virtual machine BIOSes that did
not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
to set it, in turn), and there's zero reason to extend the quirk to
also ignore guest PAT.
x86 - SEV fixes:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
SHUTDOWN while running an SEV-ES guest.
- Clean up the recognition of emulation failures on SEV guests, when
KVM would like to "skip" the instruction but it had already been
partially emulated. This makes it possible to drop a hack that
second guessed the (insufficient) information provided by the
emulator, and just do the right thing.
Documentation:
- Various updates and fixes, mostly for x86
- MTRR and PAT fixes and optimizations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
KVM: arm64: selftest: Add the missing .guest_prepare()
KVM: arm64: Always invalidate TLB for stage-2 permission faults
KVM: x86: Service NMI requests after PMI requests in VM-Enter path
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
KVM: arm64: Refine _EL2 system register list that require trap reinjection
arm64: Add missing _EL2 encodings
arm64: Add missing _EL12 encodings
KVM: selftests: aarch64: vPMU test for validating user accesses
KVM: selftests: aarch64: vPMU register test for unimplemented counters
KVM: selftests: aarch64: vPMU register test for implemented counters
KVM: selftests: aarch64: Introduce vpmu_counter_access test
tools: Import arm_pmuv3.h
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
...
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Based on arch/arm/include/asm/tlbflush.h
 *
 * Copyright (C) 1999-2003 Russell King
 * Copyright (C) 2012 ARM Ltd.
 */
#ifndef __ASM_TLBFLUSH_H
#define __ASM_TLBFLUSH_H

#ifndef __ASSEMBLY__

#include <linux/bitfield.h>
#include <linux/mm_types.h>
#include <linux/sched.h>
#include <linux/mmu_notifier.h>
#include <asm/cputype.h>
#include <asm/mmu.h>

/*
 * Raw TLBI operations.
 *
 * Where necessary, use the __tlbi() macro to avoid asm()
 * boilerplate. Drivers and most kernel code should use the TLB
 * management routines in preference to the macro below.
 *
 * The macro can be used as __tlbi(op) or __tlbi(op, arg), depending
 * on whether a particular TLBI operation takes an argument or
 * not. The macro handles invoking the asm with or without the
 * register argument as appropriate.
 */
#define __TLBI_0(op, arg) asm (ARM64_ASM_PREAMBLE                             \
                               "tlbi " #op "\n"                               \
                   ALTERNATIVE("nop\n nop",                                   \
                               "dsb ish\n tlbi " #op,                         \
                               ARM64_WORKAROUND_REPEAT_TLBI,                  \
                               CONFIG_ARM64_WORKAROUND_REPEAT_TLBI)           \
                            : : )

#define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE                             \
                               "tlbi " #op ", %0\n"                           \
                   ALTERNATIVE("nop\n nop",                                   \
                               "dsb ish\n tlbi " #op ", %0",                  \
                               ARM64_WORKAROUND_REPEAT_TLBI,                  \
                               CONFIG_ARM64_WORKAROUND_REPEAT_TLBI)           \
                            : : "r" (arg))

#define __TLBI_N(op, arg, n, ...) __TLBI_##n(op, arg)

#define __tlbi(op, ...)         __TLBI_N(op, ##__VA_ARGS__, 1, 0)

#define __tlbi_user(op, arg) do {                                       \
        if (arm64_kernel_unmapped_at_el0())                             \
                __tlbi(op, (arg) | USER_ASID_FLAG);                     \
} while (0)
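
/*
 * Illustrative usage of the raw macros, as seen later in this header:
 *
 *      __tlbi(vmalle1is);              // no-operand form
 *      __tlbi(vale1is, addr);          // form taking an encoded VA operand
 *      __tlbi_user(vale1is, addr);     // repeat for the user ASID when the
 *                                      // kernel is unmapped at EL0 (KPTI)
 */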

/* This macro creates a properly formatted VA operand for the TLBI */
#define __TLBI_VADDR(addr, asid)                                \
        ({                                                      \
                unsigned long __ta = (addr) >> 12;              \
                __ta &= GENMASK_ULL(43, 0);                     \
                __ta |= (unsigned long)(asid) << 48;            \
                __ta;                                           \
        })
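
/*
 * Illustrative encoding: the operand carries the virtual page number
 * (VA >> 12) in bits [43:0] and the ASID in bits [63:48], which is the
 * layout expected by the address-based TLBI instructions used below
 * (e.g. VALE1IS, VAAE1IS).
 */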

/*
 * Get translation granule of the system, which is decided by
 * PAGE_SIZE.  Used by TTL.
 *  - 4KB  : 1
 *  - 16KB : 2
 *  - 64KB : 3
 */
#define TLBI_TTL_TG_4K          1
#define TLBI_TTL_TG_16K         2
#define TLBI_TTL_TG_64K         3

static inline unsigned long get_trans_granule(void)
{
        switch (PAGE_SIZE) {
        case SZ_4K:
                return TLBI_TTL_TG_4K;
        case SZ_16K:
                return TLBI_TTL_TG_16K;
        case SZ_64K:
                return TLBI_TTL_TG_64K;
        default:
                return 0;
        }
}

/*
 * Level-based TLBI operations.
 *
 * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
 * the level at which the invalidation must take place. If the level is
 * wrong, no invalidation may take place. In the case where the level
 * cannot be easily determined, a 0 value for the level parameter will
 * perform a non-hinted invalidation.
 *
 * For Stage-2 invalidation, use the level values provided to that effect
 * in asm/stage2_pgtable.h.
 */
#define TLBI_TTL_MASK           GENMASK_ULL(47, 44)

#define __tlbi_level(op, addr, level) do {                              \
        u64 arg = addr;                                                 \
                                                                        \
        if (alternative_has_cap_unlikely(ARM64_HAS_ARMv8_4_TTL) &&     \
            level) {                                                    \
                u64 ttl = level & 3;                                    \
                ttl |= get_trans_granule() << 2;                        \
                arg &= ~TLBI_TTL_MASK;                                  \
                arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);                  \
        }                                                               \
                                                                        \
        __tlbi(op, arg);                                                \
} while(0)

#define __tlbi_user_level(op, arg, level) do {                          \
        if (arm64_kernel_unmapped_at_el0())                             \
                __tlbi_level(op, (arg | USER_ASID_FLAG), level);        \
} while (0)
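
/*
 * Worked example of the TTL hint: invalidating a last-level (level 3) entry
 * on a 4KiB-granule kernel gives ttl = (TLBI_TTL_TG_4K << 2) | 3 = 0x7,
 * which __tlbi_level() places into bits [47:44] of the TLBI operand.
 */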

/*
 * This macro creates a properly formatted VA operand for the TLB RANGE.
 * The value bit assignments are:
 *
 *      +----------+------+-------+-------+-------+----------------------+
 *      |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
 *      +-----------------+-------+-------+-------+----------------------+
 *      |63      48|47  46|45   44|43   39|38   37|36                   0|
 *
 * The address range is determined by the formula below:
 * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
 *
 */
#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)         \
        ({                                                      \
                unsigned long __ta = (addr) >> PAGE_SHIFT;      \
                __ta &= GENMASK_ULL(36, 0);                     \
                __ta |= (unsigned long)(ttl) << 37;             \
                __ta |= (unsigned long)(num) << 39;             \
                __ta |= (unsigned long)(scale) << 44;           \
                __ta |= get_trans_granule() << 46;              \
                __ta |= (unsigned long)(asid) << 48;            \
                __ta;                                           \
        })

/* These macros are used by the TLBI RANGE feature. */
#define __TLBI_RANGE_PAGES(num, scale)  \
        ((unsigned long)((num) + 1) << (5 * (scale) + 1))
#define MAX_TLBI_RANGE_PAGES            __TLBI_RANGE_PAGES(31, 3)
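
/*
 * Worked example: MAX_TLBI_RANGE_PAGES = __TLBI_RANGE_PAGES(31, 3)
 * = 32 << 16 = 2M pages, i.e. 8GB of address space with 4KiB pages.
 * Ranges of at least this many pages fall back to a full ASID flush in
 * __flush_tlb_range() below.
 */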

/*
 * Generate 'num' values from -1 to 30 with -1 rejected by the
 * __flush_tlb_range() loop below.
 */
#define TLBI_RANGE_MASK                 GENMASK_ULL(4, 0)
#define __TLBI_RANGE_NUM(pages, scale)  \
        ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
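
/*
 * For instance, with pages = 512 and scale = 0, __TLBI_RANGE_NUM() yields
 * ((512 >> 1) & 0x1f) - 1 = -1, so the range loop skips scale 0; at
 * scale = 1 it yields (512 >> 6) - 1 = 7, and a single range TLBI then
 * covers __TLBI_RANGE_PAGES(7, 1) = 8 << 6 = 512 pages.
 */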

/*
 * TLB Invalidation
 * ================
 *
 *      This header file implements the low-level TLB invalidation routines
 *      (sometimes referred to as "flushing" in the kernel) for arm64.
 *
 *      Every invalidation operation uses the following template:
 *
 *      DSB ISHST       // Ensure prior page-table updates have completed
 *      TLBI ...        // Invalidate the TLB
 *      DSB ISH         // Ensure the TLB invalidation has completed
 *      if (invalidated kernel mappings)
 *              ISB     // Discard any instructions fetched from the old mapping
 *
 *
 *      The following functions form part of the "core" TLB invalidation API,
 *      as documented in Documentation/core-api/cachetlb.rst:
 *
 *      flush_tlb_all()
 *              Invalidate the entire TLB (kernel + user) on all CPUs
 *
 *      flush_tlb_mm(mm)
 *              Invalidate an entire user address space on all CPUs.
 *              The 'mm' argument identifies the ASID to invalidate.
 *
 *      flush_tlb_range(vma, start, end)
 *              Invalidate the virtual-address range '[start, end)' on all
 *              CPUs for the user address space corresponding to 'vma->mm'.
 *              Note that this operation also invalidates any walk-cache
 *              entries associated with translations for the specified address
 *              range.
 *
 *      flush_tlb_kernel_range(start, end)
 *              Same as flush_tlb_range(..., start, end), but applies to
 *              kernel mappings rather than a particular user address space.
 *              Whilst not explicitly documented, this function is used when
 *              unmapping pages from vmalloc/io space.
 *
 *      flush_tlb_page(vma, addr)
 *              Invalidate a single user mapping for address 'addr' in the
 *              address space corresponding to 'vma->mm'. Note that this
 *              operation only invalidates a single, last-level page-table
 *              entry and therefore does not affect any walk-caches.
 *
 *
 *      Next, we have some undocumented invalidation routines that you probably
 *      don't want to call unless you know what you're doing:
 *
 *      local_flush_tlb_all()
 *              Same as flush_tlb_all(), but only applies to the calling CPU.
 *
 *      __flush_tlb_kernel_pgtable(addr)
 *              Invalidate a single kernel mapping for address 'addr' on all
 *              CPUs, ensuring that any walk-cache entries associated with the
 *              translation are also invalidated.
 *
 *      __flush_tlb_range(vma, start, end, stride, last_level)
 *              Invalidate the virtual-address range '[start, end)' on all
 *              CPUs for the user address space corresponding to 'vma->mm'.
 *              The invalidation operations are issued at a granularity
 *              determined by 'stride' and only affect any walk-cache entries
 *              if 'last_level' is equal to false.
 *
 *
 *      Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
 *      on top of these routines, since that is our interface to the mmu_gather
 *      API as used by munmap() and friends.
 */
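
/*
 * Typical caller pattern when tearing down a single user PTE (a sketch only;
 * the real callers live in the core mm code, not in this header):
 *
 *      pte = ptep_get_and_clear(mm, addr, ptep);
 *      flush_tlb_page(vma, addr);
 *
 * flush_tlb_page() then provides the template above: DSB ISHST to publish
 * the page-table update, a VALE1IS TLBI for the address/ASID, and DSB ISH
 * to wait for completion on all CPUs.
 */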

static inline void local_flush_tlb_all(void)
{
        dsb(nshst);
        __tlbi(vmalle1);
        dsb(nsh);
        isb();
}

static inline void flush_tlb_all(void)
{
        dsb(ishst);
        __tlbi(vmalle1is);
        dsb(ish);
        isb();
}

static inline void flush_tlb_mm(struct mm_struct *mm)
{
        unsigned long asid;

        dsb(ishst);
        asid = __TLBI_VADDR(0, ASID(mm));
        __tlbi(aside1is, asid);
        __tlbi_user(aside1is, asid);
        dsb(ish);
        mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
}

static inline void __flush_tlb_page_nosync(struct mm_struct *mm,
                                           unsigned long uaddr)
{
        unsigned long addr;

        dsb(ishst);
        addr = __TLBI_VADDR(uaddr, ASID(mm));
        __tlbi(vale1is, addr);
        __tlbi_user(vale1is, addr);
        mmu_notifier_arch_invalidate_secondary_tlbs(mm, uaddr & PAGE_MASK,
                                                (uaddr & PAGE_MASK) + PAGE_SIZE);
}

static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
                                         unsigned long uaddr)
{
        return __flush_tlb_page_nosync(vma->vm_mm, uaddr);
}

static inline void flush_tlb_page(struct vm_area_struct *vma,
                                  unsigned long uaddr)
{
        flush_tlb_page_nosync(vma, uaddr);
        dsb(ish);
}

static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
{
        /*
         * TLB flush deferral is not required on systems which are affected by
         * ARM64_WORKAROUND_REPEAT_TLBI, as the __tlbi()/__tlbi_user()
         * implementation will have two consecutive TLBI instructions with a
         * dsb(ish) in between, defeating the purpose (i.e. saving the overall
         * 'dsb ish' cost).
         */
        if (alternative_has_cap_unlikely(ARM64_WORKAROUND_REPEAT_TLBI))
                return false;

        return true;
}

static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
                                             struct mm_struct *mm,
                                             unsigned long uaddr)
{
        __flush_tlb_page_nosync(mm, uaddr);
}

/*
 * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
 * synchronise all the TLBIs issued with a DSB to avoid the race mentioned in
 * flush_tlb_batched_pending().
 */
static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
{
        dsb(ish);
}

/*
 * To support TLB batched flushing when unmapping multiple pages, we only send
 * the TLBI for each page in arch_tlbbatch_add_pending() and wait for the
 * completion at the end in arch_tlbbatch_flush(). Since we've already issued
 * a TLBI for each page, only a DSB is needed to synchronise its effect on
 * the other CPUs.
 *
 * This saves the time spent waiting on the DSB compared with issuing a
 * TLBI;DSB sequence for each page.
 */
static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
        dsb(ish);
}
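
/*
 * Illustrative flow (the callers live in the core mm unmap/reclaim paths,
 * not in this header): arch_tlbbatch_add_pending() is called once per page
 * being unmapped and issues only the per-page TLBI; a single
 * arch_tlbbatch_flush() at the end then supplies the one DSB ISH that makes
 * all of those invalidations visible to the other CPUs.
 */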

/*
 * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
 * necessarily a performance improvement.
 */
#define MAX_DVM_OPS     PTRS_PER_PTE

/*
 * __flush_tlb_range_op - Perform TLBI operation upon a range
 *
 * @op:         TLBI instruction that operates on a range (has 'r' prefix)
 * @start:      The start address of the range
 * @pages:      Range as the number of pages from 'start'
 * @stride:     Flush granularity
 * @asid:       The ASID of the task (0 for IPA instructions)
 * @tlb_level:  Translation Table level hint, if known
 * @tlbi_user:  If 'true', call an additional __tlbi_user()
 *              (typically for user ASIDs). 'false' for IPA instructions
 *
 * When the CPU does not support TLB range operations, flush the TLB
 * entries one by one at the granularity of 'stride'. If the TLB
 * range ops are supported, then:
 *
 * 1. If 'pages' is odd, flush the first page through non-range
 *    operations;
 *
 * 2. For remaining pages: the minimum range granularity is decided
 *    by 'scale', so multiple range TLBI operations may be required.
 *    Start from scale = 0, flush the corresponding number of pages
 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
 *    until no pages left.
 *
 * Note that certain ranges can be represented by either num = 31 and
 * scale or num = 0 and scale + 1. The loop below favours the latter
 * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
 */
#define __flush_tlb_range_op(op, start, pages, stride,                  \
                                asid, tlb_level, tlbi_user)             \
do {                                                                    \
        int num = 0;                                                    \
        int scale = 0;                                                  \
        unsigned long addr;                                             \
                                                                        \
        while (pages > 0) {                                             \
                if (!system_supports_tlb_range() ||                     \
                    pages % 2 == 1) {                                   \
                        addr = __TLBI_VADDR(start, asid);               \
                        __tlbi_level(op, addr, tlb_level);              \
                        if (tlbi_user)                                  \
                                __tlbi_user_level(op, addr, tlb_level); \
                        start += stride;                                \
                        pages -= stride >> PAGE_SHIFT;                  \
                        continue;                                       \
                }                                                       \
                                                                        \
                num = __TLBI_RANGE_NUM(pages, scale);                   \
                if (num >= 0) {                                         \
                        addr = __TLBI_VADDR_RANGE(start, asid, scale,   \
                                                  num, tlb_level);      \
                        __tlbi(r##op, addr);                            \
                        if (tlbi_user)                                  \
                                __tlbi_user(r##op, addr);               \
                        start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
                        pages -= __TLBI_RANGE_PAGES(num, scale);        \
                }                                                       \
                scale++;                                                \
        }                                                               \
} while (0)
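
/*
 * Example walk-through, assuming range TLBIs are supported, stride ==
 * PAGE_SIZE and pages == 7: the loop first sees an odd count and emits a
 * single-page TLBI (pages becomes 6); __TLBI_RANGE_NUM(6, 0) == 2 then
 * selects one range TLBI covering __TLBI_RANGE_PAGES(2, 0) == 6 pages, and
 * the loop terminates.
 */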

#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level)    \
        __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false)

static inline void __flush_tlb_range(struct vm_area_struct *vma,
                                     unsigned long start, unsigned long end,
                                     unsigned long stride, bool last_level,
                                     int tlb_level)
{
        unsigned long asid, pages;

        start = round_down(start, stride);
        end = round_up(end, stride);
        pages = (end - start) >> PAGE_SHIFT;

        /*
         * When not using TLB range ops, we can handle up to
         * (MAX_DVM_OPS - 1) pages;
         * When using TLB range ops, we can handle up to
         * (MAX_TLBI_RANGE_PAGES - 1) pages.
         */
        if ((!system_supports_tlb_range() &&
             (end - start) >= (MAX_DVM_OPS * stride)) ||
            pages >= MAX_TLBI_RANGE_PAGES) {
                flush_tlb_mm(vma->vm_mm);
                return;
        }

        dsb(ishst);
        asid = ASID(vma->vm_mm);

        if (last_level)
                __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
        else
                __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);

        dsb(ish);
        mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
}

static inline void flush_tlb_range(struct vm_area_struct *vma,
                                   unsigned long start, unsigned long end)
{
        /*
         * We cannot use leaf-only invalidation here, since we may be invalidating
         * table entries as part of collapsing hugepages or moving page tables.
         * Set the tlb_level to 0 because we cannot get enough information here.
         */
        __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
}

static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
        unsigned long addr;

        if ((end - start) > (MAX_DVM_OPS * PAGE_SIZE)) {
                flush_tlb_all();
                return;
        }

        start = __TLBI_VADDR(start, 0);
        end = __TLBI_VADDR(end, 0);

        dsb(ishst);
        for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
                __tlbi(vaale1is, addr);
        dsb(ish);
        isb();
}

/*
 * Used to invalidate the TLB (walk caches) corresponding to intermediate page
 * table levels (pgd/pud/pmd).
 */
static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
{
        unsigned long addr = __TLBI_VADDR(kaddr, 0);

        dsb(ishst);
        __tlbi(vaae1is, addr);
        dsb(ish);
        isb();
}
#endif

#endif