Commit Graph

655 Commits

Author SHA1 Message Date
Marc Zyngier
c5332898dc KVM: arm64: Descope kvm_arm_sys_reg_{get,set}_reg()
Having kvm_arm_sys_reg_get_reg and co in kvm_host.h gives the
impression that these functions are free to be called from
anywhere.

Not quite. They really are tied to out internal sysreg handling,
and they would be better off in the sys_regs.h header, which is
private. kvm_host.h could also get a bit of a diet, so let's
just do that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-17 11:55:33 +01:00
Marc Zyngier
dc94f89ae6 Merge branch kvm-arm64/burn-the-flags into kvmarm-master/next
* kvm-arm64/burn-the-flags:
  : .
  : Rework the per-vcpu flags to make them more manageable,
  : splitting them in different sets that have specific
  : uses:
  :
  : - configuration flags
  : - input to the world-switch
  : - state bookkeeping for the kernel itself
  :
  : The FP tracking is also simplified and tracked outside
  : of the flags as a separate state.
  : .
  KVM: arm64: Move the handling of !FP outside of the fast path
  KVM: arm64: Document why pause cannot be turned into a flag
  KVM: arm64: Reduce the size of the vcpu flag members
  KVM: arm64: Add build-time sanity checks for flags
  KVM: arm64: Warn when PENDING_EXCEPTION and INCREMENT_PC are set together
  KVM: arm64: Convert vcpu sysregs_loaded_on_cpu to a state flag
  KVM: arm64: Kill unused vcpu flags field
  KVM: arm64: Move vcpu WFIT flag to the state flag set
  KVM: arm64: Move vcpu ON_UNSUPPORTED_CPU flag to the state flag set
  KVM: arm64: Move vcpu SVE/SME flags to the state flag set
  KVM: arm64: Move vcpu debug/SPE/TRBE flags to the input flag set
  KVM: arm64: Move vcpu PC/Exception flags to the input flag set
  KVM: arm64: Move vcpu configuration flags into their own set
  KVM: arm64: Add three sets of flags to the vcpu state
  KVM: arm64: Add helpers to manipulate vcpu flags among a set
  KVM: arm64: Move FP state ownership from flag to a tristate
  KVM: arm64: Drop FP_FOREIGN_STATE from the hypervisor code

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:30:10 +01:00
Marc Zyngier
0fa4a3137e KVM: arm64: Document why pause cannot be turned into a flag
It would be tempting to turn the 'pause' state into a flag.

However, this cannot easily be done as it is updated out of context,
while all the flags expect to only be updated from the vcpu thread.
Turning it into a flag would require to make all flag updates
atomic, which isn't necessary desireable.

Document this, and take this opportunity to move the field next
to the flag sets, filling a hole in the vcpu structure.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:51 +01:00
Marc Zyngier
54ddda919c KVM: arm64: Reduce the size of the vcpu flag members
Now that we can detect flags overflowing their container, reduce
the size of all flag set members in the vcpu struct, turning them
into 8bit quantities.

Even with the FP state enum occupying 32bit, the whole of the state
that was represented by flags is smaller by one byte. Profit!

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:46 +01:00
Marc Zyngier
5a3984f4ec KVM: arm64: Add build-time sanity checks for flags
Flags are great, but flags can also be dangerous: it is easy
to encode a flag that is bigger than its container (unless the
container is a u64), and it is easy to construct a flag value
that doesn't fit in the mask that is associated with it.

Add a couple of build-time sanity checks that ensure we catch
these two cases.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:41 +01:00
Marc Zyngier
30b6ab45f8 KVM: arm64: Convert vcpu sysregs_loaded_on_cpu to a state flag
The aptly named boolean 'sysregs_loaded_on_cpu' tracks whether
some of the vcpu system registers are resident on the physical
CPU when running in VHE mode.

This is obviously a flag in hidding, so let's convert it to
a state flag, since this is solely a host concern (the hypervisor
itself always knows which state we're in).

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:32 +01:00
Marc Zyngier
781e3ae148 KVM: arm64: Kill unused vcpu flags field
Horray, we have now sorted all the preexisting flags, and the
'flags' field is now unused. Get rid of it while nobody is
looking.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:28 +01:00
Marc Zyngier
eebc538d8e KVM: arm64: Move vcpu WFIT flag to the state flag set
The host kernel uses the WFIT flag to remember that a vcpu has used
this instruction and wake it up as required. Move it to the state
set, as nothing in the hypervisor uses this information.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:23 +01:00
Marc Zyngier
aff3ccd732 KVM: arm64: Move vcpu ON_UNSUPPORTED_CPU flag to the state flag set
The ON_UNSUPPORTED_CPU flag is only there to track the sad fact
that we have ended-up on a CPU where we cannot really run.

Since this is only for the host kernel's use, move it to the state
set.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:19 +01:00
Marc Zyngier
0affa37fcd KVM: arm64: Move vcpu SVE/SME flags to the state flag set
The two HOST_{SVE,SME}_ENABLED are only used for the host kernel
to track its own state across a vcpu run so that it can be fully
restored.

Move these flags to the so called state set.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:14 +01:00
Marc Zyngier
b1da49088a KVM: arm64: Move vcpu debug/SPE/TRBE flags to the input flag set
The three debug flags (which deal with the debug registers, SPE and
TRBE) all are input flags to the hypervisor code.

Move them into the input set and convert them to the new accessors.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-29 10:23:03 +01:00
Marc Zyngier
699bb2e0c6 KVM: arm64: Move vcpu PC/Exception flags to the input flag set
The PC update flags (which also deal with exception injection)
is one of the most complicated use of the flag we have. Make it
more fool prof by:

- moving it over to the new accessors and assign it to the
  input flag set

- turn the combination of generic ELx flags with another flag
  indicating the target EL itself into an explicit set of
  flags for each EL and vector combination

- add a new accessor to pend the exception

This is otherwise a pretty straightformward conversion.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-10 09:54:34 +01:00
Marc Zyngier
4c0680d394 KVM: arm64: Move vcpu configuration flags into their own set
The KVM_ARM64_{GUEST_HAS_SVE,VCPU_SVE_FINALIZED,GUEST_HAS_PTRAUTH}
flags are purely configuration flags. Once set, they are never cleared,
but evaluated all over the code base.

Move these three flags into the configuration set in one go, using
the new accessors, and take this opportunity to drop the KVM_ARM64_
prefix which doesn't provide any help.

Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-09 15:43:46 +01:00
Marc Zyngier
bcbfb588cf KVM: arm64: Drop stale comment
The layout of 'struct kvm_vcpu_arch' has evolved significantly since
the initial port of KVM/arm64, so remove the stale comment suggesting
that a prefix of the structure is used exclusively from assembly code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-7-will@kernel.org
2022-06-09 13:24:02 +01:00
Marc Zyngier
690bacb83b KVM: arm64: Add three sets of flags to the vcpu state
It so appears that each of the vcpu flags is really belonging to
one of three categories:

- a configuration flag, set once and for all
- an input flag generated by the kernel for the hypervisor to use
- a state flag that is only for the kernel's own bookkeeping

As we are going to split all the existing flags into these three
sets, introduce all three in one go.

No functional change other than a bit of bloat...

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-09 12:02:12 +01:00
Marc Zyngier
e87abb73e5 KVM: arm64: Add helpers to manipulate vcpu flags among a set
Careful analysis of the vcpu flags show that this is a mix of
configuration, communication between the host and the hypervisor,
as well as anciliary state that has no consistency. It'd be a lot
better if we could split these flags into consistent categories.

However, even if we split these flags apart, we want to make sure
that each flag can only be applied to its own set, and not across
sets.

To achieve this, use a preprocessor hack so that each flag is always
associated with:

- the set that contains it,

- a mask that describe all the bits that contain it (for a simple
  flag, this is the same thing as the flag itself, but we will
  eventually have values that cover multiple bits at once).

Each flag is thus a triplet that is not directly usable as a value,
but used by three helpers that allow the flag to be set, cleared,
and fetched. By mandating the use of such helper, we can easily
enforce that a flag can only be used with the set it belongs to.

Finally, one last helper "unpacks" the raw value from the triplet
that represents a flag, which is useful for multi-bit values that
need to be enumerated (in a switch statement, for example).

Further patches will start making use of this infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-06-09 12:02:06 +01:00
Marc Zyngier
f8077b0d59 KVM: arm64: Move FP state ownership from flag to a tristate
The KVM FP code uses a pair of flags to denote three states:

- FP_ENABLED set: the guest owns the FP state
- FP_HOST set: the host owns the FP state
- FP_ENABLED and FP_HOST clear: nobody owns the FP state at all

and both flags set is an illegal state, which nothing ever checks
for...

As it turns out, this isn't really a good match for flags, and
we'd be better off if this was a simpler tristate, each state
having a name that actually reflect the state:

- FP_STATE_FREE
- FP_STATE_HOST_OWNED
- FP_STATE_GUEST_OWNED

Kill the two flags, and move over to an enum encoding these
three states. This results in less confusing code, and less risk of
ending up in the uncharted territory of a 4th state if we forget
to clear one of the two flags.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
2022-06-09 12:01:58 +01:00
Marc Zyngier
e9ada6c208 KVM: arm64: Drop FP_FOREIGN_STATE from the hypervisor code
The vcpu KVM_ARM64_FP_FOREIGN_FPSTATE flag tracks the thread's own
TIF_FOREIGN_FPSTATE so that we can evaluate just before running
the vcpu whether it the FP regs contain something that is owned
by the vcpu or not by updating the rest of the FP flags.

We do this in the hypervisor code in order to make sure we're
in a context where we are not interruptible. But we already
have a hook in the run loop to generate this flag. We may as
well update the FP flags directly and save the pointless flag
tracking.

Whilst we're at it, rename update_fp_enabled() to guest_owns_fp_regs()
to indicate what the leftover of this helper actually do.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
2022-06-09 12:01:51 +01:00
Linus Torvalds
bf9095424d S390:
* ultravisor communication device driver
 
 * fix TEID on terminating storage key ops
 
 RISC-V:
 
 * Added Sv57x4 support for G-stage page table
 
 * Added range based local HFENCE functions
 
 * Added remote HFENCE functions based on VCPU requests
 
 * Added ISA extension registers in ONE_REG interface
 
 * Updated KVM RISC-V maintainers entry to cover selftests support
 
 ARM:
 
 * Add support for the ARMv8.6 WFxT extension
 
 * Guard pages for the EL2 stacks
 
 * Trap and emulate AArch32 ID registers to hide unsupported features
 
 * Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 * Support for PSCI-initiated suspend in collaboration with userspace
 
 * GICv3 register-based LPI invalidation support
 
 * Move host PMU event merging into the vcpu data structure
 
 * GICv3 ITS save/restore fixes
 
 * The usual set of small-scale cleanups and fixes
 
 x86:
 
 * New ioctls to get/set TSC frequency for a whole VM
 
 * Allow userspace to opt out of hypercall patching
 
 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
 
 AMD SEV improvements:
 
 * Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
 
 * V_TSC_AUX support
 
 Nested virtualization improvements for AMD:
 
 * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
   nested vGIF)
 
 * Allow AVIC to co-exist with a nested guest running
 
 * Fixes for LBR virtualizations when a nested guest is running,
   and nested LBR virtualization support
 
 * PAUSE filtering for nested hypervisors
 
 Guest support:
 
 * Decoupling of vcpu_is_preempted from PV spinlocks
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKN9M4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNLeAf+KizAlQwxEehHHeNyTkZuKyMawrD6
 zsqAENR6i1TxiXe7fDfPFbO2NR0ZulQopHbD9mwnHJ+nNw0J4UT7g3ii1IAVcXPu
 rQNRGMVWiu54jt+lep8/gDg0JvPGKVVKLhxUaU1kdWT9PhIOC6lwpP3vmeWkUfRi
 PFL/TMT0M8Nfryi0zHB0tXeqg41BiXfqO8wMySfBAHUbpv8D53D2eXQL6YlMM0pL
 2quB1HxHnpueE5vj3WEPQ3PCdy1M2MTfCDBJAbZGG78Ljx45FxSGoQcmiBpPnhJr
 C6UGP4ZDWpml5YULUoA70k5ylCbP+vI61U4vUtzEiOjHugpPV5wFKtx5nw==
 =ozWx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - ultravisor communication device driver

   - fix TEID on terminating storage key ops

  RISC-V:

   - Added Sv57x4 support for G-stage page table

   - Added range based local HFENCE functions

   - Added remote HFENCE functions based on VCPU requests

   - Added ISA extension registers in ONE_REG interface

   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:

   - Add support for the ARMv8.6 WFxT extension

   - Guard pages for the EL2 stacks

   - Trap and emulate AArch32 ID registers to hide unsupported features

   - Ability to select and save/restore the set of hypercalls exposed to
     the guest

   - Support for PSCI-initiated suspend in collaboration with userspace

   - GICv3 register-based LPI invalidation support

   - Move host PMU event merging into the vcpu data structure

   - GICv3 ITS save/restore fixes

   - The usual set of small-scale cleanups and fixes

  x86:

   - New ioctls to get/set TSC frequency for a whole VM

   - Allow userspace to opt out of hypercall patching

   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:

   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

   - V_TSC_AUX support

  Nested virtualization improvements for AMD:

   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
     nested vGIF)

   - Allow AVIC to co-exist with a nested guest running

   - Fixes for LBR virtualizations when a nested guest is running, and
     nested LBR virtualization support

   - PAUSE filtering for nested hypervisors

  Guest support:

   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
2022-05-26 14:20:14 -07:00
Paolo Bonzini
47e8eec832 KVM/arm64 updates for 5.19
- Add support for the ARMv8.6 WFxT extension
 
 - Guard pages for the EL2 stacks
 
 - Trap and emulate AArch32 ID registers to hide unsupported features
 
 - Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 - Support for PSCI-initiated suspend in collaboration with userspace
 
 - GICv3 register-based LPI invalidation support
 
 - Move host PMU event merging into the vcpu data structure
 
 - GICv3 ITS save/restore fixes
 
 - The usual set of small-scale cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGAGsPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDB/gQAMhyZ+wCG0OMEZhwFF6iDfxVEX2Kw8L41NtD
 a/e6LDWuIOGihItpRkYROc5myG74D7XckF2Bz3G7HJoU4vhwHOV/XulE26GFizoC
 O1GVRekeSUY81wgS1yfo0jojLupBkTjiq3SjTHoDP7GmCM0qDPBtA0QlMRzd2bMs
 Kx0+UUXZUHFSTXc7Lp4vqNH+tMp7se+yRx7hxm6PCM5zG+XYJjLxnsZ0qpchObgU
 7f6YFojsLUs1SexgiUqJ1RChVQ+FkgICh5HyzORvGtHNNzK6D2sIbsW6nqMGAMql
 Kr3A5O/VOkCztSYnLxaa76/HqD21mvUrXvr3grhabNc7rOmuzWV0dDgr6c6wHKHb
 uNCtH4d7Ra06gUrEOrfsgLOLn0Zqik89y6aIlMsnTudMg9gMNgFHy1jz4LM7vMkY
 FS5AVj059heg2uJcfgTvzzcqneyuBLBmF3dS4coowO6oaj8SycpaEmP5e89zkPMI
 1kk8d0e6RmXuCh/2AJ8GxxnKvBPgqp2mMKXOCJ8j4AmHEDX/CKpEBBqIWLKkplUU
 8DGiOWJUtRZJg398dUeIpiVLoXJthMODjAnkKkuhiFcQbXomlwgg7YSnNAz6TRED
 Z7KR2leC247kapHnnagf02q2wED8pBeyrxbQPNdrHtSJ9Usm4nTkY443HgVTJW3s
 aTwPZAQ7
 =mh7W
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.19

- Add support for the ARMv8.6 WFxT extension

- Guard pages for the EL2 stacks

- Trap and emulate AArch32 ID registers to hide unsupported features

- Ability to select and save/restore the set of hypercalls exposed
  to the guest

- Support for PSCI-initiated suspend in collaboration with userspace

- GICv3 register-based LPI invalidation support

- Move host PMU event merging into the vcpu data structure

- GICv3 ITS save/restore fixes

- The usual set of small-scale cleanups and fixes

[Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated
 from 4 to 6. - Paolo]
2022-05-25 05:09:23 -04:00
Catalin Marinas
0616ea3f1b Merge branch 'for-next/esr-elx-64-bit' into for-next/core
* for-next/esr-elx-64-bit:
  : Treat ESR_ELx as a 64-bit register.
  KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
  KVM: arm64: Treat ESR_EL2 as a 64-bit register
  arm64: Treat ESR_ELx as a 64-bit register
  arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
  arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-05-20 18:51:54 +01:00
Marc Zyngier
822ca7f82b Merge branch kvm-arm64/misc-5.19 into kvmarm-master/next
* kvm-arm64/misc-5.19:
  : .
  : Misc fixes and general improvements for KVMM/arm64:
  :
  : - Better handle out of sequence sysregs in the global tables
  :
  : - Remove a couple of unnecessary loads from constant pool
  :
  : - Drop unnecessary pKVM checks
  :
  : - Add all known M1 implementations to the SEIS workaround
  :
  : - Cleanup kerneldoc warnings
  : .
  KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround
  KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE
  KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler
  KVM: arm64: nvhe: Eliminate kernel-doc warnings
  KVM: arm64: Avoid unnecessary absolute addressing via literals
  KVM: arm64: Print emulated register table name when it is unsorted
  KVM: arm64: Don't BUG_ON() if emulated register table is unsorted

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:36 +01:00
Marc Zyngier
8794b4f510 Merge branch kvm-arm64/per-vcpu-host-pmu-data into kvmarm-master/next
* kvm-arm64/per-vcpu-host-pmu-data:
  : .
  : Pass the host PMU state in the vcpu to avoid the use of additional
  : shared memory between EL1 and EL2 (this obviously only applies
  : to nVHE and Protected setups).
  :
  : Patches courtesy of Fuad Tabba.
  : .
  KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
  KVM: arm64: Reenable pmu in Protected Mode
  KVM: arm64: Pass pmu events to hyp via vcpu
  KVM: arm64: Repack struct kvm_pmu to reduce size
  KVM: arm64: Wrapper for getting pmu_events

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:36 +01:00
Marc Zyngier
3b8e21e3c3 Merge branch kvm-arm64/psci-suspend into kvmarm-master/next
* kvm-arm64/psci-suspend:
  : .
  : Add support for PSCI SYSTEM_SUSPEND and allow userspace to
  : filter the wake-up events.
  :
  : Patches courtesy of Oliver.
  : .
  Documentation: KVM: Fix title level for PSCI_SUSPEND
  selftests: KVM: Test SYSTEM_SUSPEND PSCI call
  selftests: KVM: Refactor psci_test to make it amenable to new tests
  selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  KVM: arm64: Add support for userspace to suspend a vCPU
  KVM: arm64: Return a value from check_vcpu_requests()
  KVM: arm64: Rename the KVM_REQ_SLEEP handler
  KVM: arm64: Track vCPU power state using MP state values
  KVM: arm64: Dedupe vCPU power off helpers
  KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:20 +01:00
Marc Zyngier
0586e28aaa Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
* kvm-arm64/hcall-selection:
  : .
  : Introduce a new set of virtual sysregs for userspace to
  : select the hypercalls it wants to see exposed to the guest.
  :
  : Patches courtesy of Raghavendra and Oliver.
  : .
  KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
  KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
  Documentation: Fix index.rst after psci.rst renaming
  selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
  selftests: KVM: aarch64: Introduce hypercall ABI test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  tools: Import ARM SMCCC definitions
  Docs: KVM: Add doc for the bitmap firmware registers
  Docs: KVM: Rename psci.rst to hypercalls.rst
  KVM: arm64: Add vendor hypervisor firmware register
  KVM: arm64: Add standard hypervisor firmware register
  KVM: arm64: Setup a framework for hypercall bitmap firmware registers
  KVM: arm64: Factor out firmware register handling from psci.c

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:47:03 +01:00
Marc Zyngier
20492a62b9 KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
Moving kvm_pmu_events into the vcpu (and refering to it) broke the
somewhat unusual case where the kernel has no support for a PMU
at all.

In order to solve this, move things around a bit so that we can
easily avoid refering to the pmu structure outside of PMU-aware
code. As a bonus, pmu.c isn't compiled in when HW_PERF_EVENTS
isn't selected.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/202205161814.KQHpOzsJ-lkp@intel.com
2022-05-16 13:42:41 +01:00
Fuad Tabba
84d751a019 KVM: arm64: Pass pmu events to hyp via vcpu
Instead of the host accessing hyp data directly, pass the pmu
events of the current cpu to hyp via the vcpu.

This adds 64 bits (in two fields) to the vcpu that need to be
synced before every vcpu run in nvhe and protected modes.
However, it isolates the hypervisor from the host, which allows
us to use pmu in protected mode in a subsequent patch.

No visible side effects in behavior intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510095710.148178-4-tabba@google.com
2022-05-15 11:26:41 +01:00
Alexandru Elisei
f1f0c0cfea KVM: arm64: Don't BUG_ON() if emulated register table is unsorted
To emulate a register access, KVM uses a table of registers sorted by
register encoding to speed up queries using binary search.

When Linux boots, KVM checks that the table is sorted and uses a BUG_ON()
statement to let the user know if it's not. The unfortunate side effect is
that an unsorted sysreg table brings down the whole kernel, not just KVM,
even though the rest of the kernel can function just fine without KVM. To
make matters worse, on machines which lack a serial console, the user is
left pondering why the machine is taking so long to boot.

Improve this situation by returning an error from kvm_arch_init() if the
sysreg tables are not in the correct order. The machine is still very much
usable for the user, with the exception of virtualization, who can now
easily determine what went wrong.

A minor typo has also been corrected in the check_sysreg_table() function.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220428103405.70884-2-alexandru.elisei@arm.com
2022-05-04 17:52:50 +01:00
Marc Zyngier
d25f30fe41 Merge branch kvm-arm64/aarch32-idreg-trap into kvmarm-master/next
* kvm-arm64/aarch32-idreg-trap:
  : .
  : Add trapping/sanitising infrastructure for AArch32 systen registers,
  : allowing more control over what we actually expose (such as the PMU).
  :
  : Patches courtesy of Oliver and Alexandru.
  : .
  KVM: arm64: Fix new instances of 32bit ESRs
  KVM: arm64: Hide AArch32 PMU registers when not available
  KVM: arm64: Start trapping ID registers for 32 bit guests
  KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler
  KVM: arm64: Wire up CP15 feature registers to their AArch64 equivalents
  KVM: arm64: Don't write to Rt unless sys_reg emulation succeeds
  KVM: arm64: Return a bool from emulate_cp()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 09:42:45 +01:00
Marc Zyngier
b2c4caf331 Merge branch kvm-arm64/wfxt into kvmarm-master/next
* kvm-arm64/wfxt:
  : .
  : Add support for the WFET/WFIT instructions that provide the same
  : service as WFE/WFI, only with a timeout.
  : .
  KVM: arm64: Expose the WFXT feature to guests
  KVM: arm64: Offer early resume for non-blocking WFxT instructions
  KVM: arm64: Handle blocking WFIT instruction
  KVM: arm64: Introduce kvm_counter_compute_delta() helper
  KVM: arm64: Simplify kvm_cpu_has_pending_timer()
  arm64: Use WFxT for __delay() when possible
  arm64: Add wfet()/wfit() helpers
  arm64: Add HWCAP advertising FEAT_WFXT
  arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
  arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 09:42:16 +01:00
Oliver Upton
bfbab44568 KVM: arm64: Implement PSCI SYSTEM_SUSPEND
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.

Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.

The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.

Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
2022-05-04 09:28:45 +01:00
Oliver Upton
7b33a09d03 KVM: arm64: Add support for userspace to suspend a vCPU
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.

Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
2022-05-04 09:28:45 +01:00
Oliver Upton
b171f9bbb1 KVM: arm64: Track vCPU power state using MP state values
A subsequent change to KVM will add support for additional power states.
Store the MP state by value rather than keeping track of it as a
boolean.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
2022-05-04 09:28:44 +01:00
Oliver Upton
1e5794295c KVM: arm64: Dedupe vCPU power off helpers
vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the
former and replace all callsites to the latter.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
2022-05-04 09:28:44 +01:00
Raghavendra Rao Ananta
b22216e1a6 KVM: arm64: Add vendor hypervisor firmware register
Introduce the firmware register to hold the vendor specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that'll be enabled for the
guest, as configured by the user-space. Currently, this
includes support for KVM-vendor features along with
reading the UID, represented by bit-0, and Precision Time
Protocol (PTP), represented by bit-1.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta
428fd6788d KVM: arm64: Add standard hypervisor firmware register
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that'll be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized time, represented by bit-0.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta
05714cab7d KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means, the guests can observe
hypercall services in and out as they migrate across various host
kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, there's a potential chance that
the guest would just panic.

As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.

These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.

During the VM initialization, the registers are set to upper-limit of
the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.

Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making it available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's not enabled
via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.

Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.

In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for ARM True Random Number Generator (TRNG) service, with bit-0 of the
register representing mandatory features of v1.0. Other services are
momentarily added in the upcoming patches.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values,
 dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
2022-05-03 21:30:19 +01:00
Oliver Upton
9369bc5c5e KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler
In order to enable HCR_EL2.TID3 for AArch32 guests KVM needs to handle
traps where ESR_EL2.EC=0x8, which corresponds to an attempted VMRS
access from an ID group register. Specifically, the MVFR{0-2} registers
are accessed this way from AArch32. Conveniently, these registers are
architecturally mapped to MVFR{0-2}_EL1 in AArch64. Furthermore, KVM
already handles reads to these aliases in AArch64.

Plumb VMRS read traps through to the general AArch64 system register
handler.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-5-oupton@google.com
2022-05-03 11:14:34 +01:00
Sean Christopherson
f502cc568d KVM: Add max_vcpus field in common 'struct kvm'
For TDX guests, the maximum number of vcpus needs to be specified when the
TDX guest VM is initialized (creating the TDX data corresponding to TDX
guest) before creating vcpu.  It needs to record the maximum number of
vcpus on VM creation (KVM_CREATE_VM) and return error if the number of
vcpus exceeds it

Because there is already max_vcpu member in arm64 struct kvm_arch, move it
to common struct kvm and initialize it to KVM_MAX_VCPUS before
kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-02 11:42:42 -04:00
Alexandru Elisei
0b12620fdd KVM: arm64: Treat ESR_EL2 as a 64-bit register
ESR_EL2 was defined as a 32-bit register in the initial release of the
ARM Architecture Manual for Armv8-A, and was later extended to 64 bits,
with bits [63:32] RES0. ARMv8.7 introduced FEAT_LS64, which makes use of
bits [36:32].

KVM treats ESR_EL1 as a 64-bit register when saving and restoring the
guest context, but ESR_EL2 is handled as a 32-bit register. Start
treating ESR_EL2 as a 64-bit register to allow KVM to make use of the
most significant 32 bits in the future.

The type chosen to represent ESR_EL2 is u64, as that is consistent with the
notation KVM overwhelmingly uses today (u32), and how the rest of the
registers are declared.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-5-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Mark Brown
861262ab86 KVM: arm64: Handle SME host state when running guests
While we don't currently support SME in guests we do currently support it
for the host system so we need to take care of SME's impact, including
the floating point register state, when running guests. Simiarly to SVE
we need to manage the traps in CPACR_RL1, what is new is the handling of
streaming mode and ZA.

Normally we defer any handling of the floating point register state until
the guest first uses it however if the system is in streaming mode FPSIMD
and SVE operations may generate SME traps which we would need to distinguish
from actual attempts by the guest to use SME. Rather than do this for the
time being if we are in streaming mode when entering the guest we force
the floating point state to be saved immediately and exit streaming mode,
meaning that the guest won't generate SME traps for supported operations.

We could handle ZA in the access trap similarly to the FPSIMD/SVE state
without the disruption caused by streaming mode but for simplicity
handle it the same way as streaming mode for now.

This will be revisited when we support SME for guests (hopefully before SME
hardware becomes available), for now it will only incur additional cost on
systems with SME and even there only if streaming mode or ZA are enabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419112247.711548-27-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:23 +01:00
Mark Brown
0033cd9339 arm64/sme: Implement ZA context switching
Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
issues in shared SMCU implementations.

Since ZA is architecturally guaranteed to be zeroed when enabled we do not
need to explicitly zero ZA, either we will be restoring from a saved copy
or trapping on first use of SME so we know that ZA must be disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-16-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22 18:51:02 +01:00
Marc Zyngier
89f5074c50 KVM: arm64: Handle blocking WFIT instruction
When trapping a blocking WFIT instruction, take it into account when
computing the deadline of the background timer.

The state is tracked with a new vcpu flag, and is gated by a new
CPU capability, which isn't currently enabled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-6-maz@kernel.org
2022-04-20 13:24:45 +01:00
Reiji Watanabe
26bf74bd9f KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs
KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs
for a guest.  At vCPU reset, vcpu_allowed_register_width() checks
if the vcpu's register width is consistent with all other vCPUs'.
Since the checking is done even against vCPUs that are not initialized
(KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs
are erroneously treated as 64bit vCPU, which causes the function to
incorrectly detect a mixed-width VM.

Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED
bits for kvm->arch.flags.  A value of the EL1_32BIT bit indicates that
the guest needs to be configured with all 32bit or 64bit vCPUs, and
a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the
EL1_32BIT bit is valid (already set up). Values in those bits are set at
the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT
configuration for the vCPU.

Check vcpu's register width against those new bits at the vcpu's
KVM_ARM_VCPU_INIT (instead of against other vCPUs' register width).

Fixes: 66e94d5caf ("KVM: arm64: Prevent mixed-width VM creation")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220329031924.619453-2-reijiw@google.com
2022-04-06 12:29:45 +01:00
Linus Torvalds
1ebdbeb03e ARM:
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 
 RISC-V:
 
 - Prevent KVM_COMPAT from being selected
 
 - Optimize __kvm_riscv_switch_to() implementation
 
 - RISC-V SBI v0.3 support
 
 s390:
 
 - memop selftest
 
 - fix SCK locking
 
 - adapter interruptions virtualization for secure guests
 
 - add Claudio Imbrenda as maintainer
 
 - first step to do proper storage key checking
 
 x86:
 
 - Continue switching kvm_x86_ops to static_call(); introduce
   static_call_cond() and __static_call_ret0 when applicable.
 
 - Cleanup unused arguments in several functions
 
 - Synthesize AMD 0x80000021 leaf
 
 - Fixes and optimization for Hyper-V sparse-bank hypercalls
 
 - Implement Hyper-V's enlightened MSR bitmap for nested SVM
 
 - Remove MMU auditing
 
 - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
   page tracking is enabled
 
 - Cleanup the implementation of the guest PGD cache
 
 - Preparation for the implementation of Intel IPI virtualization
 
 - Fix some segment descriptor checks in the emulator
 
 - Allow AMD AVIC support on systems with physical APIC ID above 255
 
 - Better API to disable virtualization quirks
 
 - Fixes and optimizations for the zapping of page tables:
 
   - Zap roots in two passes, avoiding RCU read-side critical sections
     that last too long for very large guests backed by 4 KiB SPTEs.
 
   - Zap invalid and defunct roots asynchronously via concurrency-managed
     work queue.
 
   - Allowing yielding when zapping TDP MMU roots in response to the root's
     last reference being put.
 
   - Batch more TLB flushes with an RCU trick.  Whoever frees the paging
     structure now holds RCU as a proxy for all vCPUs running in the guest,
     i.e. to prolongs the grace period on their behalf.  It then kicks the
     the vCPUs out of guest mode before doing rcu_read_unlock().
 
 Generic:
 
 - Introduce __vcalloc and use it for very large allocations that
   need memcg accounting
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
 wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
 pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
 dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
 RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
 BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
 =keox
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalibility improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogenous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allowing yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. to prolongs the grace period on their behalf.
        It then kicks the the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
2022-03-24 11:58:57 -07:00
Marc Zyngier
06394531b4 KVM: arm64: Generalise VM features into a set of flags
We currently deal with a set of booleans for VM features,
while they could be better represented as set of flags
contained in an unsigned long, similarily to what we are
doing on the CPU side.

Signed-off-by: Marc Zyngier <maz@kernel.org>
[Oliver: Flag-ify the 'ran_once' boolean]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220311174001.605719-2-oupton@google.com
2022-03-18 14:02:33 +00:00
James Morse
5bdf343760 KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
CPUs vulnerable to Spectre-BHB either need to make an SMC-CC firmware
call from the vectors, or run a sequence of branches. This gets added
to the hyp vectors. If there is no support for arch-workaround-1 in
firmware, the indirect vector will be used.

kvm_init_vector_slots() only initialises the two indirect slots if
the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors()
only initialises __hyp_bp_vect_base if the platform is vulnerable to
Spectre-v3a.

As there are about to more users of the indirect vectors, ensure
their entries in hyp_spectre_vector_selector[] are always initialised,
and __hyp_bp_vect_base defaults to the regular VA mapping.

The Spectre-v3a check is moved to a helper
kvm_system_needs_idmapped_vectors(), and merged with the code
that creates the hyp mappings.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2022-02-15 17:38:25 +00:00
Marc Zyngier
00e6dae00e Merge branch kvm-arm64/pmu-bl into kvmarm-master/next
* kvm-arm64/pmu-bl:
  : .
  : Improve PMU support on heterogeneous systems, courtesy of Alexandru Elisei
  : .
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Keep a per-VM pointer to the default PMU
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Do not change the PMU event filter after a VCPU has run

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-08 17:54:41 +00:00
Alexandru Elisei
583cda1b0e KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
device ioctl. If the VCPU is scheduled on a physical CPU which has a
different PMU, the perf events needed to emulate a guest PMU won't be
scheduled in and the guest performance counters will stop counting. Treat
it as an userspace error and refuse to run the VCPU in this situation.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
2022-02-08 17:51:22 +00:00
Marc Zyngier
46b1878214 KVM: arm64: Keep a per-VM pointer to the default PMU
As we are about to allow selection of the PMU exposed to a guest, start by
keeping track of the default one instead of only the PMU version.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220127161759.53553-4-alexandru.elisei@arm.com
2022-02-08 17:51:21 +00:00
Marc Zyngier
5177fe91e4 KVM: arm64: Do not change the PMU event filter after a VCPU has run
Userspace can specify which events a guest is allowed to use with the
KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be
identified by a guest from reading the PMCEID{0,1}_EL0 registers.

Changing the PMU event filter after a VCPU has run can cause reads of the
registers performed before the filter is changed to return different values
than reads performed with the new event filter in place. The architecture
defines the two registers as read-only, and this behaviour contradicts
that.

Keep track when the first VCPU has run and deny changes to the PMU event
filter to prevent this from happening.

Signed-off-by: Marc Zyngier <maz@kernel.org>
[ Alexandru E: Added commit message, updated ioctl documentation ]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
2022-02-08 17:51:21 +00:00
Marc Zyngier
ebca68972e Merge branch kvm-arm64/vmid-allocator into kvmarm-master/next
* kvm-arm64/vmid-allocator:
  : .
  : VMID allocation rewrite from Shameerali Kolothum Thodi, paving the
  : way for pinned VMIDs and SVA.
  : .
  KVM: arm64: Make active_vmids invalid on vCPU schedule out
  KVM: arm64: Align the VMID allocation with the arm64 ASID
  KVM: arm64: Make VMID bits accessible outside of allocator
  KVM: arm64: Introduce a new VMID allocator for KVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-08 14:58:38 +00:00
Shameer Kolothum
100b4f092f KVM: arm64: Make active_vmids invalid on vCPU schedule out
Like ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that
every CPU will have a vCPU as current task and we may
end up unnecessarily reserving the VMID space.

Hence, set active_vmids to an invalid one when scheduling
out a vCPU.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-5-shameerali.kolothum.thodi@huawei.com
2022-02-08 14:57:04 +00:00
Julien Grall
3248136b36 KVM: arm64: Align the VMID allocation with the arm64 ASID
At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-Cache invalidation.

This patch uses the new VMID allocator. The benefits are:
   - Aligns with arm64 ASID algorithm.
   - CPUs are not forced to exit at roll-over. Instead,
     the VMID will be marked reserved and context invalidation
     is broadcasted. This will reduce the IPIs traffic.
   - More flexible to add support for pinned KVM VMIDs in
     the future.
   
With the new algo, the code is now adapted:
    - The call to update_vmid() will be done with preemption
      disabled as the new algo requires to store information
      per-CPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-4-shameerali.kolothum.thodi@huawei.com
2022-02-08 14:57:03 +00:00
Shameer Kolothum
f8051e9609 KVM: arm64: Make VMID bits accessible outside of allocator
Since we already set the kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside as well so that it can
be used in the subsequent patch.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-3-shameerali.kolothum.thodi@huawei.com
2022-02-08 14:46:28 +00:00
Shameer Kolothum
417838392f KVM: arm64: Introduce a new VMID allocator for KVM
A new VMID allocator for arm64 KVM use. This is based on
arm64 ASID allocator algorithm.

One major deviation from the ASID allocator is the way we
flush the context. Unlike ASID allocator, we expect less
frequent rollover in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending and issuing a local context
invalidation on the next context switch, we  broadcast TLB
flush + I-cache invalidation over the inner shareable domain
on rollover.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-2-shameerali.kolothum.thodi@huawei.com
2022-02-08 14:46:28 +00:00
Marc Zyngier
2bb48074b3 Merge branch kvm-arm64/mmu-rwlock into kvmarm-master/next
* kvm-arm64/mmu-rwlock:
  : .
  : MMU locking optimisations from Jing Zhang, allowing permission
  : relaxations to occur in parallel.
  : .
  KVM: selftests: Add vgic initialization for dirty log perf test for ARM
  KVM: arm64: Add fast path to handle permission relaxation during dirty logging
  KVM: arm64: Use read/write spin lock for MMU protection

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-08 14:29:29 +00:00
Jing Zhang
fcc5bf8963 KVM: arm64: Use read/write spin lock for MMU protection
Replace MMU spinlock with rwlock and update all instances of the lock
being acquired with a write lock acquisition.
Future commit will add a fast path for permission relaxation during
dirty logging under a read lock.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-2-jingzhangos@google.com
2022-02-08 14:27:52 +00:00
Oliver Upton
7dabf02f43 KVM: arm64: Emulate the OS Lock
The OS lock blocks all debug exceptions at every EL. To date, KVM has
not implemented the OS lock for its guests, despite the fact that it is
mandatory per the architecture. Simple context switching between the
guest and host is not appropriate, as its effects are not constrained to
the guest context.

Emulate the OS Lock by clearing MDE and SS in MDSCR_EL1, thereby
blocking all but software breakpoint instructions.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-5-oupton@google.com
2022-02-08 14:23:41 +00:00
Oliver Upton
d42e26716d KVM: arm64: Stash OSLSR_EL1 in the cpu context
An upcoming change to KVM will emulate the OS Lock from the PoV of the
guest. Add OSLSR_EL1 to the cpu context and handle reads using the
stored value. Define some mnemonics for for handling the OSLM field and
use them to make the reset value of OSLSR_EL1 more readable.

Wire up a custom handler for writes from userspace and prevent any of
the invariant bits from changing. Note that the OSLK bit is not
invariant and will be made writable by the aforementioned change.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-3-oupton@google.com
2022-02-08 14:23:41 +00:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Paolo Bonzini
7fd55a02a4 KVM/arm64 updates for Linux 5.16
- Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmHYIpgPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDndsP/RsBmX6bmQnDEhaaqfGAxOETyq/my1eT9r/V
 3Ax4fEqSFfD5yHbYvqNRC8ueycH4r8WAr4ACWDAI6XpS/pYx00nx2N+HCSgjGyQR
 FeXqITuGPEsn4NkGuPci0PFmI8rVUzanl1ugRGQAETVrZo2ZVH2uqKVGT8XOlu0J
 FB/0x6Z4vMuIgEXyfa+DZ8WdW1aCRgPU2oyOdSdWE57/grjyLJqk6EdMmLyaQ19E
 vz6vXuRnA/GQwOtByqYEnQ8a4VXsQedCMqg/f9mj0BxpDzxC1ps8Nrpv36aJXKUN
 LEXapP9bCWPW9LqaKAOZnQYrUIIEFHsCUom0n3reDHrgObA+jivpz75L8GEr3CdC
 Bv78N04Yymjpp2WA6CuO3r9HjL1nJ6tYqobXU2pvqln4nNC3Ukucjq9ZVuWgS6Hx
 qOZXgPcZ/HpS3l/U+dAu8yIcV2SchQXDudaq8BsfLd8M1bD+oirSBolZFSvz7MYZ
 6+jtEDLUOEO5s4rXiJF46+MauxiELcjaewAEK4WwrS8NBwEyhYe9EPsYcQ5pcrQF
 QwAd1+y7oLfhpGHv5KJKWswfvbtlLCm6NOAhawq0UXM8bS+79tu0dGjiDzVPBuSf
 SyA3VtBSKxcpvCrljw9ubtjxvKrviE0MDvlmTP2B1NU+lwm8xRBiwUwOH29qP9zU
 HDeUj2fy
 =HkZk
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.16

- Simplification of the 'vcpu first run' by integrating it into
  KVM's 'pid change' flow

- Refactoring of the FP and SVE state tracking, also leading to
  a simpler state and less shared data between EL1 and EL2 in
  the nVHE case

- Tidy up the header file usage for the nvhe hyp object

- New HYP unsharing mechanism, finally allowing pages to be
  unmapped from the Stage-1 EL2 page-tables

- Various pKVM cleanups around refcounting and sharing

- A couple of vgic fixes for bugs that would trigger once
  the vcpu xarray rework is merged, but not sooner

- Add minimal support for ARMv8.7's PMU extension

- Rework kvm_pgtable initialisation ahead of the NV work

- New selftest for IRQ injection

- Teach selftests about the lack of default IPA space and
  page sizes

- Expand sysreg selftest to deal with Pointer Authentication

- The usual bunch of cleanups and doc update
2022-01-07 10:42:19 -05:00
Marc Zyngier
1c53a1ae36 Merge branch kvm-arm64/misc-5.17 into kvmarm-master/next
* kvm-arm64/misc-5.17:
  : .
  : Misc fixes and improvements:
  : - Add minimal support for ARMv8.7's PMU extension
  : - Constify kvm_io_gic_ops
  : - Drop kvm_is_transparent_hugepage() prototype
  : - Drop unused workaround_flags field
  : - Rework kvm_pgtable initialisation
  : - Documentation fixes
  : - Replace open-coded SCTLR_EL1.EE useage with its defined macro
  : - Sysreg list selftest update to handle PAuth
  : - Include cleanups
  : .
  KVM: arm64: vgic: Replace kernel.h with the necessary inclusions
  KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve()
  KVM: arm64: selftests: get-reg-list: Add pauth configuration
  KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on()
  KVM: arm64: Fix comment for kvm_reset_vcpu()
  KVM: arm64: Use defined value for SCTLR_ELx_EE
  KVM: arm64: Rework kvm_pgtable initialisation
  KVM: arm64: Drop unused workaround_flags vcpu field

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-01-04 17:16:15 +00:00
Quentin Perret
52b28657eb KVM: arm64: pkvm: Unshare guest structs during teardown
Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com
2021-12-16 12:58:57 +00:00
Marc Zyngier
142ff9bddb KVM: arm64: Drop unused workaround_flags vcpu field
workaround_flags is a leftover from our earlier Spectre-v4 workaround
implementation, and now serves no purpose.

Get rid of the field and the corresponding asm-offset definition.

Fixes: 29e8910a56 ("KVM: arm64: Simplify handling of ARCH_WORKAROUND_2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-08 14:54:07 +00:00
Sean Christopherson
005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Marc Zyngier
2d761dbf7f Merge branch kvm-arm64/fpsimd-tracking into kvmarm-master/next
* kvm-arm64/fpsimd-tracking:
  : .
  : Simplify the handling of both the FP/SIMD and SVE state by
  : removing the need for mapping the thread at EL2, and by
  : dropping the tracking of the host's SVE state which is
  : always invalid by construction.
  : .
  arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM
  KVM: arm64: Stop mapping current thread_info at EL2
  KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE
  KVM: arm64: Remove unused __sve_save_state
  KVM: arm64: Get rid of host SVE tracking/saving
  KVM: arm64: Reorder vcpu flag definitions

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01 12:13:44 +00:00
Marc Zyngier
cc5705fb1b KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.

Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01 11:51:22 +00:00
Marc Zyngier
052f064d42 KVM: arm64: Move kvm_arch_vcpu_run_pid_change() out of line
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything
to the table. Move it next to kvm_vcpu_first_run_init(), which will
be convenient for what is next to come.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01 11:51:21 +00:00
Marc Zyngier
bee14bca73 KVM: arm64: Stop mapping current thread_info at EL2
Now that we can track an equivalent of TIF_FOREIGN_FPSTATE, drop
the mapping of current's thread_info at EL2.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22 16:01:39 +00:00
Marc Zyngier
af9a0e21d8 KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE
We currently have to maintain a mapping the thread_info structure
at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag.

In order to eventually get rid of this, start with a vcpu flag that
shadows the thread flag on each entry into the hypervisor.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22 16:01:39 +00:00
Marc Zyngier
8383741ab2 KVM: arm64: Get rid of host SVE tracking/saving
The SVE host tracking in KVM is pretty involved. It relies on a
set of flags tracking the ownership of the SVE register, as well
as that of the EL0 access.

It is also pretty scary: __hyp_sve_save_host() computes
a thread_struct pointer and obtains a sve_state which gets directly
accessed without further ado, even on nVHE. How can this even work?

The answer to that is that it doesn't, and that this is mostly dead
code. Closer examination shows that on executing a syscall, userspace
loses its SVE state entirely. This is part of the ABI. Another
thing to notice is that although the kernel provides helpers such as
kernel_neon_begin()/end(), they only deal with the FP/NEON state,
and not SVE.

Given that you can only execute a guest as the result of a syscall,
and that the kernel cannot use SVE by itself, it becomes pretty
obvious that there is never any host SVE state to save, and that
this code is only there to increase confusion.

Get rid of the TIF_SVE tracking and host save infrastructure altogether.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22 15:30:27 +00:00
Marc Zyngier
892fd259cb KVM: arm64: Reorder vcpu flag definitions
The vcpu arch flags are in an interesting, semi random order.
As I have made the mistake of reusing a flag once, let's rework
this in an order that I find a bit less confusing.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22 15:25:20 +00:00
Sean Christopherson
17ed14eba2 KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c
Call KVM's (un)register perf callbacks helpers directly from arm.c and
delete perf.c

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-17-seanjc@google.com
2021-11-17 14:49:11 +01:00
Sean Christopherson
e1bfc24577 KVM: Move x86's perf guest info callbacks to generic KVM
Move x86's perf guest callbacks into common KVM, as they are semantically
identical to arm64's callbacks (the only other such KVM callbacks).
arm64 will convert to the common versions in a future patch.

Implement the necessary arm64 arch hooks now to avoid having to provide
stubs or a temporary #define (from x86) to avoid arm64 compilation errors
when CONFIG_GUEST_PERF_EVENTS=y.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-13-seanjc@google.com
2021-11-17 14:49:10 +01:00
Sean Christopherson
2934e3d093 perf: Stop pretending that perf can handle multiple guest callbacks
Drop the 'int' return value from the perf (un)register callbacks helpers
and stop pretending perf can support multiple callbacks.  The 'int'
returns are not future proofing anything as none of the callers take
action on an error.  It's also not obvious that there will ever be
co-tenant hypervisors, and if there are, that allowing multiple callbacks
to be registered is desirable or even correct.

Opportunistically rename callbacks=>cbs in the affected declarations to
match their definitions.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20211111020738.2512932-5-seanjc@google.com
2021-11-17 14:49:07 +01:00
Paolo Bonzini
84886c262e KVM/arm64 fixes for 5.16, take #1
- Fix the host S2 finalization by solely iterating over the memblocks
   instead of the whole IPA space
 
 - Tighten the return value of kvm_vcpu_preferred_target() now that
   32bit support is long gone
 
 - Make sure the extraction of ESR_ELx.EC is limited to the architected
   bits
 
 - Comment fixups
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmGNhFYPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDA18P/RUmhBWJhx/KE4atccFK6Iy+L4q0vdZehBJN
 v/lqMQzqj2DOxClFWTYrLt/GJGjxy9IQorW1F2FTLGWVUp4LgwUPtWJed7qmXYD9
 loPq69eVc/c8uwtYRFEsfYSbIGHmwJr6WOO0oA7z8Q0HNusO7mLbSywmo4Uz8eYN
 hQHIBZ19WLCoNOz1SctxNfOj4RDav0ybKaR6XZbvuOHOd3wa8zgbNp9e495/ovtG
 agWhicFrBOccU7/mZhcGwP4wMI/A/lbxY7KicjrGFHjTCnCOCcFjB+smhhJsjQOr
 4KCK4cBvFkXjTRgh764K+KftT+PPQU82ThOoI2S9ZP5v7KdZON0813u/NlXFW/eI
 fiJegq568AuQvcJ3RJtNSBV3pgJYVVnG1wlrLNhFmm9F9Wzq9BTw62rgHEV2+k/7
 BTGzdHSluCKwxjX7CckEk8ZUsJHRdyApFJBwbPEZhngNg7ZVRqW9oGhO1Oxf8auP
 6DlTlCkT436/54YtWiN/Ipp9CptMsPqbX2pQp1kKIgtypmZLjTEGZcYflVu3v2lr
 IJYizOfgjJcosjgF4K7hEJLnviow1u7u9S6Odl0XMJ0SJwD+MPkv5KAZLTdkmRa3
 0bjayOtaOn1S3R3b3jXVn8v2FXmHAMTmkvboaSEc2bQ9EuTODgBxwXlm8E6TBdoY
 Btbmm7DU
 =IxVi
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 5.16, take #1

- Fix the host S2 finalization by solely iterating over the memblocks
  instead of the whole IPA space

- Tighten the return value of kvm_vcpu_preferred_target() now that
  32bit support is long gone

- Make sure the extraction of ESR_ELx.EC is limited to the architected
  bits

- Comment fixups
2021-11-12 16:01:55 -05:00
YueHaibing
08e873cb70 KVM: arm64: Change the return type of kvm_vcpu_preferred_target()
kvm_vcpu_preferred_target() always return 0 because kvm_target_cpu()
never returns a negative error code.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
2021-11-08 10:48:47 +00:00
Paolo Bonzini
4e33868433 KVM/arm64 updates for Linux 5.16
- More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 - Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 - More memcg accounting of the memory allocated on behalf of a guest
 
 - Timer and vgic selftests
 
 - Workarounds for the Apple M1 broken vgic implementation
 
 - KConfig cleanups
 
 - New kvmarm.mode=none option, for those who really dislike us
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmF7u5YPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD6w8QAIKDLJCTqkxv5Vh4ZSmtXxg4gTZMBlg8oSQ8
 sVL639aqBvFe3A6Vmz6IwBm+NT7Sm1zxkuH9qHzVR1gmXq0oLYNrIuyrzRW8PvqO
 hIkSRRoVsf03755TmkxwR7/2jAFxb6FhEVAy6VWdQyI44orihIPvMp8aTIq+jvU+
 XoNGb/rPf9HpSUtvuaHYvZhSZBhoi5dRnkr33R1+VR69n7Axs8lm905xcl6Pt0a0
 QqYZWQvFu/BXPyNflG7LUsegRF/iiV2vNTbNNowkzlV5suqxBpJAp6ApDL/gWrHv
 ya/6cMqicSjBIkWnawhXY98w6/5xfzK4IV/zc00FNWOlUdVP89Thqrgc8EkigS9R
 BGcxFFqj41snr+ensSBBIkNtV+dBX52H3rUE0F9seiTXm8QWI86JobdeNadT8tUP
 TXdOeCUcA+cp4Ngln18lsbOEaBkPA5H1po1nUFPHbKnVOxnqXScB7E/xF6rAbryV
 m+Z+oidU7MyS/Ev/Da0ww/XFx7cs2ez9EgeQvjcdFAvUMqS6kcXEExvgGYlm+KRQ
 GBMKPLCNHKdflMANoSpol7MZUmPJ45XoWKW1rntj2r9X+oJW2Z2hEx32xrWDJdqK
 ixnbjog5kNZb0CjLGsUC90lo2hpRJecaLhAjgTLYaNC1QxGPrt92eat6gnwuMTBc
 mpADqi7w
 =qBAO
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.16

- More progress on the protected VM front, now with the full
  fixed feature set as well as the limitation of some hypercalls
  after initialisation.

- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
  complicated

- Fixes for the vgic placement in the IPA space, together with a
  bunch of selftests

- More memcg accounting of the memory allocated on behalf of a guest

- Timer and vgic selftests

- Workarounds for the Apple M1 broken vgic implementation

- KConfig cleanups

- New kvmarm.mode=none option, for those who really dislike us
2021-10-31 02:28:48 -04:00
Marc Zyngier
be08c3cf3c Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/next
* kvm-arm64/pkvm/fixed-features: (22 commits)
  : .
  : Add the pKVM fixed feature that allows a bunch of exceptions
  : to either be forbidden or be easily handled at EL2.
  : .
  KVM: arm64: pkvm: Give priority to standard traps over pvm handling
  KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()
  KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around
  KVM: arm64: pkvm: Consolidate include files
  KVM: arm64: pkvm: Preserve pending SError on exit from AArch32
  KVM: arm64: pkvm: Handle GICv3 traps as required
  KVM: arm64: pkvm: Drop sysregs that should never be routed to the host
  KVM: arm64: pkvm: Drop AArch32-specific registers
  KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI
  KVM: arm64: pkvm: Use a single function to expose all id-regs
  KVM: arm64: Fix early exit ptrauth handling
  KVM: arm64: Handle protected guests at 32 bits
  KVM: arm64: Trap access to pVM restricted features
  KVM: arm64: Move sanitized copies of CPU features
  KVM: arm64: Initialize trap registers for protected VMs
  KVM: arm64: Add handlers for protected VM System Registers
  KVM: arm64: Simplify masking out MTE in feature id reg
  KVM: arm64: Add missing field descriptor for MDCR_EL2
  KVM: arm64: Pass struct kvm to per-EC handlers
  KVM: arm64: Move early handlers to per-EC handlers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-18 17:20:50 +01:00
Fuad Tabba
2a0c343386 KVM: arm64: Initialize trap registers for protected VMs
Protected VMs have more restricted features that need to be
trapped. Moreover, the host should not be trusted to set the
appropriate trapping registers and their values.

Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and
cptr_el2 at EL2 for protected guests, based on the values of the
guest's feature id registers.

No functional change intended as trap handlers introduced in the
previous patch are still not hooked in to the guest exit
handlers.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
2021-10-11 14:57:29 +01:00
Marc Zyngier
b6a68b97af KVM: arm64: Allow KVM to be disabled from the command line
Although KVM can be compiled out of the kernel, it cannot be disabled
at runtime. Allow this possibility by introducing a new mode that
will prevent KVM from initialising.

This is useful in the (limited) circumstances where you don't want
KVM to be available (what is wrong with you?), or when you want
to install another hypervisor instead (good luck with that).

Reviewed-by: David Brazdil <dbrazdil@google.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Scull <ascull@google.com>
Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org
2021-10-11 09:48:47 +01:00
Juergen Gross
78b497f2e6 kvm: use kvfree() in kvm_arch_free_vm()
By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
use the common variant. This can be accomplished by adding another
macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.

Further simplification can be achieved by adding __kvm_arch_free_vm()
doing the common part.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20210903130808.30142-5-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:57 -04:00
Marc Zyngier
7c7b363d62 Merge branch kvm-arm64/pkvm-fixed-features-prologue into kvmarm-master/next
* kvm-arm64/pkvm-fixed-features-prologue:
  : Rework a bunch of common infrastructure as a prologue
  : to Fuad Tabba's protected VM fixed feature series.
  KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit
  KVM: arm64: Add config register bit definitions
  KVM: arm64: Add feature register flag definitions
  KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch
  KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug
  KVM: arm64: Restore mdcr_el2 from vcpu
  KVM: arm64: Refactor sys_regs.h,c for nVHE reuse
  KVM: arm64: Fix names of config register fields
  KVM: arm64: MDCR_EL2 is a 64-bit register
  KVM: arm64: Remove trailing whitespace in comment
  KVM: arm64: placeholder to check if VM is protected

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 12:23:53 +01:00
Marc Zyngier
ca3385a507 Merge branch kvm-arm64/generic-entry into kvmarm-master/next
Switch KVM/arm64 to the generic entry code, courtesy of Oliver Upton

* kvm-arm64/generic-entry:
  KVM: arm64: Use generic KVM xfer to guest work function
  entry: KVM: Allow use of generic KVM entry w/o full generic support
  KVM: arm64: Record number of signal exits as a vCPU stat

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 12:23:09 +01:00
Marc Zyngier
3ce5db8a59 Merge branch kvm-arm64/misc-5.15 into kvmarm-master/next
* kvm-arm64/misc-5.15:
  : Misc improvements for 5.15:
  :
  : - Account the number of VMID-wide TLB invalidations as
  :   remote TLB flushes
  : - Fix comments in the VGIC code
  : - Cleanup the PMU IMPDEF identification
  : - Streamline the TGRAN2 usage
  : - Avoid advertising a 52bit IPA range for non-64KB configs
  : - Avoid spurious signalling when a HW-mapped interrupt is in the
  :   A+P state on entry, and in the P state on exit, but that the
  :   physical line is not pending anymore.
  : - Bunch of minor cleanups
  KVM: arm64: vgic: Resample HW pending state on deactivation
  KVM: arm64: vgic: Drop WARN from vgic_get_irq
  KVM: arm64: Drop unused REQUIRES_VIRT
  KVM: arm64: Drop check_kvm_target_cpu() based percpu probe
  KVM: arm64: Drop init_common_resources()
  KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA
  arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros
  KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
  arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT
  KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF
  KVM: arm64: Fix comments related to GICv2 PMR reporting
  KVM: arm64: Count VMID-wide TLB invalidations
  arm64/kexec: Test page size support with new TGRAN range values

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 12:14:56 +01:00
Fuad Tabba
cd496228fd KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch
Track the baseline guest value for cptr_el2 in struct
kvm_vcpu_arch, similar to the other registers that control traps.
Use this value when setting cptr_el2 for the guest.

Currently this value is unchanged (CPTR_EL2_DEFAULT), but future
patches will set trapping bits based on features supported for
the guest.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-9-tabba@google.com
2021-08-20 11:12:17 +01:00
Fuad Tabba
1460b4b25f KVM: arm64: Restore mdcr_el2 from vcpu
On deactivating traps, restore the value of mdcr_el2 from the
newly created and preserved host value vcpu context, rather than
directly reading the hardware register.

Up until and including this patch the two values are the same,
i.e., the hardware register and the vcpu one. A future patch will
be changing the value of mdcr_el2 on activating traps, and this
ensures that its value will be restored.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-7-tabba@google.com
2021-08-20 11:12:17 +01:00
Fuad Tabba
d6c850dd6c KVM: arm64: MDCR_EL2 is a 64-bit register
Fix the places in KVM that treat MDCR_EL2 as a 32-bit register.
More recent features (e.g., FEAT_SPEv1p2) use bits above 31.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-4-tabba@google.com
2021-08-20 11:12:17 +01:00
Fuad Tabba
2ea7f65580 KVM: arm64: placeholder to check if VM is protected
Add a function to check whether a VM is protected (under pKVM).
Since the creation of protected VMs isn't enabled yet, this is a
placeholder that always returns false. The intention is for this
to become a check for protected VMs in the future (see Will's RFC).

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/kvmarm/20210603183347.1695-1-will@kernel.org/
Link: https://lore.kernel.org/r/20210817081134.2918285-2-tabba@google.com
2021-08-20 11:12:05 +01:00
Oliver Upton
fe5161d2c3 KVM: arm64: Record number of signal exits as a vCPU stat
Most other architectures that implement KVM record a statistic
indicating the number of times a vCPU has exited due to a pending
signal. Add support for that stat to arm64.

Reviewed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210802192809.1851010-2-oupton@google.com
2021-08-19 11:19:41 +01:00
Anshuman Khandual
6b7982fefc KVM: arm64: Drop check_kvm_target_cpu() based percpu probe
kvm_target_cpu() never returns a negative error code, so check_kvm_target()
would never have 'ret' filled with a negative error code. Hence the percpu
probe via check_kvm_target_cpu() does not make sense as its never going to
find an unsupported CPU, forcing kvm_arch_init() to exit early. Hence lets
just drop this percpu probe (and also check_kvm_target_cpu()) altogether.

While here, this also changes kvm_target_cpu() return type to a u32, making
it explicit that an error code will not be returned from this function.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-kernel@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1628744994-16623-5-git-send-email-anshuman.khandual@arm.com
2021-08-18 09:26:07 +01:00
Marc Zyngier
7a3ba3095a KVM: arm64: Remove PMSWINC_EL0 shadow register
We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
while *never* writing anything there outside of reset.

Given that the register is defined as write-only, that we always
trap when this register is accessed, there is little point in saving
anything anyway.

Get rid of the entry, and save a mighty 8 bytes per vcpu structure.

We still need to keep it exposed to userspace in order to preserve
backward compatibility with previously saved VMs. Since userspace
cannot expect any effect of writing to PMSWINC_EL0, treat the
register as RAZ/WI for the purpose of userspace access.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210719123902.1493805-5-maz@kernel.org
2021-08-02 14:26:34 +01:00
Paolo Bonzini
b8917b4ae4 KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
 - Reduce the impact of CMOs by moving them in the page-table code
 - Allow device block mappings at stage-2
 - Reduce the footprint of the vmemmap in protected mode
 - Support the vGIC on dumb systems such as the Apple M1
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 - Add selftests for the debug architecture
 - The usual crop of PMU fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
 jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
 6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
 SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
 j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
 1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
 bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
 6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
 +tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
 9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
 QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
 wjvH8HVy
 =fWBV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Jing Zhang
0193cc908b KVM: stats: Separate generic stats from architecture specific ones
Generic KVM stats are those collected in architecture independent code
or those supported by all architectures; put all generic statistics in
a separate structure.  This ensures that they are defined the same way
in the statistics API which is being added, removing duplication among
different architectures in the declaration of the descriptors.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 11:47:56 -04:00
Marc Zyngier
9f03db6673 Merge branch kvm-arm64/mmu/mte into kvmarm-master/next
KVM/arm64 support for MTE, courtesy of Steven Price.
It allows the guest to use memory tagging, and offers
a new userspace API to save/restore the tags.

* kvm-arm64/mmu/mte:
  KVM: arm64: Document MTE capability and ioctl
  KVM: arm64: Add ioctl to fetch/store tags in a guest
  KVM: arm64: Expose KVM_ARM_CAP_MTE
  KVM: arm64: Save/restore MTE registers
  KVM: arm64: Introduce MTE VM feature
  arm64: mte: Sync tags for pages where PTE is untagged

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-22 15:09:34 +01:00
Steven Price
f0376edb1d KVM: arm64: Add ioctl to fetch/store tags in a guest
The VMM may not wish to have it's own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl which allows the VMM to easily read/write the
tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
while the VMM can still read/write the tags for the purpose of
migration.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
2021-06-22 14:08:06 +01:00
Steven Price
e1f358b504 KVM: arm64: Save/restore MTE registers
Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-4-steven.price@arm.com
2021-06-22 14:08:05 +01:00
Steven Price
ea7fc1bb1c KVM: arm64: Introduce MTE VM feature
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
for a VM. This will expose the feature to the guest and automatically
tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
storage) to ensure that the guest cannot see stale tags, and so that
the tags are correctly saved/restored across swap.

Actually exposing the new capability to user space happens in a later
patch.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
[maz: move VM_SHARED sampling into the critical section]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-3-steven.price@arm.com
2021-06-22 14:08:05 +01:00
Marc Zyngier
d0c94c4979 KVM: arm64: Restore PMU configuration on first run
Restoring a guest with an active virtual PMU results in no perf
counters being instanciated on the host side. Not quite what
you'd expect from a restore.

In order to fix this, force a writeback of PMCR_EL0 on the first
run of a vcpu (using a new request so that it happens once the
vcpu has been loaded). This will in turn create all the host-side
counters that were missing.

Reported-by: Jinank Jain <jinankj@amazon.de>
Tested-by: Jinank Jain <jinankj@amazon.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org
Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
2021-06-18 14:18:37 +01:00
Paolo Bonzini
e3cb6fa0e2 KVM: switch per-VM stats to u64
Make them the same type as vCPU stats.  There is no reason
to limit the counters to unsigned long.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 14:25:27 -04:00
Paolo Bonzini
c4f71901d5 KVM/arm64 updates for Linux 5.13
New features:
 
 - Stage-2 isolation for the host kernel when running in protected mode
 - Guest SVE support when running in nVHE mode
 - Force W^X hypervisor mappings in nVHE mode
 - ITS save/restore for guests using direct injection with GICv4.1
 - nVHE panics now produce readable backtraces
 - Guest support for PTP using the ptp_kvm driver
 - Performance improvements in the S2 fault handler
 - Alexandru is now a reviewer (not really a new feature...)
 
 Fixes:
 - Proper emulation of the GICR_TYPER register
 - Handle the complete set of relocation in the nVHE EL2 object
 - Get rid of the oprofile dependency in the PMU code (and of the
   oprofile body parts at the same time)
 - Debug and SPE fixes
 - Fix vcpu reset
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCCpuAPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD2G8QALWQYeBggKnNmAJfuihzZ2WariBmgcENs2R2
 qNZ/Py6dIF+b69P68nmgrEV1x2Kp35cPJbBwXnnrS4FCB5tk0b8YMaj00QbiRIYV
 UXbPxQTmYO1KbevpoEcw8NmR4bZJ/hRYPuzcQG7CCMKIZw0zj2cMcBofzQpTOAp/
 CgItdcv7at3iwamQatfU9vUmC0nDdnjdIwSxTAJOYMVV1ENwtnYSNgZVo4XLTg7n
 xR/5Qx27PKBJw7GyTRAIIxKAzNXG2tDL+GVIHe4AnRp3z3La8sr6PJf7nz9MCmco
 ISgeY7EGQINzmm4LahpnV+2xwwxOWo8QotxRFGNuRTOBazfARyAbp97yJ6eXJUpa
 j0qlg3xK9neyIIn9BQKkKx4sY9V45yqkuVDsK6odmqPq3EE01IMTRh1N/XQi+sTF
 iGrlM3ZW4AjlT5zgtT9US/FRXeDKoYuqVCObJeXZdm3sJSwEqTAs0JScnc0YTsh7
 m30CODnomfR2y5X6GoaubbQ0wcZ2I20K1qtIm+2F6yzD5P1/3Yi8HbXMxsSWyYWZ
 1ldoSa+ZUQlzV9Ot0S3iJ4PkphLKmmO96VlxE2+B5gQG50PZkLzsr8bVyYOuJC8p
 T83xT9xd07cy+FcGgF9veZL99Y6BLHMa6ZwFUolYNbzJxqrmqyR1aiJMEBIcX+aP
 ACeKW1w5
 =fpey
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
  oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
2021-04-23 07:41:17 -04:00
Sean Christopherson
b4c5936c47 KVM: Kill off the old hva-based MMU notifier callbacks
Yank out the hva-based MMU notifier APIs now that all architectures that
use the notifiers have moved to the gfn-based APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:08 -04:00
Sean Christopherson
cd4c718352 KVM: arm64: Convert to the gfn-based MMU notifier callbacks
Move arm64 to the gfn-base MMU notifier APIs, which do the hva->gfn
lookup in common code.

No meaningful functional change intended, though the exact order of
operations is slightly different since the memslot lookups occur before
calling into arch code.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:07 -04:00
Maxim Levitsky
fa18aca927 KVM: aarch64: implement KVM_CAP_SET_GUEST_DEBUG2
Move KVM_GUESTDBG_VALID_MASK to kvm_host.h
and use it to return the value of this capability.
Compile tested only.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:03 -04:00
Sean Christopherson
5f7c292b89 KVM: Move prototypes for MMU notifier callbacks to generic code
Move the prototypes for the MMU notifier callbacks out of arch code and
into common code.  There is no benefit to having each arch replicate the
prototypes since any deviation from the invocation in common code will
explode.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210326021957.1424875-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:55 -04:00
Marc Zyngier
3d63ef4d52 Merge branch 'kvm-arm64/memslot-fixes' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:40 +01:00
Marc Zyngier
ac5ce2456e Merge branch 'kvm-arm64/host-stage2' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:09 +01:00
Marc Zyngier
fbb31e5f3a Merge branch 'kvm-arm64/debug-5.13' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:34:15 +01:00
Alexandru Elisei
263d6287da KVM: arm64: Initialize VCPU mdcr_el2 before loading it
When a VCPU is created, the kvm_vcpu struct is initialized to zero in
kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
result is that KVM executes for a short time with MDCR_EL2 set to zero.

This has several unintended consequences:

* Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
  DDI 0487G.a, page D13-3820. The behavior specified by the architecture
  in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
  value less than or equal to PMCR_EL0.N, which means that an unknown
  number of counters are now disabled by MDCR_EL2.HPME, which is zero.

* The host configuration for the other debug features controlled by
  MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
  doesn't use the other fields, but that might change in the future.

Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
has a consistent value after each vcpu_load().

Fixes: d5a21bcc29 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210407144857.199746-3-alexandru.elisei@arm.com
2021-04-07 16:40:03 +01:00
Gavin Shan
eab6214847 KVM: arm64: Hide kvm_mmu_wp_memory_region()
We needn't expose the function as it's only used by mmu.c since it
was introduced by commit c64735554c ("KVM: arm: Add initial dirty
page locking support").

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-2-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Suzuki K Poulose
a1319260bf arm64: KVM: Enable access to TRBE support for host
For a nvhe host, the EL2 must allow the EL1&0 translation
regime for TraceBuffer (MDCR_EL2.E2TB == 0b11). This must
be saved/restored over a trip to the guest. Also, before
entering the guest, we must flush any trace data if the
TRBE was enabled. And we must prohibit the generation
of trace while we are in EL1 by clearing the TRFCR_EL1.

For vhe, the EL2 must prevent the EL1 access to the Trace
Buffer.

The MDCR_EL2 bit definitions for TRBE are available here :

  https://developer.arm.com/documentation/ddi0601/2020-12/AArch64-Registers/

Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-8-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-06 16:05:28 -06:00
Suzuki K Poulose
d2602bb4f5 KVM: arm64: Move SPE availability check to VCPU load
At the moment, we check the availability of SPE on the given
CPU (i.e, SPE is implemented and is allowed at the host) during
every guest entry. This can be optimized a bit by moving the
check to vcpu_load time and recording the availability of the
feature on the current CPU via a new flag. This will also be useful
for adding the TRBE support.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexandru Elisei <Alexandru.Elisei@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-7-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-06 16:05:20 -06:00
Marc Zyngier
7c4199375a KVM: arm64: Drop the CPU_FTR_REG_HYP_COPY infrastructure
Now that the read_ctr macro has been specialised for nVHE,
the whole CPU_FTR_REG_HYP_COPY infrastrcture looks completely
overengineered.

Simplify it by populating the two u64 quantities (MMFR0 and 1)
that the hypervisor need.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-25 11:01:03 +00:00
Quentin Perret
cfb1a98de7 KVM: arm64: Use kvm_arch in kvm_s2_mmu
In order to make use of the stage 2 pgtable code for the host stage 2,
change kvm_s2_mmu to use a kvm_arch pointer in lieu of the kvm pointer,
as the host will have the former but not the latter.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-21-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
f320bc742b KVM: arm64: Prepare the creation of s1 mappings at EL2
When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.

This leads to the following boot flow in nVHE Protected mode:

 1. the host allocates memory for the hypervisor very early on, using
    the memblock API;

 2. the host creates a set of stage 1 page-table for EL2, installs the
    EL2 vectors, and issues the __pkvm_init hypercall;

 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
    and stores it in the memory pool provided by the host;

 4. the hypervisor then extends its stage 1 mappings to include a
    vmemmap in the EL2 VA space, hence allowing to use the buddy
    allocator introduced in a previous patch;

 5. the hypervisor jumps back in the idmap page, switches from the
    host-provided page-table to the new one, and wraps up its
    initialization by enabling the new allocator, before returning to
    the host.

 6. the host can free the now unused page-table created for EL2, and
    will now need to issue hypercalls to make changes to the EL2 stage 1
    mappings instead of modifying them directly.

Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.

Credits to Will for __pkvm_init_switch_pgd.

Acked-by: Will Deacon <will@kernel.org>
Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
7a440cc783 KVM: arm64: Enable access to sanitized CPU features at EL2
Introduce the infrastructure in KVM enabling to copy CPU feature
registers into EL2-owned data-structures, to allow reading sanitised
values directly at EL2 in nVHE.

Given that only a subset of these features are being read by the
hypervisor, the ones that need to be copied are to be listed under
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy. This introduces only the infrastructure enabling
this copy. The first users will follow shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-14-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
40a50853d3 KVM: arm64: Make kvm_call_hyp() a function call at Hyp
kvm_call_hyp() has some logic to issue a function call or a hypercall
depending on the EL at which the kernel is running. However, all the
code compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to only run
at EL2 which allows us to simplify.

Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-9-qperret@google.com
2021-03-19 12:01:20 +00:00
Daniel Kiss
6e94095c55 KVM: arm64: Enable SVE support for nVHE
Now that KVM is equipped to deal with SVE on nVHE, remove the code
preventing it from being used as well as the bits of documentation
that were mentioning the incompatibility.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 14:23:19 +00:00
Marc Zyngier
468f3477ef KVM: arm64: Introduce vcpu_sve_vq() helper
The KVM code contains a number of "sve_vq_from_vl(vcpu->arch.sve_max_vl)"
instances, and we are about to add more.

Introduce vcpu_sve_vq() as a shorthand for this expression.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 11:24:10 +00:00
Marc Zyngier
985d3a1bea KVM: arm64: Let vcpu_sve_pffr() handle HYP VAs
The vcpu_sve_pffr() returns a pointer, which can be an interesting
thing to do on nVHE. Wrap the pointer with kern_hyp_va(), and
take this opportunity to remove the unnecessary casts (sve_state
being a void *).

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 11:24:00 +00:00
Paolo Bonzini
8c6e67bec3 KVM/arm64 updates for Linux 5.12
- Make the nVHE EL2 object relocatable, resulting in much more
   maintainable code
 - Handle concurrent translation faults hitting the same page
   in a more elegant way
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Allow the disabling of symbol export from assembly code
 - Simplification of the early init hypercall handling
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmAmjqEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoUEQAIrJ7YF4v4gz06a0HG9+b6fbmykHyxlG7jfm
 trvctfaiKzOybKoY5odPpNFzhbYOOdXXqYipyTHGwBYtGSy9G/9SjMKSUrfln2Ni
 lr1wBqapr9TE+SVKoR8pWWuZxGGbHVa7brNuMbMsMi1wwAsM2/n70H9PXrdq3QiK
 Ge1DWLso2oEfhtTwqNKa4dwB2MHjBhBFhhq+Nq5pslm6mmxJaYqz7pyBmw/C+2cc
 oU/6kpAa1yPAauptWXtYXJYOMHihxgEa1IdK3Gl0hUyFyu96xVkwH/KFsj+bRs23
 QGGCSdy4313hzaoGaSOTK22R98Aeg0wI9a6tcCBvVVjTAztnlu1FPtUZr8e/F7uc
 +r8xVJUJFiywt3Zktf/D7YDK9LuMMqFnj0BkI4U9nIBY59XZRNhENsBCmjru5lnL
 iXa5cuta03H4emfssIChLpgn0XHFas6t5dFXBPGbXyw0qsQchTw98iQX9LVxefUK
 rOUGPIN4nE9ESRIZe0SPlAVeCtNP8cLH7+0YG9MJ1QeDVYaUsnvy9Ln/ox+514mR
 5y2KJ6y7xnLB136SKCzPDDloYtz7BDiJq6a/RPiXKGheKoxy+N+BSe58yWCqFZYE
 Fx/cGUr7oSg39U7gCboog6BDp5e2CXBfbRllg6P47bZFfdPNwzNEzHvk49VltMxx
 Rl2W05bk
 =6EwV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.12

- Make the nVHE EL2 object relocatable, resulting in much more
  maintainable code
- Handle concurrent translation faults hitting the same page
  in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling
2021-02-12 11:23:44 -05:00
Vitaly Kuznetsov
4fc096a99e KVM: Raise the maximum number of user memslots
Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86,
32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are
allocated dynamically in KVM when added so the only real limitation is
'id_to_index' array which is 'short'. We don't have any other
KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures.

Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations.
In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC
enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per
vCPU and the guest is free to pick any GFN for each of them, this fragments
memslots as QEMU wants to have a separate memslot for each of these pages
(which are supposed to act as 'overlay' pages).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09 08:17:08 -05:00
Ard Biesheuvel
a8e190cdae KVM: arm64: Implement the TRNG hypervisor call
Provide a hypervisor implementation of the ARM architected TRNG firmware
interface described in ARM spec DEN0098. All function IDs are implemented,
including both 32-bit and 64-bit versions of the TRNG_RND service, which
is the centerpiece of the API.

The API is backed by the kernel's entropy pool only, to avoid guests
draining more precious direct entropy sources.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[Andre: minor fixes, drop arch_get_random() usage]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210106103453.152275-6-andre.przywara@arm.com
2021-01-25 22:19:31 +00:00
Marc Zyngier
767c973f2e KVM: arm64: Declutter host PSCI 0.1 handling
Although there is nothing wrong with the current host PSCI relay
implementation, we can clean it up and remove some of the helpers
that do not improve the overall readability of the legacy PSCI 0.1
handling.

Opportunity is taken to turn the bitmap into a set of booleans,
and creative use of preprocessor macros make init and check
more concise/readable.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-12-22 12:56:44 +00:00
David Brazdil
61fe0c37af KVM: arm64: Minor cleanup of hyp variables used in host
Small cleanup moving declarations of hyp-exported variables to
kvm_host.h and using macros to avoid having to refer to them with
kvm_nvhe_sym() in host.

No functional change intended.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201208142452.87237-5-dbrazdil@google.com
2020-12-22 10:49:01 +00:00
David Brazdil
ff367fe473 KVM: arm64: Prevent use of invalid PSCI v0.1 function IDs
PSCI driver exposes a struct containing the PSCI v0.1 function IDs
configured in the DT. However, the struct does not convey the
information whether these were set from DT or contain the default value
zero. This could be a problem for PSCI proxy in KVM protected mode.

Extend config passed to KVM with a bit mask with individual bits set
depending on whether the corresponding function pointer in psci_ops is
set, eg. set bit for PSCI_CPU_SUSPEND if psci_ops.cpu_suspend != NULL.

Previously config was split into multiple global variables. Put
everything into a single struct for convenience.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201208142452.87237-2-dbrazdil@google.com
2020-12-22 10:47:59 +00:00
Marc Zyngier
3a514592b6 Merge remote-tracking branch 'origin/kvm-arm64/psci-relay' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-12-09 10:00:24 +00:00
David Brazdil
3eb681fba2 KVM: arm64: Add ARM64_KVM_PROTECTED_MODE CPU capability
Expose the boolean value whether the system is running with KVM in
protected mode (nVHE + kernel param). CPU capability was selected over
a global variable to allow use in alternatives.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-3-dbrazdil@google.com
2020-12-04 08:44:19 +00:00
David Brazdil
d8b369c4e3 KVM: arm64: Add kvm-arm.mode early kernel parameter
Add an early parameter that allows users to select the mode of operation
for KVM/arm64.

For now, the only supported value is "protected". By passing this flag
users opt into the hypervisor placing additional restrictions on the
host kernel. These allow the hypervisor to spawn guests whose state is
kept private from the host. Restrictions will include stage-2 address
translation to prevent host from accessing guest memory, filtering its
SMC calls, etc.

Without this parameter, the default behaviour remains selecting VHE/nVHE
based on hardware support and CONFIG_ARM64_VHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-2-dbrazdil@google.com
2020-12-04 08:43:43 +00:00
Marc Zyngier
f86e54653e Merge remote-tracking branch 'origin/kvm-arm64/csv3' into kvmarm-master/queue
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-12-03 19:12:24 +00:00
Marc Zyngier
4f1df628d4 KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV3=1 if the CPUs are Meltdown-safe
Cores that predate the introduction of ID_AA64PFR0_EL1.CSV3 to
the ARMv8 architecture have this field set to 0, even of some of
them are not affected by the vulnerability.

The kernel maintains a list of unaffected cores (A53, A55 and a few
others) so that it doesn't impose an expensive mitigation uncessarily.

As we do for CSV2, let's expose the CSV3 property to guests that run
on HW that is effectively not vulnerable. This can be reset to zero
by writing to the ID register from userspace, ensuring that VMs can
be migrated despite the new property being set.

Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-30 16:02:53 +00:00
Marc Zyngier
90f0e16c64 Merge branch 'kvm-arm64/misc-5.11' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 19:48:24 +00:00
Will Deacon
bf118a5cb7 KVM: arm64: Remove unused __extended_idmap_trampoline() prototype
__extended_idmap_trampoline() was removed a long time ago by
3421e9d88d ("arm64: KVM: Simplify HYP init/teardown") so remove the
unused function prototype.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-4-will@kernel.org
2020-11-27 18:59:05 +00:00
Will Deacon
36fb4cd55f KVM: arm64: Remove kvm_arch_vm_ioctl_check_extension()
kvm_arch_vm_ioctl_check_extension() is only called from
kvm_vm_ioctl_check_extension(), so we can inline it and remove the extra
function.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-3-will@kernel.org
2020-11-27 18:59:05 +00:00
Will Deacon
8d14797b53 KVM: arm64: Move 'struct kvm_arch_memory_slot' out of uapi/
'struct kvm_arch_memory_slot' isn't part of the user ABI, so move it out
of the uapi/ headers in case we start using it in future and accidentally
back ourselves into a corner.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-2-will@kernel.org
2020-11-27 18:59:05 +00:00
Marc Zyngier
6e5d8c713d Merge branch 'kvm-arm64/pmu-undef' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:46:47 +00:00
Marc Zyngier
14bda7a927 KVM: arm64: Add kvm_vcpu_has_pmu() helper
There are a number of places where we check for the KVM_ARM_VCPU_PMU_V3
feature. Wrap this check into a new kvm_vcpu_has_pmu(), and use
it at the existing locations.

No functional change.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:39:14 +00:00
Marc Zyngier
149f120edb Merge branch 'kvm-arm64/copro-no-more' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:33:16 +00:00
Marc Zyngier
37da329ed6 Merge branch 'kvm-arm64/el2-pc' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:33:10 +00:00
Marc Zyngier
23711a5e66 KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.

However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.

In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).

Fixes: e1026237f9 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
2020-11-12 21:22:22 +00:00
Marc Zyngier
6ac4a5ac50 KVM: arm64: Drop kvm_coproc.h
kvm_coproc.h used to serve as a compatibility layer for the files
shared between the 32 and 64 bit ports.

Another one bites the dust...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 11:22:52 +00:00
Marc Zyngier
5f7e02aebd KVM: arm64: Drop legacy copro shadow register
Finally remove one of the biggest 32bit legacy: the copro shadow
mapping. We won't missit.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 11:22:52 +00:00
Marc Zyngier
1da42c34d7 KVM: arm64: Map AArch32 cp14 register to AArch64 sysregs
Similarly to what has been done on the cp15 front, repaint the
debug registers to use their AArch64 counterparts. This results
in some simplification as we can remove the 32bit-specific
accessors.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 11:22:51 +00:00
Marc Zyngier
4ff3fc316d KVM: arm64: Move AArch32 exceptions over to AArch64 sysregs
The use of the AArch32-specific accessors have always been a bit
annoying on 64bit, and it is time for a change.

Let's move the AArch32 exception injection over to the AArch64 encoding,
which requires us to split the two halves of FAR_EL1 into DFAR and IFAR.
This enables us to drop the preempt_disable() games on VHE, and to kill
the last user of the vcpu_cp15() macro.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 11:22:51 +00:00
Marc Zyngier
ca4e514774 KVM: arm64: Introduce handling of AArch32 TTBCR2 traps
ARMv8.2 introduced TTBCR2, which shares TCR_EL1 with TTBCR.
Gracefully handle traps to this register when HCR_EL2.TVM is set.

Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 11:22:50 +00:00
Marc Zyngier
e650b64f1a KVM: arm64: Add basic hooks for injecting exceptions from EL2
Add the basic infrastructure to describe injection of exceptions
into a guest. So far, nothing uses this code path.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 08:34:25 +00:00
Marc Zyngier
21c810017c KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h
As we are about to need to access system registers from the HYP
code based on their internal encoding, move the direct sysreg
accessors to a common include file, with a VHE-specific guard.

No functionnal change.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 08:34:25 +00:00
Marc Zyngier
cdb5e02ed1 KVM: arm64: Make kvm_skip_instr() and co private to HYP
In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-10 08:34:24 +00:00
Marc Zyngier
4a1c2c7f63 KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array
are missing their target register, resulting in all accesses being
targetted at the guard sysreg (indexed by __INVALID_SYSREG__).

Point the emulation code at the actual register entries.

Fixes: bdfb4b389c ("arm64: KVM: add trap handlers for AArch32 debug registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
2020-10-29 19:49:03 +00:00
Marc Zyngier
14ef9d0492 Merge branch 'kvm-arm64/hyp-pcpu' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30 14:05:35 +01:00
Marc Zyngier
816c347f3a Merge remote-tracking branch 'arm64/for-next/ghostbusters' into kvm-arm64/hyp-pcpu
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30 09:48:30 +01:00
David Brazdil
2a1198c9b4 kvm: arm64: Create separate instances of kvm_host_data for VHE/nVHE
Host CPU context is stored in a global per-cpu variable `kvm_host_data`.
In preparation for introducing independent per-CPU region for nVHE hyp,
create two separate instances of `kvm_host_data`, one for VHE and one
for nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-9-dbrazdil@google.com
2020-09-30 08:37:13 +01:00
Marc Zyngier
7311467702 KVM: arm64: Get rid of kvm_arm_have_ssbd()
kvm_arm_have_ssbd() is now completely unused, get rid of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Will Deacon
d4647f0a2a arm64: Rewrite Spectre-v2 mitigation code
The Spectre-v2 mitigation code is pretty unwieldy and hard to maintain.
This is largely due to it being written hastily, without much clue as to
how things would pan out, and also because it ends up mixing policy and
state in such a way that it is very difficult to figure out what's going
on.

Rewrite the Spectre-v2 mitigation so that it clearly separates state from
policy and follows a more structured approach to handling the mitigation.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Marc Zyngier
2e02cbb236 Merge branch 'kvm-arm64/pmu-5.9' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-29 15:12:54 +01:00
Marc Zyngier
d7eec2360e KVM: arm64: Add PMU event filtering infrastructure
It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.

For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).

Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.

Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-29 14:19:39 +01:00
Marc Zyngier
fd65a3b5f8 KVM: arm64: Use event mask matching architecture revision
The PMU code suffers from a small defect where we assume that the event
number provided by the guest is always 16 bit wide, even if the CPU only
implements the ARMv8.0 architecture. This isn't really problematic in
the sense that the event number ends up in a system register, cropping
it to the right width, but still this needs fixing.

In order to make it work, let's probe the version of the PMU that the
guest is going to use. This is done by temporarily creating a kernel
event and looking at the PMUVer field that has been saved at probe time
in the associated arm_pmu structure. This in turn gets saved in the kvm
structure, and subsequently used to compute the event mask that gets
used throughout the PMU code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-29 14:19:38 +01:00
Marc Zyngier
81867b75db Merge branch 'kvm-arm64/nvhe-hyp-context' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-16 10:59:17 +01:00
Andrew Scull
04e4caa8d3 KVM: arm64: nVHE: Migrate hyp-init to SMCCC
To complete the transition to SMCCC, the hyp initialization is given a
function ID. This looks neater than comparing the hyp stub function IDs
to the page table physical address.

Some care is taken to only clobber x0-3 before the host context is saved
as only those registers can be clobbered accoring to SMCCC. Fortunately,
only a few acrobatics are needed. The possible new tpidr_el2 is moved to
the argument in x2 so that it can be stashed in tpidr_el2 early to free
up a scratch register. The page table configuration then makes use of
x0-2.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-19-ascull@google.com
2020-09-15 18:39:04 +01:00
Andrew Scull
054698316d KVM: arm64: nVHE: Migrate hyp interface to SMCCC
Rather than passing arbitrary function pointers to run at hyp, define
and equivalent set of SMCCC functions.

Since the SMCCC functions are strongly tied to the original function
prototypes, it is not expected for the host to ever call an invalid ID
but a warning is raised if this does ever occur.

As __kvm_vcpu_run is used for every switch between the host and a guest,
it is explicitly singled out to be identified before the other function
IDs to improve the performance of the hot path.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-18-ascull@google.com
2020-09-15 18:39:04 +01:00
Andrew Scull
d7ca1079d8 KVM: arm64: Remove kvm_host_data_t typedef
The kvm_host_data_t typedef is used inconsistently and goes against the
kernel's coding style. Remove it in favour of the full struct specifier.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-4-ascull@google.com
2020-09-15 18:39:01 +01:00
Marc Zyngier
ae8bd85ca8 Merge branch 'kvm-arm64/pt-new' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-11 15:54:30 +01:00
Will Deacon
74cfa7ea66 KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
'struct kvm_s2_mmu', so remove the unused 'pgd' field.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-21-will@kernel.org
2020-09-11 15:51:15 +01:00
Will Deacon
71233d05f4 KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables
Introduce alloc() and free() functions to the generic page-table code
for guest stage-2 page-tables and plumb these into the existing KVM
page-table allocator. Subsequent patches will convert other operations
within the KVM allocator over to the generic code.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-6-will@kernel.org
2020-09-11 15:51:13 +01:00
Will Deacon
fdfe7cbd58 KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:03:47 -04:00
Andrew Jones
004a01241c arm64/x86: KVM: Introduce steal-time cap
arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
support for steal-time. However this is unnecessary, as only a KVM
fd is required, and it complicates userspace (userspace may prefer
delaying vcpu creation until after feature probing). Introduce a cap
that can be checked instead. While x86 can already probe steal-time
support with a kvm fd (KVM_GET_SUPPORTED_CPUID), we add the cap there
too for consistency.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20200804170604.42662-7-drjones@redhat.com
2020-08-21 14:05:19 +01:00
Andrew Jones
53f985584e KVM: arm64: pvtime: Fix stolen time accounting across migration
When updating the stolen time we should always read the current
stolen time from the user provided memory, not from a kernel
cache. If we use a cache then we'll end up resetting stolen time
to zero on the first update after migration.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200804170604.42662-5-drjones@redhat.com
2020-08-21 14:04:14 +01:00
Paolo Bonzini
0378daef0c KVM/arm64 updates for Linux 5.9:
- Split the VHE and nVHE hypervisor code bases, build the EL2 code
   separately, allowing for the VHE code to now be built with instrumentation
 
 - Level-based TLB invalidation support
 
 - Restructure of the vcpu register storage to accomodate the NV code
 
 - Pointer Authentication available for guests on nVHE hosts
 
 - Simplification of the system register table parsing
 
 - MMU cleanups and fixes
 
 - A number of post-32bit cleanups and other fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl8q5DEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDQFAP/jtscnC5OxEOoGNW1gvg/1QI/BuU4zLvqQL1
 OEW72fUQlil7tmF/CbLLKnsBpxKmzO02C3wDdg3oaRi884bRtTXdok0nsFuCvrZD
 u/wrlMnP0zTjjk1uwIFfZJTx+nnUiT0jC6ffvGxB/jnTJk/8atvOUFL7ODFEfixz
 mS5g1jwwJkRmWKESFg7KGSghKuwXTvo4HVWCfME+t1rQwAa03stXFV8H5tkU6+cG
 BRIssxo7BkAV2AozwL7hgl/M6wd6QvbOrYJqgb67+sQ8qts0YNne96NN3InMedb1
 RENyDssXlA+VI0HoYyEbYnPtFy1Hoj1lOGDZLEZAEH1qcmWrV+hApnoSXSmuofvn
 QlfOWCyd92CZySu21MALRUVXbrKkA3zT2b9R93A5z7iEBPY+Wk0ryJCO6IxdZzF8
 48LNjtzb/Kd0SMU/issJlw+u6fJvLbpnSzXNsYYhiiTMUE9cbu2SEkq0SkonH0a4
 d3V8UifZyeffXsOfOAG0DJZOu/fWZp1/I3tfzujtG9rCb+jTQueJ4E1cFYrwSO6b
 sFNyiI1AzlwcCippG08zSUX61nGfKXBuMXuhIlMRk7GeiF95DmSXuxEgYndZX9I+
 E6zJr1iQk/1lrip41svDIIOBHuMbIeD/w1bsOKi7Zoa270MxB4r2Z3IqRMgosoE5
 l4YO9pl1
 =Ukr4
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next-5.6

KVM/arm64 updates for Linux 5.9:

- Split the VHE and nVHE hypervisor code bases, build the EL2 code
  separately, allowing for the VHE code to now be built with instrumentation

- Level-based TLB invalidation support

- Restructure of the vcpu register storage to accomodate the NV code

- Pointer Authentication available for guests on nVHE hosts

- Simplification of the system register table parsing

- MMU cleanups and fixes

- A number of post-32bit cleanups and other fixes
2020-08-09 12:58:23 -04:00
Linus Torvalds
921d2597ab s390: implement diag318
x86:
 * Report last CPU for debugging
 * Emulate smaller MAXPHYADDR in the guest than in the host
 * .noinstr and tracing fixes from Thomas
 * nested SVM page table switching optimization and fixes
 
 Generic:
 * Unify shadow MMU cache data structures across architectures
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
 ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
 WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
 JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
 33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
 Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
 =YVx0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-06 12:59:31 -07:00
Marc Zyngier
bf4086b1a1 KVM: arm64: Prevent vcpu_has_ptrauth from generating OOL functions
So far, vcpu_has_ptrauth() is implemented in terms of system_supports_*_auth()
calls, which are declared "inline". In some specific conditions (clang
and SCS), the "inline" very much turns into an "out of line", which
leads to a fireworks when this predicate is evaluated on a non-VHE
system (right at the beginning of __hyp_handle_ptrauth).

Instead, make sure vcpu_has_ptrauth gets expanded inline by directly
using the cpus_have_final_cap() helpers, which are __always_inline,
generate much better code, and are the only thing that make sense when
running at EL2 on a nVHE system.

Fixes: 29eb5a3c57 ("KVM: arm64: Handle PtrAuth traps early")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20200722162231.3689767-1-maz@kernel.org
2020-07-28 09:03:57 +01:00
Tianjia Zhang
74cc7e0c35 KVM: arm64: clean up redundant 'kvm_run' parameters
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 04:26:40 -04:00
Sean Christopherson
c1a33aebe9 KVM: arm64: Use common KVM implementation of MMU memory caches
Move to the common MMU memory cache implementation now that the common
code and arm64's existing code are semantically compatible.

No functional change intended.

Cc: Marc Zyngier <maz@kernel.org>
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-19-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:43 -04:00
Sean Christopherson
e539451b7e KVM: arm64: Use common code's approach for __GFP_ZERO with memory caches
Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make
the struct and its usage compatible with the common 'struct
kvm_mmu_memory_cache' in linux/kvm_host.h.  This will minimize code
churn when arm64 moves to the common implementation in a future patch, at
the cost of temporarily having somewhat silly code.

Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO:

  #define GFP_PGTABLE_USER  (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
  |
  -> #define GFP_PGTABLE_KERNEL        (GFP_KERNEL | __GFP_ZERO)

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

versus

  #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

    with __GFP_ZERO explicitly OR'd in

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

No functional change intended.

Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-18-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:43 -04:00
Marc Zyngier
41ce82f63c KVM: arm64: timers: Move timer registers to the sys_regs file
Move the timer gsisters to the sysreg file. This will further help when
they are directly changed by a nesting hypervisor in the VNCR page.

This requires moving the initialisation of the timer struct so that some
of the helpers (such as arch_timer_ctx_index) can work correctly at an
early stage.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
710f198218 KVM: arm64: Move SPSR_EL1 to the system register array
SPSR_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to
the sysregs array and update the accessors.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
fd85b66789 KVM: arm64: Disintegrate SPSR array
As we're about to move SPSR_EL1 into the VNCR page, we need to
disassociate it from the rest of the 32bit cruft. Let's break
the array into individual fields.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
1bded23ea7 KVM: arm64: Move SP_EL1 to the system register array
SP_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to the
system register array and update the accessors.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
98909e6d1c KVM: arm64: Move ELR_EL1 to the system register array
As ELR-EL1 is a VNCR-capable register with ARMv8.4-NV, let's move it to
the sys_regs array and repaint the accessors. While we're at it, let's
kill the now useless accessors used only on the fault injection path.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
e47c2055c6 KVM: arm64: Make struct kvm_regs userspace-only
struct kvm_regs is used by userspace to indicate which register gets
accessed by the {GET,SET}_ONE_REG API. But as we're about to refactor
the layout of the in-kernel register structures, we need the kernel to
move away from it.

Let's make kvm_regs userspace only, and let the kernel map it to its own
internal representation.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
71071acfd3 KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access
Switch the hypervisor code to using ctxt_sys_reg/__vcpu_sys_reg instead
of raw sys_regs accesses. No intended functionnal change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:37 +01:00
Marc Zyngier
1b422dd7fc KVM: arm64: Introduce accessor for ctxt->sys_reg
In order to allow the disintegration of the per-vcpu sysreg array,
let's introduce a new helper (ctxt_sys_reg()) that returns the
in-memory copy of a system register, picked from a given context.

__vcpu_sys_reg() is rewritten to use this helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:37 +01:00
Christoffer Dall
a0e50aa3f4 KVM: arm64: Factor out stage 2 page table data from struct kvm
As we are about to reuse our stage 2 page table manipulation code for
shadow stage 2 page tables in the context of nested virtualization, we
are going to manage multiple stage 2 page tables for a single VM.

This requires some pretty invasive changes to our data structures,
which moves the vmid and pgd pointers into a separate structure and
change pretty much all of our mmu code to operate on this structure
instead.

The new structure is called struct kvm_s2_mmu.

There is no intended functional change by this patch alone.

Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
[Designed data structure layout in collaboration]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
[maz: Moved the last_vcpu_ran down to the S2 MMU structure as well]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:37 +01:00
David Brazdil
13aeb9b400 KVM: arm64: Split hyp/sysreg-sr.c to VHE/nVHE
sysreg-sr.c contains KVM's code for saving/restoring system registers, with
some code shared between VHE/nVHE. These common routines are moved to
a header file, VHE-specific code is moved to vhe/sysreg-sr.c and nVHE-specific
code to nvhe/sysreg-sr.c.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-12-dbrazdil@google.com
2020-07-05 18:38:29 +01:00
Andrew Scull
f50b6f6ae1 KVM: arm64: Handle calls to prefixed hyp functions
Once hyp functions are moved to a hyp object, they will have prefixed symbols.
This change declares and gets the address of the prefixed version for calls to
the hyp functions.

To aid migration, the hyp functions that have not yet moved have their prefixed
versions aliased to their non-prefixed version. This begins with all the hyp
functions being listed and will reduce to none of them once the migration is
complete.

Signed-off-by: Andrew Scull <ascull@google.com>

[David: Extracted kvm_call_hyp nVHE branches into own helper macros, added
        comments around symbol aliases.]

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-6-dbrazdil@google.com
2020-07-05 18:38:04 +01:00
Paolo Bonzini
49b3deaad3 KVM/arm64 fixes for Linux 5.8, take #1
* 32bit VM fixes:
   - Fix embarassing mapping issue between AArch32 CSSELR and AArch64
     ACTLR
   - Add ACTLR2 support for AArch32
   - Get rid of the useless ACTLR_EL1 save/restore
   - Fix CP14/15 accesses for AArch32 guests on BE hosts
   - Ensure that we don't loose any state when injecting a 32bit
     exception when running on a VHE host
 
 * 64bit VM fixes:
   - Fix PtrAuth host saving happening in preemptible contexts
   - Optimize PtrAuth lazy enable
   - Drop vcpu to cpu context pointer
   - Fix sparse warnings for HYP per-CPU accesses
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl7h6r8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDE3gP/iogqGjZasUIwk4gdIc4IaxxNsfTYJFIh5uw
 sedAqwCQg3OftX0jptp6GhI3ZIG5UPuGDM7f3aio6i02pjx6bfBxGJ9AXqNcp6gN
 WcECHsAfzHUScznRhBbVflKkOF4dzfzyiutnMdknihePOyO9drwdvzXuJa37cs52
 tsCneP9xQ/vQWdqu42uPS7HtSepSa/Lf/qeKGaTDWQIvNYGI3PctQvRAxx4FNHc/
 SMUpS5zdTFceVoya/2+azTJ24R1lbwlPwaw2WoaghB+QmREKN8uMKy5kjrO5YUnH
 8BtjESiNBI2CZYSwcxFt+QNA6EmymwDwfrmOE+7iBCZelOLWLVYbJ7icKX3kT731
 gts5PBD8JlZWAnbH/Mbo4qngXJwHaijA38Bt8rvSphI0aK6iOU6DP5BuOurzNRde
 XczDYq3lqdCC2ynROjRpH4paVo7s0sBjjgZ7OsWqsw9uRAogwTkVE2sEi4HdqNAH
 JHhIHEKj7t/bRtzneXVk6ngoezIs6sIdcqrUZ+rAMnmMHbrzBoEqnlrlQ7e2/UXY
 yvY5Yc3/H2pKRCK/KznOi1nVG+xUZp4RZp552pwULF+JVbmMHIOxn3IxiejfMZVx
 czD5cxMcgMWa14ZZRN0DynT9wCg+s+MGaKGR6STyudVYHFBTr7hrsuM1zq/neMQf
 JcUBVUot
 =I2Li
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for Linux 5.8, take #1

* 32bit VM fixes:
  - Fix embarassing mapping issue between AArch32 CSSELR and AArch64
    ACTLR
  - Add ACTLR2 support for AArch32
  - Get rid of the useless ACTLR_EL1 save/restore
  - Fix CP14/15 accesses for AArch32 guests on BE hosts
  - Ensure that we don't loose any state when injecting a 32bit
    exception when running on a VHE host

* 64bit VM fixes:
  - Fix PtrAuth host saving happening in preemptible contexts
  - Optimize PtrAuth lazy enable
  - Drop vcpu to cpu context pointer
  - Fix sparse warnings for HYP per-CPU accesses
2020-06-11 14:02:32 -04:00
Marc Zyngier
15c99816ed Merge branch 'kvm-arm64/ptrauth-fixes' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-06-10 19:10:40 +01:00
Marc Zyngier
3204be4109 KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
AArch32 CP1x registers are overlayed on their AArch64 counterparts
in the vcpu struct. This leads to an interesting problem as they
are stored in their CPU-local format, and thus a CP1x register
doesn't "hit" the lower 32bit portion of the AArch64 register on
a BE host.

To workaround this unfortunate situation, introduce a bias trick
in the vcpu_cp1x() accessors which picks the correct half of the
64bit register.

Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-06-10 16:03:57 +01:00
Marc Zyngier
07da1ffaa1 KVM: arm64: Remove host_cpu_context member from vcpu structure
For very long, we have kept this pointer back to the per-cpu
host state, despite having working per-cpu accessors at EL2
for some time now.

Recent investigations have shown that this pointer is easy
to abuse in preemptible context, which is a sure sign that
it would better be gone. Not to mention that a per-cpu
pointer is faster to access at all times.

Reported-by: Andrew Scull <ascull@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com
Reviewed-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-06-09 10:59:52 +01:00
Linus Torvalds
039aeb9deb ARM:
- Move the arch-specific code into arch/arm64/kvm
 - Start the post-32bit cleanup
 - Cherry-pick a few non-invasive pre-NV patches
 
 x86:
 - Rework of TLB flushing
 - Rework of event injection, especially with respect to nested virtualization
 - Nested AMD event injection facelift, building on the rework of generic code
 and fixing a lot of corner cases
 - Nested AMD live migration support
 - Optimization for TSC deadline MSR writes and IPIs
 - Various cleanups
 - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
 - Interrupt-based delivery of asynchronous "page ready" events (host side)
 - Hyper-V MSRs and hypercalls for guest debugging
 - VMX preemption timer fixes
 
 s390:
 - Cleanups
 
 Generic:
 - switch vCPU thread wakeup from swait to rcuwait
 
 The other architectures, and the guest side of the asynchronous page fault
 work, will come next week.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
 ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
 5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
 7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
 asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
 CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
 =v7Wn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Move the arch-specific code into arch/arm64/kvm

   - Start the post-32bit cleanup

   - Cherry-pick a few non-invasive pre-NV patches

  x86:
   - Rework of TLB flushing

   - Rework of event injection, especially with respect to nested
     virtualization

   - Nested AMD event injection facelift, building on the rework of
     generic code and fixing a lot of corner cases

   - Nested AMD live migration support

   - Optimization for TSC deadline MSR writes and IPIs

   - Various cleanups

   - Asynchronous page fault cleanups (from tglx, common topic branch
     with tip tree)

   - Interrupt-based delivery of asynchronous "page ready" events (host
     side)

   - Hyper-V MSRs and hypercalls for guest debugging

   - VMX preemption timer fixes

  s390:
   - Cleanups

  Generic:
   - switch vCPU thread wakeup from swait to rcuwait

  The other architectures, and the guest side of the asynchronous page
  fault work, will come next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
  KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
  KVM: check userspace_addr for all memslots
  KVM: selftests: update hyperv_cpuid with SynDBG tests
  x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
  x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
  x86/kvm/hyper-v: Add support for synthetic debugger interface
  x86/hyper-v: Add synthetic debugger definitions
  KVM: selftests: VMX preemption timer migration test
  KVM: nVMX: Fix VMX preemption timer migration
  x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
  KVM: x86/pmu: Support full width counting
  KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
  KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  KVM: x86: acknowledgment mechanism for async pf page ready notifications
  KVM: x86: interrupt based APF 'page ready' event delivery
  KVM: introduce kvm_read_guest_offset_cached()
  KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
  KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  KVM: VMX: Replace zero-length array with flexible-array
  ...
2020-06-03 15:13:47 -07:00
Paolo Bonzini
380609445c KVM/arm64 updates for Linux 5.8:
- Move the arch-specific code into arch/arm64/kvm
 - Start the post-32bit cleanup
 - Cherry-pick a few non-invasive pre-NV patches
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl7RLp8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD/iAQAJOHsS1PT9y/Gefam5os9FqKpogj68e3rx9k
 XfPcweexBVqmDWSI4vmL9xHW2F7z4EwAE4dIDsTCKHpihK30+jH8l12tOJBz35yp
 MR1hYjv43F54xzKkkuP4F4wo3Ygg4ipjHZPReGkaGj1QOQs6N/YKa1aSSYfzkzCz
 VLCSqPQz45CkGPYEGwuPn13AjHqGQAwPhteJNAoCxViw1KAldmoqDk6kbKB+b+7a
 2oIvxiTZejICsgSX6UvqQYNG52AyZ/5Daq8iraaigQ8sGyKr+/2Yi+3RUUH6p7ns
 aCsictk+RS3BzMAKDw6MPYc7OhJBhxQEV1pdiPpt0tpS4L9LNmBagKzlaBKZhwdr
 dYDAjOlbgZZUJpKnlBAipuVlQySHdm2WjXr4msdY69D7OGxmkzU/zkSIokqdA2hr
 MuL5W1v2Z1UpxyVltb+c/4lPcFZNnRI0Mz1WcvliEojlf2zzKYMcBAl3bTiAuil5
 aTT2+1G0OSCfUfr8Zart4LoAHeczw4zG/Pern+hl92eMXUlX3pIcqzQaLtVmmEE/
 ecPShMowKsXOOGGp/T8Q04N1fr6KzmufP5+kgJDFZfo6iJ6r5uQ9G8nuLmp3wQOX
 c9mNCwdSxrFBTJ10KfLHquKqwfl18VXzKDx1pzO5nSupmKWfWZ5YFO8j2709e83x
 R42MqKEG
 =aD+9
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.8:

- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
2020-06-01 04:26:27 -04:00
Will Deacon
c350717ec7 Merge branch 'for-next/kvm/errata' into for-next/core
KVM CPU errata rework
(Andrew Scull and Marc Zyngier)
* for-next/kvm/errata:
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  arm64: Unify WORKAROUND_SPECULATIVE_AT_{NVHE,VHE}
2020-05-28 18:02:51 +01:00
Marc Zyngier
b130a8f70c KVM: arm64: Check advertised Stage-2 page size capability
With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
advertise the supported Stage-2 page sizes.

Let's check this at boot time.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-05-28 17:28:51 +01:00
Marc Zyngier
8f7f4fe756 KVM: arm64: Drop obsolete comment about sys_reg ordering
The general comment about keeping the enum order in sync
with the save/restore code has been obsolete for many years now.

Just drop it.

Note that there are other ordering requirements in the enum,
such as the PtrAuth and PMU registers, which are still valid.

Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-05-28 13:16:57 +01:00
David Brazdil
71b3ec5f22 KVM: arm64: Clean up cpu_init_hyp_mode()
Pull bits of code to the only place where it is used. Remove empty function
__cpu_init_stage2(). Remove redundant has_vhe() check since this function is
nVHE-only. No functional changes intended.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200515152056.83158-1-dbrazdil@google.com
2020-05-25 16:15:47 +01:00
Keqian Zhu
c862626e19 KVM: arm64: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006b ("KVM: x86: enable dirty log gradually in
small chunks"). This adds support for arm64.

x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
is enabled. However, for arm64, both huge pages and normal pages can be
write protected gradually by userspace.

Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
Linux VMs with different page size. The memory pressure is 127G in each
case. The time taken of memory_global_dirty_log_start in QEMU is listed
below:

Page Size      Before    After Optimization
  4K            650ms         1.8ms
  2M             4ms          1.8ms
  1G             2ms          1.8ms

Besides the time reduction, the biggest improvement is that we will minimize
the performance side effect (because of dissolving huge pages and marking
memslots dirty) on guest after enabling dirty log.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com
2020-05-16 15:05:02 +01:00
David Matlack
cb953129bf kvm: add halt-polling cpu usage stats
Two new stats for exposing halt-polling cpu usage:
halt_poll_success_ns
halt_poll_fail_ns

Thus sum of these 2 stats is the total cpu time spent polling. "success"
means the VCPU polled until a virtual interrupt was delivered. "fail"
means the VCPU had to schedule out (either because the maximum poll time
was reached or it needed to yield the CPU).

To avoid touching every arch's kvm_vcpu_stat struct, only update and
export halt-polling cpu usage stats if we're on x86.

Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
~500 years, which seems reasonably large.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

Message-Id: <20200508182240.68440-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:26 -04:00
Andrew Scull
02ab1f5018 arm64: Unify WORKAROUND_SPECULATIVE_AT_{NVHE,VHE}
Errata 1165522, 1319367 and 1530923 each allow TLB entries to be
allocated as a result of a speculative AT instruction. In order to
avoid mandating VHE on certain affected CPUs, apply the workaround to
both the nVHE and the VHE case for all affected CPUs.

Signed-off-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
CC: Marc Zyngier <maz@kernel.org>
CC: James Morse <james.morse@arm.com>
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
CC: Will Deacon <will@kernel.org>
CC: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20200504094858.108917-1-ascull@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-05-04 16:05:47 +01:00
Marc Zyngier
d9c3872cd2 KVM: arm64: GICv4.1: Reload VLPI configuration on distributor enable/disable
Each time a Group-enable bit gets flipped, the state of these bits
needs to be forwarded to the hardware. This is a pretty heavy
handed operation, requiring all vcpus to reload their GICv4
configuration. It is thus implemented as a new request type.

These enable bits are programmed into the HW by setting the VGrp{0,1}En
fields of GICR_VPENDBASER when the vPEs are made resident again.

Of course, we only support Group-1 for now...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-22-maz@kernel.org
2020-03-24 12:15:51 +00:00
Paolo Bonzini
e951445f4d KVM/arm fixes for 5.6, take #1
- Fix compilation on 32bit
 - Move  VHE guest entry/exit into the VHE-specific entry code
 - Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl5VsNMPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDLhUQAIsecO9IyYjy1J0Q5AxaKLL7NuKYlAaty2xX
 uY6UkTfPNsEaHFXSGYXWPDxrmkgArp2wuy4WVQB59Om00+LE7h9kiz7+xKpcUy1G
 UoHa5mzMlqoOeUIWO/oSU6LYHhYDnIpHTDco93YrscU4nNRevJZ/GVeuQeMblzuZ
 Sg7cWc+0V43FXUt9Jw8BsNhXH/D0l0p3v86p7GZLcSfFAccO62YfOwC8J/znLPym
 4S+O9RYQkCczvzFeQVYQwqImOAunaOb0OzERUbm8icOF6ekYGwywjrtlmAC/3q+q
 1g/te1yfwQ8fpprWl4QSH0sQVdfAcxdDZqcWtN2LhNaEShZtNa5yKpsRGn1V0eAS
 tIO8eexAKCXoASHrrwfSkizYjRAeDabmodBQmS50/isY9OdBE2tDel+BLrCjzBJ2
 hABwEZ3Q78216EuoqsZqWaEUZ3ck0iSW3IcXglmHE4TC8Iq6dwskvOPjay+msHr9
 dcHDCxFIN4jzv9QcpKN8LkxfmW0Us28bzap3OhKfrz0nv7b4n+j0q1xbKL1QnN/l
 RcDPW0dQeXuX9vYMeYIUDQcV4IgTUkF6IPDCRW7KCApi98HfPTbrfQ97nir79zDp
 pD8NXaNFr4PtxJoheYYia3sjZMt/fgfvP2dM32iOpsMu7W1FXdfQN7heNSc6MQmO
 ciyhf/mj
 =NpPo
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.6, take #1

- Fix compilation on 32bit
- Move  VHE guest entry/exit into the VHE-specific entry code
- Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
2020-02-28 11:50:06 +01:00
Mark Rutland
b3f15ec3d8 kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
With VHE, running a vCPU always requires the sequence:

1. kvm_arm_vhe_guest_enter();
2. kvm_vcpu_run_vhe();
3. kvm_arm_vhe_guest_exit()

... and as we invoke this from the shared arm/arm64 KVM code, 32-bit arm
has to provide stubs for all three functions.

To simplify the common code, and make it easier to make further
modifications to the arm64-specific portions in the near future, let's
fold kvm_arm_vhe_guest_enter() and kvm_arm_vhe_guest_exit() into
kvm_vcpu_run_vhe().

The 32-bit stubs for kvm_arm_vhe_guest_enter() and
kvm_arm_vhe_guest_exit() are removed, as they are no longer used. The
32-bit stub for kvm_vcpu_run_vhe() is left as-is.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200210114757.2889-1-mark.rutland@arm.com
2020-02-17 14:38:37 +00:00
Linus Torvalds
e813e65038 ARM: Cleanups and corner case fixes
PPC: Bugfixes
 
 x86:
 * Support for mapping DAX areas with large nested page table entries.
 * Cleanups and bugfixes here too.  A particularly important one is
 a fix for FPU load when the thread has TIF_NEED_FPU_LOAD.  There is
 also a race condition which could be used in guest userspace to exploit
 the guest kernel, for which the embargo expired today.
 * Fast path for IPI delivery vmexits, shaving about 200 clock cycles
 from IPI latency.
 * Protect against "Spectre-v1/L1TF" (bring data in the cache via
 speculative out of bound accesses, use L1TF on the sibling hyperthread
 to read it), which unfortunately is an even bigger whack-a-mole game
 than SpectreV1.
 
 Sean continues his mission to rewrite KVM.  In addition to a sizable
 number of x86 patches, this time he contributed a pretty large refactoring
 of vCPU creation that affects all architectures but should not have any
 visible effect.
 
 s390 will come next week together with some more x86 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeMxtCAAoJEL/70l94x66DQxIIAJv9hMmXLQHGFnUMskjGErR6
 DCLSC0YRdRMwE50CerblyJtGsMwGsPyHZwvZxoAceKJ9w0Yay9cyaoJ87ItBgHoY
 ce0HrqIUYqRSJ/F8WH2lSzkzMBr839rcmqw8p1tt4D5DIsYnxHGWwRaaP+5M/1KQ
 YKFu3Hea4L00U339iIuDkuA+xgz92LIbsn38svv5fxHhPAyWza0rDEYHNgzMKuoF
 IakLf5+RrBFAh6ZuhYWQQ44uxjb+uQa9pVmcqYzzTd5t1g4PV5uXtlJKesHoAvik
 Eba8IEUJn+HgQJjhp3YxQYuLeWOwRF3bwOiZ578MlJ4OPfYXMtbdlqCQANHOcGk=
 =H/q1
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
2020-01-31 09:30:41 -08:00
Paolo Bonzini
621ab20c06 KVM/arm updates for Linux 5.6
- Fix MMIO sign extension
 - Fix HYP VA tagging on tag space exhaustion
 - Fix PSTATE/CPSR handling when generating exception
 - Fix MMU notifier's advertizing of young pages
 - Fix poisoned page handling
 - Fix PMU SW event handling
 - Fix TVAL register access
 - Fix AArch32 external abort injection
 - Fix ITS unmapped collection handling
 - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl4y1z0PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDCsUQALsrDOOzbKPRfcJjk0+XSf3uDd9GvvQ6F48p
 zB8eerSZOSF4o/BMNHkcRkMaVyLRE9xzHYAfueaHYOFnaEHAO5YpMPE03Rme/SeM
 F3ZnT+iyt+GkbSRyJbR4u0QCuvhFSu8ve18TLRMDrFO6L8i/MH3AdexO9uWjKByI
 FBEUVNbq/nVma0I0DBcx2GeCKiu79O/Gf7qquRI8CnptmXvk/FFZz89bCxDLjRaM
 3d9OGzXd5Diy4BrAVG5gHbSYaEZ8uId0ltxTuI1spk2ju5kJOW0NStDDMXRr5Dc8
 0CXJmeQrw9QgTBRd52n9CL5JZvKyCRDRSx33aGoJaDyqo3d3mJoT9wzJ2+/FVK7q
 RhlrJHNpYzN31j/Op0wE85coyvrEZCqMmcGLTpuFB6LOLsJ41a/jkvbR431ayT9G
 phqBmpQ3BrxVDGwA1aRUf8VzimW0EV15YNkV63lOGvG6bpikKiNSwlwWhVF7q4zU
 UiwlJyNITCzOkavMY0FRJ3VubjpoOYU4XmwLiyavBM4o71cztONd/USb7w7p6Xy2
 cix8kpjHo7aYlJKl1Si92kIbndskXNKWrYvBwlOGaeIby9/EA7Jsnh7Ps6HOk+1C
 POVExwl3ZQrKjRh3N4mxTnB53NU09ATQ4VukP0pBnDNOMLFF87g07R3w2S5EdejV
 usIVNvlS
 =mT8t
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.6

- Fix MMIO sign extension
- Fix HYP VA tagging on tag space exhaustion
- Fix PSTATE/CPSR handling when generating exception
- Fix MMU notifier's advertizing of young pages
- Fix poisoned page handling
- Fix PMU SW event handling
- Fix TVAL register access
- Fix AArch32 external abort injection
- Fix ITS unmapped collection handling
- Various cleanups
2020-01-30 18:13:14 +01:00
Paolo Bonzini
7495e22bb1 KVM: Move running VCPU from ARM to common code
For ring-based dirty log tracking, it will be more efficient to account
writes during schedule-out or schedule-in to the currently running VCPU.
We would like to do it even if the write doesn't use the current VCPU's
address space, as is the case for cached writes (see commit 4e335d9e7d,
"Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02).

Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct.
There is already something similar in KVM/ARM; one important difference
is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c:
we have to update both the architecture-independent vcpu_{load,put} and
the preempt notifiers.

Another change made in the process is to allow using kvm_get_running_vcpu()
in preemptible code.  This is allowed because preempt notifiers ensure
that the value does not change even after the VCPU thread is migrated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:54 +01:00
Sean Christopherson
ddd259c9aa KVM: Drop kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit()
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all
arch specific implementations are nops.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:33 +01:00
Sean Christopherson
19bcc89eb8 KVM: arm64: Free sve_state via arm specific hook
Add an arm specific hook to free the arm64-only sve_state.  Doing so
eliminates the last functional code from kvm_arch_vcpu_uninit() across
all architectures and paves the way for removing kvm_arch_vcpu_init()
and kvm_arch_vcpu_uninit() entirely.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:32 +01:00
Marc Zyngier
0e20f5e255 KVM: arm/arm64: Cleanup MMIO handling
Our MMIO handling is a bit odd, in the sense that it uses an
intermediate per-vcpu structure to store the various decoded
information that describe the access.

But the same information is readily available in the HSR/ESR_EL2
field, and we actually use this field to populate the structure.

Let's simplify the whole thing by getting rid of the superfluous
structure and save a (tiny) bit of space in the vcpu structure.

[32bit fix courtesy of Olof Johansson <olof@lixom.net>]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-01-23 10:38:14 +00:00
Will Deacon
ab3906c531 Merge branch 'for-next/errata' into for-next/core
* for-next/errata: (3 commits)
  arm64: Workaround for Cortex-A55 erratum 1530923
  ...
2020-01-22 11:35:05 +00:00
Steven Price
e85d68faed arm64: Rename WORKAROUND_1165522 to SPECULATIVE_AT_VHE
Cortex-A55 is affected by a similar erratum, so rename the existing
workaround for errarum 1165522 so it can be used for both errata.

Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16 10:43:53 +00:00
Suzuki K Poulose
b51c6ac220 arm64: Introduce system_capabilities_finalized() marker
We finalize the system wide capabilities after the SMP CPUs
are booted by the kernel. This is used as a marker for deciding
various checks in the kernel. e.g, sanity check the hotplugged
CPUs for missing mandatory features.

However there is no explicit helper available for this in the
kernel. There is sys_caps_initialised, which is not exposed.
The other closest one we have is the jump_label arm64_const_caps_ready
which denotes that the capabilities are set and the capability checks
could use the individual jump_labels for fast path. This is
performed before setting the ELF Hwcaps, which must be checked
against the new CPUs. We also perform some of the other initialization
e.g, SVE setup, which is important for the use of FP/SIMD
where SVE is supported. Normally userspace doesn't get to run
before we finish this. However the in-kernel users may
potentially start using the neon mode. So, we need to
reject uses of neon mode before we are set. Instead of defining
a new marker for the completion of SVE setup, we could simply
reuse the arm64_const_caps_ready and enable it once we have
finished all the setup. Also we could expose this to the
various users as "system_capabilities_finalized()" to make
it more meaningful than "const_caps_ready".

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14 17:10:21 +00:00
Linus Torvalds
752272f16d ARM:
- Data abort report and injection
 - Steal time support
 - GICv4 performance improvements
 - vgic ITS emulation fixes
 - Simplify FWB handling
 - Enable halt polling counters
 - Make the emulated timer PREEMPT_RT compliant
 
 s390:
 - Small fixes and cleanups
 - selftest improvements
 - yield improvements
 
 PPC:
 - Add capability to tell userspace whether we can single-step the guest.
 - Improve the allocation of XIVE virtual processor IDs
 - Rewrite interrupt synthesis code to deliver interrupts in virtual
   mode when appropriate.
 - Minor cleanups and improvements.
 
 x86:
 - XSAVES support for AMD
 - more accurate report of nested guest TSC to the nested hypervisor
 - retpoline optimizations
 - support for nested 5-level page tables
 - PMU virtualization optimizations, and improved support for nested
   PMU virtualization
 - correct latching of INITs for nested virtualization
 - IOAPIC optimization
 - TSX_CTRL virtualization for more TAA happiness
 - improved allocation and flushing of SEV ASIDs
 - many bugfixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJd27PMAAoJEL/70l94x66DspsH+gPc6YWtKJFJH58Zj8NrNh6y
 t0FwDFcvUa51+m4jaY4L5Y8+zqu1dZFnPPhFGqNWpxrjCEvE/glQJv3BiUX06Seh
 aYUHNymGoYCTJOHaaGhV+NlgQaDuZOCOkIsOLAPehyFd1KojwB+FRC0xmO6aROPw
 9yQgYrKuK1UUn5HwxBNrMS4+Xv+2iKv/9sTnq1G4W2qX2NZQg84LVPg1zIdkCh3D
 3GOvoCBEk3ivQqjmdE7rP/InPr0XvW0b6TFhchIk8J6jEIQFHsmOUefiTvTxsIHV
 OKAZwvyeYPrYHA/aDZpaBmY2aR0ydfKDUQcviNIJoF1vOktGs0hvl3VbsmG8QCg=
 =OSI1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - data abort report and injection
   - steal time support
   - GICv4 performance improvements
   - vgic ITS emulation fixes
   - simplify FWB handling
   - enable halt polling counters
   - make the emulated timer PREEMPT_RT compliant

  s390:
   - small fixes and cleanups
   - selftest improvements
   - yield improvements

  PPC:
   - add capability to tell userspace whether we can single-step the
     guest
   - improve the allocation of XIVE virtual processor IDs
   - rewrite interrupt synthesis code to deliver interrupts in virtual
     mode when appropriate.
   - minor cleanups and improvements.

  x86:
   - XSAVES support for AMD
   - more accurate report of nested guest TSC to the nested hypervisor
   - retpoline optimizations
   - support for nested 5-level page tables
   - PMU virtualization optimizations, and improved support for nested
     PMU virtualization
   - correct latching of INITs for nested virtualization
   - IOAPIC optimization
   - TSX_CTRL virtualization for more TAA happiness
   - improved allocation and flushing of SEV ASIDs
   - many bugfixes and cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
  KVM: x86: Grab KVM's srcu lock when setting nested state
  KVM: x86: Open code shared_msr_update() in its only caller
  KVM: Fix jump label out_free_* in kvm_init()
  KVM: x86: Remove a spurious export of a static function
  KVM: x86: create mmu/ subdirectory
  KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
  KVM: x86: remove set but not used variable 'called'
  KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
  KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
  KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
  KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
  KVM: x86: do not modify masked bits of shared MSRs
  KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
  KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
  KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
  KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
  KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
  KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
  ...
2019-11-25 18:02:36 -08:00
Marc Zyngier
a4b28f5c67 Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/next 2019-10-24 15:04:09 +01:00
Steven Price
58772e9a3d KVM: arm64: Provide VCPU attributes for stolen time
Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.

User space can set an attribute on the VCPU providing the IPA base
address of the stolen time structure for that VCPU. This must be
repeated for every VCPU in the VM.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 19:20:29 +01:00
Steven Price
8564d6372a KVM: arm64: Support stolen time reporting via shared structure
Implement the service call for configuring a shared structure between a
VCPU and the hypervisor in which the hypervisor can write the time
stolen from the VCPU's execution time by other tasks on the host.

User space allocates memory which is placed at an IPA also chosen by user
space. The hypervisor then updates the shared structure using
kvm_put_guest() to ensure single copy atomicity of the 64-bit value
reporting the stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is
reset.

The stolen time itself is retrieved from the sched_info structure
maintained by the Linux scheduler code. We enable SCHEDSTATS when
selecting KVM Kconfig to ensure this value is meaningful.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 19:20:28 +01:00
Steven Price
b48c1a45a1 KVM: arm64: Implement PV_TIME_FEATURES call
This provides a mechanism for querying which paravirtualized time
features are available in this hypervisor.

Also add the header file which defines the ABI for the paravirtualized
time features we're about to add.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 19:20:27 +01:00
Christoffer Dall
c726200dd1 KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace
For a long time, if a guest accessed memory outside of a memslot using
any of the load/store instructions in the architecture which doesn't
supply decoding information in the ESR_EL2 (the ISV bit is not set), the
kernel would print the following message and terminate the VM as a
result of returning -ENOSYS to userspace:

  load/store instruction decoding not implemented

The reason behind this message is that KVM assumes that all accesses
outside a memslot is an MMIO access which should be handled by
userspace, and we originally expected to eventually implement some sort
of decoding of load/store instructions where the ISV bit was not set.

However, it turns out that many of the instructions which don't provide
decoding information on abort are not safe to use for MMIO accesses, and
the remaining few that would potentially make sense to use on MMIO
accesses, such as those with register writeback, are not used in
practice.  It also turns out that fetching an instruction from guest
memory can be a pretty horrible affair, involving stopping all CPUs on
SMP systems, handling multiple corner cases of address translation in
software, and more.  It doesn't appear likely that we'll ever implement
this in the kernel.

What is much more common is that a user has misconfigured his/her guest
and is actually not accessing an MMIO region, but just hitting some
random hole in the IPA space.  In this scenario, the error message above
is almost misleading and has led to a great deal of confusion over the
years.

It is, nevertheless, ABI to userspace, and we therefore need to
introduce a new capability that userspace explicitly enables to change
behavior.

This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
which does exactly that, and introduces a new exit reason to report the
event to userspace.  User space can then emulate an exception to the
guest, restart the guest, suspend the guest, or take any other
appropriate action as per the policy of the running system.

Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 18:59:44 +01:00
Marc Zyngier
f226650494 arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear
The GICv3 architecture specification is incredibly misleading when it
comes to PMR and the requirement for a DSB. It turns out that this DSB
is only required if the CPU interface sends an Upstream Control
message to the redistributor in order to update the RD's view of PMR.

This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
the case in Linux. It can still be set from EL3, so some special care
is required. But the upshot is that in the (hopefuly large) majority
of the cases, we can drop the DSB altogether.

This relies on a new static key being set if the boot CPU has PMHE
set. The drawback is that this static key has to be exported to
modules.

Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-15 12:26:09 +01:00
Linus Torvalds
39d7530d74 ARM:
* support for chained PMU counters in guests
 * improved SError handling
 * handle Neoverse N1 erratum #1349291
 * allow side-channel mitigation status to be migrated
 * standardise most AArch64 system register accesses to msr_s/mrs_s
 * fix host MPIDR corruption on 32bit
 * selftests ckleanups
 
 x86:
 * PMU event {white,black}listing
 * ability for the guest to disable host-side interrupt polling
 * fixes for enlightened VMCS (Hyper-V pv nested virtualization),
 * new hypercall to yield to IPI target
 * support for passing cstate MSRs through to the guest
 * lots of cleanups and optimizations
 
 Generic:
 * Some txt->rST conversions for the documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdJzdIAAoJEL/70l94x66DQDoH/i83/8kX4I8AWDlushPru4ts
 Q4lCE5VAPha+o4pLb1dtfFL3gTmSbsB1N++JSlqK3JOo6LphIOy6b0wBjQBbAa6U
 3CT1dJaHJoScLLj09vyBlvClGUH2ZKEQTWOiquCCf7JfPofxwPUA6vJ7TYsdkckx
 zR3ygbADWmnfS7hFfiqN3JzuYh9eoooGNWSU+Giq6VF41SiL3IqhBGZhWS0zE9c2
 2c5lpqqdeHmAYNBqsyzNiDRKp7+zLFSmZ7Z5/0L755L8KYwR6F5beTnmBMHvb4lA
 PWH/SWOC8EYR+PEowfrH+TxKZwp0gMn1kcAKjilHk0uCRwG1IzuHAr2jlNxICCk=
 =t/Oq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - support for chained PMU counters in guests
   - improved SError handling
   - handle Neoverse N1 erratum #1349291
   - allow side-channel mitigation status to be migrated
   - standardise most AArch64 system register accesses to msr_s/mrs_s
   - fix host MPIDR corruption on 32bit
   - selftests ckleanups

  x86:
   - PMU event {white,black}listing
   - ability for the guest to disable host-side interrupt polling
   - fixes for enlightened VMCS (Hyper-V pv nested virtualization),
   - new hypercall to yield to IPI target
   - support for passing cstate MSRs through to the guest
   - lots of cleanups and optimizations

  Generic:
   - Some txt->rST conversions for the documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (128 commits)
  Documentation: virtual: Add toctree hooks
  Documentation: kvm: Convert cpuid.txt to .rst
  Documentation: virtual: Convert paravirt_ops.txt to .rst
  KVM: x86: Unconditionally enable irqs in guest context
  KVM: x86: PMU Event Filter
  kvm: x86: Fix -Wmissing-prototypes warnings
  KVM: Properly check if "page" is valid in kvm_vcpu_unmap
  KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
  KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane
  kvm: LAPIC: write down valid APIC registers
  KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
  KVM: doc: Add API documentation on the KVM_REG_ARM_WORKAROUNDS register
  KVM: arm/arm64: Add save/restore support for firmware workaround state
  arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests
  KVM: arm/arm64: Support chained PMU counters
  KVM: arm/arm64: Remove pmc->bitmask
  KVM: arm/arm64: Re-create event when setting counter value
  KVM: arm/arm64: Extract duplicated code to own function
  KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functions
  KVM: LAPIC: ARBPRI is a reserved register for x2APIC
  ...
2019-07-12 15:35:14 -07:00
Linus Torvalds
dfd437a257 arm64 updates for 5.3:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
 
 - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
   manage the permissions of executable vmalloc regions more strictly
 
 - Slight performance improvement by keeping softirqs enabled while
   touching the FPSIMD/SVE state (kernel_neon_begin/end)
 
 - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG
   and AXFLAG instructions for floating point comparison flags
   manipulation) and FRINT (rounding floating point numbers to integers)
 
 - Re-instate ARM64_PSEUDO_NMI support which was previously marked as
   BROKEN due to some bugs (now fixed)
 
 - Improve parking of stopped CPUs and implement an arm64-specific
   panic_smp_self_stop() to avoid warning on not being able to stop
   secondary CPUs during panic
 
 - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
   platforms
 
 - perf: DDR performance monitor support for iMX8QXP
 
 - cache_line_size() can now be set from DT or ACPI/PPTT if provided to
   cope with a system cache info not exposed via the CPUID registers
 
 - Avoid warning on hardware cache line size greater than
   ARCH_DMA_MINALIGN if the system is fully coherent
 
 - arm64 do_page_fault() and hugetlb cleanups
 
 - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
 
 - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags'
   introduced in 5.1)
 
 - CONFIG_RANDOMIZE_BASE now enabled in defconfig
 
 - Allow the selection of ARM64_MODULE_PLTS, currently only done via
   RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
   over into the vmalloc area
 
 - Make ZONE_DMA32 configurable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl0eHqcACgkQa9axLQDI
 XvFyNA/+L+bnkz8m3ncydlqqfXomQn4eJJVQ8Uksb0knJz+1+3CUxxbO4ry4jXZN
 fMkbggYrDPRKpDbsUl0lsRipj7jW9bqan+N37c3SWqCkgb6HqDaHViwxdx6Ec/Uk
 gHudozDSPh/8c7hxGcSyt/CFyuW6b+8eYIQU5rtIgz8aVY2BypBvS/7YtYCbIkx0
 w4CFleRTK1zXD5mJQhrc6jyDx659sVkrAvdhf6YIymOY8nBTv40vwdNo3beJMYp8
 Po/+0Ixu+VkHUNtmYYZQgP/AGH96xiTcRnUqd172JdtRPpCLqnLqwFokXeVIlUKT
 KZFMDPzK+756Ayn4z4huEePPAOGlHbJje8JVNnFyreKhVVcCotW7YPY/oJR10bnc
 eo7yD+DxABTn+93G2yP436bNVa8qO1UqjOBfInWBtnNFJfANIkZweij/MQ6MjaTA
 o7KtviHnZFClefMPoiI7HDzwL8XSmsBDbeQ04s2Wxku1Y2xUHLx4iLmadwLQ1ZPb
 lZMTZP3N/T1554MoURVA1afCjAwiqU3bt1xDUGjbBVjLfSPBAn/25IacsG9Li9AF
 7Rp1M9VhrfLftjFFkB2HwpbhRASOxaOSx+EI3kzEfCtM2O9I1WHgP3rvCdc3l0HU
 tbK0/IggQicNgz7GSZ8xDlWPwwSadXYGLys+xlMZEYd3pDIOiFc=
 =0TDT
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}

 - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
   manage the permissions of executable vmalloc regions more strictly

 - Slight performance improvement by keeping softirqs enabled while
   touching the FPSIMD/SVE state (kernel_neon_begin/end)

 - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
   XAFLAG and AXFLAG instructions for floating point comparison flags
   manipulation) and FRINT (rounding floating point numbers to integers)

 - Re-instate ARM64_PSEUDO_NMI support which was previously marked as
   BROKEN due to some bugs (now fixed)

 - Improve parking of stopped CPUs and implement an arm64-specific
   panic_smp_self_stop() to avoid warning on not being able to stop
   secondary CPUs during panic

 - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
   platforms

 - perf: DDR performance monitor support for iMX8QXP

 - cache_line_size() can now be set from DT or ACPI/PPTT if provided to
   cope with a system cache info not exposed via the CPUID registers

 - Avoid warning on hardware cache line size greater than
   ARCH_DMA_MINALIGN if the system is fully coherent

 - arm64 do_page_fault() and hugetlb cleanups

 - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)

 - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
   'arm_boot_flags' introduced in 5.1)

 - CONFIG_RANDOMIZE_BASE now enabled in defconfig

 - Allow the selection of ARM64_MODULE_PLTS, currently only done via
   RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
   over into the vmalloc area

 - Make ZONE_DMA32 configurable

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
  perf: arm_spe: Enable ACPI/Platform automatic module loading
  arm_pmu: acpi: spe: Add initial MADT/SPE probing
  ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
  ACPI/PPTT: Modify node flag detection to find last IDENTICAL
  x86/entry: Simplify _TIF_SYSCALL_EMU handling
  arm64: rename dump_instr as dump_kernel_instr
  arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
  arm64: Implement panic_smp_self_stop()
  arm64: Improve parking of stopped CPUs
  arm64: Expose FRINT capabilities to userspace
  arm64: Expose ARMv8.5 CondM capability to userspace
  arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
  arm64: ARM64_MODULES_PLTS must depend on MODULES
  arm64: bpf: do not allocate executable memory
  arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
  arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
  arm64: module: create module allocations without exec permissions
  arm64: Allow user selection of ARM64_MODULE_PLTS
  acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
  arm64: Allow selecting Pseudo-NMI again
  ...
2019-07-08 09:54:55 -07:00
Marc Zyngier
1e0cf16cda KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
As part of setting up the host context, we populate its
MPIDR by using cpu_logical_map(). It turns out that contrary
to arm64, cpu_logical_map() on 32bit ARM doesn't return the
*full* MPIDR, but a truncated version.

This leaves the host MPIDR slightly corrupted after the first
run of a VM, since we won't correctly restore the MPIDR on
exit. Oops.

Since we cannot trust cpu_logical_map(), let's adopt a different
strategy. We move the initialization of the host CPU context as
part of the per-CPU initialization (which, in retrospect, makes
a lot of sense), and directly read the MPIDR from the HW. This
is guaranteed to work on both arm and arm64.

Reported-by: Andre Przywara <Andre.Przywara@arm.com>
Tested-by: Andre Przywara <Andre.Przywara@arm.com>
Fixes: 32f1395519 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-08 16:29:48 +01:00
Andre Przywara
c118bbb527 arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests
Recent commits added the explicit notion of "workaround not required" to
the state of the Spectre v2 (aka. BP_HARDENING) workaround, where we
just had "needed" and "unknown" before.

Export this knowledge to the rest of the kernel and enhance the existing
kvm_arm_harden_branch_predictor() to report this new state as well.
Export this new state to guests when they use KVM's firmware interface
emulation.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-05 13:56:27 +01:00
Julien Thierry
bd82d4bd21 arm64: Fix incorrect irqflag restore for priority masking
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired location due to the
state of PSR.I upon flag saving [1].

In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_IGNORE_PMR) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence of the status of interrupt masking via PMR.

GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
requires PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.

[1] https://www.spinics.net/lists/arm-kernel/msg716956.html

Fixes: 4a503217ce ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable@vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-21 15:50:10 +01:00
Thomas Gleixner
caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
James Morse
b7c50fab66 KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation
KVM's pmu.c contains the __hyp_text needed to switch the pmu registers
between host and guest. Because this isn't covered by the 'hyp' Makefile,
it can be built with kasan and friends when these are enabled in Kconfig.

When starting a guest, this results in:
| Kernel panic - not syncing: HYP panic:
| PS:a00003c9 PC:000083000028ada0 ESR:86000007
| FAR:000083000028ada0 HPFAR:0000000029df5300 PAR:0000000000000000
| VCPU:000000004e10b7d6
| CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat
| Call trace:
|  dump_backtrace+0x0/0x200
|  show_stack+0x20/0x30
|  dump_stack+0xec/0x158
|  panic+0x1ec/0x420
|  panic+0x0/0x420
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x002,25006082
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:

This is caused by functions in pmu.c calling the instrumented
code, which isn't mapped to hyp. From objdump -r:
| RELOCATION RECORDS FOR [.hyp.text]:
| OFFSET           TYPE              VALUE
| 0000000000000010 R_AARCH64_CALL26  __sanitizer_cov_trace_pc
| 0000000000000018 R_AARCH64_CALL26  __asan_load4_noabort
| 0000000000000024 R_AARCH64_CALL26  __asan_load4_noabort

Move the affected code to a new file under 'hyp's Makefile.

Fixes: 3d91befbb3 ("arm64: KVM: Enable !VHE support for :G/:H perf event modifiers")
Cc: Andrew Murray <Andrew.Murray@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-24 14:53:20 +01:00
Andrew Murray
435e53fb5e arm64: KVM: Enable VHE support for :G/:H perf event modifiers
With VHE different exception levels are used between the host (EL2) and
guest (EL1) with a shared exception level for userpace (EL0). We can take
advantage of this and use the PMU's exception level filtering to avoid
enabling/disabling counters in the world-switch code. Instead we just
modify the counter type to include or exclude EL0 at vcpu_{load,put} time.

We also ensure that trapped PMU system register writes do not re-enable
EL0 when reconfiguring the backing perf events.

This approach completely avoids blackout windows seen with !VHE.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:46:26 +01:00
Andrew Murray
3d91befbb3 arm64: KVM: Enable !VHE support for :G/:H perf event modifiers
Enable/disable event counters as appropriate when entering and exiting
the guest to enable support for guest or host only event counting.

For both VHE and non-VHE we switch the counters between host/guest at
EL2.

The PMU may be on when we change which counters are enabled however
we avoid adding an isb as we instead rely on existing context
synchronisation events: the eret to enter the guest (__guest_enter)
and eret in kvm_call_hyp for __kvm_vcpu_run_nvhe on returning.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:36:22 +01:00
Andrew Murray
eb41238cf1 arm64: KVM: Add accessors to track guest/host only counters
In order to effeciently switch events_{guest,host} perf counters at
guest entry/exit we add bitfields to kvm_cpu_context for guest and host
events as well as accessors for updating them.

A function is also provided which allows the PMU driver to determine
if a counter should start counting when it is enabled. With exclude_host,
we may only start counting when entering the guest.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:35:30 +01:00
Andrew Murray
630a16854d arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data
The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
a typedef to kvm_cpu_context and is used to store host cpu context. The
kvm_cpu_context structure is also used elsewhere to hold vcpu context.
In order to use the percpu to hold additional future host information we
encapsulate kvm_cpu_context in a new structure and rename the typedef and
percpu to match.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:35:24 +01:00
Amit Daniel Kachhap
a22fa321d1 KVM: arm64: Add userspace flag to enable pointer authentication
Now that the building blocks of pointer authentication are present, lets
add userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and
KVM_ARM_VCPU_PTRAUTH_GENERIC. These flags will enable pointer
authentication for the KVM guest on a per-vcpu basis through the ioctl
KVM_ARM_VCPU_INIT.

This features will allow the KVM guest to allow the handling of
pointer authentication instructions or to treat them as undefined
if not set.

Necessary documentations are added to reflect the changes done.

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:30:40 +01:00
Mark Rutland
384b40caa8 KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.

Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.

When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.

Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.

Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.

This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:30:40 +01:00
Amit Daniel Kachhap
b890d75c4c KVM: arm64: Add a vcpu flag to control ptrauth for guest
A per vcpu flag is added to check if pointer authentication is
enabled for the vcpu or not. This flag may be enabled according to
the necessary user policies and host capabilities.

This patch also adds a helper to check the flag.

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 08:47:01 +01:00
Dave Martin
92e68b2b1b KVM: arm/arm64: Clean up vcpu finalization function parameter naming
Currently, the internal vcpu finalization functions use a different
name ("what") for the feature parameter than the name ("feature")
used in the documentation.

To avoid future confusion, this patch converts everything to use
the name "feature" consistently.

No functional change.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18 17:14:02 +01:00
Dave Martin
a3be836df7 KVM: arm/arm64: Demote kvm_arm_init_arch_resources() to just set up SVE
The introduction of kvm_arm_init_arch_resources() looks like
premature factoring, since nothing else uses this hook yet and it
is not clear what will use it in the future.

For now, let's not pretend that this is a general thing:

This patch simply renames the function to kvm_arm_init_sve(),
retaining the arm stub version under the new name.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18 17:14:01 +01:00
Dave Martin
9a3cdf26e3 KVM: arm64/sve: Allow userspace to enable SVE for vcpus
Now that all the pieces are in place, this patch offers a new flag
KVM_ARM_VCPU_SVE that userspace can pass to KVM_ARM_VCPU_INIT to
turn on SVE for the guest, on a per-vcpu basis.

As part of this, support for initialisation and reset of the SVE
vector length set and registers is added in the appropriate places,
as well as finally setting the KVM_ARM64_GUEST_HAS_SVE vcpu flag,
to turn on the SVE support code.

Allocation of the SVE register storage in vcpu->arch.sve_state is
deferred until the SVE configuration is finalized, by which time
the size of the registers is known.

Setting the vector lengths supported by the vcpu is considered
configuration of the emulated hardware rather than runtime
configuration, so no support is offered for changing the vector
lengths available to an existing vcpu across reset.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin
9033bba4b5 KVM: arm64/sve: Add pseudo-register for the guest's vector lengths
This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
allow userspace to set and query the set of vector lengths visible
to the guest.

In the future, multiple register slices per SVE register may be
visible through the ioctl interface.  Once the set of slices has
been determined we would not be able to allow the vector length set
to be changed any more, in order to avoid userspace seeing
inconsistent sets of registers.  For this reason, this patch adds
support for explicit finalization of the SVE configuration via the
KVM_ARM_VCPU_FINALIZE ioctl.

Finalization is the proper place to allocate the SVE register state
storage in vcpu->arch.sve_state, so this patch adds that as
appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
was previously a no-op on arm64.

To simplify the logic for determining what vector lengths can be
supported, some code is added to KVM init to work this out, in the
kvm_arm_init_arch_resources() hook.

The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
Subsequent patches will allow SVE to be turned on for guest vcpus,
making it visible.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin
7dd32a0d01 KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl
Some aspects of vcpu configuration may be too complex to be
completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
requirement for userspace to do some additional configuration
before various other ioctls will work in a consistent way.

In particular this will be the case for SVE, where userspace will
need to negotiate the set of vector lengths to be made available to
the guest before the vcpu becomes fully usable.

In order to provide an explicit way for userspace to confirm that
it has finished setting up a particular vcpu feature, this patch
adds a new ioctl KVM_ARM_VCPU_FINALIZE.

When userspace has opted into a feature that requires finalization,
typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
matching call to KVM_ARM_VCPU_FINALIZE is now required before
KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
impose additional restrictions where appropriate.

No existing vcpu features are affected by this, so current
userspace implementations will continue to work exactly as before,
with no need to issue KVM_ARM_VCPU_FINALIZE.

As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
placeholder: no finalizable features exist yet, so ioctl is not
required and will always yield EINVAL.  Subsequent patches will add
the finalization logic to make use of this ioctl for SVE.

No functional change for existing userspace.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin
0f062bfe36 KVM: arm/arm64: Add hook for arch-specific KVM initialisation
This patch adds a kvm_arm_init_arch_resources() hook to perform
subarch-specific initialisation when starting up KVM.

This will be used in a subsequent patch for global SVE-related
setup on arm64.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin
e1c9c98345 KVM: arm64/sve: Add SVE support to register access ioctl interface
This patch adds the following registers for access via the
KVM_{GET,SET}_ONE_REG interface:

 * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
 * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
 * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)

In order to adapt gracefully to future architectural extensions,
the registers are logically divided up into slices as noted above:
the i parameter denotes the slice index.

This allows us to reserve space in the ABI for future expansion of
these registers.  However, as of today the architecture does not
permit registers to be larger than a single slice, so no code is
needed in the kernel to expose additional slices, for now.  The
code can be extended later as needed to expose them up to a maximum
of 32 slices (as carved out in the architecture itself) if they
really exist someday.

The registers are only visible for vcpus that have SVE enabled.
They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
have SVE.

Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
allowed for SVE-enabled vcpus: SVE-aware userspace can use the
KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
register state.  This avoids some complex and pointless emulation
in the kernel to convert between the two views of these aliased
registers.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin
b43b5dd990 KVM: arm64/sve: Context switch the SVE registers
In order to give each vcpu its own view of the SVE registers, this
patch adds context storage via a new sve_state pointer in struct
vcpu_arch.  An additional member sve_max_vl is also added for each
vcpu, to determine the maximum vector length visible to the guest
and thus the value to be configured in ZCR_EL2.LEN while the vcpu
is active.  This also determines the layout and size of the storage
in sve_state, which is read and written by the same backend
functions that are used for context-switching the SVE state for
host tasks.

On SVE-enabled vcpus, SVE access traps are now handled by switching
in the vcpu's SVE context and disabling the trap before returning
to the guest.  On other vcpus, the trap is not handled and an exit
back to the host occurs, where the handle_sve() fallback path
reflects an undefined instruction exception back to the guest,
consistently with the behaviour of non-SVE-capable hardware (as was
done unconditionally prior to this patch).

No SVE handling is added on non-VHE-only paths, since VHE is an
architectural and Kconfig prerequisite of SVE.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:53 +00:00
Dave Martin
73433762fc KVM: arm64/sve: System register context switch and access support
This patch adds the necessary support for context switching ZCR_EL1
for each vcpu.

ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
sense for it to be handled as part of the guest FPSIMD/SVE context
for context switch purposes instead of handling it as a general
system register.  This means that it can be switched in lazily at
the appropriate time.  No effort is made to track host context for
this register, since SVE requires VHE: thus the hosts's value for
this register lives permanently in ZCR_EL2 and does not alias the
guest's value at any time.

The Hyp switch and fpsimd context handling code is extended
appropriately.

Accessors are added in sys_regs.c to expose the SVE system
registers and ID register fields.  Because these need to be
conditionally visible based on the guest configuration, they are
implemented separately for now rather than by use of the generic
system register helpers.  This may be abstracted better later on
when/if there are more features requiring this model.

ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
guest, but for compatibility with non-SVE aware KVM implementations
the register should not be enumerated at all for KVM_GET_REG_LIST
in this case.  For consistency we also reject ioctl access to the
register.  This ensures that a non-SVE-enabled guest looks the same
to userspace, irrespective of whether the kernel KVM implementation
supports SVE.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:53 +00:00
Dave Martin
1765edbab1 KVM: arm64: Add a vcpu flag to control SVE visibility for the guest
Since SVE will be enabled or disabled on a per-vcpu basis, a flag
is needed in order to track which vcpus have it enabled.

This patch adds a suitable flag and a helper for checking it.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:52 +00:00
Dave Martin
3f61f40947 KVM: arm64: Add missing #includes to kvm_host.h
kvm_host.h uses some declarations from other headers that are
currently included by accident, without an explicit #include.

This patch adds a few #includes that are clearly missing.  Although
the header builds without them today, this should help to avoid
future surprises.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:52 +00:00
Linus Torvalds
636deed6c0 ARM: some cleanups, direct physical timer assignment, cache sanitization
for 32-bit guests
 
 s390: interrupt cleanup, introduction of the Guest Information Block,
 preparation for processor subfunctions in cpu models
 
 PPC: bug fixes and improvements, especially related to machine checks
 and protection keys
 
 x86: many, many cleanups, including removing a bunch of MMU code for
 unnecessary optimizations; plus AVIC fixes.
 
 Generic: memcg accounting
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
 16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
 l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
 RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
 gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
 2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
 =XIzU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - some cleanups
   - direct physical timer assignment
   - cache sanitization for 32-bit guests

  s390:
   - interrupt cleanup
   - introduction of the Guest Information Block
   - preparation for processor subfunctions in cpu models

  PPC:
   - bug fixes and improvements, especially related to machine checks
     and protection keys

  x86:
   - many, many cleanups, including removing a bunch of MMU code for
     unnecessary optimizations
   - AVIC fixes

  Generic:
   - memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
  kvm: vmx: fix formatting of a comment
  KVM: doc: Document the life cycle of a VM and its resources
  MAINTAINERS: Add KVM selftests to existing KVM entry
  Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
  KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
  KVM: PPC: Fix compilation when KVM is not enabled
  KVM: Minor cleanups for kvm_main.c
  KVM: s390: add debug logging for cpu model subfunctions
  KVM: s390: implement subfunction processor calls
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  KVM: arm/arm64: Remove unused timer variable
  KVM: PPC: Book3S: Improve KVM reference counting
  KVM: PPC: Book3S HV: Fix build failure without IOMMU support
  Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
  x86: kvmguest: use TSC clocksource if invariant TSC is exposed
  KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
  KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
  KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
  KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
  KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
  ...
2019-03-15 15:00:28 -07:00
Linus Torvalds
3d8dfe75ef arm64 updates for 5.1:
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
 
 - uaccess macros clean-up (unsafe user accessors also merged but
   reverted, waiting for objtool support on arm64)
 
 - ptrace regsets for Pointer Authentication (ARMv8.3) key management
 
 - inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
   riscv maintainers)
 
 - arm64/perf updates: PMU bindings converted to json-schema, unused
   variable and misleading comment removed
 
 - arm64/debug fixes to ensure checking of the triggering exception level
   and to avoid the propagation of the UNKNOWN FAR value into the si_code
   for debug signals
 
 - Workaround for Fujitsu A64FX erratum 010001
 
 - lib/raid6 ARM NEON optimisations
 
 - NR_CPUS now defaults to 256 on arm64
 
 - Minor clean-ups (documentation/comments, Kconfig warning, unused
   asm-offsets, clang warnings)
 
 - MAINTAINERS update for list information to the ARM64 ACPI entry
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlyCl0cACgkQa9axLQDI
 XvEyKxAAiogBZLbyhcy8bTUHVzVoJE0FyAkdO2wWnnaff2Ohkhy1Y/npv33IeK2q
 RknxqDIx2DUUVPJNRZGoI/WwBtTZdKaAnW4rIKG84yC1eAkFcd96WQasaZzcp1qY
 HmvbJiYXM0bh+0J7i3Wgry/QzOkrltJFJW2kp6Wd5aFE+R1WyWyxT6d+Fp0J3vlA
 bT70jlpBK6LXEOmmBS+04Ml02+8MvaGxIl8EInBHSfDLRLErj5E8n41rRHKUiSWz
 maWI+kVoLYwOE68xiZlDftUBEeQpUSWgg2nxeK+640QSl1wJmVcRcY9nm6TZeMG2
 AiZTR9a7cP5rrdSN5suUmb7d4AMMVlVMisGDlwb+9oCxeTRDzg0uwACaVgHfPqQr
 UeBdHbL9nStN7uBH23H8L9mKk+tqpFmk0sgzdrKejOwysAiqWV8aazb/Na3qnVRl
 J1B5opxMnGOsjXmHvtG/tiZl281Uwz5ZmzfLmIY3gUZgUgdA3511Egp0ry5y1dzJ
 SkYC4Hmzb2ybQvXGIDDa3OzCwXXiqyqKsO+O8Egg1k4OIwbp3w+NHE7gKeA+dMgD
 gjN7zEalCUi46Q28xiCPEb+88BpQ18czIWGQLb9mAnmYeZPjqqenXKXuRHr4lgVe
 jPURJ/vqvFEglZJN1RDuQHKzHEcm5f2XE566sMZYdSoeiUCb0QM=
 =2U56
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Pseudo NMI support for arm64 using GICv3 interrupt priorities

 - uaccess macros clean-up (unsafe user accessors also merged but
   reverted, waiting for objtool support on arm64)

 - ptrace regsets for Pointer Authentication (ARMv8.3) key management

 - inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by
   the riscv maintainers)

 - arm64/perf updates: PMU bindings converted to json-schema, unused
   variable and misleading comment removed

 - arm64/debug fixes to ensure checking of the triggering exception
   level and to avoid the propagation of the UNKNOWN FAR value into the
   si_code for debug signals

 - Workaround for Fujitsu A64FX erratum 010001

 - lib/raid6 ARM NEON optimisations

 - NR_CPUS now defaults to 256 on arm64

 - Minor clean-ups (documentation/comments, Kconfig warning, unused
   asm-offsets, clang warnings)

 - MAINTAINERS update for list information to the ARM64 ACPI entry

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
  arm64: mmu: drop paging_init comments
  arm64: debug: Ensure debug handlers check triggering exception level
  arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
  Revert "arm64: uaccess: Implement unsafe accessors"
  arm64: avoid clang warning about self-assignment
  arm64: Kconfig.platforms: fix warning unmet direct dependencies
  lib/raid6: arm: optimize away a mask operation in NEON recovery routine
  lib/raid6: use vdupq_n_u8 to avoid endianness warnings
  arm64: io: Hook up __io_par() for inX() ordering
  riscv: io: Update __io_[p]ar() macros to take an argument
  asm-generic/io: Pass result of I/O accessor to __io_[p]ar()
  arm64: Add workaround for Fujitsu A64FX erratum 010001
  arm64: Rename get_thread_info()
  arm64: Remove documentation about TIF_USEDFPU
  arm64: irqflags: Fix clang build warnings
  arm64: Enable the support of pseudo-NMIs
  arm64: Skip irqflags tracing for NMI in IRQs disabled context
  arm64: Skip preemption when exiting an NMI
  arm64: Handle serror in NMI context
  irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
  ...
2019-03-10 10:17:23 -07:00
Christoffer Dall
e329fb75d5 KVM: arm/arm64: Factor out VMID into struct kvm_vmid
In preparation for nested virtualization where we are going to have more
than a single VMID per VM, let's factor out the VMID data into a
separate VMID data structure and change the VMID allocator to operate on
this new structure instead of using a struct kvm.

This also means that udate_vttbr now becomes update_vmid, and that the
vttbr itself is generated on the fly based on the stage 2 page table
base address and the vmid.

We cache the physical address of the pgd when allocating the pgd to
avoid doing the calculation on every entry to the guest and to avoid
calling into potentially non-hyp-mapped code from hyp/EL2.

If we wanted to merge the VMID allocator with the arm64 ASID allocator
at some point in the future, it should actually become easier to do that
after this patch.

Note that to avoid mapping the kvm_vmid_bits variable into hyp, we
simply forego the masking of the vmid value in kvm_get_vttbr and rely on
update_vmid to always assign a valid vmid value (within the supported
range).

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: minor cleanups]
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19 21:05:35 +00:00
Marc Zyngier
32f1395519 arm/arm64: KVM: Statically configure the host's view of MPIDR
We currently eagerly save/restore MPIDR. It turns out to be
slightly pointless:
- On the host, this value is known as soon as we're scheduled on a
  physical CPU
- In the guest, this value cannot change, as it is set by KVM
  (and this is a read-only register)

The result of the above is that we can perfectly avoid the eager
saving of MPIDR_EL1, and only keep the restore. We just have
to setup the host contexts appropriately at boot time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19 21:05:35 +00:00
Marc Zyngier
18fc7bf8e0 arm64: KVM: Allow for direct call of HYP functions when using VHE
When running VHE, there is no need to jump via some stub to perform
a "HYP" function call, as there is a single address space.

Let's thus change kvm_call_hyp() and co to perform a direct call
in this case. Although this results in a bit of code expansion,
it allows the compiler to check for type compatibility, something
that we are missing so far.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19 21:05:24 +00:00
Marc Zyngier
7aa8d14641 arm/arm64: KVM: Introduce kvm_call_hyp_ret()
Until now, we haven't differentiated between HYP calls that
have a return value and those who don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19 21:05:24 +00:00
Marc Zyngier
358b28f09f arm/arm64: KVM: Allow a VCPU to fully reset itself
The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it.  However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.

Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-07 11:44:13 +00:00
Julien Thierry
85738e05dc arm64: kvm: Unmask PMR before entering guest
Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
means that hypervisor will not receive masked interrupts while running a
guest.

We need to make sure that all maskable interrupts are masked from the
time we call local_irq_disable() in the main run loop, and remain so
until we call local_irq_enable() after returning from the guest, and we
need to ensure that we see no interrupts at all (including pseudo-NMIs)
in the middle of the VM world-switch, while at the same time we need to
ensure we exit the guest when there are interrupts for the host.

We can accomplish this with pseudo-NMIs enabled by:
  (1) local_irq_disable: set the priority mask
  (2) enter guest: set PSTATE.I
  (3)              clear the priority mask
  (4) eret to guest
  (5) exit guest:  set the priotiy mask
                   clear PSTATE.I (and restore other host PSTATE bits)
  (6) local_irq_enable: clear the priority mask.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-06 10:05:19 +00:00
Linus Torvalds
42b00f122c * ARM: selftests improvements, large PUD support for HugeTLB,
single-stepping fixes, improved tracing, various timer and vGIC
 fixes
 
 * x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
 refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
 reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
 functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
 enlightenments (direct mode for synthetic timers)
 
 * PPC: nested VFIO
 
 * s390: bugfixes only this time
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
 dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
 aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
 P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
 CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
 2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
 =b4Jx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - selftests improvements
   - large PUD support for HugeTLB
   - single-stepping fixes
   - improved tracing
   - various timer and vGIC fixes

  x86:
   - Processor Tracing virtualization
   - STIBP support
   - some correctness fixes
   - refactorings and splitting of vmx.c
   - use the Hyper-V range TLB flush hypercall
   - reduce order of vcpu struct
   - WBNOINVD support
   - do not use -ftrace for __noclone functions
   - nested guest support for PAUSE filtering on AMD
   - more Hyper-V enlightenments (direct mode for synthetic timers)

  PPC:
   -  nested VFIO

  s390:
   - bugfixes only this time"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: x86: Add CPUID support for new instruction WBNOINVD
  kvm: selftests: ucall: fix exit mmio address guessing
  Revert "compiler-gcc: disable -ftracer for __noclone functions"
  KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
  KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
  KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
  MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
  KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
  KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
  KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
  KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
  KVM: Make kvm_set_spte_hva() return int
  KVM: Replace old tlb flush function with new one to flush a specified range.
  KVM/MMU: Add tlb flush with range helper function
  KVM/VMX: Add hv tlb range flush support
  x86/hyper-v: Add HvFlushGuestAddressList hypercall support
  KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
  KVM: x86: Disable Intel PT when VMXON in L1 guest
  KVM: x86: Set intercept for Intel PT MSRs read/write
  KVM: x86: Implement Intel PT MSRs read/write emulation
  ...
2018-12-26 11:46:28 -08:00
Lan Tianyu
748c0e312f KVM: Make kvm_set_spte_hva() return int
The patch is to make kvm_set_spte_hva() return int and caller can
check return value to determine flush tlb or not.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:41 +01:00
Christoffer Dall
da6f16662a KVM: arm64: Make vcpu const in vcpu_read_sys_reg
vcpu_read_sys_reg should not be modifying the VCPU structure.
Eventually, to handle EL2 sysregs for nested virtualization, we will
call vcpu_read_sys_reg from places that have a const vcpu pointer, which
will complain about the lack of the const modifier on the read path.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19 17:47:07 +00:00
Mark Rutland
bd7d95cafb arm64: KVM: Consistently advance singlestep when emulating instructions
When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.

We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.

Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18 14:11:37 +00:00
Marc Zyngier
8b2cca9ade arm64: KVM: Force VHE for systems affected by erratum 1165522
In order to easily mitigate ARM erratum 1165522, we need to force
affected CPUs to run in VHE mode if using KVM.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:59:07 +00:00
Marc Zyngier
33e5f4e509 KVM: arm64: Rework detection of SVE, !VHE systems
An SVE system is so far the only case where we mandate VHE. As we're
starting to grow this requirements, let's slightly rework the way we
deal with that situation, allowing for easy extension of this check.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:57:52 +00:00
Linus Torvalds
0d1e8b8d2b KVM updates for v4.20
ARM:
  - Improved guest IPA space support (32 to 52 bits)
 
  - RAS event delivery for 32bit
 
  - PMU fixes
 
  - Guest entry hardening
 
  - Various cleanups
 
  - Port of dirty_log_test selftest
 
 PPC:
  - Nested HV KVM support for radix guests on POWER9.  The performance is
    much better than with PR KVM.  Migration and arbitrary level of
    nesting is supported.
 
  - Disable nested HV-KVM on early POWER9 chips that need a particular hardware
    bug workaround
 
  - One VM per core mode to prevent potential data leaks
 
  - PCI pass-through optimization
 
  - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base
 
 s390:
  - Initial version of AP crypto virtualization via vfio-mdev
 
  - Improvement for vfio-ap
 
  - Set the host program identifier
 
  - Optimize page table locking
 
 x86:
  - Enable nested virtualization by default
 
  - Implement Hyper-V IPI hypercalls
 
  - Improve #PF and #DB handling
 
  - Allow guests to use Enlightened VMCS
 
  - Add migration selftests for VMCS and Enlightened VMCS
 
  - Allow coalesced PIO accesses
 
  - Add an option to perform nested VMCS host state consistency check
    through hardware
 
  - Automatic tuning of lapic_timer_advance_ns
 
  - Many fixes, minor improvements, and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJb0FINAAoJEED/6hsPKofoI60IAJRS3vOAQ9Fav8cJsO1oBHcX
 3+NexfnBke1bzrjIR3SUcHKGZbdnVPNZc+Q4JjIbPpPmmOMU5jc9BC1dmd5f4Vzh
 BMnQ0yCvgFv3A3fy/Icx1Z8NJppxosdmqdQLrQrNo8aD3cjnqY2yQixdXrAfzLzw
 XEgKdIFCCz8oVN/C9TT4wwJn6l9OE7BM5bMKGFy5VNXzMu7t64UDOLbbjZxNgi1g
 teYvfVGdt5mH0N7b2GPPWRbJmgnz5ygVVpVNQUEFrdKZoCm6r5u9d19N+RRXAwan
 ZYFj10W2T8pJOUf3tryev4V33X7MRQitfJBo4tP5hZfi9uRX89np5zP1CFE7AtY=
 =yEPW
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:
   - Improved guest IPA space support (32 to 52 bits)

   - RAS event delivery for 32bit

   - PMU fixes

   - Guest entry hardening

   - Various cleanups

   - Port of dirty_log_test selftest

  PPC:
   - Nested HV KVM support for radix guests on POWER9. The performance
     is much better than with PR KVM. Migration and arbitrary level of
     nesting is supported.

   - Disable nested HV-KVM on early POWER9 chips that need a particular
     hardware bug workaround

   - One VM per core mode to prevent potential data leaks

   - PCI pass-through optimization

   - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base

  s390:
   - Initial version of AP crypto virtualization via vfio-mdev

   - Improvement for vfio-ap

   - Set the host program identifier

   - Optimize page table locking

  x86:
   - Enable nested virtualization by default

   - Implement Hyper-V IPI hypercalls

   - Improve #PF and #DB handling

   - Allow guests to use Enlightened VMCS

   - Add migration selftests for VMCS and Enlightened VMCS

   - Allow coalesced PIO accesses

   - Add an option to perform nested VMCS host state consistency check
     through hardware

   - Automatic tuning of lapic_timer_advance_ns

   - Many fixes, minor improvements, and cleanups"

* tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
  Revert "kvm: x86: optimize dr6 restore"
  KVM: PPC: Optimize clearing TCEs for sparse tables
  x86/kvm/nVMX: tweak shadow fields
  selftests/kvm: add missing executables to .gitignore
  KVM: arm64: Safety check PSTATE when entering guest and handle IL
  KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
  arm/arm64: KVM: Enable 32 bits kvm vcpu events support
  arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
  KVM: arm64: Fix caching of host MDCR_EL2 value
  KVM: VMX: enable nested virtualization by default
  KVM/x86: Use 32bit xor to clear registers in svm.c
  kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
  kvm: vmx: Defer setting of DR6 until #DB delivery
  kvm: x86: Defer setting of CR2 until #PF delivery
  kvm: x86: Add payload operands to kvm_multiple_exception
  kvm: x86: Add exception payload fields to kvm_vcpu_events
  kvm: x86: Add has_payload and payload to kvm_queued_exception
  KVM: Documentation: Fix omission in struct kvm_vcpu_events
  KVM: selftests: add Enlightened VMCS test
  ...
2018-10-25 17:57:35 -07:00
Linus Torvalds
5289851171 arm64 updates for 4.20:
- Core mmu_gather changes which allow tracking the levels of page-table
   being cleared together with the arm64 low-level flushing routines
 
 - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
   mitigate Spectre-v4 dynamically without trapping to EL3 firmware
 
 - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
 
 - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4
 
 - Support for Common Not Private (CnP) translations allowing threads of
   the same CPU to share the TLB entries
 
 - Accelerated crc32 routines
 
 - Move swapper_pg_dir to the rodata section
 
 - Trap WFI instruction executed in user space
 
 - ARM erratum 1188874 workaround (arch_timer)
 
 - Miscellaneous fixes and clean-ups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlvKGdEACgkQa9axLQDI
 XvGSQBAAiOH6aQABL4TB7c5KIc7C+Unjm6QCFCoaeGWoHuemnM6cFJ7RQsi0GqnP
 dVEX5V/FKfmeTWO5g24Ah+MbTm3Bt6+81gywAmi1rrHhmCaCIPjT7xDqy/WsLlvt
 7WtgegSGvQ7DIMj2dbfFav6+ra67qAiYZTc46jvuynVl6DrE3BCiyTDbXAWt2nzP
 Xf3un4AHRbg3UEMUZTLqU5q4z0tbM6rEAZru8O0UOTnD2q7uttUqW3Ab7fpuEkkj
 lEVrMWD3h8SJg+Df9CbXmCNOjh4VhwBwDb5LgO8vA/AcyV/YLEF5b2OUAk/28qwo
 0GBwjqRyI4+YQ9LPg41MhGzrlnta0HCdYoeNLgLQZiDcUkuSfGhoA+MNZNOR8B08
 sCWF7F6f8UIQm8KMMBiYYdlVyUYgHLsWE/1+CyeLV0oIoWT5k3c+Xe3pho9KpVb0
 Co04TqMlqalry0sbevHz5c55H7iWIjB1Tpo3SxM105dVJVibXRPXkz+WZ5iPO+xa
 ex2j1kjNdA/AUzrSCZ5lh22zhg0WsfwD++E5meAaJMxieim8FeZDRga43rowJ0BA
 zMbSNB/+NDFZ9EhC40VaUfKk8Tkgiug9J5swv0+v7hy1QLDyydHhbOecTuIueauM
 6taiT2Iuov5yFng1eonYj4htvouVF4WOhPGthFPJMOcrB9mLMhs=
 =3Mc8
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Apart from some new arm64 features and clean-ups, this also contains
  the core mmu_gather changes for tracking the levels of the page table
  being cleared and a minor update to the generic
  compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ.

  Summary:

   - Core mmu_gather changes which allow tracking the levels of
     page-table being cleared together with the arm64 low-level flushing
     routines

   - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
     mitigate Spectre-v4 dynamically without trapping to EL3 firmware

   - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack

   - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4

   - Support for Common Not Private (CnP) translations allowing threads
     of the same CPU to share the TLB entries

   - Accelerated crc32 routines

   - Move swapper_pg_dir to the rodata section

   - Trap WFI instruction executed in user space

   - ARM erratum 1188874 workaround (arch_timer)

   - Miscellaneous fixes and clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
  arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
  arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
  arm64: cpufeature: Fix handling of CTR_EL0.IDC field
  arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
  Documentation/arm64: HugeTLB page implementation
  arm64: mm: Use __pa_symbol() for set_swapper_pgd()
  arm64: Add silicon-errata.txt entry for ARM erratum 1188873
  Revert "arm64: uaccess: implement unsafe accessors"
  arm64: mm: Drop the unused cpu parameter
  MAINTAINERS: fix bad sdei paths
  arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
  arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
  arm64: xen: Use existing helper to check interrupt status
  arm64: Use daifflag_restore after bp_hardening
  arm64: daifflags: Use irqflags functions for daifflags
  arm64: arch_timer: avoid unused function warning
  arm64: Trap WFI executed in userspace
  arm64: docs: Document SSBS HWCAP
  arm64: docs: Fix typos in ELF hwcaps
  arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
  ...
2018-10-22 17:30:06 +01:00
Dongjiu Geng
375bdd3b5d arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
Rename kvm_arch_dev_ioctl_check_extension() to
kvm_arch_vm_ioctl_check_extension(), because it does
not have any relationship with device.

Renaming this function can make code readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-18 10:12:53 +01:00
Marc Zyngier
bca607ebc7 KVM: arm/arm64: Rename kvm_arm_config_vm to kvm_arm_setup_stage2
VM tends to be a very overloaded term in KVM, so let's keep it
to describe the virtual machine. For the virtual memory setup,
let's use the "stage2" suffix.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03 11:45:29 +01:00
Suzuki K Poulose
0f62f0e95b kvm: arm64: Set a limit on the IPA size
So far we have restricted the IPA size of the VM to the default
value (40bits). Now that we can manage the IPA size per VM and
support dynamic stage2 page tables, we can allow VMs to have
larger IPA. This patch introduces a the maximum IPA size
supported on the host. This is decided by the following factors :

 1) Maximum PARange supported by the CPUs - This can be inferred
    from the system wide safe value.
 2) Maximum PA size supported by the host kernel (48 vs 52)
 3) Number of levels in the host page table (as we base our
    stage2 tables on the host table helpers).

Since the stage2 page table code is dependent on the stage1
page table, we always ensure that :

  Number of Levels at Stage1 >= Number of Levels at Stage2

So we limit the IPA to make sure that the above condition
is satisfied. This will affect the following combinations
of VA_BITS and IPA for different page sizes.

  Host configuration | Unsupported IPA ranges
  39bit VA, 4K       | [44, 48]
  36bit VA, 16K      | [41, 48]
  42bit VA, 64K      | [47, 52]

Supporting the above combinations need independent stage2
page table manipulation code, which would need substantial
changes. We could purse the solution independently and
switch the page table code once we have it ready.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03 11:44:55 +01:00
Suzuki K Poulose
7665f3a849 kvm: arm64: Configure VTCR_EL2 per VM
Add support for setting the VTCR_EL2 per VM, rather than hard
coding a value at boot time per CPU. This would allow us to tune
the stage2 page table parameters per VM in the later changes.

We compute the VTCR fields based on the system wide sanitised
feature registers, except for the hardware management of Access
Flags (VTCR_EL2.HA). It is fine to run a system with a mix of
CPUs that may or may not update the page table Access Flags.
Since the bit is RES0 on CPUs that don't support it, the bit
should be ignored on them.

Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01 13:50:29 +01:00
Suzuki K Poulose
5b6c6742b5 kvm: arm/arm64: Allow arch specific configurations for VM
Allow the arch backends to perform VM specific initialisation.
This will be later used to handle IPA size configuration and per-VM
VTCR configuration on arm64.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-01 13:50:29 +01:00
Will Deacon
7c36447ae5 KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
When running without VHE, it is necessary to set SCTLR_EL2.DSSBS if SSBD
has been forcefully disabled on the kernel command-line.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14 17:46:20 +01:00
Steven Price
df3190e220 arm64: KVM: Remove pgd_lock
The lock has never been used and the page tables are protected by
mmu_lock in struct kvm.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07 15:06:03 +02:00
Marc Zyngier
a35381e10d KVM: Remove obsolete kvm_unmap_hva notifier backend
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
deal with. Drop the now obsolete code.

Fixes: fb1522e099 ("KVM: update to new mmu_notifier semantic v2")
Cc: James Hogan <jhogan@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07 15:06:02 +02:00
James Morse
539aee0edb KVM: arm64: Share the parts of get/set events useful to 32bit
The get/set events helpers to do some work to check reserved
and padding fields are zero. This is useful on 32bit too.

Move this code into virt/kvm/arm/arm.c, and give the arch
code some underscores.

This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until
32bit is wired up.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21 16:02:31 +01:00
Dongjiu Geng
b7b27facc7 arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS
For the migrating VMs, user space may need to know the exception
state. For example, in the machine A, KVM make an SError pending,
when migrate to B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21 16:02:30 +01:00
Marc Zyngier
9bc03f1df3 arm64: KVM: Cleanup tpidr_el2 init on non-VHE
When running on a non-VHE system, we initialize tpidr_el2 to
contain the per-CPU offset required to reach per-cpu variables.

Actually, we initialize it twice: the first time as part of the
EL2 initialization, by copying tpidr_el1 into its el2 counterpart,
and another time by calling into __kvm_set_tpidr_el2.

It turns out that the first part is wrong, as it includes the
distance between the kernel mapping and the linear mapping, while
EL2 only cares about the linear mapping. This was the last vestige
of the first per-cpu use of tpidr_el2 that came in with SDEI.
The only caller then was hyp_panic(), and its now using the
pc-relative get_host_ctxt() stuff, instead of kimage addresses
from the literal pool.

It is not a big deal, as we override it straight away, but it is
slightly confusing. In order to clear said confusion, let's
set this directly as part of the hyp-init code, and drop the
ad-hoc HYP helper.

Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21 16:02:17 +01:00
Dave Martin
b3eb56b629 KVM: arm64/sve: Fix SVE trap restoration for non-current tasks
Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce
guest/host thrashing") attempts to restore the configuration of
userspace SVE trapping via a call to fpsimd_bind_task_to_cpu(), but
the logic for determining when to do this is not correct.

The patch makes the errnoenous assumption that the only task that
may try to enter userspace with the currently loaded FPSIMD/SVE
register content is current.  This may not be the case however:  if
some other user task T is scheduled on the CPU during the execution
of the KVM run loop, and the vcpu does not try to use the registers
in the meantime, then T's state may be left there intact.  If T
happens to be the next task to enter userspace on this CPU then the
hooks for reloading the register state and configuring traps will
be skipped.

(Also, current never has SVE state at this point anyway and should
always have the trap enabled, as a side-effect of the ioctl()
syscall needed to reach the KVM run loop in the first place.)

This patch instead restores the state of the EL0 trap from the
state observed at the most recent vcpu_load(), ensuring that the
trap is set correctly for the loaded context (if any).

Fixes: e6b673b741 ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-21 09:14:44 +01:00
Linus Torvalds
b357bf6023 Small update for KVM.
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
 regions for vGIC redistributor
 
 * s390: cleanups for nested, clock handling, crypto, storage keys and
 control register bits
 
 * x86: many bugfixes, implement more Hyper-V super powers,
 implement lapic_timer_advance_ns even when the LAPIC timer
 is emulated using the processor's VMX preemption timer.  Two
 security-related bugfixes at the top of the branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
 +5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
 VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
 3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
 ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
 JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
 =H5zI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Small update for KVM:

  ARM:
   - lazy context-switching of FPSIMD registers on arm64
   - "split" regions for vGIC redistributor

  s390:
   - cleanups for nested
   - clock handling
   - crypto
   - storage keys
   - control register bits

  x86:
   - many bugfixes
   - implement more Hyper-V super powers
   - implement lapic_timer_advance_ns even when the LAPIC timer is
     emulated using the processor's VMX preemption timer.
   - two security-related bugfixes at the top of the branch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
  kvm: fix typo in flag name
  kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
  KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
  KVM: x86: introduce linear_{read,write}_system
  kvm: nVMX: Enforce cpl=0 for VMX instructions
  kvm: nVMX: Add support for "VMWRITE to any supported field"
  kvm: nVMX: Restrict VMX capability MSR changes
  KVM: VMX: Optimize tscdeadline timer latency
  KVM: docs: nVMX: Remove known limitations as they do not exist now
  KVM: docs: mmu: KVM support exposing SLAT to guests
  kvm: no need to check return value of debugfs_create functions
  kvm: Make VM ioctl do valloc for some archs
  kvm: Change return type to vm_fault_t
  KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
  kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
  KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
  KVM: introduce kvm_make_vcpus_request_mask() API
  KVM: x86: hyperv: do rep check for each hypercall separately
  ...
2018-06-12 11:34:04 -07:00
Marc Orr
d1e5b0e98e kvm: Make VM ioctl do valloc for some archs
The kvm struct has been bloating. For example, it's tens of kilo-bytes
for x86, which turns out to be a large amount of memory to allocate
contiguously via kzalloc. Thus, this patch does the following:
1. Uses architecture-specific routines to allocate the kvm struct via
   vzalloc for x86.
2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
   when has_vhe() is true.

Other architectures continue to default to kalloc, as they have a
dependency on kalloc or have a small-enough struct kvm.

Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01 19:18:26 +02:00
Marc Zyngier
5d81f7dc9b arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Now that all our infrastructure is in place, let's expose the
availability of ARCH_WORKAROUND_2 to guests. We take this opportunity
to tidy up a couple of SMCCC constants.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:59 +01:00
Marc Zyngier
55e3748e89 arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
In order to offer ARCH_WORKAROUND_2 support to guests, we need
a bit of infrastructure.

Let's add a flag indicating whether or not the guest uses
SSBD mitigation. Depending on the state of this flag, allow
KVM to disable ARCH_WORKAROUND_2 before entering the guest,
and enable it when exiting it.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:55 +01:00
Dave Martin
21cdd7fd76 KVM: arm64: Remove eager host SVE state saving
Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.

This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().

Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25 12:28:30 +01:00
Dave Martin
85acda3b4a KVM: arm64: Save host SVE context as appropriate
This patch adds SVE context saving to the hyp FPSIMD context switch
path.  This means that it is no longer necessary to save the host
SVE state in advance of entering the guest, when in use.

In order to avoid adding pointless complexity to the code, VHE is
assumed if SVE is in use.  VHE is an architectural prerequisite for
SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
kernels that support both SVE and KVM.

Historically, software models exist that can expose the
architecturally invalid configuration of SVE without VHE, so if
this situation is detected at kvm_init() time then KVM will be
disabled.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25 12:28:29 +01:00
Dave Martin
e6b673b741 KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64.  This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.

Four hooks are defined in order to enable this:

 * kvm_arch_vcpu_run_map_fp():
   Called on PID change to map necessary bits of current to Hyp.

 * kvm_arch_vcpu_load_fp():
   Set up FP/SIMD for entering the KVM run loop (parse as
   "vcpu_load fp").

 * kvm_arch_vcpu_ctxsync_fp():
   Get FP/SIMD into a safe state for re-enabling interrupts after a
   guest exit back to the run loop.

   For arm64 specifically, this involves updating the host kernel's
   FPSIMD context tracking metadata so that kernel-mode NEON use
   will cause the vcpu's FPSIMD state to be saved back correctly
   into the vcpu struct.  This must be done before re-enabling
   interrupts because kernel-mode NEON may be used by softirqs.

 * kvm_arch_vcpu_put_fp():
   Save guest FP/SIMD state back to memory and dissociate from the
   CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current.  A few
helpers drive this:

 * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
   mark this CPU as having context fp (which may belong to a vcpu)
   currently loaded in its registers.  This is the non-task
   equivalent of the static function fpsimd_bind_to_cpu() in
   fpsimd.c.

 * task_fpsimd_save():
   exported to allow KVM to save the guest's FPSIMD state back to
   memory on exit from the run loop.

 * fpsimd_flush_state():
   invalidate any context's FPSIMD state that is currently loaded.
   Used to disassociate the vcpu from the CPU regs on run loop exit.

These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.

Some new vcpu_arch fields are added to make all this work.  Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state.  However, it is still needed for
preserving other things such as the host's system registers.  To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.

arch/arm is not addressed by this patch and continues to use its
current save/restore logic.  It could provide implementations of
the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25 12:28:28 +01:00
Dave Martin
fa89d31c53 KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
In struct vcpu_arch, the debug_flags field is used to store
debug-related flags about the vcpu state.

Since we are about to add some more flags related to FPSIMD and
SVE, it makes sense to add them to the existing flags field rather
than adding new fields.  Since there is only one debug_flags flag
defined so far, there is plenty of free space for expansion.

In preparation for adding more flags, this patch renames the
debug_flags field to simply "flags", and updates comments
appropriately.

The flag definitions are also moved to <asm/kvm_host.h>, since
their presence in <asm/kvm_asm.h> was for purely historical
reasons:  these definitions are not used from asm any more, and not
very likely to be as more Hyp asm is migrated to C.

KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
1ea66d27e7 ("arm64: KVM: Move away from the assembly version of
the world switch"), so this patch gets rid of that too.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: fixed minor conflict]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25 12:28:13 +01:00
Marc Zyngier
85bd0ba1ff arm/arm64: KVM: Add PSCI version selection API
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests
that would have a fit by discovering something other than PSCI 0.2,
let's provide a new API that allows userspace to pick one particular
version of the API.

This is implemented as a new class of "firmware" registers, where
we expose the PSCI version. This allows the PSCI version to be
save/restored as part of a guest migration, and also set to
any supported version if the guest requires it.

Cc: stable@vger.kernel.org #4.16
Reviewed-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-20 16:32:23 +01:00
Christoffer Dall
d47533dab9 KVM: arm64: Introduce framework for accessing deferred sysregs
We are about to defer saving and restoring some groups of system
registers to vcpu_put and vcpu_load on supported systems.  This means
that we need some infrastructure to access system registes which
supports either accessing the memory backing of the register or directly
accessing the system registers, depending on the state of the system
when we access the register.

We do this by defining read/write accessor functions, which can handle
both "immediate" and "deferrable" system registers.  Immediate registers
are always saved/restored in the world-switch path, but deferrable
registers are only saved/restored in vcpu_put/vcpu_load when supported
and sysregs_loaded_on_cpu will be set in that case.

Note that we don't use the deferred mechanism yet in this patch, but only
introduce infrastructure.  This is to improve convenience of review in
the subsequent patches where it is clear which registers become
deferred.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:17 +00:00
Christoffer Dall
8d404c4c24 KVM: arm64: Rewrite system register accessors to read/write functions
Currently we access the system registers array via the vcpu_sys_reg()
macro.  However, we are about to change the behavior to some times
modify the register file directly, so let's change this to two
primitives:

 * Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg()
 * Direct array access macro __vcpu_sys_reg()

The accessor macros should be used in places where the code needs to
access the currently loaded VCPU's state as observed by the guest.  For
example, when trapping on cache related registers, a write to a system
register should go directly to the VCPU version of the register.

The direct array access macro can be used in places where the VCPU is
known to never be running (for example userspace access) or for
registers which are never context switched (for example all the PMU
system registers).

This rewrites all users of vcpu_sys_regs to one of the macros described
above.

No functional change.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:16 +00:00
Christoffer Dall
52f6c4f021 KVM: arm64: Change 32-bit handling of VM system registers
We currently handle 32-bit accesses to trapped VM system registers using
the 32-bit index into the coproc array on the vcpu structure, which is a
union of the coproc array and the sysreg array.

Since all the 32-bit coproc indices are created to correspond to the
architectural mapping between 64-bit system registers and 32-bit
coprocessor registers, and because the AArch64 system registers are the
double in size of the AArch32 coprocessor registers, we can always find
the system register entry that we must update by dividing the 32-bit
coproc index by 2.

This is going to make our lives much easier when we have to start
accessing system registers that use deferred save/restore and might
have to be read directly from the physical CPU.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:16 +00:00
Christoffer Dall
3f5c90b890 KVM: arm64: Introduce VHE-specific kvm_vcpu_run
So far this is mostly (see below) a copy of the legacy non-VHE switch
function, but we will start reworking these functions in separate
directions to work on VHE and non-VHE in the most optimal way in later
patches.

The only difference after this patch between the VHE and non-VHE run
functions is that we omit the branch-predictor variant-2 hardening for
QC Falkor CPUs, because this workaround is specific to a series of
non-VHE ARMv8.0 CPUs.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:13 +00:00
Christoffer Dall
bc192ceec3 KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
As we are about to move a bunch of save/restore logic for VHE kernels to
the load and put functions, we need some infrastructure to do this.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:11 +00:00
Christoffer Dall
3df59d8dd3 KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
We currently have a separate read-modify-write of the HCR_EL2 on entry
to the guest for the sole purpose of setting the VF and VI bits, if set.
Since this is most rarely the case (only when using userspace IRQ chip
and interrupts are in flight), let's get rid of this operation and
instead modify the bits in the vcpu->arch.hcr[_el2] directly when
needed.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:10 +00:00
Christoffer Dall
4464e210de KVM: arm64: Avoid storing the vcpu pointer on the stack
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch.  We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.

This has the benefit of being able to retrieve the host context even
when our stack is corrupted, and it has a potential performance benefit
because we trade a store plus a load for an mrs and a load on a round
trip to the guest.

This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address.  We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().

The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19 10:53:09 +00:00
Linus Torvalds
15303ba5d1 KVM changes for 4.16
ARM:
 - Include icache invalidation optimizations, improving VM startup time
 
 - Support for forwarded level-triggered interrupts, improving
   performance for timers and passthrough platform devices
 
 - A small fix for power-management notifiers, and some cosmetic changes
 
 PPC:
 - Add MMIO emulation for vector loads and stores
 
 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
   requiring the complex thread synchronization of older CPU versions
 
 - Improve the handling of escalation interrupts with the XIVE interrupt
   controller
 
 - Support decrement register migration
 
 - Various cleanups and bugfixes.
 
 s390:
 - Cornelia Huck passed maintainership to Janosch Frank
 
 - Exitless interrupts for emulated devices
 
 - Cleanup of cpuflag handling
 
 - kvm_stat counter improvements
 
 - VSIE improvements
 
 - mm cleanup
 
 x86:
 - Hypervisor part of SEV
 
 - UMIP, RDPID, and MSR_SMI_COUNT emulation
 
 - Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
 
 - Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
   features
 
 - Show vcpu id in its anonymous inode name
 
 - Many fixes and cleanups
 
 - Per-VCPU MSR bitmaps (already merged through x86/pti branch)
 
 - Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
 Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
 Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
 xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
 /9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
 FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
 =C/uD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:

   - icache invalidation optimizations, improving VM startup time

   - support for forwarded level-triggered interrupts, improving
     performance for timers and passthrough platform devices

   - a small fix for power-management notifiers, and some cosmetic
     changes

  PPC:

   - add MMIO emulation for vector loads and stores

   - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
     requiring the complex thread synchronization of older CPU versions

   - improve the handling of escalation interrupts with the XIVE
     interrupt controller

   - support decrement register migration

   - various cleanups and bugfixes.

  s390:

   - Cornelia Huck passed maintainership to Janosch Frank

   - exitless interrupts for emulated devices

   - cleanup of cpuflag handling

   - kvm_stat counter improvements

   - VSIE improvements

   - mm cleanup

  x86:

   - hypervisor part of SEV

   - UMIP, RDPID, and MSR_SMI_COUNT emulation

   - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit

   - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
     AVX512 features

   - show vcpu id in its anonymous inode name

   - many fixes and cleanups

   - per-VCPU MSR bitmaps (already merged through x86/pti branch)

   - stable KVM clock when nesting on Hyper-V (merged through
     x86/hyperv)"

* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
  KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
  KVM: PPC: Book3S HV: Branch inside feature section
  KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
  KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
  KVM: PPC: Book3S PR: Fix broken select due to misspelling
  KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
  KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
  KVM: PPC: Book3S HV: Drop locks before reading guest memory
  kvm: x86: remove efer_reload entry in kvm_vcpu_stat
  KVM: x86: AMD Processor Topology Information
  x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
  kvm: embed vcpu id to dentry of vcpu anon inode
  kvm: Map PFN-type memory regions as writable (if possible)
  x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
  KVM: arm/arm64: Fixup userspace irqchip static key optimization
  KVM: arm/arm64: Fix userspace_irqchip_in_use counting
  KVM: arm/arm64: Fix incorrect timer_is_pending logic
  MAINTAINERS: update KVM/s390 maintainers
  MAINTAINERS: add Halil as additional vfio-ccw maintainer
  MAINTAINERS: add David as a reviewer for KVM/s390
  ...
2018-02-10 13:16:35 -08:00
Marc Zyngier
6167ec5c91 arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
A new feature of SMCCC 1.1 is that it offers firmware-based CPU
workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
BP hardening for CVE-2017-5715.

If the host has some mitigation for this issue, report that
we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
host workaround on every guest exit.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:54:05 +00:00
James Morse
0067df413b KVM: arm64: Handle RAS SErrors from EL2 on guest exit
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.

There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.

The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.

With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...

The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.

Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:09:36 +00:00
James Morse
3368bd8097 KVM: arm64: Handle RAS SErrors from EL1 on guest exit
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.

There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.

For SError that interrupt a guest and are routed to EL2 the existing
behaviour is to inject an impdef SError into the guest.

Add code to handle RAS SError based on the ESR. For uncontained and
uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
errors compromise the host too. All other error types are contained:
For the fatal errors the vCPU can't make progress, so we inject a virtual
SError. We ignore contained errors where we can make progress as if
we're lucky, we may not hit them again.

If only some of the CPUs support RAS the guest will see the cpufeature
sanitised version of the id registers, but we may still take RAS SError
on this CPU. Move the SError handling out of handle_exit() into a new
handler that runs before we can be preempted. This allows us to use
this_cpu_has_cap(), via arm64_is_ras_serror().

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:09:13 +00:00
James Morse
c773ae2b34 KVM: arm64: Save/Restore guest DISR_EL1
If we deliver a virtual SError to the guest, the guest may defer it
with an ESB instruction. The guest reads the deferred value via DISR_EL1,
but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO
is set.

Add the KVM code to save/restore VDISR_EL2, and make it accessible to
userspace as DISR_EL1.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:08:52 +00:00
James Morse
4715c14bc1 KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature
generated an SError with an implementation defined ESR_EL1.ISS, because we
had no mechanism to specify the ESR value.

On Juno this generates an all-zero ESR, the most significant bit 'ISV'
is clear indicating the remainder of the ISS field is invalid.

With the RAS Extensions we have a mechanism to specify this value, and the
most significant bit has a new meaning: 'IDS - Implementation Defined
Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized'
instead of 'no valid ISS'.

Add KVM support for the VSESR_EL2 register to specify an ESR value when
HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to
specify an implementation-defined value.

We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM
save/restores this bit during __{,de}activate_traps() and hardware clears the
bit once the guest has consumed the virtual-SError.

Future patches may add an API (or KVM CAP) to pend a virtual SError with
a specified ESR.

Cc: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:08:41 +00:00
James Morse
4f5abad9e8 KVM: arm/arm64: mask/unmask daif around VHE guests
Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.

With VHE none of this exception-level jumping happens, so KVMs
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked, KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.

Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the hosts VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVMs VBAR value, but is kept here for symmetry.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:08:24 +00:00
James Morse
c97e166e54 KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
on VHE they can be the same register.

KVM calls hyp_panic() when anything unexpected happens. This may occur
while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
tpidr_el2, which it uses to find the host context in order to restore
the host EL1 registers before parachuting into the host's panic().

The host context is a struct kvm_cpu_context allocated in the per-cpu
area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
easy to find. Change hyp_panic() to take a pointer to the
struct kvm_cpu_context. Wrap these calls with an asm function that
retrieves the struct kvm_cpu_context from the host's per-cpu area.

Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
kvm init. (Later patches will make this unnecessary for VHE hosts)

We print out the vcpu pointer as part of the panic message. Add a back
reference to the 'running vcpu' in the host cpu context to preserve this.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-13 10:44:22 +00:00
Christoffer Dall
61bbe38027 KVM: arm/arm64: Avoid work when userspace iqchips are not used
We currently check if the VM has a userspace irqchip in several places
along the critical path, and if so, we do some work which is only
required for having an irqchip in userspace.  This is unfortunate, as we
could avoid doing any work entirely, if we didn't have to support
irqchip in userspace.

Realizing the userspace irqchip on ARM is mostly a developer or hobby
feature, and is unlikely to be used in servers or other scenarios where
performance is a priority, we can use a refcounted static key to only
check the irqchip configuration when we have at least one VM that uses
an irqchip in userspace.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Alex Bennée
696673d192 KVM: arm/arm64: debug: Introduce helper for single-step
After emulating instructions we may want return to user-space to handle
single-step debugging. Introduce a helper function, which, if
single-step is enabled, sets the run structure for return and returns
true.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-29 16:46:19 +01:00
Dave Martin
17eed27b02 arm64/sve: KVM: Prevent guests from using SVE
Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.

This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.

On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect the execution of FPSIMD instructions in
the guest.  The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace.  This is an
interim measure, in advance of adding full SVE awareness to KVM.

This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c.  Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.

As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03 15:24:19 +00:00
Jérôme Glisse
fb1522e099 KVM: update to new mmu_notifier semantic v2
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:13:00 -07:00
Paolo Bonzini
04a7ea04d5 KVM/ARM updates for 4.13
- vcpu request overhaul
 - allow timer and PMU to have their interrupt number
   selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAllWCM0VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDjJ0QAI16x6+trKhH31lTSYekYfqm4hZ2
 Fp7IbALW9KNCaY35tZov2Zuh99qGRduxTh7ewqhKpON8kkU+UKj0F7zH22+vfN4m
 yas/+uNr8R9VLyvea4ysPsgx8Q8v1Ix9setohHYNZIL9/klVqtaHpYvArHVF/mzq
 p2j/NxRS2dlp9r2TtoMRMhA05u6r0wolhUuh+z9v2ipib0gfOBIG24jsqCTEcD9n
 5A/cVd+ztYshkrV95h3y9peahwt3zOA4QBGzrQ2K25jp0s54nqhmC7JTNSa8dtar
 YGW2MuAMoIFTwCFAlpwCzrwpOJFzF3Q6A8bOxei2fjclzjPMgT1xQxuhOoe4ntFa
 lTPxSHalm5W6dFTW90YSo2DBcPe+N7sQkhjR0cCeY3GYsOFhXMLTlOl5Pt1YK1or
 +3FAI74tFRKvVmb9mhZeGTvuzhDgRvtf3Qq5rjwlGzKc2BBOEgtMyj/Wgwo4N6Dz
 IjOnoRaUGELoBCWoTorMxLpsPBdPVSUxNyJTdAhqZ/ZtT1xqjhFNLZcrVWmOTzDM
 1cav+jZkla4sLmJSNDD54aCSvvtPHis0nZn9PRlh12xgOyYiAVx4K++MNuWP0P37
 hbh1gbPT+FcoVxPurUsX/pjNlTucPZcBwFytZDQlpwtPBpEFzJiImLYe/PldRb0f
 9WQOH1Y1+q14MF+N
 =6hNK
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.13

- vcpu request overhaul
- allow timer and PMU to have their interrupt number
  selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups

Conflicts:
	arch/s390/include/asm/kvm_host.h
2017-06-30 12:38:26 +02:00
Andrew Jones
325f9c649c KVM: arm/arm64: use vcpu requests for irq injection
Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
kick meant to trigger the interrupt injection could be sent while
the VCPU is outside guest mode, which means no IPI is sent, and
after it has called kvm_vgic_flush_hwstate(), meaning it won't see
the updated GIC state until its next exit some time later for some
other reason.  The receiving VCPU only needs to check this request
in VCPU RUN to handle it.  By checking it, if it's pending, a
memory barrier will be issued that ensures all state is visible.
See "Ensuring Requests Are Seen" of
Documentation/virtual/kvm/vcpu-requests.rst

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:56 +02:00