More often than not, a failed VM-entry in an x86 production
environment is induced by a defective CPU. To help identify the bad
hardware, include the id of the last logical CPU to run a vCPU in the
information provided to userspace on a KVM exit for failed VM-entry or
for KVM internal errors not associated with emulation. The presence of
this additional information is indicated by a new capability,
KVM_CAP_LAST_CPU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-5-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DIAGNOSE 0x318 (diag318) sets information regarding the environment
the VM is running in (Linux, z/VM, etc) and is observed via
firmware/service events.
This is a privileged s390x instruction that must be intercepted by
SIE. Userspace handles the instruction as well as migration. Data
is communicated via VCPU register synchronization.
The Control Program Name Code (CPNC) is stored in the SIE block. The
CPNC along with the Control Program Version Code (CPVC) are stored
in the kvm_vcpu_arch struct.
This data is reset on load normal and clear resets.
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200622154636.5499-3-walling@linux.ibm.com
[borntraeger@de.ibm.com: fix sync_reg position]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
Add support for Hyper-V synthetic debugger (syndbg) interface.
The syndbg interface is using MSRs to emulate a way to send/recv packets
data.
The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
and if it supports the synthetic debugger interface it will attempt to
use it, instead of trying to initialize a network adapter.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The problem the patch is trying to address is the fact that 'struct
kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit
modes.
In 64-bit mode the default alignment boundary is 64 bits thus
forcing extra gaps after 'type' and 'msr' but in 32-bit mode the
boundary is at 32 bits thus no extra gaps.
This is an issue as even when the kernel is 64 bit, the userspace using
the interface can be both 32 and 64 bit but the same 32 bit userspace has
to work with 32 bit kernel.
The issue is fixed by forcing the 64 bit layout, this leads to ABI
change for 32 bit builds and while we are obviously breaking '32 bit
userspace with 32 bit kernel' case, we're fixing the '32 bit userspace
with 64 bit kernel' one.
As the interface has no (known) users and 32 bit KVM is rather baroque
nowadays, this seems like a reasonable decision.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
control the halt-polling time, allowing halt-polling to be tuned or
disabled on particular VMs.
With dynamic halt-polling, a VM's VCPUs can poll from anywhere from
[0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
upper limit on the poll time.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200417221446.108733-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will. Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests. This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.
This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest. If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest. The ultravisor will then abort the transition and
the guest will terminate.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJeZf9tAAoJEBF7vIC1phx89J0P/iv3wCoMNDqAttnHa/UQFF04
njUadNYkAADDrsabIEOs9O+BE1/4BVspnIunE4+xw76p5M/7/g5eIhXWcLudhlnL
+XtvuEwz/2ffA9JWAAYNKB7cGqBM9BCC+iYzAF9ah6sPLmlDCoF+hRe0g+0tXSON
cklUJFril9bOcxd/MxrzFLcmipbxT/Z4/10eBY+FHcm6SQGOKAtJH0xL7X3PfPI5
L/6ZhML9exsj1Iplkrl8BomMRoYOrvfq/jMaZp9SwmfXaOKYmNU3a19MhzfZ593h
bfR92H8kZRy/TpBd7EnpxYGQ/n53HkUhFMhtqkkkeHW1rCo8ccwC4VfnXb+KqQp+
nJ8KieWG+OlKKFDuZPl5Gq+jQqjJfzchbyMTYnBNe+GPT5zg76tJXmQyDn5X9p3R
mfg+9ZEeEonMu7px93Ht1gLdPiC2gjRckjuBDPqMGEhG2z2SQ/MLri+WnproIQRa
TcE7rZBtuyrGFTq4M4dEcsUW02xnOaav6H57kkl8EwqYwgDHlqoUbt0AvLFyW07a
RlH7drmhKDwTJkcOhOLeLNM8Un6NvnsLZ8Lbcr9rRf9Z9Lpc+zW88BSwJ7MM/GH8
FEQM8Omnn8KAJTENpIm3bHHyvsi0kJEhl+c3Ila3QnYzXZbJ3ZDaJZngMAbUUnVl
YNeFyyALzOgVVBx4kvTm
=x6Hn
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Features and Enhancements for 5.7 part1
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:
1. Initialize all the bits of the dirty bitmap to 1 when enabling
dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
SPTEs gradually in small chunks
Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:
VM Size Before After optimization
128G 460ms 10ms
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that everything is in place, we can announce the feature.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions.
Specific to these "soft" reboots are
* The "unshare all" UVC
* The "prepare for reset" UVC
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.
We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses which are not possible
anymore in protected mode anyway.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
machines
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The architecture states that we need to reset local IRQs for all CPU
resets. Because the old reset interface did not support the normal CPU
reset we never did that on a normal reset.
Let's implement an interface for the missing normal and clear resets
and reset all local IRQs, registers and control structures as stated
in the architecture.
Userspace might already reset the registers via the vcpu run struct,
but as we need the interface for the interrupt clearing part anyway,
we implement the resets fully and don't rely on userspace to reset the
rest.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20200131100205.74720-4-frankja@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
This ioctl will be issued by QEMU during reset and includes the
the following steps:
- Release all device pages of the secure guest.
- Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
- Unpin the VPA pages so that they can be migrated back to secure
side when guest becomes secure again. This is required because
pinned pages can't be migrated.
- Reinit the partition scoped page tables
After these steps, guest is ready to issue UV_ESM call once again
to switch to secure mode.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[Implementation of uv_svm_terminate() and its call from
guest shutdown path]
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[Unpinning of VPA pages]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
- Allow non-ISV data aborts to be reported to userspace
- Allow injection of data aborts from userspace
- Expose stolen time to guests
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt pool counters
- Make the emulated timer PREEMPT_RT compliant
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl3VZ0EPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDwikQAJ/lQT97zEKV3dnpD/jjmEic/3QvTGljS4p+
pbwZAzyoSMc09lAK2pkaGRRc7euPnp4uLdRS4SenToVzmUQCzuxpQEEMdV/wjp4V
WLQ1WnTEAhYkm7k5MVo4uy3eD7nVWHWXgfQJvzL4EYZ5R/gd9NzBrnAc6LLV6hp9
0eLXIYrGFa1GESzF6P6sBDJhYpqVUcQlTI8I43kZH3iCC4+OsBxIkhHREZYsELhW
MZJIM9ZCskg2tvPC4UysaFiGjBYUJNJ0V+fFOrhyGzludP8i8rNRwgA60mznwvNw
V4N6/gLlkGK7nLqP+noUcU2wTBnIu389TcWWsX47CuDUzawCd8Fb9kX2zYauQSyS
ujE0uzoo/nhPFysh9OVVeLUZ6o/wotXoMp2t32t1c5h9N1hISEJvAWavMxTY6KzF
NEn9hWFjNcgBoArz9GKn9p2nBQpCDvu+2SlI4nL/qgZ7lPC4O3U1uq9myCOLj/gu
Can/u5EAwgyIBDVcEPHV+vP2GjyeERdXprGiG2VJTYlbHsdjgISTR+5Fy32KdGlP
YygeZxJtzretr3AYsWqD6Mri30FDSoYy9rUOWBpa+ZHbJPac0M+uKOqntV1OxPX9
QUkmNEdJcDr8fkcKxnEZ/MaZxFTGPp4vfhiT4A7dUkWTFq7ajvGo8IwN1d7PvWxS
LFMij1Js
=SxBH
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for Linux 5.5:
- Allow non-ISV data aborts to be reported to userspace
- Allow injection of data aborts from userspace
- Expose stolen time to guests
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt pool counters
- Make the emulated timer PREEMPT_RT compliant
Conflicts:
include/uapi/linux/kvm.h
Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.
User space can set an attribute on the VCPU providing the IPA base
address of the stolen time structure for that VCPU. This must be
repeated for every VCPU in the VM.
The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall.
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
In some scenarios, such as buggy guest or incorrect configuration of the
VMM and firmware description data, userspace will detect a memory access
to a portion of the IPA, which is not mapped to any MMIO region.
For this purpose, the appropriate action is to inject an external abort
to the guest. The kernel already has functionality to inject an
external abort, but we need to wire up a signal from user space that
lets user space tell the kernel to do this.
It turns out, we already have the set event functionality which we can
perfectly reuse for this.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
For a long time, if a guest accessed memory outside of a memslot using
any of the load/store instructions in the architecture which doesn't
supply decoding information in the ESR_EL2 (the ISV bit is not set), the
kernel would print the following message and terminate the VM as a
result of returning -ENOSYS to userspace:
load/store instruction decoding not implemented
The reason behind this message is that KVM assumes that all accesses
outside a memslot is an MMIO access which should be handled by
userspace, and we originally expected to eventually implement some sort
of decoding of load/store instructions where the ISV bit was not set.
However, it turns out that many of the instructions which don't provide
decoding information on abort are not safe to use for MMIO accesses, and
the remaining few that would potentially make sense to use on MMIO
accesses, such as those with register writeback, are not used in
practice. It also turns out that fetching an instruction from guest
memory can be a pretty horrible affair, involving stopping all CPUs on
SMP systems, handling multiple corner cases of address translation in
software, and more. It doesn't appear likely that we'll ever implement
this in the kernel.
What is much more common is that a user has misconfigured his/her guest
and is actually not accessing an MMIO region, but just hitting some
random hole in the IPA space. In this scenario, the error message above
is almost misleading and has led to a great deal of confusion over the
years.
It is, nevertheless, ABI to userspace, and we therefore need to
introduce a new capability that userspace explicitly enables to change
behavior.
This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
which does exactly that, and introduces a new exit reason to report the
event to userspace. User space can then emulate an exception to the
guest, restart the guest, suspend the guest, or take any other
appropriate action as per the policy of the running system.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
the next instruction to be single stepped via the
KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
to inform userspace about the state of single stepping support.
We currently don't have support for guest single stepping implemented
in Book3S HV so the capability is only present for Book3S PR and
BookE.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Some additional RISC-V updates for v5.4-rc1. This includes one
significant fix:
- Prevent interrupts from being unconditionally re-enabled during
exception handling if they were disabled in the context in which the
exception occurred
Also a few other fixes:
- Fix a build error when sparse memory support is manually enabled
- Prevent CPUs beyond CONFIG_NR_CPUS from being enabled in early boot
And a few minor improvements:
- DT improvements: in the FU540 SoC DT files, improve U-Boot
compatibility by adding an "ethernet0" alias, drop an unnecessary
property from the DT files, and add support for the PWM device
- KVM preparation: add a KVM-related macro for future RISC-V KVM
support, and export some symbols required to build KVM support as
modules
- defconfig additions: build more drivers by default for QEMU
configurations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl2OUysACgkQx4+xDQu9
Kks5Lg//bVTjarTnzgelTMz99UWiVhFc9X2H/r0zPWAms6if1bQVxIPDzffJnO9n
ldnYMLKSo3s7vUNdbs8NqdrgxZ21XSU1x3KllnEPub89e5e12wjMpqFcow0Yx0GF
16QEsHfvUsAfIcXJinf6YnQ1AO3n+DnvhxGE3ey+8AIJk676cnsxU0R5o9defLdk
XCC/o7BAt+qBZytJiSeIKQkozQkPbrzoQs2HZpqm/pDllVMFrpYAKaVLWMiqdvjO
teMqEgeQLjnHxrAHrfeKG8ssPmnf81TWOFM1fmUy2kqxsGiZSPj2LrpaghQpN/EB
1j0irgzesI0vX90rYwV4WtYoeBKICqcXo+c6wVrKaUrnBX0NiQbt55Ryf2k0JFw7
YlEPjEt46qRMLQ5J5LFALkp1bSCwX4XU3BVKCHLuzTKlg3ntHJGYAA9gPbGIdmYK
NkNYqQUowOWI6x9SkPJwStpXyckqCkj9nSp8WV5YqmgrpfZjw0b7yaU1GxfLhVkt
ygGyl0EMlL3iwcGHfn+qXg8m2zB9PjcdJZGwkAks0kWroxxg+xnEfA413zaiVNO2
AKFRLQtoZG6mtujoDBNxyc09ewovte7lXil/+rLQEcQhZJPMpK3oiM53NH736Zi5
fpt7YhanIGRswPaNQgwz981R23aDb6NZXUUaI4XStfDxJN5Px6M=
=wfz5
-----END PGP SIGNATURE-----
Merge tag 'riscv/for-v5.4-rc1-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Paul Walmsley:
"Some additional RISC-V updates.
This includes one significant fix:
- Prevent interrupts from being unconditionally re-enabled during
exception handling if they were disabled in the context in which
the exception occurred
Also a few other fixes:
- Fix a build error when sparse memory support is manually enabled
- Prevent CPUs beyond CONFIG_NR_CPUS from being enabled in early boot
And a few minor improvements:
- DT improvements: in the FU540 SoC DT files, improve U-Boot
compatibility by adding an "ethernet0" alias, drop an unnecessary
property from the DT files, and add support for the PWM device
- KVM preparation: add a KVM-related macro for future RISC-V KVM
support, and export some symbols required to build KVM support as
modules
- defconfig additions: build more drivers by default for QEMU
configurations"
* tag 'riscv/for-v5.4-rc1-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Avoid interrupts being erroneously enabled in handle_exception()
riscv: dts: sifive: Drop "clock-frequency" property of cpu nodes
riscv: dts: sifive: Add ethernet0 to the aliases node
RISC-V: Export kernel symbols for kvm
KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
arch/riscv: disable excess harts before picking main boot hart
RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
RISC-V: Fix building error when CONFIG_SPARSEMEM_MANUAL=y
riscv: dts: Add DT support for SiFive FU540 PWM driver
Hyper-V direct tlb flush function should be enabled for
guest that only uses Hyper-V hypercall. User space
hypervisor(e.g, Qemu) can disable KVM identification in
CPUID and just exposes Hyper-V identification to make
sure the precondition. Add new KVM capability KVM_CAP_
HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V
direct tlb function and this function is default to be
disabled in KVM.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will be using ONE_REG interface accessing VCPU registers from
user-space hence we add KVM_REG_RISCV for RISC-V VCPU registers.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Receiving an unexpected exit reason from hardware should be considered
as a severe bug in KVM. Therefore, instead of just injecting #UD to
guest and ignore it, exit to userspace on internal error so that
it could handle it properly (probably by terminating guest).
In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE()
as handling unexpected exit reason should be a rare unexpected
event (that was expected to never happen) and we prefer to print
a message on it every time it occurs to guest.
Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases.
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While parts of the VGIC support a large number of vcpus (we
bravely allow up to 512), other parts are more limited.
One of these limits is visible in the KVM_IRQ_LINE ioctl, which
only allows 256 vcpus to be signalled when using the CPU or PPI
types. Unfortunately, we've cornered ourselves badly by allocating
all the bits in the irq field.
Since the irq_type subfield (8 bit wide) is currently only taking
the values 0, 1 and 2 (and we have been careful not to allow anything
else), let's reduce this field to only 4 bits, and allocate the
remaining 4 bits to a vcpu2_index, which acts as a multiplier:
vcpu_id = 256 * vcpu2_index + vcpu_index
With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
allowing this to be discovered, it becomes possible to inject
PPIs to up to 4096 vcpus. But please just don't.
Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories. We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory. Fix up the documentation
to match that.
Fixes: ed16648eb5 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some events can provide a guest with information about other guests or the
host (e.g. L3 cache stats); providing the capability to restrict access
to a "safe" set of events would limit the potential for the PMU to be used
in any side channel attacks. This change introduces a new VM ioctl that
sets an event filter. If the guest attempts to program a counter for
any blacklisted or non-whitelisted event, the kernel counter won't be
created, so any RDPMC/RDMSR will show 0 instances of that event.
Signed-off-by: Eric Hankland <ehankland@google.com>
[Lots of changes. All remaining bugs are probably mine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow guest reads CORE cstate when exposing host CPU power management capabilities
to the guest. PKG cstate is restricted to avoid a guest to get the whole package
information in multi-tenant scenario.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzMM9kVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDEp8P/iqZvvZlLdlnWQwluWh237c28kAo
zELO0L7Wl+OJ66v2hzM+NPBi5kv/9pSv7AoKNLv3398YmKFt0n7yUB+MHi0BC9xi
ZEp4etCOiVcqcWWeDiAXLdR9OQlb7IDBDc56s4V9HQgK3sEb4u8aEJIy/nDBVniv
GVLMh1EOsrviIYso6UVxI1X7lPQevpCS0kv9/llhhzEj8QDxnQThjDuW3wrAyhQi
F9XNVjAMW8rft7vvok9cxT4v+TR1HgUajquoSrjXuonWHgKnC9tSH/dHILNK8Zij
5OApojGlZQrXIa5Sk3JOhGahVVY9Y+ewsw58J5bJxd0/xrKXnWk/Lann7NE+UcBf
RJMHfanIO/+JJRzHhagejK7pqnYXD1PWBwF8z3Hefs1IVw4eBvPBGuhIULJ6+eSP
+3JCwiOiwshG43gZlGmHcgvhPdeX4r/BlopWV9+0X/gAjcU1+3+ZG6J3jeAcC1Kx
i481dSzlZ7Ar7VWDCk7WgcmDvUwHXtxq0HbqzQjPBO04kkakjdPZZrZIX3+Qhlem
GpkPVb2z5h5KTk9Fx03ZXxPVdiOQh1UmNC8jlsYZPWcJVTLkySs7HWXZJe+WTs4Z
NLuen/eA4/NCon+UA6XdIG5Ddn/J39UuF1lCApHPHn576rwz+HmqpcN59XiU6y4h
XHIxzajFcXNpn802
=fjph
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.2
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
Conflicts:
include/uapi/linux/kvm.h
- Fix a bug, fix a spelling mistake, remove some useless code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc2kTEAAoJEJ2a6ncsY3GfS88IAImcIlKXMvzSKtHFxGpRap17
9LTZs5MQAUZHVMFJXmrJLBgogtGxUw53aX53woeyerytZsoGU4+YzwgLhk4XBEzA
5Kt5ahlxu82sa2ThH1zyLlNWFXiTECgD5ErNTdavLbNlaKE8YG160+65/mSyixGz
vs5wLSYGv/37no1ay6PIZ3DtwqdrYq5nJbuG+ZsaamUHPJOGprqHqg0gaTJ877NZ
yQDUS7OVuEJ1pdUUK/elP+cnlqR9smaP5OUNsXYMHWJgPJMjc27/thBJy93iS1kk
/zKQ8AFmxqoaePnR7ymTbqurfFFHBiSavUmyWopSQppNHCf4DDE8XjLs9MXKez8=
=Lco4
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-next-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM update for 5.2
* Support for guests to access the new POWER9 XIVE interrupt controller
hardware directly, reducing interrupt latency and overhead for guests.
* In-kernel implementation of the H_PAGE_INIT hypercall.
* Reduce memory usage of sparsely-populated IOMMU tables.
* Several bug fixes.
Second PPC KVM update for 5.2
* Fix a bug, fix a spelling mistake, remove some useless code.
The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has some problem which
blocks the correct usage from userspace. Obsolete the old one and
introduce a new capability bit for it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to
let QEMU connect the vCPU presenters to the XIVE KVM device if
required. The capability is not advertised for now as the full support
for the XIVE native exploitation mode is not yet available. When this
is case, the capability will be advertised on PowerNV Hypervisors
only. Nested guests (pseries KVM Hypervisor) are not supported.
Internally, the interface to the new KVM device is protected with a
new interrupt mode: KVMPPC_IRQ_XIVE.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This is the basic framework for the new KVM device supporting the XIVE
native exploitation mode. The user interface exposes a new KVM device
to be created by QEMU, only available when running on a L0 hypervisor.
Support for nested guests is not available yet.
The XIVE device reuses the device structure of the XICS-on-XIVE device
as they have a lot in common. That could possibly change in the future
if the need arise.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch advertises the capability of two cpu feature called address
pointer authentication and generic pointer authentication. These
capabilities depend upon system support for pointer authentication and
VHE mode.
The current arm64 KVM partially implements pointer authentication and
support of address/generic authentication are tied together. However,
separate ABI requirements for both of them is added so that any future
isolated implementation will not require any ABI changes.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
To provide a uniform way to check for KVM SVE support amongst other
features, this patch adds a suitable capability KVM_CAP_ARM_SVE,
and reports it as present when SVE is available.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Some aspects of vcpu configuration may be too complex to be
completed inside KVM_ARM_VCPU_INIT. Thus, there may be a
requirement for userspace to do some additional configuration
before various other ioctls will work in a consistent way.
In particular this will be the case for SVE, where userspace will
need to negotiate the set of vector lengths to be made available to
the guest before the vcpu becomes fully usable.
In order to provide an explicit way for userspace to confirm that
it has finished setting up a particular vcpu feature, this patch
adds a new ioctl KVM_ARM_VCPU_FINALIZE.
When userspace has opted into a feature that requires finalization,
typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
matching call to KVM_ARM_VCPU_FINALIZE is now required before
KVM_RUN or KVM_GET_REG_LIST is allowed. Individual features may
impose additional restrictions where appropriate.
No existing vcpu features are affected by this, so current
userspace implementations will continue to work exactly as before,
with no need to issue KVM_ARM_VCPU_FINALIZE.
As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
placeholder: no finalizable features exist yet, so ioctl is not
required and will always yield EINVAL. Subsequent patches will add
the finalization logic to make use of this ioctl for SVE.
No functional change for existing userspace.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The Arm SVE architecture defines registers that are up to 2048 bits
in size (with some possibility of further future expansion).
In order to avoid the need for an excessively large number of
ioctls when saving and restoring a vcpu's registers, this patch
adds a #define to make support for individual 2048-bit registers
through the KVM_{GET,SET}_ONE_REG ioctl interface official. This
will allow each SVE register to be accessed in a single call.
There are sufficient spare bits in the register id size field for
this change, so there is no ABI impact, providing that
KVM_GET_REG_LIST does not enumerate any 2048-bit register unless
userspace explicitly opts in to the relevant architecture-specific
features.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
With every new Hyper-V Enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works it is fairly
inconvenient: the majority of the enlightenments we do have corresponding
CPUID feature bit(s) and userspace has to know this anyways to be able to
expose the feature to the guest.
Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.
Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.
KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.
When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.
Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.
This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enlightened VMCS is opt-in. The current version does not contain all
fields supported by nested VMX so we must not advertise the
corresponding VMX features if enlightened VMCS is enabled.
Userspace is given the enlightened VMCS version supported by KVM as
part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
be advertised to the nested hypervisor, currently done via a cpuid
leaf for Hyper-V.
Suggested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Coalesced pio is based on coalesced mmio and can be used for some port
like rtc port, pci-host config port and so on.
Specially in case of rtc as coalesced pio, some versions of windows guest
access rtc frequently because of rtc as system tick. guest access rtc like
this: write register index to 0x70, then write or read data from 0x71.
writing 0x70 port is just as index and do nothing else. So we can use
coalesced pio to handle this scene to reduce VM-EXIT time.
When starting and closing a virtual machine, it will access pci-host config
port frequently. So setting these port as coalesced pio can reduce startup
and shutdown time.
without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
using perf tools: (guest OS : windows 7 64bit)
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%)
0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%)
with my patch
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%)
0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%)
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using hypercall for sending IPIs is faster because this allows to specify
any number of vCPUs (even > 64 with sparse CPU set), the whole procedure
will take only one VMEXIT.
Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
hypercall can't be 'fast' (passing parameters through registers) but
apparently this is not true, Windows always uses it as 'fast' so we need
to support that.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds a KVM_PPC_NO_HASH flag to the flags field of the
kvm_ppc_smmu_info struct, and arranges for it to be set when
running as a nested hypervisor, as an unambiguous indication
to userspace that HPT guests are not supported. Reporting the
KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
indicating only that the new HPT features in ISA V3.0 are not
supported, leaving it ambiguous whether pre-V3.0 HPT features
are supported.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
With this, userspace can enable a KVM-HV guest to run nested guests
under it.
The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail). Guests which are already nested
hypervisors will continue to be so.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Allow specifying the physical address size limit for a new
VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
allows us to finalise the stage2 page table as early as possible
and hence perform the right checks on the memory slots
without complication. The size is encoded as Log2(PA_Size) in
bits[7:0] of the type field. For backward compatibility the
value 0 is reserved and implies 40bits. Also, lift the limit
of the IPA to host limit and allow lower IPA sizes (e.g, 32).
The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
for the availability of this feature. The cap check returns the
maximum limit for the physical address shift supported by the host.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.
Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).
Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltxmb4VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD7E0P/0qn1IMtskaC7EglFCm72+NXe1CW
ZAtxTHzetjf7977dA3bVsg4gEKvVx5b3YuRT76u4hBoSa0rFJ8Q9iSC8wL4u9Idf
JUQjwVIUxMeGW5fR0VFDkd9SkDYtNGdjQcVl2I8UpV+lnLC/2Vfr4xR5qBad2pAQ
zjthdpQMjZWClyhPkOv6WjVsW0lNw0xDkZWgCViBY+TdT7Gmw/q8hmvj9TEwbMGT
7tmQl9MupQ2bLY8WuTiGA6eNiEZld9esJGthI43xGQDJl4Y3FeciIZWcBru20+wu
GnC3QS3FlmYlp2WuWcKU9lEGXhmoX/7/1WVhZkoMsIvi05c2JCxSxstK7QNfUaAH
8q2/Wc0fYIGm2owH+b1Mpn0w37GZtgl7Bxxzakg7B7Ko0q/EnO7z6XVup1/abKRU
NtUKlWIL7NDiHjHO6j0hBb3rGi7B3wo86P7GTPJb12Dg9EBF5DVhekXeGI/ChzE9
WIV1PxR0seSapzlJ92HHmWLAtcRLtXXesqcctmN4d2URBtsx9DEwo0Upiz//reYE
TBncQbtniVt2xXEl7sqNEYei75IxC3Dg1AgDL/zVQDl8PW0UvKo8Qb0cW7EnF9Vg
AcjD6R72dAgbqUMYOP0nriKxzXwa0Jls9aF3zBgcikKMGeyD6Z/Exlq4LexhSeuw
cWKsrQUYcLGKZPRN
=b6+A
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 4.19
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.
With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
address previous comments.
- handle nested.smm state.
- rebase & a bit of refactoring.
- Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests. Once the vSIE
support is added we will either drop the parameter or enable it as
default.
For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE_1M capability and the hpage module
parameter. Enabling it means that cmm can't be enabled for the vm and
disables pfmf and storage key interpretation.
This is due to the fact that in some cases, in upcoming patches, we
have to split huge pages in the guest mapping to be able to set more
granular memory protection on 4k pages. These split pages have fake
page tables that are not visible to the Linux memory management which
subsequently will not manage its PGSTEs, while the SIE will. Disabling
these features lets us manage PGSTE data in a consistent matter and
solve that problem.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
For the arm64 RAS Extension, user space can inject a virtual-SError
with specified ESR. So user space needs to know whether KVM support
to inject such SError, this interface adds this query for this capability.
KVM will check whether system support RAS Extension, if supported, KVM
returns true to user space, otherwise returns false.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM_X86_DISABLE_EXITS_HTL really refers to exit on halt.
Obviously a typo: should be named KVM_X86_DISABLE_EXITS_HLT.
Fixes: caa057a2ca ("KVM: X86: Provide a capability to disable HLT intercepts")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We need a new capability to indicate support for the newly added
HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this
capability, userspace is supposed to announce PV TLB flush features
by setting the appropriate CPUID bits (if needed).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of
capabilities.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Allowing a guest to execute MWAIT without interception enables a guest
to put a (physical) CPU into a power saving state, where it takes
longer to return from than what may be desired by the host.
Don't give a guest that power over a host by default. (Especially,
since nothing prevents a guest from using MWAIT even when it is not
advertised via CPUID.)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace hardcoded padding size value for struct kvm_sync_regs
with #define SYNC_REGS_SIZE_BYTES.
Also update the value specified in api.txt from outdated hardcoded
value to SYNC_REGS_SIZE_BYTES.
Signed-off-by: Ken Hofsass <hofsass@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In Hyper-V, the fast guest->host notification mechanism is the
SIGNAL_EVENT hypercall, with a single parameter of the connection ID to
signal.
Currently this hypercall incurs a user exit and requires the userspace
to decode the parameters and trigger the notification of the potentially
different I/O context.
To avoid the costly user exit, process this hypercall and signal the
corresponding eventfd in KVM, similar to ioeventfd. The association
between the connection id and the eventfd is established via the newly
introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an
(srcu-protected) IDR.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[asm/hyperv.h changes approved by KY Srinivasan. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The new firmware interfaces for branch prediction behaviour changes
are transparently available for the guest. Nevertheless, there is
new state attached that should be migrated and properly resetted.
Provide a mechanism for handling reset, migration and VSIE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Changed capability number to 152. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace
information about the underlying machine's level of vulnerability
to the recently announced vulnerabilities CVE-2017-5715,
CVE-2017-5753 and CVE-2017-5754, and whether the machine provides
instructions to assist software to work around the vulnerabilities.
The ioctl returns two u64 words describing characteristics of the
CPU and required software behaviour respectively, plus two mask
words which indicate which bits have been filled in by the kernel,
for extensibility. The bit definitions are the same as for the
new H_GET_CPU_CHARACTERISTICS hypercall.
There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which
indicates whether the new ioctl is available.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM
changes required to create and manage SEV guests.
SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to
unencrypted version. Each encrypted VM is associated with a unique encryption key;
if its data is accessed to a different entity using a different key the encrypted
guest's data will be incorrectly decrypted, leading to unintelligible data.
This security model ensures that hypervisor will no longer able to inspect or
alter any guest code or data.
The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by hypervisor to load virtual machine keys through the AMD-SP driver.
The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The
ioctl will be used by qemu to issue SEV guest-specific commands defined in Key
Management Specification.
The following links provide additional details:
AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34
SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
SEV Guest BIOS support:
SEV support has been add to EDKII/OVMF BIOS
https://github.com/tianocore/edk2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Old kernels did not check for zero in the irq_state.flags field and old
QEMUs did not zero the flag/reserved fields when calling
KVM_S390_*_IRQ_STATE. Let's add comments to prevent future uses of
these fields.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION
and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to
register/unregister the guest memory regions which may contain the encrypted
data (e.g guest RAM, PCI BAR, SMRAM etc).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Improvements-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
If the hardware supports memory encryption then the
KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform
specific memory encryption commands.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and
after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaDayXAAoJEED/6hsPKofo/3UH/3HvlcHt+ADTkCU1/iiKAs+i
0zngIOXIxgHDnV0ww6bV+Znww0BzTYgKCAXX76z603jdpDwG/pzQQcbLDF5ZoJnD
sQtF10gZinWaRsHlfbLqjrHGL2pGDHO1UKBKLJ0bAIyORPZBxs7i+VmrY/blnr9c
0wsybJ8RbvwAxjsDL5jeX/z4NehPupmKUc4Lf0eZdSHwVOf9sjn+MP6jJ0r2JcIb
D+zddPBiLStzN97t4gZpQsrlj3LKrDS+6hY+1TjSvlh+yHKFVFh58VhLm4DuDeb5
bYOAlWJ/gAWEzfvr5Ld+Nd7SqWWn/14logPkQ4gcU4BI/neAOzk4c6hJfCHl1nk=
=593n
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.15
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
and after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups"
* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
KVM: s390: provide a capability for AIS state migration
KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
KVM: s390: abstract conversion between isc and enum irq_types
KVM: s390: vsie: use common code functions for pinning
KVM: s390: SIE considerations for AP Queue virtualization
KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
KVM: arm/arm64: fix the incompatible matching for external abort
KVM: arm/arm64: Unify 32bit fault injection
KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
KVM: arm/arm64: vgic-its: New helper functions to free the caches
KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
arm/arm64: KVM: Load the timer state when enabling the timer
KVM: arm/arm64: Rework kvm_timer_should_fire
KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
KVM: arm/arm64: Move phys_timer_emulate function
KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
...
The AIS capability was introduced in 4.12, while the interface to
migrate the state was added in 4.13. Unfortunately it is not possible
for userspace to detect the migration capability without creating a flic
kvm device. As in QEMU the cpu model detection runs on the "none"
machine this will result in cpu model issues regarding the "ais"
capability.
To get the "ais" capability properly let's add a new KVM capability that
tells userspace that AIS states can be migrated.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.
By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
otherwise syscall usage would not be possible.
Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This adds information about storage keys to the struct returned by
the KVM_PPC_GET_SMMU_INFO ioctl. The new fields replace a pad field,
which was zeroed by previous kernel versions. Thus userspace that
knows about the new fields will see zeroes when running on an older
kernel, indicating that storage keys are not supported. The size of
the structure has not changed.
The number of keys is hard-coded for the CPUs supported by HV KVM,
which is just POWER7, POWER8 and POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Hyper-V identifies vCPUs by Virtual Processor Index, which can be
queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a
sequential number which can't exceed the maximum number of vCPUs per VM.
APIC ids can be sparse and thus aren't a valid replacement for VP
indices.
Current KVM uses its internal vcpu index as VP_INDEX. However, to make
it predictable and persistent across VM migrations, the userspace has to
control the value of VP_INDEX.
This patch achieves that, by storing vp_index explicitly on vcpu, and
allowing HV_X64_MSR_VP_INDEX to be set from the host side. For
compatibility it's initialized to KVM vcpu index. Also a few variables
are renamed to make clear distinction betweed this Hyper-V vp_index and
KVM vcpu_id (== APIC id). Besides, a new capability,
KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
attempting msr writes where unsupported, to avoid spamming error logs.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
There is a flaw in the Hyper-V SynIC implementation in KVM: when message
page or event flags page is enabled by setting the corresponding msr,
KVM zeroes it out. This is problematic because on migration the
corresponding MSRs are loaded on the destination, so the content of
those pages is lost.
This went unnoticed so far because the only user of those pages was
in-KVM hyperv synic timers, which could continue working despite that
zeroing.
Newer QEMU uses those pages for Hyper-V VMBus implementation, and
zeroing them breaks the migration.
Besides, in newer QEMU the content of those pages is fully managed by
QEMU, so zeroing them is undesirable even when writing the MSRs from the
guest side.
To support this new scheme, introduce a new capability,
KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
pages aren't zeroed out in KVM.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
it also writes back a return value (which constitutes _IOC_READ) making
this an _IOWR ioctl instead of _IOW.
Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts
pending.
* Add the struct used in the ioctls to get and set CMMA attributes.
* Add the two functions needed to get and set the CMMA attributes for
guest pages.
* Add the two ioctls that use the aforementioned functions.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Now that userspace can set the virtual SMT mode by enabling the
KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
to query the set of possible virtual SMT modes. This provides a
new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
information. The return value is a bitmap of possible modes, with
bit N set if virtual SMT mode 2^N is available. That is, 1 indicates
SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
that both SMT1 and SMT2 are available, and so on.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This introduces a new KVM capability to control how KVM behaves
on machine check exception (MCE) in HV KVM guests.
If this capability has not been enabled, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in error belongs to
the guest. With this capability enabled, KVM will cause a guest exit
with the exit reason indicating an NMI.
The new capability is required to avoid problems if a new kernel/KVM
is used with an old QEMU, running a guest that doesn't issue
"ibm,nmi-register". As old QEMU does not understand the NMI exit
type, it treats it as a fatal error. However, the guest could have
handled the machine check error if the exception was delivered to
guest's 0x200 interrupt vector instead of NMI exit in case of old
QEMU.
[paulus@ozlabs.org - Reworded the commit message to be clearer,
enable only on HV KVM.]
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Changes include:
- Using the common sysreg definitions between KVM and arm64
- Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side
- Proper PMU exception handling
- Performance improvements of our GIC handling
- Support for irqchip in userspace with in-kernel arch-timers and PMU support
- A fix for a race condition in our PSCI code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJY/IasAAoJEEtpOizt6ddyd7gH/2N3BIMxi/Uqigx0e0byA43s
f+8gNq8A71VBTERGW2l9QP1/AZAXpQYNWdWmN2jn+91x2yoVL7AT00gEsliSLEZv
tqZaTGFXKi1vNihYrxEWm1mfVNzhRrnbW6vjLrO4J5Advq7T3OWhNuVt2BLTxz3Y
h0iqOWNVrUD9h3QSBFH8tz7yXhguDTSppAcXbE0tACdRu4vN50wqEWokHJG5TsMG
Tl3KYWrcc3YCKlAJGuJi7t5rMrXk+g1q6HnxlIN6OSk0POC2Vmw9/Gigtltj1Qwh
ZEAwsnka/U8ak8WaWeZa3EsGTSFSoAk/+pKv2FB8mFN+uOmWDqVlEiol4dW49AY=
=mEOk
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.12.
Changes include:
- Using the common sysreg definitions between KVM and arm64
- Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side
- Proper PMU exception handling
- Performance improvements of our GIC handling
- Support for irqchip in userspace with in-kernel arch-timers and PMU support
- A fix for a race condition in our PSCI code
Conflicts:
Documentation/virtual/kvm/api.txt
include/uapi/linux/kvm.h
Guests that are heavy on futexes end up IPI'ing each other a lot. That
can lead to significant slowdowns and latency increase for those guests
when running within KVM.
If only a single guest is needed on a host, we have a lot of spare host
CPU time we can throw at the problem. Modern CPUs implement a feature
called "MWAIT" which allows guests to wake up sleeping remote CPUs without
an IPI - thus without an exit - at the expense of never going out of guest
context.
The decision whether this is something sensible to use should be up to the
VM admin, so to user space. We can however allow MWAIT execution on systems
that support it properly hardware wise.
This patch adds a CAP to user space and a KVM cpuid leaf to indicate
availability of native MWAIT execution. With that enabled, the worst a
guest can do is waste as many cycles as a "jmp ." would do, so it's not
a privilege problem.
We consciously do *not* expose the feature in our CPUID bitmap, as most
people will want to benefit from sleeping vCPUs to allow for over commit.
Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[agraf: fix amd, change commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
without passing them to user space which saves time on switching
to user space and back.
This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
KVM tries to handle a TCE request in the real mode, if failed
it passes the request to the virtual mode to complete the operation.
If it a virtual mode handler fails, the request is passed to
the user space; this is not expected to happen though.
To avoid dealing with page use counters (which is tricky in real mode),
this only accelerates SPAPR TCE IOMMU v2 clients which are required
to pre-register the userspace memory. The very first TCE request will
be handled in the VFIO SPAPR TCE driver anyway as the userspace view
of the TCE table (iommu_table::it_userspace) is not allocated till
the very first mapping happens and we cannot call vmalloc in real mode.
If we fail to update a hardware IOMMU table unexpected reason, we just
clear it and move on as there is nothing really we can do about it -
for example, if we hot plug a VFIO device to a guest, existing TCE tables
will be mirrored automatically to the hardware and there is no interface
to report to the guest about possible failures.
This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
and associates a physical IOMMU table with the SPAPR TCE table (which
is a guest view of the hardware IOMMU table). The iommu_table object
is cached and referenced so we do not have to look up for it in real mode.
This does not implement the UNSET counterpart as there is no use for it -
once the acceleration is enabled, the existing userspace won't
disable it unless a VFIO container is destroyed; this adds necessary
cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.
This adds real mode version of WARN_ON_ONCE() as the generic version
causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
returns in the code, this also adds a check for already existing
vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
This finally makes use of vfio_external_user_iommu_id() which was
introduced quite some time ago and was considered for removal.
Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a capability number for in-kernel support for VFIO on
SPAPR platform.
The capability will tell the user space whether in-kernel handlers of
H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
must not attempt allocating a TCE table in the host kernel via
the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
will not be passed to the user space which is desired action in
the situation like that.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1. guarded storage support for guests
This contains an s390 base Linux feature branch that is necessary
to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJY50fGAAoJEBF7vIC1phx8TK4P/0y7J6go7NFniVXf0K5/teAZ
kH6BRwoWQDCGAwkpz8fqhKTNxdEq1bjLS5aNf2R3oGt61yoKim4GLSYeRymhKWo0
WZbwq8VpCWxjysx4wyg1nm4J05EDqaPvfxtD9ONCxZNXvPeZR5pIY1uo3twuNROS
z+bOtP1YK7FyzbgGsHnc47YKrOMa3LwTQfAAhf3eVHs5f9cksIqlQYlo/H7fJY96
Z3qpZmrwWP5avZAenxfUKS2dEiT2lzFvUP1waYwm4sZ7fImdgBRlnFzFK/O7qiIV
c9KHW+qao37NU1AsqJibXZvFQBJixKxWa8nCKsagpkZAMDoPNoutHsxXpNJNrp4S
Zq+OWbiYRKXnndGI94RtMUE+bmRvcj37a648+nwzRPA9N889GQR829qKLANsqCwY
7bM3gE9d5BkAASp0uZ4bgrfyU4tlGTH8WLNSLM+upeOGjWEk3HdOEFuJQNspgaWQ
oaAl4TROQhGvZXYxzTHP26jEG0IAIkUd2CUULAe8lZ2vk+kc4xPAfhjQ41DFembz
fDMXXKgjtHvT6Z5USbGJdkOBkhzGyOuETZJyrj7D60OmlHXeoBPgWIJ21cS8ZU4F
M0cM+1DcXbOOKjfIbZRbLh9OdAhBya8VzHnyw7T2vl9hXIAhEH9MwTfVsnBujdca
Q4BBOIh3HV9VfT0exaaF
=Jtjb
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
From: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: features for 4.12
1. guarded storage support for guests
This contains an s390 base Linux feature branch that is necessary
to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation
We have 2 modes for dealing with interrupts in the ARM world. We can
either handle them all using hardware acceleration through the vgic or
we can emulate a gic in user space and only drive CPU IRQ pins from
there.
Unfortunately, when driving IRQs from user space, we never tell user
space about events from devices emulated inside the kernel, which may
result in interrupt line state changes, so we lose out on for example
timer and PMU events if we run with user space gic emulation.
Define an ABI to publish such device output levels to userspace.
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Introduce a cap to enable AIS facility bit, and add documentation
for this capability.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS
guests are available and supported. In this case it should still be
possible to run 32-bit guest code. If not available it won't be possible
to run 64-bit guest code and the instructions may not be available, or
the kernel may not support full context switching of 64-bit registers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
to allow MIPS KVM to support VZ without confusing old users (which
expect the trap & emulate implementation), define and start checking
KVM_CREATE_VM type codes.
The codes available are:
- KVM_VM_MIPS_TE = 0
This is the current value expected from the user, and will create a
VM using trap & emulate in user mode, confined to the user mode
address space. This may in future become unavailable if the kernel is
only configured to support VZ, in which case the EINVAL error will be
returned and KVM_CAP_MIPS_TE won't be available even though
KVM_CAP_MIPS_VZ is.
- KVM_VM_MIPS_VZ = 1
This can be provided when the KVM_CAP_MIPS_VZ capability is available
to create a VM using VZ, with a fully virtualized guest virtual
address space. If VZ support is unavailable in the kernel, the EINVAL
error will be returned (although old kernels without the
KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
emulate VM).
This is designed to allow the desired implementation (T&E vs VZ) to be
potentially chosen at runtime rather than being fixed in the kernel
configuration.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
This patch adds guarded storage support for KVM guest. We need to
setup the necessary control blocks, the kvm_run structure for the
new registers, the necessary wrappers for VSIE, as well as the
machine check save areas.
GS is enabled lazily and the register saving and reloading is done in
KVM code. As this feature adds new content for migration, we provide
a new capability for enablement (KVM_CAP_S390_GS).
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:
VCPU thread service thread
--------------------------------------------------------------
check flag
set flag
raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN. This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).
As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:
VCPU thread service thread
--------------------------------------------------------------
raise signal
signal handler
set run->immediate_exit
KVM_RUN
check run->immediate_exit
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime. It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.
Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu). The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV. To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
x86: userspace can now hide nested VMX features from guests; nested
VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and
AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs.
PPC: support for KVM guests on POWER9; improved support for interrupt
polling; optimizations and cleanups.
s390: two small optimizations, more stuff is in flight and will be
in 4.11.
ARM: support for the GICv3 ITS on 32bit platforms.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO
nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf
u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3
gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry
qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG
D0j4tRSyGFIdx6lukZm7HmiSHZ0=
=mkoB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release, the most interesting stuff is x86 nested virt
improvements.
x86:
- userspace can now hide nested VMX features from guests
- nested VMX can now run Hyper-V in a guest
- support for AVX512_4VNNIW and AVX512_FMAPS in KVM
- infrastructure support for virtual Intel GPUs.
PPC:
- support for KVM guests on POWER9
- improved support for interrupt polling
- optimizations and cleanups.
s390:
- two small optimizations, more stuff is in flight and will be in
4.11.
ARM:
- support for the GICv3 ITS on 32bit platforms"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
KVM: arm/arm64: timer: Check for properly initialized timer on init
KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
KVM: x86: Handle the kthread worker using the new API
KVM: nVMX: invvpid handling improvements
KVM: nVMX: check host CR3 on vmentry and vmexit
KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
KVM: nVMX: propagate errors from prepare_vmcs02
KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
KVM: nVMX: support restore of VMX capability MSRs
KVM: nVMX: generate non-true VMX MSRs based on true versions
KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
KVM: x86: Add kvm_skip_emulated_instruction and use it.
KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
KVM: VMX: Reorder some skip_emulated_instruction calls
KVM: x86: Add a return value to kvm_emulate_cpuid
KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
...
The KVM_PPC_PVINFO_FLAGS_EV_IDLE macro defines a bit for use in the flags
field of struct kvm_ppc_pvinfo. However, changes since that was introduced
have moved it away from that structure definition, which is confusing.
Move it back next to the structure it belongs with.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Userspace can read the exact value of kvmclock by reading the TSC
and fetching the timekeeping parameters out of guest memory. This
however is brittle and not necessary anymore with KVM 4.11. Provide
a mechanism that lets userspace know if the new KVM_GET_CLOCK
semantics are in effect, and---since we are at it---if the clock
is stable across all VCPUs.
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXoydqAAoJEEtpOizt6ddyM3oH/1A4VeG/J9q4fBPXqY2tVWXs
c3P7UgNcrEgUNs/F9ykQY/lb31deecUzaBt1OyTf+RlsNbihq3dQdYcBhxtUODw/
Faok582ya3UFgLW+IRHcID0EbkVOpIzMhOStYsnU/Dz7HG1JL9HdPzwkid7iu9LT
fI6yrrBnJFjdWAAQ4BkcEKBENRsY8NTs7jX5vnFA92MkUBby7BmariPDD3FtrB+f
Ob9B7CxM30pNqsN7OA/QvFOHMJHxf3s1TBKwmPHe5TLIfSzV1YxcEGiMc0lWqF4v
BT8ZeMGCtjDw94tND1DskfQQRPaMqPmGuRTrAW/IuE2n92bFtbqIqs7Cbw0fzLE=
=Vm6Q
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).
This will be used by QEMU to populate the pa-features bits in the
guest's device tree.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXkk6wAAoJECPQ0LrRPXpDkIQP/iJ2yXTxrfbJoyaVq1vuMn3R
UFhVwNXP8OEjQrmp5lvMBazB1MRBkNDzlVXL1fSb+ijKmbIELOqHhO6ijrkK4zmc
0Ie0x5Bt4gIFPTZyZORVpy1eU/0YFGWERAfsAjYdMCeKwHjaUCRSrZBXF2YsFTfo
Hh/ILvHa8TjUXWsQXvtZCL6AAnkDKBsbDWqsq5zspuT+PA8umI+dGLIiULXBpc4t
S2TCDxOU1JgsAn+Y0XVbPXV9id+bs5LRd6nNH/RmipIVqWmukSrScXOjg/po/l2S
laO4tHmyEeN6ecnCxWttpjacNwyTDNh5n3lL1ceBnBZFqn1k/7NjqV3fQzJxGd1T
1U6edE9+EuS9uXWF5XcEuAD660EiMs4FLVSjPgqYQtto3gOHilmuWL9eeeOOgCem
Lknnu/7G8h36PaQuLnEXWXQb7jeS2rTuC0RqxCG62gD9UWEJTckRz5pRh/e6gz7n
ZVXMrwGiVZ3zR78qE6i2j5CZ6A0BMAK3nZ85AI3kmgKg0CfVY28uPOj8llAOaYm+
0XVdfRj7ed75eu3GobjHUyZ0fQ40jovmH2vy3mupBm5XBUHgH/j6X510KJ1UTLWI
C2EO9KogbjoVeu60mQi4bKGSPi8/wdgYqVft/Qzl5D5iFvQ7Ia+TQNMArCQazBID
Ihe1E09NGrHjV3Yw/GWV
=2Del
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next
KVM/ARM changes for Linux 4.8
- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation