Commit Graph

14187 Commits

Author SHA1 Message Date
Linus Torvalds
8a6bd2f40e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "An unfortunately larger set of fixes, but a large portion is
  selftests:

   - Fix the missing clusterid initializaiton for x2apic cluster
     management which caused boot failures due to IPIs being sent to the
     wrong cluster

   - Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat
     task

   - Wrap access to __supported_pte_mask in __startup_64() where clang
     compile fails due to a non PC relative access being generated.

   - Two fixes for 5 level paging fallout in the decompressor:

      - Handle GOT correctly for paging_prepare() and
        cleanup_trampoline()

      - Fix the page table handling in cleanup_trampoline() to avoid
        page table corruption.

   - Stop special casing protection key 0 as this is inconsistent with
     the manpage and also inconsistent with the allocation map handling.

   - Override the protection key wen moving away from PROT_EXEC to
     prevent inaccessible memory.

   - Fix and update the protection key selftests to address breakage and
     to cover the above issue

   - Add a MOV SS self test"

[ Part of the x86 fixes were in the earlier core pull due to dependencies ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
  x86/apic/x2apic: Initialize cluster ID properly
  x86/boot/compressed/64: Fix moving page table out of trampoline memory
  x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
  x86/pkeys: Do not special case protection key 0
  x86/pkeys/selftests: Add a test for pkey 0
  x86/pkeys/selftests: Save off 'prot' for allocations
  x86/pkeys/selftests: Fix pointer math
  x86/pkeys: Override pkey when moving away from PROT_EXEC
  x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
  x86/pkeys/selftests: Add PROT_EXEC test
  x86/pkeys/selftests: Factor out "instruction page"
  x86/pkeys/selftests: Allow faults on unknown keys
  x86/pkeys/selftests: Avoid printf-in-signal deadlocks
  x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
  x86/pkeys/selftests: Stop using assert()
  x86/pkeys/selftests: Give better unexpected fault error messages
  x86/selftests: Add mov_to_ss test
  x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
  x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
  ...
2018-05-20 11:28:32 -07:00
Linus Torvalds
74cce52f9f Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fix from Thomas Gleixner:
 "Fix a regression in the new AMD SMCA code which issues an SMP function
  call from the early interrupt disabled region of CPU hotplug. To avoid
  that, use cached block addresses which can be used directly"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Cache SMCA MISC block addresses
2018-05-20 11:20:40 -07:00
Linus Torvalds
583dbad340 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - Unbreak the BPF compilation which got broken by the unconditional
   requirement of asm-goto, which is not supported by clang.

 - Prevent probing on exception masking instructions in uprobes and
   kprobes to avoid the issues of the delayed exceptions instead of
   having an ugly workaround.

 - Prevent a double free_page() in the error path of do_kexec_load()

 - A set of objtool updates addressing various issues mostly related to
   switch tables and the noreturn detection for recursive sibling calls

 - Header sync for tools.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Detect RIP-relative switch table references, part 2
  objtool: Detect RIP-relative switch table references
  objtool: Support GCC 8 switch tables
  objtool: Support GCC 8's cold subfunctions
  objtool: Fix "noreturn" detection for recursive sibling calls
  objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
  x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
  uprobes/x86: Prohibit probing on MOV SS instruction
  kprobes/x86: Prohibit probing on exception masking instructions
  x86/kexec: Avoid double free_page() upon do_kexec_load() failure
2018-05-20 10:01:38 -07:00
Borislav Petkov
78ce241099 x86/MCE/AMD: Cache SMCA MISC block addresses
... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.

Fixes: 27bd595027 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
2018-05-19 15:19:30 +02:00
Dmitry Safonov
acf4602001 x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
for 32bit mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() was dropping TS_COMPAT flag, but
set_personality_64bit() has kept compat syscall flag making
in_compat_syscall() return true during the first exec() syscall.

Which in result has user-visible effects, mentioned by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE:         0x7f63b8309000
AT_BASE:         0x7faec143c000
AT_BASE:         0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE:         0xf7eff000
AT_BASE:         0xf7cee000
AT_BASE:         0x7f8b9774e000

Fixes: 1b028f784e ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481df ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
2018-05-19 12:31:05 +02:00
Linus Torvalds
3acf4e3952 k10temp fixes
Fix race condition when accessing System Management Network registers
 Fix reading critical temperatures on F15h M60h and M70h
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJa+0BbAAoJEMsfJm/On5mBo3EQAJxtFC7pA7JzY0yZsXvaA+50
 ObN9EtG5mhVMZQfcOThcN6ZGzV12rpJltsCp6Poy0g8n7rgLiB5y2IJvinM7ETil
 6zbw5onfv2So/WyvXWBylEI0J4WjtGc8n17S1+nlT+Ppy4ID6PQPv1pGfr7YVI0o
 0T2sLSfDQD7vgtvpHi7A+4q2hbsI0HjS3LKI8CAy4UboZ8yltxJBsgV7gJ3fbv4Z
 tX9DOH05bGsCR/9vwoA3rRVbUKbvPnwTY36DCAyT53QuYRIBwREXi/xkxCkKdSsn
 X3o78TPkvE/qTyK1ZjuJ5yxDdLmesibiKOtyPBeaPaTQ+jcayfSr+rQrAvsZ2Ogp
 8pjZ5he3LR4/8wdmBhZBBcDXDdBMar8SRMSpPrBRyWONpn5fSLuszUkintKTND4c
 dH1zlXmYjRFsQBW2O+/b6k1Hq/p654mwD4hBbxHN7FVBnrWDWzUgd2xSpQLxSqkz
 sfyd6wsvrVeUCGHAsgVY9sXYlbrTjI1WWkOX4EAJC2YKvWDYTB/kQXg0I5vICN4m
 9tLyoC8tvKothIe8J1U5VUeGgpP5QES+yf7YNF9gc02D8l5xlsWuUAVrBI1XBOdS
 0MXFFFxM68Y6ufhIiahSXPM7vocSFi6CuuYbuz6Z09a2L9cahG4C5+Qe9E9h6PjM
 N4uOoFJGKckctQYJB0rO
 =SujR
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Two k10temp fixes:

   - fix race condition when accessing System Management Network
     registers

   - fix reading critical temperatures on F15h M60h and M70h

  Also add PCI ID's for the AMD Raven Ridge root bridge"

* tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (k10temp) Use API function to access System Management Network
  x86/amd_nb: Add support for Raven Ridge CPUs
  hwmon: (k10temp) Fix reading critical temperature register
2018-05-17 15:58:12 -07:00
Thomas Gleixner
fed71f7d98 x86/apic/x2apic: Initialize cluster ID properly
Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit wich reworked the cluster
management.

The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits for ever that the target CPU handles the IPI.

Add the missing initialization when a upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.

Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
2018-05-17 21:00:12 +02:00
Michael S. Tsirkin
633711e828 kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
KVM_HINTS_DEDICATED seems to be somewhat confusing:

Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.

And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.

Also, the flag most be set on all vCPUs - current guests assume this.
Note so in the documentation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-17 19:12:13 +02:00
Alexander Potapenko
4a09f0210c x86/boot/64/clang: Use fixup_pointer() to access '__supported_pte_mask'
Clang builds with defconfig started crashing after the following
commit:

  fb43d6cb91 ("x86/mm: Do not auto-massage page protections")

This was caused by introducing a new global access in __startup_64().

Code in __startup_64() can be relocated during execution, but the compiler
doesn't have to generate PC-relative relocations when accessing globals
from that function. Clang actually does not generate them, which leads
to boot-time crashes. To work around this problem, every global pointer
must be adjusted using fixup_pointer().

Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dvyukov@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Cc: md@google.com
Cc: mka@chromium.org
Fixes: fb43d6cb91 ("x86/mm: Do not auto-massage page protections")
Link: http://lkml.kernel.org/r/20180509091822.191810-1-glider@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:30 +02:00
Masami Hiramatsu
13ebe18c94 uprobes/x86: Prohibit probing on MOV SS instruction
Since MOV SS and POP SS instructions will delay the exceptions until the
next instruction is executed, single-stepping on it by uprobes must be
prohibited.

uprobe already rejects probing on POP SS (0x1f), but allows probing on MOV
SS (0x8e and reg == 2).  This checks the target instruction and if it is
MOV SS or POP SS, returns -ENOTSUPP to reject probing.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/152587072544.17316.5950935243917346341.stgit@devbox
2018-05-13 19:52:56 +02:00
Masami Hiramatsu
ee6a7354a3 kprobes/x86: Prohibit probing on exception masking instructions
Since MOV SS and POP SS instructions will delay the exceptions until the
next instruction is executed, single-stepping on it by kprobes must be
prohibited.

However, kprobes usually executes those instructions directly on trampoline
buffer (a.k.a. kprobe-booster), except for the kprobes which has
post_handler. Thus if kprobe user probes MOV SS with post_handler, it will
do single-stepping on the MOV SS.

This means it is safe that if it is used via ftrace or perf/bpf since those
don't use the post_handler.

Anyway, since the stack switching is a rare case, it is safer just
rejecting kprobes on such instructions.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/152587069574.17316.3311695234863248641.stgit@devbox
2018-05-13 19:52:55 +02:00
Tetsuo Handa
a466ef76b8 x86/kexec: Avoid double free_page() upon do_kexec_load() failure
>From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 9 May 2018 12:12:39 +0900
Subject: [PATCH v3] x86/kexec: avoid double free_page() upon do_kexec_load() failure.

syzbot is reporting crashes after memory allocation failure inside
do_kexec_load() [1]. This is because free_transition_pgtable() is called
by both init_transition_pgtable() and machine_kexec_cleanup() when memory
allocation failed inside init_transition_pgtable().

Regarding 32bit code, machine_kexec_free_page_tables() is called by both
machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
allocation failed inside machine_kexec_alloc_page_tables().

Fix this by leaving the error handling to machine_kexec_cleanup()
(and optionally setting NULL after free_page()).

[1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40

Fixes: f5deb79679 ("x86: kexec: Use one page table in x86_64 machine_kexec")
Fixes: 92be3d6bdf ("kexec/i386: allocate page table pages dynamically")
Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: prudo@linux.vnet.ibm.com
Cc: Huang Ying <ying.huang@intel.com>
Cc: syzkaller-bugs@googlegroups.com
Cc: takahiro.akashi@linaro.org
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: akpm@linux-foundation.org
Cc: dyoung@redhat.com
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
2018-05-13 19:50:06 +02:00
Guenter Roeck
f9bc6b2dd9 x86/amd_nb: Add support for Raven Ridge CPUs
Add Raven Ridge root bridge and data fabric PCI IDs.
This is required for amd_pci_dev_to_node_id() and amd_smn_read().

Cc: stable@vger.kernel.org # v4.16+
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-05-13 09:00:27 -07:00
Linus Torvalds
9c48eb6aab Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when
  the evaluation of physical and virtual bits which uses the same CPUID
  leaf was moved out of get_cpu_cap()"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Restore CPUID_8000_0008_EBX reload
2018-05-06 05:37:24 -10:00
Thomas Gleixner
c65732e4f7 x86/cpu: Restore CPUID_8000_0008_EBX reload
The recent commt which addresses the x86_phys_bits corruption with
encrypted memory on CPUID reload after a microcode update lost the reload
of CPUID_8000_0008_EBX as well.

As a consequence IBRS and IBRS_FW are not longer detected

Restore the behaviour by bringing the reload of CPUID_8000_0008_EBX
back. This restore has a twist due to the convoluted way the cpuid analysis
works:

CPUID_8000_0008_EBX is used by AMD to enumerate IBRB, IBRS, STIBP. On Intel
EBX is not used. But the speculation control code sets the AMD bits when
running on Intel depending on the Intel specific speculation control
bits. This was done to use the same bits for alternatives.

The change which moved the 8000_0008 evaluation out of get_cpu_cap() broke
this nasty scheme due to ordering. So that on Intel the store to
CPUID_8000_0008_EBX clears the IBRB, IBRS, STIBP bits which had been set
before by software.

So the actual CPUID_8000_0008_EBX needs to go back to the place where it
was and the phys/virt address space calculation cannot touch it.

In hindsight this should have used completely synthetic bits for IBRB,
IBRS, STIBP instead of reusing the AMD bits, but that's for 4.18.

/me needs to find time to cleanup that steaming pile of ...

Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Reported-by: Jörg Otte <jrg.otte@gmail.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kirill.shutemov@linux.intel.com
Cc: Borislav Petkov <bp@alien8.de
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805021043510.1668@nanos.tec.linutronix.de
2018-05-02 16:44:38 +02:00
Peter Zijlstra
e3b4f79025 x86/tsc: Fix mark_tsc_unstable()
mark_tsc_unstable() also needs to affect tsc_early, Now that
clocksource_mark_unstable() can be used on a clocksource irrespective of
its registration state, use it on both tsc_early and tsc.

This does however require cs->list to be initialized empty, otherwise it
cannot tell the registation state before registation.

Fixes: aa83c45762 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: rui.zhang@intel.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180430100344.533326547@infradead.org
2018-05-02 16:10:40 +02:00
Peter Zijlstra
e9088adda1 x86/tsc: Always unregister clocksource_tsc_early
Don't leave the tsc-early clocksource registered if it errors out
early.

This was reported by Diego, who on his Core2 era machine got TSC
invalidated while it was running with tsc-early (due to C-states).
This results in keeping tsc-early with very bad effects.

Reported-and-Tested-by: Diego Viola <diego.viola@gmail.com>
Fixes: aa83c45762 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: diego.viola@gmail.com
Cc: rui.zhang@intel.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180430100344.350507853@infradead.org
2018-05-02 16:10:40 +02:00
Petr Tesarik
3db3eb2852 x86/setup: Do not reserve a crash kernel region if booted on Xen PV
Xen PV domains cannot shut down and start a crash kernel. Instead,
the crashing kernel makes a SCHEDOP_shutdown hypercall with the
reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in
arch/x86/xen/enlighten_pv.c.

A crash kernel reservation is merely a waste of RAM in this case. It
may also confuse users of kexec_load(2) and/or kexec_file_load(2).
When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH,
respectively, these syscalls return success, which is technically
correct, but the crash kexec image will never be actually used.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jean Delvare <jdelvare@suse.de>
Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
2018-04-27 17:06:28 +02:00
jacek.tomaka@poczta.fm
b837913fc2 x86/cpu/intel: Add missing TLB cpuid values
Make kernel print the correct number of TLB entries on Intel Xeon Phi 7210
(and others)

Before:
[ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
After:
[ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16

The entries do exist in the official Intel SMD but the type column there is
incorrect (states "Cache" where it should read "TLB"), but the entries for
the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
2018-04-26 21:42:44 +02:00
Yazen Ghannam
da6fa7ef67 x86/smpboot: Don't use mwait_play_dead() on AMD systems
Recent AMD systems support using MWAIT for C1 state. However, MWAIT will
not allow deeper cstates than C1 on current systems.

play_dead() expects to use the deepest state available.  The deepest state
available on AMD systems is reached through SystemIO or HALT. If MWAIT is
available, it is preferred over the other methods, so the CPU never reaches
the deepest possible state.

Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE
to enter the deepest state advertised by firmware. If CPUIDLE is not
available then fallback to HALT.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
2018-04-26 16:06:19 +02:00
Borislav Petkov
09e182d17e x86/microcode: Do not exit early from __reload_late()
Vitezslav reported a case where the

  "Timeout during microcode update!"

panic would hit. After a deeper look, it turned out that his .config had
CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a
no-op.

When that happened, the discovered microcode patch wasn't saved into the
cache and the late loading path wouldn't find any.

This, then, lead to early exit from __reload_late() and thus CPUs waiting
until the timeout is reached, leading to the panic.

In hindsight, that function should have been written so it does not return
before the post-synchronization. Oh well, I know better now...

Fixes: bb8c13d61a ("x86/microcode: Fix CPU synchronization routine")
Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-2-bp@alien8.de
2018-04-24 09:48:22 +02:00
Borislav Petkov
84749d8375 x86/microcode/intel: Save microcode patch unconditionally
save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the
generic_load_microcode() path saves the microcode patches it has found into
the cache of patches which is used for late loading too. Regardless of
whether CPU hotplug is used or not.

Make the saving unconditional so that late loading can find the proper
patch.

Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-1-bp@alien8.de
2018-04-24 09:48:22 +02:00
Thomas Gleixner
7010adcdd2 x86/jailhouse: Fix incorrect SPDX identifier
GPL2.0 is not a valid SPDX identiier. Replace it with GPL-2.0.

Fixes: 4a362601ba ("x86/jailhouse: Add infrastructure for running in non-root cell")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Link: https://lkml.kernel.org/r/20180422220832.815346488@linutronix.de
2018-04-23 10:17:28 +02:00
Linus Torvalds
37a535edd7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes for x86:

   - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which
     causes the possible CPU count to be wrong.

   - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC
     calibration to fail

   - Fix the page table setup for temporary text mappings in the resume
     code which causes resume failures

   - Make the page table dump code handle HIGHPTE correctly instead of
     oopsing

   - Support for topologies where NUMA nodes share an LLC to prevent a
     invalid topology warning and further malfunction on such systems.

   - Remove the now unused pci-nommu code

   - Remove stale function declarations"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/power/64: Fix page-table setup for temporary text mapping
  x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
  x86,sched: Allow topologies where NUMA nodes share an LLC
  x86/processor: Remove two unused function declarations
  x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
  x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
  x86: Remove pci-nommu.c
2018-04-22 11:40:52 -07:00
Dave Young
a841aa83df kexec_file: do not add extra alignment to efi memmap
Chun-Yi reported a kernel warning message below:

  WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c()
  early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120

The problem is x86 kexec_file_load adds extra alignment to the efi
memmap: in bzImage64_load():

        efi_map_sz = efi_get_runtime_map_size();
        efi_map_sz = ALIGN(efi_map_sz, 16);

And __efi_memmap_init maps with the size including the alignment bytes
but efi_memmap_unmap use nr_maps * desc_size which does not include the
extra bytes.

The alignment in kexec code is only needed for the kexec buffer internal
use Actually kexec should pass exact size of the efi memmap to 2nd
kernel.

Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young <dyoung@redhat.com>
Reported-by: joeyli <jlee@suse.com>
Tested-by: Randy Wright <rwright@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20 17:18:36 -07:00
Alison Schofield
1340ccfa9a x86,sched: Allow topologies where NUMA nodes share an LLC
Intel's Skylake Server CPUs have a different LLC topology than previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
into two "slices", each containing half the cores, half the LLC, and one
memory controller and each slice is enumerated to Linux as a NUMA
node. This is similar to how the cores and LLC were arranged for the
Cluster-On-Die (CoD) feature.

CoD allowed the same cache line to be present in each half of the LLC.
But, with SNC, each line is only ever present in *one* slice. This means
that the portion of the LLC *available* to a CPU depends on the data being
accessed:

    Remote socket: entire package LLC is shared
    Local socket->local slice: data goes into local slice LLC
    Local socket->remote slice: data goes into remote-slice LLC. Slightly
                    		higher latency than local slice LLC.

The biggest implication from this is that a process accessing all
NUMA-local memory only sees half the LLC capacity.

The CPU describes its cache hierarchy with the CPUID instruction. One of
the CPUID leaves enumerates the "logical processors sharing this
cache". This information is used for scheduling decisions so that tasks
move more freely between CPUs sharing the cache.

But, the CPUID for the SNC configuration discussed above enumerates the LLC
as being shared by the entire package. This is not 100% precise because the
entire cache is not usable by all accesses. But, it *is* the way the
hardware enumerates itself, and this is not likely to change.

The userspace visible impact of all the above is that the sysfs info
reports the entire LLC as being available to the entire package. As noted
above, this is not true for local socket accesses. This patch does not
correct the sysfs info. It is the same, pre and post patch.

The current code emits the following warning:

 sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.

The warning is coming from the topology_sane() check in smpboot.c because
the topology is not matching the expectations of the model for obvious
reasons.

To fix this, add a vendor and model specific check to never call
topology_sane() for these systems. Also, just like "Cluster-on-Die" disable
the "coregroup" sched_domain_topology_level and use NUMA information from
the SRAT alone.

This is OK at least on the hardware we are immediately concerned about
because the LLC sharing happens at both the slice and at the package level,
which are also NUMA boundaries.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: brice.goglin@gmail.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
2018-04-17 15:39:55 +02:00
Dou Liyang
10daf10ab1 x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
RongQing reported that there are some X2APIC id 0xffffffff in his machine's
ACPI MADT table, which makes the number of possible CPU inaccurate.

The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
Architecture x2APIC Specification", Chapter 2.4.1.

Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.

Reported-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180412014052.25186-1-douly.fnst@cn.fujitsu.com
2018-04-17 11:56:31 +02:00
Xiaoming Gao
d3878e164d x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:

    hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6

and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.

    tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref

This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.

On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.

Use div64_u64() to avoid the problem.

[ tglx: Fixes whitespace mangled patch and rewrote changelog ]

Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
2018-04-17 11:50:42 +02:00
Christoph Hellwig
ef97837db9 x86: Remove pci-nommu.c
The commit that switched x86 to dma_direct_ops stopped using and building
this file, but accidentally left it in the tree.  Remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.infradead.org
Link: https://lkml.kernel.org/r/20180416124442.13831-1-hch@lst.de
2018-04-17 11:48:06 +02:00
Joerg Roedel
e6f39e87b6 x86/ldt: Fix support_pte_mask filtering in map_ldt_struct()
The |= operator will let us end up with an invalid PTE. Use
the correct &= instead.

[ The bug was also independently reported by Shuah Khan ]

Fixes: fb43d6cb91 ('x86/mm: Do not auto-massage page protections')
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-16 11:20:34 -07:00
Linus Torvalds
9fb71c2f23 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Address a swiotlb regression which was caused by the recent DMA
     rework and made driver fail because dma_direct_supported() returned
     false

   - Fix a signedness bug in the APIC ID validation which caused invalid
     APIC IDs to be detected as valid thereby bloating the CPU possible
     space.

   - Fix inconsisten config dependcy/select magic for the MFD_CS5535
     driver.

   - Fix a corruption of the physical address space bits when encryption
     has reduced the address space and late cpuinfo updates overwrite
     the reduced bit information with the original value.

   - Dominiks syscall rework which consolidates the architecture
     specific syscall functions so all syscalls can be wrapped with the
     same macros. This allows to switch x86/64 to struct pt_regs based
     syscalls. Extend the clearing of user space controlled registers in
     the entry patch to the lower registers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix signedness bug in APIC ID validity checks
  x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
  x86/olpc: Fix inconsistent MFD_CS5535 configuration
  swiotlb: Use dma_direct_supported() for swiotlb_ops
  syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
  syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
  syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
  syscalls/core, syscalls/x86: Clean up syscall stub naming convention
  syscalls/x86: Extend register clearing on syscall entry to lower registers
  syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
  syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
  syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
  syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
  syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
  x86/syscalls: Don't pointlessly reload the system call number
  x86/mm: Fix documentation of module mapping range with 4-level paging
  x86/cpuid: Switch to 'static const' specifier
2018-04-15 16:12:35 -07:00
Linus Torvalds
6b0a02e86c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "Another series of PTI related changes:

   - Remove the manual stack switch for user entries from the idtentry
     code. This debloats entry by 5k+ bytes of text.

   - Use the proper types for the asm/bootparam.h defines to prevent
     user space compile errors.

   - Use PAGE_GLOBAL for !PCID systems to gain back performance

   - Prevent setting of huge PUD/PMD entries when the entries are not
     leaf entries otherwise the entries to which the PUD/PMD points to
     and are populated get lost"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
  x86/pti: Leave kernel text global for !PCID
  x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
  x86/pti: Enable global pages for shared areas
  x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
  x86/mm: Comment _PAGE_GLOBAL mystery
  x86/mm: Remove extra filtering in pageattr code
  x86/mm: Do not auto-massage page protections
  x86/espfix: Document use of _PAGE_GLOBAL
  x86/mm: Introduce "default" kernel PTE mask
  x86/mm: Undo double _PAGE_PSE clearing
  x86/mm: Factor out pageattr _PAGE_GLOBAL setting
  x86/entry/64: Drop idtentry's manual stack switch for user entries
  x86/uapi: Fix asm/bootparam.h userspace compilation errors
2018-04-15 13:35:29 -07:00
Philipp Rudo
3be3f61d25 kernel/kexec_file.c: allow archs to set purgatory load address
For s390 new kernels are loaded to fixed addresses in memory before they
are booted.  With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address.  In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.

Luckily there is a simple workaround for this problem.  By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
allows the architecture to set kbuf->mem by hand.  While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.

Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it.  With this change
architectures have access to the buffer and can edit it as they need.

A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field.  As now the information
stored there can directly be accessed from kbuf->mem.

Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
Philipp Rudo
8da0b72495 kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
The current code uses the sh_offset field in purgatory_info->sechdrs to
store a pointer to the current load address of the section.  Depending
whether the section will be loaded or not this is either a pointer into
purgatory_info->purgatory_buf or kexec_purgatory.  This is not only a
violation of the ELF standard but also makes the code very hard to
understand as you cannot tell if the memory you are using is read-only
or not.

Remove this misuse and store the offset of the section in
pugaroty_info->purgatory_buf in sh_offset.

Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
Philipp Rudo
8aec395b84 kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
When the relocations are applied to the purgatory only the section the
relocations are applied to is writable.  The other sections, i.e.  the
symtab and .rel/.rela, are in read-only kexec_purgatory.  Highlight this
by marking the corresponding variables as 'const'.

While at it also change the signatures of arch_kexec_apply_relocations* to
take section pointers instead of just the index of the relocation section.
This removes the second lookup and sanity check of the sections in arch
code.

Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
AKASHI Takahiro
babac4a84a kexec_file, x86: move re-factored code to generic side
In the previous patches, commonly-used routines, exclude_mem_range() and
prepare_elf64_headers(), were carved out.  Now place them in kexec
common code.  A prefix "crash_" is given to each of their names to avoid
possible name collisions.

Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro
eb7dae947e x86: kexec_file: clean up prepare_elf64_headers()
Removing bufp variable in prepare_elf64_headers() makes the code simpler
and more understandable.

Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro
8d5f894a31 x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number
array is not a good idea in general.

In this patch, size of crash_mem buffer is calculated as before and the
buffer is now dynamically allocated.  This change also allows removing
crash_elf_data structure.

Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro
c72c7e6709 x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
The code guarded by CONFIG_X86_64 is necessary on some architectures
which have a dedicated kernel mapping outside of linear memory mapping.
(arm64 is among those.)

In this patch, an additional argument, kernel_map, is added to enable/
disable the code removing #ifdef.

Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro
cbe6601617 x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
While prepare_elf64_headers() in x86 looks pretty generic for other
architectures' use, it contains some code which tries to list crash
memory regions by walking through system resources, which is not always
architecture agnostic.  To make this function more generic, the related
code should be purged.

In this patch, prepare_elf64_headers() simply scans crash_mem buffer
passed and add all the listed regions to elf header as a PT_LOAD
segment.  So walk_system_ram_res(prepare_elf64_headers_callback) have
been moved forward before prepare_elf64_headers() where the callback,
prepare_elf64_headers_callback(), is now responsible for filling up
crash_mem buffer.

Meanwhile exclude_elf_header_ranges() used to be called every time in
this callback it is rather redundant and now called only once in
prepare_elf_headers() as well.

Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro
9ec4ecef0a kexec_file,x86,powerpc: factor out kexec_file_ops functions
As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array.  So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Linus Torvalds
681857ef0d Merge branch 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:

 - fix panic when halting system via "shutdown -h now"

 - drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF
   implementation

 - add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup
   for Eric Biedermans siginfo patches

 - move some functions to .init and some to .text.hot linker sections

* 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Prevent panic at system halt
  parisc: Switch to generic COMPAT_BINFMT_ELF
  parisc: Move cache flush functions into .text.hot section
  parisc/signal: Add FPE_CONDTRAP for conditional trap handling
2018-04-12 17:07:04 -07:00
Ingo Molnar
ef389b7346 Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:42:34 +02:00
Dave Hansen
430d4005b8 x86/mm: Comment _PAGE_GLOBAL mystery
I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
for kernel text came from.  I audited all the places I could find, but
I missed one: head_64.S.

The page tables that we create in here live for a long time, and they
also have _PAGE_GLOBAL set, despite whether the processor supports it
or not.  It's harmless, and we got *lucky* that the pageattr code
accidentally clears it when we wipe it out of __supported_pte_mask and
then later try to mark kernel text read-only.

Comment some of these properties to make it easier to find and
understand in the future.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:05:58 +02:00
Dave Hansen
fb43d6cb91 x86/mm: Do not auto-massage page protections
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE.  This is
done implicitly within functions like pfn_pte() by massage_pgprot().

However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().

This moves the bit filtering out of pfn_pte() and friends.  For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.

Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot().  This way, we still *look* for
unsupported bits and properly warn about them if we find them.  This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.

- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by:              Mike Galbraith <efault@gmx.de>

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:04:22 +02:00
Pavel Tatashin
6f84f8d158 xen, mm: allow deferred page initialization for xen pv domains
Juergen Gross noticed that commit f7f99100d8 ("mm: stop zeroing memory
during allocation in vmemmap") broke XEN PV domains when deferred struct
page initialization is enabled.

This is because the xen's PagePinned() flag is getting erased from
struct pages when they are initialized later in boot.

Juergen fixed this problem by disabling deferred pages on xen pv
domains.  It is desirable, however, to have this feature available as it
reduces boot time.  This fix re-enables the feature for pv-dmains, and
fixes the problem the following way:

The fix is to delay setting PagePinned flag until struct pages for all
allocated memory are initialized, i.e.  until after free_all_bootmem().

A new x86_init.hyper op init_after_bootmem() is called to let xen know
that boot allocator is done, and hence struct pages for all the
allocated memory are now initialized.  If deferred page initialization
is enabled, the rest of struct pages are going to be initialized later
in boot once page_alloc_init_late() is called.

xen_after_bootmem() walks page table's pages and marks them pinned.

Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Helge Deller
75abf64287 parisc/signal: Add FPE_CONDTRAP for conditional trap handling
Posix and common sense requires that SI_USER not be a signal specific
si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2018-04-11 11:40:35 +02:00
Li RongQing
a774635db5 x86/apic: Fix signedness bug in APIC ID validity checks
The APIC ID as parsed from ACPI MADT is validity checked with the
apic->apic_id_valid() callback, which depends on the selected APIC type.

For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
are detected as valid. This happens because the 'apicid' argument of the
apic_id_valid() callback is type 'int'. So the resulting comparison

   apicid < 0xFF

evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
mode are considered valid and accounted as possible CPUs.

Change the apicid argument type of the apic_id_valid() callback to u32 so
the evaluation is unsigned and returns the correct result.

[ tglx: Massaged changelog ]

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com
2018-04-10 16:46:39 +02:00
Kirill A. Shutemov
d94a155c59 x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
Some features (Intel MKTME, AMD SME) reduce the number of effectively
available physical address bits. cpuinfo_x86::x86_phys_bits is adjusted
accordingly during the early cpu feature detection.

Though if get_cpu_cap() is called later again then this adjustement is
overwritten. That happens in setup_pku(), which is called after
detect_tme().

To address this, extract the address sizes enumeration into a separate
function, which is only called only from early_identify_cpu() and from
generic_identify().

This makes get_cpu_cap() safe to be called later during boot proccess
without overwriting cpuinfo_x86::x86_phys_bits.

[ tglx: Massaged changelog ]

Fixes: cb06d8e3d0 ("x86/tme: Detect if TME and MKTME is activated by BIOS")
Reported-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180410092704.41106-1-kirill.shutemov@linux.intel.com
2018-04-10 16:33:21 +02:00
Linus Torvalds
d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00