Commits
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
8570978ea0 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code")
set up a new page table in the decompressor stub, but without explicit
mappings for boot_params and the kernel command line, relying on the #PF
handler instead.
This is fragile, as boot_params and the command line mappings are
required for the main kernel. If EARLY_PRINTK and RANDOMIZE_BASE are
disabled, a QEMU/OVMF boot never accesses the command line in the
decompressor stub, and so it never gets mapped. The main kernel accesses
it from the identity mapping if AMD_MEM_ENCRYPT is enabled, and will
crash.
Fix this by adding back the explicit mapping of boot_params and the
command line.
Note: the changes also removed the explicit mapping of the main kernel,
with the result that .bss and .brk may not be in the identity mapping,
but those don't get accessed by the main kernel before it switches to
its own page tables.
[ bp: Pass boot_params with a MOV %rsp... instead of PUSH/POP. Use
block formatting for the comment. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20201016200404.1615994-1-nivedita@alum.mit.edu
Commit
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
started using a new set of pagetables even without KASLR.
After that commit, initialize_identity_maps() is called before the
5-level paging variables are setup in choose_random_location(), which
will not work if 5-level paging is actually enabled.
Fix this by moving the initialization of __pgtable_l5_enabled,
pgdir_shift and ptrs_per_p4d into cleanup_trampoline(), which is called
immediately after the finalization of whether the kernel is executing
with 4- or 5-level paging. This will be earlier than anything that might
require those variables, and keeps the 4- vs 5-level paging code all in
one place.
Fixes: ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20201010191110.4060905-1-nivedita@alum.mit.edu
called SEV by also encrypting the guest register state, making the
registers inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared between
the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
in order for that exception mechanism to work, the early x86 init code
needed to be made able to handle exceptions, which, in itself, brings
a bunch of very nice cleanups and improvements to the early boot code
like an early page fault handler, allowing for on-demand building of the
identity mapping. With that, !KASLR configurations do not use the EFI
page table anymore but switch to a kernel-controlled one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly
separate from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and behind
static keys to minimize the performance impact on !SEV-ES setups.
Work by Joerg Roedel and Thomas Lendacky and others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
=toqi
-----END PGP SIGNATURE-----
Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
the .fixup section, by Uros Bizjak.
* Replace __force_order dummy variable with a memory clobber to fix LLVM
requiring a definition for former and to prevent memory accesses from
still being cached/reordered, by Arvind Sankar.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EODIACgkQEsHwGGHe
VUqPBRAAguaiNy8gPGNRSvqRWTzbxh/IAqB+5rjSH48biRnZm4o7Nsw9tL8kSXN/
yWcGJxEtvheaITFh+rN31jINPCuLdQ2/LaJ+fX13zhgaMmX5RrLZ3FPoGa+eu+y5
yAN8GaBM3VZ14Yzou8q5JF5001yRxXM8UsRzg8XVO7TORB6OOxnnrUbxYvUcLer5
O219NnRtClU1ojZc5u2P1vR5McwIMf66qIkH1gn477utxeFOL380p/ukPOTNPYUH
HsCVLJl0RPVQMI0UNiiRw6V76fHi38kIYJfR7Rg6Jy+k/U0z+eDXPg2/aHZj63NP
K7pZ7XgbaBWbHSr8C9+CsCCAmTBYOascVpcu7X+qXJPS93IKpg7e+9rAKKqlY5Wq
oe6IN975TjzZ+Ay0ZBRlxzFOn2ZdSPJIJhCC3MyDlBgx7KNIVgmvKQ1BiKQ/4ZQX
foEr6HWIIKzQQwyI++pC0AvZ63hwM8X3xIF+6YsyXvNrGs+ypEhsAQpa4Q3XXvDi
88afyFAhdgClbvAjbjefkPekzzLv+CYJa2hUCqsuR8Kh55DiAs204oszVHs4HzBk
nqLffuaKXo7Vg6XOMiK/y8pGWsu5Sdp1YMBVoedvENvrVf5awt1SapV31dKC+6g9
iF6ljSMJWYmLOmNSC3wmdivEgMLxWfgejKH6ltWnR6MZUE5KeGE=
=8moV
-----END PGP SIGNATURE-----
Merge tag 'x86_asm_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Borislav Petkov:
"Two asm wrapper fixes:
- Use XORL instead of XORQ to avoid a REX prefix and save some bytes
in the .fixup section, by Uros Bizjak.
- Replace __force_order dummy variable with a memory clobber to fix
LLVM requiring a definition for former and to prevent memory
accesses from still being cached/reordered, by Arvind Sankar"
* tag 'x86_asm_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Replace __force_order with a memory clobber
x86/uaccess: Use XORL %0,%0 in __get_user_asm()
also fixes some corner case bugs.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EjbwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g5ihAAw1B4ZV+yG6BKMIx8oC+LSefSX0lk6nv0
pmPWIdvmlL1FHlyvALymU2UCneA7KYkGjdz6LYtXlvdGTpVj1M7mJAFBp8ugJj2k
oOelkohTsTHOU/Oae4+OX8E8Kt5PsFS21wSoTo+WYI2pgl1iGzVPPCreU+uMB3lP
IaenOvBlNeXUy0S1tDYLFQT3wtS+lC2Yu6mG5AklDxGvAxpO73U9KJnRwpx34aCL
Z24fW5gZkZg5KQa1ULjZLMUoCNEz24QP/V5PMmgbD3QRkxTeNliaGPvbBV5c2fJd
WOaI4QUEah2wTtRgXv3z1SMbhAFnIoo1fAZKs1DEfT+HUO+Aw6YDpH58TdaPCs4/
ES3Tk9lcRUITAeilbO4pGPUEq+9FPHZ+hZaevQTDcjHmE4VPG984UqFzm2Pc4NEb
gZ6R0zKU9I7w6GFyXn3KeaPwf79CVGuJ4Zd1Is86ab2ft7V3m4cyHT7BKBkF+skD
MQEAhroE7WL/+E1mvK/zej9I9PTbHtk50GdmKTzcnrNdFLehGz6hyMM/PnBKE2j5
Gsw+x/DjCXQSpwVTkJCBLf3BqMpbi7C5vzwbIPYcKuE3Ggo0f1n9rUUJwCNybo21
XhSWq8GFX35+5nBLfznXRbaU1uyvWwWvheWwR56LbTHgdPUlgVr/aa/KQ5bliz0a
JCUUcDnUZjw=
=+VGg
-----END PGP SIGNATURE-----
Merge tag 'x86-kaslr-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kaslr updates from Ingo Molnar:
"This cleans up and simplifies the x86 KASLR code, and also fixes some
corner case bugs"
* tag 'x86-kaslr-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/kaslr: Replace strlen() with strnlen()
x86/kaslr: Add a check that the random address is in range
x86/kaslr: Make local variables 64-bit
x86/kaslr: Replace 'unsigned long long' with 'u64'
x86/kaslr: Make minimum/image_size 'unsigned long'
x86/kaslr: Small cleanup of find_random_phys_addr()
x86/kaslr: Drop unnecessary alignment in find_random_virt_addr()
x86/kaslr: Drop redundant check in store_slot_info()
x86/kaslr: Make the type of number of slots/slot areas consistent
x86/kaslr: Drop test for command-line parameters before parsing
x86/kaslr: Simplify process_gb_huge_pages()
x86/kaslr: Short-circuit gb_huge_pages on x86-32
x86/kaslr: Fix off-by-one error in process_gb_huge_pages()
x86/kaslr: Drop some redundant checks from __process_mem_region()
x86/kaslr: Drop redundant variable in __process_mem_region()
x86/kaslr: Eliminate 'start_orig' local variable from __process_mem_region()
x86/kaslr: Drop redundant cur_entry from __process_mem_region()
x86/kaslr: Fix off-by-one error in __process_mem_region()
x86/kaslr: Initialize mem_limit to the real maximum address
x86/kaslr: Fix process_efi_entries comment
...
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Edv4RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hiKBAApdJEOaK7hMc3013DYNctklIxEPJL2mFJ
11YJRIh4pUJTF0TE+EHT/D+rSIuRsyuoSmOQBQ61/wVSnyG067GjjVJRqh/eYaJ1
fDhJi2FuHOjXl+CiN0KxzBjjp+V4NhF7jHT59tpQSvfZeg7FjteoxfztxaCp5ek3
S3wHB3CC4c4jE3lfjHem1E9/PwT4kwPYx1c3gAUdEqJdjkihjX9fWusfjLeqW6/d
Y5VkApi6bL9XiZUZj5l0dEIweLJJ86+PkKJqpo3spxxEak1LSn1MEix+lcJ8e1Kg
sb/bEEivDcmFlFWOJnn0QLquCR0Cx5bz1pwsL0tuf0yAd4+sXX5IMuGUysZlEdKM
BHL9h5HbevGF4BScwZwZH7lyEg7q67s5KnRu4hxy0Swfcj7y0oT/9lXqpbpZ2DqO
Hd+bRRQKIbqnTMp0hcit9LfpLp93vj0dBlaV5ocAJJlu62u9VnwGG5HQuZ5giLUr
kA1SLw63Y1wopFRxgFyER8les7eLsu0zxHeK44rRVlVnfI99OMTOgVNicmDFy3Fm
AfcnfJG0BqBEJGQz5es34uQQKKBwFPtC9NztopI62KiwOspYYZyrO1BNxdOc6DlS
mIHrmO89HMXuid5eolvLaFqUWirHoWO8TlycgZxUWVHc2txVPjAEU/axouU/dSSU
w/6GpzAa+7g=
=fXAw
-----END PGP SIGNATURE-----
Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"
* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
The CRn accessor functions use __force_order as a dummy operand to
prevent the compiler from reordering CRn reads/writes with respect to
each other.
The fact that the asm is volatile should be enough to prevent this:
volatile asm statements should be executed in program order. However GCC
4.9.x and 5.x have a bug that might result in reordering. This was fixed
in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x,
may reorder volatile asm statements with respect to each other.
There are some issues with __force_order as implemented:
- It is used only as an input operand for the write functions, and hence
doesn't do anything additional to prevent reordering writes.
- It allows memory accesses to be cached/reordered across write
functions, but CRn writes affect the semantics of memory accesses, so
this could be dangerous.
- __force_order is not actually defined in the kernel proper, but the
LLVM toolchain can in some cases require a definition: LLVM (as well
as GCC 4.9) requires it for PIE code, which is why the compressed
kernel has a definition, but also the clang integrated assembler may
consider the address of __force_order to be significant, resulting in
a reference that requires a definition.
Fix this by:
- Using a memory clobber for the write functions to additionally prevent
caching/reordering memory accesses across CRn writes.
- Using a dummy input operand with an arbitrary constant address for the
read functions, instead of a global variable. This will prevent reads
from being reordered across writes, while allowing memory loads to be
cached/reordered across CRn reads, which should be safe.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/
Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
The x86-64 psABI [0] specifies special relocation types
(R_X86_64_[REX_]GOTPCRELX) for indirection through the Global Offset
Table, semantically equivalent to R_X86_64_GOTPCREL, which the linker
can take advantage of for optimization (relaxation) at link time. This
is supported by LLD and binutils versions 2.26 onwards.
The compressed kernel is position-independent code, however, when using
LLD or binutils versions before 2.27, it must be linked without the -pie
option. In this case, the linker may optimize certain instructions into
a non-position-independent form, by converting foo@GOTPCREL(%rip) to $foo.
This potential issue has been present with LLD and binutils-2.26 for a
long time, but it has never manifested itself before now:
- LLD and binutils-2.26 only relax
movq foo@GOTPCREL(%rip), %reg
to
leaq foo(%rip), %reg
which is still position-independent, rather than
mov $foo, %reg
which is permitted by the psABI when -pie is not enabled.
- GCC happens to only generate GOTPCREL relocations on mov instructions.
- CLang does generate GOTPCREL relocations on non-mov instructions, but
when building the compressed kernel, it uses its integrated assembler
(due to the redefinition of KBUILD_CFLAGS dropping -no-integrated-as),
which has so far defaulted to not generating the GOTPCRELX
relocations.
Nick Desaulniers reports [1,2]:
"A recent change [3] to a default value of configuration variable
(ENABLE_X86_RELAX_RELOCATIONS OFF -> ON) in LLVM now causes Clang's
integrated assembler to emit R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX
relocations. LLD will relax instructions with these relocations based
on whether the image is being linked as position independent or not.
When not, then LLD will relax these instructions to use absolute
addressing mode (R_RELAX_GOT_PC_NOPIC). This causes kernels built with
Clang and linked with LLD to fail to boot."
Patch series [4] is a solution to allow the compressed kernel to be
linked with -pie unconditionally, but even if merged is unlikely to be
backported. As a simple solution that can be applied to stable as well,
prevent the assembler from generating the relaxed relocation types using
the -mrelax-relocations=no option. For ease of backporting, do this
unconditionally.
[0] https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/linker-optimization.tex#L65
[1] https://lore.kernel.org/lkml/20200807194100.3570838-1-ndesaulniers@google.com/
[2] https://github.com/ClangBuiltLinux/linux/issues/1121
[3] https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0
[4] https://lore.kernel.org/lkml/20200731202738.2577854-1-nivedita@alum.mit.edu/
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200812004308.1448603-1-nivedita@alum.mit.edu
Make sure the machine supports RDRAND, otherwise there is no trusted
source of randomness in the system.
To also check this in the pre-decompression stage, make has_cpuflag()
not depend on CONFIG_RANDOMIZE_BASE anymore.
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-73-joro@8bytes.org
Calling down to EFI runtime services can result in the firmware
performing VMGEXIT calls. The firmware is likely to use the GHCB of the
OS (e.g., for setting EFI variables), so each GHCB in the system needs
to be identity-mapped in the EFI page tables, as unencrypted, to avoid
page faults.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Moved GHCB mapping loop to sev-es.c ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-72-joro@8bytes.org
Implement a handler for #VC exceptions caused by RDTSC and RDTSCP
instructions. Also make it available in the pre-decompression stage
because the KASLR code uses RDTSC/RDTSCP to gather entropy and some
hypervisors intercept these instructions.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Adapt to #VC handling infrastructure
- Make it available early ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-55-joro@8bytes.org
Force a page-fault on any further accesses to the GHCB page when they
shouldn't happen anymore. This will catch any bugs where a #VC exception
is raised even though none is expected anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-25-joro@8bytes.org
Install an exception handler for #VC exception that uses a GHCB. Also
add the infrastructure for handling different exit-codes by decoding
the instruction that caused the exception and error handling.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-24-joro@8bytes.org
The functions are needed to map the GHCB for SEV-ES guests. The GHCB
is used for communication with the hypervisor, so its content must not
be encrypted. After the GHCB is not needed anymore it must be mapped
encrypted again so that the running kernel image can safely re-use the
memory.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-23-joro@8bytes.org
Call set_sev_encryption_mask() while still on the stage 1 #VC-handler
because the stage 2 handler needs the kernel's own page tables to be
set up, to which calling set_sev_encryption_mask() is a prerequisite.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-21-joro@8bytes.org
Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because the kernel might still be running on the EFI page table.
The stage 1 handler is limited to the MSR-based protocol to talk to the
hypervisor and can only support CPUID exit-codes, but that is enough to
get to stage 2.
[ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-20-joro@8bytes.org
Changing the function to take start and end as parameters instead of
start and size simplifies the callers which don't need to calculate the
size if they already have start and end.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-19-joro@8bytes.org
With the page-fault handler in place, he identity mapping can be built
on-demand. So remove the code which manually creates the mappings and
unexport/remove the functions used for it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-18-joro@8bytes.org
When booted through startup_64(), the kernel keeps running on the EFI
page table until the KASLR code sets up its own page table. Without
KASLR, the pre-decompression boot code never switches off the EFI page
table. Change that by unconditionally switching to a kernel-controlled
page table after relocation.
This makes sure the kernel can make changes to the mapping when
necessary, for example map pages unencrypted in SEV and SEV-ES guests.
Also, remove the debug_putstr() calls in initialize_identity_maps()
because the function now runs before console_init() is called.
[ bp: Massage commit message. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-17-joro@8bytes.org
Install a page-fault handler to add an identity mapping to addresses
not yet mapped. Also do some checking whether the error code is sane.
This makes non SEV-ES machines use the exception handling
infrastructure in the pre-decompressions boot code too, making it less
likely to break in the future.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-16-joro@8bytes.org
The file contains only code related to identity-mapped page tables.
Rename the file and compile it always in.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-15-joro@8bytes.org
Add code needed to setup an IDT in the early pre-decompression
boot-code. The IDT is loaded first in startup_64, which is after
EfiExitBootServices() has been called, and later reloaded when the
kernel image has been relocated to the end of the decompression area.
This allows to setup different IDT handlers before and after the
relocation.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-14-joro@8bytes.org
The x86-64 ABI defines a red-zone on the stack:
The 128-byte area beyond the location pointed to by %rsp is considered
to be reserved and shall not be modified by signal or interrupt
handlers. Therefore, functions may use this area for temporary data
that is not needed across function calls. In particular, leaf
functions may use this area for their entire stack frame, rather than
adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.
This is not compatible with exception handling, because the IRET frame
written by the hardware at the stack pointer and the functions to handle
the exception will overwrite the temporary variables of the interrupted
function, causing undefined behavior. So disable red-zones for the
pre-decompression boot code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-13-joro@8bytes.org
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly handled in the linker
script.
Now that all sections are explicitly handled, enable orphan section
warnings.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20200902025347.2504702-6-keescook@chromium.org
In preparation for warning on orphan sections, stop the linker from
generating the .eh_frame* sections, discard unwanted non-zero-sized
generated sections, and enforce other expected-to-be-zero-sized sections
(since discarding them might hide problems with them suddenly gaining
unexpected entries).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200821194310.3089815-28-keescook@chromium.org
The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
Since commits
c041b5ad86 ("x86, boot: Create a separate string.h file to provide standard string functions")
fb4cac573e ("x86, boot: Move memcmp() into string.h and string.c")
the decompressor stub has been using the compiler's builtin memcpy,
memset and memcmp functions, _except_ where it would likely have the
largest impact, in the decompression code itself.
Remove the #undef's of memcpy and memset in misc.c so that the
decompressor code also uses the compiler builtins.
The rationale given in the comment doesn't really apply: just because
some functions use the out-of-line version is no reason to not use the
builtin version in the rest.
Replace the comment with an explanation of why memzero and memmove are
being #define'd.
Drop the suggestion to #undef in boot/string.h as well: the out-of-line
versions are not really optimized versions, they're generic code that's
good enough for the preboot environment. The compiler will likely
generate better code for constant-size memcpy/memset/memcmp if it is
allowed to.
Most decompressors' performance is unchanged, with the exception of LZ4
and 64-bit ZSTD.
Before After ARCH
LZ4 73ms 10ms 32
LZ4 120ms 10ms 64
ZSTD 90ms 74ms 64
Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.
Decompressor code size has small differences, with the largest being
that 64-bit ZSTD decreases just over 2k. The largest code size increase
was on 64-bit XZ, of about 400 bytes.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Nick Terrell <nickrterrell@gmail.com>
Tested-by: Nick Terrell <nickrterrell@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a linker script check that there are no run-time relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.
Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-8-keescook@chromium.org
The BFD linker generates run-time relocations for z_input_len and
z_output_len, even though they are absolute symbols.
This is fixed for binutils-2.35 [1]. Work around this for earlier
versions by defining two variables input_len and output_len in addition
to the symbols, and use them via position-independent references.
This eliminates the last two run-time relocations in the head code and
allows us to drop the -z noreloc-overflow flag to the linker.
Move the -pie and --no-dynamic-linker LDFLAGS to LDFLAGS_vmlinux instead
of KBUILD_LDFLAGS. There shouldn't be anything else getting linked, but
this is the more logical location for these flags, and modversions might
call the linker if an EXPORT_SYMBOL is left over accidentally in one of
the decompressors.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=25754
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-7-keescook@chromium.org
The assembly code in head_{32,64}.S, while meant to be
position-independent, generates run-time relocations because it uses
instructions such as:
leal gdt(%edx), %eax
which make the assembler and linker think that the code is using %edx as
an index into gdt, and hence gdt needs to be relocated to its run-time
address.
On 32-bit, with lld Dmitry Golovin reports that this results in a
link-time error with default options (i.e. unless -z notext is
explicitly passed):
LD arch/x86/boot/compressed/vmlinux
ld.lld: error: can't create dynamic relocation R_386_32 against local
symbol in readonly segment; recompile object files with -fPIC or pass
'-Wl,-z,notext' to allow text relocations in the output
With the BFD linker, this generates a warning during the build, if
--warn-shared-textrel is enabled, which at least Gentoo enables by
default:
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object
On 64-bit, it is not possible to link the kernel as -pie with lld, and
it is only possible with a BFD linker that supports -z noreloc-overflow,
i.e. versions >2.26. This is because these instructions cannot really be
relocated: the displacement field is only 32-bits wide, and thus cannot
be relocated for a 64-bit load address. The -z noreloc-overflow option
simply overrides the linker error, and results in R_X86_64_RELATIVE
relocations that apply a 64-bit relocation to a 32-bit field anyway.
This happens to work because nothing will process these run-time
relocations.
Start fixing this by removing relocations from .head.text:
- On 32-bit, use a base register that holds the address of the GOT and
reference symbol addresses using @GOTOFF, i.e.
leal gdt@GOTOFF(%edx), %eax
- On 64-bit, most of the code can (and already does) use %rip-relative
addressing, however the .code32 bits can't, and the 64-bit code also
needs to reference symbol addresses as they will be after moving the
compressed kernel to the end of the decompression buffer.
For these cases, reference the symbols as an offset to startup_32 to
avoid creating relocations, i.e.:
leal (gdt-startup_32)(%bp), %eax
This only works in .head.text as the subtraction cannot be represented
as a PC-relative relocation unless startup_32 is in the same section
as the code. Move efi32_pe_entry into .head.text so that it can use
the same method to avoid relocations.
Reported-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-6-keescook@chromium.org
In a previous patch, we have eliminated GOT entries from the decompressor
binary and added an assertion that the .got section is empty. This means
that the GOT fixup routines that exist in both the 32-bit and 64-bit
startup routines have become dead code, and can be removed.
While at it, drop the KEEP() from the linker script, as it has no effect
on the contents of output sections that are created by the linker itself.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-4-keescook@chromium.org
Eliminate all GOT entries in the decompressor binary, by forcing hidden
visibility for all symbol references, which informs the compiler that
such references will be resolved at link time without the need for
allocating GOT entries.
To ensure that no GOT entries will creep back in, add an assertion to
the decompressor linker script that will fire if the .got section has
a non-zero size.
[Arvind: move hidden.h to include/linux instead of making a copy]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-3-keescook@chromium.org
The .got.plt section contains the part of the GOT which is used by PLT
entries, and which gets updated lazily by the dynamic loader when
function calls are dispatched through those PLT entries.
On fully linked binaries such as the kernel proper or the decompressor,
this never happens, and so in practice, the .got.plt section consists
only of the first 3 magic entries that are meant to point at the _DYNAMIC
section and at the fixup routine in the loader. However, since we don't
use a dynamic loader, those entries are never populated or used.
This means that treating those entries like ordinary GOT entries, and
updating their values based on the actual placement of the executable in
memory is completely pointless, and we can just ignore the .got.plt
section entirely, provided that it has no additional entries beyond
the first 3 ones.
So add an assertion in the linker script to ensure that this assumption
holds, and move the contents out of the [_got, _egot) memory range that
is modified by the GOT fixup routines.
While at it, drop the KEEP(), since it has no effect on the contents
of output sections that are created by the linker itself.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-2-keescook@chromium.org
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl8wJXEVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGMGEP/0jDq/WafbfPN0aU83EqEWLt/sKg
bluzmf/6HGx3XVRnuAzsHNNqysUx77WJiDsU/jbC/zdH8Iox3Sc1diE2sELLNAfY
iJmQ8NBPggyU74aYG3OJdpDjz8T9EX/nVaYrjyFlbuXElM+Qvo8Z4Fz6NpWqKWlA
gU+yGxEPPdX6MLHcSPSIu1hGWx7UT4fgfx3zDFTI2qvbQgQjKtzyTjAH5Cm3o87h
rfomvHSSoAUg+Fh1LediRh1tJlkdVO+w7c+LNwCswmdBtkZuxecj1bQGUTS8GaLl
CCWOKYfWp0KsVf1veXNNNaX/ecbp+Y34WErFq3V9Fdq5RmVlp+FPSGMyjDMRiQ/p
LGvzbJLPpG586MnK8of0dOj6Es6tVPuq6WH2HuvsyTGcZJDpFTTxRcK3HDkE8ig6
ZtuM3owB/Mep8IzwY2yWQiDrc7TX5Fz8S4hzGPU1zG9cfj4VT6TBqHGAy1Eql/0l
txj6vJpnbQSdXiIX8MIU3yH35Y7eW3JYWgspTZH5Woj1S/wAWwuG93Fuuxq6mQIJ
q6LSkMavtOfuCjOA9vJBZewpKXRU6yo0CzWNL/5EZ6z/r/I+DGtfb/qka8oYUDjX
9H0cecL37AQxDHRPTxCZDQF0TpYiFJ6bmnMftK9NKNuIdvsk9DF7UBa3EdUNIj38
yKS3rI7Lw55xWuY3
=bkNQ
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
* tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
kbuild: introduce hostprogs-always-y and userprogs-always-y
kbuild: sort hostprogs before passing it to ifneq
kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
kbuild: Replace HTTP links with HTTPS ones
kbuild: trace functions in subdirectories of lib/
kbuild: introduce ccflags-remove-y and asflags-remove-y
kbuild: do not export LDFLAGS_vmlinux
kbuild: always create directories of targets
powerpc/boot: add DTB to 'targets'
kbuild: buildtar: add dtbs support
kbuild: remove cc-option test of -ffreestanding
kbuild: remove cc-option test of -fno-stack-protector
Revert "kbuild: Create directory for target DTB"
kbuild: run the checker after the compiler
- Add support for zstd compressed kernel
- Define __DISABLE_EXPORTS in Makefile
- Remove __DISABLE_EXPORTS definition from kaslr.c
- Bump the heap size for zstd.
- Update the documentation.
Integrates the ZSTD decompression code to the x86 pre-boot code.
Zstandard requires slightly more memory during the kernel decompression
on x86 (192 KB vs 64 KB), and the memory usage is independent of the
window size.
__DISABLE_EXPORTS is now defined in the Makefile, which covers both
the existing use in kaslr.c, and the use needed by the zstd decompressor
in misc.c.
This patch has been boot tested with both a zstd and gzip compressed
kernel on i386 and x86_64 using buildroot and QEMU.
Additionally, this has been tested in production on x86_64 devices.
We saw a 2 second boot time reduction by switching kernel compression
from xz to zstd.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-7-nickrterrell@gmail.com
Change the type of local variables/fields that store mem_vector
addresses to u64 to make it less likely that 32-bit overflow will cause
issues on 32-bit.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-21-nivedita@alum.mit.edu
Change type of minimum/image_size arguments in process_mem_region to
'unsigned long'. These actually can never be above 4G (even on x86_64),
and they're 'unsigned long' in every other function except this one.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-19-nivedita@alum.mit.edu
Just a trivial rearrangement to do all the processing together, and only
have one call to slots_fetch_random() in the source.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-18-nivedita@alum.mit.edu
Drop unnecessary alignment of image_size to CONFIG_PHYSICAL_ALIGN in
find_random_virt_addr, it cannot change the result: the largest valid
slot is the largest n that satisfies
minimum + n * CONFIG_PHYSICAL_ALIGN + image_size <= KERNEL_IMAGE_SIZE
(since minimum is already aligned) and so n is equal to
(KERNEL_IMAGE_SIZE - minimum - image_size) / CONFIG_PHYSICAL_ALIGN
even if image_size is not aligned to CONFIG_PHYSICAL_ALIGN.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-17-nivedita@alum.mit.edu
Drop unnecessary check that number of slots is not zero in
store_slot_info, it's guaranteed to be at least 1 by the calculation.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-16-nivedita@alum.mit.edu
The number of slots can be 'unsigned int', since on 64-bit, the maximum
amount of memory is 2^52, the minimum alignment is 2^21, so the slot
number cannot be greater than 2^31. But in case future processors have
more than 52 physical address bits, make it 'unsigned long'.
The slot areas are limited by MAX_SLOT_AREA, currently 100. It is
indexed by an int, but the number of areas is stored as 'unsigned long'.
Change both to 'unsigned int' for consistency.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-15-nivedita@alum.mit.edu
This check doesn't save anything. In the case when none of the
parameters are present, each strstr will scan args twice (once to find
the length and then for searching), six scans in total. Just going ahead
and parsing the arguments only requires three scans: strlen, memcpy, and
parsing. This will be the first malloc, so free will actually free up
the memory, so the check doesn't save heap space either.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-14-nivedita@alum.mit.edu
32-bit does not have GB pages, so don't bother checking for them. Using
the IS_ENABLED() macro allows the compiler to completely remove the
gb_huge_pages code.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-12-nivedita@alum.mit.edu
Clip the start and end of the region to minimum and mem_limit prior to
the loop. region.start can only increase during the loop, so raising it
to minimum before the loop is enough.
A region that becomes empty due to this will get checked in
the first iteration of the loop.
Drop the check for overlap extending beyond the end of the region. This
will get checked in the next loop iteration anyway.
Rename end to region_end for symmetry with region.start.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-10-nivedita@alum.mit.edu
region.size can be trimmed to store the portion of the region before the
overlap, instead of a separate mem_vector variable.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-9-nivedita@alum.mit.edu
In case of an overlap, the beginning of the region should be used even
if it is exactly image_size, not just strictly larger.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-6-nivedita@alum.mit.edu
On 64-bit, the kernel must be placed below MAXMEM (64TiB with 4-level
paging or 4PiB with 5-level paging). This is currently not enforced by
KASLR, which thus implicitly relies on physical memory being limited to
less than 64TiB.
On 32-bit, the limit is KERNEL_IMAGE_SIZE (512MiB). This is enforced by
special checks in __process_mem_region().
Initialize mem_limit to the maximum (depending on architecture), instead
of ULLONG_MAX, and make sure the command-line arguments can only
decrease it. This makes the enforcement explicit on 64-bit, and
eliminates the 32-bit specific checks to keep the kernel below 512M.
Check upfront to make sure the minimum address is below the limit before
doing any work.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-5-nivedita@alum.mit.edu
Since commit:
0982adc746 ("x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice")
process_efi_entries() will return true if we have an EFI memmap, not just
if it contained EFI_MEMORY_MORE_RELIABLE regions.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-4-nivedita@alum.mit.edu
Drop the warning on seeing "--" in handle_mem_options(). This will trigger
whenever one of the memory options is present in the command line
together with "--", but there's no problem if that is the case.
Replace goto with break.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-3-nivedita@alum.mit.edu
Handle the possibility that the command line is NULL.
Replace open-coded strlen with a function call.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-2-nivedita@alum.mit.edu
- Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the
recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to
get out of sync.
- Use the proper vectors for HYPERV.
- Make disabling of stack protector for the entry code work with GCC
builds which enable stack protector by default. Removing the option is
not sufficient, it needs an explicit -fno-stack-protector to shut it
off.
- Mark check_user_regs() noinstr as it is called from noinstr code. The
missing annotation causes it to be placed in the text section which
makes it instrumentable.
- Add the missing interrupt disable in exc_alignment_check()
- Fixup a XEN_PV build dependency in the 32bit entry code
- A few fixes to make the Clang integrated assembler happy
- Move EFI stub build to the right place for out of tree builds
- Make prepare_exit_to_usermode() static. It's not longer called from ASM
code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UR+MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQCUD/4/9W5FFvdZvQPwmXsHPaVnW9hUsXxG
0tjc34xqDcgEl1U3khu+6jj+oHx+JM+4wGP/V49Wqx6xkrJ33/a8uYErAgI7+Pmp
s3T2gXMWkgJtYFlDQdAMbeuuM2cOFZJw4BxxvTMth5EixQvk1EkX6QyBjLaSGo8y
78sWtZ6Oh5Ql9ua/9TOilewLsCsQSFIFn0o/hawwwPUMrwGvD29scha0XHom+AO7
uwejfU8klq2HJJaLaaiUaiNBkFz0TNGJtY+3mQpw8BPjCuuBQhYygrS0X4uQzo01
4XJzhDnOVbAYWqi0/T+mAEmuJ9NBZJwYiYrwBYCkZgELwJKLzhzO2GOgP9xEsFY4
VUNgqHFhKrQp10k2k4L/A5tmr+0GntiCQhdZi+/gty6oO/t3ni57pRcAhA9qBNOb
8ZqumBwgaaAIqcmdtoyXAIveWOHnzwKEg6wmIGFbyCwHjeLJKJG7KhpXIpEuX+j2
DC7EfYvRB+jllAk1CBypBvzD0DHfMZ0myPxCcZiW2wHTVAlkpY7hiIyPHqocjE9L
OjOQ7FS6E2/p24lYVcLUFWcESxGFvQjjxwXk7htjpGUIZsQOhz/LOW+CIPCsfbqE
HoEsHmNltksYYV9FDfevXRp5sbxpx3wQSLOgqNqiOpy4cTCG8boalUqHQ0OsN8Oa
EgU067yF77ymRg==
=QAeH
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull x86 fixes from Thomas Gleixner:
"A pile of fixes for x86:
- Fix the I/O bitmap invalidation on XEN PV, which was overlooked in
the recent ioperm/iopl rework. This caused the TSS and XEN's I/O
bitmap to get out of sync.
- Use the proper vectors for HYPERV.
- Make disabling of stack protector for the entry code work with GCC
builds which enable stack protector by default. Removing the option
is not sufficient, it needs an explicit -fno-stack-protector to
shut it off.
- Mark check_user_regs() noinstr as it is called from noinstr code.
The missing annotation causes it to be placed in the text section
which makes it instrumentable.
- Add the missing interrupt disable in exc_alignment_check()
- Fixup a XEN_PV build dependency in the 32bit entry code
- A few fixes to make the Clang integrated assembler happy
- Move EFI stub build to the right place for out of tree builds
- Make prepare_exit_to_usermode() static. It's not longer called from
ASM code"
* tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Don't add the EFI stub to targets
x86/entry: Actually disable stack protector
x86/ioperm: Fix io bitmap invalidation on Xen PV
x86: math-emu: Fix up 'cmp' insn for clang ias
x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV
x86/entry: Add compatibility with IAS
x86/entry/common: Make prepare_exit_to_usermode() static
x86/entry: Mark check_user_regs() noinstr
x86/traps: Disable interrupts in exc_aligment_check()
x86/entry/32: Fix XEN_PV build dependency
vmlinux-objs-y is added to targets, which currently means that the EFI
stub gets added to the targets as well. It shouldn't be added since it
is built elsewhere.
This confuses Makefile.build which interprets the EFI stub as a target
$(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a
and will create drivers/firmware/efi/libstub/ underneath
arch/x86/boot/compressed, to hold this supposed target, if building
out-of-tree. [0]
Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y.
[0] See scripts/Makefile.build near the end:
# Create directories for object files if they do not exist
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu
Some Makefiles already pass -ffreestanding unconditionally.
For example, arch/arm64/lib/Makefile, arch/x86/purgatory/Makefile.
No problem report so far about hard-coding this option. So, we can
assume all supported compilers know -ffreestanding.
I confirmed GCC 4.8 and Clang manuals document this option.
Get rid of cc-option from -ffreestanding.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Some Makefiles already pass -fno-stack-protector unconditionally.
For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.
No problem report so far about hard-coding this option. So, we can
assume all supported compilers know -fno-stack-protector.
GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)
Get rid of cc-option from -fno-stack-protector.
Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.
Note:
arch/mips/vdso/Makefile adds -fno-stack-protector twice, first
unconditionally, and second conditionally. I removed the second one.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Commit
17054f492d ("efi/x86: Implement mixed mode boot without the handover protocol")
introduced a new entry point for the EFI stub to be booted in mixed mode
on 32-bit firmware.
When entered via efi32_pe_entry, control is first transferred to
startup_32 to setup for the switch to long mode, and then the EFI stub
proper is entered via efi_pe_entry. efi_pe_entry is an MS ABI function,
and the ABI requires 32 bytes of shadow stack space to be allocated by
the caller, as well as the stack being aligned to 8 mod 16 on entry.
Allocate 40 bytes on the stack before switching to 64-bit mode when
calling efi_pe_entry to account for this.
For robustness, explicitly align boot_stack_end to 16 bytes. It is
currently implicitly aligned since .bss is cacheline-size aligned,
head_64.o is the first object file with a .bss section, and the heap and
boot sizes are aligned.
Fixes: 17054f492d ("efi/x86: Implement mixed mode boot without the handover protocol")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200617131957.2507632-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.
Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VK9cRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ibgw//acOg/6o7HzHS19nEDfRf2grtipPq0lZN
laIBlGNQdyQHoTMbvF4X8hE1VuALdcr+kVCXirvHnTVsE62fqR8KzdTeEPHHSamy
VWZkaOGq+jZiJnM4EZ1j6y0E6Cf9SWU2Zho4Ov/j88s3aYhkYG6EU+8dZMpI2pLU
EqZAqzuZ8lJYDchv+Xbd/dN3p8DoCzbcZ5nJN+mDaHiVruLB3fk3cqBjAhAbvYFO
X2Fk4yNccvHWjGbBNbgoddTRt/ZHC+PhiIGvE+KzcDLZipjUj4M7WxznLGdILFT/
Vpys3Uewa64bQk/GURuxh7A/IjzqohCKq0pLugU3B1FW6nASCUuySbN8KroIiGo8
Vnesc6G4G+KtxJGq18/umSaDoX9RmNM7iyeGt2G3yyV5MFPz83XZmtCVHizY6ayk
PPDB1lPXks3NpdKBgH/SYDfm7GBI3CwH7ttr3+DSl8nfadfIjQtu5hnhdBLeGWj4
AVhWSTyaLfABkRoU+DEg9YbzvcywjNOp0sblIxhxFiPKECymhNdBmljQmW6EMTRg
j1El5pdYp0D+MNyBTewgD033yMm5pLsHZX+aiyG5ULizevemjWrnprzFYFnSYBZY
ivfRnsK7zzWh+cejJJiZKPPR4RDu+VNneCd2PWjqX6VwPd03QjmOI8zw7WeLSbZl
kzzhOThwvdo=
=idS6
-----END PGP SIGNATURE-----
Merge tag 'x86-build-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"Misc dependency fixes, plus a documentation update about memory
protection keys support"
* tag 'x86-build-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Update config and kernel doc for MPK feature on AMD
x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux
x86/boot/build: Add phony targets in arch/x86/boot/Makefile to PHONY
x86/boot/build: Make 'make bzlilo' not depend on vmlinux or $(obj)/bzImage
x86/boot/build: Add cpustr.h to targets and remove clean-files
- Add the initrdmem= boot option to specify an initrd embedded in RAM (flash most likely)
- Sanitize the CS value earlier during boot, which also fixes SEV-ES.
- Various fixes and smaller cleanups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VKk8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i25w/6A8okusHJMXyXMYddRHNiL57x3DcTRsTO
09Wz7e0YrL53HqQEyaqtSam/0VqgSaHDQb/gRb2Ci0G+XzZ3BFYvICVWTW6NcvnA
VSUoHC8Mr83Aq3UfAEcJZZ0bHNuoKymO256v2tZPGCSGgZoxQdoe4/6W1uMxxjLr
NFpeyAm93zTe1+MmA/ZcFxH+xOZPYVPhl7+KgO3muMH/hGoS3Dt+RCuB9VHTgMvf
4mN6IxN3cVHDogt7usdtWjgrYnhY0SjiWo858+MDWsrW5oXifsXLJ5jJr1Ea1nGx
qqVyaCqAVNobOkpsBLHg1DiD/rr9A4sfS/etmAjWsPO6kAx9Mq9+B2DG5fTU/gB+
zd76M3Jl3wyjdy6hPMyiZGlFFM9l3efyp/iYPhFWgPqVlkkOvbO+9FWVDbFtErQw
WpEG2d8KHN4+ph8D04ExeKJKCKaYnAaHKk13fZnjjeQhatyGGAYn6hx+rT/x+onM
2CeRG/+KcnlzKgXqYX6/YT++XlaCKgMntO/FdLT99/4CD92rqQdhwJ6JNH1U8nXO
LWjrV5ZH6R3n5Hr5+J/Kcd9/kIfAqWG3t/eiTEPEjJIUWXEdhBoQWErSce4on5a7
6eBfkKEQxIYAdC1iO2uoKEtEpMDvFWoIIVjdlVTFiJ8Np9uvv7lPByr/0TJ+N5b7
fgOrzglWuxo=
=U/uh
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"Misc updates:
- Add the initrdmem= boot option to specify an initrd embedded in RAM
(flash most likely)
- Sanitize the CS value earlier during boot, which also fixes SEV-ES
- Various fixes and smaller cleanups"
* tag 'x86-boot-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Correct relocation destination on old linkers
x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded
x86/boot: Fix -Wint-to-pointer-cast build warning
x86/boot: Add kstrtoul() from lib/
x86/tboot: Mark tboot static
x86/setup: Add an initrdmem= option to specify initrd physical address
Instead of using efi_gdt64 to switch back to 64-bit mode and then
switching to the real boot-time GDT, just switch to the boot-time GDT
directly. The two GDT's are identical other than efi_gdt64 not including
the 32-bit code segment.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200523221513.1642948-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
With commit
ce5e3f909f ("efi/printf: Add 64-bit and 8-bit integer support")
arch/x86/boot/compressed/vmlinux may have an undesired .discard.unreachable
section coming from drivers/firmware/efi/libstub/vsprintf.stub.o. That section
gets generated from unreachable() annotations when CONFIG_STACK_VALIDATION is
enabled.
.discard.unreachable contains an R_X86_64_PC32 relocation which will be
warned about by LLD: a non-SHF_ALLOC section (.discard.unreachable) is
not part of the memory image, thus conceptually the distance between a
non-SHF_ALLOC and a SHF_ALLOC is not a constant which can be resolved at
link time:
% ld.lld -m elf_x86_64 -T arch/x86/boot/compressed/vmlinux.lds ... -o arch/x86/boot/compressed/vmlinux
ld.lld: warning: vsprintf.c:(.discard.unreachable+0x0): has non-ABS relocation R_X86_64_PC32 against symbol ''
Reuse the DISCARDS macro which includes .discard.* to drop
.discard.unreachable.
[ bp: Massage and complete the commit message. ]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Arvind Sankar <nivedita@alum.mit.edu>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lkml.kernel.org/r/20200520182010.242489-1-maskray@google.com
For the 32-bit kernel, as described in
6d92bc9d48 ("x86/build: Build compressed x86 kernels as PIE"),
pre-2.26 binutils generates R_386_32 relocations in PIE mode. Since the
startup code does not perform relocation, any reloc entry with R_386_32
will remain as 0 in the executing code.
Commit
974f221c84 ("x86/boot: Move compressed kernel to the end of the
decompression buffer")
added a new symbol _end but did not mark it hidden, which doesn't give
the correct offset on older linkers. This causes the compressed kernel
to be copied beyond the end of the decompression buffer, rather than
flush against it. This region of memory may be reserved or already
allocated for other purposes by the bootloader.
Mark _end as hidden to fix. This changes the relocation from R_386_32 to
R_386_RELATIVE even on the pre-2.26 binutils.
For 64-bit, this is not strictly necessary, as the 64-bit kernel is only
built as PIE if the linker supports -z noreloc-overflow, which implies
binutils-2.27+, but for consistency, mark _end as hidden here too.
The below illustrates the before/after impact of the patch using
binutils-2.25 and gcc-4.6.4 (locally compiled from source) and QEMU.
Disassembly before patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 00 00 00 sub $0x0,%eax
4f: R_386_32 _end
Disassembly after patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 f0 76 00 sub $0x76f000,%eax
4f: R_386_RELATIVE *ABS*
Dump from extract_kernel before patch:
early console in extract_kernel
input_data: 0x0207c098 <--- this is at output + init_size
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Dump from extract_kernel after patch:
early console in extract_kernel
input_data: 0x0190d098 <--- this is at output + init_size - _end
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Fixes: 974f221c84 ("x86/boot: Move compressed kernel to the end of the decompression buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200207214926.3564079-1-nivedita@alum.mit.edu
When the pre-decompression code loads its first GDT in startup_64(), it
is still running on the CS value of the previous GDT. In the case of
SEV-ES, this is the EFI GDT but it can be anything depending on what has
loaded the kernel (boot loader, container runtime, etc.)
To make exception handling work (especially IRET) the CPU needs to
switch to a CS value in the current GDT, so jump to __KERNEL_CS after
the first GDT is loaded. This is prudent also as a general sanitization
of CS to a known good value.
[ bp: Massage commit message. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200428151725.31091-13-joro@8bytes.org
Fix this warning when building 32-bit with
CONFIG_RANDOMIZE_BASE=y
CONFIG_MEMORY_HOTREMOVE=y
arch/x86/boot/compressed/acpi.c:316:9: warning: \
cast to pointer from integer of different size [-Wint-to-pointer-cast]
Have get_cmdline_acpi_rsdp() return unsigned long which is the proper
type to convert to a pointer of the respective width.
[ bp: Rewrite commit message, touch ups. ]
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1587645588-7130-3-git-send-email-vamshi.k.sthambamkadi@gmail.com
Instead of using __efistub_global to force variables into the .data
section, leave them in the .bss but pull the EFI stub's .bss section
into .data in the linker script for the compressed kernel.
Add relocation checking for x86 as well to catch non-PC-relative
relocations that require runtime processing, since the EFI stub does not
do any runtime relocation processing.
This will catch, for example, data relocations created by static
initializers of pointers.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200416151227.3360778-3-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
+km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
3pEsQuE=
=whbL
-----END PGP SIGNATURE-----
Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh
Resolve these conflicts:
arch/x86/Kconfig
arch/x86/kernel/Makefile
Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Pull x86 boot updates from Ingo Molnar:
"Misc cleanups and small enhancements all around the map"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Fix debug_puthex() parameter type
x86/setup: Fix static memory detection
x86/vmlinux: Drop unneeded linker script discard of .eh_frame
x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
x86/boot/compressed: Remove .eh_frame section from bzImage
x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len
x86/boot/compressed/64: Use LEA to initialize boot stack pointer
In the CONFIG_X86_VERBOSE_BOOTUP=Y case, the debug_puthex() macro just
turns into __puthex(), which takes 'unsigned long' as parameter.
But in the CONFIG_X86_VERBOSE_BOOTUP=N case, it is a function which
takes 'unsigned char *', causing compile warnings when the function is
used. Fix the parameter type to get rid of the warnings.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200319091407.1481-11-joro@8bytes.org
In preparation for being able to decompress into a buffer starting at a
different address than startup_32, save the calculated output address
instead of recalculating it later.
We now keep track of three addresses:
%edx: startup_32 as we were loaded by bootloader
%ebx: new location of compressed kernel
%ebp: start of decompression buffer
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
The load address is compared with LOAD_PHYSICAL_ADDR using a signed
comparison currently (using jge instruction).
When loading a 64-bit kernel using the new efi32_pe_entry() point added by:
97aa276579 ("efi/x86: Add true mixed mode entry point into .compat section")
using Qemu with -m 3072, the firmware actually loads us above 2Gb,
resulting in a very early crash.
Use the JAE instruction to perform a unsigned comparison instead, as physical
addresses should be considered unsigned.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
code32_start is meant for 16-bit real-mode bootloaders to inform the
kernel where the 32-bit protected mode code starts. Nothing in the
protected mode kernel except the EFI stub uses it.
efi_main() currently returns boot_params, with code32_start set inside it
to tell efi_stub_entry() where startup_32 is located. Since it was invoked
by efi_stub_entry() in the first place, boot_params is already known.
Return the address of startup_32 instead.
This will allow a 64-bit kernel to live above 4Gb, for example, and it's
cleaner as well.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
The following commit:
ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")
introduced GDT setup into the 32-bit kernel's startup_32, and reloads
the GDTR after relocating the kernel for paranoia's sake.
A followup commit:
32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")
introduced a similar GDTR reload in the 64-bit kernel as well.
The GDTR is adjusted by (init_size-_end), however this may not be the
correct offset to apply if the kernel was loaded at a misaligned address
or below LOAD_PHYSICAL_ADDR, as in that case the decompression buffer
has an additional offset from the original load address.
This should never happen for a conformant bootloader, but we're being
paranoid anyway, so just store the new GDT address in there instead of
adding any offsets, which is simpler as well.
Fixes: ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")
Fixes: 32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20200226230031.3011645-2-nivedita@alum.mit.edu
This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:
- Get things cleaned up before EFI support for RISC-V arrives, which will
increase the size of the validation matrix, and therefore the threshold to
making drastic changes,
- After years of defunct maintainership, the GRUB project has finally started
to consider changes from the distros regarding UEFI boot, some of which are
highly specific to the way x86 does UEFI secure boot and measured boot,
based on knowledge of both shim internals and the layout of bootparams and
the x86 setup header. Having this maintenance burden on other architectures
(which don't need shim in the first place) is hard to justify, so instead,
we are introducing a generic Linux/UEFI boot protocol.
Summary of changes:
- Boot time GDT handling changes (Arvind)
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file I/O,
memory allocation, etc.
- Introduce a generic initrd loading method based on calling back into
the firmware, instead of relying on the x86 EFI handover protocol or
device tree.
- Introduce a mixed mode boot method that does not rely on the x86 EFI
handover protocol either, and could potentially be adopted by other
architectures (if another one ever surfaces where one execution mode
is a superset of another)
- Clean up the contents of struct efi, and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit firmware
implementations to return EFI_UNSUPPORTED from UEFI runtime services at
OS runtime, and expose a mask of which ones are supported or unsupported
via a configuration table.
- Various documentation updates and minor code cleanups (Heinrich)
- Partial fix for the lack of by-VA cache maintenance in the decompressor
on 32-bit ARM. Note that these patches were deliberately put at the
beginning so they can be used as a stable branch that will be shared with
a PR containing the complete fix, which I will send to the ARM tree.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl5S7WYACgkQwjcgfpV0
+n1jmQgAmwV3V8pbgB4mi4P2Mv8w5Zj5feUe6uXnTR2AFv5nygLcTzdxN+TU/6lc
OmZv2zzdsAscYlhuUdI/4t4cXIjHAZI39+UBoNRuMqKbkbvXCFscZANLxvJjHjZv
FFbgUk0DXkF0BCEDuSLNavidAv4b3gZsOmHbPfwuB8xdP05LbvbS2mf+2tWVAi2z
ULfua/0o9yiwl19jSS6iQEPCvvLBeBzTLW7x5Rcm/S0JnotzB59yMaeqD7jO8JYP
5PvI4WM/l5UfVHnZp2k1R76AOjReALw8dQgqAsT79Q7+EH3sNNuIjU6omdy+DFf4
tnpwYfeLOaZ1SztNNfU87Hsgnn2CGw==
=pDE3
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
Pull EFI updates for v5.7 from Ard Biesheuvel:
This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:
- Get things cleaned up before EFI support for RISC-V arrives, which will
increase the size of the validation matrix, and therefore the threshold to
making drastic changes,
- After years of defunct maintainership, the GRUB project has finally started
to consider changes from the distros regarding UEFI boot, some of which are
highly specific to the way x86 does UEFI secure boot and measured boot,
based on knowledge of both shim internals and the layout of bootparams and
the x86 setup header. Having this maintenance burden on other architectures
(which don't need shim in the first place) is hard to justify, so instead,
we are introducing a generic Linux/UEFI boot protocol.
Summary of changes:
- Boot time GDT handling changes (Arvind)
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file I/O,
memory allocation, etc.
- Introduce a generic initrd loading method based on calling back into
the firmware, instead of relying on the x86 EFI handover protocol or
device tree.
- Introduce a mixed mode boot method that does not rely on the x86 EFI
handover protocol either, and could potentially be adopted by other
architectures (if another one ever surfaces where one execution mode
is a superset of another)
- Clean up the contents of struct efi, and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit firmware
implementations to return EFI_UNSUPPORTED from UEFI runtime services at
OS runtime, and expose a mask of which ones are supported or unsupported
via a configuration table.
- Various documentation updates and minor code cleanups (Heinrich)
- Partial fix for the lack of by-VA cache maintenance in the decompressor
on 32-bit ARM. Note that these patches were deliberately put at the
beginning so they can be used as a stable branch that will be shared with
a PR containing the complete fix, which I will send to the ARM tree.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that .eh_frame sections for the files in setup.elf and realmode.elf
are not generated anymore, the linker scripts don't need the special
output section name /DISCARD/ any more.
Remove the one in the main kernel linker script as well, since there are
no .eh_frame sections already, and fix up a comment referencing .eh_frame.
Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
to make sense, as well as being more specific.
[ bp: Touch up commit message. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200224232129.597160-3-nivedita@alum.mit.edu
While discussing a patch to discard .eh_frame from the compressed
vmlinux using the linker script, Fangrui Song pointed out [1] that these
sections shouldn't exist in the first place because arch/x86/Makefile
uses -fno-asynchronous-unwind-tables.
It turns out this is because the Makefiles used to build the compressed
kernel redefine KBUILD_CFLAGS, dropping this flag.
Add the flag to the Makefile for the compressed kernel, as well as the
EFI stub Makefile to fix this.
Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
kernel's boot code (boot/setup.elf) and realmode trampoline
(realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
since their linker scripts also just discard them.
[1] https://lore.kernel.org/lkml/20200222185806.ywnqhfqmy67akfsa@google.com/
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200224232129.597160-2-nivedita@alum.mit.edu
Discarding unnecessary sections with "*(*)" (see thread at Link: below)
works fine with the bfd linker but fails with lld:
$ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
ld.lld: error: discarding .shstrtab section is not allowed
lld tries to also discard essential sections like .shstrtab, .symtab and
.strtab, which results in the link failing since .shstrtab is required
by the ELF specification: the e_shstrndx field in the ELF header is the
index of .shstrtab, and each section in the section table is required to
have an sh_name that points into the .shstrtab.
.symtab and .strtab are also necessary to generate the zoffset.h file
for the bzImage header.
Since the only sizeable section that can be discarded is .eh_frame,
restrict the discard to only .eh_frame to be safe.
[ bp: Flesh out commit message and replace offending commit with this one. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200109150218.16544-2-nivedita@alum.mit.edu
Add support for booting 64-bit x86 kernels from 32-bit firmware running
on 64-bit capable CPUs without requiring the bootloader to implement
the EFI handover protocol or allocate the setup block, etc etc, all of
which can be done by the stub itself, using code that already exists.
Instead, create an ordinary EFI application entrypoint but implemented
in 32-bit code [so that it can be invoked by 32-bit firmware], and stash
the address of this 32-bit entrypoint in the .compat section where the
bootloader can find it.
Note that we use the setup block embedded in the binary to go through
startup_32(), but it gets reallocated and copied in efi_pe_entry(),
using the same code that runs when the x86 kernel is booted in EFI
mode from native firmware. This requires the loaded image protocol to
be installed on the kernel image's EFI handle, and point to the kernel
image itself and not to its loader. This, in turn, requires the
bootloader to use the LoadImage() boot service to load the 64-bit
image from 32-bit firmware, which is in fact supported by firmware
based on EDK2. (Only StartImage() will fail, and instead, the newly
added entrypoint needs to be invoked)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Most of the EFI stub source files of all architectures reside under
drivers/firmware/efi/libstub, where they share a Makefile with special
CFLAGS and an include file with declarations that are only relevant
for stub code.
Currently, we carry a lot of stub specific stuff in linux/efi.h only
because eboot.c in arch/x86 needs them as well. So let's move eboot.c
into libstub/, and move the contents of eboot.h that we still care
about into efistub.h
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When using the native PE entry point (as opposed to the EFI handover
protocol entry point that is used more widely), we set code32_start,
which is a 32-bit wide field, to the effective symbol address of
startup_32, which could overflow given that the EFI loader may have
located the running image anywhere in memory, and we haven't reached
the point yet where we relocate ourselves.
Since we relocate ourselves if code32_start != pref_address, this
isn't likely to lead to problems in practice, given how unlikely
it is that the truncated effective address of startup_32 happens
to equal pref_address. But it is better to defer the assignment
of code32_start to after the relocation, when it is guaranteed to
fit.
While at it, move the call to efi_relocate_kernel() to an earlier
stage so it is more likely that our preferred offset in memory has
not been occupied by other memory allocations done in the mean time.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
We have some code in the EFI stub entry point that takes the address
of the apm_bios_info struct in the newly allocated and zeroed out
boot_params structure, only to zero it out again. This is pointless
so remove it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rearrange the instructions a bit to use a 32-bit displacement once
instead of 2/3 times. This saves 8 bytes of machine code.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-8-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The limit value for the GDTR should be such that adding it to the base
address gives the address of the last byte of the GDT, i.e. it should be
one less than the size, not the size.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-7-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The 64-bit kernel will already load a GDT in startup_64, which is the
next function to execute after return from efi_main.
Add GDT setup code to the 32-bit kernel's startup_32 as well. Doing it
in the head code has the advantage that we can avoid potentially
corrupting the GDT during copy/decompression. This also removes
dependence on having a specific GDT layout setup by the bootloader.
Both startup_32 and startup_64 now clear interrupts on entry, so we can
remove that from efi_main as well.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-6-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
startup_32 already clears these flags on entry, do it in startup_64 as
well for consistency.
The direction flag in particular is not specified to be cleared in the
boot protocol documentation, and we currently call into C code
(paging_prepare) without explicitly clearing it.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-5-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The GDT may get overwritten during the copy or during extract_kernel,
which will cause problems if any segment register is touched before the
GDTR is reloaded by the decompressed kernel. For safety update the GDTR
to point to the GDT within the copied kernel.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-4-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When booting in mixed mode, the firmware's GDT is still installed at
handover entry in efi32_stub_entry. We save the GDTR for later use in
__efi64_thunk but we are assuming that descriptor 2 (__KERNEL_CS) is a
valid 32-bit code segment descriptor and that descriptor 3
(__KERNEL_DS/__BOOT_DS) is a valid data segment descriptor.
This happens to be true for OVMF (it actually uses descriptor 1 for data
segments, but descriptor 3 is also setup as data), but we shouldn't
depend on this being the case.
Fix this by saving the code and data selectors in addition to the GDTR
in efi32_stub_entry, and restoring them in __efi64_thunk before calling
the firmware. The UEFI specification guarantees that selectors will be
flat, so using the DS selector for all the segment registers should be
enough.
We also need to install our own GDT before initializing segment
registers in startup_32, so move the GDT load up to the beginning of the
function.
[ardb: mention mixed mode in the commit log]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-3-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Commit
5b11f1cee5 ("x86, boot: straighten out ranges to copy/zero in
compressed/head*.S")
introduced a separate .pgtable section, splitting it out from the rest
of .bss. This section was added without the writeable flag, marking it
as read-only. This results in the linker putting the .rela.dyn section
(containing bogus dynamic relocations from head_64.o) after the .bss and
.pgtable sections.
When objcopy is used to convert compressed/vmlinux into a binary for
the bzImage:
$ objcopy -O binary -R .note -R .comment -S arch/x86/boot/compressed/vmlinux \
arch/x86/boot/vmlinux.bin
the .bss and .pgtable sections get materialized as ~176KiB of zero
bytes in the binary in order to place .rela.dyn at the correct location.
Fix this by marking .pgtable as writeable. This moves the .rela.dyn
section up in the ELF image layout so that .bss and .pgtable are the
last allocated sections and so don't appear in bzImage.
[ bp: Massage commit message. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200109150218.16544-1-nivedita@alum.mit.edu
GCC 10 changed the default to -fno-common, which leads to
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order'; \
arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here
make[2]: *** [arch/x86/boot/compressed/Makefile:119: arch/x86/boot/compressed/vmlinux] Error 1
Since __force_order is already provided in pgtable_64.c, there is no
need to declare __force_order in kaslr_64.c.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200124181811.4780-1-hjl.tools@gmail.com
z_output_len is the size of the decompressed payload (i.e. vmlinux +
vmlinux.relocs) and is generated as an unsigned 32-bit quantity by
mkpiggy.c.
The current
movq $z_output_len, %r9
instruction generates a sign-extended move to %r9. Using
movl $z_output_len, %r9d
will instead zero-extend into %r9, which is appropriate for an unsigned
32-bit quantity. This is also what is already done for z_input_len, the
size of the compressed payload.
[ bp:
Also, z_output_len cannot be a 64-bit quantity because it participates
in:
init_size: .long INIT_SIZE # kernel initialization size
through INIT_SIZE which is a 32-bit quantity determined by the .long
directive (vs .quad for 64-bit). Furthermore, if it really must be a
64-bit quantity, then the insn must be MOVABS which can accommodate a
64-bit immediate and which the toolchain does not generate automatically.
]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200211173333.1722739-1-nivedita@alum.mit.edu
- fix randconfig to generate a sane .config
- rename hostprogs-y / always to hostprogs / always-y, which are
more natual syntax.
- optimize scripts/kallsyms
- fix yes2modconfig and mod2yesconfig
- make multiple directory targets ('make foo/ bar/') work
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl47NfMVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRGwP/3AHO8P0wGEeFKs3ziSMjs2W7/Pj
lN08Kuxm0u3LnyEEcHVUveoi+xBYqvrw0RsGgYf5S8q0Mpep7MPqbfkDUxV/0Zkj
QP2CsvOTbjdBjH7q3ojkwLcDl0Pxu9mg3eZMRXZ2WQeNXuMRw6Bicoh7ElvB1Bv/
HC+j30i2Me3cf/riQGSAsstvlXyIR8RaerR8PfRGESTysiiN76+JcHTatJHhOJL9
O6XKkzo8/CXMYKKVF4Ae4NP+WFg6E96/pAPx0Rf47RbPX9UG35L9rkzTDnk70Ms6
OhKiu3hXsRX7mkqApuoTqjge4+iiQcKZxYmMXU1vGlIRzjwg19/4YFP6pDSCcnIu
kKb8KN4o4N41N7MFS3OLZWwISA8Vw6RbtwDZ3AghDWb7EHb9oNW42mGfcAPr1+wZ
/KH6RHTzaz+5q2MgyMY1NhADFrhIT9CvDM+UJECgbokblnw7PHAnPmbsuVak9ZOH
u9ojO1HpTTuIYO6N6v4K5zQBZF1N+RvkmBnhHd8j6SksppsCoC/G62QxgXhF2YK3
FQMpATCpuyengLxWAmPEjsyyPOlrrdu9UxqNsXVy5ol40+7zpxuHwKcQKCa9urJR
rcpbIwLaBcLhHU4BmvBxUk5aZxxGV2F0O0gXTOAbT2xhd6BipZSMhUmN49SErhQm
NC/coUmQX7McxMXh
=sv4U
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- fix randconfig to generate a sane .config
- rename hostprogs-y / always to hostprogs / always-y, which are more
natual syntax.
- optimize scripts/kallsyms
- fix yes2modconfig and mod2yesconfig
- make multiple directory targets ('make foo/ bar/') work
* tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: make multiple directory targets work
kconfig: Invalidate all symbols after changing to y or m.
kallsyms: fix type of kallsyms_token_table[]
scripts/kallsyms: change table to store (strcut sym_entry *)
scripts/kallsyms: rename local variables in read_symbol()
kbuild: rename hostprogs-y/always to hostprogs/always-y
kbuild: fix the document to use extra-y for vmlinux.lds
kconfig: fix broken dependency in randconfig-generated .config
- Ensure that the PIT is set up when the local APIC is disable or
configured in legacy mode. This is caused by an ordering issue
introduced in the recent changes which skip PIT initialization when the
TSC and APIC frequencies are already known.
- Handle malformed SRAT tables during early ACPI parsing which caused an
infinite loop anda boot hang.
- Fix a long standing race in the affinity setting code which affects PCI
devices with non-maskable MSI interrupts. The problem is caused by the
non-atomic writes of the MSI address (destination APIC id) and data
(vector) fields which the device uses to construct the MSI message. The
non-atomic writes are mandated by PCI.
If both fields change and the device raises an interrupt after writing
address and before writing data, then the MSI block constructs a
inconsistent message which causes interrupts to be lost and subsequent
malfunction of the device.
The fix is to redirect the interrupt to the new vector on the current
CPU first and then switch it over to the new target CPU. This allows to
observe an eventually raised interrupt in the transitional stage (old
CPU, new vector) to be observed in the APIC IRR and retriggered on the
new target CPU and the new vector. The potential spurious interrupts
caused by this are harmless and can in the worst case expose a buggy
driver (all handlers have to be able to deal with spurious interrupts as
they can and do happen for various reasons).
- Add the missing suspend/resume mechanism for the HYPERV hypercall page
which prevents resume hibernation on HYPERV guests. This change got
lost before the merge window.
- Mask the IOAPIC before disabling the local APIC to prevent potentially
stale IOAPIC remote IRR bits which cause stale interrupt lines after
resume.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5AEJwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWY2D/47ur9gsVQGryKzneVAr0SCsq4Un11e
uifX4ldu4gCEBRTYhpgcpiFKeLvY/QJ6uOD+gQUHyy/s+lCf6yzE6UhXEqSCtcT7
LkSxD8jAFf6KhMA6iqYBfyxUsPMXBetLjjHWsyc/kf15O/vbYm7qf05timmNZkDS
S7C+yr3KRqRjLR7G7t4twlgC9aLcNUQihUdsH2qyTvjnlkYHJLDa0/Js7bFYYKVx
9GdUDLvPFB1mZ76g012De4R3kJsWitiyLlQ38DP5VysKulnszUCdiXlgCEFrgxvQ
OQhLafQzOAzvxQmP+1alODR0dmJZA8k0zsDeeTB/vTpRvv6+Pe2qUswLSpauBzuq
TpDsrv8/5pwZh28+91f/Unk+tH8NaVNtGe/Uf+ePxIkn1nbqL84o4NHGplM6R97d
HAWdZQZ1cGRLf6YRRJ+57oM/5xE3vBbF1Wn0+QDTFwdsk2vcxuQ4eB3M/8E1V7Zk
upp8ty50bZ5+rxQ8XTq/eb8epSRnfLoBYpi4ux6MIOWRdmKDl40cDeZCzA2kNP7m
qY1haaRN3ksqvhzc0Yf6cL+CgvC4ur8gRHezfOqmBzVoaLyVEFIVjgjR/ojf0bq8
/v+L9D5+IdIv4jEZruRRs0gOXNDzoBbvf0qKGaO0tUTWiDsv7c5AGixp8aozniHS
HXsv1lIpRuC7WQ==
=WxKD
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Ensure that the PIT is set up when the local APIC is disable or
configured in legacy mode. This is caused by an ordering issue
introduced in the recent changes which skip PIT initialization when
the TSC and APIC frequencies are already known.
- Handle malformed SRAT tables during early ACPI parsing which caused
an infinite loop anda boot hang.
- Fix a long standing race in the affinity setting code which affects
PCI devices with non-maskable MSI interrupts. The problem is caused
by the non-atomic writes of the MSI address (destination APIC id)
and data (vector) fields which the device uses to construct the MSI
message. The non-atomic writes are mandated by PCI.
If both fields change and the device raises an interrupt after
writing address and before writing data, then the MSI block
constructs a inconsistent message which causes interrupts to be
lost and subsequent malfunction of the device.
The fix is to redirect the interrupt to the new vector on the
current CPU first and then switch it over to the new target CPU.
This allows to observe an eventually raised interrupt in the
transitional stage (old CPU, new vector) to be observed in the APIC
IRR and retriggered on the new target CPU and the new vector.
The potential spurious interrupts caused by this are harmless and
can in the worst case expose a buggy driver (all handlers have to
be able to deal with spurious interrupts as they can and do happen
for various reasons).
- Add the missing suspend/resume mechanism for the HYPERV hypercall
page which prevents resume hibernation on HYPERV guests. This
change got lost before the merge window.
- Mask the IOAPIC before disabling the local APIC to prevent
potentially stale IOAPIC remote IRR bits which cause stale
interrupt lines after resume"
* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Mask IOAPIC entries when disabling the local APIC
x86/hyperv: Suspend/resume the hypercall page for hibernation
x86/apic/msi: Plug non-maskable MSI affinity race
x86/boot: Handle malformed SRAT tables during early ACPI parsing
x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
In old days, the "host-progs" syntax was used for specifying host
programs. It was renamed to the current "hostprogs-y" in 2004.
It is typically useful in scripts/Makefile because it allows Kbuild to
selectively compile host programs based on the kernel configuration.
This commit renames like follows:
always -> always-y
hostprogs-y -> hostprogs
So, scripts/Makefile will look like this:
always-$(CONFIG_BUILD_BIN2C) += ...
always-$(CONFIG_KALLSYMS) += ...
...
hostprogs := $(always-y) $(always-m)
I think this makes more sense because a host program is always a host
program, irrespective of the kernel configuration. We want to specify
which ones to compile by CONFIG options, so always-y will be handier.
The "always", "hostprogs-y", "hostprogs-m" will be kept for backward
compatibility for a while.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Break an infinite loop when early parsing of the SRAT table is caused
by a subtable with zero length. Known to affect the ASUS WS X299 SAGE
motherboard with firmware version 1201 which has a large block of
zeros in its SRAT table. The kernel could boot successfully on this
board/firmware prior to the introduction of early parsing this table or
after a BIOS update.
[ bp: Fixup whitespace damage and commit message. Make it return 0 to
denote that there are no immovable regions because who knows what
else is broken in this BIOS. ]
Fixes: 02a3e3cdb7 ("x86/boot: Parse SRAT table and count immovable memory regions")
Signed-off-by: Steven Clarkson <sc@lambdal.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-acpi@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206343
Link: https://lkml.kernel.org/r/CAHKq8taGzj0u1E_i=poHUam60Bko5BpiJ9jn0fAupFUYexvdUQ@mail.gmail.com
Pull x86 asm updates from Ingo Molnar:
"Misc updates:
- Remove last remaining calls to exception_enter/exception_exit() and
simplify the entry code some more.
- Remove force_iret()
- Add support for "Fast Short Rep Mov", which is available starting
with Ice Lake Intel CPUs - and make the x86 assembly version of
memmove() use REP MOV for all sizes when FSRM is available.
- Micro-optimize/simplify the 32-bit boot code a bit.
- Use a more future-proof SYSRET instruction mnemonic"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify calculation of output address
x86/entry/64: Add instruction suffix to SYSRET
x86: Remove force_iret()
x86/cpufeatures: Add support for fast short REP; MOVSB
x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
x86_64 EFI systems are unable to boot due to a typo in a recent commit:
EFI config tables not found.
-- System halted
This was probably due to the absense of CONFIG_EFI_MIXED=y in testing.
Fixes: 796eb8d26a ("efi/libstub/x86: Use const attribute for efi_is_64bit()")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20200122191430.4888-1-cai@lca.pw
Condense the calculation of decompressed kernel start a little.
Committer notes:
before:
ebp = ebx - (init_size - _end)
after:
eax = (ebx + _end) - init_size
where in both ebx contains the temporary address the kernel is moved to
for in-place decompression.
The before and after difference in register state is %eax and %ebp
but that is immaterial because the compressed image is not built with
-mregparm, i.e., all arguments of the following extract_kernel() call
are passed on the stack.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200107194436.2166846-1-nivedita@alum.mit.edu
The only users of these got removed, so they also need to be
removed to avoid warnings:
arch/x86/boot/compressed/eboot.c: In function 'setup_efi_pci':
arch/x86/boot/compressed/eboot.c:117:16: error: unused variable 'nr_pci' [-Werror=unused-variable]
unsigned long nr_pci;
^~~~~~
arch/x86/boot/compressed/eboot.c: In function 'setup_uga':
arch/x86/boot/compressed/eboot.c:244:16: error: unused variable 'nr_ugas' [-Werror=unused-variable]
unsigned long nr_ugas;
^~~~~~~
Fixes: 2732ea0d5c ("efi/libstub: Use a helper to iterate over a EFI handle array")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-4-ardb@kernel.org
Reduce the stack frame of the EFI stub's mixed mode thunk routine by
8 bytes, by moving the GDT and return addresses to EBP and EBX, which
we need to preserve anyway, since their top halves will be cleared by
the call into 32-bit firmware code. Doing so results in the UEFI code
being entered with a 16 byte aligned stack, as mandated by the UEFI
spec, fixing the last occurrence in the 64-bit kernel where we violate
this requirement.
Also, move the saved GDT from a global variable to an unused part of the
stack frame, and touch up some other parts of the code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-3-ardb@kernel.org
Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org
Introduce the ability to define macros to perform argument translation
for the calls that need it, and define them for the boot services that
we currently use.
When calling 32-bit firmware methods in mixed mode, all output
parameters that are 32-bit according to the firmware, but 64-bit in the
kernel (ie OUT UINTN * or OUT VOID **) must be initialized in the
kernel, or the upper 32 bits may contain garbage. Define macros that
zero out the upper 32 bits of the output before invoking the firmware
method.
When a 32-bit EFI call takes 64-bit arguments, the mixed-mode call must
push the two 32-bit halves as separate arguments onto the stack. This
can be achieved by splitting the argument into its two halves when
calling the assembler thunk. Define a macro to do this for the
free_pages boot service.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-17-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On x86 we need to thunk through assembler stubs to call the EFI services
for mixed mode, and for runtime services in 64-bit mode. The assembler
stubs have limits on how many arguments it handles. Introduce a few
macros to check that we do not try to pass too many arguments to the
stubs.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-16-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit c3710de506 ("efi/libstub/x86: Drop __efi_early() export and
efi_config struct") introduced a reference from C code in eboot.c to
the startup_32 symbol defined in the .S startup code. This results in
a GOT based reference to startup_32, and since GOT entries carry
absolute addresses, they need to be fixed up before they can be used.
On modern toolchains (binutils 2.26 or later), this reference is
relaxed into a R_386_GOTOFF relocation (or the analogous X86_64 one)
which never uses the absolute address in the entry, and so we get
away with not fixing up the GOT table before calling the EFI entry
point. However, GCC 4.6 combined with a binutils of the era (2.24)
will produce a true GOT indirected reference, resulting in a wrong
value to be returned for the address of startup_32() if the boot
code is not running at the address it was linked at.
Fortunately, we can easily override this behavior, and force GCC to
emit the GOTOFF relocations explicitly, by setting the visibility
pragma 'hidden'.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-3-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The mixed mode refactor actually broke mixed mode by failing to
pass the bootparam structure to startup_32(). This went unnoticed
because it apparently has a high tolerance for being passed random
junk, and still boots fine in some cases. So let's fix this by
populating %esi as required when entering via efi32_stub_entry,
and while at it, preserve the arguments themselves instead of their
address in memory (via the stack pointer) since that memory could
be clobbered before we get to it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-2-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of storing the return address in a global variable when calling
a 32-bit EFI service from the 64-bit stub, avoid the indirection via
efi_exit32, and take the return address from the stack.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-26-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The macros efi_call_early and efi_call_runtime are used to call EFI
boot services and runtime services, respectively. However, the naming
is confusing, given that the early vs runtime distinction may suggest
that these are used for calling the same set of services either early
or late (== at runtime), while in reality, the sets of services they
can be used with are completely disjoint, and efi_call_runtime is also
only usable in 'early' code.
So do a global sweep to replace all occurrences with efi_bs_call or
efi_rt_call, respectively, where BS and RT match the idiom used by
the UEFI spec to refer to boot time or runtime services.
While at it, use 'func' as the macro parameter name for the function
pointers, which is less likely to collide and cause weird build errors.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-24-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
None of the definitions of the efi_table_attr() still refer to
their 'table' argument so let's get rid of it entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-23-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After refactoring the mixed mode support code, efi_call_proto()
no longer uses its protocol argument in any of its implementation,
so let's remove it altogether.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-22-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mixed mode translates calls from the 64-bit kernel into the 32-bit
firmware by wrapping them in a call to a thunking routine that
pushes a 32-bit word onto the stack for each argument passed to the
function, regardless of the argument type. This works surprisingly
well for most services and protocols, with the exception of ones that
take explicit 64-bit arguments.
efi_free() invokes the FreePages() EFI boot service, which takes
a efi_physical_addr_t as its address argument, and this is one of
those 64-bit types. This means that the 32-bit firmware will
interpret the (addr, size) pair as a single 64-bit quantity, and
since it is guaranteed to have the high word set (as size > 0),
it will always fail due to the fact that EFI memory allocations are
always < 4 GB on 32-bit firmware.
So let's fix this by giving the thunking code a little hand, and
pass two values for the address, and a third one for the size.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-21-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have a helper efi_system_table() that gives us the address of the
EFI system table in memory, so there is no longer point in passing
it around from each function to the next.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-20-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As a first step towards getting rid of the need to pass around a function
parameter 'sys_table_arg' pointing to the EFI system table, remove the
references to it in the printing code, which is represents the majority
of the use cases.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-19-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The various pointers we stash in the efi_config struct which we
retrieve using __efi_early() are simply copies of the ones in
the EFI system table, which we have started accessing directly
in the previous patch. So drop all the __efi_early() related
plumbing, as well as all the assembly code dealing with efi_config,
which allows us to move the PE/COFF entry point to C code as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-18-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a single implementation for efi_char16_printk() across all
architectures.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-17-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The efi_call macros on ARM have a dependency on a variable 'sys_table_arg'
existing in the scope of the macro instantiation. Since this variable
always points to the same data structure, let's create a global getter
for it and use that instead.
Note that the use of a global variable with external linkage is avoided,
given the problems we had in the past with early processing of the GOT
tables.
While at it, drop the redundant casts in the efi_table_attr and
efi_call_proto macros.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-16-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We use special wrapper routines to invoke firmware services in the
native case as well as the mixed mode case. For mixed mode, the need
is obvious, but for the native cases, we can simply rely on the
compiler to generate the indirect call, given that GCC now has
support for the MS calling convention (and has had it for quite some
time now). Note that on i386, the decompressor and the EFI stub are not
built with -mregparm=3 like the rest of the i386 kernel, so we can
safely allow the compiler to emit the indirect calls here as well.
So drop all the wrappers and indirection, and switch to either native
calls, or direct calls into the thunk routine for mixed mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-14-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Annotate all the firmware routines (boot services, runtime services and
protocol methods) called in the boot context as __efiapi, and make
it expand to __attribute__((ms_abi)) on 64-bit x86. This allows us
to use the compiler to generate the calls into firmware that use the
MS calling convention instead of the SysV one.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-13-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We will soon remove another level of pointer casting, so let's make
sure all type handling involving firmware calls at boot time is correct.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-12-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we have incorporated the mixed mode protocol definitions
into the native ones using unions, we no longer need the separate
32/64 bit struct definitions, with the exception of the EFI system
table definition and the boot services, runtime services and
configuration table definitions. So drop the unused ones.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-11-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, we support mixed mode by casting all boot time firmware
calls to 64-bit explicitly on native 64-bit systems, and to 32-bit
on 32-bit systems or 64-bit systems running with 32-bit firmware.
Due to this explicit awareness of the bitness in the code, we do a
lot of casting even on generic code that is shared with other
architectures, where mixed mode does not even exist. This casting
leads to loss of coverage of type checking by the compiler, which
we should try to avoid.
So instead of distinguishing between 32-bit vs 64-bit, distinguish
between native vs mixed, and limit all the nasty casting and
pointer mangling to the code that actually deals with mixed mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-10-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of moving to a native vs. mixed mode split rather than a
32 vs. 64 bit split when it comes to invoking EFI firmware services,
update all the native protocol definitions and redefine them as unions
containing an anonymous struct for the native view and a struct called
'mixed_mode' describing the 32-bit view of the protocol when called from
64-bit code.
While at it, flesh out some PCI I/O member definitions that we will be
needing shortly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-9-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Iterating over a EFI handle array is a bit finicky, since we have
to take mixed mode into account, where handles are only 32-bit
while the native efi_handle_t type is 64-bit.
So introduce a helper, and replace the various occurrences of
this pattern.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-8-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI mixed mode entry code goes through the ordinary startup_32()
routine before jumping into the kernel's EFI boot code in 64-bit
mode. The 32-bit startup code must be entered with paging disabled,
but this is not documented as a requirement for the EFI handover
protocol, and so we should disable paging explicitly when entering
the kernel from 32-bit EFI firmware.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224132909.102540-4-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce a new READELF variable to top-level Makefile, so the name of
readelf binary can be specified.
Before this change the name of the binary was hardcoded to
"$(CROSS_COMPILE)readelf" which might not be present for every
toolchain.
This allows to build with LLVM Object Reader by using make parameter
READELF=llvm-readelf.
Link: https://github.com/ClangBuiltLinux/linux/issues/771
Signed-off-by: Dmitry Golovin <dima@golovin.in>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
- Update the ACPICA code in the kernel to upstream revision 20191018
including:
* Fixes for Clang warnings (Bob Moore).
* Fix for possible overflow in get_tick_count() (Bob Moore).
* Introduction of acpi_unload_table() (Bob Moore).
* Debugger and utilities updates (Erik Schmauss).
* Fix for unloading tables loaded via configfs (Nikolaus Voss).
- Add support for EFI specific purpose memory to optionally allow
either application-exclusive or core-kernel-mm managed access to
differentiated memory (Dan Williams).
- Fix and clean up processing of the HMAT table (Brice Goglin,
Qian Cai, Tao Xu).
- Update the ACPI EC driver to make it work on systems with
hardware-reduced ACPI (Daniel Drake).
- Always build in support for the Generic Event Device (GED) to
allow one kernel binary to work both on systems with full
hardware ACPI and hardware-reduced ACPI (Arjan van de Ven).
- Fix the table unload mechanism to unregister platform devices
created when the given table was loaded (Andy Shevchenko).
- Rework the lid blacklist handling in the button driver and add
more lid quirks to it (Hans de Goede).
- Improve ACPI-based device enumeration for some platforms based
on Intel BayTrail SoCs (Hans de Goede).
- Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC
and prevent handlers from being registered for unhandled PMIC
OpRegions (Hans de Goede).
- Unify ACPI _HID/_UID matching (Andy Shevchenko).
- Clean up documentation and comments (Cao jin, James Pack, Kacper
Piwiński).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl3dHNkSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx/NkP/2y6DWjslA6UW4gjZwaRBcjYoyWExMtQ
Z86goiRJtP+/NqOwm09wHFcV6FdZ4kitUno3UgMCDZJjrURapg1D0rxb1lSYtMzs
mGr2FBZlVsJ9erOVSzKj1x2afVhdgl0Rl0fxPzoKgCFt8tCJar6cXy4CVEQKdeLs
eUui2ksXMIEODGhpN/tr/fJqY4O4jlLmPY6gKWfFpSTsv6lnZmzcCxLf5EvUU7JW
O91/jXdWz4Vl6IdP32sce6dGDjkvwnY105c7HeBf5EQWUe9RHFuSex982qhCD8U+
iE+JzlhoYpUb03EktJSXbL++IKUHvoUpTanbhka6unMhazC86x0hDf7ruUtYo2Bk
V8347CFeQ1x2O5IabfJNnUfKaMYhYmOXIoFHJTLKFO5mcCJmP8KOOyDAYilC1psb
RJpl1fDoAhk7NqhMttyBqfxiotP0kMoKuqtAAl8Y0hTF0DwR9IfKntuTtp1yTGds
R4dpJrizUDzw1/o4fCWbc3dFZQR3NFGpL/EAyfPzqjGaeaBBkLoNYstqkal5XHwT
CILmQg2WHoNuQLXZ4NFFDrM2k2G+VUAjQdkYcb/MCOFbw+aTVPu1wyQq37RLtbMo
9UwGeeT6SXW3iA1nyMoM+YvitjmxS7gHPPPl+b9G6kBubAzBPp91Ra0Mj9dPIGRB
Evv5nzOIh8Hi
=7Cqr
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20191018, add support for EFI specific purpose memory, update the ACPI
EC driver to make it work on systems with hardware-reduced ACPI,
improve ACPI-based device enumeration for some platforms, rework the
lid blacklist handling in the button driver and add more lid quirks to
it, unify ACPI _HID/_UID matching, fix assorted issues and clean up
the code and documentation.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20191018
including:
* Fixes for Clang warnings (Bob Moore)
* Fix for possible overflow in get_tick_count() (Bob Moore)
* Introduction of acpi_unload_table() (Bob Moore)
* Debugger and utilities updates (Erik Schmauss)
* Fix for unloading tables loaded via configfs (Nikolaus Voss)
- Add support for EFI specific purpose memory to optionally allow
either application-exclusive or core-kernel-mm managed access to
differentiated memory (Dan Williams)
- Fix and clean up processing of the HMAT table (Brice Goglin, Qian
Cai, Tao Xu)
- Update the ACPI EC driver to make it work on systems with
hardware-reduced ACPI (Daniel Drake)
- Always build in support for the Generic Event Device (GED) to allow
one kernel binary to work both on systems with full hardware ACPI
and hardware-reduced ACPI (Arjan van de Ven)
- Fix the table unload mechanism to unregister platform devices
created when the given table was loaded (Andy Shevchenko)
- Rework the lid blacklist handling in the button driver and add more
lid quirks to it (Hans de Goede)
- Improve ACPI-based device enumeration for some platforms based on
Intel BayTrail SoCs (Hans de Goede)
- Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC and
prevent handlers from being registered for unhandled PMIC OpRegions
(Hans de Goede)
- Unify ACPI _HID/_UID matching (Andy Shevchenko)
- Clean up documentation and comments (Cao jin, James Pack, Kacper
Piwiński)"
* tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
ACPI: OSI: Shoot duplicate word
ACPI: HMAT: use %u instead of %d to print u32 values
ACPI: NUMA: HMAT: fix a section mismatch
ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxm
ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device
ACPI: NUMA: HMAT: Register HMAT at device_initcall level
device-dax: Add a driver for "hmem" devices
dax: Fix alloc_dax_region() compile warning
lib: Uplevel the pmem "region" ida to a global allocator
x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP
arm/efi: EFI soft reservation to memblock
x86/efi: EFI soft reservation to E820 enumeration
efi: Common enable/disable infrastructure for EFI soft reservation
x86/efi: Push EFI_MEMMAP check into leaf routines
efi: Enumerate EFI_MEMORY_SP
ACPI: NUMA: Establish a new drivers/acpi/numa/ directory
ACPICA: Update version to 20191018
ACPICA: debugger: remove leading whitespaces when converting a string to a buffer
ACPICA: acpiexec: initialize all simple types and field units from user input
ACPICA: debugger: add field unit support for acpi_db_get_next_token
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Wire up the EFI RNG code for x86. This enables an additional source
of entropy during early boot.
- Enable the TPM event log code on ARM platforms.
- Update Ard's email address"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: libstub/tpm: enable tpm eventlog function for ARM platforms
x86: efi/random: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
efi/random: use arch-independent efi_call_proto()
MAINTAINERS: update Ard's email address to @kernel.org
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
Pull x86 boot updates from Ingo Molnar:
"The main changes were:
- Extend the boot protocol to allow future extensions without hitting
the setup_header size limit.
- Add quirk to devicetree systems to disable the RTC unless it's
listed as a supported device.
- Fix ld.lld linker pedantry"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Introduce setup_indirect
x86/boot: Introduce kernel_info.setup_type_max
x86/boot: Introduce kernel_info
x86/init: Allow DT configured systems to disable RTC at boot time
x86/realmode: Explicitly set entry point via ENTRY in linker script
This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The setup_data is a bit awkward to use for extremely large data objects,
both because the setup_data header has to be adjacent to the data object
and because it has a 32-bit length field. However, it is important that
intermediate stages of the boot process have a way to identify which
chunks of memory are occupied by kernel data. Thus introduce an uniform
way to specify such indirect data as setup_indirect struct and
SETUP_INDIRECT type.
And finally bump setup_header version in arch/x86/boot/header.S.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-4-daniel.kiper@oracle.com
This field contains maximal allowed type for setup_data.
Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-3-daniel.kiper@oracle.com
The relationships between the headers are analogous to the various data
sections:
setup_header = .data
boot_params/setup_data = .bss
What is missing from the above list? That's right:
kernel_info = .rodata
We have been (ab)using .data for things that could go into .rodata or .bss for
a long time, for lack of alternatives and -- especially early on -- inertia.
Also, the BIOS stub is responsible for creating boot_params, so it isn't
available to a BIOS-based loader (setup_data is, though).
setup_header is permanently limited to 144 bytes due to the reach of the
2-byte jump field, which doubles as a length field for the structure, combined
with the size of the "hole" in struct boot_params that a protected-mode loader
or the BIOS stub has to copy it into. It is currently 119 bytes long, which
leaves us with 25 very precious bytes. This isn't something that can be fixed
without revising the boot protocol entirely, breaking backwards compatibility.
boot_params proper is limited to 4096 bytes, but can be arbitrarily extended
by adding setup_data entries. It cannot be used to communicate properties of
the kernel image, because it is .bss and has no image-provided content.
kernel_info solves this by providing an extensible place for information about
the kernel image. It is readonly, because the kernel cannot rely on a
bootloader copying its contents anywhere, but that is OK; if it becomes
necessary it can still contain data items that an enabled bootloader would be
expected to copy into a setup_data chunk.
Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-2-daniel.kiper@oracle.com
Given that EFI_MEMORY_SP is platform BIOS policy decision for marking
memory ranges as "reserved for a specific purpose" there will inevitably
be scenarios where the BIOS omits the attribute in situations where it
is desired. Unlike other attributes if the OS wants to reserve this
memory from the kernel the reservation needs to happen early in init. So
early, in fact, that it needs to happen before e820__memblock_setup()
which is a pre-requisite for efi_fake_memmap() that wants to allocate
memory for the updated table.
Introduce an x86 specific efi_fake_memmap_early() that can search for
attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
accordingly.
The KASLR code that scans the command line looking for user-directed
memory reservations also needs to be updated to consider
"efi_fake_mem=nn@ss:0x40000" requests.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a specific
purpose".
The proposed Linux behavior for specific purpose memory is that it is
reserved for direct-access (device-dax) by default and not available for
any kernel usage, not even as an OOM fallback. Later, through udev
scripts or another init mechanism, these device-dax claimed ranges can
be reconfigured and hot-added to the available System-RAM with a unique
node identifier. This device-dax management scheme implements "soft" in
the "soft reserved" designation by allowing some or all of the
reservation to be recovered as typical memory. This policy can be
disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
efi=nosoftreserve.
This patch introduces 2 new concepts at once given the entanglement
between early boot enumeration relative to memory that can optionally be
reserved from the kernel page allocator by default. The new concepts
are:
- E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
attribute on EFI_CONVENTIONAL memory, update the E820 map with this
new type. Only perform this classification if the
CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
typical ram.
- IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
a device driver to search iomem resources for application specific
memory. Teach the iomem code to identify such ranges as "Soft Reserved".
Note that the comment for do_add_efi_memmap() needed refreshing since it
seemed to imply that the efi map might overflow the e820 table, but that
is not an issue as of commit 7b6e4ba3cb "x86/boot/e820: Clean up the
E820_X_MAX definition" that removed the 128 entry limit for
e820__range_add().
A follow-on change integrates parsing of the ACPI HMAT to identify the
node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
now, just identify and reserve memory of this type.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Invoke the EFI_RNG_PROTOCOL protocol in the context of the x86 EFI stub,
same as is done on arm/arm64 since commit 568bc4e870 ("efi/arm*/libstub:
Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table"). Within the stub,
a Linux-specific RNG seed UEFI config table will be seeded. The EFI routines
in the core kernel will pick that up later, yet still early during boot,
to seed the kernel entropy pool. If CONFIG_RANDOM_TRUST_BOOTLOADER, entropy
is credited for this seed.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.
It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:
1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
by the loader.
2. There isn't enough room to contain the kernel, starting from the
default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
kernel.
EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.
It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.
The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.
To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.
[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When building with "EXTRA_CFLAGS=-Wall" gcc warns:
arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function]
get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these
config options is disabled has this issue.
Move the function under the same ifdef guard as the call site.
[ tglx: Add context to the changelog so it becomes useful ]
Fixes: 41fa1ee9c6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Now, ENTRY/ENDPROC can be forced to be undefined on X86, so do so.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-28-jslaby@suse.cz
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the
last users.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate]
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Acked-by: Herbert Xu <herbert@gondor.apana.org.au> [crypto]
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Armijn Hemel <armijn@tjaldur.nl>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
All these are functions which are invoked from elsewhere but they are
not typical C functions. So annotate them using the new SYM_CODE_START.
All these were not balanced with any END, so mark their ends by
SYM_CODE_END appropriately too.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [power mgmt]
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-23-jslaby@suse.cz
There are a couple of assembly functions which are invoked only locally
in the file they are defined. In C, they are marked "static". In
assembly, annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch
their ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE is used,
depends on whether ENDPROC or END was used for a particular function
before.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191011115108.12392-21-jslaby@suse.cz
Use the new SYM_DATA, SYM_DATA_START, and SYM_DATA_END* macros for data,
so that the data in the object file look sane:
Value Size Type Bind Vis Ndx Name
0000 10 OBJECT GLOBAL DEFAULT 3 efi32_boot_gdt
000a 10 OBJECT LOCAL DEFAULT 3 save_gdt
0014 8 OBJECT LOCAL DEFAULT 3 func_rt_ptr
001c 48 OBJECT GLOBAL DEFAULT 3 efi_gdt64
004c 0 OBJECT LOCAL DEFAULT 3 efi_gdt64_end
0000 48 OBJECT LOCAL DEFAULT 3 gdt
0030 0 OBJECT LOCAL DEFAULT 3 gdt_end
0030 8 OBJECT LOCAL DEFAULT 3 efi_config
0038 49 OBJECT GLOBAL DEFAULT 3 efi32_config
0069 49 OBJECT GLOBAL DEFAULT 3 efi64_config
All have correct size and type now.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-13-jslaby@suse.cz
.Lrelocated, .Lpaging_enabled, .Lno_longmode, and .Lin_pm32 are
self-standing local functions, annotate them as such and preserve "no
alignment".
The annotations do not generate anything yet.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-8-jslaby@suse.cz
The kernel image map is created using PMD pages, which can include
some extra space beyond what's actually needed. Round the size of the
memory hole we search for up to the next PMD boundary, to be certain
all of the space to be mapped is usable RAM and includes no reserved
areas.
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
It was observed that the kernel embeds the absolute build path in the
x86 boot image when the __FILE__ macro is expanded.
> From https://bugzilla.yoctoproject.org/show_bug.cgi?id=13458:
If you turn on the buildpaths QA test, or try a reproducible build, you
discover that the kernel image contains build paths.
$ strings bzImage-5.0.19-yocto-standard |grep tmp/
out of pgt_buf in
/data/poky-tmp/reproducible/tmp/work-shared/qemux86-64/kernel-source/arch/x86/boot/compressed/kaslr_64.c!?
But what's this in the top-level Makefile:
$ git grep prefix-map
Makefile:KBUILD_CFLAGS += $(call
cc-option,-fmacro-prefix-map=$(srctree)/=)
So the __FILE__ shouldn't be using the full path. However
arch/x86/boot/compressed/Makefile has this:
KBUILD_CFLAGS := -m$(BITS) -O2
So that clears KBUILD_FLAGS, removing the -fmacro-prefix-map option.
Use -fmacro-prefix-map to have relative paths in the boot image too.
[ bp: Massage commit message and put the KBUILD_CFLAGS addition in
..boot/Makefile after the KBUILD_AFLAGS assignment because gas
doesn't support -fmacro-prefix-map. ]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
Signed-off-by: Ross Burton <ross.burton@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Rimar <grimar@accesssoftek.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190926093226.8568-1-ross.burton@intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204333
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
Boris suggests to make a local label (prepend ".L") to these functions
to eliminate them from the symbol table. These are functions with very
local names and really should not be visible anywhere.
Note that objtool won't see these functions anymore (to generate ORC
debug info). But all the functions are not annotated with ENDPROC, so
they won't have objtool's attention anyway.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve Winslow <swinslow@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20190906075550.23435-2-jslaby@suse.cz
Gustavo noticed that 'new' can be left uninitialized if 'bios_start'
happens to be less or equal to 'entry->addr + entry->size'.
Initialize the variable at the begin of the iteration to the current value
of 'bios_start'.
Fixes: 0a46fff2f9 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table")
Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box
This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to modify the workings of hardware. Reject
the option when the kernel is locked down. This requires some reworking
of the existing RSDP command line logic, since the early boot code also
makes use of a command-line passed RSDP when locating the SRAT table
before the lockdown code has been initialised. This is achieved by
separating the command line RSDP path in the early boot code from the
generic RSDP path, and then copying the command line RSDP into boot
params in the kernel proper if lockdown is not enabled. If lockdown is
enabled and an RSDP is provided on the command line, this will only be
used when parsing SRAT (which shouldn't permit kernel code execution)
and will be ignored in the rest of the kernel.
(Modified by Matthew Garrett in order to handle the early boot RSDP
environment)
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Dave Young <dyoung@redhat.com>
cc: linux-acpi@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
BIOS on Samsung 500C Chromebook reports very rudimentary E820 table that
consists of 2 entries:
BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable
BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved
It breaks logic in find_trampoline_placement(): bios_start lands on the
end of the first 4k page and trampoline start gets placed below 0.
Detect underflow and don't touch bios_start for such cases. It makes
kernel ignore E820 table on machines that doesn't have two usable pages
below BIOS_START_MAX.
Fixes: 1b3a626436 ("x86/boot/compressed/64: Validate trampoline placement against E820")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463
Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
Kernel build warns:
'sanitize_boot_params' defined but not used [-Wunused-function]
at below files:
arch/x86/boot/compressed/cmdline.c
arch/x86/boot/compressed/error.c
arch/x86/boot/compressed/early_serial_console.c
arch/x86/boot/compressed/acpi.c
That's becausethey each include misc.h which includes a definition of
sanitize_boot_params() via bootparam_utils.h.
Remove the inclusion from misc.h and have the c file including
bootparam_utils.h directly.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.com
Pull x86 boot updates from Thomas Gleixner:
"Assorted updates to kexec/kdump:
- Proper kexec support for 4/5-level paging and jumping from a
5-level to a 4-level paging kernel.
- Make the EFI support for kexec/kdump more robust
- Enforce that the GDT is properly aligned instead of getting the
alignment by chance"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kdump/64: Restrict kdump kernel reservation to <64TB
x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel
x86/boot: Add xloadflags bits to check for 5-level paging support
x86/boot: Make the GDT 8-byte aligned
x86/kexec: Add the ACPI NVS region to the ident map
x86/boot: Call get_rsdp_addr() after console_init()
Revert "x86/boot: Disable RSDP parsing temporarily"
x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels
x86/kexec: Add the EFI system tables and ACPI tables to the ident map
The segment descriptors are loaded with an implicitly LOCK-ed instruction,
which could trigger the split lock #AC exception if the variable is not
properly aligned and crosses a cache line.
Align the GDT properly so the descriptors are all 8 byte aligned.
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20190627045525.105266-1-xiaoyao.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is part of the linux kernel and is made available under
the terms of the gnu general public license version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 28 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.534229504@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... so that early debugging output from the RSDP parsing code can be
visible and collected.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: kexec@lists.infradead.org
Cc: x86@kernel.org
TODO:
- ask dyoung and Dirk van der Merwe <dirk.vandermerwe@netronome.com> to
test again.
This reverts commit 36f0c42355.
Now that the required fixes are in place, reenable early RSDP parsing.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Commit
3a63f70bf4 ("x86/boot: Early parse RSDP and save it in boot_params")
broke kexec boot on EFI systems. efi_get_rsdp_addr() in the early
parsing code tries to search RSDP from the EFI tables but that will
crash because the table address is virtual when the kernel was booted by
kexec (set_virtual_address_map() has run in the first kernel and cannot
be run again in the second kernel).
In the case of kexec, the physical address of EFI tables is provided via
efi_setup_data in boot_params, which is set up by kexec(1).
Factor out the table parsing code and use different pointers depending
on whether the kernel is booted by kexec or not.
[ bp: Massage. ]
Fixes: 3a63f70bf4 ("x86/boot: Early parse RSDP and save it in boot_params")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190408231011.GA5402@jeru.linux.bs1.fc.nec.co.jp
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 46 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.135501091@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The KASAN subsystem wraps calls to memcpy(), memset() and memmove()
to sanitize the arguments before invoking the actual routines, which
have been renamed to __memcpy(), __memset() and __memmove(),
respectively. When CONFIG_KASAN is enabled for the kernel build but
KASAN code generation is disabled for the compilation unit (which is
needed for things like the EFI stub or the decompressor), the string
routines are just #define'd to their __ prefixed names so that they
are simply invoked directly.
This does however rely on those __ prefixed names to exist in the
symbol namespace, which is not currently the case for the x86
decompressor, which may lead to errors like
drivers/firmware/efi/libstub/tpm.o: In function `efi_retrieve_tpm2_eventlog':
tpm.c:(.text+0x2a8): undefined reference to `__memcpy'
So let's expose the __ prefixed symbols in the decompressor when
KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpica:
ACPICA: Update version to 20190405
ACPICA: Namespace: add check to avoid null pointer dereference
ACPICA: Update version to 20190329
ACPICA: utilities: fix spelling of PCC to platform_comm_channel
ACPICA: Rename nameseg length macro/define for clarity
ACPICA: Rename nameseg compare macro for clarity
ACPICA: Rename nameseg copy macro for clarity
The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.
However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.
Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.
So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190419141952.GE10324@zn.tnic
ACPICA commit 92ec0935f27e217dff0b176fca02c2ec3d782bb5
ACPI_COMPARE_NAME changed to ACPI_COMPARE_NAMESEG
This clarifies (1) this is a compare on 4-byte namesegs, not
a generic compare. Improves understanding of the code.
Link: https://github.com/acpica/acpica/commit/92ec0935
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The declarations related to immovable memory handling are out of the
BOOT_COMPRESSED_MISC_H #ifdef scope, wrap them inside.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190304055546.18566-1-bhe@redhat.com
Pull x86 boot fix from Thomas Gleixner:
"A trivial fix for the previous x86/boot pull request which did not
make it in time"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Always return a value from process_mem_region
Pull x86 cleanups from Ingo Molnar:
"Various cleanups and simplifications, none of them really stands out,
they are all over the place"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uaccess: Remove unused __addr_ok() macro
x86/smpboot: Remove unused phys_id variable
x86/mm/dump_pagetables: Remove the unused prev_pud variable
x86/fpu: Move init_xstate_size() to __init section
x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section
x86/mtrr: Remove unused variable
x86/boot/compressed/64: Explain paging_prepare()'s return value
x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition
x86/asm/suspend: Drop ENTRY from local data
x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery
x86/boot: Save several bytes in decompressor
x86/trap: Remove useless declaration
x86/mm/tlb: Remove unused cpu variable
x86/events: Mark expected switch-case fall-throughs
x86/asm-prototypes: Remove duplicate include <asm/page.h>
x86/kernel: Mark expected switch-case fall-throughs
x86/insn-eval: Mark expected switch-case fall-through
x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls
x86/e820: Replace kmalloc() + memcpy() with kmemdup()
Pull x86 build updates from Ingo Molnar:
"Misc cleanups and a retpoline code generation optimization"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, retpolines: Raise limit for generating indirect calls from switch-case
x86/build: Use the single-argument OUTPUT_FORMAT() linker script command
x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
x86/build: Mark per-CPU symbols as absolute explicitly for LLD
Pull x86 boot updates from Ingo Molnar:
"Most of the changes center around the difficult problem of KASLR
pinning down hot-removable memory regions. At the very early stage
KASRL is making irreversible kernel address layout decisions we don't
have full knowledge about the memory maps yet.
So the changes from Chao Fan add this (parsing the RSDP table early),
together with fixes from Borislav Petkov"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed/64: Do not read legacy ROM on EFI system
x86/boot: Correct RSDP parsing with 32-bit EFI
x86/kexec: Fill in acpi_rsdp_addr from the first kernel
x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE
x86/boot: Fix cmdline_find_option() prototype visibility
x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only
x86/boot: Parse SRAT table and count immovable memory regions
x86/boot: Early parse RSDP and save it in boot_params
x86/boot: Search for RSDP in memory
x86/boot: Search for RSDP in the EFI tables
x86/boot: Add "acpi_rsdp=" early parsing
x86/boot: Copy kstrtoull() to boot/string.c
x86/boot: Build the command line parsing code unconditionally
When compiling with -Wreturn-type, clang warns:
arch/x86/boot/compressed/kaslr.c:704:1: warning: control may reach end of
non-void function [-Wreturn-type]
This function's return statement should have been placed outside the
ifdeffed region. Move it there.
Fixes: 690eaa5320 ("x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only")
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: bhe@redhat.com
Cc: kirill.shutemov@linux.intel.com
Cc: jflat@chromium.org
Link: https://lkml.kernel.org/r/20190302184929.28971-1-louis@kragniz.eu
EFI systems do not necessarily provide a legacy ROM. If the ROM is missing
the memory is not mapped at all.
Trying to dereference values in the legacy ROM area leads to a crash on
Macbook Pro.
Only look for values in the legacy ROM area for non-EFI system.
Fixes: 3548e131ec ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Pitam Mitra <pitamm@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Bockjoo Kim <bockjoo@phys.ufl.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190219075224.35058-1-kirill.shutemov@linux.intel.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202351
Guenter Roeck reported triple faults of a 64-bit VM using a 32-bit OVMF
EFI image. After some singlestepping of the image in gdb, it turned out
that some of the EFI config tables were at bogus addresses. Which, as
Ard pointed out, results from using the wrong efi_config_table typedef.
So switch all EFI table pointers to unsigned longs and convert them to
the proper typedef only when accessing them. This way, the proper table
type is being used.
Shorten variable names, while at it.
Fixes: 33f0df8d84 ("x86/boot: Search for RSDP in the EFI tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190208190248.GA10854@roeck-us.net
paging_prepare() returns a two-quadword structure which lands
into RDX:RAX:
- Address of the trampoline is returned in RAX.
- Non zero RDX means trampoline needs to enable 5-level paging.
Document that explicitly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kyle D Pelton <kyle.d.pelton@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206154756.matwldebbxkmlnae@black.fi.intel.com
RDMSR in the trampoline code overwrites EDX but that register is used
to indicate whether 5-level paging has to be enabled and if clobbered,
leads to failure to boot on a 5-level paging machine.
Preserve EDX on the stack while we are dealing with EFER.
Fixes: b677dfae5a ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode")
Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
When building randconfigs, one of the failures is:
ld: arch/x86/boot/compressed/kaslr.o: in function `choose_random_location':
kaslr.c:(.text+0xbf7): undefined reference to `count_immovable_mem_regions'
ld: kaslr.c:(.text+0xcbe): undefined reference to `immovable_mem'
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
because CONFIG_ACPI is not enabled in this particular .config but
CONFIG_MEMORY_HOTREMOVE is and count_immovable_mem_regions() is
unresolvable because it is defined in compressed/acpi.c which is the
compilation unit that depends on CONFIG_ACPI.
Add CONFIG_ACPI to the explicit dependencies for MEMORY_HOTREMOVE.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190205131033.9564-1-bp@alien8.de
ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")
enabled building the command line parsing code unconditionally but it
forgot to remove the respective ifdeffery around the prototypes in the
misc.h header, leading to
arch/x86/boot/compressed/acpi.c: In function ‘get_acpi_rsdp’:
arch/x86/boot/compressed/acpi.c:37:8: warning: implicit declaration of function \
‘cmdline_find_option’ [-Wimplicit-function-declaration]
ret = cmdline_find_option("acpi_rsdp", val, MAX_ADDR_LEN);
^~~~~~~~~~~~~~~~~~~
for configs where neither CONFIG_EARLY_PRINTK nor CONFIG_RANDOMIZE_BASE
was defined.
Drop the ifdeffery in the header too.
Fixes: ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/5c51daf0.83pQEkvDZILqoSYW%lkp@intel.com
Link: https://lkml.kernel.org/r/20190205131352.GA27396@zn.tnic
KASLR may randomly choose a range which is located in movable memory
regions. As a result, this will break memory hotplug and make the
movable memory chosen by KASLR immovable.
Therefore, limit KASLR to choose memory regions in the immovable range
after consulting the SRAT table.
[ bp:
- Rewrite commit message.
- Trim comments.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: caoj.fnst@cn.fujitsu.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-8-fanc.fnst@cn.fujitsu.com
Parse SRAT for the immovable memory regions and use that information to
control which offset KASLR selects so that it doesn't overlap with any
movable region.
[ bp:
- Move struct mem_vector where it is visible so that it builds.
- Correct comments.
- Rewrite commit message.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: <caoj.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <indou.takao@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <kasong@redhat.com>
Cc: <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <msys.mizuma@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-7-fanc.fnst@cn.fujitsu.com
The RSDP is needed by KASLR so parse it early and save it in
boot_params.acpi_rsdp_addr, before KASLR setup runs.
RSDP is needed by other kernel facilities so have the parsing code
built-in instead of a long "depends on" line in Kconfig.
[ bp:
- Trim commit message and comments
- Add CONFIG_ACPI dependency in the Makefile
- Move ->acpi_rsdp_addr assignment with the rest of boot_params massaging in extract_kernel().
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-6-fanc.fnst@cn.fujitsu.com
The immovable memory ranges information in the SRAT table is necessary
to fix the issue of KASLR not paying attention to movable memory regions
when selecting the offset. Therefore, SRAT needs to be parsed.
Depending on the boot: KEXEC/EFI/BIOS, the methods to compute RSDP are
different. When booting from EFI, the EFI table points to the RSDP. So
iterate over the EFI system tables in order to find the RSDP.
[ bp:
- Heavily massage commit message
- Trim comments
- Move the CONFIG_ACPI ifdeffery into the Makefile.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-4-fanc.fnst@cn.fujitsu.com
KASLR may randomly choose offsets which are located in movable memory
regions resulting in the movable memory becoming immovable.
The ACPI SRAT (System/Static Resource Affinity Table) describes memory
ranges including ranges of memory provided by hot-added memory devices.
In order to access SRAT, one needs the Root System Description Pointer
(RSDP) with which to find the Root/Extended System Description Table
(R/XSDT) which then contains the system description tables of which SRAT
is one of.
In case the RSDP address has been passed on the command line (kexec-ing
a second kernel) parse it from there.
[ bp: Rewrite the commit message and cleanup the code. ]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-3-fanc.fnst@cn.fujitsu.com
gdt64 represents the content of GDTR under x86-64, which actually needs
10 bytes only, ".long" & ".word" is superfluous.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <bp@alien8.de>
Cc: <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190123100014.23721-1-caoj.fnst@cn.fujitsu.com
In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM
when the hypervsior detects that the guest sets CR0.PG to 0. This causes
the guest OS to reboot when it tries to return from 32-bit trampoline code
because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but
EFER.LME=0. As a precaution, set EFER.LME=1 as part of long mode
activation procedure. This extra step won't cause any harm when Linux is
booted on a bare-metal machine.
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
The various x86 linker scripts use the three-argument linker script
command variant OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) which specifies
three object file formats when the -EL and -EB linker command line
options are used. When -EB is specified, OUTPUT_FORMAT issues the BIG
object file format, when -EL, LITTLE, respectively, and when neither is
specified, DEFAULT.
However, those -E[LB] options are not used by arch/x86/ so switch to the
simple OUTPUT_FORMAT(BFDNAME) macro variant.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190109181531.27513-1-bp@alien8.de
Since commit 9c2af1c737 ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The following commit:
d64934019f ("x86/efi: Use efi_exit_boot_services()")
introduced a regression on systems with large memory maps causing them
to hang on boot. The first "goto get_map" that was removed from
exit_boot() ensured there was enough room for the memory map when
efi_call_early(exit_boot_services) was called. This happens when
(nr_desc > ARRAY_SIZE(params->e820_table).
Chain of events:
exit_boot()
efi_exit_boot_services()
efi_get_memory_map <- at this point the mm can't grow over 8 desc
priv_func()
exit_boot_func()
allocate_e820ext() <- new mm grows over 8 desc from e820 alloc
efi_call_early(exit_boot_services) <- mm key doesn't match so retry
efi_call_early(get_memory_map) <- not enough room for new mm
system hangs
This patch allocates the e820 buffer before calling efi_exit_boot_services()
and fixes the regression.
[ mingo: minor cleanliness edits. ]
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
Pull x86 boot updates from Ingo Molnar:
"Two cleanups and a bugfix for a rare boot option combination"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Remove return value from handle_mem_options()
x86/corruption-check: Use pr_*() instead of printk()
x86/corruption-check: Fix panic in memory_corruption_check() when boot option without value is provided
Pull EFI updates from Ingo Molnar:
"The main changes are:
- Add support for enlisting the help of the EFI firmware to create
memory reservations that persist across kexec.
- Add page fault handling to the runtime services support code on x86
so we can more gracefully recover from buggy EFI firmware.
- Fix command line handling on x86 for the boot path that omits the
stub's PE/COFF entry point.
- Other assorted fixes and updates"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: boot: Fix EFI stub alignment
efi/x86: Call efi_parse_options() from efi_main()
efi/x86: earlyprintk - Add 64bit efi fb address support
efi/x86: drop task_lock() from efi_switch_mm()
efi/x86: Handle page faults occurring while running EFI runtime services
efi: Make efi_rts_work accessible to efi page fault handler
efi/efi_test: add exporting ResetSystem runtime service
efi/libstub: arm: support building with clang
efi: add API to reserve memory persistently across kexec reboot
efi/arm: libstub: add a root memreserve config table
efi: honour memory reservations passed via a linux specific config table
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.
In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
^~~~~
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
^
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20181013010713.6999-1-natechancellor@gmail.com
Commit
1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
can occasionally cause system resets when kexec-ing a second kernel even
if SEV is not active.
That's because get_sev_encryption_bit() uses 32-bit rIP-relative
addressing to read the value of enc_bit - a variable which caches a
previously detected encryption bit position - but kexec may allocate
the early boot code to a higher location, beyond the 32-bit addressing
limit.
In this case, garbage will be read and get_sev_encryption_bit() will
return the wrong value, leading to accessing memory with the wrong
encryption setting.
Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
rIP-relative addressing in the first place.
[ bp: massage commit message heavily. ]
Fixes: 1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: brijesh.singh@amd.com
Cc: kexec@lists.infradead.org
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: ghook@redhat.com
Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
Before this commit we were only calling efi_parse_options() from
make_boot_params(), but make_boot_params() only gets called if the
kernel gets booted directly as an EFI executable. So when booted through
e.g. grub we ended up not parsing the commandline in the boot code.
This makes the drivers/firmware/efi/libstub code ignore the "quiet"
commandline argument resulting in the following message being printed:
"EFI stub: UEFI Secure Boot is enabled."
Despite the quiet request. This commits adds an extra call to
efi_parse_options() to efi_main() to make sure that the options are
always processed. This fixes quiet not working.
This also fixes the libstub code ignoring nokaslr and efi=nochunk.
Reported-by: Peter Robinson <pbrobinson@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
It's not used by its sole user, so remove this unused functionality.
Also remove a stray unused variable that GCC didn't warn about for some reason.
Suggested-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kirill.shutemov@linux.intel.com
Link: http://lkml.kernel.org/r/20180807015705.21697-1-fanc.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbgYhCAAoJED2LAQed4NsGErAP/jt7gt76+N0PZmADBZqyVR/H
4k286g3OiT7DIcdvwqE5BRvu+zNOamDujnnXw63/jwu2RjrkLX/JnhzTbC0IZleZ
KeO4bU4ZH0WFa0Ny9pp0LAnzbXGMnQjDXygcUd5BFoEd5JSLKW2PISEEjRh6b5B7
swJRdgySFaMrUBRNf13FwH5EvX/D0xZQe/wFhFCOv6L4gJZFMmpGUIepgTjTUmxZ
wcNN6xxXg+ulLHVcPdPQ9EYssNHN5xNys02+IdIrhhXuNHji/TFm4dGYuU+dDGeE
Eu4O6Qs7pg0PFGrZ5gLxXDJEp75W+uaTNOqV+jcjq8MRxJuWxyy2biUeelKRT/KH
0iv4ZQJVOMOhl8fZgLtQaXHyQ++5uwd6kvPPf+XFdkogGAIXK0wKWLoALFEOXwb6
z1BBnFx09LrKPGt0ZlKX624OEczedv/UAFiSh3Ic2S3PFEpq4oHrEGhTnyKRobPv
OEcF3RqKjmAdK7PLy4kVpTLhkutkWWhw6Giy9qXUkXYJWonJR7NTQ1mIan2LoGZC
sGi+qKae/8xgO2Nerx59tZpkiHYTMfYeAo8frzWurOxm3YzEfaxNNGPl+IMW7VKz
cNPzQZ5tMUy4i4PAhk/gIWibnUTPfjDbWsZSMtIbO0GFcao56EvllwD8/awuy7lO
QkaAeZHFcF+qgU3muaYK
=Vsb2
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
- fix and improve recursive dependency detection in Kconfig
- fix parallel building of menuconfig/nconfig
- fix syntax error in clang-version.sh
- suppress distracting log from syncconfig
- remove obsolete "rpm" target
- remove VMLINUX_SYMBOL(_STR) macro entirely
- fix microblaze build with CONFIG_DYNAMIC_FTRACE
- move compiler test for dead code/data elimination to Kconfig
- rename well-known LDFLAGS variable to KBUILD_LDFLAGS
- misc fixes and cleanups
* tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: rename LDFLAGS to KBUILD_LDFLAGS
kbuild: pass LDFLAGS to recordmcount.pl
kbuild: test dead code/data elimination support in Kconfig
initramfs: move gen_initramfs_list.sh from scripts/ to usr/
vmlinux.lds.h: remove stale <linux/export.h> include
export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
Coccinelle: remove pci_alloc_consistent semantic to detect in zalloc-simple.cocci
kbuild: make sorting initramfs contents independent of locale
kbuild: remove "rpm" target, which is alias of "rpm-pkg"
kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt
kconfig: suppress "configuration written to .config" for syncconfig
kconfig: fix "Can't open ..." in parallel build
kbuild: Add a space after `!` to prevent parsing as file pattern
scripts: modpost: check memory allocation results
kconfig: improve the recursive dependency report
kconfig: report recursive dependency involving 'imply'
kconfig: error out when seeing recursive dependency
kconfig: add build-only configurator targets
scripts/dtc: consolidate include path options in Makefile
Commit a0f97e06a4 ("kbuild: enable 'make CFLAGS=...' to add
additional options to CC") renamed CFLAGS to KBUILD_CFLAGS.
Commit 222d394d30 ("kbuild: enable 'make AFLAGS=...' to add
additional options to AS") renamed AFLAGS to KBUILD_AFLAGS.
Commit 06c5040cdb ("kbuild: enable 'make CPPFLAGS=...' to add
additional options to CPP") renamed CPPFLAGS to KBUILD_CPPFLAGS.
For some reason, LDFLAGS was not renamed.
Using a well-known variable like LDFLAGS may result in accidental
override of the variable.
Kbuild generally uses KBUILD_ prefixed variables for the internally
appended options, so here is one more conversion to sanitize the
naming convention.
I did not touch Makefiles under tools/ since the tools build system
is a different world.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
To allow existing C code to be incorporated into the decompressor or the
UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports are
undesirable. Note that this gets rid of a rather dodgy redefine of
linux/export.h's header guard.
Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 boot updates from Thomas Gleixner:
"Boot code updates for x86:
- Allow to skip a given amount of huge pages for address layout
randomization on the kernel command line to prevent regressions in
the huge page allocation with small memory sizes
- Various cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use CC_SET()/CC_OUT() instead of open coding it
x86/boot/KASLR: Make local variable mem_limit static
x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)
x86/boot/KASLR: Add two new functions for 1GB huge pages handling
Pull EFI updates from Thomas Gleixner:
"The EFI pile:
- Make mixed mode UEFI runtime service invocations mutually
exclusive, as mandated by the UEFI spec
- Perform UEFI runtime services calls from a work queue so the calls
into the firmware occur from a kernel thread
- Honor the UEFI memory map attributes for live memory regions
configured by UEFI as a framebuffer. This works around a coherency
problem with KVM guests running on ARM.
- Cleanups, improvements and fixes all over the place"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Call guid_parse() against guid_t type of variable
efi/cper: Use consistent types for UUIDs
efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
efi: Deduplicate efi_open_volume()
efi/x86: Add missing NULL initialization in UGA draw protocol discovery
efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
efi/x86: Align efi_uga_draw_protocol typedef names to convention
efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
efi/x86: Prevent reentrant firmware calls in mixed mode
efi/esrt: Only call efi_mem_reserve() for boot services memory
fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
efi: Drop type and attribute checks in efi_mem_desc_lookup()
efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
efi: Remove the declaration of efi_late_init() as the function is unused
efi/cper: Avoid using get_seconds()
efi: Use a work queue to invoke EFI Runtime Services
efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
efi/x86: Clean up the eboot code
There were two report of boot failure cased by trampoline placed into
a reserved memory region. It can happen on machines that don't report
EBDA correctly.
Fix the problem by re-validating the found address against the E820 table.
If the address is in a reserved area, find the next usable region below the
initial address.
Fixes: 3548e131ec ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Dmitry Malkin <d.malkin@real-time-systems.com>
Reported-by: youling 257 <youling257@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180801133225.38121-1-kirill.shutemov@linux.intel.com
Fix the following sparse warning:
arch/x86/boot/compressed/kaslr.c:102:20: warning: symbol 'mem_limit' was not declared. Should it be static?
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/1532958273-47725-1-git-send-email-zhongjiang@huawei.com
Dirk Gouders reported that two consecutive "make" invocations on an
already compiled tree will show alternating behaviors:
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
DATAREL arch/x86/boot/compressed/vmlinux
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
$ make
CALL scripts/checksyscalls.sh
DESCEND objtool
CHK include/generated/compile.h
LD arch/x86/boot/compressed/vmlinux
ZOFFSET arch/x86/boot/zoffset.h
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
OBJCOPY arch/x86/boot/vmlinux.bin
BUILD arch/x86/boot/bzImage
Setup is 15644 bytes (padded to 15872 bytes).
System is 6663 kB
CRC 3eb90f40
Kernel: arch/x86/boot/bzImage is ready (#48)
Building modules, stage 2.
MODPOST 165 modules
He bisected it back to:
commit 98f7852537 ("x86/boot: Refuse to build with data relocations")
The root cause was the use of the "if_changed" kbuild function multiple
times for the same target. It was designed to only be used once per
target, otherwise it will effectively always trigger, flipping back and
forth between the two commands getting recorded by "if_changed". Instead,
this patch merges the two commands into a single function to get stable
build artifacts (i.e. .vmlinux.cmd), and a single build behavior.
Bisected-and-Reported-by: Dirk Gouders <dirk@gouders.net>
Fix-Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are a couple of places in the x86 EFI stub code where we select
between 32-bit and 64-bit versions of the support routines based on
the value of efi_early->is64. Referencing that field directly is a
bad idea, since it prevents the compiler from inferring that this
field can never be true on a 32-bit build, and can only become false
on a 64-bit build if support for mixed mode is compiled in. This
results in dead code to be retained in the uncompressed part of the
kernel image, which is wasteful.
So switch to the efi_is_64bit() helper, which will resolve to a
constant boolean unless building for 64-bit with mixed mode support.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-8-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's one ARM, one x86_32 and one x86_64 version of efi_open_volume()
which can be folded into a single shared version by masking their
differences with the efi_call_proto() macro introduced by commit:
3552fdf29f ("efi: Allow bitness-agnostic protocol calls").
To be able to dereference the device_handle attribute from the
efi_loaded_image_t table in an arch- and bitness-agnostic manner,
introduce the efi_table_attr() macro (which already exists for x86)
to arm and arm64.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-7-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The UGA draw protocol discovery routine looks for a EFI handle that has
both the UGA draw protocol and the PCI I/O protocol installed. It checks
for the latter by calling handle_protocol() and pass it a PCI I/O
protocol pointer variable by reference, but fails to initialize it to
NULL, which means the non-NULL check later on in the code could produce
false positives, given that the return code of the handle_protocol() call
is ignored entirely. So add the missing initialization.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The two versions of setup_uga##() are mostly identical, with the
exception of the size of EFI_HANDLE. So let's merge the two, and
pull the implementation into the calling function setup_uga().
Note that the 32-bit version was only mixed-mode safe by accident:
it only calls the get_mode() method of the UGA draw protocol, which
happens to be the first member, and so truncating the 64-bit void* at
offset 0 to 32 bits happens to produce the correct value. But let's
not rely on that, and use the proper API instead.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The linux-efi subsystem uses typedefs with the _t suffix to declare
data structures that originate in the UEFI spec. Our type mangling
for mixed mode depends on this convention, so rename the UGA drawing
protocols to allow efi_call_proto() to be used with them in a
subsequent patch.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-4-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After merging the 32-bit and 64-bit versions of the code that invokes
the PCI I/O protocol methods to preserve PCI ROM images in commit:
2c3625cb9f ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() ...")
there are still separate code paths for 32-bit and 64-bit, where the only
difference is the size of a EFI_HANDLE. So let's parameterize a single
implementation for that difference only, and get rid of the two copies of
the code.
While at it, rename __setup_efi_pci() to preserve_pci_rom_image() to
better reflect its purpose.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Various small cleanups:
- Standardize printk messages:
'alloc' => 'allocate'
'mem' => 'memory'
also put variable names in printk messages between quotes.
- Align mass-assignments vertically for better readability
- Break multi-line function prototypes at the name where possible,
not in the middle of the parameter list
- Use a newline before return statements consistently.
- Use curly braces in a balanced fashion.
- Remove stray newlines.
No change in functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711094040.12506-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hans de Goede reported that his mixed EFI mode Bay Trail tablet
would not boot at all any more, but enter a reboot loop without
any logs printed by the kernel.
Unbreak 64-bit Linux/x86 on 32-bit UEFI:
When it was first introduced, the EFI stub code that copies the
contents of PCI option ROMs originally only intended to do so if
the EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM attribute was *not* set.
The reason was that the UEFI spec permits PCI option ROM images
to be provided by the platform directly, rather than via the ROM
BAR, and in this case, the OS can only access them at runtime if
they are preserved at boot time by copying them from the areas
described by PciIo->RomImage and PciIo->RomSize.
However, it implemented this check erroneously, as can be seen in
commit:
dd5fc854de ("EFI: Stash ROMs if they're not in the PCI BAR")
which introduced:
if (!attributes & EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM)
continue;
and given that the numeric value of EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM
is 0x4000, this condition never becomes true, and so the option ROMs
were copied unconditionally.
This was spotted and 'fixed' by commit:
886d751a2e ("x86, efi: correct precedence of operators in setup_efi_pci")
but inadvertently inverted the logic at the same time, defeating
the purpose of the code, since it now only preserves option ROM
images that can be read from the ROM BAR as well.
Unsurprisingly, this broke some systems, and so the check was removed
entirely in the following commit:
739701888f ("x86, efi: remove attribute check from setup_efi_pci")
It is debatable whether this check should have been included in the
first place, since the option ROM image provided to the UEFI driver by
the firmware may be different from the one that is actually present in
the card's flash ROM, and so whatever PciIo->RomImage points at should
be preferred regardless of whether the attribute is set.
As this was the only use of the attributes field, we can remove
the call to PciIo->Attributes() entirely, which is especially
nice because its prototype involves uint64_t type by-value
arguments which the EFI mixed mode has trouble dealing with.
Any mixed mode system with PCI is likely to be affected.
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711090235.9327-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When KASLR is enabled then 1GB huge pages allocations might regress
sporadically.
To reproduce on a KVM guest with 4GB RAM:
- add the following options to the kernel command-line:
'default_hugepagesz=1G hugepagesz=1G hugepages=1'
- boot the guest and check number of 1GB pages reserved:
# grep HugePages_Total /proc/meminfo
- sporadically, every couple of bootups the output of this
command shows that when booting with "nokaslr" HugePages_Total is always 1,
while booting without "nokaslr" sometimes HugePages_Total is set as 0
(that is, reserving the 1GB page failed).
Note that you may need to boot a few times to trigger the issue,
because it's somewhat non-deterministic.
The root cause is that kernel may be put into the only good 1GB huge page
in the [0x40000000, 0x7fffffff] physical range randomly.
Below is the dmesg output snippet from the KVM guest. We can see that only
[0x40000000, 0x7fffffff] region is good 1GB huge page,
[0x100000000, 0x13fffffff] will be touched by the memblock top-down allocation:
[...] e820: BIOS-provided physical RAM map:
[...] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[...] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[...] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[...] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[...] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[...] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[...] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[...] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
Besides, on bare-metal machines with larger memory, one less 1GB huge page
might be available with KASLR enabled. That too is because the kernel
image might be randomized into those "good" 1GB huge pages.
To fix this, firstly parse the kernel command-line to get how many 1GB huge
pages are specified. Then try to skip the specified number of 1GB huge
pages when decide which memory region kernel can be randomized into.
Also change the name of handle_mem_memmap() as handle_mem_options()
since it handles not only 'mem=' and 'memmap=', but also 'hugepagesxxx' now.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: douly.fnst@cn.fujitsu.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: indou.takao@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: lcapitulino@redhat.com
Cc: yasu.isimatu@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-3-bhe@redhat.com
[ Rewrote the changelog, fixed style problems in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce two new functions: parse_gb_huge_pages() and process_gb_huge_pages(),
which handle a conflict between KASLR and huge pages of 1GB.
These two functions will be used in the next patch:
- parse_gb_huge_pages() is used to parse kernel command-line to get
how many 1GB huge pages have been specified. A static global
variable 'max_gb_huge_pages' is added to store the number.
- process_gb_huge_pages() is used to skip as many 1GB huge pages
as possible from the passed in memory region according to the
specified number.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: douly.fnst@cn.fujitsu.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: indou.takao@jp.fujitsu.com
Cc: keescook@chromium.org
Cc: lcapitulino@redhat.com
Cc: yasu.isimatu@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-2-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
2c3625cb9f ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function")
... merged the two versions of __setup_efi_pciXX(), without taking into
account that the 32-bit version used a rather dodgy trick to pass an
immediate 0 constant as argument for a uint64_t parameter.
The issue is caused by the fact that on x86, UEFI protocol method calls
are redirected via struct efi_config::call(), which is a variadic function,
and so the compiler has to infer the types of the parameters from the
arguments rather than from the prototype.
As the 32-bit x86 calling convention passes arguments via the stack,
passing the unqualified constant 0 twice is the same as passing 0ULL,
which is why the 32-bit code in __setup_efi_pci32() contained the
following call:
status = efi_early->call(pci->attributes, pci,
EfiPciIoAttributeOperationGet, 0, 0,
&attributes);
to invoke this UEFI protocol method:
typedef
EFI_STATUS
(EFIAPI *EFI_PCI_IO_PROTOCOL_ATTRIBUTES) (
IN EFI_PCI_IO_PROTOCOL *This,
IN EFI_PCI_IO_PROTOCOL_ATTRIBUTE_OPERATION Operation,
IN UINT64 Attributes,
OUT UINT64 *Result OPTIONAL
);
After the merge, we inadvertently ended up with this version for both
32-bit and 64-bit builds, breaking the latter.
So replace the two zeroes with the explicitly typed constant 0ULL,
which works as expected on both 32-bit and 64-bit builds.
Wilfried tested the 64-bit build, and I checked the generated assembly
of a 32-bit build with and without this patch, and they are identical.
Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hdegoede@redhat.com
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 updates and fixes from Thomas Gleixner:
- Fix the (late) fallout from the vector management rework causing
hlist corruption and irq descriptor reference leaks caused by a
missing sanity check.
The straight forward fix triggered another long standing issue to
surface. The pre rework code hid the issue due to being way slower,
but now the chance that user space sees an EBUSY error return when
updating irq affinities is way higher, though quite a bunch of
userspace tools do not handle it properly despite the fact that EBUSY
could be returned for at least 10 years.
It turned out that the EBUSY return can be avoided completely by
utilizing the existing delayed affinity update mechanism for irq
remapped scenarios as well. That's a bit more error handling in the
kernel, but avoids fruitless fingerpointing discussions with tool
developers.
- Decouple PHYSICAL_MASK from AMD SME as its going to be required for
the upcoming Intel memory encryption support as well.
- Handle legacy device ACPI detection properly for newer platforms
- Fix the wrong argument ordering in the vector allocation tracepoint
- Simplify the IDT setup code for the APIC=n case
- Use the proper string helpers in the MTRR code
- Remove a stale unused VDSO source file
- Convert the microcode update lock to a raw spinlock as its used in
atomic context.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
x86/apic/vector: Print APIC control bits in debugfs
genirq/affinity: Defer affinity setting if irq chip is busy
x86/platform/uv: Use apic_ack_irq()
x86/ioapic: Use apic_ack_irq()
irq_remapping: Use apic_ack_irq()
x86/apic: Provide apic_ack_irq()
genirq/migration: Avoid out of line call if pending is not set
genirq/generic_pending: Do not lose pending affinity update
x86/apic/vector: Prevent hlist corruption and leaks
x86/vector: Fix the args of vector_alloc tracepoint
x86/idt: Simplify the idt_setup_apic_and_irq_gates()
x86/platform/uv: Remove extra parentheses
x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
x86: Mark native_set_p4d() as __always_inline
x86/microcode: Make the late update update_lock a raw lock for RT
x86/mtrr: Convert to use strncpy_from_user() helper
x86/mtrr: Convert to use match_string() helper
x86/vdso: Remove unused file
x86/i8237: Register device based on FADT legacy boot flag
AMD SME claims one bit from physical address to indicate whether the
page is encrypted or not. To achieve that we clear out the bit from
__PHYSICAL_MASK.
The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
For instance for upcoming Intel Multi-Key Total Memory Encryption.
Factor it out into a separate feature with own Kconfig handle.
It also helps with overhead of AMD SME. It saves more than 3k in .text
on defconfig + AMD_MEM_ENCRYPT:
add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)
We would need to return to this once we have infrastructure to patch
constants in code. That's good candidate for it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
Pull x86 boot updates from Ingo Molnar:
- Centaur CPU updates (David Wang)
- AMD and other CPU topology enumeration improvements and fixes
(Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)
- Continued 5-level paging work (Kirill A. Shutemov)
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Mark __pgtable_l5_enabled __initdata
x86/mm: Mark p4d_offset() __always_inline
x86/mm: Introduce the 'no5lvl' kernel parameter
x86/mm: Stop pretending pgtable_l5_enabled is a variable
x86/mm: Unify pgtable_l5_enabled usage in early boot code
x86/boot/compressed/64: Fix trampoline page table address calculation
x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
x86/Centaur: Report correct CPU/cache topology
x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
x86/CPU: Make intel_num_cpu_cores() generic
x86/CPU: Move cpu local function declarations to local header
x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
x86/CPU: Modify detect_extended_topology() to return result
x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
x86/Centaur: Initialize supported CPU features properly
Pull EFI updates from Ingo Molnar:
- decode x86 CPER data (Yazen Ghannam)
- ignore unrealistically large option ROMs (Hans de Goede)
- initialize UEFI secure boot state during Xen dom0 boot (Daniel Kiper)
- additional minor tweaks and fixes.
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/capsule-loader: Don't output reset log when reset flags are not set
efi/x86: Ignore unrealistically large option ROMs
efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function
efi: Align efi_pci_io_protocol typedefs to type naming convention
efi/libstub/tpm: Make function efi_retrieve_tpm2_eventlog_1_2() static
efi: Decode IA32/X64 Context Info structure
efi: Decode IA32/X64 MS Check structure
efi: Decode additional IA32/X64 Bus Check fields
efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
efi: Decode IA32/X64 Processor Error Info Structure
efi: Decode IA32/X64 Processor Error Section
efi: Fix IA32/X64 Processor Error Record definition
efi/cper: Remove the INDENT_SP silliness
x86/xen/efi: Initialize UEFI secure boot state during dom0 boot
Pull x86 fixes from Thomas Gleixner:
"An unfortunately larger set of fixes, but a large portion is
selftests:
- Fix the missing clusterid initializaiton for x2apic cluster
management which caused boot failures due to IPIs being sent to the
wrong cluster
- Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat
task
- Wrap access to __supported_pte_mask in __startup_64() where clang
compile fails due to a non PC relative access being generated.
- Two fixes for 5 level paging fallout in the decompressor:
- Handle GOT correctly for paging_prepare() and
cleanup_trampoline()
- Fix the page table handling in cleanup_trampoline() to avoid
page table corruption.
- Stop special casing protection key 0 as this is inconsistent with
the manpage and also inconsistent with the allocation map handling.
- Override the protection key wen moving away from PROT_EXEC to
prevent inaccessible memory.
- Fix and update the protection key selftests to address breakage and
to cover the above issue
- Add a MOV SS self test"
[ Part of the x86 fixes were in the earlier core pull due to dependencies ]
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
x86/apic/x2apic: Initialize cluster ID properly
x86/boot/compressed/64: Fix moving page table out of trampoline memory
x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
x86/pkeys: Do not special case protection key 0
x86/pkeys/selftests: Add a test for pkey 0
x86/pkeys/selftests: Save off 'prot' for allocations
x86/pkeys/selftests: Fix pointer math
x86/pkeys: Override pkey when moving away from PROT_EXEC
x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
x86/pkeys/selftests: Add PROT_EXEC test
x86/pkeys/selftests: Factor out "instruction page"
x86/pkeys/selftests: Allow faults on unknown keys
x86/pkeys/selftests: Avoid printf-in-signal deadlocks
x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
x86/pkeys/selftests: Stop using assert()
x86/pkeys/selftests: Give better unexpected fault error messages
x86/selftests: Add mov_to_ss test
x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
...
This kernel parameter allows to force kernel to use 4-level paging even
if hardware and kernel support 5-level paging.
The option may be useful to work around regressions related to 5-level
paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
cpu_feature_enabled() is not available in early boot code. We use
several different preprocessor tricks to get around it. It's messy.
Unify them all.
If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
be defined before all includes. It makes pgtable_l5_enabled rely on
__pgtable_l5_enabled variable instead. This approach fits all early
users.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hugh noticied that we calculate the address of the trampoline page table
incorrectly in cleanup_trampoline().
TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be divided by sizeof(unsigned long),
since trampoline_32bit is an 'unsigned long' pointer.
TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero so the bug doesn't have a
visible effect.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180518103528.59260-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cleanup_trampoline() relocates the top-level page table out of
trampoline memory. We use 'top_pgtable' as our new top-level page table.
But if the 'top_pgtable' would be referenced from C in a usual way,
the address of the table will be calculated relative to RIP.
After kernel gets relocated, the address will be in the middle of
decompression buffer and the page table may get overwritten.
This leads to a crash.
We calculate the address of other page tables relative to the relocation
address. It makes them safe. We should do the same for 'top_pgtable'.
Calculate the address of 'top_pgtable' in assembly and pass down to
cleanup_trampoline().
Move the page table to .pgtable section where the rest of page tables
are. The section is @nobits so we save 4k in kernel image.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric and Hugh have reported instant reboot due to my recent changes in
decompression code.
The root cause is that I didn't realize that we need to adjust GOT to be
able to run C code that early.
The problem is only visible with an older toolchain. Binutils >= 2.24 is
able to eliminate GOT references by replacing them with RIP-relative
address loads:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec
We need to adjust GOT two times:
- before calling paging_prepare() using the initial load address
- before calling C code from the relocated kernel
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 194a9749c7 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
setup_efi_pci() tries to save a copy of each PCI option ROM as this may
be necessary for the device driver for the PCI device to have access too.
On some systems the efi_pci_io_protocol's romimage and romsize fields
contain invalid data, which looks a bit like pointers pointing back into
other EFI code or data. Interpreting these pointers as romsize leads to
a very large value and if we then try to alloc this amount of memory to
save a copy the alloc call fails.
This leads to a "Failed to alloc mem for rom" error being printed on the
EFI console for each PCI device.
This commit avoids the printing of these errors, by checking romsize before
doing the alloc and if it is larger then the EFI spec limit of 16 MiB
silently ignore the ROM fields instead of trying to alloc mem and fail.
Tested-by: Hans de Goede <hdegoede@redhat.com>
[ardb: deduplicate 32/64 bit changes, use SZ_16M symbolic constant]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-16-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As suggested by Lukas, use his efi_call_proto() and efi_table_attr()
macros to merge __setup_efi_pci32() and __setup_efi_pci64() into a
single function, removing the need to duplicate changes made in
subsequent patches across both.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-15-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to use the helper macros that perform type mangling with the
EFI PCI I/O protocol struct typedefs, align their Linux typenames with
the convention we use for definitionns that originate in the UEFI spec,
and add the trailing _t to each.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mixed mode allows a kernel built for x86_64 to interact with 32-bit
EFI firmware, but requires us to define all struct definitions carefully
when it comes to pointer sizes.
'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
'romimage' field, which will be interpreted as a 64-bit field
on such kernels, potentially resulting in bogus memory references
and subsequent crashes.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE. This is
done implicitly within functions like pfn_pte() by massage_pgprot().
However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().
This moves the bit filtering out of pfn_pte() and friends. For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.
Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot(). This way, we still *look* for
unsupported bits and properly warn about them if we find them. This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.
- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Fix the apple-properties code (Andy Shevchenko)
- Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
x18 register (Ard Biesheuvel)
- Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
(Sai Praneeth)
- Fix early memremap leak in ESRT code (Ard Biesheuvel)
- Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)
- ... plus misc other cleanups and bugfixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
x86/efi: Replace efi_pgd with efi_mm.pgd
efi: Use string literals for efi_char16_t variable initializers
efi/esrt: Fix handling of early ESRT table mapping
efi: Use efi_mm in x86 as well as ARM
efi: Make const array 'apple' static
efi/apple-properties: Use memremap() instead of ioremap()
efi: Reorder pr_notice() with add_device_randomness() call
x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
efi/arm64: Check whether x18 is preserved by runtime services calls
efi/arm*: Stop printing addresses of virtual mappings
efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
efi/arm*: Only register page tables when they exist
Pull x86 mm updates from Ingo Molnar:
- Extend the memmap= boot parameter syntax to allow the redeclaration
and dropping of existing ranges, and to support all e820 range types
(Jan H. Schönherr)
- Improve the W+X boot time security checks to remove false positive
warnings on Xen (Jan Beulich)
- Support booting as Xen PVH guest (Juergen Gross)
- Improved 5-level paging (LA57) support, in particular it's possible
now to have a single kernel image for both 4-level and 5-level
hardware (Kirill A. Shutemov)
- AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)
- Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
(Kirill A. Shutemov)
- Improved Intel-MID support (Andy Shevchenko)
- Show EFI page tables in page_tables debug files (Andy Lutomirski)
- ... plus misc fixes and smaller cleanups
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
x86/mm: Update comment in detect_tme() regarding x86_phys_bits
x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
x86/mm: Remove pointless checks in vmalloc_fault
x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
x86/pconfig: Detect PCONFIG targets
x86/tme: Detect if TME and MKTME is activated by BIOS
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
x86/boot/compressed/64: Use page table in trampoline memory
x86/boot/compressed/64: Use stack from trampoline memory
x86/boot/compressed/64: Make sure we have a 32-bit code segment
x86/mm: Do not use paravirtualized calls in native_set_p4d()
kdump, vmcoreinfo: Export pgtable_l5_enabled value
x86/boot/compressed/64: Prepare new top-level page table for trampoline
x86/boot/compressed/64: Set up trampoline memory
x86/boot/compressed/64: Save and restore trampoline memory
...
Pull x86 build updates from Ingo Molnar:
"The biggest change is the forcing of asm-goto support on x86, which
effectively increases the GCC minimum supported version to gcc-4.5 (on
x86)"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Don't pass in -D__KERNEL__ multiple times
x86: Remove FAST_FEATURE_TESTS
x86: Force asm-goto
x86/build: Drop superfluous ALIGN from the linker script
Some .<target>.cmd files under arch/x86 are showing two instances of
-D__KERNEL__, like arch/x86/boot/ and arch/x86/realmode/rm/.
__KERNEL__ is already defined in KBUILD_CPPFLAGS in the top Makefile,
so it can be dropped safely.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20180316084944.3997-1-caoj.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In arch/x86/boot/compressed/kaslr_64.c, CONFIG_AMD_MEM_ENCRYPT support was
initially #undef'd to support SME with minimal effort. When support for
SEV was added, the #undef remained and some minimal support for setting the
encryption bit was added for building identity mapped pagetable entries.
Commit b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
changed __PHYSICAL_MASK_SHIFT from 46 to 52 in support of 5-level paging.
This change resulted in SEV guests failing to boot because the encryption
bit was no longer being automatically masked out. The compressed boot
path now requires sme_me_mask to be defined in order for the pagetable
functions, such as pud_present(), to properly mask out the encryption bit
(currently bit 47) when evaluating pagetable entries.
Add an sme_me_mask variable in arch/x86/boot/compressed/mem_encrypt.S,
which is set when SEV is active, delete the #undef CONFIG_AMD_MEM_ENCRYPT
from arch/x86/boot/compressed/kaslr_64.c and use sme_me_mask when building
the identify mapped pagetable entries.
Fixes: b83ce5ee91 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180327220711.8702.55842.stgit@tlendack-t1.amdoffice.net
Pull x86 and PTI fixes from Ingo Molnar:
"Misc fixes:
- fix EFI pagetables freeing
- fix vsyscall pagetable setting on Xen PV guests
- remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again
- fix two binutils (ld) development version related incompatibilities
- clean up breakpoint handling
- fix an x86 self-test"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Don't use IST entry for #BP stack
x86/efi: Free efi_pgd with free_pages()
x86/vsyscall/64: Use proper accessor to update P4D entry
x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
x86/boot/64: Verify alignment of the LOAD segment
x86/build/64: Force the linker to use 2MB page size
selftests/x86/ptrace_syscall: Fix for yet more glibc interference
Since the x86-64 kernel must be aligned to 2MB, refuse to boot the
kernel if the alignment of the LOAD segment isn't a multiple of 2MB.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch addresses a shortcoming in current boot process on machines
that supports 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.
But if the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.
But if the bootloader put the kernel above 4G (i.e. in kexec() case),
we would lose control as soon as paging is disabled, because the code
becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
Apart from the trampoline code itself we also need a place to store
top-level page table in lower memory as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.
This patch switches 32-bit code to use page table in trampoline memory.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As the first step on using trampoline memory, let's make 32-bit code use
stack there.
Separate stack is required to return back from trampoline and we cannot
user stack from 64-bit mode as it may be above 4G.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When kernel starts in 64-bit mode we inherit the GDT from the bootloader.
It may cause a problem if the GDT doesn't have a 32-bit code segment
where we expect it to be.
Load our own GDT with known segments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180312100246.89175-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we unambiguously build the entire kernel with -fshort-wchar,
it is no longer necessary to open code efi_char16_t[] initializers as
arrays of characters, and we can move to the L"xxx" notation instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180312084500.10764-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If trampoline code would need to switch between 4- and 5-level paging
modes, we have to use a page table in trampoline memory.
Having it in trampoline memory guarantees that it's below 4G and we can
point CR3 to it from 32-bit trampoline code.
We only use the page table if the desired paging mode doesn't match the
mode we are in. Otherwise the page table is unused and trampoline code
wouldn't touch CR3.
For 4- to 5-level paging transition, we set up current (4-level paging)
CR3 as the first and the only entry in a new top-level page table.
For 5- to 4-level paging transition, copy page table pointed by first
entry in the current top-level page table as our new top-level page
table.
If the page table is used by trampoline we would need to copy it to new
page table outside trampoline and update CR3 before restoring trampoline
memory.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The memory area we found for trampoline shouldn't contain anything
useful. But let's preserve the data anyway. Just to be on safe side.
paging_prepare() would save the data into a buffer.
cleanup_trampoline() would restore it back once we are done with the
trampoline.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling of
paging, which works fine if kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
This patch finds a spot in low memory for a trampoline.
The heuristic is based on code in reserve_bios_regions().
We find the end of low memory based on BIOS and EBDA start addresses.
The trampoline is put just before end of low memory. It's mimic approach
taken to allocate memory for realtime trampoline.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180226180451.86788-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't populate the const read-only array 'buf' on the stack but instead
make it static. Makes the object code smaller by 64 bytes:
Before:
text data bss dec hex filename
9264 1 16 9281 2441 arch/x86/boot/compressed/eboot.o
After:
text data bss dec hex filename
9200 1 16 9217 2401 arch/x86/boot/compressed/eboot.o
(GCC version 7.2.0 x86_64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180308080020.22828-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both x86/mm and x86/boot contain 5-level paging related patches,
unify them to have a single tree to work against.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
struct efi_uga_draw_protocol *uga = NULL, *first_uga;
efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;;
+ u32 *handles = (u32 *)uga_handle;
efi_status_t status = EFI_INVALID_PARAMETER;
int i;
This patch is the result of the following script:
$ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
... followed by manual review to make sure it's all good.
Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All pieces of the puzzle are in place and we can now allow to boot with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.
Kernel will detect that LA57 is missing and fold p4d at runtime.
Update the documentation and the Kconfig option description to reflect the
change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Switching between paging modes requires the folding of the p4d page table level
when we only have 4 paging levels, which means we need to adjust 'pgdir_shift'
and 'ptrs_per_p4d' during early boot according to paging mode.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'pgtable_l5_enabled' indicates which paging mode we are using. We need to
initialize it at boot-time according to machine capability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new flag would indicate what paging mode we are in.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename l5_paging_required() to paging_prepare() and change the
interface of the function.
This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.
The function now returns a 128-bit structure. RAX would return
trampoline memory address (zero for now) and RDX would indicate if we
need to enable 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Typo fixes and general clarification. ]
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.
Let's rename the file to reflect its content.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull tpm updates from James Morris:
- reduce polling delays in tpm_tis
- support retrieving TPM 2.0 Event Log through EFI before
ExitBootServices
- replace tpm-rng.c with a hwrng device managed by the driver for each
TPM device
- TPM resource manager synthesizes TPM_RC_COMMAND_CODE response instead
of returning -EINVAL for unknown TPM commands. This makes user space
more sound.
- CLKRUN fixes:
* Keep #CLKRUN disable through the entier TPM command/response flow
* Check whether #CLKRUN is enabled before disabling and enabling it
again because enabling it breaks PS/2 devices on a system where it
is disabled
* 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: remove unused variables
tpm: remove unused data fields from I2C and OF device ID tables
tpm: only attempt to disable the LPC CLKRUN if is already enabled
tpm: follow coding style for variable declaration in tpm_tis_core_init()
tpm: delete the TPM_TIS_CLK_ENABLE flag
tpm: Update MAINTAINERS for Jason Gunthorpe
tpm: Keep CLKRUN enabled throughout the duration of transmit_cmd()
tpm_tis: Move ilb_base_addr to tpm_tis_data
tpm2-cmd: allow more attempts for selftest execution
tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented
tpm: Move Linux RNG connection to hwrng
tpm: use struct tpm_chip for tpm_chip_find_get()
tpm: parse TPM event logs based on EFI table
efi: call get_event_log before ExitBootServices
tpm: add event log format version
tpm: rename event log provider files
tpm: move tpm_eventlog.h outside of drivers folder
tpm: use tpm_msleep() value as max delay
tpm: reduce tpm polling delay in tpm_tis_core
tpm: move wait_for_tpm_stat() to respective driver files
With TPM 2.0 specification, the event logs may only be accessible by
calling an EFI Boot Service. Modify the EFI stub to copy the log area to
a new Linux-specific EFI configuration table so it remains accessible
once booted.
When calling this service, it is possible to specify the expected format
of the logs: TPM 1.2 (SHA1) or TPM 2.0 ("Crypto Agile"). For now, only the
first format is retrieved.
Signed-off-by: Thiebaud Weksteen <tweek@google.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Pull x86 page table isolation updates from Thomas Gleixner:
"This is the final set of enabling page table isolation on x86:
- Infrastructure patches for handling the extra page tables.
- Patches which map the various bits and pieces which are required to
get in and out of user space into the user space visible page
tables.
- The required changes to have CR3 switching in the entry/exit code.
- Optimizations for the CR3 switching along with documentation how
the ASID/PCID mechanism works.
- Updates to dump pagetables to cover the user space page tables for
W+X scans and extra debugfs files to analyze both the kernel and
the user space visible page tables
The whole functionality is compile time controlled via a config switch
and can be turned on/off on the command line as well"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/ldt: Make the LDT mapping RO
x86/mm/dump_pagetables: Allow dumping current pagetables
x86/mm/dump_pagetables: Check user space page table for WX pages
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
x86/mm/pti: Add Kconfig
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
x86/mm: Use INVPCID for __native_flush_tlb_single()
x86/mm: Optimize RESTORE_CR3
x86/mm: Use/Fix PCID to optimize user/kernel switches
x86/mm: Abstract switching CR3
x86/mm: Allow flushing for future ASID switches
x86/pti: Map the vsyscall page if needed
x86/pti: Put the LDT in its own PGD if PTI is on
x86/mm/64: Make a full PGD-entry size hole in the memory map
x86/events/intel/ds: Map debug buffers in cpu_entry_area
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
x86/mm/pti: Map ESPFIX into user space
x86/mm/pti: Share entry text PMD
x86/entry: Align entry text section to PMD boundary
...
If the machine does not support the paging mode for which the kernel was
compiled, the boot process cannot continue.
It's not possible to let the kernel detect the mismatch as it does not even
reach the point where cpu features can be evaluted due to a triple fault in
the KASLR setup.
Instead of instantaneous silent reboot, emit an error message which gives
the user the information why the boot fails.
Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com
Prerequisite for fixing the current problem of instantaneous reboots when a
5-level paging kernel is booted on 4-level paging hardware.
At the same time this change prepares the decompression code to boot-time
switching between 4- and 5-level paging.
[ tglx: Folded the GCC < 5 fix. ]
Fixes: 77ef56e4f0 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com
Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.
Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).
This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.
After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so. This allows
to distinguish between SME and SEV, each of which have unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always access decrypted with SME).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kvm@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-13-brijesh.singh@amd.com
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel makes use of several GCC extensions, disable Clang warnings
about that in the boot code, as we already do for the rest of the kernel.
This suppresses the following warning when building with clang:
./include/linux/cgroup-defs.h:391:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct cgroup cgrp;
Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171030194351.122090-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Transparently fall back to other poweroff method(s) if EFI poweroff
fails (and returns)
- Use separate PE/COFF section headers for the RX and RW parts of the
ARM stub loader so that the firmware can use strict mapping
permissions
- Add support for requesting the firmware to wipe RAM at warm reboot
- Increase the size of the random seed obtained from UEFI so CRNG
fast init can complete earlier
- Update the EFI framebuffer address if it points to a BAR that gets
moved by the PCI resource allocation code
- Enable "reset attack mitigation" of TPM environments: this is
enabled if the kernel is configured with
CONFIG_RESET_ATTACK_MITIGATION=y.
- Clang related fixes
- Misc cleanups, constification, refactoring, etc"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Use efi_mem_type()
efi: Move efi_mem_type() to common code
efi/reboot: Make function pointer orig_pm_power_off static
efi/random: Increase size of firmware supplied randomness
efi/libstub: Enable reset attack mitigation
firmware/efi/esrt: Constify attribute_group structures
firmware/efi: Constify attribute_group structures
firmware/dcdbas: Constify attribute_group structures
arm/efi: Split zImage code and data into separate PE/COFF sections
arm/efi: Replace open coded constants with symbolic ones
arm/efi: Remove pointless dummy .reloc section
arm/efi: Remove forbidden values from the PE/COFF header
drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
efi/arm/arm64: Add missing assignment of efi.config_table
efi/libstub/arm64: Set -fpie when building the EFI stub
efi/libstub/arm64: Force 'hidden' visibility for section markers
efi/libstub/arm64: Use hidden attribute for struct screen_info reference
efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
Pull x86 apic updates from Thomas Gleixner:
"This update provides:
- Cleanup of the IDT management including the removal of the extra
tracing IDT. A first step to cleanup the vector management code.
- The removal of the paravirt op adjust_exception_frame. This is a
XEN specific issue, but merged through this branch to avoid nasty
merge collisions
- Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
disabled the TSC DEADLINE timer in CPUID.
- Adjust a debug message in the ioapic code to print out the
information correctly"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
x86/idt: Fix the X86_TRAP_BP gate
x86/xen: Get rid of paravirt op adjust_exception_frame
x86/eisa: Add missing include
x86/idt: Remove superfluous ALIGNment
x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
x86/idt: Remove the tracing IDT leftovers
x86/idt: Hide set_intr_gate()
x86/idt: Simplify alloc_intr_gate()
x86/idt: Deinline setup functions
x86/idt: Remove unused functions/inlines
x86/idt: Move interrupt gate initialization to IDT code
x86/idt: Move APIC gate initialization to tables
x86/idt: Move regular trap init to tables
x86/idt: Move IST stack based traps to table init
x86/idt: Move debug stack init to table based
x86/idt: Switch early trap init to IDT tables
x86/idt: Prepare for table based init
x86/idt: Move early IDT setup out of 32-bit asm
x86/idt: Move early IDT handler setup to IDT code
x86/idt: Consolidate IDT invalidation
...