Commit Graph

25142 Commits

Author SHA1 Message Date
Alexey Kardashevskiy
d51f86cfd8 powerpc/mm: Switch obsolete dssall to .long
The dssall ("Data Stream Stop All") instruction is obsolete altogether
with other Data Cache Instructions since ISA 2.03 (year 2006).

LLVM IAS does not support it but PPC970 seems to be using it.
This switches dssall to .long as there is no much point in fixing LLVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Daniel Axtens
d72c4a36d7 powerpc/64/asm: Do not reassign labels
The LLVM integrated assembler really does not like us reassigning things
to the same label:

<instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label'

This happens across a bunch of platforms:
https://github.com/ClangBuiltLinux/linux/issues/1043
https://github.com/ClangBuiltLinux/linux/issues/1008
https://github.com/ClangBuiltLinux/linux/issues/920
https://github.com/ClangBuiltLinux/linux/issues/1050

There is no hope of getting this fixed in LLVM (see
https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200
and https://bugs.llvm.org/show_bug.cgi?id=47798#c1 )
so if we want to build with LLVM_IAS, we need to hack
around it ourselves.

For us the big problem comes from this:

\#define USE_FIXED_SECTION(sname)				\
	fs_label = start_##sname;				\
	fs_start = sname##_start;				\
	use_ftsec sname;

\#define USE_TEXT_SECTION()
	fs_label = start_text;					\
	fs_start = text_start;					\
	.text

and in particular fs_label.

This works around it by not setting those 'variables' and requiring
that users of the variables instead track for themselves what section
they are in. This isn't amazing, by any stretch, but it gets us further
in the compilation.

Note that even though users have to keep track of the section, using
a wrong one produces an error with both binutils and llvm which prevents
from using wrong section at the compile time:

llvm error example:

AS      arch/powerpc/kernel/head_64.o
<unknown>:0: error: Cannot represent a difference across sections
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

binutils error example:

/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_r
eal_vectors' {.head.text.real_vectors section}
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alexey Kardashevskiy
fd98395797 powerpc/64/asm: Inline BRANCH_TO_C000
It is used just once and does not really help with readability, remove it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Daniel Axtens
f5140cab44 powerpc: check for support for -Wa,-m{power4,any}
LLVM's integrated assembler does not like either -Wa,-mpower4
or -Wa,-many. So just don't pass them if they're not supported.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-3-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alan Modra
a3ad84da07 powerpc/toc: Future proof kernel toc
This patch future-proofs the kernel against linker changes that might
put the toc pointer at some location other than .got+0x8000, by
replacing __toc_start+0x8000 with .TOC. throughout.  If the kernel's
idea of the toc pointer doesn't agree with the linker, bad things
happen.

prom_init.c code relocating its toc is also changed so that a symbolic
__prom_init_toc_start toc-pointer relative address is calculated
rather than assuming that it is always at toc-pointer - 0x8000.  The
length calculations loading values from the toc are also avoided.
It's a little incestuous to do that with unreloc_toc picking up
adjusted values (which is fine in practice, they both adjust by the
same amount if all goes well).

I've also changed the way .got is aligned in vmlinux.lds and
zImage.lds, mostly so that dumping out section info by objdump or
readelf plainly shows the alignment is 256.  This linker script
feature was added 2005-09-27, available in FSF binutils releases from
2.17 onwards.  Should be safe to use in the kernel, I think.

Finally, put *(.got) before the prom_init.o entry which only needs
*(.toc), so that the GOT header goes in the correct place.  I don't
believe this makes any difference for the kernel as it would for
dynamic objects being loaded by ld.so.  That change is just to stop
lusers who blindly copy kernel scripts being led astray.  Of course,
this change needs the prom_init.c changes.

Some notes on .toc and .got.

.toc is a compiler generated section of addresses.  .got is a linker
generated section of addresses, generally built when the linker sees
R_*_*GOT* relocations.  In the case of powerpc64 ld.bfd, there are
multiple generated .got sections, one per input object file.  So you
can somewhat reasonably write in a linker script an input section
statement like *prom_init.o(.got .toc) to mean "the .got and .toc
section for files matching *prom_init.o".  On other architectures that
doesn't make sense, because the linker generally has just one .got
section.  Even on powerpc64, note well that the GOT entries for
prom_init.o may be merged with GOT entries from other objects.  That
means that if prom_init.o references, say, _end via some GOT
relocation, and some other object also references _end via a GOT
relocation, the GOT entry for _end may be in the range
__prom_init_toc_start to __prom_init_toc_end and if the kernel does
something special to GOT/TOC entries in that range then the value of
_end as seen by objects other than prom_init.o will be affected.  On
the other hand the GOT entry for _end may not be in the range
__prom_init_toc_start to __prom_init_toc_end.  Which way it turns out
is deterministic but a detail of linker operation that should not be
relied on.

A feature of ld.bfd is that input .toc (and .got) sections matching
one linker input section statement may be sorted, to put entries used
by small-model code first, near the toc base.  This is why scripts for
powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since
the first form allows more freedom to sort.

Another feature of ld.bfd is that indirect addressing sequences using
the GOT/TOC may be edited by the linker to relative addressing.  In
many cases relative addressing would be emitted by gcc for
-mcmodel=medium if you appropriately decorate variable declarations
with non-default visibility.

The original patch is here:
https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/

Signed-off-by: Alan Modra <amodra@au1.ibm.com>
[aik: removed non-relocatable which is gone in 24d33ac5b8]
[aik: added <=2.24 check]
[aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23 22:35:01 +11:00
Nick Child
7da1d1ddd1 cuda/pmu: Make find_via_cuda/pmu init functions
Make `find_via_cuda` and `find_via_pmu` initialization functions.
Previously, their definitions in `drivers/macintosh/via-cuda.h` include
the `__init` attribute but their alternative definitions in
`arch/powerpc/powermac/sectup./c` and prototypes in `include/linux/
cuda.h` and `include/linux/pmu.h` do not use the `__init` macro. Since,
only initialization functions call `find_via_cuda` and `find_via_pmu`
it is safe to label these functions with `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-21-nick.child@ibm.com
2021-12-23 22:35:00 +11:00
Nick Child
2493a24271 powerpc/512x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/512x' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-20-nick.child@ibm.com
2021-12-23 22:33:19 +11:00
Nick Child
407454cafd powerpc/85xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/85xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-19-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
f4a88b0ef5 powerpc/83xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/83xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-18-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
c0dc225ae7 powerpc/embedded6xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/embedded6xx' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-17-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1ee969be25 powerpc/44x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/44x/' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-16-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1e3d992d21 powerpc/4xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/4xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-15-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
f1ba9b9474 powerpc/ps3: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/ps3' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-14-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
e14ff96d08 powerpc/pseries: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pseries' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
e5913db1ef powerpc/powernv: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powernv' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-12-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
b346f57100 powerpc/powermac: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powermac` are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-11-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
e37e06af9b powerpc/pasemi: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pasemi' are deserving
of an `__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-10-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
d3aa3c5edf powerpc/chrp: Add __init attribute to eligible functions
The function `Enable_SRAM` defined in 'arch/powerpc/platforms/chrp' is
deserving of an `__init` macro attribute. This function is only called by
other initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-9-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
7c1ab16b2d powerpc/cell: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/cell' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-8-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
456e8eb324 powerpc/xmon: Add __init attribute to eligible functions
`xmon_register_spus` defined in 'arch/powerpc/xmon' is deserving of an
`__init` macro attribute. This functions is only called by other
initialization functions and therefore should inherit the attribute.
Also, change the function declaration in the header file to include
`__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-7-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
6c552983d0 powerpc/sysdev: Add __init attribute to eligible functions
Some files functions in 'arch/powerpc/sysdev' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-6-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
c49f5d88ff powerpc/perf: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/perf' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
c13f2b2bb5 powerpc/mm: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/mm' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-4-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
ce0c6be9c6 powerpc/lib: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/lib' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-3-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Nick Child
d276960d92 powerpc/kernel: Add __init attribute to eligible functions
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/
kexec`) are deserving of an `__init` macro attribute. These functions are
only called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Rob Herring
9cbbe6bae9 powerpc/dts: Remove "spidev" nodes
"spidev" is not a real device, but a Linux implementation detail. It has
never been documented either. The kernel has WARNed on the use of it for
over 6 years. Time to remove its usage from the tree.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211217221400.3667133-1-robh@kernel.org
2021-12-21 20:17:58 +11:00
Michael Ellerman
8d84fca437 powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
In note_prot_wx() we bail out without reporting anything if
CONFIG_PPC_DEBUG_WX is disabled.

But CONFIG_PPC_DEBUG_WX was removed in the conversion to generic ptdump,
we now need to use CONFIG_DEBUG_WX instead.

Fixes: e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20211203124112.2912562-1-mpe@ellerman.id.au
2021-12-21 13:05:59 +11:00
Nicholas Piggin
467ba14e16 powerpc/64s/radix: Fix huge vmap false positive
pmd_huge() is defined to false when HUGETLB_PAGE is not configured, but
the vmap code still installs huge PMDs. This leads to false bad PMD
errors when vunmapping because it is not seen as a huge PTE, and the bad
PMD check catches it. The end result may not be much more serious than
some bad pmd warning messages, because the pmd_none_or_clear_bad() does
what we wanted and clears the huge PTE anyway.

Fix this by checking pmd_is_leaf(), which checks for a PTE regardless of
config options. The whole huge/large/leaf stuff is a tangled mess but
that's kernel-wide and not something we can improve much in arch/powerpc
code.

pmd_page(), pud_page(), etc., called by vmalloc_to_page() on huge vmaps
can similarly trigger a false VM_BUG_ON when CONFIG_HUGETLB_PAGE=n, so
those checks are adjusted. The checks were added by commit d6eacedd1f
("powerpc/book3s: Use config independent helpers for page table walk"),
while implementing a similar fix for other page table walking functions.

Fixes: d909f9109c ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216103342.609192-1-npiggin@gmail.com
2021-12-20 12:13:32 +11:00
Yang Guang
a605b39e8e powerpc: use swap() to make code cleaner
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
[mpe: Add include of linux/minmax.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71a702c2189b16c152affd8a8cda1d84ce32741c.1639792543.git.yang.guang5@zte.com.cn
2021-12-20 12:01:21 +11:00
Christophe JAILLET
2fe4ca6ad7 powerpc/mpic: Use bitmap_zalloc() when applicable
'mpic->protected' is a bitmap. So use 'bitmap_zalloc()' to simplify
code and improve the semantic, instead of hand writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa145f674e08044c98f13f1a985faa9cc29c3708.1639777976.git.christophe.jaillet@wanadoo.fr
2021-12-20 11:54:05 +11:00
Linus Torvalds
713ab911f2 powerpc fixes for 5.16 #4
Fix a recently introduced oops at boot on 85xx in some configurations.
 
 Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX.
 
 Thanks to: Joe Lawrence, Russell Currey, Xiaoming Ni.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmG+q+UTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG5sEACVsb39+Q8w0GKJxUGR0Wri1InJTkLC
 UWpd9PVzG6n8+Epl0f+H5cMlv5V6xaDzTsNXEf27xIACMlPSVHVLDW8u/RFPizdp
 fUDUweGUvK/M0GQW89nRV29DjdNjMmUq1KQFM6eEgYN/0nS3HqF27jamiaFhiZ+e
 pJl1g2dy/DTWrFz3V/8b0nv6kyJE8GlpnCmTchDsPNDbzWQVJXVfM7IB/9N+EVna
 MQAZCyKSOafTgNRPOW+Sm4oF64z+Yn4cxURVKfzbM+0Cd8pNIcGZ7McXrnp0DR4t
 bLGeYEzyR1bPc3VTt7b+jafDP41AakRllKAFz65+pdYOcV1hTRM4inbLVPT5Bmz6
 7ANlX/w4BwYWFk3mWW14BoxU02FlRKRddXhusq7x2wIqvtn1KI8Qm1xpneiGpcW+
 GEe+k9ONGowXp/bc47/Co6dPSHKRRHUC3MsgbDv2Y7M5G9vsetckdQzA2bIqYSx8
 r1ZXNiMwQarPeB6FrGvNbekyCsIhJkk6chhtL0k27BVOFqYiHZtaqBs+RCQYPdC8
 Aih8WOEqdldbItFI3MihyKULucgkVpdnm80Ja67qRthuJICF9OML3mdrt83bxEIp
 fu4Bsw8dygVKJWScZj8AUC/ZBxOrrTGyZR1RvsHpaAiEfinMkxxTnXjqNDiSZWXt
 DKKLx6pfSFrokw==
 =Nmp5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix a recently introduced oops at boot on 85xx in some configurations.

  Fix crashes when loading some livepatch modules with
  STRICT_MODULE_RWX.

  Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"

* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/module_64: Fix livepatching for RO modules
  powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
2021-12-19 11:31:14 -08:00
Paolo Bonzini
5a213b9220 Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD
Fix conflicts between memslot overhaul and commit 511d25d6b7 ("KVM:
PPC: Book3S: Suppress warnings when allocating too big memory slots")
from the powerpc tree.
2021-12-19 15:27:21 +01:00
Alexandre Ghiti
e0cb56546d arch: Remove leftovers from prism54 wireless driver
This driver was removed so remove all references to it.

Fixes: d249ff28b1 ("intersil: remove obsolete prism54 wireless driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:09 +01:00
Alexandre Ghiti
2ac7069ad7 Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
This config was removed so remove all references to it.

Fixes: 76a3c92ec9 ("cifs: remove support for NTLM and weaker authentication algorithms")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:03 +01:00
Alexandre Ghiti
473dcf0ffc Documentation, arch: Remove leftovers from raw device
Raw device interface was removed so remove all references to configs
related to it.

Fixes: 603e4922f1 ("remove the raw driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:07:52 +01:00
Rob Herring
1f012283e9 of/fdt: Rework early_init_dt_scan_memory() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_memory() to be called directly and use libfdt.

Cc: John Crispin <john@phrozen.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211215150102.1303588-1-robh@kernel.org
2021-12-16 16:07:52 -06:00
Rob Herring
d665881d21 of/fdt: Rework early_init_dt_scan_root() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root()
to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-3-robh@kernel.org
2021-12-16 16:07:48 -06:00
Rob Herring
60f20d84dc of/fdt: Rework early_init_dt_scan_chosen() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_chosen() to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-16 16:07:41 -06:00
Thomas Gleixner
706b585a1b powerpc/mpic_u3msi: Use msi_for_each-desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.576162169@linutronix.de
2021-12-16 22:22:19 +01:00
Thomas Gleixner
ab430e7437 powerpc/fsl_msi: Use msi_for_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.522641685@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
e22b0d1bbf powerpc/pasemi/msi: Convert to msi_on_each_dec()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.468512783@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
3c46658bd7 powerpc/cell/axon_msi: Convert to msi_on_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.414712173@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
85dabc2f72 powerpc/4xx/hsta: Rework MSI handling
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.359766435@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
651b39c488 powerpc/pseries/msi: Let core code check for contiguous entries
Set the domain info flag and remove the check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.720998720@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
173ffad79d PCI/MSI: Use msi_desc::msi_index
The usage of msi_desc::pci::entry_nr is confusing at best. It's the index
into the MSI[X] descriptor table.

Use msi_desc::msi_index which is shared between all MSI incarnations
instead of having a PCI specific storage for no value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211210221814.602911509@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
ed1533b581 powerpc/pseries/msi: Use PCI device properties
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221813.556202506@linutronix.de
2021-12-16 22:16:38 +01:00
Thomas Gleixner
d8a530578b powerpc/cell/axon_msi: Use PCI device property
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211210221813.493922179@linutronix.de
2021-12-16 22:16:38 +01:00
Emmanuel Gil Peyrot
c636783d59 powerpc: wii_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-6-linkmauve@linkmauve.fr
2021-12-16 21:49:51 +01:00
Nicholas Piggin
3b54c71537 powerpc/pseries: use slab context cpumask allocation in CPU hotplug init
Slab is up at this point, using the bootmem allocator triggers a
warning. Switch to using the regular cpumask allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105132923.1582514-1-npiggin@gmail.com
2021-12-16 21:31:46 +11:00
Nicholas Piggin
af47d79b04 powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But the
important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ecb1057c0f powerpc/64/interrupt: reduce expensive debug tests
Move the assertions requiring restart table searches under
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-6-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0faf20a1ad powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
5a7745b96f powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI
Interrupt code enables MSR[EE] in some irq handlers while keeping local
irqs disabled via soft-mask, allowing PMI interrupts to be taken as
soft-NMI to improve profiling of irq handlers.

When perf is not enabled, there is no point to doing this, it's
additional overhead. So provide a function that can say if PMIs should
be taken promptly if possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ff0b0d6e1a powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable
MSR[EE] in interrupt entry code, for those interrupts which enable EE.
This helps performance of important synchronous interrupts (e.g., page
faults).

This is similar to what commit dd152f70bd ("powerpc/64s: system call
avoid setting MSR[RI] until we set MSR[EE]") does for system calls.

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is
set.

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they have
interrupted a caller that is hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), it will not require
another mtmsrd because MSR[EE] was already enabled here.

This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles on POWER9. And for kernel-mode interrupts, both
synchronous and asynchronous, this saves an additional 40 cycles due to
the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
4423eb5ae3 powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible
Make synchronous interrupt handler entry wrappers enable MSR[EE] if
MSR[EE] was enabled in the interrupted context. IRQs are soft-disabled
at this point so there is no change to high level code, but it's a
masked interrupt could fire.

This is a performance disadvantage for interrupts which do not later
call interrupt_cond_local_irq_enable(), because an an additional mtmsrd
or wrtee instruction is executed. However the important synchronous
interrupts (e.g., page fault) do enable interrupts, so the performance
disadvantage is mostly avoided.

In the next patch, MSR[RI] enabling can be combined with MSR[EE]
enabling, which mitigates the performance drop for the former and gives
a performance advanage for the latter interrupts, on 64s machines. 64e
is coming along for the ride for now to avoid divergences with 64s in
this tricky code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-2-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0a006ace63 powerpc/pseries/vas: Don't print an error when VAS is unavailable
KVM does not support VAS so guests always print a useless error on boot

    vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000

Change this to only print the message if the error is not H_FUNCTION.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126052133.1664375-1-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Kajol Jain
6ed05a8efd powerpc/perf: Add data source encodings for power10 platform
The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Kajol Jain
4a20ee1061 powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
The code represent data coming from L1/L2/L3 cache hits based on
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.

Add data source encodings to represent L1/L2/L3 cache hits based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms

Result in power9 system without patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result in power9 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Emmanuel Gil Peyrot
57bd7d3565 powerpc: gamecube_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-5-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Emmanuel Gil Peyrot
5479618e1e powerpc: wii.dts: Expose HW_SRNPROT on this platform
This Hollywood register isn’t properly understood, but can allow or
reject access to the SRAM, which we need to set for RTC usage if it
isn’t previously set correctly beforehand.

See https://wiibrew.org/wiki/Hardware/Hollywood_Registers#HW_SRNPROT

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-4-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Michael Ellerman
708da3ff1d Merge branch 'topic/ppc-kvm' into next
Bring in some more KVM commits from our KVM topic branch.
2021-12-15 11:29:53 +11:00
Russell Currey
8734b41b3e powerpc/module_64: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-14 23:13:03 +11:00
Sean Christopherson
63fa47ba88 KVM: PPC: Book3S HV P9: Use kvm_arch_vcpu_get_wait() to get rcuwait object
Use kvm_arch_vcpu_get_wait() to get a vCPU's rcuwait object instead of
using vcpu->wait directly in kvmhv_run_single_vcpu().  Functionally, this
is a nop as vcpu->arch.waitp is guaranteed to point at vcpu->wait.  But
that is not obvious at first glance, and a future change coming in via
the KVM tree, commit 510958e997 ("KVM: Force PPC to define its own
rcuwait object"), will hide vcpu->wait from architectures that define
__KVM_HAVE_ARCH_WQP to prevent generic KVM from attepting to wake a vCPU
with the wrong rcuwait object.

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211213174556.3871157-1-seanjc@google.com
2021-12-14 22:49:36 +11:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Ingo Molnar
6773cc31a9 Linux 5.16-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
 w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
 HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
 St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
 iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
 VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
 1m2tVNY=
 =7wRd
 -----END PGP SIGNATURE-----

Merge tag 'v5.16-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-12-13 10:48:46 +01:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Peter Zijlstra
1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
David Woodhouse
5f33868af8 KVM: powerpc: Use Makefile.kvm for common files
It's all fairly baroque but in the end, I don't think there's any reason
for $(KVM)/irqchip.o to have been handled differently, as they all end
up in $(kvm-y) in the end anyway, regardless of whether they get there
via $(common-objs-y) and the CPU-specific object lists.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Message-Id: <20211121125451.9489-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09 13:06:01 -05:00
Christophe Leroy
b149d5d45a powerpc/powermac: Add additional missing lockdep_register_key()
Commit df1f679d19 ("powerpc/powermac: Add missing
lockdep_register_key()") fixed a problem that was causing a WARNING.

There are two other places in the same file with the same problem
originating from commit 9e607f7274 ("i2c_powermac: shut up lockdep
warning").

Add missing lockdep_register_key()

Fixes: 9e607f7274 ("i2c_powermac: shut up lockdep warning")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: df1f679d19 ("powerpc/powermac: Add missing lockdep_register_key()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200055
Link: https://lore.kernel.org/r/2c7e421874e21b2fb87813d768cf662f630c2ad4.1638984999.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:22 +11:00
Hari Bathini
06e629c25d powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dff
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3841 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Hari Bathini
219572d2fc powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
Kdump can be triggered after panic_notifers since commit f06e5153f4
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced crash_kexec_post_notifiers option.
But using this option would mean smp_send_stop(), that marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save other CPUs' registers. To fix this, kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c9 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Anders Roxell
e89257e28e powerpc/cell: Fix clang -Wimplicit-fallthrough warning
Clang warns:

arch/powerpc/platforms/cell/pervasive.c:81:2: error: unannotated fall-through between switch labels
        case SRR1_WAKEEE:
        ^
arch/powerpc/platforms/cell/pervasive.c:81:2: note: insert 'break;' to avoid fall-through
        case SRR1_WAKEEE:
        ^
        break;
1 error generated.

Clang is more pedantic than GCC, which does not warn when failing
through to a case that is just break or return. Clang's version is more
in line with the kernel's own stance in deprecated.rst. Add athe missing
break to silence the warning.

Fixes: 6e83985b0f ("powerpc/cbe: Do not process external or decremeter interrupts from sreset")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207110228.698956-1-anders.roxell@linaro.org
2021-12-09 22:41:21 +11:00
Christophe Leroy
0d76914a4c powerpc/inst: Optimise copy_inst_from_kernel_nofault()
copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to
copy one or two 32bits words. This means calling an out-of-line
function which itself calls back copy_from_kernel_nofault_allowed()
then performs a generic copy with loops.

Rewrite copy_inst_from_kernel_nofault() to do everything at a
single place and use __get_kernel_nofault() directly to perform
single accesses without loops.

Allthough the generic function uses pagefault_disable(), it is not
required on powerpc because do_page_fault() bails earlier when a
kernel mode fault happens on a kernel address.

As the function has now become very small, inline it.

With this change, on an 8xx the time spent in the loop in
ftrace_replace_code() is reduced by 23% at function tracer activation
and 27% at nop tracer activation.
The overall time to activate function tracer (measured with shell
command 'time') is 570ms before the patch and 470ms after the patch.

Even vmlinux size is reduced (by 152 instruction).

Before the patch:

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	94 21 ff e0 	stwu    r1,-32(r1)
	  1c:	7c 08 02 a6 	mflr    r0
	  20:	38 a0 00 04 	li      r5,4
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7c 7f 1b 78 	mr      r31,r3
	  2c:	38 61 00 08 	addi    r3,r1,8
	  30:	90 01 00 24 	stw     r0,36(r1)
	  34:	48 00 00 01 	bl      34 <copy_inst_from_kernel_nofault+0x1c>
				34: R_PPC_REL24	copy_from_kernel_nofault
	  38:	2c 03 00 00 	cmpwi   r3,0
	  3c:	40 82 00 0c 	bne     48 <copy_inst_from_kernel_nofault+0x30>
	  40:	81 21 00 08 	lwz     r9,8(r1)
	  44:	91 3f 00 00 	stw     r9,0(r31)
	  48:	80 01 00 24 	lwz     r0,36(r1)
	  4c:	83 e1 00 1c 	lwz     r31,28(r1)
	  50:	38 21 00 20 	addi    r1,r1,32
	  54:	7c 08 03 a6 	mtlr    r0
	  58:	4e 80 00 20 	blr

After the patch (before inlining):

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	3d 20 b0 00 	lis     r9,-20480
	  1c:	7c 04 48 40 	cmplw   r4,r9
	  20:	7c 69 1b 78 	mr      r9,r3
	  24:	41 80 00 14 	blt     38 <copy_inst_from_kernel_nofault+0x20>
	  28:	81 44 00 00 	lwz     r10,0(r4)
	  2c:	38 60 00 00 	li      r3,0
	  30:	91 49 00 00 	stw     r10,0(r9)
	  34:	4e 80 00 20 	blr

	  38:	38 60 ff de 	li      r3,-34
	  3c:	4e 80 00 20 	blr
	  40:	38 60 ff f2 	li      r3,-14
	  44:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add clang workaround, with version check as suggested by Nathan]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d5b12183d5176dd702d29ad94c39c384e51c78f.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
9b307576f3 powerpc/inst: Move ppc_inst_t definition in asm/reg.h
Because of circular inclusion of asm/hw_breakpoint.h, we
need to move definition of asm/reg.h outside of inst.h
so that asm/hw_breakpoint.h gets it without including
asm/inst.h

Also remove asm/inst.h from asm/uprobes.h as it's not
needed anymore.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4b79f1491118af96b1ac0735e74aeca02ea4c04e.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
07b863aef5 powerpc/inst: Define ppc_inst_t as u32 on PPC32
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined
as a struct, even when the struct has a single simple element.

To avoid that, define ppc_inst_t as u32 on PPC32.

Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that
sparse can perform type checking.

Also revert commit 511eea5e2c ("powerpc/kprobes: Fix Oops by passing
ppc_inst as a pointer to emulate_step() on ppc32") as now the
instruction to be emulated is passed as a register to emulate_step().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
c545b9f040 powerpc/inst: Define ppc_inst_t
In order to stop using 'struct ppc_inst' on PPC32,
define a ppc_inst_t typedef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
3261d99adb powerpc/inst: Refactor ___get_user_instr()
PPC64 version of ___get_user_instr() can be used for PPC32 as well,
by simply disabling the suffix part with IS_ENABLED(CONFIG_PPC64).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f0ede830ccb33a659119a55cb590820c27004db.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
37eb7ca91b powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
Today we have the following IBATs allocated:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128K Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128K Kernel   x     m
	6:         -
	7:         -

The two 128K should be a single 256K instead.

When _etext is not aligned to 128Kbytes, the system will allocate
all necessary BATs to the lower 128Kbytes boundary, then allocate
an additional 128Kbytes BAT for the remaining block.

Instead, align the top to 128Kbytes so that the function directly
allocates a 256Kbytes last block:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc07bffff 0x00780000       256K Kernel   x     m
	5:         -
	6:         -
	7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.1637930421.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
dede19be51 powerpc: Remove CONFIG_PPC_HAVE_KUAP and CONFIG_PPC_HAVE_KUEP
All platforms now have KUAP and KUEP so remove CONFIG_PPC_HAVE_KUAP
and CONFIG_PPC_HAVE_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a3c007ad0951965199e6ab2ef1035966bc66e771.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
57bc963837 powerpc/kuap: Wire-up KUAP on book3e/64
This adds KUAP support to book3e/64.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2c2c9375afd4bbc06aa904d0103a5f5102a2b1a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
4f6a025201 powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.
This adds KUAP support to 85xx in 32 bits mode.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
fcf9bb6d32 powerpc/kuap: Wire-up KUAP on 40x
This adds KUAP support to 40x. This is done by checking
the content of SPRN_PID at the time user pgtable is loaded.

40x doesn't have KUEP, but KUAP implies KUEP because when the
PID doesn't match the page's PID, the page cannot be read nor
executed.

So KUEP is now automatically selected when KUAP is selected and
disabled when KUAP is disabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
f6fad4fb55 powerpc/kuap: Wire-up KUAP on 44x
This adds KUAP support to 44x. This is done by checking
the content of SPRN_PID at the time it is read and written
into SPRN_MMUCR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
43afcf8f01 powerpc: Add KUAP support for BOOKE and 40x
On booke/40x we don't have segments like book3s/32.
On booke/40x we don't have access protection groups like 8xx.

Use the PID register to provide user access protection.
Kernel address space can be accessed with any PID.
User address space has to be accessed with the PID of the user.
User PID is always not null.

Everytime the kernel is entered, set PID register to 0 and
restore PID register when returning to user.

Everytime kernel needs to access user data, PID is restored
for the access.

In TLB miss handlers, check the PID and bail out to data storage
exception when PID is 0 and accessed address is in user space.

Note that also forbids execution of user text by kernel except
when user access is unlocked. But this shouldn't be a problem
as the kernel is not supposed to ever run user text.

This patch prepares the infrastructure but the real activation of KUAP
is done by following patches for each processor type one by one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
e3c02f25b4 powerpc/kuap: Make PPC_KUAP_DEBUG depend on PPC_KUAP only
PPC_KUAP_DEBUG is supported by all platforms doing PPC_KUAP,
it doesn't depend on Radix on book3s/64.

This will avoid adding one more dependency when implementing
KUAP on book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5ff6228a36e51783b83d8c10d058db76e450f63.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
42e03bc524 powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64
Also call kuap_lock() and kuap_save_and_lock() from
interrupt functions with CONFIG_PPC64.

For book3s/64 we keep them empty as it is done in assembly.

Also do the locked assert when switching task unless it is
book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
047a6fd401 powerpc/config: Add CONFIG_BOOKE_OR_40x
We have many functionnalities common to 40x and BOOKE, it leads to
many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x).

We are going to add a few more with KUAP for booke/40x, so create
a new symbol which is defined when either BOOKE or 40x is defined.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
25ae981faf powerpc/nohash: Move setup_kuap out of 8xx.c
In order to reuse it on booke/4xx, move KUAP
setup routine out of 8xx.c

Make them usable on SMP by removing the __init tag
as it is called for each CPU.

And use __prevent_user_access() instead of hard
coding initial lock.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae35eec3426509efc2b8ae69586c822e2fe2642a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
937fb7003e powerpc/kuap: Add kuap_lock()
Add kuap_lock() and call it when entering interrupts from user.

It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.

However book3s/32 already have a kuap_lock(). Rename it
kuap_lock_addr().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
2341964e27 powerpc/kuap: Remove __kuap_assert_locked()
__kuap_assert_locked() is redundant with
__kuap_get_and_assert_locked().

Move the verification of CONFIG_PPC_KUAP_DEBUG in kuap_assert_locked()
and make it call __kuap_get_and_assert_locked() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a60198a25d2ba38a37f1b92bc7d096435df4224.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
c252f3846d powerpc/kuap: Check KUAP activation in generic functions
Today, every platform checks that KUAP is not de-activated
before doing the real job.

Move the verification out of platform specific functions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/894f110397fcd248e125fb855d1e863e4e633a0d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
ba454f9c8e powerpc/kuap: Add a generic intermediate layer
Make the following functions generic to all platforms.
- bad_kuap_fault()
- kuap_assert_locked()
- kuap_save_and_lock() (PPC32 only)
- kuap_kernel_restore()
- kuap_get_and_assert_locked()

And for all platforms except book3s/64
- allow_user_access()
- prevent_user_access()
- prevent_user_access_return()
- restore_user_access()

Prepend __ in front of the name of platform specific ones.

For now the generic just calls the platform specific, but
next patch will move redundant parts of specific functions
into the generic one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaef143a8dae7288cd34565ffa7b49c16aee1ec3.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
6754862249 powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64
Deactivating KUEP at boot time is unrelevant for PPC32 and BOOK3E/64.

Remove it.

It allows to refactor setup_kuep() via a __weak function
that only PPC64s will overide for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_PPC_BOOKS_64 -> CONFIG_PPC_BOOK3S_64 typo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4c36df18b41c988c4512f45d96220486adbe4c99.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
70428da94c powerpc/32s: Save content of sr0 to avoid 'mfsr'
Calling 'mfsr' to get the content of segment registers is heavy,
in addition it requires clearing of the 'reserved' bits.

In order to avoid this operation, save it in mm context and in
thread struct.

The saved sr0 is the one used by kernel, this means that on
locking entry it can be used as is.

For unlocking, the only thing to do is to clear SR_NX.

This improves null_syscall selftest by 12 cycles, ie 4%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
526d4a4c77 powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly
When interrupt and syscall entries where converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.

However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).

A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.

In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is counter
performant here.

So let's do it in assembly, when we have full control of which
register can be used. It also has the advantage of locking earlier
and unlocking later and it helps GCC generating less tricky code.
The only drawback is to make boot time deactivation less straight
forward and require 'hand' instruction patching.

Group 'mtsr's by 4.

With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
df415cd758 powerpc/32s: Remove capability to disable KUEP at boottime
Disabling KUEP at boottime makes things unnecessarily complex.

Still allow disabling KUEP at build time, but when it's built-in
it is always there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f583f82423a29a4205c60b9721079111b35567.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
dc3a0e5b83 powerpc/book3e: Activate KUEP at all time
On book3e,
- When using 64 bits PTE: User pages don't have the SX bit defined
so KUEP is always active.
- When using 32 bits PTE: Implement KUEP by clearing SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
ee2631603f powerpc/44x: Activate KUEP at all time
On 44x, KUEP is implemented by clearing SX bit during TLB miss
for user pages. The impact is minimal and not worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
13dac4e31e powerpc/8xx: Activate KUEP at all time
On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups with contain all user pages as having swapped
protection rights, in extenso EX for user and NA for supervisor.

Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
6c1fa60d36 Revert "powerpc: Inline setup_kup()"
This reverts commit 1791ebd131.

setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.

But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.

In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.

Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Christophe Leroy
06e7cbc29e powerpc/40x: Map 32Mbytes of memory at startup
As reported by Carlo, 16Mbytes is not enough with modern kernels
that tend to be a bit big, so map another 16M page at boot.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Nicholas Piggin
31284f703d powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU
Microwatt implements a subset of ISA v3.0 (which is equivalent to
the POWER9_CPU option). It is radix-only, so does not require hash
MMU support.

This saves 20kB compressed dtbImage and 56kB vmlinux size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-19-npiggin@gmail.com
2021-12-09 22:41:16 +11:00
Nicholas Piggin
387e220a2e powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09 22:41:13 +11:00
Nicholas Piggin
c28573744b powerpc/64s: Make hash MMU support configurable
This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
debeda0171 powerpc/64s: Always define arch unmapped area calls
To avoid any functional changes to radix paths when building with hash
MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
arch get_unmapped_area calls on 64s platforms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
af3a0ea41c powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear
There are a few places that require MMU_FTR_HPTE_TABLE to be set even
when running in radix mode. Fix those up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
8dbfc0092b powerpc/64e: remove mmu_linear_psize
mmu_linear_psize is only set at boot once on 64e, is not necessarily
the correct size of the linear map pages, and is never used anywhere.
Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain the extern, so we can use IS_ENABLED() for related code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com
2021-12-09 22:39:39 +11:00
Thomas Gleixner
e58f2259b9 genirq/msi, treewide: Use a named struct for PCI/MSI attributes
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211206210224.374863119@linutronix.de
2021-12-09 11:52:21 +01:00
Cédric Le Goater
eca213152a powerpc/4xx: Complete removal of MSI support
Finish the work by removing all references to the PPC4xx_MSI config
and the associated device nodes in the DTs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/e92f2bb3-b5e1-c870-8151-3917a789a640@kaod.org
2021-12-09 11:52:20 +01:00
Thomas Gleixner
4f1d038b5e powerpc/4xx: Remove MSI support which never worked
This code is broken since day one. ppc4xx_setup_msi_irqs() has the
following gems:

 1) The handling of the result of msi_bitmap_alloc_hwirqs() is completely
    broken:
    
    When the result is greater than or equal 0 (bitmap allocation
    successful) then the loop terminates and the function returns 0
    (success) despite not having installed an interrupt.

    When the result is less than 0 (bitmap allocation fails), it prints an
    error message and continues to "work" with that error code which would
    eventually end up in the MSI message data.

 2) On every invocation the file global pp4xx_msi::msi_virqs bitmap is
    allocated thereby leaking the previous one.

IOW, this has never worked and for more than 10 years nobody cared. Remove
the gunk.

Fixes: 3fb7933850 ("powerpc/4xx: Adding PCIe MSI support")
Fixes: 247540b03b ("powerpc/44x: Fix PCI MSI support for Maui APM821xx SoC and Bluestone board")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210223.872249537@linutronix.de
2021-12-09 11:52:20 +01:00
Paolo Bonzini
93b350f884 Merge branch 'kvm-on-hv-msrbm-fix' into HEAD
Merge bugfix for enlightened MSR Bitmap, before adding support
to KVM for exposing the feature to nested guests.
2021-12-08 05:30:48 -05:00
Sean Christopherson
91b99ea706 KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting
the actual "block" sequences into a separate helper (to be named
kvm_vcpu_block()).  x86 will use the standalone block-only path to handle
non-halt cases where the vCPU is not runnable.

Rename block_ns to halt_ns to match the new function name.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:51 -05:00
Sean Christopherson
005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Sean Christopherson
510958e997 KVM: Force PPC to define its own rcuwait object
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and
instead force the architecture (PPC) to define its own rcuwait object.
Allowing common KVM to directly access vcpu->wait without a guard makes
it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(),
kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not
the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for
PPC.

Due to PPC's shenanigans with respect to callbacks and waits (it switches
to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether
or not this fixes any bugs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:46 -05:00
Maciej S. Szmigiero
a54d806688 KVM: Keep memslots in tree-based structures instead of array-based ones
The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of them.

Because the memslot array that is currently in use cannot be modified
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make copy of the
memslot that is being modified, copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of actual memslots
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should be also kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy.  After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used since memslots cannot
overlap in the guest address space and so this data structure is
sufficient for ensuring that lookups are done quickly.

The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
that keep track of the last found-by-gfn memslot, are still present in the
new code.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:34 -05:00
Maciej S. Szmigiero
ed922739c9 KVM: Use interval tree to do fast hva lookup in memslots
The current memslots implementation only allows quick binary search by gfn,
quick lookup by hva is not possible - the implementation has to do a linear
scan of the whole memslots array, even though the operation being performed
might apply just to a single memslot.

This significantly hurts performance of per-hva operations with higher
memslot counts.

Since hva ranges can overlap between memslots an interval tree is needed
for tracking them.

[sean: handle interval tree updates in kvm_replace_memslot()]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:32 -05:00
Sean Christopherson
6a99c6e3f5 KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now
that its use has been removed in all architectures.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:25 -05:00
Sean Christopherson
eaaaed137e KVM: PPC: Avoid referencing userspace memory region in memslot updates
For PPC HV, get the number of pages directly from the new memslot instead
of computing the same from the userspace memory region, and explicitly
check for !DELETE instead of inferring the same when toggling mmio_update.
The motivation for these changes is to avoid referencing the @mem param
so that it can be dropped in a future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:22 -05:00
Sean Christopherson
537a17b314 KVM: Let/force architectures to deal with arch specific memslot data
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch
code to handle propagating arch specific data from "new" to "old" when
necessary.  This is a baby step towards dynamically allocating "new" from
the get go, and is a (very) minor performance boost on x86 due to not
unnecessarily copying arch data.

For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE
and FLAGS_ONLY.  This is functionally a nop as the previous behavior
would overwrite the pointer for CREATE, and eventually discard/ignore it
for DELETE.

For x86, copy the arch data only for FLAGS_ONLY changes.  Unlike PPC HV,
x86 needs to reallocate arch data in the MOVE case as the size of x86's
allocations depend on the alignment of the memslot's gfn.

Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to
match the "commit" prototype.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
[mss: add missing RISCV kvm_arch_prepare_memory_region() change]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:20 -05:00
Marc Zyngier
46808a4cb8 KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:15 -05:00
Marc Zyngier
27592ae8db KVM: Move wiping of the kvm->vcpus array to common code
All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.
The locking is dropped altogether, as this should only be called
when there is no further references on the kvm structure.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-2-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:13 -05:00
Sebastian Andrzej Siewior
77993b595a locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes the
spinlock_types.h.

PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so
its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and
atomic.h which leads to recursive includes where defines are missing.

By including only the raw_spinlock_t defines it avoids the atomic.h
related includes at this stage.

An example on powerpc:

|  CALL    scripts/atomic/check-atomics.sh
|In file included from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|arch/powerpc/include/asm/page_32.h: In function ‘clear_page’:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|   87 |    __WARN();    \
|      |    ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeofâ€=99 to incomplete type ‘struct bug_entryâ€=99
|   58 |     "i" (sizeof(struct bug_entry)), \
|      |                 ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRYâ€=99
|   89 |   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
|      |   ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|In file included from arch/powerpc/include/asm/ptrace.h:298,
|                 from arch/powerpc/include/asm/hw_irq.h:12,
|                 from arch/powerpc/include/asm/irqflags.h:12,
|                 from include/linux/irqflags.h:16,
|                 from include/asm-generic/cmpxchg-local.h:6,
|                 from arch/powerpc/include/asm/cmpxchg.h:526,
|                 from arch/powerpc/include/asm/atomic.h:11,
|                 from include/linux/atomic.h:7,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/asm-generic/bug.h:22,
|                 from arch/powerpc/include/asm/bug.h:109,
|                 from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99:
|include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|  210 |  WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
|      |  ^~~~

The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
(from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
statements.

On POWERPC64 there are missing atomic64 defines while building 32bit
VDSO:
|  VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
|In file included from include/linux/atomic.h:80,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/linux/kernel.h:19,
|                 from arch/powerpc/include/asm/page.h:11,
|                 from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
|                 from include/vdso/datapage.h:137,
|                 from lib/vdso/gettimeofday.c:5,
|                 from <command-line>:
|include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_incâ€=99:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=3Dimpl
|icit-function-declaration]
| 1447 |  arch_atomic64_add(1, v);
|      |  ^~~~~~~~~~~~~~~~~
|      |  arch_atomic_add

The generic fallback is not included, atomics itself are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.

Allow asm/spinlock_types.h to be included from
linux/spinlock_types_raw.h.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
2021-12-07 15:14:12 +01:00
Nicholas Piggin
20626177c9 powerpc: make memremap_compat_align 64s-only
memremap_compat_align is only relevant when ZONE_DEVICE is selected.
ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected
by PPC_BOOK3S_64.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com
2021-12-02 22:57:24 +11:00
Nicholas Piggin
ffbe5d21d1 powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make paths for these platforms to be explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
f43d2ffb47 powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c
This file contains functions and data common to radix, so rename it to
remove the hash_ prefix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
bdad5d57df powerpc/64s: move page size definitions from hash specific file
The radix code uses some of the psize variables. Move the common
ones from hash_utils.c to pgtable.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
310dce6201 powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change to make hash support configurable to not
introduce runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
162b0889bb powerpc/64s: move THP trace point creation out of hash specific file
In preparation for making hash MMU support configurable, move THP
trace point function definitions out of an otherwise hash-specific
file.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
3d3282fd34 powerpc/pseries: lparcfg don't include slb_size line in radix mode
This avoids a change in behaviour in the later patch making hash
support configurable. This is possibly a user interface change, so
the alternative would be a hard-coded slb_size=0 here.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-7-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
0c7cc15e92 powerpc/pseries: move process table registration away from hash-specific code
This reduces ifdefs in a later change which makes hash support configurable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-6-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
935b534c24 powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
a4135cbebd powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
7ebc49031d powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE
PPC_NATIVE now only controls the native HPT code, so rename it to be
more descriptive. Restrict it to Book3S only.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Nicholas Piggin
79b74a6848 powerpc: Remove unused FW_FEATURE_NATIVE references
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Alexey Kardashevskiy
792020907b KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
in __alloc_pages() by constructing a tiny VM which only does
H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy).

This silences the warning by adding __GFP_NOWARN.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084550.1658699-1-aik@ozlabs.ru
2021-12-02 22:56:08 +11:00
Alexey Kardashevskiy
511d25d6b7 KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
The userspace can trigger "vmalloc size %lu allocation failure: exceeds
total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.

This silences the warning by checking the limit before calling vzalloc()
and returns ENOMEM if failed.

This does not call underlying valloc helpers as __vmalloc_node() is only
exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is
not exported at all.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Use 'size' for the variable rather than 'cb']
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084512.1658628-1-aik@ozlabs.ru
2021-12-02 22:55:10 +11:00
Nicholas Piggin
f6a1987773 KVM: PPC: Book3S HV P9: Remove unused ri_set local variable
ri_set is set and never used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201052112.2137167-1-npiggin@gmail.com
2021-12-02 10:42:40 +11:00
Cédric Le Goater
2a2ac8a701 powerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.
The automatic "save & restore" of interrupt context is a POWER10/XIVE2
feature exploited by KVM under the PowerNV platform. It is not
available under pSeries and the associated toggle should not be
exposed under the XIVE debugfs directory.

Introduce a platform handler for debugfs initialization and move the
'save-restore' entry under the native (PowerNV) backend to fix compile
when !CONFIG_PPC_POWERNV.

Fixes: 1e7684dc4f ("powerpc/xive: Add a debugfs toggle for save-restore")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201165418.1041842-1-clg@kaod.org
2021-12-02 10:40:38 +11:00
Kees Cook
62ea67e319 powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-12-02 10:39:00 +11:00
Mark Rutland
985faa7868 powerpc: Snapshot thread flags
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.

To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.

Convert them all to the new flag accessor helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-11-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Mark Rutland
08b0af5b2a powerpc: Avoid discarding flags in system_call_exception()
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Thus, when setting flags we must use
an atomic operation rather than a plain read-modify-write sequence, as a
plain read-modify-write may discard flags which are concurrently set by a
remote thread, e.g.

	// task A			// task B
	tmp = A->thread_info.flags;
					set_tsk_thread_flag(A, NEWFLAG_B);
	tmp |= NEWFLAG_A;
	A->thread_info.flags = tmp;

arch/powerpc/kernel/interrupt.c's system_call_exception() sets
_TIF_RESTOREALL in the thread info flags with a read-modify-write, which
may result in other flags being discarded.

Elsewhere in the file it uses clear_bits() to atomically remove flag bits,
so use set_bits() here for consistency with those.

There may be reasons (e.g. instrumentation) that prevent the use of
set_thread_flag() and clear_thread_flag() here, which would otherwise be
preferable.

Fixes: ae7aaecc3f ("powerpc/64s: system call rfscv workaround for TM bugs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eirik Fuller <efuller@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-10-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Christophe Leroy
af11dee436 powerpc/32s: Fix shift-out-of-bounds in KASAN init
================================================================================
UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9
Call Trace:
[c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
[c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c
[c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138
[c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c
[c214bf30] [c1c0ed84] kasan_init+0xc0/0x198
[c214bf70] [c1c08024] setup_arch+0x18/0x54c
[c214bfc0] [c1c037f0] start_kernel+0x90/0x33c
[c214bff0] [00003610] 0x3610
================================================================================

This happens when the directly mapped memory is a power of 2.

Fix it by checking the shift and set the result to 0 when shift is -1

Fixes: 7974c47326 ("powerpc/32s: Implement dedicated kasan_init_region()")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169
Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
df1f679d19 powerpc/powermac: Add missing lockdep_register_key()
KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000
BUG: key c2d00cbc has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9
NIP:  c01a9428 LR: c01a9428 CTR: 00000000
REGS: e1033cf0 TRAP: 0700   Not tainted  (5.15.5-gentoo-PowerMacG4)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24002002  XER: 00000000

GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73
GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003
GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00
GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000
NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
Call Trace:
[e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable)
[e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424
[e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c
[e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c
[e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4
[e1033f10] [c000946c] kernel_init+0x28/0x154
[e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c

Add missing lockdep_register_key()

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
f1797e4de1 powerpc/modules: Don't WARN on first module allocation attempt
module_alloc() first tries to allocate module text within 24 bits direct
jump from kernel text, and tries a wider allocation if first one fails.

When first allocation fails the following is observed in kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add __GFP_NOWARN flag to first allocation so that no warning appears
when it fails.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df167 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:32 +11:00
Nicholas Piggin
5402e239d0 powerpc/64s: Get LPID bit width from device tree
Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet, this is mainly required
to get the PTCR register values correct (so KVM will run but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2021-11-30 22:27:07 +11:00
Athira Rajeev
2c9ac51b85 powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Running perf fuzzer showed below in dmesg logs:
  "Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.

Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. We could get
IPIs run before perf interrupt is replayed and the PMU events could
be deleted or stopped. This will change the PMU SPR values and resets
the counters. Snippet of ftrace log showing PMU callbacks invoked in
__do_irq():

  <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
  <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
  <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
  <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
  <<>>
  <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
  <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
  <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
  <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
  <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
  <<>>
  <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
  <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMI's masked during local_* operations, example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.

Fix it by updating the PMU callback function power_pmu_disable() to
check for pending perf interrupt. If there is an overflown PMC and
pending perf interrupt indicated in paca, clear the PMI bit in paca to
drop that sample. Clearing of PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there are more than one event running in the PMU, there is
a chance that we clear the PMI bit for the event which is not getting
deleted/stopped. The other events may still remain active. Hence to make
sure we don't drop valid sample in such cases, another check is added in
power_pmu_enable. This checks if there is an overflown PMC found among
the active events and if so enable back the PMI bit. Two new helper
functions are introduced to clear/set the PMI, ie
clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function
pmi_irq_pending() is introduced to give a warning if there is pending
PMI bit in paca, but no PMC is overflown.

Also there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
since PMXE bit is not cleared along with disabling of other MMCR0 bits
in the pmu_disable. Such PMI's could leave the PMU running and could
trigger PMI again which will set MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. Example, a timer after
power_pmu_del() which will re-enable interrupts and triggers a PMI again
since PMAO bit is still set. But fails to find valid overflow since PMC
was cleared in power_pmu_del(). Fix that by disabling PMXE along with
disabling of other MMCR0 bits in power_pmu_disable().

We can't just replay PMI any time. Hence this approach is preferred
rather than replaying PMI before resetting overflown PMC. Patch also
documents core-book3s on a race condition which can trigger these PMC
messages during idle path in PowerNV.

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-30 17:15:49 +11:00
Christophe Leroy
f05cab0034 powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero() which fallsback to
atomic_add_unless() which itself fallsback to
atomic_fetch_add_unless() which now handles immediate operands.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
41d65207de powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations:

	c001bb2c:	39 20 00 01 	li      r9,1
	c001bb30:	7d 40 18 28 	lwarx   r10,0,r3
	c001bb34:	7d 09 50 50 	subf    r8,r9,r10
	c001bb38:	7d 00 19 2d 	stwcx.  r8,0,r3

	c001c7a8:	39 40 00 01 	li      r10,1
	c001c7ac:	7d 00 18 28 	lwarx   r8,0,r3
	c001c7b0:	7c ea 42 14 	add     r7,r10,r8
	c001c7b4:	7c e0 19 2d 	stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

	c001bb2c:	7d 20 18 28 	lwarx   r9,0,r3
	c001bb30:	39 49 ff ff 	addi    r10,r9,-1
	c001bb34:	7d 40 19 2d 	stwcx.  r10,0,r3
	--
	c001c7a4:	7d 40 18 28 	lwarx   r10,0,r3
	c001c7a8:	39 0a 00 01 	addi    r8,r10,1
	c001c7ac:	7d 00 19 2d 	stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
fb350784d8 powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like
set or clear bit:

	c0009fe0:	39 40 08 00 	li      r10,2048
	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8

	c000d568:	39 00 18 00 	li      r8,6144
	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
	c000d570:	7c c6 40 78 	andc    r6,r6,r8
	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10

	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:50 +11:00
Nicholas Piggin
aebd1fb45c powerpc: flexible GPR range save/restore macros
Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and need to do mental arithmetic while reading the
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29 23:15:20 +11:00
Nicholas Piggin
e012c49998 powerpc/watchdog: help remote CPUs to flush NMI printk output
The printk layer at the moment does not seem to have a good way to force
flush printk messages that are created in NMI context, except in the
panic path.

NMI-context printk messages normally get to the console with irq_work,
but that won't help if the CPU is stuck with irqs disabled, as can be
the case for hard lockup watchdog messages.

The watchdog currently flushes the printk buffers after detecting a
lockup on remote CPUs, but they may not have processed their NMI IPI
yet by that stage, or they may have self-detected a lockup in which
case they won't go via this NMI IPI path.

Improve the situation by having NMI-context mark a flag if it called
printk, and have watchdog timer interrupts check if that flag was set
and try to flush if it was. Latency is not a big problem because we
were already stuck for a while, just need to try to make sure the
messages eventually make it out.

Depends-on: 5d5e4522a7 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29 23:08:43 +11:00
Christophe Leroy
57dd3a7bdf powerpc: Don't bother about .data..Lubsan sections
Since commit 9a427556fb ("vmlinux.lds.h: catch compound literals
into data and BSS") .data..Lubsan sections are taken into account
in DATA_MAIN which is included in DATA_DATA macro.

No need to take care of them anymore in powerpc vmlinux.lds.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
cdc81aece8 powerpc/ptdump: Fix display a BAT's size unit
We have wrong units on BAT's sizes (G instead of M, M instead of ...)

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4G Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2G Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1G Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512M Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128M Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128M Kernel   x     m
	6:         -
	7:         -

This is because pt_dump_size() expects a size in Kbytes but
bat_show_603() gives the size in bytes.

To avoid risk of confusion, change pt_dump_size() to take bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
7dfbfb87c2 powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32
Unlike PPC64, PPC32 doesn't require any special compiler option
to get _mcount() call not clobbering registers.

Provide ftrace_regs_caller() and ftrace_regs_call() and activate
HAVE_DYNAMIC_FTRACE_WITH_REGS.

That's heavily copied from ftrace_64_mprofile.S

For the time being leave livepatching aside, it will come with
following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
c93d4f6ecf powerpc/ftrace: Add module_trampoline_target() for PPC32
module_trampoline_target() is used by __ftrace_modify_call().

Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS
can be activated on PPC32 as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
88670fdb26 powerpc/ftrace: No need to read LR from stack in _mcount()
All functions calling _mcount do it exactly the same way, with the
following sequence of instructions:

	c07de788:       7c 08 02 a6     mflr    r0
	c07de78c:       90 01 00 04     stw     r0,4(r1)
	c07de790:       4b 84 13 65     bl      c001faf4 <_mcount>

Allthough LR is pushed on stack, it is still in r0 while entering
_mcount().

Function arguments are in r3-r10, so r11 and r12 are still available
at that point.

Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved
and doesn't need to be restored from the stack.

While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Michael Ellerman
ab85a27395 powerpc: Mark probe_machine() __init and static
Prior to commit b1923caa6e ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29 22:49:26 +11:00
Michael Ellerman
a4ac0d249a powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled.

Fixes the following W=1 warning when CONFIG_PROFILING=n:
  linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2021-11-29 22:49:23 +11:00
Michael Ellerman
ff47a95d1a powerpc/mm: Move tlbcam_sz() and make it static
Building with W=1 we see a warning:
  linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’

tlbcam_sz() is not used outside this file, so we can make it static.
However it's only used inside #ifdef CONFIG_PPC32, so move it within
that ifdef, otherwise we would get a defined but not used error.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au
2021-11-29 22:49:20 +11:00
Michael Ellerman
d9150d5bb5 powerpc/85xx: Make c293_pcie_pic_init() static
To fix the W=1 warning:
  linux/arch/powerpc/platforms/85xx/c293pcie.c:22:13: error: no previous prototype for ‘c293_pcie_pic_init’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-3-mpe@ellerman.id.au
2021-11-29 22:49:17 +11:00
Michael Ellerman
84a61fb43f powerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() static
To fix the W=1 warning:
  arch/powerpc/platforms/85xx/smp.c:369:6: error: no previous prototype for ‘mpc85xx_smp_kexec_cpu_down’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-2-mpe@ellerman.id.au
2021-11-29 22:49:14 +11:00
Michael Ellerman
4ea9e321c2 powerpc/85xx: Fix no previous prototype warning for mpc85xx_setup_pmc()
Fixes the following W=1 warning:
  arch/powerpc/platforms/85xx/mpc85xx_pm_ops.c:89:12: warning: no previous prototype for 'mpc85xx_setup_pmc'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-1-mpe@ellerman.id.au
2021-11-29 22:49:09 +11:00
Nicholas Piggin
2eafc4748b powerpc: select CPUMASK_OFFSTACK if NR_CPUS >= 8192
Some core kernel code starts to go beyond the 2048 byte stack size
warning at NR_CPUS=8192, so select CPUMASK_OFFSTACK in that case.
x86 does similarly for very large NR_CPUS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-2-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Nicholas Piggin
b350111bf7 powerpc: remove cpu_online_cores_map function
This function builds the cores online map with on-stack cpumasks which
can cause high stack usage with large NR_CPUS.

It is not used in any performance sensitive paths, so instead just check
for first thread sibling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-1-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Xiaoming Ni
3dc709e518 powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare
in the mpc85xx_pm_ops structure. As a result, oops is triggered in
smp_85xx_start_cpu().

  smp: Bringing up secondary CPUs ...
  kernel tried to execute user page (0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [00000000] 0x0
  LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
  Call Trace:
  [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
  [c1051de8] [c0011460] __cpu_up+0xc0/0x228
  [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
  [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
  [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
  [c1051eb8] [c07e67bc] smp_init+0x30/0x78
  [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
  [c1051f18] [c00032d8] kernel_init+0x14/0x124
  [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c

Fixes: c45361abb9 ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n")
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Tested-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126041153.16926-1-nixiaoming@huawei.com
2021-11-29 17:47:05 +11:00
Michael Ellerman
af3fdce4ab Revert "powerpc/code-patching: Improve verification of patchability"
This reverts commit 8b8a8f0ab3.

As reported[1] by Sachin this causes problems with ftrace, and it also
causes the code patching selftests to fail as reported[2] by Stephen.

So revert it for now.

1: https://lore.kernel.org/linuxppc-dev/3668743C-09DF-4673-B15C-2FFE2A57F7D7@linux.vnet.ibm.com/
2: https://lore.kernel.org/linuxppc-dev/20211126161747.1f7795b0@canb.auug.org.au/

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-11-29 17:41:52 +11:00
Linus Torvalds
7b65b798a6 powerpc fixes for 5.16 #3
Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being
 incorrectly invalidated and a subsequent host crash.
 
 Fix kernel hardlockup on vmap stack overflow on 32-bit.
 
 Thanks to: Christophe Leroy, Nicholas Piggin, Fabiano Rosas.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGh96MTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGHDEACu37n/NLFp3qzHyrk4fHN6sUQ0wGsE
 oroKeIrTP2+JPsoEDd6Jd9oQmqmVLmVTp4y/Lu6AAJbK4p1He1iPIxIbZn12NwqY
 /siCihV76PXR8748g1j8r3a3UYDmmvmllAIf9JdAn2I4kKPE3J37/1NEGSqZyJ80
 HzkheTczUweANhVuG1RDh3WvnAKMuOvtrT/YgetVILig23oiUORQkZH5wcpScaCN
 qShnqKSuRt8HWaF3uRh6D0sf4o0BWfZ6Pjyi7R/T7s2dQVgz0uCtu4MuyGirx/0J
 uU3yDEZD24y6doyi6HUgAFGeGYqJXky2XDOMrgV6CNGFty9HFp6GgN6gtX353pZa
 9Xplw4kg9UJwGcyk1BvgdQBHdc83y5zmr8c2PfhgiP/5rfEyCTYtHgZKUu5LLW8Y
 ECYkSNnlAT68SancCmiXgos1heIiwB7922cRWJANfGEjItNBFldtb2QZio2bWOUX
 g4gZ8HN2P0OmOxV1nnNkmIe+ZKLeHPN2N/FR7pUdkFk2+BLWFx+SbrceCCicwMF4
 CUUzll8FMFmVAGjFivG/axzqlDWMY5lrDsFZamaWN3VQEo7TwL6l8/cDcFZF4QBq
 viLWbi7sfQm6M3tZXFS465sG4SpUQ4lI8PKptuz5DMdr6WKisTLR7OF0klxsEwZj
 ZcT8a/wzUIZ53w==
 =MBp+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
  to the host SLB being incorrectly invalidated and a subsequent host
  crash.

  Fix kernel hardlockup on vmap stack overflow on 32-bit.

  Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"

* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix hardlockup on vmap stack overflow
  KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
2021-11-27 10:06:15 -08:00
Linus Torvalds
b501b85957 asm-generic: syscall table updates
André Almeida sends an update for the newly added futex_waitv
 syscall that was initially only added to a few architectures.
 
 Some additional ones have since made it through architecture
 maintainer trees, this finishes the remaining ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGfrmIACgkQmmx57+YA
 GNl0eg/9Eu2iliGxiQPzPw93a3DCYr7eiylbhHbgzvCbUtDL4cTE/0fcFGCYUoVu
 k5UT/5ja3Q4z8wE+MUbxXf6igr+ANKJNwomKksTJV1Q6wKF23k5vUEd2ZXGyvrSk
 F+2X0bpB0wZzmMBh98L6dWsEaDJ//8iRZNW4ErmPR4mp25iDqhzZ1k2RoRg9mYoM
 /Lw6u5zlBiqAw5nnMc6BxPo8gJGwgcKDu4VDngpGDVGlvBHgd7Kf1AKi1+aBumml
 WzVXEEdUp3LTj7O/8oU3dKPuZMhl+k/mKUvHn/cieG00FxfMpOnxj0gdaUGgG0O4
 imhQquXtPueq/W3Uod+MhVulKR6AziEOsPhJayMGopLReE0spKDxcplC/OSbMhGj
 0iA6auLdHpHBCg3KWZDj0Sjf2NwL9UCFVDrk8XAQQo7bY2U5LQbcbr0PS7iLlpzN
 gIb+r2k3Azwk/Iqnvl2/SR1cZTx5KxfxCydT03vXaYQXhc5l+u4RvjOZcSGgZXqd
 gzC78vd6fJGKyVeaL2xb6/w7qmA4DFd6sQAbgMlVFRRS6AIJrIdDEZAQztSJ+uYK
 p9A2qzeXz+ftzonySbPHmV4RBGjSVVecUBjRmmDfKNbqmFUjjnCwCv+LeDXh4IS6
 mjmSA/9liYdkHUaujwCLhOEskuCx9ZPeAsqmzXMER0BDzReVlH4=
 =Vy2A
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic syscall table update from Arnd Bergmann:
 "André Almeida sends an update for the newly added futex_waitv syscall
  that was initially only added to a few architectures.

  Some additional ones have since made it through architecture
  maintainer trees, this finishes the remaining ones"

* tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  futex: Wireup futex_waitv syscall
2021-11-25 10:41:28 -08:00
André Almeida
a0eb2da92b futex: Wireup futex_waitv syscall
Wireup futex_waitv syscall for all remaining archs.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-25 14:26:12 +01:00
Nicholas Piggin
3d030e3018 powerpc/watchdog: Fix wd_smp_last_reset_tb reporting
wd_smp_last_reset_tb now gets reset by watchdog_smp_panic() as part of
marking CPUs stuck and removing them from the pending mask before it
begins any printing. This causes last reset times reported to be off.

Fix this by reading it into a local variable before it gets reset.

Fixes: 76521c4b02 ("powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211125103346.1188958-1-npiggin@gmail.com
2021-11-25 21:48:39 +11:00
Michael Ellerman
4afc78eae1 powerpc/microwatt: Make microwatt_get_random_darn() static
Make microwatt_get_random_darn() static, because it can be.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118004415.1706863-1-mpe@ellerman.id.au
2021-11-25 11:25:34 +11:00
Nicholas Piggin
1f01bf9076 powerpc/watchdog: read TB close to where it is used
When taking watchdog actions, printing messages, comparing and
re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use
and under wd_smp_lock or printing lock (if applicable).

This should keep timebase mostly monotonic with kernel log messages, and
could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to
something a long way in the past, and causing other CPUs to appear to be
stuck.

These additional TB reads are all slowpath (lockup has been detected),
so performance does not matter.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-5-npiggin@gmail.com
2021-11-25 11:25:34 +11:00
Nicholas Piggin
76521c4b02 powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi
There is a deadlock with the console_owner lock and the wd_smp_lock:

CPU x takes the console_owner lock
CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
CPU y detects deadlock, tries to print something and spins on console_owner
-> deadlock

Change the watchdog locking scheme so wd_smp_lock protects the watchdog
internal data, but "reporting" (printing, issuing NMI IPIs, taking any
action outside of watchdog) uses a non-waiting exclusion. If a CPU detects
a problem but can not take the reporting lock, it just returns because
something else is already reporting. It will try again at some point.

Typically hard lockup watchdog report usefulness is not impacted due to
failure to spewing a large enough amount of data in as short a time as
possible, but by messages getting garbled.

Laurent debugged this and found the deadlock, and this patch is based on
his general approach to avoid expensive operations while holding the lock.
With the addition of the reporting exclusion.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[np: rework to add reporting exclusion update changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
858c93c315 powerpc/watchdog: tighten non-atomic read-modify-write access
Most updates to wd_smp_cpus_pending are under lock except the watchdog
interrupt bit clear.

This can race with non-atomic RMW updates to the mask under lock, which
can happen in two instances:

Firstly, if another CPU detects this one is stuck, removes it from the
mask, mask becomes empty and is re-filled with non-atomic stores. This
is okay because it would re-fill the mask with this CPU's bit clear
anyway (because this CPU is now stuck), so it doesn't matter that the
bit clear update got "lost". Add a comment for this.

Secondly, if another CPU detects a different CPU is stuck and removes it
from the pending mask with a non-atomic store to bytes which also
include the bit of this CPU. This case can result in the bit clear being
lost and the end result being the bit is set. This should be so rare it
hardly matters, but to make things simpler to reason about just avoid
the non-atomic access for that case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-3-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
5dad4ba68a powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
It is possible for all CPUs to miss the pending cpumask becoming clear,
and then nobody resetting it, which will cause the lockup detector to
stop working. It will eventually expire, but watchdog_smp_panic will
avoid doing anything if the pending mask is clear and it will never be
reset.

Order the cpumask clear vs the subsequent test to close this race.

Add an extra check for an empty pending mask when the watchdog fires and
finds its bit still clear, to try to catch any other possible races or
bugs here and keep the watchdog working. The extra test in
arch_touch_nmi_watchdog is required to prevent the new warning from
firing off.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Debugged-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Peiwei Hu
869fb7e5ae powerpc/prom_init: Fix improper check of prom_getprop()
prom_getprop() can return PROM_ERROR. Binary operator can not identify
it.

Fixes: 94d2dde738 ("[POWERPC] Efika: prune fixups and make them more carefull")
Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
dd5cde457a powerpc/rtas: rtas_busy_delay_time() kernel-doc
Provide API documentation for rtas_busy_delay_time(), explaining why we
return the same value for 9900 and -2.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
38f7b7067d powerpc/rtas: rtas_busy_delay() improvements
Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

    int rc;

    do {
        rc = rtas_call(rtas_token("some-function"), ...);
    } while (rtas_busy_delay(rc));

    /* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

*  It performs msleep() for all 990x and even when no delay is
   suggested (-2), but this is understood to actually sleep for two jiffies
   minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
   appear to be the most common extended delay statuses, and the
   oversleeping measurably lengthens DLPAR operations, which perform
   many RTAS calls.

*  It does not sleep on 990x unless need_resched() is true, causing code
   like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

*  When passed RTAS_BUSY, perform cond_resched() and return without
   sleeping. The caller should reattempt immediately

*  Always sleep when passed an extended delay status, using usleep_range()
   for precise shorter sleeps. Limit the sleep time to one second even
   though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             1,935      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
-            609.99 msec task-clock                #    0.183 CPUs utilized            ( +-  0.19% )
+             1,956      probe:rtas_call           #    0.003 M/sec                    ( +-  0.17% )
+            618.56 msec task-clock                #    0.278 CPUs utilized            ( +-  0.11% )

-            3.3322 +- 0.0670 seconds time elapsed  ( +-  2.01% )
+            2.2222 +- 0.0416 seconds time elapsed  ( +-  1.87% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             6,224      probe:rtas_call           #    0.008 M/sec                    ( +-  2.57% )
-            750.36 msec task-clock                #    0.190 CPUs utilized            ( +-  2.01% )
+               843      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
+            250.66 msec task-clock                #    0.068 CPUs utilized            ( +-  0.17% )

-            3.9394 +- 0.0890 seconds time elapsed  ( +-  2.26% )
+             3.678 +- 0.113 seconds time elapsed  ( +-  3.07% )

With all CPUs 100% busy (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             2,979      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
-          1,096.62 msec task-clock                #    0.105 CPUs utilized            ( +-  0.10% )
+             2,981      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
+          1,095.26 msec task-clock                #    0.154 CPUs utilized            ( +-  0.21% )

-            10.476 +- 0.104 seconds time elapsed  ( +-  1.00% )
+            7.1124 +- 0.0865 seconds time elapsed  ( +-  1.22% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             2,702      probe:rtas_call           #    0.004 M/sec                    ( +-  4.00% )
-            722.71 msec task-clock                #    0.067 CPUs utilized            ( +-  2.41% )
+             1,246      probe:rtas_call           #    0.003 M/sec                    ( +-  0.25% )
+            487.73 msec task-clock                #    0.049 CPUs utilized            ( +-  0.20% )

-            10.829 +- 0.163 seconds time elapsed  ( +-  1.51% )
+            9.9887 +- 0.0866 seconds time elapsed  ( +-  0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
22887f319a powerpc/pseries: delete scanlog
Remove the pseries scanlog driver.

This code supports functions from Power4-era servers that are not present
on targets currently supported by arch/powerpc. System manuals from this
time have this description:

  Scan Dump data is a set of chip data that the service processor gathers
  after a system malfunction. It consists of chip scan rings, chip trace
  arrays, and Scan COM (SCOM) registers. This data is stored in the
  scan-log partition of the system’s Nonvolatile Random Access
  Memory (NVRAM).

PowerVM partition firmware development doesn't recognize the associated
function call or property, and they don't see any references to them in
their codebase. It seems to have been specific to non-virtualized pseries.

References:

Historical Linux commit from February 2003 (interesting to note this seems
to be the source of non-GPL exports for rtas_call etc):
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

IntelliStation and pSeries docs which refer to the feature:
http://ps-2.retropc.se/basil.holloway/ALL%20PDF/380635.pdf
http://ps-2.kev009.com/rs6000/manuals/p/p615-6C3-6E3/6C3_and_6E3_Users_Guide_SA38-0629.pdf

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210920173203.1800475-1-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
53cadf7dee powerpc/rtas: kernel-doc fixes
Fix the following issues reported by kernel-doc:

$ scripts/kernel-doc -v -none arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/rtas.c:810: info: Scanning doc for function rtas_activate_firmware
arch/powerpc/kernel/rtas.c:818: warning: contents before sections
arch/powerpc/kernel/rtas.c:841: info: Scanning doc for function rtas_call_reentrant
arch/powerpc/kernel/rtas.c:893: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Find a specific pseries error log in an RTAS extended event log.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116215806.928235-1-nathanl@linux.ibm.com
2021-11-25 11:25:32 +11:00
Christophe Leroy
8b8a8f0ab3 powerpc/code-patching: Improve verification of patchability
Today, patch_instruction() assumes that it is called exclusively on
valid addresses, and only checks that it is not called on an init
address after init section has been freed.

Improve verification by calling kernel_text_address() instead.

kernel_text_address() already includes a verification of
initmem release.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc683d499a411730504b132a924de0ccc2ef1f79.1636971137.git.christophe.leroy@csgroup.eu
2021-11-25 11:25:32 +11:00
Jason Wang
a3bcfc182b powerpc/tsi108: make EXPORT_SYMBOL follow its function immediately
EXPORT_SYMBOL(foo); should immediately follow its function/variable.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211114115616.493815-1-wangborong@cdjrlc.com
2021-11-25 11:25:32 +11:00
Hari Bathini
e919c0b232 bpf ppc32: Access only if addr is kernel address
With KUAP enabled, any kernel code which wants to access userspace
needs to be surrounded by disable-enable KUAP. But that is not
happening for BPF_PROBE_MEM load instruction. Though PPC32 does not
support read protection, considering the fact that PTR_TO_BTF_ID
(which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer
or NULL but should never be a pointer to userspace address, execute
BPF_PROBE_MEM load only if addr is kernel address, otherwise set
dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-9-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
23b51916ee bpf ppc32: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| lwz   r28,4(r4)  |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r24)   |
             | lwz  r4,4(r24)   |
             |                  |
             |                  |
             |------------------|
   0x4278 -->| li  r28,0        |  \
             | li  r27,0        |  | fixup entry
             | b   0x4024       |  /
   0x4284 -->| li  r4,0         |
             | li  r3,0         |
             | b   0x40b4       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffe4 |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffe8 |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-8-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
9c70c7147f bpf ppc64: Access only if addr is kernel address
On PPC64 with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d0000000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr is kernel address, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-7-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
983bdc0245 bpf ppc64: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| ld   r27,4(r3)   |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r4)    |
             |                  |
             |                  |
             |------------------|
   0x4280 -->| li  r27,0        |  \ fixup entry
             | b   0x4024       |  /
   0x4288 -->| li  r3,0         |
             | b   0x40b0       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffec |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffec |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-6-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
f15a71b388 powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-5-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Hari Bathini
efa95f031b bpf powerpc: refactor JIT compiler code
Refactor powerpc LDX JITing code to simplify adding BPF_PROBE_MEM
support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-4-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
04c04205bc bpf powerpc: Remove extra_pass from bpf_jit_build_body()
In case of extra_pass, usual JIT passes are always skipped. So,
extra_pass is always false while calling bpf_jit_build_body() and
can be removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-3-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
c9ce7c36e4 bpf powerpc: Remove unused SEEN_STACK
SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-2-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Oliver O'Halloran
157616f3c2 powerpc/eeh: Use a goto for recovery failures
The EEH recovery logic in eeh_handle_normal_event() has some pretty strange
flow control. If we remove all the actual recovery logic we're left with
the following skeleton:

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result == PCI_ERS_RESULT_NONE) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_NEED_RESET) {
		...
	}

	if ((result == PCI_ERS_RESULT_RECOVERED) ||
	    (result == PCI_ERS_RESULT_NONE)) {
		...
		goto out;
	}

	/*
	 * unsuccessful recovery / PCI_ERS_RESULT_DISCONECTED
	 * handling is here.
	 */
	...

	out:
	...

Most of the "if () { ... }" blocks above change "result" to
PCI_ERS_RESULT_DISCONNECTED if an error occurs in that recovery step. This
makes the control flow a bit confusing since it breaks the early-exit
pattern that is generally used in Linux. In any case we end up handling the
error in the final else block so why not just jump there directly? Doing so
also allows us to de-indent a bunch of code.

No functional changes.

[dja: rebase on top of linux-next + my preceeding refactor,
      move clearing the EEH_DEV_NO_HANDLER bit above the first goto so that
      it is always clear in the error handler code as it was before.]

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-2-dja@axtens.net
2021-11-25 11:25:31 +11:00
Daniel Axtens
10b34ece13 powerpc/eeh: Small refactor of eeh_handle_normal_event()
The control flow of eeh_handle_normal_event() is a bit tricky.

Break out one of the error handling paths - rather than be in an else
block, we'll make it part of the regular body of the function and put a
'goto out;' in the true limb of the if.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-1-dja@axtens.net
2021-11-25 11:25:31 +11:00
Cédric Le Goater
1e7684dc4f powerpc/xive: Add a debugfs toggle for save-restore
On POWER10, the automatic "save & restore" of interrupt context is
always available. Provide a way to deactivate it for tests or
performance.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-11-clg@kaod.org
2021-11-25 11:25:31 +11:00
Cédric Le Goater
c21ee04f11 powerpc/xive: Add a kernel parameter for StoreEOI
StoreEOI is activated by default on platforms supporting the feature
(POWER10) and will be used as soon as firmware advertises its
availability. The kernel parameter provides a way to deactivate its
use. It can be still be reactivated through debugfs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-10-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
d7bc1e376c powerpc/xive: Add a debugfs toggle for StoreEOI
It can be used to deactivate temporarily StoreEOI for tests or
performance on platforms supporting the feature (POWER10)

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-9-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
08f3f61021 powerpc/xive: Add a debugfs file to dump EQs
The XIVE driver under Linux uses a single interrupt priority and only
one event queue is configured per CPU. Expose the contents under
a 'xive/eqs/cpuX' debugfs file.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-8-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
33e1d4a152 powerpc/xive: Rename the 'cpus' debugfs file to 'ipis'
and remove the EQ entries output which is not very useful since only
the next two events of the queue are taken into account. We will
improve the dump of the EQ in the next patches.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-7-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
baed14de78 powerpc/xive: Change the debugfs file 'xive' into a directory
Use a 'cpus' file to dump CPU states and 'interrupts' to dump IRQ states.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-6-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
412877dfae powerpc/xive: Introduce xive_core_debugfs_create()
and fix some compile issues when !CONFIG_DEBUG_FS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add empty stub to fix !CONFIG_DEBUG_FS build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-5-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
756c52c632 powerpc/xive: Activate StoreEOI on P10
StoreEOI (the capability to EOI with a store) requires load-after-store
ordering in some cases to be reliable. P10 introduced a new offset for
load operations to enforce correct ordering and the XIVE driver has
the required support since kernel 5.8, commit b1f9be9392
("powerpc/xive: Enforce load-after-store ordering when StoreEOI is active")

Since skiboot v7, StoreEOI support is advertised on P10 with a new flag
on the PowerNV platform. See skiboot commit 4bd7d84afe46 ("xive/p10:
Introduce a new OPAL_XIVE_IRQ_STORE_EOI2 flag"). When detected,
activate the feature.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-4-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
bd5b00c6cf powerpc/xive: Introduce an helper to print out interrupt characteristics
and extend output of debugfs and xmon with addresses of the ESB
management and trigger pages.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-3-clg@kaod.org
2021-11-25 11:25:29 +11:00
Cédric Le Goater
44b9c8ddcb powerpc/xive: Replace pr_devel() by pr_debug() to ease debug
These routines are not on hot code paths and pr_debug() is easier to
activate. Also add a '0x' prefix to hex printed values (HW IRQ number).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-2-clg@kaod.org
2021-11-25 11:25:29 +11:00
Nicholas Piggin
d02fa40d75 powerpc/powernv: Remove POWER9 PVR version check for entry and uaccess flushes
These aren't necessarily POWER9 only, and it's not to say some new
vulnerability may not get discovered on other processors for which
we would like the flexibility of having the workaround enabled by
firmware.

Remove the restriction that the workarounds only apply to POWER9.

However POWER7 and POWER8 are not affected, and they may not have
older firmware that does not advertise this, so clear these workarounds
manually.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
[mpe: Incorporate changes from Nick, reword comment slightly.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-5-npiggin@gmail.com
2021-11-25 11:25:29 +11:00
Julia Lawall
a1d2b210ff powerpc/btext: add missing of_node_put
for_each_node_by_type performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression n;
expression e;
@@

 for_each_node_by_type(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-6-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
a841fd009e powerpc/cell: add missing of_node_put
for_each_node_by_name performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression e,e1;
local idexpression n;
@@

 for_each_node_by_name(n, e1) {
   ... when != of_node_put(n)
       when != e = n
(
   return n;
|
+  of_node_put(n);
?  return ...;
)
   ...
 }
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-7-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
7d405a939c powerpc/powernv: add missing of_node_put
for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression n;
expression e;
@@

 for_each_compatible_node(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-4-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
f6e82647ff powerpc/6xx: add missing of_node_put
for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression e;
local idexpression n;
@@

@@
local idexpression n;
expression e;
@@

 for_each_compatible_node(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-2-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Nicholas Piggin
9c5a432a55 KVM: PPC: Book3S HV P9: Remove subcore HMI handling
On POWER9 and newer, rather than the complex HMI synchronisation and
subcore state, have each thread un-apply the guest TB offset before
calling into the early HMI handler.

This allows the subcore state to be avoided, including subcore enter
/ exit guest, which includes an expensive divide that shows up
slightly in profiles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-54-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
6398326b9b KVM: PPC: Book3S HV P9: Stop using vc->dpdes
The P9 path uses vc->dpdes only for msgsndp / SMT emulation. This adds
an ordering requirement between vcpu->doorbell_request and vc->dpdes for
no real benefit. Use vcpu->doorbell_request directly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-53-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
617326ff01 KVM: PPC: Book3S HV P9: Tidy kvmppc_create_dtl_entry
This goes further to removing vcores from the P9 path. Also avoid the
memset in favour of explicitly initialising all fields.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-52-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
ecb6a7207f KVM: PPC: Book3S HV P9: Remove most of the vcore logic
The P9 path always uses one vcpu per vcore, so none of the vcore, locks,
stolen time, blocking logic, shared waitq, etc., is required.

Remove most of it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-51-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
434398ab5e KVM: PPC: Book3S HV P9: Avoid cpu_in_guest atomics on entry and exit
cpu_in_guest is set to determine if a CPU needs to be IPI'ed to exit
the guest and notice the need_tlb_flush bit.

This can be implemented as a global per-CPU pointer to the currently
running guest instead of per-guest cpumasks, saving 2 atomics per
entry/exit. P7/8 doesn't require cpu_in_guest, nor does a nested HV
(only the L0 does), so move it to the P9 HV path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-50-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
4c9a68914e KVM: PPC: Book3S HV P9: Add unlikely annotation for !mmu_ready
The mmu will almost always be ready.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-49-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
f08cbf5c7d KVM: PPC: Book3S HV P9: Avoid changing MSR[RI] in entry and exit
kvm_hstate.in_guest provides the equivalent of MSR[RI]=0 protection,
and it covers the existing MSR[RI]=0 section in late entry and early
exit, so clearing and setting MSR[RI] in those cases does not
actually do anything useful.

Remove the RI manipulation and replace it with comments. Make the
in_guest memory accesses a bit closer to a proper critical section
pattern. This speeds up guest entry/exit performance.

This also removes the MSR[RI] warnings which aren't very interesting
and would cause crashes if they hit due to causing an interrupt in
non-recoverable code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-48-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
241d1f19f0 KVM: PPC: Book3S HV P9: Optimise hash guest SLB saving
slbmfee/slbmfev instructions are very expensive, moreso than a regular
mfspr instruction, so minimising them significantly improves hash guest
exit performance. The slbmfev is only required if slbmfee found a valid
SLB entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-47-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
b49c65c5f9 KVM: PPC: Book3S HV P9: Improve mfmsr performance on entry
Rearrange the MSR saving on entry so it does not follow the mtmsrd to
disable interrupts, avoiding a possible RAW scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-46-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
46dea77f79 KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry
mftb() is expensive and one can be avoided on nested guest dispatch.

If the time checking code distinguishes between the L0 timer and the
nested HV timer, then both can be tested in the same place with the
same mftb() value.

This also nicely illustrates the relationship between the L0 and nested
HV timers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-45-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
d5c0e8332d KVM: PPC: Book3S HV P9: Avoid tlbsync sequence on radix guest exit
Use the existing TLB flushing logic to IPI the previous CPU and run the
necessary barriers before running a guest vCPU on a new physical CPU,
to do the necessary radix GTSE barriers for handling the case of an
interrupted guest tlbie sequence.

This requires the vCPU TLB flush sequence that is currently just done
on one thread, to be expanded to ensure the other threads execute a
ptesync, because causing them to exit the guest will no longer cause a
ptesync by itself.

This results in more IPIs than the TLB flush logic requires, but it's
a significant win for common case scheduling when the vCPU remains on
the same physical CPU.

This saves about 520 cycles (nearly 10%) on a guest entry+exit micro
benchmark on a POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-44-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
0ba0e5d5a6 KVM: PPC: Book3S HV: Split P8 from P9 path guest vCPU TLB flushing
This creates separate functions for old and new paths for vCPU TLB
flushing, which will reduce complexity of the next change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-43-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
a089a6869e KVM: PPC: Book3S HV P9: Don't restore PSSCR if not needed
This also moves the PSSCR update in nested entry to avoid a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-42-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
9c75f65f35 KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR SPRs
Some of the DAWR SPR access is already predicated on dawr_enabled(),
apply this to the remainder of the accesses.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-41-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
cf3b16cfa6 KVM: PPC: Book3S HV P9: Comment and fix MMU context switching code
Tighten up partition switching code synchronisation and comments.

In particular, hwsync ; isync is required after the last access that is
performed in the context of a partition, before the partition is
switched away from.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-40-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
5236756d04 KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRs
Linux implements SPR save/restore including storage space for registers
in the task struct for process context switching. Make use of this
similarly to the way we make use of the context switching fp/vec save
restore.

This improves code reuse, allows some stack space to be saved, and helps
with avoiding VRSAVE updates if they are not required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-39-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
022ecb960c KVM: PPC: Book3S HV P9: Demand fault TM facility registers
Use HFSCR facility disabling to implement demand faulting for TM, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-38-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
a3e18ca8ab KVM: PPC: Book3S HV P9: Demand fault EBB facility registers
Use HFSCR facility disabling to implement demand faulting for EBB, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-37-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
34e02d555d KVM: PPC: Book3S HV P9: More SPR speed improvements
This avoids more scoreboard stalls and reduces mtSPRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-36-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
d55b1eccc7 KVM: PPC: Book3S HV P9: Restrict DSISR canary workaround to processors that require it
Use CPU_FTR_P9_RADIX_PREFETCH_BUG to apply the workaround, to test for
DD2.1 and below processors. This saves a mtSPR in guest entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-35-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
3e7b337902 KVM: PPC: Book3S HV P9: Switch PMU to guest as late as possible
This moves PMU switch to guest as late as possible in entry, and switch
back to host as early as possible at exit. This helps the host get the
most perf coverage of KVM entry/exit code as possible.

This is slightly suboptimal for SPR scheduling point of view when the
PMU is enabled, but when perf is disabled there is no real difference.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-34-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
3f9e2966d1 KVM: PPC: Book3S HV P9: Implement TM fastpath for guest entry/exit
If TM is not active, only TM register state needs to be saved and
restored, avoiding several mfmsr/mtmsrd instructions and improving
performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-33-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
d5f4801945 KVM: PPC: Book3S HV P9: Move remaining SPR and MSR access into low level entry
Move register saving and loading from kvmhv_p9_guest_entry() into the HV
and nested entry handlers.

Accesses are scheduled to reduce mtSPR / mfSPR interleaving which
reduces SPR scoreboard stalls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-32-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
08b3f08af5 KVM: PPC: Book3S HV P9: Move nested guest entry into its own function
Move the part of the guest entry which is specific to nested HV into its
own function. This is just refactoring.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-31-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
aabcaf6ae2 KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in
Move the P9 guest/host register switching functions to the built-in
P9 entry code, and export it for nested to use as well.

This allows more flexibility in scheduling these supervisor privileged
SPR accesses with the HV privileged and PR SPR accesses in the low level
entry code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-30-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
516b334210 KVM: PPC: Book3S HV P9: Move vcpu register save/restore into functions
This should be no functional difference but makes the caller easier
to read.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-29-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
0f3b6c4851 KVM: PPC: Book3S HV P9: Juggle SPR switching around
This juggles SPR switching on the entry and exit sides to be more
symmetric, which makes the next refactoring patch possible with no
functional change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-28-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
9dfe7aa7bc KVM: PPC: Book3S HV P9: Only execute mtSPR if the value changed
Keep better track of the current SPR value in places where
they are to be loaded with a new context, to reduce expensive
mtSPR operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-27-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
9a1e530bbb KVM: PPC: Book3S HV P9: Avoid SPR scoreboard stalls
Avoid interleaving mfSPR and mtSPR to reduce SPR scoreboard stalls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-26-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
cb2553a093 KVM: PPC: Book3S HV P9: Optimise timebase reads
Reduce the number of mfTB executed by passing the current timebase
around entry and exit code rather than read it multiple times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-25-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
6547af3eba KVM: PPC: Book3S HV P9: Move TB updates
Move the TB updates between saving and loading guest and host SPRs,
to improve scheduling by keeping issue-NTC operations together as
much as possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-24-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
3c1a4322bb KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase
Change dec_expires to be relative to the guest timebase, and allow
it to be moved into low level P9 guest entry functions, to improve
SPR access scheduling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-23-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
cf99dedb4b KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match kvmppc_start_thread
Small cleanup makes it a bit easier to match up entry and exit
operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-22-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
2251fbe763 KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable
Moving the mtmsrd after the host SPRs are saved and before the guest
SPRs start to be loaded can prevent an SPR scoreboard stall (because
the mtmsrd is L=1 type which does not cause context synchronisation.

This is also now more convenient to combined with the mtmsrd L=0
instruction to enable facilities just below, but that is not done yet.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-21-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
34e119c96b KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs
This reduces the number of mtmsrd required to enable facility bits when
saving/restoring registers, by having the KVM code set all bits up front
rather than using individual facility functions that set their particular
MSR bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-20-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
174a3ab633 KVM: PPC: Book3S HV P9: Move SPRG restore to restore_p9_host_os_sprs
Move the SPR update into its relevant helper function. This will
help with SPR scheduling improvements in later changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-19-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
a1a19e1154 KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write
Processors that support KVM HV do not require read-modify-write of
the CTRL SPR to set/clear their thread's runlatch. Just write 1 or 0
to it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-18-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
b1adcf57ce KVM: PPC: Book3S HV P9: Factor out yield_count increment
Factor duplicated code into a helper function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-17-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
9d3ddb86d9 KVM: PPC: Book3S HV P9: Demand fault PMU SPRs when marked not inuse
The pmcregs_in_use field in the guest VPA can not be trusted to reflect
what the guest is doing with PMU SPRs, so the PMU must always be managed
(stopped) when exiting the guest, and SPR values set when entering the
guest to ensure it can't cause a covert channel or otherwise cause other
guests or the host to misbehave.

So prevent guest access to the PMU with HFSCR[PM] if pmcregs_in_use is
clear, and avoid the PMU SPR access on every partition switch. Guests
that set pmcregs_in_use incorrectly or when first setting it and using
the PMU will take a hypervisor facility unavailable interrupt that will
bring in the PMU SPRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-16-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
401e1ae372 KVM: PPC: Book3S HV P9: Factor PMU save/load into context switch functions
Rather than guest/host save/retsore functions, implement context switch
functions that take care of details like the VPA update for nested.

The reason to split these kind of helpers into explicit save/load
functions is mainly to schedule SPR access nicely, but PMU is a special
case where the load requires mtSPR (to stop counters) and other
difficulties, so there's less possibility to schedule those nicely. The
SPR accesses also have side-effects if the PMU is running, and in later
changes we keep the host PMU running as long as possible so this code
can be better profiled, which also complicates scheduling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-15-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
57dc0eed73 KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
Implement the P9 path PMU save/restore code in C, and remove the
POWER9/10 code from the P7/8 path assembly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-14-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
0a4b4327ce powerpc/64s: Implement PMU override command line option
It can be useful in simulators (with very constrained environments)
to allow some PMCs to run from boot so they can be sampled directly
by a test harness, rather than having to run perf.

A previous change freezes counters at boot by default, so provide
a boot time option to un-freeze (plus a bit more flexibility).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
245ebf8e73 powerpc/64s: Always set PMU control registers to frozen/disabled when not in use
KVM PMU management code looks for particular frozen/disabled bits in
the PMU registers so it knows whether it must clear them when coming
out of a guest or not. Setting this up helps KVM make these optimisations
without getting confused. Longer term the better approach might be to
move guest/host PMU switching to the perf subsystem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-12-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
d3c8a2d374 KVM: PPC: Book3S HV: Don't always save PMU for guest capable of nesting
Provide a config option that controls the workaround added by commit
63279eeb7f ("KVM: PPC: Book3S HV: Always save guest pmu for guest
capable of nesting"). The option defaults to y for now, but is expected
to go away within a few releases.

Nested capable guests running with the earlier commit 1782663897
("KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest
SPRs are live") will now indicate the PMU in-use status of their guests,
which means the parent does not need to unconditionally save the PMU for
nested capable guests.

After this latest round of performance optimisations, this option costs
about 540 cycles or 10% entry/exit performance on a POWER9 nested-capable
guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
References: 1782663897 ("KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-11-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
46f9caf1a2 powerpc/64s: Keep AMOR SPR a constant ~0 at runtime
This register controls supervisor SPR modifications, and as such is only
relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never
restores it coming back out to the host, so it can be kept constant and
avoid the mtSPR in KVM guest entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-10-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
eacc818864 KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests
HV interrupts may be taken with the MMU enabled when radix guests are
running. Enable LPCR[HAIL] on ISA v3.1 processors for radix guests.
Make this depend on the host LPCR[HAIL] being enabled. Currently that is
always enabled, but having this test means any issue that might require
LPCR[HAIL] to be disabled in the host will not have to be duplicated in
KVM.

This optimisation takes 1380 cycles off a NULL hcall entry+exit micro
benchmark on a POWER10.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-9-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
25aa145856 powerpc/time: add API for KVM to re-arm the host timer/decrementer
Rather than have KVM look up the host timer and fiddle with the
irq-work internal details, have the powerpc/time.c code provide a
function for KVM to re-arm the Linux timer code when exiting a
guest.

This is implementation has an improvement over existing code of
marking a decrementer interrupt as soft-pending if a timer has
expired, rather than setting DEC to a -ve value, which tended to
cause host timers to take two interrupts (first hdec to exit the
guest, then the immediate dec).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-8-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
34bf08a207 KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit
mftb is serialising (dispatch next-to-complete) so it is heavy weight
for a mfspr. Avoid reading it multiple times in the entry or exit paths.
A small number of cycles delay to timers is tolerable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-7-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
9581991a60 KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0,
this could help reduce needless guest exits due to leftover exceptions on
entering the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-6-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
4ebbd075bc KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer read
There is no need to save away the host DEC value, as it is derived
from the host timer subsystem which maintains the next timer time,
so it can be restored from there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-5-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
5955c7469a KMV: PPC: Book3S HV P9: Use set_dec to set decrementer to host
The host Linux timer code arms the decrementer with the value
'decrementers_next_tb - current_tb' using set_dec(), which stores
val - 1 on Book3S-64, which is not quite the same as what KVM does
to re-arm the host decrementer when exiting the guest.

This shouldn't be a significant change, but it makes the logic match
and avoids this small extra change being brought into the next patch.

Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-4-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
736df58fd5 powerpc/64s: guard optional TIDR SPR with CPU ftr test
The TIDR SPR only exists on POWER9. Avoid accessing it when the
feature bit for it is not set.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-3-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
f53884b1bf powerpc/64s: Remove WORT SPR from POWER9/10 (take 2)
This removes a missed remnant of the WORT SPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-2-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Christophe Leroy
5bb60ea611 powerpc/32: Fix hardlockup on vmap stack overflow
Since the commit c118c7303a ("powerpc/32: Fix vmap stack - Do not
activate MMU before reading task struct") a vmap stack overflow
results in a hard lockup. This is because emergency_ctx is still
addressed with its virtual address allthough data MMU is not active
anymore at that time.

Fix it by using a physical address instead.

Fixes: c118c7303a ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce30364fb7ccda489272af4a1612b6aa147e1d23.1637227521.git.christophe.leroy@csgroup.eu
2021-11-24 21:00:51 +11:00
Nicholas Piggin
cf0b0e3712 KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a
reserved value on POWER7/8. On POWER8 this invalidates the SLB entries
above index 0, similarly to SLBIA IH=0.

If the SLB entries are invalidated, and then the guest is bypassed, the
host SLB does not get re-loaded, so the bolted entries above 0 will be
lost. This can result in kernel stack access causing a SLB fault.

Kernel stack access causing a SLB fault was responsible for the infamous
mega bug (search "Fix SLB reload bug"). Although since commit
48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") that
starts using the kernel stack in the SLB miss handler, it might only
result in an infinite loop of SLB faults. In any case it's a bug.

Fix this by only executing the instruction on >= POWER9 where IH=7 is
defined not to invalidate the SLB. POWER7/8 don't require this ERAT
flush.

Fixes: 5008711259 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
2021-11-24 21:00:36 +11:00
Linus Torvalds
75603b14ed powerpc fixes for 5.16 #2
Fix a bug in copying of sigset_t for 32-bit systems, which caused X to not start.
 
 Fix handling of shared LSIs (rare) with the xive interrupt controller (Power9/10).
 
 Fix missing TOC setup in some KVM code, which could result in oopses depending on kernel
 data layout.
 
 Fix DMA mapping when we have persistent memory and only one DMA window available.
 
 Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a recent fix.
 
 A couple of other minor fixes.
 
 Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater, Christian Zigotzky,
 Christophe Leroy, Daniel Axtens, Finn Thain, Greg Kurz, Masahiro Yamada, Nicholas Piggin,
 Uwe Kleine-König.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGZzGMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBrRD/4qE1A3+nXe+uZRJM3H5F8C/Ui2I/1G
 JPekyfW9aZklsv8SMlz8BotDTlK8vNwiEtkAuwqLOfPXPi1p/Y1do4sPtXAjUpuX
 mXZP3G9K2xXmALLedXMjJNO6YJjTT5LE7OT42QziSfY1ScS7iqfGNANg1zRjkCRW
 yf2cpBbMRnWdDhCgWyE/V/V4xdPyOTTnnWn3d4F3qNshV0luKgTJl/9yo0OmQrGe
 /T4Cw8jG5p+pSblNyFaACnYlKWF4bYTQIl5NWsvJY0A2cg3I5ah6+hexdGRN/JdI
 K3PWpJ8rx5RjICkTFE4cADI6xIF1bHhjMh3ytcaMH5USBMmW3fTUUfcFwjRkRDHa
 b8Z6V631mgK1v3L0RlrAn+PZ9R212wpupvQT6YOf4pFb5+BzOyaCQCzyQv+BnwoI
 Fwran0HEO6NUODq4off9MADEpNTjwhV2mDFojxiCJ9eb1oCIilLbs8BOUWRSHYe0
 1S22pdj9XSR7yxXt5DnjQBwhR47ZS7D3jXf9gjbmJ/qn6cRPAFzt/m/woSY2Vv7T
 UrZVjz5lb+skjij7vxw+L9jUIwLBd99cvBiHzJpWUNc0RTQeBlAh4QBK/1MNixCP
 93LTN7tsRdGknLRTJ5yfRhEhwuhTTH8SEPp3H+qOZj9sXwq3Bftl4Nm40AgoATHO
 G4kPlgrCMQBcRQ==
 =Ss4y
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc fixes from Michael Ellerman:

 - Fix a bug in copying of sigset_t for 32-bit systems, which caused X
   to not start.

 - Fix handling of shared LSIs (rare) with the xive interrupt controller
   (Power9/10).

 - Fix missing TOC setup in some KVM code, which could result in oopses
   depending on kernel data layout.

 - Fix DMA mapping when we have persistent memory and only one DMA
   window available.

 - Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a
   recent fix.

 - A couple of other minor fixes.

Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater,
Christian Zigotzky, Christophe Leroy, Daniel Axtens, Finn Thain, Greg
Kurz, Masahiro Yamada, Nicholas Piggin, and Uwe Kleine-König.

* tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Change IRQ domain to a tree domain
  powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
  powerpc/signal32: Fix sigset_t copy
  powerpc/book3e: Fix TLBCAM preset at boot
  powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
  powerpc/pseries/ddw: simplify enable_ddw()
  powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
  powerpc/pseries: Fix numa FORM2 parsing fallback code
  powerpc/pseries: rename numa_dist_table to form2_distances
  powerpc: clean vdso32 and vdso64 directories
  powerpc/83xx/mpc8349emitx: Drop unused variable
  KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
2021-11-21 10:26:35 -08:00
Linus Torvalds
7af959b5d5 Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit-vs-signal handling fixes from Eric Biederman:
 "This is a small set of changes where debuggers were no longer able to
  intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
  cleanups.

  This is essentially the change you suggested with all of i's dotted
  and the t's crossed so that ptrace can intercept all of the cases it
  has been able to intercept the past, and all of the cases that made it
  to exit without giving ptrace a chance still don't give ptrace a
  chance"

* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Replace force_fatal_sig with force_exit_sig when in doubt
  signal: Don't always set SA_IMMUTABLE for forced signals
2021-11-19 11:33:31 -08:00
Eric W. Biederman
fcb116bc43 signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.

In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.

Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19 09:15:58 -06:00
Linus Torvalds
c46e8ece96 Selftest changes:
* Cleanups for the perf test infrastructure and mapping hugepages
 
 * Avoid contention on mmap_sem when the guests start to run
 
 * Add event channel upcall support to xen_shinfo_test
 
 x86 changes:
 
 * Fixes for Xen emulation
 
 * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
 
 * Fixes for migration of 32-bit nested guests on 64-bit hypervisor
 
 * Compilation fixes
 
 * More SEV cleanups
 
 Generic:
 
 * Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
 and num_online_cpus().  Most architectures were only using one of the two.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGV/PAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrogf/eAyilGRQL7lLETn3DTVlgLVv82+z
 giX11HlUhUmATHIDluj/wVQUjVcY6AO4SnvFaudX7B+mibndkw4L19IubP/koQZu
 xnKSJTn+mVANdzz3UdsHl0ujbPdQJaFCIPW6iewbn2GRRZMwA5F3vMK/H09XRApL
 I7kq8CPA6sC0I3TPzPN3ROxigexzYunZmGQ4qQe0GUdtxHrJOYQN++ddmWbQoEIC
 gdFTyF7CUQ+lmJe0b/Y88yhISFAJCEBuKFlg9tOTuxSfwvPX6lUu+pi+utEx9M+O
 ckTSQli/apZ4RVcSzxMIwX/BciYqhqOz5uMG+w4DRlJixtGSHtjiEVxGxw==
 =Iij4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Selftest changes:

   - Cleanups for the perf test infrastructure and mapping hugepages

   - Avoid contention on mmap_sem when the guests start to run

   - Add event channel upcall support to xen_shinfo_test

  x86 changes:

   - Fixes for Xen emulation

   - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

   - Fixes for migration of 32-bit nested guests on 64-bit hypervisor

   - Compilation fixes

   - More SEV cleanups

  Generic:

   - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
     and num_online_cpus(). Most architectures were only using one of
     the two"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
  KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
  KVM: x86: Assume a 64-bit hypercall for guests with protected state
  selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
  riscv: kvm: fix non-kernel-doc comment block
  KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
  KVM: SEV: Drop a redundant setting of sev->asid during initialization
  KVM: SEV: WARN if SEV-ES is marked active but SEV is not
  KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
  KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
  KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
  KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
  KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
  KVM: x86/xen: Use sizeof_field() instead of open-coding it
  KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
  KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
  ...
2021-11-18 12:05:22 -08:00
Linus Torvalds
7d5775d49e printk fixup for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmGWF6YACgkQUqAMR0iA
 lPJJ+RAAm9pi/EElKKl+lOlBl+ehJlKuNLnPQWFmmaRc9xd0ruUipp1nsaktLJ8f
 R/PkSwR/YWpBWlF8P4o+x9sOFyTNyLasoHtqsinEcAJI4lb7d1KOrPliTXyr15Ai
 A303djwJmwCw5KxAPOjkG/nMBlpMIAQRee9GDWs1ykfSlIsI4jp7isVbCFNCQNVF
 auHYq1bfJ5MJYPjxIDZUt+NF7kg7dD4k4g+VCVjaH1u8pGeaCUCtnNjMFOk1XfU8
 yFQnaDcrAu4zJPq3d74z4eN9Bk+su8+DhnfrAEFjuFxGTgYc2MyRt0gGFeiUtNs4
 rvST/eHBO4zeuL18S8G+fLcig/9ZYE73xzjdOCzRzLDjn0VQr9t06ez1QqJOb4D6
 A4SSufwek5NIqYKMlhV/az2EceQYK8Wv3KAz8w98KDfwvVVhUSgE23MbTCO0hvQU
 PWF35d3hQ+9oH0ZGYRumb8OpXtKJ+2KmzyN8Z0xhivHFBIKlcW6IBGhWRANclJO8
 jNAM3jiwi8fRDVM2wI1fmgeEmMhG+WuTI3dJVu3tu4vI923FW5GdY6ev5EvH0Ts0
 khTwIjtmCHUJGSeWajy3Gi9irdyhPyPNRMqgal4GS+gGpVU2mMMKTG+NzxxtCRKR
 BUgfCjFDoDJWrNWIzzOwNqgF0Y+V9GgCZOkb73u/y+xVx0Rmc6U=
 =wbBy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Try to flush backtraces from other CPUs also on the local one. This
   was a regression caused by printk_safe buffers removal.

 - Remove header dependency warning.

* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Remove printk.h inclusion in percpu.h
  printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
2021-11-18 10:50:45 -08:00
Petr Mladek
bf6d0d1e1a Merge branch 'rework/printk_safe-removal' into for-linus 2021-11-18 10:03:47 +01:00
Vitaly Kuznetsov
b7915d55b1 KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:12:14 -05:00
Cédric Le Goater
8e80a73fa9 powerpc/xive: Change IRQ domain to a tree domain
Commit 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains
exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the
'nomap' domains still in use under the powerpc arch. With this new
flag, the revmap_tree of the IRQ domain is not used anymore. This
change broke the support of shared LSIs [1] in the XIVE driver because
it was relying on a lookup in the revmap_tree to query previously
mapped interrupts. Linux now creates two distinct IRQ mappings on the
same HW IRQ which can lead to unexpected behavior in the drivers.

The XIVE IRQ domain is not a direct mapping domain and its HW IRQ
interrupt number space is rather large : 1M/socket on POWER9 and
POWER10, change the XIVE driver to use a 'tree' domain type instead.

[1] For instance, a linux KVM guest with virtio-rng and virtio-balloon
    devices.

Fixes: 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains exclusive")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116134022.420412-1-clg@kaod.org
2021-11-17 21:55:42 +11:00
Tiezhu Yang
ebf7f6f0a6 bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance.
We can see the historical evolution from commit 04fd61ab36 ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b
("bpf: Undo off-by-one in interpreter tail call count limit"). In order
to avoid changing existing behavior, the actual limit is 33 now, this is
reasonable.

After commit 874be05f52 ("bpf, tests: Add tail call test suite"), we can
see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

Although the above failed testcase has been fixed in commit 18935a72eb
("bpf/tests: Fix error in tail call limit tests"), it would still be good
to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code
more readable.

The 32-bit x86 JIT was using a limit of 32, just fix the wrong comments and
limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT updated. For the
mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh.
For the riscv JIT, use RV_REG_TCC directly to save one register move as
suggested by Björn Töpel. For the other implementations, no function changes,
it does not change the current limit 33, the new value of MAX_TAIL_CALL_CNT
can reflect the actual max tail call count, the related tail call testcases
in test_bpf module and selftests can work well for the interpreter and the
JIT.

Here are the test results on x86_64:

 # uname -m
 x86_64
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
 # rmmod test_bpf
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
 # rmmod test_bpf
 # ./test_progs -t tailcalls
 #142 tailcalls:OK
 Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
2021-11-16 14:03:15 +01:00
Christophe Leroy
1e35eba405 powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
As spotted and explained in commit c12ab8dbc4 ("powerpc/8xx: Fix
Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection
of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted
the lack of the DIRTY bit in the pinned kernel data TLBs.

This problem should have been detected a lot earlier if things had
been working as expected. But due to an incredible level of chance or
mishap, this went undetected because of a set of bugs: In fact the
DTLBs were not pinned, because instead of setting the reserve bit
in MD_CTR, it was set in MI_CTR that is the register for ITLBs.

But then, another huge bug was there: the physical address was
reset to 0 at the boundary between RO and RW areas, leading to the
same physical space being mapped at both 0xc0000000 and 0xc8000000.
This had by miracle no consequence until now because the entry was
not really pinned so it was overwritten soon enough to go undetected.

Of course, now that we really pin the DTLBs, it must be fixed as well.

Fixes: f76c8f6d25 ("powerpc/8xx: Add function to set pinned TLBs")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: c12ab8dbc4 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu
2021-11-16 21:37:10 +11:00
Christophe Leroy
5499802b22 powerpc/signal32: Fix sigset_t copy
The conversion from __copy_from_user() to __get_user() by
commit d3ccc97815 ("powerpc/signal: Use __get_user() to copy
sigset_t") introduced a regression in __get_user_sigset() for
powerpc/32. The bug was subsequently moved into
unsafe_get_user_sigset().

The bug is due to the copied 64 bit value being truncated to
32 bits while being assigned to dst->sig[0]

The regression was reported by users of the Xorg packages distributed in
Debian/powerpc --

    "The symptoms are that the fb screen goes blank, with the backlight
    remaining on and no errors logged in /var/log; wdm (or startx) run
    with no effect (I tried logging in in the blind, with no effect).
    And they are hard to kill, requiring 'kill -KILL ...'"

Fix the regression by copying each word of the sigset, not only the
first one.

__get_user_sigset() was tentatively optimised to copy 64 bits at once
in order to minimise KUAP unlock/lock impact, but the unsafe variant
doesn't suffer that, so it can just copy words.

Fixes: 887f3ceb51 ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Finn Thain <fthain@linux-m68k.org>
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu
2021-11-16 21:24:16 +11:00
Christophe Leroy
5b54860943 powerpc/book3e: Fix TLBCAM preset at boot
Commit 52bda69ae8 ("powerpc/fsl_booke: Tell map_mem_in_cams() if
init is done") was supposed to just add an additional parameter to
map_mem_in_cams() and always set it to 'true' at that time.

But a few call sites were messed up. Fix them.

Fixes: 52bda69ae8 ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d319f2a9367d4d08fd2154e506101bd5f100feeb.1636967119.git.christophe.leroy@csgroup.eu
2021-11-16 21:20:59 +11:00
Alexey Kardashevskiy
ad3976025b powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
There is a possibility of having just one DMA window available with
a limited capacity which the existing code does not handle that well.
If the window is big enough for the system RAM but less than
MAX_PHYSMEM_BITS (which we want when persistent memory is present),
we create 1:1 window and leave persistent memory without DMA.

This disables 1:1 mapping entirely if there is persistent memory and
either:
- the huge DMA window does not cover the entire address space;
- the default DMA window is removed.

This relies on reverted 54fc3c681d
("powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory")
to return the actual amount RAM in ddw_memory_hotplug_max() (posted
separately).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-4-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Alexey Kardashevskiy
fb4ee2b30c powerpc/pseries/ddw: simplify enable_ddw()
This drops rather useless ddw_enabled flag as direct_mapping implies
it anyway.

While at this, fix indents in enable_ddw().

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-3-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Alexey Kardashevskiy
2d33f55044 powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
This reverts commit 54fc3c681d
which does not allow 1:1 mapping even for the system RAM which
is usually possible.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Nicholas Piggin
302039466f powerpc/pseries: Fix numa FORM2 parsing fallback code
In case the FORM2 distance table from firmware is not the expected size,
there is fallback code that just populates the lookup table as local vs
remote.

However it then continues on to use the distance table. Fix.

Fixes: 1c6b5a7e74 ("powerpc/pseries: Add support for FORM2 associativity")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com
2021-11-15 15:46:46 +11:00
Nicholas Piggin
0bd81274e3 powerpc/pseries: rename numa_dist_table to form2_distances
The name of the local variable holding the "form2" property address
conflicts with the numa_distance_table global.

This patch does 's/numa_dist_table/form2_distances/g' over the function,
which also renames numa_dist_table_length to form2_distances_length.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com
2021-11-15 15:46:46 +11:00
Masahiro Yamada
964c33cd0b powerpc: clean vdso32 and vdso64 directories
Since commit bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of
vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the
arch/powerpc/kernel/{vdso32,vdso64} directories.

Use the subdir- trick to let "make clean" descend into them.

Fixes: bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org
2021-11-15 15:46:45 +11:00
Uwe Kleine-König
2da516d7ed powerpc/83xx/mpc8349emitx: Drop unused variable
Commit 5d354dc35e ("powerpc/83xx/mpc8349emitx: Make
mcu_gpiochip_remove() return void") removed the usage of the variable
ret, but failed to remove the variable itself, resulting in:

	arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c: In function ‘mcu_remove’:
	arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c:189:6: error: unused variable ‘ret’ [-Werror=unused-variable]
	  189 |  int ret;
	      |      ^~~

So remove the variable now.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110110739.1072634-1-u.kleine-koenig@pengutronix.de
2021-11-15 15:46:45 +11:00
Michael Ellerman
dae5818646 KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into
it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because
kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable.

When called from hcall_try_real_mode() we have the kernel TOC in r2,
established near the start of kvmppc_interrupt_hv(), so there is no
issue.

But they can also be called from kvmppc_pseries_do_hcall() which is
module code, so the access ends up happening with the kvm-hv module's
r2, which will not point at dawr_force_enable and could even cause a
fault.

With the current code layout and compilers we haven't observed a fault
in practice, the load hits somewhere in kvm-hv.ko and silently returns
some bogus value.

Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses
h_set_dabr() to test if sc1 works correctly, see SLOF's
lib/libhvcall/brokensc1.c.

Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au
2021-11-15 15:46:45 +11:00
Alistair Popple
ab09243aa9 mm/migrate.c: remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect().  If it
wasn't then the a second attempt is made to lock the page.  However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity.  So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Link: https://lkml.kernel.org/r/20211025041608.289017-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-11 09:34:35 -08:00
Linus Torvalds
5147da902e Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
 "While looking at some issues related to the exit path in the kernel I
  found several instances where the code is not using the existing
  abstractions properly.

  This set of changes introduces force_fatal_sig a way of sending a
  signal and not allowing it to be caught, and corrects the misuse of
  the existing abstractions that I found.

  A lot of the misuse of the existing abstractions are silly things such
  as doing something after calling a no return function, rolling BUG by
  hand, doing more work than necessary to terminate a kernel thread, or
  calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).

  In the review a deficiency in force_fatal_sig and force_sig_seccomp
  where ptrace or sigaction could prevent the delivery of the signal was
  found. I have added a change that adds SA_IMMUTABLE to change that
  makes it impossible to interrupt the delivery of those signals, and
  allows backporting to fix force_sig_seccomp

  And Arnd found an issue where a function passed to kthread_run had the
  wrong prototype, and after my cleanup was failing to build."

* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
  soc: ti: fix wkup_m3_rproc_boot_thread return type
  signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
  signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  exit/r8188eu: Replace the macro thread_exit with a simple return 0
  exit/rtl8712: Replace the macro thread_exit with a simple return 0
  exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
  signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
  signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  exit/syscall_user_dispatch: Send ordinary signals on failure
  signal: Implement force_fatal_sig
  exit/kthread: Have kernel threads return instead of calling do_exit
  signal/s390: Use force_sigsegv in default_trap_handler
  signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  signal/sparc: In setup_tsb_params convert open coded BUG into BUG
  signal/powerpc: On swapcontext failure force SIGSEGV
  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  signal/sparc32: Remove unreachable do_exit in do_sparc_fault
  ...
2021-11-10 16:15:54 -08:00
Linus Torvalds
e8f023caee asm-generic: asm/syscall.h cleanup
This is a single cleanup from Peter Collingbourne, removing
 some dead code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
 GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
 0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
 JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
 vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
 xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
 b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
 XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
 jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
 kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
 Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
 ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
 =8EFP
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "This is a single cleanup from Peter Collingbourne, removing some dead
  code"

* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: remove unused function syscall_set_arguments()
2021-11-10 11:22:03 -08:00
Nicholas Piggin
5d5e4522a7 printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
printk from NMI context relies on irq work being raised on the local CPU
to print to console. This can be a problem if the NMI was raised by a
lockup detector to print lockup stack and regs, because the CPU may not
enable irqs (because it is locked up).

Introduce printk_trigger_flush() that can be called another CPU to try
to get those messages to the console, call that where printk_safe_flush
was previously called.

Fixes: 93d102f094 ("printk: remove safe buffers")
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com
2021-11-10 16:12:00 +01:00
Linus Torvalds
59a2ceeef6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "87 patches.

  Subsystems affected by this patch series: mm (pagecache and hugetlb),
  procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
  init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
  sysvfs, kcov, gdb, resource, selftests, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits)
  ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
  ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
  selftests/kselftest/runner/run_one(): allow running non-executable files
  virtio-mem: disallow mapping virtio-mem memory via /dev/mem
  kernel/resource: disallow access to exclusive system RAM regions
  kernel/resource: clean up and optimize iomem_is_exclusive()
  scripts/gdb: handle split debug for vmlinux
  kcov: replace local_irq_save() with a local_lock_t
  kcov: avoid enable+disable interrupts if !in_task()
  kcov: allocate per-CPU memory on the relevant node
  Documentation/kcov: define `ip' in the example
  Documentation/kcov: include types.h in the example
  sysv: use BUILD_BUG_ON instead of runtime check
  kernel/fork.c: unshare(): use swap() to make code cleaner
  seq_file: fix passing wrong private data
  seq_file: move seq_escape() to a header
  signal: remove duplicate include in signal.h
  crash_dump: remove duplicate include in crash_dump.h
  crash_dump: fix boolreturn.cocci warning
  hfs/hfsplus: use WARN_ON for sanity check
  ...
2021-11-09 10:11:53 -08:00
Kefeng Wang
843a1ffaf6 powerpc/mm: use core_kernel_text() helper
Use core_kernel_text() helper to simplify code, also drop etext, _stext,
_sinittext, _einittext declaration which already declared in section.h.

Link: https://lkml.kernel.org/r/20210930071143.63410-10-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:51 -08:00
Linus Torvalds
dd72945c43 cxl for v5.16
- Fix support for platforms that do not enumerate every ACPI0016 (CXL
   Host Bridge) in the CHBS (ACPI Host Bridge Structure).
 
 - Introduce a common pci_find_dvsec_capability() helper, clean up open
   coded implementations in various drivers.
 
 - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test'
   is a module built from tools/testing/cxl/ that mocks up a CXL topology
   to augment the nascent support for emulation of CXL devices in QEMU.
 
 - Convert libnvdimm to use the uuid API.
 
 - Complete the definition of CXL namespace labels in libnvdimm.
 
 - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
   mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.
 
 - Continue to sort and refactor functionality into distinct driver and
   core-infrastructure buckets. For example, mailbox handling is now a
   generic core capability consumed by the PCI and cxl_test drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYYRtyQAKCRDfioYZHlFs
 Z7UsAP9DzUN6IWWnYk1R95YXYVxFriRtRsBjujAqTg49EMghawEAoHaA9lxO3Hho
 l25TLYUOmB/zFTlUbe6YQptMJZ5YLwY=
 =im9j
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dan Williams:
 "More preparation and plumbing work in the CXL subsystem.

  From an end user perspective the highlight here is lighting up the CXL
  Persistent Memory related commands (label read / write) with the
  generic ioctl() front-end in LIBNVDIMM.

  Otherwise, the ability to instantiate new persistent and volatile
  memory regions is still on track for v5.17.

  Summary:

   - Fix support for platforms that do not enumerate every ACPI0016 (CXL
     Host Bridge) in the CHBS (ACPI Host Bridge Structure).

   - Introduce a common pci_find_dvsec_capability() helper, clean up
     open coded implementations in various drivers.

   - Add 'cxl_test' for regression testing CXL subsystem ABIs.
     'cxl_test' is a module built from tools/testing/cxl/ that mocks up
     a CXL topology to augment the nascent support for emulation of CXL
     devices in QEMU.

   - Convert libnvdimm to use the uuid API.

   - Complete the definition of CXL namespace labels in libnvdimm.

   - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
     mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.

   - Continue to sort and refactor functionality into distinct driver
     and core-infrastructure buckets. For example, mailbox handling is
     now a generic core capability consumed by the PCI and cxl_test
     drivers"

* tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
  ocxl: Use pci core's DVSEC functionality
  cxl/pci: Use pci core's DVSEC functionality
  PCI: Add pci_find_dvsec_capability to find designated VSEC
  cxl/pci: Split cxl_pci_setup_regs()
  cxl/pci: Add @base to cxl_register_map
  cxl/pci: Make more use of cxl_register_map
  cxl/pci: Remove pci request/release regions
  cxl/pci: Fix NULL vs ERR_PTR confusion
  cxl/pci: Remove dev_dbg for unknown register blocks
  cxl/pci: Convert register block identifiers to an enum
  cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS
  cxl/pci: Disambiguate cxl_pci further from cxl_mem
  Documentation/cxl: Add bus internal docs
  cxl/core: Split decoder setup into alloc + add
  tools/testing/cxl: Introduce a mock memory device + driver
  cxl/mbox: Move command definitions to common location
  cxl/bus: Populate the target list at decoder create
  tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
  cxl/pmem: Add support for multiple nvdimm-bridge objects
  cxl/pmem: Translate NVDIMM label commands to CXL label commands
  ...
2021-11-08 11:49:48 -08:00
Linus Torvalds
1e9ed9360f Kbuild updates for v5.16
- Remove the global -isystem compiler flag, which was made possible by
    the introduction of <linux/stdarg.h>
 
  - Improve the Kconfig help to print the location in the top menu level
 
  - Fix "FORCE prerequisite is missing" build warning for sparc
 
  - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which generate
    a zstd-compressed tarball
 
  - Prevent gen_init_cpio tool from generating a corrupted cpio when
    KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmGGkysVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGgZkQAIX4i9Tt6pyl/2xGDGkzUqjprfoH
 QUIo1DoUclLUygoakrrrX3EnZLWrslgPTKjQxdiV6RA6xHfe4cYgNTSq8zM9lsPT
 lu+B4nEDqoXQ5gyLxMlnjS3FRQTNYIeBZEhSAIiW8TENdLKlKc+NYdoj7th50dO0
 SkXRa2dpWHa6t7ZRqHIHMpUWA7gm0w22ZbgQmyUv1CDGO4IHPLqe2b2PMsrzhSZ1
 yypP1l6aQVKuP0hN9aytbTRqDxUd0uOzBf00PK5zx23hjdwZ9wmZrFTKDf9fAu/+
 nR7gBsa5YoYNQh3UkayZXjR5dClmgsCXZ25OXI7YucQp/8OJ5fadfn1NFpJHsw56
 n5cckbHIXgnFUcel5YlkR6qTHjpzdr9vHm90MmiuX99b3oy9czl6pY3qkNfRkllQ
 v7ME5L1qlw3P3ia1KA+H4zW/LIJ8p5cbKBwaY22m3kY3bTx7PiOfMlep4UVqxXSb
 0/OqxSsmYg5LlmwEQ0SSsx45hE0o9nG/cdjkHu1jUOUHxYfpt1T4MTILeGUwmjzd
 TydJym5MZyXBawu4NVB3QLoKm5Jt2BXtyaWOtq74VSrs77roNCdYuQWJ+1aBf2Pg
 0s4CVC2cC7KlxJDImoqswZATGXPMfbiVDcuVSSukYRgBMeCBPUzRhB8YP36BZyD3
 9vFYmqSujtUU7nWb
 =ATFN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove the global -isystem compiler flag, which was made possible by
   the introduction of <linux/stdarg.h>

 - Improve the Kconfig help to print the location in the top menu level

 - Fix "FORCE prerequisite is missing" build warning for sparc

 - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which
   generate a zstd-compressed tarball

 - Prevent gen_init_cpio tool from generating a corrupted cpio when
   KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later

 - Misc cleanups

* tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  kbuild: use more subdir- for visiting subdirectories while cleaning
  sh: remove meaningless archclean line
  initramfs: Check timestamp to prevent broken cpio archive
  kbuild: split DEBUG_CFLAGS out to scripts/Makefile.debug
  gen_init_cpio: add static const qualifiers
  kbuild: Add make tarzst-pkg build option
  scripts: update the comments of kallsyms support
  sparc: Add missing "FORCE" target when using if_changed
  kconfig: refactor conf_touch_dep()
  kconfig: refactor conf_write_dep()
  kconfig: refactor conf_write_autoconf()
  kconfig: add conf_get_autoheader_name()
  kconfig: move sym_escape_string_value() to confdata.c
  kconfig: refactor listnewconfig code
  kconfig: refactor conf_write_symbol()
  kconfig: refactor conf_write_heading()
  kconfig: remove 'const' from the return type of sym_escape_string_value()
  kconfig: rename a variable in the lexer to a clearer name
  kconfig: narrow the scope of variables in the lexer
  kconfig: Create links to main menu items in search
  ...
2021-11-08 09:15:45 -08:00
Linus Torvalds
0c5c62ddf8 pci-v5.16-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGFXBkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vx6Tg/7BsGWm8f+uw/mr9lLm47q2mc4XyoO
 7bR9KDp5NM84W/8ZOU7dqqqsnY0ddrSOLBRyhJJYMW3SwJd1y1ajTBsL1Ujqv+eN
 z+JUFmhq4Laqm4k6Spc9CEJE+Ol5P6gGUtxLYo6PM2R0VxnSs/rDxctT5i7YOpCi
 COJ+NVT/mc/by2loz1kLTSR9GgtBBgd+Y8UA33GFbHKssROw02L0OI3wffp81Oba
 EhMGPoD+0FndAniDw+vaOSoO+YaBuTfbM92T/O00mND69Fj1PWgmNWZz7gAVgsXb
 3RrNENUFxgw6CDt7LZWB8OyT04iXe0R2kJs+PA9gigFCGbypwbd/Nbz5M7e9HUTR
 ray+1EpZib6+nIksQBL2mX8nmtyHMcLiM57TOEhq0+ECDO640MiRm8t0FIG/1E8v
 3ZYd9w20o/NxlFNXHxxpZ3D/osGH5ocyF5c5m1rfB4RGRwztZGL172LWCB0Ezz9r
 eHB8sWxylxuhrH+hp2BzQjyddg7rbF+RA4AVfcQSxUpyV01hoRocKqknoDATVeLH
 664nJIINFxKJFwfuL3E6OhrInNe1LnAhCZsHHqbS+NNQFgvPRznbixBeLkI9dMf5
 Yf6vpsWO7ur8lHHbRndZubVu8nxklXTU7B/w+C11sq6k9LLRJSHzanr3Fn9WA80x
 sznCxwUvbTCu1r0=
 =nsMh
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Conserve IRQs by setting up portdrv IRQs only when there are users
     (Jan Kiszka)
   - Rework and simplify _OSC negotiation for control of PCIe features
     (Joerg Roedel)
   - Remove struct pci_dev.driver pointer since it's redundant with the
     struct device.driver pointer (Uwe Kleine-König)

  Resource management:
   - Coalesce contiguous host bridge apertures from _CRS to accommodate
     BARs that cover more than one aperture (Kai-Heng Feng)

  Sysfs:
   - Check CAP_SYS_ADMIN before parsing user input (Krzysztof
     Wilczyński)
   - Return -EINVAL consistently from "store" functions (Krzysztof
     Wilczyński)
   - Use sysfs_emit() in endpoint "show" functions to avoid buffer
     overruns (Kunihiko Hayashi)

  PCIe native device hotplug:
   - Ignore Link Down/Up caused by resets during error recovery so
     endpoint drivers can remain bound to the device (Lukas Wunner)

  Virtualization:
   - Avoid bus resets on Atheros QCA6174, where they hang the device
     (Ingmar Klein)
   - Work around Pericom PI7C9X2G switch packet drop erratum by using
     store and forward mode instead of cut-through (Nathan Rossi)
   - Avoid trying to enable AtomicOps on VFs; the PF setting applies to
     all VFs (Selvin Xavier)

  MSI:
   - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
     interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
     Song)

  VPD:
   - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
     in the possible VPD space; use these to simplify the cxgb3 driver
     (Heiner Kallweit)

  Peer-to-peer DMA:
   - Add (not subtract) the bus offset when calculating DMA address
     (Wang Lu)

  ASPM:
   - Re-enable LTR at Downstream Ports so they don't report Unsupported
     Requests when reset or hot-added devices send LTR messages
     (Mingchuang Qiao)

  Apple PCIe controller driver:
   - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
     Zyngier)

  Cadence PCIe controller driver:
   - Return success when probe succeeds instead of falling into error
     path (Li Chen)

  HiSilicon Kirin PCIe controller driver:
   - Reorganize PHY logic and add support for external PHY drivers
     (Mauro Carvalho Chehab)
   - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
     Carvalho Chehab)
   - Add Kirin 970 support (Mauro Carvalho Chehab)
   - Make driver removable (Mauro Carvalho Chehab)

  Intel VMD host bridge driver:
   - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
     enabled (Adrian Huang)
   - Number each controller so we can tell them apart in
     /proc/interrupts (Chunguang Xu)
   - Avoid building on UML because VMD depends on x86 bare metal APIs
     (Johannes Berg)

  Marvell Aardvark PCIe controller driver:
   - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
   - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
   - Downgrade PIO Response Status messages to debug level (Marek Behún)
   - Preserve CRS SV (Config Request Retry Software Visibility) bit in
     emulated Root Control register (Pali Rohár)
   - Fix issue in configuring reference clock (Pali Rohár)
   - Don't clear status bits for masked interrupts (Pali Rohár)
   - Don't mask unused interrupts (Pali Rohár)
   - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
   - Retry config accesses on CRS response (Pali Rohár)
   - Simplify emulated Root Capabilities initialization (Pali Rohár)
   - Fix several link training issues (Pali Rohár)
   - Fix link-up checking via LTSSM (Pali Rohár)
   - Fix reporting of Data Link Layer Link Active (Pali Rohár)
   - Fix emulation of W1C bits (Marek Behún)
   - Fix MSI domain .alloc() method to return zero on success (Marek
     Behún)
   - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
     (Marek Behún)
   - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
     at startup; PCI core will set those as necessary (Pali Rohár)
   - When operating as a Root Port, set class code to "PCI Bridge"
     instead of the default "Mass Storage Controller" (Pali Rohár)
   - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
     implement this per spec (Pali Rohár)
   - Add emulation of option ROM BAR since aardvark doesn't implement
     this per spec (Pali Rohár)

  MediaTek MT7621 PCIe controller driver:
   - Add MediaTek MT7621 PCIe host controller driver and DT binding
     (Sergio Paracuellos)

  Qualcomm PCIe controller driver:
   - Add SC8180x compatible string (Bjorn Andersson)
   - Add endpoint controller driver and DT binding (Manivannan
     Sadhasivam)
   - Restructure to use of_device_get_match_data() (Prasad Malisetty)
   - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)

  Renesas R-Car PCIe controller driver:
   - Remove unnecessary includes (Geert Uytterhoeven)

  Rockchip DesignWare PCIe controller driver:
   - Add DT binding (Simon Xue)

  Socionext UniPhier Pro5 controller driver:
   - Serialize INTx masking/unmasking (Kunihiko Hayashi)

  Synopsys DesignWare PCIe controller driver:
   - Run dwc .host_init() method before registering MSI interrupt
     handler so we can deal with pending interrupts left by bootloader
     (Bjorn Andersson)
   - Clean up Kconfig dependencies (Andy Shevchenko)
   - Export symbols to allow more modular drivers (Luca Ceresoli)

  TI DRA7xx PCIe controller driver:
   - Allow host and endpoint drivers to be modules (Luca Ceresoli)
   - Enable external clock if present (Luca Ceresoli)

  TI J721E PCIe driver:
   - Disable PHY when probe fails after initializing it (Christophe
     JAILLET)

  MicroSemi Switchtec management driver:
   - Return error to application when command execution fails because an
     out-of-band reset has cleared the device BARs, Memory Space Enable,
     etc (Kelvin Cao)
   - Fix MRPC error status handling issue (Kelvin Cao)
   - Mask out other bits when reading of management VEP instance ID
     (Kelvin Cao)
   - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
     (Kelvin Cao)
   - Add check of event support (Logan Gunthorpe)

  Miscellaneous:
   - Remove unused pci_pool wrappers, which have been replaced by
     dma_pool (Cai Huoqing)
   - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
     Wilczyński)
   - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
     Wilczyński)
   - Fix some sscanf(), sprintf() format mismatches (Krzysztof
     Wilczyński)
   - Update PCI subsystem information in MAINTAINERS (Krzysztof
     Wilczyński)
   - Correct some misspellings (Krzysztof Wilczyński)"

* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
  PCI: Add ACS quirk for Pericom PI7C9X2G switches
  PCI: apple: Configure RID to SID mapper on device addition
  iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
  PCI: apple: Implement MSI support
  PCI: apple: Add INTx and per-port interrupt support
  PCI: kirin: Allow removing the driver
  PCI: kirin: De-init the dwc driver
  PCI: kirin: Disable clkreq during poweroff sequence
  PCI: kirin: Move the power-off code to a common routine
  PCI: kirin: Add power_off support for Kirin 960 PHY
  PCI: kirin: Allow building it as a module
  PCI: kirin: Add MODULE_* macros
  PCI: kirin: Add Kirin 970 compatible
  PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
  PCI: apple: Set up reference clocks when probing
  PCI: apple: Add initial hardware bring-up
  PCI: of: Allow matching of an interrupt-map local to a PCI device
  of/irq: Allow matching of an interrupt-map local to an interrupt controller
  irqdomain: Make of_phandle_args_to_fwspec() generally available
  PCI: Do not enable AtomicOps on VFs
  ...
2021-11-06 14:36:12 -07:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Stephen Kitt
53944f171a mm: remove HARDENED_USERCOPY_FALLBACK
This has served its purpose and is no longer used.  All usercopy
violations appear to have been handled by now, any remaining instances
(or new bugs) will cause copies to be rejected.

This isn't a direct revert of commit 2d891fbc3b ("usercopy: Allow
strict enforcement of whitelists"); since usercopy_fallback is
effectively 0, the fallback handling is removed too.

This also removes the usercopy_fallback module parameter on slab_common.

Link: https://github.com/KSPP/linux/issues/153
Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>	[defconfig change]
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E . Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
David Hildenbrand
50f9481ed9 mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Zhenguo Yao
b5389086ad hugetlbfs: extend the definition of hugepages parameter to support node allocation
We can specify the number of hugepages to allocate at boot.  But the
hugepages is balanced in all nodes at present.  In some scenarios, we
only need hugepages in one node.  For example: DPDK needs hugepages
which are in the same node as NIC.

If DPDK needs four hugepages of 1G size in node1 and system has 16 numa
nodes we must reserve 64 hugepages on the kernel cmdline.  But only four
hugepages are used.  The others should be free after boot.  If the
system memory is low(for example: 64G), it will be an impossible task.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example add following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
4421cca0a3 memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()

The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.

    @@
    identifier vaddr;
    expression size;
    @@
    (
    - memblock_phys_free(__pa(vaddr), size);
    + memblock_free(vaddr, size);
    |
    - memblock_free_ptr(vaddr, size);
    + memblock_free(vaddr, size);
    )

[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au

Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
fa27717110 memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().

Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().

Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Christophe Leroy
e012a25d81 powerpc: use generic version of arch_is_kernel_initmem_freed()
The generic version of arch_is_kernel_initmem_freed() now does the same
as powerpc version.

Remove the powerpc version.

Link: https://lkml.kernel.org/r/c53764eb45d41491e2b21da2e7812239897dbebb.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:38 -07:00
Bjorn Helgaas
d03c426f7a Merge branch 'pci/driver'
- Drop the struct pci_dev.driver pointer, which is redundant with the
  struct device.driver pointer (Uwe Kleine-König)

* pci/driver:
  PCI: Remove struct pci_dev->driver
  PCI: Use to_pci_driver() instead of pci_dev->driver
  x86/pci/probe_roms: Use to_pci_driver() instead of pci_dev->driver
  perf/x86/intel/uncore: Use to_pci_driver() instead of pci_dev->driver
  powerpc/eeh: Use to_pci_driver() instead of pci_dev->driver
  usb: xhci: Use to_pci_driver() instead of pci_dev->driver
  cxl: Use to_pci_driver() instead of pci_dev->driver
  cxl: Factor out common dev->driver expressions
  xen/pcifront: Use to_pci_driver() instead of pci_dev->driver
  xen/pcifront: Drop pcifront_common_process() tests of pcidev, pdrv
  nfp: use dev_driver_string() instead of pci_dev->driver->name
  mlxsw: pci: Use dev_driver_string() instead of pci_dev->driver->name
  net: marvell: prestera: use dev_driver_string() instead of pci_dev->driver->name
  net: hns3: use dev_driver_string() instead of pci_dev->driver->name
  crypto: hisilicon - use dev_driver_string() instead of pci_dev->driver->name
  powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->name
  ssb: Use dev_driver_string() instead of pci_dev->driver->name
  bcma: simplify reference to driver name
  crypto: qat - simplify adf_enable_aer()
  scsi: message: fusion: Remove unused mpt_pci driver .probe() 'id' parameter
  PCI/ERR: Factor out common dev->driver expressions
  PCI: Drop pci_device_probe() test of !pci_dev->driver
  PCI: Drop pci_device_remove() test of pci_dev->driver
  PCI: Return NULL for to_pci_driver(NULL)
2021-11-05 11:28:43 -05:00
Linus Torvalds
5c0b0c676a powerpc updates for 5.16
- Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.
 
  - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it to be disabled.
 
  - Add support for out-of-line static calls on 32-bit.
 
  - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.
 
  - Fix boot hangs on e5500 due to stale value in ESR passed to do_page_fault().
 
  - Fix several bugs on pseries in handling of device tree cache information for hotplugged
    CPUs, and/or during partition migration.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin, Andrew Donnellan,
 Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le Goater, Christophe Leroy, Daniel
 Axtens, Daniel Henrique Barboza, Denis Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo
 A. R. Silva, Hari Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
 Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Naveen N.
 Rao, Nicholas Piggin, Nick Desaulniers, Niklas Schnelle, Oliver O'Halloran, Rob Herring,
 Russell Currey, Srikar Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
 Hegde, Wan Jiabing, Xiaoming Ni,
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGFDPoTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgNbsEACVczMVwMBEny5a7W1tqq1bnY9RFVw3
 K+/rE7/FpSLX+RrwgoMkmqfPvfyc9WISVLlIQDKz4XhkBjaXv0+Y4OMsymuDCbxL
 Qk7F1ff22UfLmPjjJk39gHJ8QZQqZk3wmFu2QzTO4yBZbz2SqqXFLxwyoLpZ0LJ8
 pdGl51+bIsTkDJzrdkhX9X4AKS/fYyjbQxq5u7FS89ZCCs+KvzjLcDRo0GZYaOK/
 hgDBa60DCCszL/9yjbh0ANZxmM2Z3+6AXkvAAXrtXzIGk4JzenZfiV+VEzmq8Tt0
 UpWSsUEe7VgykMR3MTrL9G8op70PpKX6OMUPegJq4iRQ24h4mpFCK4oV9OMKJqpF
 ifN9NO2ZZKOz1ke4l7Xe8SEHLX7rq5U/P7INh3AsKYNYwo6HkJhSPxiCBWUTlnZ3
 OYoZ7czyO4gMPHWP92z4CoSiTYVBFuyhYexRcnQskg60TIwbr+lMXziRyPRGI8b6
 taf2rD8eAiyUJnvkFUsyAHtYHpkSkuMeiVqY2CDQdh2SdtIFgwKzB2RjFL0gzaBZ
 XP9RWD+HernGQAJSlIk7cVthont3JHklcKk+ohhDbsWzPeUJcz6t4ChtgRq0x4q4
 Hpes1lsVfXpyxj5ouBK/E/t+diwPvBLM0dCcarQJE6ExgMzBC/C7Br7jCSgyVJA2
 VhtcZaCYh+vRlQ==
 =f7HE
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.

 - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it
   to be disabled.

 - Add support for out-of-line static calls on 32-bit.

 - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.

 - Fix boot hangs on e5500 due to stale value in ESR passed to
   do_page_fault().

 - Fix several bugs on pseries in handling of device tree cache
   information for hotplugged CPUs, and/or during partition migration.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin,
Andrew Donnellan, Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le
Goater, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Denis
Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo A.  R.  Silva, Hari
Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan
Lynch, Naveen N.  Rao, Nicholas Piggin, Nick Desaulniers, Niklas
Schnelle, Oliver O'Halloran, Rob Herring, Russell Currey, Srikar
Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
Hegde, Wan Jiabing, and Xiaoming Ni,

* tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (73 commits)
  powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
  powerpc/32e: Ignore ESR in instruction storage interrupt handler
  powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
  powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
  MAINTAINERS: Update powerpc KVM entry
  powerpc/xmon: fix task state output
  powerpc/44x/fsp2: add missing of_node_put
  powerpc/dcr: Use cmplwi instead of 3-argument cmpli
  KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
  powerpc/security: Use a mutex for interrupt exit code patching
  powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void
  powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs
  powerpc/book3e: Fix set_memory_x() and set_memory_nx()
  powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
  powerpc/bpf: Fix write protecting JIT code
  selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh
  powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
  powerpc/boot: Set LC_ALL=C in wrapper script
  powerpc/64s: Default to 64K pages for 64 bit book3s
  Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"
  ...
2021-11-05 08:15:46 -07:00
Linus Torvalds
dcd68326d2 Devicetree updates for v5.16:
- Convert /reserved-memory bindings to schemas
 
 - Convert a bunch of NFC bindings to schemas
 
 - Convert bindings to schema: Xilinx USB, Freescale DDR controller, Arm
   CCI-400, UBlox Neo-6M, 1-Wire GPIO, MSI controller, ASpeed LPC, OMAP
   and Inside-Secure HWRNG, register-bit-led, OV5640, Silead GSL1680,
   Elan ekth3000, Marvell bluetooth, TI wlcore, TI bluetooth, ESP ESP8089,
   tlm,trusted-foundations, Microchip cap11xx, Ralink SoCs and boards,
   and TI sysc
 
 - New binding schemas for: msi-ranges, Aspeed UART routing controller,
   palmbus, Xylon LogiCVC display controller, Mediatek's MT7621 SDRAM
   memory controller, and Apple M1 PCIe host
 
 - Run schema checks for %.dtb targets
 
 - Improve build time when using DT_SCHEMA_FILES
 
 - Improve error message when dtschema is not found
 
 - Various doc reference fixes in MAINTAINERS
 
 - Convert architectures to common CPU h/w ID parsing function
   of_get_cpu_hwid().
 
 - Allow for empty NUMA node IDs which may be hotplugged
 
 - Cleanup of __fdt_scan_reserved_mem()
 
 - Constify device_node parameters
 
 - Update dtc to upstream v1.6.1-19-g0a3a9d3449c8. Adds new checks
   'node_name_vs_property_name' and 'interrupt_map'.
 
 - Enable dtc 'unit_address_format' warning by default
 
 - Fix unittest EXPECT text for gpio hog errors
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmGBrj8QHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw3M1D/9gpaVBqp+Q5hZZLWOjz/WkAsExZ71N/8Lh
 rn64XWYQNJ6R1PINkBtlooJy6wTCIMfNs3IEmkAVEXVEj1Nvu7uEZwYbb96B4dJ4
 EiMv/Vz0EphoqnBvICT86XfNZduP1sZ5M11pdv2dNvwJrEvvi98VLDvSucvxorn8
 sm5jsqWOAwroiCR+u8BWW3qH3sugL1BOAwraMoUbosZAo0SpNH4WBdcBz4+v8lUS
 5N8Y8Q6dB6fEqdbVpzMblN2B9c/TEb1VYaeGXRUyQsIUQJajX3xnR8RDnTKLBtsS
 FAKGQORemLwVzBVKeZKbhlqXAJbl701LuKHRLiVerb9UGi+tk4AX9Rgg1Whrp7w4
 UYi+k4Ozus1vDaKsemB1voabSgYYY+aNTRezltdtPz0a+eQJWPUt1xQB5m68cGO4
 TZI+KfExxyGVa8iDgv4AWhvXqbR3+PUTUvel2xEIkRscWmMjXF/+oQXy8QYn2Aok
 S9750/3EUQCbKi9ZUjPLRzd5CuPP2E97i8V2WdOgRse3+H7pPg5IcEq7oQYe9A62
 SnRFjPz1X5g4Hh3bRVmcAGmDzbZJrl9dULvYVdiUWiqzfmHxN7MXO9FIxv3NKVfp
 6jgr5vVVi1ShDnCh3ns4mYUwQ7j72dsONyklbVBbNtGjeeZcv5MEeg9ZAoVvO+lh
 9DNNSGSd2g==
 =dQa6
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Convert /reserved-memory bindings to schemas

 - Convert a bunch of NFC bindings to schemas

 - Convert bindings to schema: Xilinx USB, Freescale DDR controller, Arm
   CCI-400, UBlox Neo-6M, 1-Wire GPIO, MSI controller, ASpeed LPC, OMAP
   and Inside-Secure HWRNG, register-bit-led, OV5640, Silead GSL1680,
   Elan ekth3000, Marvell bluetooth, TI wlcore, TI bluetooth, ESP
   ESP8089, tlm,trusted-foundations, Microchip cap11xx, Ralink SoCs and
   boards, and TI sysc

 - New binding schemas for: msi-ranges, Aspeed UART routing controller,
   palmbus, Xylon LogiCVC display controller, Mediatek's MT7621 SDRAM
   memory controller, and Apple M1 PCIe host

 - Run schema checks for %.dtb targets

 - Improve build time when using DT_SCHEMA_FILES

 - Improve error message when dtschema is not found

 - Various doc reference fixes in MAINTAINERS

 - Convert architectures to common CPU h/w ID parsing function
   of_get_cpu_hwid().

 - Allow for empty NUMA node IDs which may be hotplugged

 - Cleanup of __fdt_scan_reserved_mem()

 - Constify device_node parameters

 - Update dtc to upstream v1.6.1-19-g0a3a9d3449c8. Adds new checks
   'node_name_vs_property_name' and 'interrupt_map'.

 - Enable dtc 'unit_address_format' warning by default

 - Fix unittest EXPECT text for gpio hog errors

* tag 'devicetree-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (97 commits)
  dt-bindings: net: ti,bluetooth: Document default max-speed
  dt-bindings: pci: rcar-pci-ep: Document r8a7795
  dt-bindings: net: qcom,ipa: IPA does support up to two iommus
  of/fdt: Remove of_scan_flat_dt() usage for __fdt_scan_reserved_mem()
  of: unittest: document intentional interrupt-map provider build warning
  of: unittest: fix EXPECT text for gpio hog errors
  of/unittest: Disable new dtc node_name_vs_property_name and interrupt_map warnings
  scripts/dtc: Update to upstream version v1.6.1-19-g0a3a9d3449c8
  dt-bindings: arm: firmware: tlm,trusted-foundations: Convert txt bindings to yaml
  dt-bindings: display: tilcd: Fix endpoint addressing in example
  dt-bindings: input: microchip,cap11xx: Convert txt bindings to yaml
  dt-bindings: ufs: exynos-ufs: add exynosautov9 compatible
  dt-bindings: ufs: exynos-ufs: add io-coherency property
  dt-bindings: mips: convert Ralink SoCs and boards to schema
  dt-bindings: display: xilinx: Fix example with psgtr
  dt-bindings: net: nfc: nxp,pn544: Convert txt bindings to yaml
  dt-bindings: Add a help message when dtschema tools are missing
  dt-bindings: bus: ti-sysc: Update to use yaml binding
  dt-bindings: sram: Allow numbers in sram region node name
  dt-bindings: display: Document the Xylon LogiCVC display controller
  ...
2021-11-02 22:22:13 -07:00
Linus Torvalds
c03098d4b9 gfs2: Fix mmap + page fault deadlocks
Functions gfs2_file_read_iter and gfs2_file_write_iter are both
 accessing the user buffer to write to or read from while holding the
 inode glock.  In the most basic scenario, that buffer will not be
 resident and it will be mapped to the same file.  Accessing the buffer
 will trigger a page fault, and gfs2 will deadlock trying to take the
 same inode glock again while trying to handle that fault.
 
 Fix that and similar, more complex scenarios by disabling page faults
 while accessing user buffers.  To make this work, introduce a small
 amount of new infrastructure and fix some bugs that didn't trigger so
 far, with page faults enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmGBPisUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpE6A/7BezUnGuNJxJrR8pC+vcLYA7xAgUU
 6STQ6IN7w5UHRlSkNzZxZ2XPxW4uVQ4SxSEeaLqBsHZihepjcLNFZ/8MhQ6UPSD0
 8noHOi7CoIcp6IuWQtCpxRM/xjjm2SlMt2XbVJZaiJcdzCV9gB6TU9EkBRq7Zm/X
 9WFBbv1xZF0skn9ISCJvNtiiI+VyWKgMDUKxJUiTQjmJcklyyqHcVGmQi9BjqPz4
 4s3F+WH6CoGbDKlmNk/6Y9wZ/2+sbvGswVscUxPwJVPoZWsR1xBBUdAeAmEMD1P4
 BgE/Y1J8JXyVPYtyvZKq70XUhKdQkxB7RfX87YasOk9mY4Kjd5rIIGEykh+o2vC9
 kDhCHvf2Mnw5I6Rum3B7UXyB1vemY+fECIHsXhgBnS+ztabRtcAdpCuWoqb43ymw
 yEX1KwXyU4FpRYbrRvdZT42Fmh6ty8TW+N4swg8S2TrffirvgAi5yrcHZ4mPupYv
 lyzvsCW7Wv8hPXn/twNObX+okRgJnsxcCdBXARdCnRXfA8tH23xmu88u8RA1Vdxh
 nzTvv6Dx2EowwojuDWMx29Mw3fA2IqIfbOV+4FaRU7NZ2ZKtknL8yGl27qQUsMoJ
 vYsHTmagasjQr+NDJ3vQRLCw+JQ6B1hENpdkmixFD9moo7X1ZFW3HBi/UL973Bv6
 5CmgeXto8FRUFjI=
 =WeNd
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher:
 "Functions gfs2_file_read_iter and gfs2_file_write_iter are both
  accessing the user buffer to write to or read from while holding the
  inode glock.

  In the most basic deadlock scenario, that buffer will not be resident
  and it will be mapped to the same file. Accessing the buffer will
  trigger a page fault, and gfs2 will deadlock trying to take the same
  inode glock again while trying to handle that fault.

  Fix that and similar, more complex scenarios by disabling page faults
  while accessing user buffers. To make this work, introduce a small
  amount of new infrastructure and fix some bugs that didn't trigger so
  far, with page faults enabled"

* tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix mmap + page fault deadlocks for direct I/O
  iov_iter: Introduce nofault flag to disable page faults
  gup: Introduce FOLL_NOFAULT flag to disable page faults
  iomap: Add done_before argument to iomap_dio_rw
  iomap: Support partial direct I/O on user copy failures
  iomap: Fix iomap_dio_rw return value for user copies
  gfs2: Fix mmap + page fault deadlocks for buffered I/O
  gfs2: Eliminate ip->i_gh
  gfs2: Move the inode glock locking to gfs2_file_buffered_write
  gfs2: Introduce flag for glock holder auto-demotion
  gfs2: Clean up function may_grant
  gfs2: Add wrapper for iomap_file_buffered_write
  iov_iter: Introduce fault_in_iov_iter_writeable
  iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
  gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
  powerpc/kvm: Fix kvm_use_magic_page
  iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
2021-11-02 12:25:03 -07:00
Linus Torvalds
d7e0a795bf ARM:
* More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 * Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 * Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 * More memcg accounting of the memory allocated on behalf of a guest
 
 * Timer and vgic selftests
 
 * Workarounds for the Apple M1 broken vgic implementation
 
 * KConfig cleanups
 
 * New kvmarm.mode=none option, for those who really dislike us
 
 RISC-V:
 * New KVM port.
 
 x86:
 * New API to control TSC offset from userspace
 
 * TSC scaling for nested hypervisors on SVM
 
 * Switch masterclock protection from raw_spin_lock to seqcount
 
 * Clean up function prototypes in the page fault code and avoid
 repeated memslot lookups
 
 * Convey the exit reason to userspace on emulation failure
 
 * Configure time between NX page recovery iterations
 
 * Expose Predictive Store Forwarding Disable CPUID leaf
 
 * Allocate page tracking data structures lazily (if the i915
 KVM-GT functionality is not compiled in)
 
 * Cleanups, fixes and optimizations for the shadow MMU code
 
 s390:
 * SIGP Fixes
 
 * initial preparations for lazy destroy of secure VMs
 
 * storage key improvements/fixes
 
 * Log the guest CPNC
 
 Starting from this release, KVM-PPC patches will come from
 Michael Ellerman's PPC tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBOiEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNowwf/axlx3g9sgCwQHr12/6UF/7hL/RwP
 9z+pGiUzjl2YQE+RjSvLqyd6zXh+h4dOdOKbZDLSkSTbcral/8U70ojKnQsXM0XM
 1LoymxBTJqkgQBLm9LjYreEbzrPV4irk4ygEmuk3CPOHZu8xX1ei6c5LdandtM/n
 XVUkXsQY+STkmnGv4P3GcPoDththCr0tBTWrFWtxa0w9hYOxx0ay1AZFlgM4FFX0
 QFuRc8VBLoDJpIUjbkhsIRIbrlHc/YDGjuYnAU7lV/CIME8vf2BW6uBwIZJdYcDj
 0ejozLjodEnuKXQGnc8sXFioLX2gbMyQJEvwCgRvUu/EU7ncFm1lfs7THQ==
 =UxKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2021-11-02 11:24:14 -07:00
Linus Torvalds
d2fac0afe8 audit/stable-5.16 PR 20211101
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGANdUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOmihAAgKSTv4Jf0s4yopdcxfuLweiyqHX1
 719QJzdLZohmllrJPq/83FZL9qodCzxy87nAm67Ht0baSKiEjtVgRaVCqJWEE+l6
 oQL+wUsGLP7CmExOP503Uh6tW35AhETQA4Uwu6QtiUYLYG17kAgeR3cTFuekUsJS
 iL4K65PXE2bBxMe7Ta1YIZqcxptbknMgpqYkdne7xs7RS+UiVj8TyRle6ACrfzEX
 IVy4LTk+spHCy1a494g9pt/21xOnbiLHr/FpckALscnvJiUThxbfQHGSQeMpM4uM
 BnwCqFrj860vMeh52M11/GAAXmdPh6AjoLhaSIW2I3M2GbV8ZP2hu1HYUz3osmrT
 f+aeMPJ4feX1xVj6qAC+1G83XRO83tP/YIEuocGiwyepImB25NHPin21xepf6Ru0
 wJX+aXC9O1eG6E2ghT6tBim/MpeNH5OT0hNO3uhGmEQ6xZpArRVVaBwlEdufJiCx
 ZljqEFUT7wA9nGEQif6GdLnGezGr/aNL65caTkIAzHKamd79QIr7VZXYjYIfHSqE
 p74Aro6E8qoQJjsTSkvZceM0u1LRzwS4wPRroE6eGz98oYDpiDm1RPb+9Gw5jyJf
 JN7UjJKO9+iPGAi3KivGBqpBskw4cCp2y/nHrMYmpGUPELcr5kQtDfQ6yp59tVZ8
 Dwo5GeSlG6khmiI=
 =WrEw
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Add some additional audit logging to capture the openat2() syscall
  open_how struct info.

  Previous variations of the open()/openat() syscalls allowed audit
  admins to inspect the syscall args to get the information contained in
  the new open_how struct used in openat2()"

* tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: return early if the filter rule has a lower priority
  audit: add OPENAT2 record to list "how" info
  audit: add support for the openat2 syscall
  audit: replace magic audit syscall class numbers with macros
  lsm_audit: avoid overloading the "key" audit field
  audit: Convert to SPDX identifier
  audit: rename struct node to struct audit_node to prevent future name collisions
2021-11-01 21:17:39 -07:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Linus Torvalds
bf953917be Various hardening fixes and cleanups for 5.16-rc1
Hi Linus,
 
 Please, pull the following hardening fixes and cleanups that I've
 been collecting during the last development cycle. All of them have
 been baking in linux-next.
 
 Fix -Wcast-function-type error:
 
 - firewire: Remove function callback casts (Oscar Carter)
 
 Fix application of sizeof operator:
 
 - firmware/psci: fix application of sizeof to pointer (jing yangyang)
 
 Replace open coded instances with size_t saturating arithmetic helpers:
 
 - assoc_array: Avoid open coded arithmetic in allocator arguments (Len Baker)
 - writeback: prefer struct_size over open coded arithmetic (Len Baker)
 - aio: Prefer struct_size over open coded arithmetic (Len Baker)
 - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic (Len Baker)
 
 Flexible array transformation:
 
 - KVM: PPC: Replace zero-length array with flexible array member (Len Baker)
 
 Use 2-factor argument multiplication form:
 
 - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)
 - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmGAWzsACgkQRwW0y0cG
 2zF55A/+PTBZKg0XLQkPZ7HFipobeZpfvM0dU4JutwN6Kts1RmMRftPn6ootY18v
 4tWR4jXcnblvEr7UTgYAl6QQdytFXZKOK+JKMWV8LXLqyNGF6sS2PmA6zk/iQoa5
 1q0IKUaaqLIXwmm3xoz+/uNHsb+kfjYOHZpHA6HhYZQFDyShW7+hhIeS1NauJo2X
 op3IWasMumrawPkCJZ0ZJJQLELtZNGt4gHnOjB1MAYhOTAokowgeeDNtyfoJ9j1L
 iL8kimphVLI35H/GERBozmqdqRGIIZLlQF4P66VfNNEXSDoKOemAKDSFrfmYoVwE
 kdh6fqeKPV/aRImrCtNthfpiEjqEpm8afQGMC5H5uPnZontUX9tcU1Qagg0vwYx0
 fLZ8mMuNQK5AZfugK+1+2ShfBYUlhvWRhQdtjC9nIAoO80NqouWB7QD0zIHC2WV7
 durdlhzxik70ISnXqKmTR6bQNcXB6kFLPR30RpcA3E6+AgwlkP0FmaD3e+sDttJ0
 vtxDMHqMMNNzOWlLW2eqEdKMEfoU0gLyRt5iM7EN6R8HUXwup5f9bu7V4LuCnR6y
 FAX4tEa8b5wg01zNfyWClCccU6tetSeXjdrhdIk7szQVsOsYXc4zxDrp6xvqsAh2
 B7GbGk5qeUzM/O7QWNIl+5s/NhUjEzQ3QiQebRDdjVyINU2OKsI=
 =Jk0U
 -----END PGP SIGNATURE-----

Merge tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull hardening fixes and cleanups from Gustavo A. R. Silva:
 "Various hardening fixes and cleanups that I've been collecting during
  the last development cycle:

  Fix -Wcast-function-type error:

   - firewire: Remove function callback casts (Oscar Carter)

  Fix application of sizeof operator:

   - firmware/psci: fix application of sizeof to pointer (jing yangyang)

  Replace open coded instances with size_t saturating arithmetic
  helpers:

   - assoc_array: Avoid open coded arithmetic in allocator arguments
     (Len Baker)

   - writeback: prefer struct_size over open coded arithmetic (Len
     Baker)

   - aio: Prefer struct_size over open coded arithmetic (Len Baker)

   - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic
     (Len Baker)

  Flexible array transformation:

   - KVM: PPC: Replace zero-length array with flexible array member (Len
     Baker)

  Use 2-factor argument multiplication form:

   - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R.
     Silva)

   - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)"

* tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  firewire: Remove function callback casts
  nouveau/svm: Use kvcalloc() instead of kvzalloc()
  firmware/psci: fix application of sizeof to pointer
  dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic
  KVM: PPC: Replace zero-length array with flexible array member
  aio: Prefer struct_size over open coded arithmetic
  writeback: prefer struct_size over open coded arithmetic
  xfs: Use kvcalloc() instead of kvzalloc()
  assoc_array: Avoid open coded arithmetic in allocator arguments
2021-11-01 17:29:10 -07:00
Linus Torvalds
01463374c5 cpu-to-thread_info update for v5.16-rc1
Cross-architecture update to move task_struct::cpu back into thread_info
 on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers.
 
 Quoting Ard Biesheuvel:
 
 "Move task_struct::cpu back into thread_info
 
  Keeping CPU in task_struct is problematic for architectures that define
  raw_smp_processor_id() in terms of this field, as it requires
  linux/sched.h to be included, which causes a lot of pain in terms of
  circular dependencies (aka 'header soup')
 
  This series moves it back into thread_info (where it came from) for all
  architectures that enable THREAD_INFO_IN_TASK, addressing the header
  soup issue as well as some pointless differences in the implementations
  of task_cpu() and set_task_cpu()."
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAEPYWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJq4wEACItgLuyzPgB2eSLVMc3sHPIWcn
 EUWbAWsuzJH79wmJtn2AKxW/C5OLBNGeoNjkXQvFN3ULkQDPrfCpB4x/tB6CjIQI
 WRDf8kO7oaAD85ZrbSwyFl/MFfrD67f6H1HZoB9FKWAzuv/Bp2xQ0Kf06Dv4HEZp
 CzprzZuWtjHB+qgyy+EpGOge3zbFmCuYPE2QpMYLWgs1rcVW9OYvoCI6AYtNefrC
 6Kl6CbmBb1k6lFxkhM7wvRcIJthBl6Bajpc3Z2uL1aLb27dVpQZs3YpY859Knb6U
 ZpOQCRJOMui3HOxyF3bDUI37y0XVLm6xaNM6C/7i0XS1GiFlSxkGVamg+Mp7anpI
 +hdK5kqtSagaBC9CaJvRHnWIex1npQAfiyDNdyiEbrsUJ1dp6/zZcQSe4/m/XRbi
 vywQPGxU9f1ASshzHsGU2TJf7Ps7qHulUsS5fKwmHU2ZjQnbYCoPN10JGO9gKjOX
 yioN5xsKnbPY9j0ys3l9XBqaMJ8KAr1XspplTGIMZIVbjNMlqrfgbg8Qn8T8WGM7
 oUqudMIxczilj0/iEGfGRxBeFaYAfhGQCDnxNlNX9g7Xe/gHTJgNYlHVxL55jHNu
 AoPE3Gd0X8K9fbov0BCB6a21XwGJ6Wj+FSrnvuyWrRuy8JWiDFJaVKUBEcalKr7a
 MhoUNQPu5M83OdC42A==
 =PzvV
 -----END PGP SIGNATURE-----

Merge tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull thread_info update to move 'cpu' back from task_struct from Kees Cook:
 "Cross-architecture update to move task_struct::cpu back into
  thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch
  maintainers.

  Quoting Ard Biesheuvel:

     'Move task_struct::cpu back into thread_info

      Keeping CPU in task_struct is problematic for architectures that
      define raw_smp_processor_id() in terms of this field, as it
      requires linux/sched.h to be included, which causes a lot of pain
      in terms of circular dependencies (aka 'header soup')

      This series moves it back into thread_info (where it came from)
      for all architectures that enable THREAD_INFO_IN_TASK, addressing
      the header soup issue as well as some pointless differences in the
      implementations of task_cpu() and set_task_cpu()'"

* tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  riscv: rely on core code to keep thread_info::cpu updated
  powerpc: smp: remove hack to obtain offset of task_struct::cpu
  sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y
  powerpc: add CPU field to struct thread_info
  s390: add CPU field to struct thread_info
  x86: add CPU field to struct thread_info
  arm64: add CPU field to struct thread_info
2021-11-01 17:00:05 -07:00
Linus Torvalds
6e5772c8d9 Add an interface called cc_platform_has() which is supposed to be used
by confidential computing solutions to query different aspects of the
 system. The intent behind it is to unify testing of such aspects instead
 of having each confidential computing solution add its own set of tests
 to code paths in the kernel, leading to an unwieldy mess.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/uLUACgkQEsHwGGHe
 VUqGbQ/+LOmz8hmL5vtbXw/lVonCSBRKI2KVefnN2VtQ3rjtCq8HlNoq/hAdi15O
 WntABFV8u4daNAcssp+H/p+c8Mt/NzQa60TRooC5ZIynSOCj4oZQxTWjcnR4Qxrf
 oABy4sp09zNW31qExtTVTwPC/Ejzv4hA0Vqt9TLQOSxp7oYVYKeDJNp79VJK64Yz
 Ky7epgg8Pauk0tAT76ATR4kyy9PLGe4/Ry0bOtAptO4NShL1RyRgI0ywUmptJHSw
 FV/MnoexdAs4V8+4zPwyOkf8YMDnhbJcvFcr7Yd9AEz2q9Z1wKCgi1M3aZIoW8lV
 YMXECMGe9DfxmEJbnP5zbnL6eF32x+tbq+fK8Ye4V2fBucpWd27zkcTXjoP+Y+zH
 NLg+9QykR9QCH75YCOXcAg1Q5hSmc4DaWuJymKjT+W7MKs89ywjq+ybIBpLBHbQe
 uN9FM/CEKXx8nQwpNQc7mdUE5sZeCQ875028RaLbLx3/b6uwT6rBlNJfxl/uxmcZ
 iF1kG7Cx4uO+7G1a9EWgxtWiJQ8GiZO7PMCqEdwIymLIrlNksAk7nX2SXTuH5jIZ
 YDuBj/Xz2UUVWYFm88fV5c4ogiFlm9Jeo140Zua/BPdDJd2VOP013rYxzFE/rVSF
 SM2riJxCxkva8Fb+8TNiH42AMhPMSpUt1Nmd1H2rcEABRiT83Ow=
 =Na0U
 -----END PGP SIGNATURE-----

Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull generic confidential computing updates from Borislav Petkov:
 "Add an interface called cc_platform_has() which is supposed to be used
  by confidential computing solutions to query different aspects of the
  system.

  The intent behind it is to unify testing of such aspects instead of
  having each confidential computing solution add its own set of tests
  to code paths in the kernel, leading to an unwieldy mess"

* tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sev: Add an x86 version of cc_platform_has()
  arch/cc: Introduce a function to check for confidential computing features
  x86/ioremap: Selectively build arch override encryption functions
2021-11-01 15:16:52 -07:00
Linus Torvalds
9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Linus Torvalds
595b28fb0c Locking updates:
- Move futex code into kernel/futex/ and split up the kitchen sink into
    seperate files to make integration of sys_futex_waitv() simpler.
 
  - Add a new sys_futex_waitv() syscall which allows to wait on multiple
    futexes. The main use case is emulating Windows' WaitForMultipleObjects
    which allows Wine to improve the performance of Windows Games. Also
    native Linux games can benefit from this interface as this is a common
    wait pattern for this kind of applications.
 
  - Add context to ww_mutex_trylock() to provide a path for i915 to rework
    their eviction code step by step without making lockdep upset until the
    final steps of rework are completed. It's also useful for regulator and
    TTM to avoid dropping locks in the non contended path.
 
  - Lockdep and might_sleep() cleanups and improvements
 
  - A few improvements for the RT substitutions.
 
  - The usual small improvements and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/FTITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVNZD/9vIm3Bu1Coz8tbNXz58AiCYq9Y/vp5
 mzFgSzz+VJTkW5Vh8jo5Uel4rCKZyt+rL276EoaRPzYl8KFtWDbpK3qd3PrXKqTX
 At49JO4ttAMJUHIBQ6vblEkykmfEd9YPU1uSWk5roJ+s7Jmr5VWnu0FEWHP00As5
 tWOca/TM0ei9kof26V2fl5aecTGII4i4Zsvy+LPsXtI+TnmP0gSBcGAS/5UnZTtJ
 vQRWTR3ojoYvh5iTmNqbaURYoQLe2j8yscn1DSW1CABWVmP12eDWs+N7jRP4b5S9
 73xOv5P7vpva41wxrK2ir5iNkpsLE97VL2JOHTW8nm7orblfiuxHLTCkTjEdd2pO
 h8blI2IBizEB3JYn2BMkOAaZQOSjN8hd6Ye/b2B4AMEGWeXEoEv6eVy/orYKCluQ
 XDqGn47Vce/SYmo5vfTB8VMt6nANx8PKvOP3IvjHInYEQBgiT6QrlUw3RRkXBp5s
 clQkjYYwjAMVIXowcCrdhoKjMROzi6STShVwHwGL8MaZXqr8Vl6BUO9ckU0pY+4C
 F000Hzwxi8lGEQ9k+P+BnYOEzH5osCty8lloKiQ/7ciX6T+CZHGJPGK/iY4YL8P5
 C3CJWMsHCqST7DodNFJmdfZt99UfIMmEhshMDduU9AAH0tHCn8vOu0U6WvCtpyBp
 BvHj68zteAtlYg==
 =RZ4x
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:

 - Move futex code into kernel/futex/ and split up the kitchen sink into
   seperate files to make integration of sys_futex_waitv() simpler.

 - Add a new sys_futex_waitv() syscall which allows to wait on multiple
   futexes.

   The main use case is emulating Windows' WaitForMultipleObjects which
   allows Wine to improve the performance of Windows Games. Also native
   Linux games can benefit from this interface as this is a common wait
   pattern for this kind of applications.

 - Add context to ww_mutex_trylock() to provide a path for i915 to
   rework their eviction code step by step without making lockdep upset
   until the final steps of rework are completed. It's also useful for
   regulator and TTM to avoid dropping locks in the non contended path.

 - Lockdep and might_sleep() cleanups and improvements

 - A few improvements for the RT substitutions.

 - The usual small improvements and cleanups.

* tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  locking: Remove spin_lock_flags() etc
  locking/rwsem: Fix comments about reader optimistic lock stealing conditions
  locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able()
  locking/rwsem: Disable preemption for spinning region
  docs: futex: Fix kernel-doc references
  futex: Fix PREEMPT_RT build
  futex2: Documentation: Document sys_futex_waitv() uAPI
  selftests: futex: Test sys_futex_waitv() wouldblock
  selftests: futex: Test sys_futex_waitv() timeout
  selftests: futex: Add sys_futex_waitv() test
  futex,arm: Wire up sys_futex_waitv()
  futex,x86: Wire up sys_futex_waitv()
  futex: Implement sys_futex_waitv()
  futex: Simplify double_lock_hb()
  futex: Split out wait/wake
  futex: Split out requeue
  futex: Rename mark_wake_futex()
  futex: Rename: match_futex()
  futex: Rename: hb_waiter_{inc,dec,pending}()
  futex: Split out PI futex
  ...
2021-11-01 13:15:36 -07:00
Linus Torvalds
91e1c99e17 perf updates:
core:
 
   - Allow ftrace to instrument parts of the perf core code
 
   - Add a new mem_hops field to perf_mem_data_src which allows to represent
     intra-node/package or inter-node/off-package details to prepare for
     next generation systems which have more hieararchy within the
     node/pacakge level.
 
  tools:
 
   - Update for the new mem_hops field in perf_mem_data_src
 
  arch:
 
   - A set of constraints fixes for the Intel uncore PMU
 
   - The usual set of small fixes and improvements for x86 and PPC
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/GkQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaD8D/wLhXR8RxtF4W9HJmHA+5XFsPtg+isp
 ZNU2kOs4gZskFx75NQaRv5ikA8y68TKdIx+NuQvRLYItaMveTToLSsJ55bfGMxIQ
 JHqDvANUNxBmAACnbYQlqf9WgB0i/3fCUHY5lpmN0waKjaswz7WNpycv4ccShVZr
 PKbgEjkeFBhplCqqOF0X5H3V+4q85+nZONm1iSNd4S7/3B6OCxOf1u78usL1bbtW
 yJAMSuTeOVUZCJm7oVywKW/ZlCscT135aKr6xe5QTrjlPuRWzuLaXNezdMnMyoVN
 HVv8a0ClACb8U5KiGfhvaipaIlIAliWJp2qoiNjrspDruhH6Yc+eNh1gUhLbtNpR
 4YZR5jxv4/mS13kzMMQg00cCWQl7N4whPT+ZE9pkpshGt+EwT+Iy3U+v13wDfnnp
 MnDggpWYGEkAck13t/T6DwC3qBIsVujtpiG+tt/ERbTxiuxi1ccQTGY3PDjtHV3k
 tIMH5n7l4jEpfl8VmoSUgz/2h1MLZnQUWp41GXkjkaOt7uunQZen+nAwqpTm28KV
 7U6U0h1q6r7HxOZRxkPPe4HSV+aBNH3H1LeNBfEd3hDCFGf6MY6vLow+2BE9ybk7
 Y6LPbRqq0SN3sd5MND0ZvQEt5Zgol8CMlX+UKoLEEv7RognGbIxkgpK7exv5pC9w
 nWj7TaMfpRzPgw==
 =Oj0G
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Thomas Gleixner:
 "Core:

   - Allow ftrace to instrument parts of the perf core code

   - Add a new mem_hops field to perf_mem_data_src which allows to
     represent intra-node/package or inter-node/off-package details to
     prepare for next generation systems which have more hieararchy
     within the node/pacakge level.

  Tools:

   - Update for the new mem_hops field in perf_mem_data_src

  Arch:

   - A set of constraints fixes for the Intel uncore PMU

   - The usual set of small fixes and improvements for x86 and PPC"

* tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings
  powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses
  tools/perf: Add mem_hops field in perf_mem_data_src structure
  perf: Add mem_hops field in perf_mem_data_src structure
  perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
  perf/core: Allow ftrace for functions in kernel/event/core.c
  perf/x86: Add new event for AUX output counter index
  perf/x86: Add compiler barrier after updating BTS
  perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints
  perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints
  perf/x86/intel/uncore: Fix Intel SPR IIO event constraints
  perf/x86/intel/uncore: Fix Intel SPR CHA event constraints
  perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
  perf/x86/intel/uncore: Fix invalid unit check
  perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
2021-11-01 13:12:15 -07:00
Linus Torvalds
33c8846c81 for-5.16/block-2021-10-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KDgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQ2D/wO0nH3U+3+OZChi3XUwYck9Dev3o6BANCF
 ClATiK/kivZY0xY1r8J4ixirZo2gcjIMpWSC3JGYZ5LdspfmYGLUbMjfZsaeU23i
 lAKaX1IqfArmHN76k3IU1bKCg7B0/LFwC0q9QTFWTSwNSs8RK/EZLJ61U1hEXUb3
 OfIpaMmvPiMaU7yuPqhcZK14m1cg1srrLM4rFB/PqsWWStF07pHq32WeArGDAU0e
 Fe0YSnYD7qqA5Qc37KwqjCTmmxKX5YZf7etIcA6p3DNmwcuQrVNzKoCH/ZEDijaD
 E2bS/BWbN1x96+rtoEZfBYEaNIrkmJzmW6+fJ53OITbJF3KqP6V66erhqNcFYCzC
 mhFlRe7voXb/8AP7zQqSIhK529BUBM36sQ6nF7EiQcDrfLc1z39mq6eblUxbknIA
 DDPISD5Tseik9N9x0bc7vINseKyHI1E90VAU/XKADcuGbzLvehPx+2p+Iq5ch5Ah
 oa1G3RdlWWQOZxphJHWJhu1qMfo5+FP9dFZj1aoo7b8Kbc/CedyoQe71cpIE5wNh
 Jj/EpWJnuyKXwuTic2VYGC+6ezM9O5DSdqCfP3YuZky95VESyvRCKJYMMgBYRVdC
 /LuxhnBXIY2G8An7ZTnX0kLCCvLbapIwa0NyA98/xeOngO843coJ6wn8ZmE9LJNH
 kMmpCygUrA==
 =QWC+
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - mq-deadline accounting improvements (Bart)

 - blk-wbt timer fix (Andrea)

 - Untangle the block layer includes (Christoph)

 - Rework the poll support to be bio based, which will enable adding
   support for polling for bio based drivers (Christoph)

 - Block layer core support for multi-actuator drives (Damien)

 - blk-crypto improvements (Eric)

 - Batched tag allocation support (me)

 - Request completion batching support (me)

 - Plugging improvements (me)

 - Shared tag set improvements (John)

 - Concurrent queue quiesce support (Ming)

 - Cache bdev in ->private_data for block devices (Pavel)

 - bdev dio improvements (Pavel)

 - Block device invalidation and block size improvements (Xie)

 - Various cleanups, fixes, and improvements (Christoph, Jackie,
   Masahira, Tejun, Yu, Pavel, Zheng, me)

* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
  blk-mq-debugfs: Show active requests per queue for shared tags
  block: improve readability of blk_mq_end_request_batch()
  virtio-blk: Use blk_validate_block_size() to validate block size
  loop: Use blk_validate_block_size() to validate block size
  nbd: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  block: re-flow blk_mq_rq_ctx_init()
  block: prefetch request to be initialized
  block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
  block: add rq_flags to struct blk_mq_alloc_data
  block: add async version of bio_set_polled
  block: kill DIO_MULTI_BIO
  block: kill unused polling bits in __blkdev_direct_IO()
  block: avoid extra iter advance with async iocb
  block: Add independent access ranges support
  blk-mq: don't issue request directly in case that current is to be blocked
  sbitmap: silence data race warning
  blk-cgroup: synchronize blkg creation against policy deactivation
  block: refactor bio_iov_bvec_set()
  block: add single bio async direct IO helper
  ...
2021-11-01 09:19:50 -07:00
Christophe Leroy
c12ab8dbc4 powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
Until now, all tests involving CONFIG_STRICT_KERNEL_RWX were done with
DEBUG_RODATA_TEST to check the result. But now that
CONFIG_STRICT_KERNEL_RWX is selected by default, it came without
CONFIG_DEBUG_RODATA_TEST and led to the following Oops

[    6.830908] Freeing unused kernel image (initmem) memory: 352K
[    6.840077] BUG: Unable to handle kernel data access on write at 0xc1285200
[    6.846836] Faulting instruction address: 0xc0004b6c
[    6.851745] Oops: Kernel access of bad area, sig: 11 [#1]
[    6.857075] BE PAGE_SIZE=16K PREEMPT CMPC885
[    6.861348] SAF3000 DIE NOTIFICATION
[    6.864830] CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0-rc5-s3k-dev-02255-g2747d7b7916f #451
[    6.873429] NIP:  c0004b6c LR: c0004b60 CTR: 00000000
[    6.878419] REGS: c902be60 TRAP: 0300   Not tainted  (5.15.0-rc5-s3k-dev-02255-g2747d7b7916f)
[    6.886852] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 53000335  XER: 8000ff40
[    6.893564] DAR: c1285200 DSISR: 82000000
[    6.893564] GPR00: 0c000000 c902bf20 c20f4000 08000000 00000001 04001f00 c1800000 00000035
[    6.893564] GPR08: ff0001ff c1280000 00000002 c0004b60 00001000 00000000 c0004b1c 00000000
[    6.893564] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.893564] GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c1060000
[    6.932034] NIP [c0004b6c] kernel_init+0x50/0x138
[    6.936682] LR [c0004b60] kernel_init+0x44/0x138
[    6.941245] Call Trace:
[    6.943653] [c902bf20] [c0004b60] kernel_init+0x44/0x138 (unreliable)
[    6.950022] [c902bf30] [c001122c] ret_from_kernel_thread+0x5c/0x64
[    6.956135] Instruction dump:
[    6.959060] 48ffc521 48045469 4800d8cd 3d20c086 89295fa0 2c090000 41820058 480796c9
[    6.966890] 4800e48d 3d20c128 39400002 3fe0c106 <91495200> 3bff8000 4806fa1d 481f7d75
[    6.974902] ---[ end trace 1e397bacba4aa610 ]---

0xc1285200 corresponds to 'system_state' global var that the kernel is trying to set to
SYSTEM_RUNNING. This var is above the RO/RW limit so it shouldn't Oops.

It oopses because the dirty bit is missing.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3d5800b0bbcd7b19761b98f50421358667b45331.1635520232.git.christophe.leroy@csgroup.eu
2021-11-01 21:39:03 +11:00
Arnd Bergmann
f98a3dccfc locking: Remove spin_lock_flags() etc
parisc, ia64 and powerpc32 are the only remaining architectures that
provide custom arch_{spin,read,write}_lock_flags() functions, which are
meant to re-enable interrupts while waiting for a spinlock.

However, none of these can actually run into this codepath, because
it is only called on architectures without CONFIG_GENERIC_LOCKBREAK,
or when CONFIG_DEBUG_LOCK_ALLOC is set without CONFIG_LOCKDEP, and none
of those combinations are possible on the three architectures.

Going back in the git history, it appears that arch/mn10300 may have
been able to run into this code path, but there is a good chance that
it never worked. On the architectures that still exist, it was
already impossible to hit back in 2008 after the introduction of
CONFIG_GENERIC_LOCKBREAK, and possibly earlier.

As this is all dead code, just remove it and the helper functions built
around it. For arch/ia64, the inline asm could be cleaned up, but
it seems safer to leave it untouched.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Helge Deller <deller@gmx.de>  # parisc
Link: https://lore.kernel.org/r/20211022120058.1031690-1-arnd@kernel.org
2021-10-30 16:37:28 +02:00
Linus Torvalds
119c85055d powerpc fixes for 5.15 #6
Three commits fixing some issues introduced with the recent IOMMU changes we merged.
 
 Thanks to: Alexey Kardashevskiy
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmF8e20THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG4GEAC6SAo82kdWOR5MEwZ+U9ht91KqzK/O
 UYm5WLapGaWzKtS5zMZUzc3dTXFsbeSdPghfGp2eLpnkQkfDnltyGA4ERVunEQUg
 soGXt0OXBfmt++D+yogMkURr9tYzZ+ssrdCEC6Vmuv35Tf/dsHA9upWCVVw4UOOv
 w/RRR1uku6kup7NCX6TJZoUQSeAXISDhwk3LXF5jh/hqSyV3BV9yulHYs3J4WtFQ
 zr2dsxXL5DvgM3cOmLBZbnmTYSmU1f0jtGhqEf/6Ar3ljLHZgIvLLWmZK1UyyMwv
 4P7yFIBVObW0sBqGz/4K9p73l8MnVHORrBUe73OavZgrxaM2u0k/7JRj72txcbMw
 KXNKvDoA9nambWV98OXt4HE2bInsporn4DSDSAwJZmuTcqltbFOCf47t0kpe66fB
 ZC2IJCOfrol5ELZvmdAsuVWEucHkdPMPqz0ZB46E/givQf45RmfYnjvVKm8HTQls
 aOivKNuP2VLXhHdeocly1adaDMaeMEouYw1p00VKhOW4GbfHCZs4IczDtqacHRMR
 NB+C0awi6DDQ7WXCEYURte9iURn8owjJhMuIsQj2/SUHVfuQEeJL1llSzo9sa/f2
 ONT0esDhKSuq74T1nmimUMOvAyBg/TKuwY87TJ9BKcJzqSj+oPuuw28d8HBi61qk
 S9rG2jHto84HVw==
 =R9Yq
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Three commits fixing some issues introduced with the recent IOMMU
  changes we merged.

  Thanks to Alexey Kardashevskiy"

* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
  powerpc/pseries/iommu: Check if the default window in use before removing it
  powerpc/pseries/iommu: Use correct vfree for it_map
2021-10-29 17:35:56 -07:00
Eric W. Biederman
e21294a7aa signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Ben Widawsky
c6d7e1341c ocxl: Use pci core's DVSEC functionality
Reduce maintenance burden of DVSEC query implementation by using the
centralized PCI core implementation.

There are two obvious places to simply drop in the new core
implementation. There remains find_dvsec_from_pos() which would benefit
from using a core implementation. As that change is less trivial it is
reserved for later.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Link: https://lore.kernel.org/r/163379789065.692348.7117946955275586530.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:52 -07:00
Nicholas Piggin
81291383ff powerpc/32e: Ignore ESR in instruction storage interrupt handler
A e5500 machine running a 32-bit kernel sometimes hangs at boot,
seemingly going into an infinite loop of instruction storage interrupts.

The ESR (Exception Syndrome Register) has a value of 0x800000 (store)
when this happens, which is likely set by a previous store. An
instruction TLB miss interrupt would then leave ESR unchanged, and if no
PTE exists it calls directly to the instruction storage interrupt
handler without changing ESR.

access_error() does not cause a segfault due to a store to a read-only
vma because is_exec is true. Most subsequent fault handling does not
check for a write fault on a read-only vma, and might do strange things
like create a writeable PTE or call page_mkwrite on a read only vma or
file. It's not clear what happens here to cause the infinite faulting in
this case, a fault handler failure or low level PTE or TLB handling.

In any case this can be fixed by having the instruction storage
interrupt zero regs->dsisr rather than storing the ESR value to it.

Fixes: a01a3f2ddb ("powerpc: remove arguments from fault handler functions")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Jacques de Laval <jacques.delaval@protonmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Jacques de Laval <jacques.delaval@protonmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028133043.4159501-1-npiggin@gmail.com
2021-10-29 23:12:37 +11:00
Vasant Hegde
52862ab33c powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
Commit 587164cd, introduced new opal message type (OPAL_MSG_PRD2) and
added opal notifier. But I missed to unregister the notifier during
module unload path. This results in below call trace if you try to
unload and load opal_prd module.

Also add new notifier_block for OPAL_MSG_PRD2 message.

Sample calltrace (modprobe -r opal_prd; modprobe opal_prd)
  BUG: Unable to handle kernel data access on read at 0xc0080000192200e0
  Faulting instruction address: 0xc00000000018d1cc
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G            E     5.14.0prd #759
  NIP:  c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10
  REGS: c0000003c4c0f0a0 TRAP: 0300   Tainted: G            E      (5.14.0prd)
  MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24224824  XER: 20040000
  CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1
  ...
  NIP notifier_chain_register+0x2c/0xc0
  LR  atomic_notifier_chain_register+0x48/0x80
  Call Trace:
    0xc000000002090610 (unreliable)
    atomic_notifier_chain_register+0x58/0x80
    opal_message_notifier_register+0x7c/0x1e0
    opal_prd_probe+0x84/0x150 [opal_prd]
    platform_probe+0x78/0x130
    really_probe+0x110/0x5d0
    __driver_probe_device+0x17c/0x230
    driver_probe_device+0x60/0x130
    __driver_attach+0xfc/0x220
    bus_for_each_dev+0xa8/0x130
    driver_attach+0x34/0x50
    bus_add_driver+0x1b0/0x300
    driver_register+0x98/0x1a0
    __platform_driver_register+0x38/0x50
    opal_prd_driver_init+0x34/0x50 [opal_prd]
    do_one_initcall+0x60/0x2d0
    do_init_module+0x7c/0x320
    load_module+0x3394/0x3650
    __do_sys_finit_module+0xd4/0x160
    system_call_exception+0x140/0x290
    system_call_common+0xf4/0x258

Fixes: 587164cd59 ("powerpc/powernv: Add new opal message type")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028165716.41300-1-hegdevasant@linux.vnet.ibm.com
2021-10-29 16:47:32 +11:00
Christophe Leroy
f8c0e36b48 powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can
still select CONFIG_DEBUG_PAGEALLOC in which case __kernel_map_pages()
is provided by mm/page_poison.c

So only define __kernel_map_pages() when both
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC
are defined.

Fixes: 68b44f94d6 ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu
2021-10-29 11:10:43 +11:00
Michael Ellerman
5b9f69ffa6 Merge branch 'topic/ppc-kvm' into next
Merge a couple of KVM ppc patches we are keeping in a topic branch.
2021-10-29 00:00:57 +11:00
Denis Kirjanov
b1f896ce35 powerpc/xmon: fix task state output
p_state is unsigned since the commit 2f064a59a1

The patch also uses TASK_RUNNING instead of null.

Fixes: 2f064a59a1 ("sched: Change task_struct::state")
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211026133108.7113-1-kda@linux-powerpc.org
2021-10-28 22:32:03 +11:00
Bixuan Cui
290fe8aa69 powerpc/44x/fsp2: add missing of_node_put
Early exits from for_each_compatible_node() should decrement the
node reference counter.  Reported by Coccinelle:

./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function
"for_each_compatible_node" should have of_node_put() before return
around line 218.

Fixes: 7813043e1b ("powerpc/44x/fsp2: Add irq error handlers")
Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1635406102-88719-1-git-send-email-cuibixuan@linux.alibaba.com
2021-10-28 22:29:35 +11:00
Michael Ellerman
fef071be57 powerpc/dcr: Use cmplwi instead of 3-argument cmpli
In dcr-low.S we use cmpli with three arguments, instead of four
arguments as defined in the ISA:

	cmpli	cr0,r3,1024

This appears to be a PPC440-ism, looking at the "PPC440x5 CPU Core
User’s Manual" it shows cmpli having no L field, but implied to be 0 due
to the core being 32-bit. It mentions that the ISA defines four
arguments and recommends using cmplwi.

It also corresponds to the old POWER instruction set, which had no L
field there, a reserved bit instead.

dcr-low.S is only built 32-bit, because it is only built when
DCR_NATIVE=y, which is only selected by 40x and 44x. Looking at the
generated code (with gcc/gas) we see cmplwi as expected.

Although gas is happy with the 3-argument version when building for
32-bit, the LLVM assembler is not and errors out with:

  arch/powerpc/sysdev/dcr-low.S:27:10: error: invalid operand for instruction
   cmpli 0,%r3,1024; ...
           ^

Switch to the cmplwi extended opcode, which avoids any confusion when
reading the ISA, fixes the issue with the LLVM assembler, and also means
the code could be built 64-bit in future (though that's very unlikely).

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://github.com/ClangBuiltLinux/linux/issues/1419
Link: https://lore.kernel.org/r/20211014024424.528848-1-mpe@ellerman.id.au
2021-10-28 22:29:25 +11:00
Laurent Vivier
235cee1624 KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
Commit 112665286d ("KVM: PPC: Book3S HV: Context tracking exit guest
context before enabling irqs") moved guest_exit() into the interrupt
protected area to avoid wrong context warning (or worse). The problem is
that tick-based time accounting has not yet been updated at this point
(because it depends on the timer interrupt firing), so the guest time
gets incorrectly accounted to system time.

To fix the problem, follow the x86 fix in commit 1604571401 ("Defer
vtime accounting 'til after IRQ handling"), and allow host IRQs to run
before accounting the guest exit time.

In the case vtime accounting is enabled, this is not required because TB
is used directly for accounting.

Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a
guest running a kernel compile, the 'guest' fields of /proc/stat are
stuck at zero. With the patch they can be observed increasing roughly as
expected.

Fixes: e233d54d4d ("KVM: booke: use __kvm_guest_exit")
Fixes: 112665286d ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
Cc: stable@vger.kernel.org # 5.12+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
[np: only required for tick accounting, add Book3E fix, tweak changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211027142150.3711582-1-npiggin@gmail.com
2021-10-28 22:27:48 +11:00
Russell Currey
3c12b4df8d powerpc/security: Use a mutex for interrupt exit code patching
The mitigation-patching.sh script in the powerpc selftests toggles
all mitigations on and off simultaneously, revealing that rfi_flush
and stf_barrier cannot safely operate at the same time due to races
in updating the static key.

On some systems, the static key code throws a warning and the kernel
remains functional.  On others, the kernel will hang or crash.

Fix this by slapping on a mutex.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211027072410.40950-1-ruscur@russell.cc
2021-10-28 17:23:58 +11:00
王贇
ce5e48036c ftrace: disable preemption when recursion locked
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.

And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.

This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.

Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com

CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 11:21:49 -04:00
Uwe Kleine-König
5d354dc35e powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void
Up to now mcu_gpiochip_remove() returns zero unconditionally. Make it
return void instead which makes it easier to see in the callers that
there is no error to handle.

Also the return value of i2c remove callbacks is ignored anyway.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211021105657.72572-1-u.kleine-koenig@pengutronix.de
2021-10-28 00:41:29 +11:00
Christophe Leroy
44c14509b0 powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs
Building tqm8541_defconfig results in:

	arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam':
	arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function)
	  126 |         TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;
	      |                                        ^~~~~~~~~~~~
	arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in
	make[3]: *** [scripts/Makefile.build:277: arch/powerpc/mm/nohash/fsl_book3e.o] Error 1
	make[2]: *** [scripts/Makefile.build:540: arch/powerpc/mm/nohash] Error 2
	make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2
	make: *** [Makefile:1868: arch/powerpc] Error 2

This is because _PAGE_BAP_SX is not defined when using 32 bits PTE.

Now that _PAGE_EXEC contains both _PAGE_BAP_SX and _PAGE_BAP_UX, it can be used instead.

Fixes: 01116e6e98 ("powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/91a0235e7f2a85308b84aa5b9efd8d022e2b899a.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 00:41:29 +11:00
Christophe Leroy
b6cb20fdc2 powerpc/book3e: Fix set_memory_x() and set_memory_nx()
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which defines the exec rights
resp. for user (PR=1) and for kernel (PR=0).

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So set_memory_nx() call for an executable kernel page does
nothing because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user and keeps it non-executable for kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
  - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
  - sets both UX and SX in _PAGE_EXEC so that pte_exec() returns
    true whenever one of the two bits is set and pte_exprotect()
    clears both bits.
  - Define a book3e specific version of pte_mkexec() which sets
    either SX or UX based on UR.

Fixes: 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 00:41:29 +11:00
Christophe Leroy
b1b93cb7e7 powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
Commit 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
changed those two functions to use pte helpers to determine which
bits to clear and which bits to set.

This change was based on the assumption that bits to be set/cleared
are always the same and can be determined by applying the pte
manipulation helpers on __pte(0).

But on platforms like book3e, the bits depend on whether the page
is a user page or not.

For the time being it more or less works because of _PAGE_EXEC being
used for user pages only and exec right being set at all time on
kernel page. But following patch will clean that and output of
pte_mkexec() will depend on the page being a user or kernel page.

Instead of trying to make an even more complicated helper where bits
would become dependent on the final pte value, come back to a more
static situation like before commit 26973fa5ac ("powerpc/mm: use
pte helpers in generic code"), by introducing an 8xx specific
version of __ptep_set_access_flags() and ptep_set_wrprotect().

Fixes: 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 00:41:28 +11:00
Hari Bathini
44a8214de9 powerpc/bpf: Fix write protecting JIT code
Running program with bpf-to-bpf function calls results in data access
exception (0x300) with the below call trace:

  bpf_int_jit_compile+0x238/0x750 (unreliable)
  bpf_check+0x2008/0x2710
  bpf_prog_load+0xb00/0x13a0
  __sys_bpf+0x6f4/0x27c0
  sys_bpf+0x2c/0x40
  system_call_exception+0x164/0x330
  system_call_vectored_common+0xe8/0x278

as bpf_int_jit_compile() tries writing to write protected JIT code
location during the extra pass.

Fix it by holding off write protection of JIT code until the extra
pass, where branch target addresses fixup happens.

Fixes: 62e3d4210a ("powerpc/bpf: Write protect JIT code")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211025055649.114728-1-hbathini@linux.ibm.com
2021-10-28 00:08:32 +11:00
Nicholas Piggin
4a5cb51f3d powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
The check_return_regs_valid() can cause a false positive if the return
regs are marked as norestart and they are an HSRR type interrupt,
because the low bit in the bottom of regs->trap causes interrupt type
matching to fail.

This can occcur for example on bare metal with a HV privileged doorbell
interrupt that causes a signal, but do_signal returns early because
get_signal() fails, and takes the "No signal to deliver" path. In this
case no signal was delivered so the return location is not changed so
return SRRs are not invalidated, yet set_trap_norestart is called, which
messes up the match. Building go-1.16.6 is known to reproduce this.

Fix it by using the TRAP() accessor which masks out the low bit.

Fixes: 6eaaf9de35 ("powerpc/64s/interrupt: Check and fix srr_valid without crashing")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211026122531.3599918-1-npiggin@gmail.com
2021-10-27 22:33:47 +11:00
Christophe Leroy
b949d009dd powerpc/boot: Set LC_ALL=C in wrapper script
While trying to build a simple Image for ACADIA platform, I got the
following error:

	  WRAP    arch/powerpc/boot/simpleImage.acadia
	INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000)
	INFO: Fixing the link_address of wrapper to (0x700000)
	powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T
	Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim
	make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1
	make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2

Trying again with V=1 shows the following command

	powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a

The argument of '-m' is missing.

This is due to the wrapper script calling 'objdump -p vmlinux' and
looking for 'file format', whereas the output of objdump is:

	vmlinux:     format de fichier elf32-powerpc

	En-tête de programme:
	    LOAD off    0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
	         filesz 0x0069e1d4 memsz 0x006c128c flags rwx
	    NOTE off    0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
	         filesz 0x00000054 memsz 0x00000054 flags ---

Add LC_ALL=C at the beginning of the wrapper script in order to get the
output expected by the script:

	vmlinux:     file format elf32-powerpc

	Program Header:
	    LOAD off    0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
	         filesz 0x0069e1d4 memsz 0x006c128c flags rwx
	    NOTE off    0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
	         filesz 0x00000054 memsz 0x00000054 flags ---

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a9ff3bc98035f63b122c051f02dc47c7aed10430.1635256089.git.christophe.leroy@csgroup.eu
2021-10-27 22:31:22 +11:00
Joel Stanley
f22969a660 powerpc/64s: Default to 64K pages for 64 bit book3s
For 64-bit book3s the default should be 64K as that's what modern CPUs
are designed for.

The following defconfigs already set CONFIG_PPC_64K_PAGES:

 cell_defconfig
 pasemi_defconfig
 powernv_defconfig
 ppc64_defconfig
 pseries_defconfig
 skiroot_defconfig

The have the option removed from the defconfig, as it is now the
default.

The defconfigs that now need to set CONFIG_PPC_4K_PAGES to maintain
their existing behaviour are:

 g5_defconfig
 maple_defconfig
 microwatt_defconfig
 ps3_defconfig

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://github.com/linuxppc/issues/issues/109
Link: https://lore.kernel.org/r/20211015001649.45591-1-joel@jms.id.au
2021-10-27 22:31:22 +11:00
Michael Ellerman
b7472e1764 Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"
This reverts commit 566af8cda3.

This caused some conflicts vs the audit tree, and the audit maintainers
would prefer we postpone this to the next merge window so we have more
time for testing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-10-27 22:30:32 +11:00
Eric W. Biederman
83a1f27ad7 signal/powerpc: On swapcontext failure force SIGSEGV
If the register state may be partial and corrupted instead of calling
do_exit, call force_sigsegv(SIGSEGV).  Which properly kills the
process with SIGSEGV and does not let any more userspace code execute,
instead of just killing one thread of the process and potentially
confusing everything.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
Link: https://lkml.kernel.org/r/20211020174406.17889-7-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Alexey Kardashevskiy
d853adc7ad powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
The iommu_init_table() helper takes an address range to reserve in
the IOMMU table being initialized to exclude MMIO addresses, this is
useful if the window stretches far beyond 4GB (although wastes some TCEs).
At the moment the code searches for such MMIO32 range and fails if none
found which is considered a problem while it really is not: it is actually
better as this says there is no MMIO32 to reserve and we can use
usually wasted TCEs. Furthermore PHYP never actually allows creating
windows starting at busaddress=0 so this MMIO32 range is never useful.

This removes error exit and initializes the table with zero range if
no MMIO32 is detected.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-5-aik@ozlabs.ru
2021-10-25 11:41:15 +11:00
Alexey Kardashevskiy
92fe01b7c6 powerpc/pseries/iommu: Check if the default window in use before removing it
At the moment this check is performed after we remove the default window
which is late and disallows to revert whatever changes enable_ddw()
has made to DMA windows.

This moves the check and error exit before removing the window.

This raised the message severity from "debug" to "warning" as this
should not happen in practice and cannot be triggered by the userspace.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-4-aik@ozlabs.ru
2021-10-25 11:41:14 +11:00
Alexey Kardashevskiy
41ee7232fa powerpc/pseries/iommu: Use correct vfree for it_map
The it_map array is vzalloc'ed so use vfree() for it when creating
a huge DMA window failed for whatever reason.

While at this, write zero to it_map.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-3-aik@ozlabs.ru
2021-10-25 11:41:14 +11:00
Masahiro Yamada
8212f8986d kbuild: use more subdir- for visiting subdirectories while cleaning
Documentation/kbuild/makefiles.rst suggests to use "archclean" for
cleaning arch/$(SRCARCH)/boot/, but it is not a hard requirement.

Since commit d92cc4d516 ("kbuild: require all architectures to have
arch/$(SRCARCH)/Kbuild"), we can use the "subdir- += boot" trick for
all architectures. This can take advantage of the parallel option (-j)
for "make clean".

I also cleaned up the comments in arch/$(SRCARCH)/Makefile. The "archdep"
target no longer exists.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
2021-10-24 13:49:46 +09:00
Nathan Lynch
319fa1a52e powerpc/pseries/mobility: ignore ibm, platform-facilities updates
On VMs with NX encryption, compression, and/or RNG offload, these
capabilities are described by nodes in the ibm,platform-facilities device
tree hierarchy:

  $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
  /sys/firmware/devicetree/base/ibm,platform-facilities/
  ├── ibm,compression-v1
  ├── ibm,random-v1
  └── ibm,sym-encryption-v1

  3 directories

The acceleration functions that these nodes describe are not disrupted by
live migration, not even temporarily.

But the post-migration ibm,update-nodes sequence firmware always sends
"delete" messages for this hierarchy, followed by an "add" directive to
reconstruct it via ibm,configure-connector (log with debugging statements
enabled in mobility.c):

  mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
  mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
  mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
  mobility: removing node /ibm,platform-facilities:4294967286
  ...
  mobility: added node /ibm,platform-facilities:4294967286

Note we receive a single "add" message for the entire hierarchy, and what
we receive from the ibm,configure-connector sequence is the top-level
platform-facilities node along with its three children. The debug message
simply reports the parent node and not the whole subtree.

Also, significantly, the nodes added are almost completely equivalent to
the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
the leaf nodes is the only property I've observed to differ, and Linux does
not use that. So in practice, the sum of update messages Linux receives for
this hierarchy is equivalent to minor property updates.

We succeed in removing the original hierarchy from the device tree. But the
vio bus code is ignorant of this, and does not unbind or relinquish its
references. The leaf nodes, still reachable through sysfs, of course still
refer to the now-freed ibm,platform-facilities parent node, which makes
use-after-free possible:

  refcount_t: addition on 0; use-after-free.
  WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
  refcount_warn_saturate+0x160/0x1f0 (unreliable)
  kobject_get+0xf0/0x100
  of_node_get+0x30/0x50
  of_get_parent+0x50/0xb0
  of_fwnode_get_parent+0x54/0x90
  fwnode_count_parents+0x50/0x150
  fwnode_full_name_string+0x30/0x110
  device_node_string+0x49c/0x790
  vsnprintf+0x1c0/0x4c0
  sprintf+0x44/0x60
  devspec_show+0x34/0x50
  dev_attr_show+0x40/0xa0
  sysfs_kf_seq_show+0xbc/0x200
  kernfs_seq_show+0x44/0x60
  seq_read_iter+0x2a4/0x740
  kernfs_fop_read_iter+0x254/0x2e0
  new_sync_read+0x120/0x190
  vfs_read+0x1d0/0x240

Moreover, the "new" replacement subtree is not correctly added to the
device tree, resulting in ibm,platform-facilities parent node without the
appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:

  $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
  /sys/firmware/devicetree/base/ibm,platform-facilities/

  0 directories

  $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
  ./ibm,sym-encryption-v1/of_node: broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
  ./ibm,random-v1/of_node:         broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
  ./ibm,compression-v1/of_node:    broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1

This is because add_dt_node() -> dlpar_attach_node() attaches only the
parent node returned from configure-connector, ignoring any children. This
should be corrected for the general case, but fixing that won't help with
the stale OF node references, which is the more urgent problem.

One way to address that would be to make the drivers respond to node
removal notifications, so that node references can be dropped
appropriately. But this would likely force the drivers to disrupt active
clients for no useful purpose: equivalent nodes are immediately re-added.
And recall that the acceleration capabilities described by the nodes remain
available throughout the whole process.

The solution I believe to be robust for this situation is to convert
remove+add of a node with an unchanged phandle to an update of the node's
properties in the Linux device tree structure. That would involve changing
and adding a fair amount of code, and may take several iterations to land.

Until that can be realized we have a confirmed use-after-free and the
possibility of memory corruption. So add a limited workaround that
discriminates on the node type, ignoring adds and removes. This should be
amenable to backporting in the meantime.

Fixes: 410bccf978 ("powerpc/pseries: Partition migration in the kernel")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020194703.2613093-1-nathanl@linux.ibm.com
2021-10-22 15:22:06 +11:00
Christophe Leroy
c7d19189d7 powerpc/32: Don't use a struct based type for pte_t
Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.

Commit 670eea9241 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.

That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.

Simple test function:

	pte_t test(pte_t pte)
	{
		return pte;
	}

Before this patch we get

	c00108ec <test>:
	c00108ec:	81 24 00 00 	lwz     r9,0(r4)
	c00108f0:	91 23 00 00 	stw     r9,0(r3)
	c00108f4:	4e 80 00 20 	blr

So, for PPC32, restore the simple type behaviour we got before
commit 670eea9241, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:

	arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
	arch/powerpc/mm/pgtable.c:466:16:    expected unsigned long
	arch/powerpc/mm/pgtable.c:466:16:    got struct pte_t [usertype] x

With this patch we now get

	c0010890 <test>:
	c0010890:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:06 +11:00
Christophe Leroy
a61ec782a7 powerpc/breakpoint: Cleanup
cache_op_size() does exactly the same as l1_dcache_bytes().

Remove it.

MSR_64BIT already exists, no need to enclode the check
around #ifdef __powerpc64__

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6184b08088312a7d787d450eb902584e4ae77f7a.1632317816.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:06 +11:00
Christophe Leroy
fdacae8a84 powerpc: Activate CONFIG_STRICT_KERNEL_RWX by default
CONFIG_STRICT_KERNEL_RWX should be set by default on every
architectures (See https://github.com/KSPP/linux/issues/4)

On PPC32 we have to find a compromise between performance and/or
memory wasting and selection of strict_kernel_rwx, because it implies
either smaller memory chunks or larger alignment between RO memory
and RW memory.

For instance the 8xx maps memory with 8M pages. So either the limit
between RO and RW must be 8M aligned or it falls back or 512k pages
which implies more pressure on the TLB.

book3s/32 maps memory with BATs as much as possible. BATS can have
any power-of-two size between 128k and 256M but we have only 4 to 8
BATs so the alignment must be good enough to allow efficient use of
the BATs and avoid falling back on standard page mapping which would
kill performance.

So let's go one step forward and make it the default but still allow
users to unset it when wanted.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/057c40164084bfc7d77c0b2ff78d95dbf6a2a21b.1632503622.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
63f501e07a powerpc/8xx: Simplify TLB handling
In the old days, TLB handling for 8xx was using tlbie and tlbia
instructions directly as much as possible.

But commit f048aace29 ("powerpc/mm: Add SMP support to no-hash
TLB handling") broke that by introducing out-of-line unnecessary
complex functions for booke/smp which don't have tlbie/tlbia
instructions and require more complex handling.

Restore direct use of tlbie and tlbia for 8xx which is never SMP.

With this patch we now get

	c00ecc68 <ptep_clear_flush>:
	c00ecc68:	39 00 00 00 	li      r8,0
	c00ecc6c:	81 46 00 00 	lwz     r10,0(r6)
	c00ecc70:	91 06 00 00 	stw     r8,0(r6)
	c00ecc74:	7c 00 2a 64 	tlbie   r5,r0
	c00ecc78:	7c 00 04 ac 	hwsync
	c00ecc7c:	91 43 00 00 	stw     r10,0(r3)
	c00ecc80:	4e 80 00 20 	blr

Before it was

	c0012880 <local_flush_tlb_page>:
	c0012880:	2c 03 00 00 	cmpwi   r3,0
	c0012884:	41 82 00 54 	beq     c00128d8 <local_flush_tlb_page+0x58>
	c0012888:	81 22 00 00 	lwz     r9,0(r2)
	c001288c:	81 43 00 20 	lwz     r10,32(r3)
	c0012890:	39 29 00 01 	addi    r9,r9,1
	c0012894:	91 22 00 00 	stw     r9,0(r2)
	c0012898:	2c 0a 00 00 	cmpwi   r10,0
	c001289c:	41 82 00 10 	beq     c00128ac <local_flush_tlb_page+0x2c>
	c00128a0:	81 2a 01 dc 	lwz     r9,476(r10)
	c00128a4:	2c 09 ff ff 	cmpwi   r9,-1
	c00128a8:	41 82 00 0c 	beq     c00128b4 <local_flush_tlb_page+0x34>
	c00128ac:	7c 00 22 64 	tlbie   r4,r0
	c00128b0:	7c 00 04 ac 	hwsync
	c00128b4:	81 22 00 00 	lwz     r9,0(r2)
	c00128b8:	39 29 ff ff 	addi    r9,r9,-1
	c00128bc:	2c 09 00 00 	cmpwi   r9,0
	c00128c0:	91 22 00 00 	stw     r9,0(r2)
	c00128c4:	4c a2 00 20 	bclr+   4,eq
	c00128c8:	81 22 00 70 	lwz     r9,112(r2)
	c00128cc:	71 29 00 04 	andi.   r9,r9,4
	c00128d0:	4d 82 00 20 	beqlr
	c00128d4:	48 65 76 74 	b       c0669f48 <preempt_schedule>
	c00128d8:	81 22 00 00 	lwz     r9,0(r2)
	c00128dc:	39 29 00 01 	addi    r9,r9,1
	c00128e0:	91 22 00 00 	stw     r9,0(r2)
	c00128e4:	4b ff ff c8 	b       c00128ac <local_flush_tlb_page+0x2c>
...
	c00ecdc8 <ptep_clear_flush>:
	c00ecdc8:	94 21 ff f0 	stwu    r1,-16(r1)
	c00ecdcc:	39 20 00 00 	li      r9,0
	c00ecdd0:	93 c1 00 08 	stw     r30,8(r1)
	c00ecdd4:	83 c6 00 00 	lwz     r30,0(r6)
	c00ecdd8:	91 26 00 00 	stw     r9,0(r6)
	c00ecddc:	93 e1 00 0c 	stw     r31,12(r1)
	c00ecde0:	7c 08 02 a6 	mflr    r0
	c00ecde4:	7c 7f 1b 78 	mr      r31,r3
	c00ecde8:	7c 83 23 78 	mr      r3,r4
	c00ecdec:	7c a4 2b 78 	mr      r4,r5
	c00ecdf0:	90 01 00 14 	stw     r0,20(r1)
	c00ecdf4:	4b f2 5a 8d 	bl      c0012880 <local_flush_tlb_page>
	c00ecdf8:	93 df 00 00 	stw     r30,0(r31)
	c00ecdfc:	7f e3 fb 78 	mr      r3,r31
	c00ece00:	80 01 00 14 	lwz     r0,20(r1)
	c00ece04:	83 c1 00 08 	lwz     r30,8(r1)
	c00ece08:	83 e1 00 0c 	lwz     r31,12(r1)
	c00ece0c:	7c 08 03 a6 	mtlr    r0
	c00ece10:	38 21 00 10 	addi    r1,r1,16
	c00ece14:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
e28d0b6750 powerpc/lib/sstep: Don't use __{get/put}_user() on kernel addresses
In the old days, when we didn't have kernel userspace access
protection and had set_fs(), it was wise to use __get_user()
and friends to read kernel memory.

Nowadays, get_user() and put_user() are granting userspace access and
are exclusively for userspace access.

Convert single step emulation functions to user_access_begin() and
friends and use unsafe_get_user() and unsafe_put_user().

When addressing kernel addresses, there is no need to open userspace
access. And for book3s/32 it is particularly important to no try and
open userspace access on kernel address, because that would break the
content of kernel space segment registers. No guard has been put
against that risk in order to avoid degrading performance.

copy_from_kernel_nofault() and copy_to_kernel_nofault() should
be used but they are out-of-line functions which would degrade
performance. Those two functions are making use of
__get_kernel_nofault() and __put_kernel_nofault() macros.
Those two macros are just wrappers behind __get_user_size_goto() and
__put_user_size_goto().

unsafe_get_user() and unsafe_put_user() are also wrappers of
__get_user_size_goto() and __put_user_size_goto(). Use them to
access kernel space. That allows refactoring userspace and
kernelspace access.

Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: 4fe5cda9f8 ("powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/22831c9d17f948680a12c5292e7627288b15f713.1631817805.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
cbe654c779 powerpc: warn on emulation of dcbz instruction in kernel mode
dcbz instruction shouldn't be used on non-cached memory. Using
it on non-cached memory can result in alignment exception and
implies a heavy handling.

Instead of silentely emulating the instruction and resulting in high
performance degradation, warn whenever an alignment exception is
taken in kernel mode due to dcbz, so that the user is made aware that
dcbz instruction has been used unexpectedly by the kernel.

Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2e3acfe63d289c6fba366e16973c9ab8369e8b75.1631803922.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
5c810ced36 powerpc/32: Add support for out-of-line static calls
Add support for out-of-line static calls on PPC32. This change
improve performance of calls to global function pointers by
using direct calls instead of indirect calls.

The trampoline is initialy populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.

In order to cater with parallele execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all time.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that the target address is stored
just after the 'bctr' and loaded from there with a single instruction.

Then, depending on the target distance, arch_static_call_transform()
will either replace the first instruction by a direct 'bl <target>' or
'nop' in order to have the trampoline fall through the long jump
sequence.

For the special case of __static_call_return0(), to avoid the risk of
a far branch, a version of it is inlined at the end of the trampoline.

Performancewise the long jump sequence is probably not better than
the indirect calls set by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: With most
configurations the kernel size is far below 32 Mbytes so only
modules may happen to be too far. And even modules are likely to
be close enough as they are allocated below the kernel core and
as close as possible of the kernel text.

static_call selftest is running successfully with this change.

With this patch, __do_irq() has the following sequence to trace
irq entries:

	c0004a00 <__SCT__tp_func_irq_entry>:
	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
	c0004a04:	3d 80 c0 00 	lis     r12,-16384
	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
	c0004a0c:	7d 89 03 a6 	mtctr   r12
	c0004a10:	4e 80 04 20 	bctr
	c0004a14:	38 60 00 00 	li      r3,0
	c0004a18:	4e 80 00 20 	blr
	c0004a1c:	00 00 00 00 	.long 0x0
...
	c0005654 <__do_irq>:
...
	c0005664:	7c 7f 1b 78 	mr      r31,r3
...
	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
	c00056a4:	39 29 00 01 	addi    r9,r9,1
	c00056a8:	91 22 00 00 	stw     r9,0(r2)
	c00056ac:	3d 20 c0 af 	lis     r9,-16209
	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
	c00056b4:	2c 09 00 00 	cmpwi   r9,0
	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
	c00056c0:	7f e4 fb 78 	mr      r4,r31
	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>

Before this patch, __do_irq() was doing the following to trace irq
entries:

	c0005700 <__do_irq>:
...
	c0005710:	7c 7e 1b 78 	mr      r30,r3
...
	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
	c0005750:	81 22 00 00 	lwz     r9,0(r2)
	c0005754:	39 29 00 01 	addi    r9,r9,1
	c0005758:	91 22 00 00 	stw     r9,0(r2)
	c000575c:	3d 20 c0 af 	lis     r9,-16209
	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
	c0005764:	2c 1f 00 00 	cmpwi   r31,0
	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
	c0005774:	7d 29 03 a6 	mtctr   r9
	c0005778:	7f c4 f3 78 	mr      r4,r30
	c000577c:	4e 80 04 21 	bctrl
	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
	c0005784:	2c 09 00 00 	cmpwi   r9,0
	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>

Behind the fact of now using a direct 'bl' instead of a
'load/mtctr/bctr' sequence, we can also see that we get one less
register on the stack.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
8f7fadb4ba powerpc/machdep: Remove stale functions from ppc_md structure
ppc_md.iommu_save() is not set anymore by any platform after
commit c40785ad30 ("powerpc/dart: Use a cachable DART").
So iommu_save() has become a nop and can be removed.

ppc_md.show_percpuinfo() is not set anymore by any platform after
commit 4350147a81 ("[PATCH] ppc64: SMU based macs cpufreq support").

Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
removed by commit 0f03a43b8f ("[POWERPC] Remove todc code from
ARCH=powerpc")

Last user of kgdb_map_scc() was removed by commit 17ce452f7e ("kgdb,
powerpc: arch specific powerpc kgdb support").

ppc.machine_kexec_prepare() has not been used since
commit 8ee3e0d696 ("powerpc: Remove the main legacy iSerie platform
code"). This allows the removal of machine_kexec_prepare() and the
rename of default_machine_kexec_prepare() into machine_kexec_prepare()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
e606a2f46c powerpc/time: Remove generic_suspend_{dis/en}able_irqs()
Commit d75d68cfef ("powerpc: Clean up obsolete code relating to
decrementer and timebase") made generic_suspend_enable_irqs() and
generic_suspend_disable_irqs() static.

Fold them into their only caller.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy
566af8cda3 powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
Commit e65e1fc2d2 ("[PATCH] syscall class hookup for all normal
targets") added generic support for AUDIT but that didn't include
support for bi-arch like powerpc.

Commit 4b58841149 ("audit: Add generic compat syscall support")
added generic support for bi-arch.

Convert powerpc to that bi-arch generic audit support.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:04 +11:00
Christophe Leroy
a85c728cb5 powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs
Instructions lmw/stmw are interesting for functions that are rarely
used and not in the cache, because only one instruction is to be
copied into the instruction cache instead of 19. However those
instruction are less performant than 19x raw lwz/stw as they require
synchronisation plus one additional cycle.

SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are
mostly in interrupts entries/exits and in task switch so they are
likely already in the cache.

Using standard lwz improves null_syscall selftest by:
- 10 cycles on mpc832x.
- 2 cycles on mpc8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:04 +11:00
Anatolij Gustschin
aed2886a5e powerpc/5200: dts: fix memory node unit name
Fixes build warnings:
Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013220532.24759-4-agust@denx.de
2021-10-22 15:22:04 +11:00
Anatolij Gustschin
7855b6c66d powerpc/5200: dts: fix pci ranges warnings
Fix ranges property warnings:
pci@f0000d00:ranges: 'oneOf' conditional failed, one must be fixed:

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013220532.24759-3-agust@denx.de
2021-10-22 15:22:04 +11:00
Anatolij Gustschin
e9efabc6e4 powerpc/5200: dts: add missing pci ranges
Add ranges property to fix build warnings:
Warning (pci_bridge): /pci@f0000d00: missing ranges for PCI bridge (or not a bridge)

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013220532.24759-2-agust@denx.de
2021-10-22 15:22:04 +11:00
Gustavo A. R. Silva
61cb9ac66b powerpc/vas: Fix potential NULL pointer dereference
(!ptr && !ptr->foo) strikes again. :)

The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
it leads to a NULL pointer dereference: ptr->foo.

Fix this by converting && to ||

This issue was detected with the help of Coccinelle, and audited and
fixed manually.

Fixes: 1a0d0d5ed5 ("powerpc/vas: Add platform specific user window operations")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015050345.GA1161918@embeddedor
2021-10-22 15:22:04 +11:00
Christophe Leroy
49e3d8ea62 powerpc/fsl_booke: Enable STRICT_KERNEL_RWX
Enable STRICT_KERNEL_RWX on fsl_booke.

For that, we need additional TLBCAMs dedicated to linear mapping,
based on the alignment of _sinittext.

By default, up to 768 Mbytes of memory are mapped.
It uses 3 TLBCAMs of size 256 Mbytes.

With a data alignment of 16, we need up to 9 TLBCAMs:
  16/16/16/16/64/64/64/256/256

With a data alignment of 4, we need up to 12 TLBCAMs:
  4/4/4/4/16/16/16/64/64/64/256/256

With a data alignment of 1, we need up to 15 TLBCAMs:
  1/1/1/1/4/4/4/16/16/16/64/64/64/256/256

By default, set a 16 Mbytes alignment as a compromise between memory
usage and number of TLBCAMs. This can be adjusted manually when needed.

For the time being, it doens't work when the base is randomised.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/29f9e5d2bbbc83ae9ca879265426a6278bf4d5bb.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
d5970045cf powerpc/fsl_booke: Update of TLBCAMs after init
After init, set readonly memory as ROX and set readwrite
memory as RWX, if STRICT_KERNEL_RWX is enabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/66bef0b9c273e1121706883f3cf5ad0a053d863f.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
0b2859a743 powerpc/fsl_booke: Allocate separate TLBCAMs for readonly memory
Reorganise TLBCAM allocation so that when STRICT_KERNEL_RWX is
enabled, TLBCAMs are allocated such that readonly memory uses
different TLBCAMs.

This results in an allocation looking like:

Memory CAM mapping: 4/4/4/1/1/1/1/16/16/16/64/64/64/256/256 Mb, residual: 256Mb

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ca169bc288261a0e0558712f979023c3a960ebb.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
52bda69ae8 powerpc/fsl_booke: Tell map_mem_in_cams() if init is done
In order to be able to call map_mem_in_cams() once more
after init for STRICT_KERNEL_RWX, add an argument.

For now, map_mem_in_cams() is always called only during init.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3b69a7e0b393b16984ade882a5eae5d727717459.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
a97dd9e2f7 powerpc/fsl_booke: Enable reloading of TLBCAM without switching to AS1
Avoid switching to AS1 when reloading TLBCAM after init for
STRICT_KERNEL_RWX.

When we setup AS1 we expect the entire accessible memory to be mapped
through one entry, this is not the case anymore at the end of init.

We are not changing the size of TLBCAMs, only flags, so no need to
switch to AS1.

So change loadcam_multi() to not switch to AS1 when the given
temporary tlb entry in 0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a9d517fbfbc940f56103c46b323f6eb8f4485571.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
01116e6e98 powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs
Don't force MAS3_SX and MAS3_UX at all time. Take into account the
exec flag.

While at it, fix a couple of closeby style problems (indent with space
and unnecessary parenthesis), it keeps more readability.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5467044e59f27f9fcf709b9661779e3ce5f784f6.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:03 +11:00
Christophe Leroy
3a75fd709c powerpc/fsl_booke: Rename fsl_booke.c to fsl_book3e.c
We have a myriad of CONFIG symbols around different variants
of BOOKEs, which would be worth tidying up one day.

But at least, make file names and CONFIG option match:

We have CONFIG_FSL_BOOKE and CONFIG_PPC_FSL_BOOK3E.

fsl_booke.c is selected by and only by CONFIG_PPC_FSL_BOOK3E.
So rename it fsl_book3e to reduce confusion.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5dc871db1f67739319bec11f049ca450da1c13a2.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:02 +11:00
Christophe Leroy
68b44f94d6 powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE
fsl_booke and 44x are not able to map kernel linear memory with
pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and
STRICT_KERNEL_RWX is also a problem for now.

Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x.

Fixes: 88df6e90fa ("[POWERPC] DEBUG_PAGEALLOC for 32-bit")
Fixes: 95902e6c88 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Fixes: 90cbac0e99 ("powerpc: Enable KFENCE for PPC32")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1ad9fdd9b27da3fdfa16510bb542ed51fa6e134.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:02 +11:00
Wan Jiabing
7453f501d4 powerpc/kexec_file: Add of_node_put() before goto
Fix following coccicheck warning:
./arch/powerpc/kexec/file_load_64.c:698:1-22: WARNING: Function
for_each_node_by_type should have of_node_put() before goto

Early exits from for_each_node_by_type should decrement the
node reference counter.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211018015418.10182-1-wanjiabing@vivo.com
2021-10-22 15:22:02 +11:00
Wan Jiabing
915b368f69 powerpc/pseries/iommu: Add of_node_put() before break
Fix following coccicheck warning:

./arch/powerpc/platforms/pseries/iommu.c:924:1-28: WARNING: Function
for_each_node_with_property should have of_node_put() before break

Early exits from for_each_node_with_property should decrement the
node reference counter.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211014075624.16344-1-wanjiabing@vivo.com
2021-10-22 15:22:02 +11:00
Joel Stanley
4f703e7faa powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOC
The page_alloc.c code will call into __kernel_map_pages() when
DEBUG_PAGEALLOC is configured and enabled.

As the implementation assumes hash, this should crash spectacularly if
not for a bit of luck in __kernel_map_pages(). In this function
linear_map_hash_count is always zero, the for loop exits without doing
any damage.

There are no other platforms that determine if they support
debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to
do that, this change turns the map/unmap into a noop when in radix
mode and prints a warning once.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Reformat if per Christophe's suggestion]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013213438.675095-1-joel@jms.id.au
2021-10-22 15:22:02 +11:00
Linus Torvalds
0a3221b658 powerpc fixes for 5.15 #5
Fix a bug exposed by a previous fix, where running guests with certain SMT topologies
 could crash the host on Power8.
 
 Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled.
 
 Thanks to: Nathan Lynch, Srikar Dronamraju, Valentin Schneider.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFxTaUTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFj/D/9J/rz1zYKxvPd+o5WsJBWrgsVBxgGr
 cIhtKARlh0Yoa9fhIYF6ir4HUT2TOhokwJJco7fcND73rfrHfbWshWEzwaR1Lz/2
 WXr0Np509GQw1KCC9nv5DoE5qqLLL+rS+ul75Khkoi99wa/TDD8kQ5o3zUWtDoad
 D6paUIOf+DjnzzD7hIuTljfuOL/P3S4OyarQEopD69n8IgCTCrSsKdzOzAvlKcuL
 c3OoGAY5gasHowfxLfOigkCYloT4gJES7Kxl92QnPByOJUfqhjfKxiN4l0ydR6Bl
 4dUFUp/PkvvmTZiIA3tFzu0fHOgjUCvICyEAewLX7Bh9fXAVYMRc1a2l4LrCWD8T
 hQqQFskxa/viG7Uq2N58Rts9GtF5nG5d6+4l5Ndxus2+tUledjiuzOv3nsSq64TF
 y+Gws66a+A8a7DjcHo281BWzWn3XCIuug4A0doyvJFNdiy/+zj3ojfUhppZam8ca
 DYo7ot71l0XGvFM0Yhr/NQ9MEdpgT19VC0xU2pafnkp6yGUqjH1b9xelvg+uMZEc
 vXb7h+XV0eY4Rdr+3wTnXIwIWnEslJ23YQxFvhZeWeF321wsR5kbBGvNf/aY6xQB
 rLkNX63RdpyYjHGcn3qwcYTkk8wAjCR719tFal8hRMLw+CCjzqcinb7P5FLi4hQv
 cUV/vDffiwyReg==
 =572c
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug exposed by a previous fix, where running guests with
   certain SMT topologies could crash the host on Power8.

 - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is
   enabled.

Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider.

* tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/smp: do not decrement idle task preempt count in CPU offline
  powerpc/idle: Don't corrupt back chain when going idle
2021-10-21 15:30:09 -10:00
Len Baker
5dfbbb668a KVM: PPC: Replace zero-length array with flexible array member
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members" [1] for these cases. The
older style of one-element or zero-length arrays should no longer be
used[2].

Also, make use of the struct_size() helper in kzalloc().

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-20 18:30:42 -05:00
Rob Herring
41408b22ec powerpc: Use of_get_cpu_hwid()
Replace open coded parsing of CPU nodes' 'reg' property with
of_get_cpu_hwid().

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211006164332.1981454-8-robh@kernel.org
2021-10-20 13:37:24 -05:00
Nathan Lynch
787252a10d powerpc/smp: do not decrement idle task preempt count in CPU offline
With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8279 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef697 ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a376ca ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef697 ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
2021-10-20 21:38:01 +11:00
Michael Ellerman
496c5fe25c powerpc/idle: Don't corrupt back chain when going idle
In isa206_idle_insn_mayloss() we store various registers into the stack
red zone, which is allowed.

However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
to 0(r1), which corrupts the stack back chain.

We used to do the same in isa206_idle_insn_mayloss() itself, but we
fixed that in 73287caa92 ("powerpc64/idle: Fix SP offsets when saving
GPRs"), however we missed that the macro also corrupts the back chain.

Corrupting the back chain is bad for debuggability but doesn't
necessarily cause a bug.

However we recently changed the stack handling in some KVM code, and it
now relies on the stack back chain being valid when it returns. The
corruption causes that code to return with r1 pointing somewhere in
kernel data, at some point LR is restored from the stack and we branch
to NULL or somewhere else invalid.

Only affects Power8 hosts running KVM guests, with dynamic_mt_modes
enabled (which it is by default).

The fixes tag below points to the commit that changed the KVM stack
handling, exposing this bug. The actual corruption of the back chain has
always existed since 948cf67c47 ("powerpc: Add NAP mode support on
Power7 in HV mode").

Fixes: 9b4416c509 ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au
2021-10-20 21:37:58 +11:00
Kajol Jain
26da4abfb3 powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses
Fix the data source encodings to represent L2.1/L3.1(another core's
L2/L3 on the same node) accesses properly for power10 and older
plaforms.

Add new macros(LEVEL/REM) which can be used to add mem_lvl_num and remote
field data inside perf_mem_data_src structure.

Result in power9 system with patch changes:

localhost:~/linux/tools/perf # ./perf mem report | grep Remote
     0.01%             1  252           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff7f25eb90
anon               HitM          N/A                     No       N/A        0              0
     0.01%             1  220           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff77776d90
anon               HitM          N/A                     No       N/A        0              0
     0.01%             1  220           Remote core, same node L3 or L3 hit  [.] 0x0000000000002dd0                producer_consumer   [.] 0x00007fff817d9410
anon               HitM          N/A                     No       N/A        0              0

Fixes: 79e96f8f93 ("powerpc/perf: Export memory hierarchy info to user space")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211006140654.298352-5-kjain@linux.ibm.com
2021-10-19 17:27:01 +02:00
Andreas Gruenbacher
bb523b406c gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
Turn fault_in_pages_{readable,writeable} into versions that return the
number of bytes not faulted in, similar to copy_to_user, instead of
returning a non-zero value when any of the requested pages couldn't be
faulted in.  This supports the existing users that require all pages to
be faulted in as well as new users that are happy if any pages can be
faulted in.

Rename the functions to fault_in_{readable,writeable} to make sure
this change doesn't silently break things.

Neither of these functions is entirely trivial and it doesn't seem
useful to inline them, so move them to mm/gup.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18 16:33:03 +02:00
Andreas Gruenbacher
0c8eb2884a powerpc/kvm: Fix kvm_use_magic_page
When switching from __get_user to fault_in_pages_readable, commit
9f9eae5ce7 broke kvm_use_magic_page: like __get_user,
fault_in_pages_readable returns 0 on success.

Fixes: 9f9eae5ce7 ("powerpc/kvm: Prefer fault_in_pages_readable function")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18 16:31:45 +02:00
Uwe Kleine-König
4141127c44 powerpc/eeh: Use to_pci_driver() instead of pci_dev->driver
Struct pci_driver contains a struct device_driver, so for PCI devices, it's
easy to convert a device_driver * to a pci_driver * with to_pci_driver().
The device_driver * is in struct device, so we don't need to also keep
track of the pci_driver * in struct pci_dev.

Replace pdev->driver with to_pci_driver().  This is a step toward removing
pci_dev->driver.

[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-18 09:20:15 -05:00
Christoph Hellwig
e41d12f539 mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>
There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:01 -06:00
Linus Torvalds
be9eb2f00f powerpc fixes for 5.15 #4
Fix a bug where guests on P9 with interrupts passed through could get stuck in
 synchronize_irq().
 
 Fix a bug in KVM on P8 where secondary threads entering a guest would write outside their
 allocated stack.
 
 Fix a bug in KVM on P8 where secondary threads could confuse the host offline code and
 cause the guest or host to crash.
 
 Thanks to: Cédric Le Goater
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFsFZkTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBQND/0VSdTHegF+7kBqWnyP+Rc8aXwr/kPL
 xijmmeWS+BY0nkW7BNabRwVtOHZ4lKyDxPMwFiQ+tbpsmm4LX9sA9igV2yXdK73Y
 pRFkJBxljxXuO3eYa4peQwsvHnI7cR//fUr94b0OT6r0N39T5tKq5MZvWawk05y4
 RivpMOnEtDEd12ySJk5/fshwNKV5L7InI2FRKi3EU8sPNV3XA/z5JQ1CKuvXUUiK
 1HIwwxYqECcMM0dkIo7Qa7F93Wr0V8ZjsXLFV96jpBNL2aUfXVMaIBmITwa1Pk+i
 1581QmI4AJx3b3YAMqoujrgFJQKxaw2ZjiFfDcHGcTD4qCNVH2MNeyHM86Z7lFKL
 osYI7Ya3zzlwd4qhPsLV21mBdBp1xjpiBj1FiRrq4o+Me+O6eQnmG2abL8xUFI/3
 F7ab9qV8xjqfw/XlAXKtb7cco4SSQfi6XuLYSdTduCU5A1unLgpBw9F6H14YrQ5A
 8/uriGLIzhgklwDYgBJUGOjJzMCv0ey1+MJF4R4tEeIJ7+83WuAHCf2eAnxSVXLT
 Q2JaFt1LP8fdW1IPYYICjVgahREsWvTW4wK2lk5sO+BSoop6EbOVfz5TK52ePxLC
 II4ETYG3tIipLCkf2tqF/rU4De7d2SRg70lCNdEBVU9GpmiXec4uwPyX3uAwKdlf
 9qphXh0d15LAPg==
 =fPMr
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug where guests on P9 with interrupts passed through could get
   stuck in synchronize_irq().

 - Fix a bug in KVM on P8 where secondary threads entering a guest would
   write outside their allocated stack.

 - Fix a bug in KVM on P8 where secondary threads could confuse the host
   offline code and cause the guest or host to crash.

Thanks to Cédric Le Goater.

* tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
  KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
  powerpc/xive: Discard disabled interrupts in get_irqchip_state()
2021-10-17 18:01:32 -10:00
Michael Ellerman
cdeb5d7d89 KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
We call idle_kvm_start_guest() from power7_offline() if the thread has
been requested to enter KVM. We pass it the SRR1 value that was returned
from power7_idle_insn() which tells us what sort of wakeup we're
processing.

Depending on the SRR1 value we pass in, the KVM code might enter the
guest, or it might return to us to do some host action if the wakeup
requires it.

If idle_kvm_start_guest() is able to handle the wakeup, and enter the
guest it is supposed to indicate that by returning a zero SRR1 value to
us.

That was the behaviour prior to commit 10d91611f4 ("powerpc/64s:
Reimplement book3s idle code in C"), however in that commit the
handling of SRR1 was reworked, and the zeroing behaviour was lost.

Returning from idle_kvm_start_guest() without zeroing the SRR1 value can
confuse the host offline code, causing the guest to crash and other
weirdness.

Fixes: 10d91611f4 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
2021-10-16 00:40:03 +11:00
Michael Ellerman
9b4416c509 KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
In commit 10d91611f4 ("powerpc/64s: Reimplement book3s idle code in
C") kvm_start_guest() became idle_kvm_start_guest(). The old code
allocated a stack frame on the emergency stack, but didn't use the
frame to store anything, and also didn't store anything in its caller's
frame.

idle_kvm_start_guest() on the other hand is written more like a normal C
function, it creates a frame on entry, and also stores CR/LR into its
callers frame (per the ABI). The problem is that there is no caller
frame on the emergency stack.

The emergency stack for a given CPU is allocated with:

  paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;

So emergency_sp actually points to the first address above the emergency
stack allocation for a given CPU, we must not store above it without
first decrementing it to create a frame. This is different to the
regular kernel stack, paca->kstack, which is initialised to point at an
initial frame that is ready to use.

idle_kvm_start_guest() stores the backchain, CR and LR all of which
write outside the allocation for the emergency stack. It then creates a
stack frame and saves the non-volatile registers. Unfortunately the
frame it creates is not large enough to fit the non-volatiles, and so
the saving of the non-volatile registers also writes outside the
emergency stack allocation.

The end result is that we corrupt whatever is at 0-24 bytes, and 112-248
bytes above the emergency stack allocation.

In practice this has gone unnoticed because the memory immediately above
the emergency stack happens to be used for other stack allocations,
either another CPUs mc_emergency_sp or an IRQ stack. See the order of
calls to irqstack_early_init() and emergency_stack_init().

The low addresses of another stack are the top of that stack, and so are
only used if that stack is under extreme pressue, which essentially
never happens in practice - and if it did there's a high likelyhood we'd
crash due to that stack overflowing.

Still, we shouldn't be corrupting someone else's stack, and it is purely
luck that we aren't corrupting something else.

To fix it we save CR/LR into the caller's frame using the existing r1 on
entry, we then create a SWITCH_FRAME_SIZE frame (which has space for
pt_regs) on the emergency stack with the backchain pointing to the
existing stack, and then finally we switch to the new frame on the
emergency stack.

Fixes: 10d91611f4 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au
2021-10-16 00:39:54 +11:00
Kees Cook
42a20f86dc sched: Add wrapper for get_wchan() to keep task blocked
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
2021-10-15 11:25:14 +02:00
Christophe Leroy
3091f5fc5f powerpc: Mark .opd section read-only
.opd section contains function descriptors used to locate
functions in the kernel. If someone is able to modify a
function descriptor he will be able to run arbitrary
kernel function instead of another.

To avoid that, move .opd section inside read-only memory.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3cd40b682fb6f75bb40947b55ca0bac20cb3f995.1634136222.git.christophe.leroy@csgroup.eu
2021-10-14 21:47:34 +11:00
Athira Rajeev
8f6aca0e0f powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10
On power9 and earlier platforms, the default event used for cyles and
instructions is PM_CYC (0x0001e) and PM_INST_CMPL (0x00002)
respectively. These events use two programmable PMCs and by default will
count irrespective of the run latch state (idle state). But since they
use programmable PMCs, these events can lead to multiplexing with other
events, because there are only 4 programmable PMCs. Hence in power10,
performance monitoring unit (PMU) driver uses performance monitor
counter 5 (PMC5) and performance monitor counter6 (PMC6) for counting
instructions and cycles.

Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4)
and instructions uses PM_RUN_INST_CMPL (0x500fa). But counting of these
events in idle state is controlled by the CC56RUN bit setting in Monitor
Mode Control Register0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will
not count when CTRL[RUN] (run latch) is zero. This could lead to missing
some counts if a thread is in idle state during system wide profiling.

To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5
and PMC6 count instructions and cycles regardless of the run latch
state. Since this change make PMC5/6 count as PM_INST_CMPL/PM_CYC,
rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event
code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes
are only for PMC5/6 event codes and will not affect the behaviour of
PM_RUN_CYC/PM_RUN_INST_CMPL if progammed in other PMC's.

Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.cm>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Tweak change log wording for style and consistency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007075121.28497-1-atrajeev@linux.vnet.ibm.com
2021-10-14 21:46:45 +11:00
Kai Song
b616230e23 powerpc/eeh: Fix docstrings in eeh.c
We fix the following warnings when building kernel with W=1:
arch/powerpc/kernel/eeh.c:598: warning: Function parameter or member 'function' not described in 'eeh_pci_enable'
arch/powerpc/kernel/eeh.c:774: warning: Function parameter or member 'edev' not described in 'eeh_set_dev_freset'
arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead
arch/powerpc/kernel/eeh.c:814: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset_full'
arch/powerpc/kernel/eeh.c:944: warning: Function parameter or member 'ops' not described in 'eeh_init'
arch/powerpc/kernel/eeh.c:1451: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset'
arch/powerpc/kernel/eeh.c:1526: warning: Function parameter or member 'func' not described in 'eeh_pe_inject_err'
arch/powerpc/kernel/eeh.c:1526: warning: Excess function parameter 'function' described in 'eeh_pe_inject_err'

Signed-off-by: Kai Song <songkai01@inspur.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211009041630.4135-1-songkai01@inspur.com
2021-10-13 16:45:33 +11:00
Cédric Le Goater
6ffeb56ee2 powerpc/boot: Use CONFIG_PPC_POWERNV to compile OPAL support
CONFIG_PPC64_BOOT_WRAPPER is selected by CPU_LITTLE_ENDIAN which is
used to compile support for other platforms such as Microwatt. There
is no need for OPAL calls on these.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211011070356.99952-1-clg@kaod.org
2021-10-13 16:45:12 +11:00
Cédric Le Goater
6f779e1d35 powerpc/xive: Discard disabled interrupts in get_irqchip_state()
When an interrupt is passed through, the KVM XIVE device calls the
set_vcpu_affinity() handler which raises the P bit to mask the
interrupt and to catch any in-flight interrupts while routing the
interrupt to the guest.

On the guest side, drivers (like some Intels) can request at probe
time some MSIs and call synchronize_irq() to check that there are no
in flight interrupts. This will call the XIVE get_irqchip_state()
handler which will always return true as the interrupt P bit has been
set on the host side and lock the CPU in an infinite loop.

Fix that by discarding disabled interrupts in get_irqchip_state().

Fixes: da15c03b04 ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race")
Cc: stable@vger.kernel.org #v5.4+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: seeteena <s1seetee@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211011070203.99726-1-clg@kaod.org
2021-10-13 16:38:55 +11:00
Christophe Leroy
602946ec2f powerpc: Set max_mapnr correctly
max_mapnr is used by virt_addr_valid() to check if a linear
address is valid.

It must only include lowmem PFNs, like other architectures.

Problem detected on a system with 1G mem (Only 768M are mapped), with
CONFIG_DEBUG_VIRTUAL and CONFIG_TEST_DEBUG_VIRTUAL, it didn't report
virt_to_phys(VMALLOC_START), VMALLOC_START being 0xf1000000.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/77d99037782ac4b3c3b0124fc4ae80ce7b760b05.1634035228.git.christophe.leroy@csgroup.eu
2021-10-13 13:28:22 +11:00
Nicholas Piggin
322fda0405 KVM: PPC: Book3S HV: H_ENTER filter out reserved HPTE[B] value
The HPTE B field is a 2-bit field with values 0b10 and 0b11 reserved.
This field is also taken from the HPTE and used when KVM executes
TLBIEs to set the B field of those instructions.

Disallow the guest setting B to a reserved value with H_ENTER by
rejecting it. This is the same approach already taken for rejecting
reserved (unsupported) LLP values. This prevents the guest from being
able to induce the host to execute TLBIE with reserved values, which
is not known to be a problem with current processors but in theory it
could prevent the TLBIE from working correctly in a future processor.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145749.1331331-1-npiggin@gmail.com
2021-10-13 13:08:16 +11:00
Uwe Kleine-König
5a72431ec3 powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->name
Replace pdev->driver->name by dev_driver_string() for the corresponding
struct device.  This is a step toward removing pci_dev->driver.

Move the function nearer its only user and instead of the ?: operator use a
normal "if" which is more readable.

Link: https://lore.kernel.org/r/20211004125935.2300113-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-12 17:50:12 -05:00
Athira Rajeev
29908bbf7b powerpc/perf: Expose instruction and data address registers as part of extended regs
Patch adds support to include Sampled Instruction Address Register
(SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
registers. Update the definition of PERF_REG_PMU_MASK_300/31 and
PERF_REG_EXTENDED_MAX to include these SPR's.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007065505.27809-4-atrajeev@linux.vnet.ibm.com
2021-10-12 18:39:02 +11:00
Athira Rajeev
02b182e674 powerpc/perf: Refactor the code definition of perf reg extended mask
PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask
value for extended registers. Current definition of these mask values
uses hex constant and does not use registers by name, making it less
readable. Patch refactor the macro values by or'ing together the actual
register value constants. Also include PERF_REG_EXTENDED_MAX as
part of enum definition.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007065505.27809-2-atrajeev@linux.vnet.ibm.com
2021-10-12 18:38:50 +11:00
Linus Torvalds
efb52a7d95 powerpc fixes for 5.15 #3
Fix a regression hit by the IPR SCSI driver, introduced by the recent addition of MSI
 domains on pseries.
 
 A big series including 8 BPF fixes, some with potential security impact and the rest
 various code generation issues.
 
 Fix our program check assembler entry path, which was accidentally jumping into a gas
 macro and generating strange stack frames, which could confuse find_bug().
 
 A couple of fixes, and related changes, to fix corner cases in our machine check handling.
 
 Fix our DMA IOMMU ops, which were not always returning the optimal DMA mask, leading to
 at least one device falling back to 32-bit DMA when it shouldn't.
 
 A fix for KUAP handling on 32-bit Book3S.
 
 Fix crashes seen when kdumping on some pseries systems.
 
 Thanks to: Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric Le Goater,
 Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem, Christoph Hellwig, Johan Almbladh, Stan
 Johnson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFi044THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGtXD/0a8dy70W8lXq3k1Zz33vzp/NSnq5Cj
 d1lTI6hgbow5KbMgoKi3f2RC1S1qd0Kenz4zDD+rljRg8xrUCswUB+al2mrkeaBm
 9s82lWwemmdB8LUbWAV/SscaJ6Pc5ODmjwGpsB36gdBgaYnmedQky4knh1m0NNCS
 nyY6ugQIP6GnAZ2mojKnWx84Gj8hIYs5rEqAeDhHGNvLKSt2yIQv5LzqGM423j2O
 44nbMmEt8Jq6oJCSeMRf3j5rgc6RTkwnL/543faRe5BBNbuMU01oIhDrGALXY8Ip
 gOjyRsQYy1s+dU1rypaP/rlE3hRAUGmShCYQXA6kcESgwZc+BMSSShM/FhvVx2C7
 UnFiFHyWnrP3PrhDGW+WGhrUN2kyoZZ7V6x5bggW93pyVCmA9RW62J8HU8DwzucU
 IBO7TBNoHsscKEw8dSwG6jBOfx30bEwD2tl2PjNaDFIk7WndMOT4MbjPTDlXj38u
 PbLPlY1xEghUafAdLwWL46wQMc0pLHyAaiM/Wl8nUaES73tdtMTLtg7QH/+uBc+a
 iS1ji1U9VOpRjtBOjQHmPtRiit9aL1UJ0PDMrJWQpKxALv139jdlhjdzexnPgNAe
 8l/9bvTsCTn5j5C3m1XaUSSGM2AyawMEC0+9bJWMKuD0F8JmZfUYMSjSk0CWJFRG
 e+xG8TGXClfTxg==
 =q+1t
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A bit of a big batch, partly because I didn't send any last week, and
  also just because the BPF fixes happened to land this week.

  Summary:

   - Fix a regression hit by the IPR SCSI driver, introduced by the
     recent addition of MSI domains on pseries.

   - A big series including 8 BPF fixes, some with potential security
     impact and the rest various code generation issues.

   - Fix our program check assembler entry path, which was accidentally
     jumping into a gas macro and generating strange stack frames, which
     could confuse find_bug().

   - A couple of fixes, and related changes, to fix corner cases in our
     machine check handling.

   - Fix our DMA IOMMU ops, which were not always returning the optimal
     DMA mask, leading to at least one device falling back to 32-bit DMA
     when it shouldn't.

   - A fix for KUAP handling on 32-bit Book3S.

   - Fix crashes seen when kdumping on some pseries systems.

  Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric
  Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem,
  Christoph Hellwig, Johan Almbladh, Stan Johnson"

* tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
  powerpc/32s: Fix kuap_kernel_restore()
  powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
  powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
  powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
  powerpc/64: warn if local irqs are enabled in NMI or hardirq context
  powerpc/traps: do not enable irqs in _exception
  powerpc/64s: fix program check interrupt emergency stack path
  powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
  powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
  powerpc/bpf ppc32: Fix JMP32_JSET_K
  powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
  powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
  powerpc/security: Add a helper to query stf_barrier type
  powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
  powerpc/bpf: Fix BPF_MOD when imm == 1
  powerpc/bpf: Validate branch ranges
  powerpc/lib: Add helper to check if offset is within conditional branch range
  powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
2021-10-10 10:12:42 -07:00
Nathan Lynch
f9473a6571 powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_die
This comment likely refers to the obsolete DLPAR workflow where some
resource state transitions were driven more directly from user space
utilities, but it also seems to contradict itself: "Change isolate state to
Isolate [...]" is at odds with the preceding sentences, and it does not
relate at all to the code that follows.

Remove it to prevent confusion.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210927201933.76786-5-nathanl@linux.ibm.com
2021-10-09 00:16:00 +11:00
Nathan Lynch
fa2a5dfe2d powerpc/pseries/cpuhp: delete add/remove_by_count code
The core DLPAR code supports two actions (add and remove) and three
subtypes of action:

* By DRC index: the action is attempted on a single specified resource.
  This is the usual case for processors.
* By indexed count: the action is attempted on a range of resources
  beginning at the specified index. This is implemented only by the memory
  DLPAR code.
* By count: the lower layer (CPU or memory) is responsible for locating the
  specified number of resources to which the action can be applied.

I cannot find any evidence of the "by count" subtype being used by drmgr or
qemu for processors. And when I try to exercise this code, the add case
does not work:

  $ ppc64_cpu --smt ; nproc
  SMT=8
  24
  $ printf "cpu remove count 2" > /sys/kernel/dlpar
  $ nproc
  8
  $ printf "cpu add count 2" > /sys/kernel/dlpar
  -bash: printf: write error: Invalid argument
  $ dmesg | tail -2
  pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add
  dlpar: Could not handle DLPAR request "cpu add count 2"
  $ nproc
  8
  $ drmgr -c cpu -a -q 2         # this uses the by-index method
  Validating CPU DLPAR capability...yes.
  CPU 1
  CPU 17
  $ nproc
  24

This is because find_drc_info_cpus_to_add() does not increment drc_index
appropriately during its search.

This is not hard to fix. But the _by_count() functions also have the
property that they attempt to roll back all prior operations if the entire
request cannot be satisfied, even though the rollback itself can encounter
errors. It's not possible to provide transaction-like behavior at this
level, and it's undesirable to have code that can only pretend to do that.
Any users of these functions cannot know what the state of the system is in
the error case. And the error paths are, to my knowledge, impossible to
test without adding custom error injection code.

Summary:

* This code has not worked reliably since its introduction.
* There is no evidence that it is used.
* It contains questionable rollback behaviors in error paths which are
  difficult to test.

So let's remove it.

Fixes: ac71380071 ("powerpc/pseries: Add CPU dlpar remove functionality")
Fixes: 90edf184b9 ("powerpc/pseries: Add CPU dlpar add functionality")
Fixes: b015f6bc95 ("powerpc/pseries: Add cpu DLPAR support for drc-info property")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210927201933.76786-4-nathanl@linux.ibm.com
2021-10-09 00:16:00 +11:00
Nathan Lynch
983f910174 powerpc/cpuhp: BUG -> WARN conversion in offline path
If, due to bugs elsewhere, we get into unregister_cpu_online() with a CPU
that isn't marked hotpluggable, we can emit a warning and return an
appropriate error instead of crashing.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210927201933.76786-3-nathanl@linux.ibm.com
2021-10-09 00:16:00 +11:00
Nathan Lynch
7edd5c9a88 powerpc/pseries/cpuhp: cache node corrections
On pseries, cache nodes in the device tree can be added and removed by the
CPU DLPAR code as well as the partition migration (mobility) code. PowerVM
partitions in dedicated processor mode typically have L2 and L3 cache
nodes.

The CPU DLPAR code has the following shortcomings:

* Cache nodes returned as siblings of a new CPU node by
  ibm,configure-connector are silently discarded; only the CPU node is
  added to the device tree.

* Cache nodes which become unreferenced in the processor removal path are
  not removed from the device tree. This can lead to duplicate nodes when
  the post-migration device tree update code replaces cache nodes.

This is long-standing behavior. Presumably it has gone mostly unnoticed
because the two bugs have the property of obscuring each other in common
simple scenarios (e.g. remove a CPU and add it back). Likely you'd notice
only if you cared to inspect the device tree or the sysfs cacheinfo
information.

Booted with two processors:

  $ pwd
  /sys/firmware/devicetree/base/cpus
  $ ls -1d */
  l2-cache@2010/
  l2-cache@2011/
  l3-cache@3110/
  l3-cache@3111/
  PowerPC,POWER9@0/
  PowerPC,POWER9@8/
  $ lsprop */l2-cache
  l2-cache@2010/l2-cache
                 00003110 (12560)
  l2-cache@2011/l2-cache
                 00003111 (12561)
  PowerPC,POWER9@0/l2-cache
                 00002010 (8208)
  PowerPC,POWER9@8/l2-cache
                 00002011 (8209)
  $ ls /sys/devices/system/cpu/cpu0/cache/
  index0  index1  index2  index3

After DLPAR-adding PowerPC,POWER9@10, we see that its associated cache
nodes are absent, its threads' L2+L3 cacheinfo is unpopulated, and it is
missing a cache level in its sched domain hierarchy:

  $ ls -1d */
  l2-cache@2010/
  l2-cache@2011/
  l3-cache@3110/
  l3-cache@3111/
  PowerPC,POWER9@0/
  PowerPC,POWER9@10/
  PowerPC,POWER9@8/
  $ lsprop PowerPC\,POWER9@10/l2-cache
  PowerPC,POWER9@10/l2-cache
                 00002012 (8210)
  $ ls /sys/devices/system/cpu/cpu16/cache/
  index0  index1
  $ grep . /sys/kernel/debug/sched/domains/cpu{0,8,16}/domain*/name
  /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
  /sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE
  /sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE
  /sys/kernel/debug/sched/domains/cpu8/domain0/name:SMT
  /sys/kernel/debug/sched/domains/cpu8/domain1/name:CACHE
  /sys/kernel/debug/sched/domains/cpu8/domain2/name:DIE
  /sys/kernel/debug/sched/domains/cpu16/domain0/name:SMT
  /sys/kernel/debug/sched/domains/cpu16/domain1/name:DIE

When removing PowerPC,POWER9@8, we see that its cache nodes are left
behind:

  $ ls -1d */
  l2-cache@2010/
  l2-cache@2011/
  l3-cache@3110/
  l3-cache@3111/
  PowerPC,POWER9@0/

When DLPAR is combined with VM migration, we can get duplicate nodes. E.g.
removing one processor, then migrating, adding a processor, and then
migrating again can result in warnings from the OF core during
post-migration device tree updates:

  Duplicate name in cpus, renamed to "l2-cache@2011#1"
  Duplicate name in cpus, renamed to "l3-cache@3111#1"

and nodes with duplicated phandles in the tree, making lookup behavior
unpredictable:

  $ lsprop l[23]-cache@*/ibm,phandle
  l2-cache@2010/ibm,phandle
                   00002010 (8208)
  l2-cache@2011#1/ibm,phandle
                   00002011 (8209)
  l2-cache@2011/ibm,phandle
                   00002011 (8209)
  l3-cache@3110/ibm,phandle
                   00003110 (12560)
  l3-cache@3111#1/ibm,phandle
                   00003111 (12561)
  l3-cache@3111/ibm,phandle
                   00003111 (12561)

Address these issues by:

* Correctly processing siblings of the node returned from
  dlpar_configure_connector().
* Removing cache nodes in the CPU remove path when it can be determined
  that they are not associated with other CPUs or caches.

Use the of_changeset API in both cases, which allows us to keep the error
handling in this code from becoming more complex while ensuring that the
device tree cannot become inconsistent.

Fixes: ac71380071 ("powerpc/pseries: Add CPU dlpar remove functionality")
Fixes: 90edf184b9 ("powerpc/pseries: Add CPU dlpar add functionality")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210927201933.76786-2-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Nathan Lynch
fda0eb2200 powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()
vcpu_is_preempted() can be used outside of preempt-disabled critical
sections, yielding warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
caller is rwsem_spin_on_owner+0x1cc/0x2d0
CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
Call Trace:
[c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
[c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
[c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
[c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
[c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
[c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
[c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
[c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
[c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250

The result of vcpu_is_preempted() is always used speculatively, and the
function does not access per-cpu resources in a (Linux) preempt-unsafe way.
Use raw_smp_processor_id() to avoid such warnings, adding explanatory
comments.

Fixes: ca3f969dcb ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928214147.312412-3-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Nathan Lynch
799f9b51db powerpc/paravirt: vcpu_is_preempted() commentary
Add comments more clearly documenting that this function determines whether
hypervisor-level preemption of the VM has occurred.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928214147.312412-2-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Nathan Lynch
56537faf88 powerpc: fix unbalanced node refcount in check_kvm_guest()
When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
returns without performing a matching put for the lookup, leaving the
node's reference count elevated.

Add the necessary call to of_node_put(), rearranging the code slightly to
avoid repetition or goto.

Fixes: 107c55005f ("powerpc/pseries: Add KVM guest doorbell restrictions")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928124550.132020-1-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Vasant Hegde
ee87843795 powerpc/powernv/dump: Fix typo in comment
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210914143802.54325-1-hegdevasant@linux.vnet.ibm.com
2021-10-09 00:15:59 +11:00
Nick Desaulniers
2a24d80fc8 powerpc/asm: Remove UPD_CONSTR after GCC 4.9 removal
UPD_CONSTR was previously a preprocessor define for an old GCC 4.9
inline asm bug with m<> constraints.

Fixes: 6563139d90 ("powerpc: remove GCC version check for UPD_CONSTR")
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210914161712.2463458-1-ndesaulniers@google.com
2021-10-09 00:15:59 +11:00
Christophe Leroy
7eff9bc00d powerpc/mem: Fix arch/powerpc/mm/mem.c:53:12: error: no previous prototype for 'create_section_mapping'
Commit 8e11d62e2e ("powerpc/mem: Add back missing header to fix 'no
previous prototype' error") was supposed to fix the problem, but in
the meantime commit a927bd6ba9 ("mm: fix phys_to_target_node() and*
memory_add_physaddr_to_nid() exports") moved create_section_mapping()
prototype from asm/sparsemem.h to asm/mmzone.h

Fixes: 8e11d62e2e ("powerpc/mem: Add back missing header to fix 'no previous prototype' error")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/025754fde3d027904ae9d0191f395890bec93369.1631541649.git.christophe.leroy@csgroup.eu
2021-10-09 00:15:59 +11:00
Niklas Schnelle
452f145eca powerpc: Drop superfluous pci_dev_is_added() calls
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups
that are done under pcibios_add_device() which in turn is only called in
pci_device_add() whih is called when a PCI device is scanned.

pci_dev_assign_added() is called in pci_bus_add_device() which is only
called after scanning the device. Thus pci_dev_is_added() is always
false and can be dropped.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Tweak change log slightly to reflect Oliver's comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210910141940.2598035-2-schnelle@linux.ibm.com
2021-10-09 00:15:58 +11:00
Christophe Leroy
9d7fb0643a powerpc/powermac: Remove stale declaration of pmac_md
pmac_md doesn't exist anymore, remove stall declaration.

Fixes: e8222502ee ("[PATCH] powerpc: Kill _machine and hard-coded platform numbers")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b2e52934e5a500f149e6d94db3cfa0569bc35081.1630657402.git.christophe.leroy@csgroup.eu
2021-10-09 00:15:58 +11:00
Daniel Axtens
93fa8e9d88 powerpc: Remove unused prototype for of_show_percpuinfo
commit 6d7f58b04d ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
removed of_show_percpuinfo but didn't remove the prototype.

Remove it.

Fixes: 6d7f58b04d ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210903063246.70691-1-dja@axtens.net
2021-10-09 00:15:58 +11:00
Christophe Leroy
494f238a38 powerpc/476: Fix sparse report
Fix sparse errors:
  arch/powerpc/platforms/44x/ppc476.c:236:17: warning: cast removes address space '__iomem' of expression
  arch/powerpc/platforms/44x/ppc476.c:241:34: warning: incorrect type in argument 1 (different address spaces)
  arch/powerpc/platforms/44x/ppc476.c:241:34:    expected void const volatile [noderef] __iomem *addr
  arch/powerpc/platforms/44x/ppc476.c:241:34:    got unsigned char [usertype] *
  arch/powerpc/platforms/44x/ppc476.c:243:17: warning: incorrect type in argument 1 (different address spaces)
  arch/powerpc/platforms/44x/ppc476.c:243:17:    expected void volatile [noderef] __iomem *addr
  arch/powerpc/platforms/44x/ppc476.c:243:17:    got unsigned char [usertype] *[assigned] fpga

Mark 'fpga' pointer as __iomem.

Fixes: ab9a4183fd ("powerpc: Update currituck pci/usb fixup for new board revision")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa6055769b92a5d8685b8d0adab99c48a0b0ef4b.1631956926.git.christophe.leroy@csgroup.eu
2021-10-09 00:15:58 +11:00
Xiaoming Ni
c45361abb9 powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
When CONFIG_SMP=y, timebase synchronization is required when the second
kernel is started.

arch/powerpc/kernel/smp.c:
  int __cpu_up(unsigned int cpu, struct task_struct *tidle)
  {
  	...
  	if (smp_ops->give_timebase)
  		smp_ops->give_timebase();
  	...
  }

  void start_secondary(void *unused)
  {
  	...
  	if (smp_ops->take_timebase)
  		smp_ops->take_timebase();
  	...
  }

When CONFIG_HOTPLUG_CPU=n and CONFIG_KEXEC_CORE=n,
 smp_85xx_ops.give_timebase is NULL,
 smp_85xx_ops.take_timebase is NULL,
As a result, the timebase is not synchronized.

Timebase  synchronization does not depend on CONFIG_HOTPLUG_CPU.

Fixes: 56f1ba2807 ("powerpc/mpc85xx: refactor the PM operations")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210929033646.39630-3-nixiaoming@huawei.com
2021-10-09 00:15:58 +11:00
Xiaoming Ni
3c2172c1c4 powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
When the field described in mpc85xx_smp_guts_ids[] is not configured in
dtb, the mpc85xx_setup_pmc() does not assign a value to the "guts"
variable. As a result, the oops is triggered when
mpc85xx_freeze_time_base() is executed.

Fixes: 56f1ba2807 ("powerpc/mpc85xx: refactor the PM operations")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210929033646.39630-2-nixiaoming@huawei.com
2021-10-09 00:15:58 +11:00
Alexey Kardashevskiy
3872731187 powerps/pseries/dma: Add support for 2M IOMMU page size
The upcoming PAPR spec adds a 2M page size, bit 23 right after 16G page
size in the "ibm,query-pe-dma-window" call.

This adds support for the new page size. Since the new page size is out
of sorted order, this changes the loop to not assume that shift[] is
sorted.

This has now been tested and is known to work on a pre-release version
of phyp.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211006044735.1114669-1-aik@ozlabs.ru
2021-10-09 00:15:57 +11:00
Mahesh Salgaonkar
eb8257a121 pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
On pseries LPAR when an empty slot is assigned to partition OR in single
LPAR mode, kdump kernel crashes during issuing PHB reset.

In the kdump scenario, we traverse all PHBs and issue reset using the
pe_config_addr of the first child device present under each PHB. However
the code assumes that none of the PHB slots can be empty and uses
list_first_entry() to get the first child device under the PHB. Since
list_first_entry() expects the list to be non-empty, it returns an
invalid pci_dn entry and ends up accessing NULL phb pointer under
pci_dn->phb causing kdump kernel crash.

This patch fixes the below kdump kernel crash by skipping empty slots:

  audit: initializing netlink subsys (disabled)
  thermal_sys: Registered thermal governor 'fair_share'
  thermal_sys: Registered thermal governor 'step_wise'
  cpuidle: using governor menu
  pstore: Registered nvram as persistent store backend
  Issue PHB reset ...
  audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1
  BUG: Kernel NULL pointer dereference on read at 0x00000268
  Faulting instruction address: 0xc000000008101fb0
  Oops: Kernel access of bad area, sig: 7 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1
  NIP:  c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70
  REGS: c00000001161b840 TRAP: 0300   Not tainted  (5.14.0)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000224  XER: 20040002
  CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0
  ...
  NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0
  LR  __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
  Call Trace:
    0xc00000001161bb80 (unreliable)
    __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
    do_one_initcall+0x60/0x2d0
    kernel_init_freeable+0x350/0x3f8
    kernel_init+0x3c/0x17c
    ret_from_kernel_thread+0x5c/0x64

Fixes: 5a090f7c36 ("powerpc/pseries: PCIE PHB reset")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
[mpe: Tweak wording and trim oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupiter
2021-10-07 23:37:22 +11:00
Christophe Leroy
d93f9e2374 powerpc/32s: Fix kuap_kernel_restore()
At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the
value contained in regs->kuap. However, when regs->kuap contains
0xffffffff it means that KUAP was not unlocked so calling kuap_unlock()
is unrelevant and results in jeopardising the contents of kernel space
segment registers.

So check that regs->kuap doesn't contain KUAP_NONE before calling
kuap_unlock(). In the meantime it also means that if KUAP has not
been correcly locked back at interrupt exit, it must be locked
before continuing. This is done by checking the content of
current->thread.kuap which was returned by kuap_get_and_assert_locked()

Fixes: 16132529ce ("powerpc/32s: Rework Kernel Userspace Access Protection")
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d0c4d0f050a637052287c09ba521bad960a2790.1631715131.git.christophe.leroy@csgroup.eu
2021-10-07 23:34:00 +11:00
Cédric Le Goater
5a4b032078 powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
The IPR drivers tests for MSI support at probe time with MSI vector 0
and when done, frees the IRQ with free_irq(). This test was introduced
by 95fecd9039 ("ipr: add test for MSI interrupt support") as an
improvement of commit 5a9ef25b14 ("[SCSI] ipr: add MSI support")
because a boot failure was reported on a Bimini PowerPC system:

  https://lore.kernel.org/r/1242926159.3007.5.camel@localhost.localdomain

It was finally decided to remove MSI support on Bimini systems in
6eb0ac0389 ("powerpc/maple: Add a quirk to disable MSI for IPR on
Bimini").

Linux 5.15-rc1 added MSI domain support to the pseries machine and
when free_irq is called() in the driver, msi_domain_deactivate() also
is. This resets the MSI table entry of the associate vector by calling
__pci_write_msi_msg() with an empty message and breaks any further
activation of the same vector. In the case of the IPR driver, it
breaks the initialization sequence of the IOA.

Introduce an empty irq_write_msi_msg() handler in the MSI domain of
the pseries machine to avoid clearing the MSI vector entry. Updating
the entry is not strictly necessary since it is initialized by the
underlying hypervisor, PowerVM or QEMU/KVM.

Fixes: a5f3d2c17b ("powerpc/pseries/pci: Add MSI domains")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
[mpe: Tweak comment wording and formatting slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210930102535.1047230-1-clg@kaod.org
2021-10-07 23:33:25 +11:00
Nicholas Piggin
f08fb25bc6 powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.

This works fine except the case of an unrecoverable MCE, where the true
NMI is taken when MSR[RI] is clear, it can not recover, so it calls
machine_check_exception directly so something might be done about it.

Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
  arch/powerpc/include/asm/interrupt.h:168
  BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));

Fix this by making an _async version of the handler which is called
in the normal case, and a NMI version that is called for unrecoverable
interrupts.

Fixes: 2b43dd7653 ("powerpc/64: enable MSR[EE] in irq replay pt_regs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-6-npiggin@gmail.com
2021-10-07 19:54:55 +11:00
Nicholas Piggin
768c470103 powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
If a NMI hits early in an interrupt handler before the irq soft-mask
state is reconciled, that can cause a false-positive BUG with a
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion.

Remove that assertion and instead check the case that if regs->msr has
EE clear, then regs->softe should be marked as disabled so the irq state
looks correct to NMI handlers, the same as how it's fixed up in the
case it was implicit soft-masked.

This doesn't fix a known problem -- the change that was fixed by commit
4ec5feec1a ("powerpc/64s: Make NMI record implicitly soft-masked code
as irqs disabled") was the addition of a warning in the soft-nmi
watchdog interrupt which can never actually fire when MSR[EE]=0. However
it may be important if NMI handlers grow more code, and it's less
surprising to anything using 'regs' - (I tripped over this when working
in the area).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-5-npiggin@gmail.com
2021-10-07 19:54:55 +11:00
Nicholas Piggin
ff058a8ada powerpc/64: warn if local irqs are enabled in NMI or hardirq context
This can help catch bugs such as the one fixed by the previous change
to prevent _exception() from enabling irqs.

ppc32 could have a similar warning but it has no good config option to
debug this stuff (the test may be overkill to add for production
kernels).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-4-npiggin@gmail.com
2021-10-07 19:54:55 +11:00
Nicholas Piggin
d0afd44c05 powerpc/traps: do not enable irqs in _exception
_exception can be called by machine check handlers when the MCE hits
user code (e.g., pseries and powernv). This will enable local irqs
because, which is a dicey thing to do in NMI or hard irq context.

This seemed to worked out okay because a userspace MCE can basically be
treated like a synchronous interrupt (after async / imprecise MCEs are
filtered out). Since NMI and hard irq handlers have started growing
nmi_enter / irq_enter, and more irq state sanity checks, this has
started to cause problems (or at least trigger warnings).

The Fixes tag to the commit which introduced this rather than try to
work out exactly which commit was the first that could possibly cause a
problem because that may be difficult to prove.

Fixes: 9f2f79e3a3 ("powerpc: Disable interrupts in 64-bit kernel FP and vector faults")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-3-npiggin@gmail.com
2021-10-07 19:54:54 +11:00
Nicholas Piggin
3e607dc4df powerpc/64s: fix program check interrupt emergency stack path
Emergency stack path was jumping into a 3: label inside the
__GEN_COMMON_BODY macro for the normal path after it had finished,
rather than jumping over it. By a small miracle this is the correct
place to build up a new interrupt frame with the existing stack
pointer, so things basically worked okay with an added weird looking
700 trap frame on top (which had the wrong ->nip so it didn't decode
bug messages either).

Fix this by avoiding using numeric labels when jumping over non-trivial
macros.

Before:

 LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
 Modules linked in:
 CPU: 0 PID: 88 Comm: sh Not tainted 5.15.0-rc2-00034-ge057cdade6e5 #2637
 NIP:  7265677368657265 LR: c00000000006c0c8 CTR: c0000000000097f0
 REGS: c0000000fffb3a50 TRAP: 0700   Not tainted
 MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 00000700  XER: 20040000
 CFAR: c0000000000098b0 IRQMASK: 0
 GPR00: c00000000006c964 c0000000fffb3cf0 c000000001513800 0000000000000000
 GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
 GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
 GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
 GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
 GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
 GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
 GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
 NIP [7265677368657265] 0x7265677368657265
 LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
 Call Trace:
 [c0000000fffb3cf0] [c00000000000bdac] soft_nmi_common+0x13c/0x1d0 (unreliable)
 --- interrupt: 700 at decrementer_common_virt+0xb8/0x230
 NIP:  c0000000000098b8 LR: c00000000006c0c8 CTR: c0000000000097f0
 REGS: c0000000fffb3d60 TRAP: 0700   Not tainted
 MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 22424282  XER: 20040000
 CFAR: c0000000000098b0 IRQMASK: 0
 GPR00: c00000000006c964 0000000000002400 c000000001513800 0000000000000000
 GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
 GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
 GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
 GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
 GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
 GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
 GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
 NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
 LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
 --- interrupt: 700
 Instruction dump:
 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
 ---[ end trace 6d28218e0cc3c949 ]---

After:

 ------------[ cut here ]------------
 kernel BUG at arch/powerpc/kernel/exceptions-64s.S:491!
 Oops: Exception in kernel mode, sig: 5 [#1]
 LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
 Modules linked in:
 CPU: 0 PID: 88 Comm: login Not tainted 5.15.0-rc2-00034-ge057cdade6e5-dirty #2638
 NIP:  c0000000000098b8 LR: c00000000006bf04 CTR: c0000000000097f0
 REGS: c0000000fffb3d60 TRAP: 0700   Not tainted
 MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 24482227  XER: 00040000
 CFAR: c0000000000098b0 IRQMASK: 0
 GPR00: c00000000006bf04 0000000000002400 c000000001513800 c000000001271868
 GPR04: 00000000100f0d29 0000000042000000 0000000000000007 0000000000000009
 GPR08: 00000000100f0d29 0000000024482227 0000000000002710 c000000000181b3c
 GPR12: 9000000000009033 c0000000016b0000 00000000100f0d29 c000000005b22f00
 GPR16: 00000000ffff0000 0000000000000001 0000000000000009 00000000100eed90
 GPR20: 00000000100eed90 0000000010000000 000000001000a49c 00000000100f1430
 GPR24: c000000001271868 0000000002000000 0000000000000215 0000000000000300
 GPR28: c000000001271800 0000000042000000 00000000100f0d29 c000000080647860
 NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
 LR [c00000000006bf04] ___do_page_fault+0x234/0xb10
 Call Trace:
 Instruction dump:
 4182000c 39400001 48000008 894d0932 714a0001 39400008 408225fc 718a4000
 7c2a0b78 3821fcf0 41c20008 e82d0910 <0981fcf0> f92101a0 f9610170 f9810178
 ---[ end trace a5dbd1f5ea4ccc51 ]---

Fixes: 0a882e2846 ("powerpc/64s/exception: remove bad stack branch")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-2-npiggin@gmail.com
2021-10-07 19:54:54 +11:00
Naveen N. Rao
548b762763 powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
Special case handling of the smallest 32-bit negative number for BPF_SUB.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7135360a0cdf70adedbccf9863128b8daef18764.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:59 +11:00
Naveen N. Rao
48164fccdf powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
Suppress emitting zero extend instruction for 64-bit BPF_END_FROM_[L|B]E
operation.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4e3c3546121315a8e2059b19a1bda84971816e4.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:59 +11:00
Naveen N. Rao
e8278d4444 powerpc/bpf ppc32: Fix JMP32_JSET_K
'andi' only takes an unsigned 16-bit value. Correct the imm range used
when emitting andi.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b94489f52831305ec15aca4dd04a3527236be7e8.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:58 +11:00
Naveen N. Rao
c9b8da77f2 powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
Correct the destination register used for ALU32 BPF_ARSH operation.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6d24c1f9e79b6f61f5135eaf2ea1e8bcd4dac87b.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:58 +11:00
Naveen N. Rao
b7540d6250 powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
Emit similar instruction sequences to commit a048a07d7f
("powerpc/64s: Add support for a store forwarding barrier at kernel
entry/exit") when encountering BPF_NOSPEC.

Mitigations are enabled depending on what the firmware advertises. In
particular, we do not gate these mitigations based on current settings,
just like in x86. Due to this, we don't need to take any action if
mitigations are enabled or disabled at runtime.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/956570cbc191cd41f8274bed48ee757a86dac62a.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:58 +11:00
Naveen N. Rao
030905920f powerpc/security: Add a helper to query stf_barrier type
Add a helper to return the stf_barrier type for the current processor.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:58 +11:00
Naveen N. Rao
5855c4c1f4 powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
We aren't handling subtraction involving an immediate value of
0x80000000 properly. Fix the same.

Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fold in fix from Naveen to use imm <= 32768]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fc4b1276eb10761fd7ce0814c8dd089da2815251.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:26 +11:00
Naveen N. Rao
8bbc9d8224 powerpc/bpf: Fix BPF_MOD when imm == 1
Only ignore the operation if dividing by 1.

Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c674ca18c3046885602caebb326213731c675d06.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 18:52:33 +11:00
Naveen N. Rao
3832ba4e28 powerpc/bpf: Validate branch ranges
Add checks to ensure that we never emit branch instructions with
truncated branch offsets.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71d33a6b7603ec1013c9734dd8bdd4ff5e929142.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 18:52:33 +11:00
Naveen N. Rao
4549c3ea31 powerpc/lib: Add helper to check if offset is within conditional branch range
Add a helper to check if a given offset is within the branch range for a
powerpc conditional branch instruction, and update some sites to use the
new helper.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/442b69a34ced32ca346a0d9a855f3f6cfdbbbd41.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 18:52:33 +11:00
Tom Lendacky
e9d1d2bb75 treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.

Remove the implementation of mem_encrypt_active() across all arches.

For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
2021-10-04 11:47:24 +02:00
Tom Lendacky
bfebd37e99 powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
Introduce a powerpc version of the cc_platform_has() function. This will
be used to replace the powerpc mem_encrypt_active() implementation, so
the implementation will initially only support the CC_ATTR_MEM_ENCRYPT
attribute.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lkml.kernel.org/r/20210928191009.32551-5-bp@alien8.de
2021-10-04 11:46:33 +02:00
Pali Rohár
eed183abc0 powerpc/fsl/dts: Fix phy-connection-type for fm1mac3
Property phy-connection-type contains invalid value "sgmii-2500" per scheme
defined in file ethernet-controller.yaml.

Correct phy-connection-type value should be "2500base-x".

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 84e0f1c138 ("powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)")
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02 14:11:46 +01:00
Richard Guy Briggs
1c30e3af8a audit: add support for the openat2 syscall
The openat2(2) syscall was added in kernel v5.6 with commit
fddb5d430a ("open: introduce openat2(2) syscall").

Add the openat2(2) syscall to the audit syscall classifier.

Link: https://github.com/linux-audit/audit-kernel/issues/67
Link: https://lore.kernel.org/r/f5f1a4d8699613f8c02ce762807228c841c2e26f.1621363275.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
[PM: merge fuzz due to previous header rename, commit line wraps]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01 16:52:48 -04:00
Richard Guy Briggs
42f355ef59 audit: replace magic audit syscall class numbers with macros
Replace audit syscall class magic numbers with macros.

This required putting the macros into new header file
include/linux/audit_arch.h since the syscall macros were
included for both 64 bit and 32 bit in any compat code, causing
redefinition warnings.

Link: https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
[PM: renamed header to audit_arch.h after consulting with Richard]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-01 16:41:33 -04:00
Masami Hiramatsu
adf8a61a94 kprobes: treewide: Make it harder to refer kretprobe_trampoline directly
Since now there is kretprobe_trampoline_addr() for referring the
address of kretprobe trampoline code, we don't need to access
kretprobe_trampoline directly.

Make it harder to refer by renaming it to __kretprobe_trampoline().

Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Masami Hiramatsu
96fed8ac2b kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
The __kretprobe_trampoline_handler() callback, called from low level
arch kprobes methods, has the 'trampoline_address' parameter, which is
entirely superfluous as it basically just replicates:

  dereference_kernel_function_descriptor(kretprobe_trampoline)

In fact we had bugs in arch code where it wasn't replicated correctly.

So remove this superfluous parameter and use kretprobe_trampoline_addr()
instead.

Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Masami Hiramatsu
f2ec8d9a3b kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:

  3d7e33825d: ("jprobes: make jprobes a little safer for users")

But this is just open-coded dereference_symbol_descriptor() in essence, and
its obscure nature was causing bugs.

Just use the real thing and remove arch_deref_entry_point().

Link: https://lkml.kernel.org/r/163163043630.489837.7924988885652708696.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Masami Hiramatsu
c42421e205 kprobes: treewide: Use 'kprobe_opcode_t *' for the code address in get_optimized_kprobe()
Since get_optimized_kprobe() is only used inside kprobes,
it doesn't need to use 'unsigned long' type for 'addr' parameter.
Make it use 'kprobe_opcode_t *' for the 'addr' parameter and
subsequent call of arch_within_optimized_kprobe() also should use
'kprobe_opcode_t *'.

Note that MAX_OPTIMIZED_LENGTH and RELATIVEJUMP_SIZE are defined
by byte-size, but the size of 'kprobe_opcode_t' depends on the
architecture. Therefore, we must be careful when calculating
addresses using those macros.

Link: https://lkml.kernel.org/r/163163040680.489837.12133032364499833736.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:05 -04:00
Ard Biesheuvel
336868afba powerpc: smp: remove hack to obtain offset of task_struct::cpu
Instead of relying on awful hacks to obtain the offset of the cpu field
in struct task_struct, move it back into struct thread_info, which does
not create the same level of circular dependency hell when trying to
include the header file that defines it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2021-09-30 16:13:11 +02:00
Ard Biesheuvel
bcf9033e54 sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y
THREAD_INFO_IN_TASK moved the CPU field out of thread_info, but this
causes some issues on architectures that define raw_smp_processor_id()
in terms of this field, due to the fact that #include'ing linux/sched.h
to get at struct task_struct is problematic in terms of circular
dependencies.

Given that thread_info and task_struct are the same data structure
anyway when THREAD_INFO_IN_TASK=y, let's move it back so that having
access to the type definition of struct thread_info is sufficient to
reference the CPU number of the current task.

Note that this requires THREAD_INFO_IN_TASK's definition of the
task_thread_info() helper to be updated, as task_cpu() takes a
pointer-to-const, whereas task_thread_info() (which is used to generate
lvalues as well), needs a non-const pointer. So make it a macro instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2021-09-30 16:13:10 +02:00
Ard Biesheuvel
227d735d88 powerpc: add CPU field to struct thread_info
The CPU field will be moved back into thread_info even when
THREAD_INFO_IN_TASK is enabled, so add it back to powerpc's definition
of struct thread_info.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2021-09-30 16:13:10 +02:00
Juergen Gross
a1c42ddedf kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS
KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the
number of allowed vcpu-ids. This has already led to confusion, so
rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more
clear

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210913135745.13944-3-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 04:27:05 -04:00
Alexey Kardashevskiy
23c216b335 powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
According to dma-api.rst, the dma_get_required_mask() helper should return
"the mask that the platform requires to operate efficiently". Which in
the case of PPC64 means the bypass mask and not a mask from an IOMMU table
which is shorter and slower to use due to map/unmap operations (especially
expensive on "pseries").

However the existing implementation ignores the possibility of bypassing
and returns the IOMMU table mask on the pseries platform which makes some
drivers (mpt3sas is one example) choose 32bit DMA even though bypass is
supported. The powernv platform sort of handles it by having a bigger
default window with a mask >=40 but it only works as drivers choose
63/64bit if the required mask is >32 which is rather pointless.

This reintroduces the bypass capability check to let drivers make
a better choice of the DMA mask.

Fixes: f1565c24b5 ("powerpc: use the generic dma_ops_bypass mode")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210930034454.95794-1-aik@ozlabs.ru
2021-09-30 17:10:17 +10:00
Sean Christopherson
a68de80f61 entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq.  The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.

Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing.  But, that's been true since commit a42c6ded82
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
mess up to future patches.

No functional change intended.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901203030.1292304-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22 10:24:01 -04:00
Alexey Dobriyan
04e85bbf71 isystem: delete global -isystem compile option
Further isolate kernel from userspace, prevent accidental inclusion of
undesireable headers, mainly float.h and stdatomic.h.

nds32 keeps -isystem globally due to intrinsics used in entrenched header.

-isystem is selectively reenabled for some files, again, for intrinsics.

Compile tested on:

hexagon-defconfig hexagon-allmodconfig
alpha-allmodconfig alpha-allnoconfig alpha-defconfig arm64-allmodconfig
arm64-allnoconfig arm64-defconfig arm-am200epdkit arm-aspeed_g4
arm-aspeed_g5 arm-assabet arm-at91_dt arm-axm55xx arm-badge4 arm-bcm2835
arm-cerfcube arm-clps711x arm-cm_x300 arm-cns3420vb arm-colibri_pxa270
arm-colibri_pxa300 arm-collie arm-corgi arm-davinci_all arm-dove
arm-ep93xx arm-eseries_pxa arm-exynos arm-ezx arm-footbridge arm-gemini
arm-h3600 arm-h5000 arm-hackkit arm-hisi arm-imote2 arm-imx_v4_v5
arm-imx_v6_v7 arm-integrator arm-iop32x arm-ixp4xx arm-jornada720
arm-keystone arm-lart arm-lpc18xx arm-lpc32xx arm-lpd270 arm-lubbock
arm-magician arm-mainstone arm-milbeaut_m10v arm-mini2440 arm-mmp2
arm-moxart arm-mps2 arm-multi_v4t arm-multi_v5 arm-multi_v7 arm-mv78xx0
arm-mvebu_v5 arm-mvebu_v7 arm-mxs arm-neponset arm-netwinder arm-nhk8815
arm-omap1 arm-omap2plus arm-orion5x arm-oxnas_v6 arm-palmz72 arm-pcm027
arm-pleb arm-pxa arm-pxa168 arm-pxa255-idp arm-pxa3xx arm-pxa910
arm-qcom arm-realview arm-rpc arm-s3c2410 arm-s3c6400 arm-s5pv210
arm-sama5 arm-shannon arm-shmobile arm-simpad arm-socfpga arm-spear13xx
arm-spear3xx arm-spear6xx arm-spitz arm-stm32 arm-sunxi arm-tct_hammer
arm-tegra arm-trizeps4 arm-u8500 arm-versatile arm-vexpress arm-vf610m4
arm-viper arm-vt8500_v6_v7 arm-xcep arm-zeus csky-allmodconfig
csky-allnoconfig csky-defconfig h8300-edosk2674 h8300-h8300h-sim
h8300-h8s-sim i386-allmodconfig i386-allnoconfig i386-defconfig
ia64-allmodconfig ia64-allnoconfig ia64-bigsur ia64-generic ia64-gensparse
ia64-tiger ia64-zx1 m68k-amcore m68k-amiga m68k-apollo m68k-atari
m68k-bvme6000 m68k-hp300 m68k-m5208evb m68k-m5249evb m68k-m5272c3
m68k-m5275evb m68k-m5307c3 m68k-m5407c3 m68k-m5475evb m68k-mac
m68k-multi m68k-mvme147 m68k-mvme16x m68k-q40 m68k-stmark2 m68k-sun3
m68k-sun3x microblaze-allmodconfig microblaze-allnoconfig microblaze-mmu
mips-ar7 mips-ath25 mips-ath79 mips-bcm47xx mips-bcm63xx mips-bigsur
mips-bmips_be mips-bmips_stb mips-capcella mips-cavium_octeon mips-ci20
mips-cobalt mips-cu1000-neo mips-cu1830-neo mips-db1xxx mips-decstation
mips-decstation_64 mips-decstation_r4k mips-e55 mips-fuloong2e
mips-gcw0 mips-generic mips-gpr mips-ip22 mips-ip27 mips-ip28 mips-ip32
mips-jazz mips-jmr3927 mips-lemote2f mips-loongson1b mips-loongson1c
mips-loongson2k mips-loongson3 mips-malta mips-maltaaprp mips-malta_kvm
mips-malta_qemu_32r6 mips-maltasmvp mips-maltasmvp_eva mips-maltaup
mips-maltaup_xpa mips-mpc30x mips-mtx1 mips-nlm_xlp mips-nlm_xlr
mips-omega2p mips-pic32mzda mips-pistachio mips-qi_lb60 mips-rb532
mips-rbtx49xx mips-rm200 mips-rs90 mips-rt305x mips-sb1250_swarm
mips-tb0219 mips-tb0226 mips-tb0287 mips-vocore2 mips-workpad mips-xway
nds32-allmodconfig nds32-allnoconfig nds32-defconfig nios2-10m50
nios2-3c120 nios2-allmodconfig nios2-allnoconfig openrisc-allmodconfig
openrisc-allnoconfig openrisc-or1klitex openrisc-or1ksim
openrisc-simple_smp parisc-allnoconfig parisc-generic-32bit
parisc-generic-64bit powerpc-acadia powerpc-adder875 powerpc-akebono
powerpc-amigaone powerpc-arches powerpc-asp8347 powerpc-bamboo
powerpc-bluestone powerpc-canyonlands powerpc-cell powerpc-chrp32
powerpc-cm5200 powerpc-currituck powerpc-ebony powerpc-eiger
powerpc-ep8248e powerpc-ep88xc powerpc-fsp2 powerpc-g5 powerpc-gamecube
powerpc-ge_imp3a powerpc-holly powerpc-icon powerpc-iss476-smp
powerpc-katmai powerpc-kilauea powerpc-klondike powerpc-kmeter1
powerpc-ksi8560 powerpc-linkstation powerpc-lite5200b powerpc-makalu
powerpc-maple powerpc-mgcoge powerpc-microwatt powerpc-motionpro
powerpc-mpc512x powerpc-mpc5200 powerpc-mpc7448_hpc2 powerpc-mpc8272_ads
powerpc-mpc8313_rdb powerpc-mpc8315_rdb powerpc-mpc832x_mds
powerpc-mpc832x_rdb powerpc-mpc834x_itx powerpc-mpc834x_itxgp
powerpc-mpc834x_mds powerpc-mpc836x_mds powerpc-mpc836x_rdk
powerpc-mpc837x_mds powerpc-mpc837x_rdb powerpc-mpc83xx
powerpc-mpc8540_ads powerpc-mpc8560_ads powerpc-mpc85xx_cds
powerpc-mpc866_ads powerpc-mpc885_ads powerpc-mvme5100 powerpc-obs600
powerpc-pasemi powerpc-pcm030 powerpc-pmac32 powerpc-powernv
powerpc-ppa8548 powerpc-ppc40x powerpc-ppc44x powerpc-ppc64
powerpc-ppc64e powerpc-ppc6xx powerpc-pq2fads powerpc-ps3
powerpc-pseries powerpc-rainier powerpc-redwood powerpc-sam440ep
powerpc-sbc8548 powerpc-sequoia powerpc-skiroot powerpc-socrates
powerpc-storcenter powerpc-stx_gp3 powerpc-taishan powerpc-tqm5200
powerpc-tqm8540 powerpc-tqm8541 powerpc-tqm8548 powerpc-tqm8555
powerpc-tqm8560 powerpc-tqm8xx powerpc-walnut powerpc-warp powerpc-wii
powerpc-xes_mpc85xx riscv-allmodconfig riscv-allnoconfig riscv-nommu_k210
riscv-nommu_k210_sdcard riscv-nommu_virt riscv-rv32 s390-allmodconfig
s390-allnoconfig s390-debug s390-zfcpdump sh-ap325rxa sh-apsh4a3a
sh-apsh4ad0a sh-dreamcast sh-ecovec24 sh-ecovec24-romimage sh-edosk7705
sh-edosk7760 sh-espt sh-hp6xx sh-j2 sh-kfr2r09 sh-kfr2r09-romimage
sh-landisk sh-lboxre2 sh-magicpanelr2 sh-microdev sh-migor sh-polaris
sh-r7780mp sh-r7785rp sh-rsk7201 sh-rsk7203 sh-rsk7264 sh-rsk7269
sh-rts7751r2d1 sh-rts7751r2dplus sh-sdk7780 sh-sdk7786 sh-se7206 sh-se7343
sh-se7619 sh-se7705 sh-se7712 sh-se7721 sh-se7722 sh-se7724 sh-se7750
sh-se7751 sh-se7780 sh-secureedge5410 sh-sh03 sh-sh2007 sh-sh7710voipgw
sh-sh7724_generic sh-sh7757lcr sh-sh7763rdp sh-sh7770_generic sh-sh7785lcr
sh-sh7785lcr_32bit sh-shmin sh-shx3 sh-titan sh-ul2 sh-urquell
sparc-allmodconfig sparc-allnoconfig sparc-sparc32 sparc-sparc64
um-i386-allmodconfig um-i386-allnoconfig um-i386-defconfig
um-x86_64-allmodconfig um-x86_64-allnoconfig x86_64-allmodconfig
x86_64-allnoconfig x86_64-defconfig xtensa-allmodconfig xtensa-allnoconfig
xtensa-audio_kc705 xtensa-cadence_csp xtensa-common xtensa-generic_kc705
xtensa-iss xtensa-nommu_kc705 xtensa-smp_lx200 xtensa-virt
xtensa-xip_kc705

Tested-by: Nathan Chancellor <nathan@kernel.org> # build (hexagon)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-09-22 09:26:24 +09:00
Oliver O'Halloran
06dc660e6e PCI: Rename pcibios_add_device() to pcibios_device_add()
The general convention for pcibios_* hooks is that they're named after the
corresponding pci_* function they provide a hook for. The exception is
pcibios_add_device() which provides a hook for pci_device_add().

Rename pcibios_add_device() to pcibios_device_add() so it matches
pci_device_add().

Also, remove the export of the microblaze version. The only caller must be
compiled as a built-in so there's no reason for the export.

Link: https://lore.kernel.org/r/20210913152709.48013-1-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>	# s390
2021-09-21 15:26:09 -05:00
Linus Torvalds
62453a460a powerpc fixes for 5.15 #2
Fix crashes when scv (System Call Vectored) is used to make a syscall when a transaction
 is active, on Power9 or later.
 
 Fix bad interactions between rfscv (Return-from scv) and Power9 fake-suspend mode.
 
 Fix crashes when handling machine checks in LPARs using the Hash MMU.
 
 Partly revert a recent change to our XICS interrupt controller code, which broke the
 recently added Microwatt support.
 
 Thanks to: Cédric Le Goater, Eirik Fuller, Ganesh Goudar, Gustavo Romero, Joel Stanley,
 Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFHDfsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgNlAD/oC/14jV2fCv54vH5Al1XAz2WuPhhEu
 UsN+MLlx7JshXfLeGzzEogvvakfLFuVkpIxsv+g8a9H6jejjCj/A4wjUy34Im0hS
 nhsMsa1vZPZY/Vz9v3WzdYmqjHVZDKo6ZdUmn0z4OjPnv7wQaEEbaXmoe0+CaXXQ
 jIY2TjVQxbgXeUI1QcPUWh36hiSaYn4O8fAD4VdjxdStl2z6q8zbo/5nzQ8dkBpP
 lldC0T/HRCKLe1ODUExFfbBFqFuLVdbALq0RIuh0GXd+dAqpK1iOkNsxUWChU6Q7
 gu0PrmhRydX3bmL9jmOsZfUk/hIR0fEgwSGITQAb8v0LCvZI0JRn+yvkl7uVZ9Ce
 OWY7B8HZQldvoQqA4/f7YCpAIS2ag7ugDrcAfuDVkDl2o/qJHoTi58IU24wI6+LL
 lwJZjNJoxZ8n1tl1l+gtCNp4E4YYOfuKZi0zH0oJoLiSWeK6PYOvwEGuEC2iQ9cJ
 BEdyYVXqSBuc2anCQ4k+y8gu/zJwP1MCpDNMPMU4P/JcaG9lnrLUhvctf4GKtOSy
 UiktHSTL52J5ldMxSMicpw9qCVgePsQX78Sk3UTHRilRmix6TtWOWYslm4mL+zeD
 u3wpnK+NKkju72LRS/AB1U6cWvModW+WBANPws9j6caoOorh397xsDGEYiKWvp0Z
 xSG/dzXzi2oscQ==
 =yC9U
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when scv (System Call Vectored) is used to make a syscall
   when a transaction is active, on Power9 or later.

 - Fix bad interactions between rfscv (Return-from scv) and Power9
   fake-suspend mode.

 - Fix crashes when handling machine checks in LPARs using the Hash MMU.

 - Partly revert a recent change to our XICS interrupt controller code,
   which broke the recently added Microwatt support.

Thanks to Cédric Le Goater, Eirik Fuller, Ganesh Goudar, Gustavo Romero,
Joel Stanley, Nicholas Piggin.

* tag 'powerpc-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xics: Set the IRQ chip data for the ICS native backend
  powerpc/mce: Fix access error in mce handler
  KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
  powerpc/64s: system call rfscv workaround for TM bugs
  selftests/powerpc: Add scv versions of the basic TM syscall tests
  powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
2021-09-19 13:00:23 -07:00
Cédric Le Goater
c006a06508 powerpc/xics: Set the IRQ chip data for the ICS native backend
The ICS native driver relies on the IRQ chip data to find the struct
'ics_native' describing the ICS controller but it was removed by commit
248af248a8 ("powerpc/xics: Rename the map handler in a check handler").
Revert this change to fix the Microwatt SoC platform.

Fixes: 248af248a8 ("powerpc/xics: Rename the map handler in a check handler")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210913134056.3761960-1-clg@kaod.org
2021-09-15 22:05:53 +10:00
Michael Ellerman
1619b69edc powerpc/boot: Fix build failure since GCC 4.9 removal
Stephen reported that the build was broken since commit
6d2ef226f2 ("compiler_attributes.h: drop __has_attribute() support for
gcc4"), with errors such as:

  include/linux/compiler_attributes.h:296:5: warning: "__has_attribute" is not defined, evaluates to 0 [-Wundef]
    296 | #if __has_attribute(__warning__)
        |     ^~~~~~~~~~~~~~~
  make[2]: *** [arch/powerpc/boot/Makefile:225: arch/powerpc/boot/crt0.o] Error 1

But we expect __has_attribute() to always be defined now that we've
stopped using GCC 4.

Linus debugged it to the point of reading the GCC sources, and noticing
that the problem is that __has_attribute() is not defined when
preprocessing assembly files, which is what we're doing here.

Our assembly files don't include, or need, compiler_attributes.h, but
they are getting it unconditionally from the -include in BOOT_CFLAGS,
which is then added in its entirety to BOOT_AFLAGS.

That -include was added in commit 77433830ed ("powerpc: boot: include
compiler_attributes.h") so that we'd have "fallthrough" and other
attributes defined for the C files in arch/powerpc/boot. But it's not
needed for assembly files.

The minimal fix is to move the addition to BOOT_CFLAGS of -include
compiler_attributes.h until after we've copied BOOT_CFLAGS into
BOOT_AFLAGS. That avoids including compiler_attributes.h for asm files,
but makes no other change to BOOT_CFLAGS or BOOT_AFLAGS.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 08:33:32 -07:00
Peter Collingbourne
7962c2eddb
arch: remove unused function syscall_set_arguments()
This function appears to have been unused since it was first introduced in
commit 828c365cc8 ("tracehook: asm/syscall.h").

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I8ce04f002903a37c0b6c1d16e9b2a3afa716c097
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-09-14 16:06:20 +02:00
Linus Torvalds
316346243b Merge branch 'gcc-min-version-5.1' (make gcc-5.1 the minimum version)
Merge patch series from Nick Desaulniers to update the minimum gcc
version to 5.1.

This is some of the left-overs from the merge window that I didn't want
to deal with yesterday, so it comes in after -rc1 but was sent before.

Gcc-4.9 support has been an annoyance for some time, and with -Werror I
had the choice of applying a fairly big patch from Kees Cook to remove a
fair number of initializer warnings (still leaving some), or this patch
series from Nick that just removes the source of the problem.

The initializer cleanups might still be worth it regardless, but
honestly, I preferred just tackling the problem with gcc-4.9 head-on.
We've been more aggressiuve about no longer having to care about
compilers that were released a long time ago, and I think it's been a
good thing.

I added a couple of patches on top to sort out a few left-overs now that
we no longer support gcc-4.x.

As noted by Arnd, as a result of this minimum compiler version upgrade
we can probably change our use of '--std=gnu89' to '--std=gnu11', and
finally start using local loop declarations etc.  But this series does
_not_ yet do that.

Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/
Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

* emailed patches from Nick Desaulniers <ndesaulniers@google.com>:
  Drop some straggling mentions of gcc-4.9 as being stale
  compiler_attributes.h: drop __has_attribute() support for gcc4
  vmlinux.lds.h: remove old check for GCC 4.9
  compiler-gcc.h: drop checks for older GCC versions
  Makefile: drop GCC < 5 -fno-var-tracking-assignments workaround
  arm64: remove GCC version check for ARCH_SUPPORTS_INT128
  powerpc: remove GCC version check for UPD_CONSTR
  riscv: remove Kconfig check for GCC version for ARCH_RV64I
  Kconfig.debug: drop GCC 5+ version check for DWARF5
  mm/ksm: remove old GCC 4.9+ check
  compiler.h: drop fallback overflow checkers
  Documentation: raise minimum supported version of GCC to 5.1
2021-09-13 10:43:04 -07:00
Nick Desaulniers
6563139d90 powerpc: remove GCC version check for UPD_CONSTR
Now that GCC 5.1 is the minimum supported version, we can drop this
workaround for older versions of GCC. This adversely affected clang,
too.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13 10:18:28 -07:00
Ganesh Goudar
3a1e92d089 powerpc/mce: Fix access error in mce handler
We queue an irq work for deferred processing of mce event in realmode
mce handler, where translation is disabled. Queuing of the work may
result in accessing memory outside RMO region, such access needs the
translation to be enabled for an LPAR running with hash mmu else the
kernel crashes.

After enabling translation in mce_handle_error() we used to leave it
enabled to avoid crashing here, but now with the commit
74c3354bc1 ("powerpc/pseries/mce: restore msr before returning from
handler") we are restoring the MSR to disable translation.

Hence to fix this enable the translation before queuing the work.

Without this change following trace is seen on injecting SLB multihit in
an LPAR running with hash mmu.

  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 5 PID: 1883 Comm: insmod Tainted: G        OE     5.14.0-mce+ #137
  NIP:  c000000000735d60 LR: c000000000318640 CTR: 0000000000000000
  REGS: c00000001ebff9a0 TRAP: 0300   Tainted: G       OE      (5.14.0-mce+)
  MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008228  XER: 00000001
  CFAR: c00000000031863c DAR: c00000027fa8fe08 DSISR: 40000000 IRQMASK: 0
  ...
  NIP llist_add_batch+0x0/0x40
  LR  __irq_work_queue_local+0x70/0xc0
  Call Trace:
    0xc00000001ebffc0c (unreliable)
    irq_work_queue+0x40/0x70
    machine_check_queue_event+0xbc/0xd0
    machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1 ("powerpc/pseries/mce: restore msr before returning from handler")
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Fix comment formatting, trim oops in change log for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210909064330.312432-1-ganeshgr@linux.ibm.com
2021-09-13 22:34:12 +10:00
Nicholas Piggin
267cdfa213 KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
POWER9 DD2.2 and 2.3 hardware implements a "fake-suspend" mode where
certain TM instructions executed in HV=0 mode cause softpatch interrupts
so the hypervisor can emulate them and prevent problematic processor
conditions. In this fake-suspend mode, the treclaim. instruction does
not modify registers.

Unfortunately the rfscv instruction executed by the guest do not
generate softpatch interrupts, which can cause the hypervisor to lose
track of the fake-suspend mode, and it can execute this treclaim. while
not in fake-suspend mode. This modifies GPRs and crashes the hypervisor.

It's not trivial to disable scv in the guest with HFSCR now, because
they assume a POWER9 has scv available. So this fix saves and restores
checkpointed registers across the treclaim.

Fixes: 7854f7545b ("KVM: PPC: Book3S: Rework TM save/restore code and make it C-callable")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210908101718.118522-2-npiggin@gmail.com
2021-09-13 22:34:12 +10:00
Nicholas Piggin
ae7aaecc3f powerpc/64s: system call rfscv workaround for TM bugs
The rfscv instruction does not work correctly with the fake-suspend mode
in POWER9, which can end up with the hypervisor restoring an incorrect
checkpoint.

Work around this by setting the _TIF_RESTOREALL flag if a system call
returns to a transaction active state, causing rfid to be used instead
of rfscv to return, which will do the right thing. The contents of the
registers are irrelevant because they will be overwritten in this case
anyway.

Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Reported-by: Eirik Fuller <efuller@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210908101718.118522-1-npiggin@gmail.com
2021-09-13 22:34:11 +10:00
Nicholas Piggin
b871895b14 powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
If a system call is made with a transaction active, the kernel
immediately aborts it and returns. scv system calls disable irqs even
earlier in their interrupt handler, and tabort_syscall does not fix this
up.

This can result in irq soft-mask state being messed up on the next
kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
the kernel exit handlers, or possibly worse.

This can't easily be fixed in asm because at this point an async irq may
have hit, which is soft-masked and marked pending. The pending interrupt
has to be replayed before returning to userspace. The fix is to move the
tabort_syscall code to C in the main syscall handler, and just skip the
system call but otherwise return as usual, which will take care of the
pending irqs. This also does a bunch of other things including possible
signal delivery to the process, but the doomed transaction should still
be aborted when it is eventually returned to.

The sc system call path is changed to use the new C function as well to
reduce code and path differences. This slows down how quickly system
calls are aborted when called while a transaction is active, which could
potentially impact TM performance. But making any system call is already
bad for performance, and TM is on the way out, so go with simpler over
faster.

Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Reported-by: Eirik Fuller <efuller@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use #ifdef rather than IS_ENABLED() to fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210903125707.1601269-1-npiggin@gmail.com
2021-09-13 22:34:11 +10:00
Arnd Bergmann
a7a08b275a arch: remove compat_alloc_user_space
All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.

Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 15:32:35 -07:00
Arnd Bergmann
59ab844eed compat: remove some compat entry points
These are all handled correctly when calling the native system call entry
point, so remove the special cases.

Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 15:32:35 -07:00
Linus Torvalds
2d338201d5 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...
2021-09-08 12:55:35 -07:00
Kefeng Wang
8b097881b5 trap: cleanup trap_init()
There are some empty trap_init() definitions in different ARCHs, Introduce
a new weak trap_init() function to clean them up.

Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	[arm32]
Acked-by: Vineet Gupta						[arc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <palmerdabbelt@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:27 -07:00
David Hildenbrand
e1c158e495 mm/memory_hotplug: remove nid parameter from remove_memory() and friends
There is only a single user remaining.  We can simply lookup the nid only
used for node offlining purposes when walking our memory blocks.  We don't
expect to remove multi-nid ranges; and if we'd ever do, we most probably
don't care about removing multi-nid ranges that actually result in empty
nodes.

If ever required, we can detect the "multi-nid" scenario and simply try
offlining all online nodes.

Link: https://lkml.kernel.org/r/20210712124052.26491-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
David Hildenbrand
65a2aa5f48 mm/memory_hotplug: remove nid parameter from arch_remove_memory()
The parameter is unused, let's remove it.

Link: https://lkml.kernel.org/r/20210712124052.26491-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Heiko Carstens <hca@linux.ibm.com>	[s390]
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Jia He <justin.he@arm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
Linus Torvalds
192ad3c27a ARM:
- Page ownership tracking between host EL1 and EL2
 
 - Rely on userspace page tables to create large stage-2 mappings
 
 - Fix incompatibility between pKVM and kmemleak
 
 - Fix the PMU reset state, and improve the performance of the virtual PMU
 
 - Move over to the generic KVM entry code
 
 - Address PSCI reset issues w.r.t. save/restore
 
 - Preliminary rework for the upcoming pKVM fixed feature
 
 - A bunch of MM cleanups
 
 - a vGIC fix for timer spurious interrupts
 
 - Various cleanups
 
 s390:
 
 - enable interpretation of specification exceptions
 
 - fix a vcpu_idx vs vcpu_id mixup
 
 x86:
 
 - fast (lockless) page fault support for the new MMU
 
 - new MMU now the default
 
 - increased maximum allowed VCPU count
 
 - allow inhibit IRQs on KVM_RUN while debugging guests
 
 - let Hyper-V-enabled guests run with virtualized LAPIC as long as they
   do not enable the Hyper-V "AutoEOI" feature
 
 - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC)
 
 - tuning for the case when two-dimensional paging (EPT/NPT) is disabled
 
 - bugfixes and cleanups, especially with respect to 1) vCPU reset and
   2) choosing a paging mode based on CR0/CR4/EFER
 
 - support for 5-level page table on AMD processors
 
 Generic:
 
 - MMU notifier invalidation callbacks do not take mmu_lock unless necessary
 
 - improved caching of LRU kvm_memory_slot
 
 - support for histogram statistics
 
 - add statistics for halt polling and remote TLB flush requests
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmE2CIAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMyqwf+Ky2WoThuQ9Ra0r/m8pUTAx5+gsAf
 MmG24rNLE+26X0xuBT9Q5+etYYRLrRTWJvo5cgHooz7muAYW6scR+ho5xzvLTAxi
 DAuoijkXsSdGoFCp0OMUHiwG3cgY5N7feTEwLPAb2i6xr/l6SZyCP4zcwiiQbJ2s
 UUD0i3rEoNQ02/hOEveud/ENxzUli9cmmgHKXR3kNgsJClSf1fcuLnhg+7EGMhK9
 +c2V+hde5y0gmEairQWm22MLMRolNZ5NL4kjykiNh2M5q9YvbHe5+f/JmENlNZMT
 bsUQT6Ry1ukuJ0V59rZvUw71KknPFzZ3d6HgW4pwytMq6EJKiISHzRbVnQ==
 =FCAB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...
2021-09-07 13:40:51 -07:00
Paolo Bonzini
e99314a340 KVM/arm64 updates for 5.15
- Page ownership tracking between host EL1 and EL2
 
 - Rely on userspace page tables to create large stage-2 mappings
 
 - Fix incompatibility between pKVM and kmemleak
 
 - Fix the PMU reset state, and improve the performance of the virtual PMU
 
 - Move over to the generic KVM entry code
 
 - Address PSCI reset issues w.r.t. save/restore
 
 - Preliminary rework for the upcoming pKVM fixed feature
 
 - A bunch of MM cleanups
 
 - a vGIC fix for timer spurious interrupts
 
 - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmEnfogPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDF9oQAINWHN1n30gsxcErMV8gH+XAyhDq2vTjkExQ
 Qz5ddo4R5zeVkj0nkunFSK+W3xYz+W97X3I+IaiiHvk5D6dUatj37IyYlazX5iFT
 7mbjTAqY7GRxfd6um7uK+CTRCApXY49GGkCVLGA5f+6mQ0JMVXaK9AKlsXKWUQLZ
 JvLasUgKkseN6IEJWmPDNBdIeiKBTZloeZMdlM2vSm34HsuirSS5LmshdzJQzSk8
 QSEqwXZX50afzJLNlB9Qa6V1tokjZVoYIBk0vAPO83tTh9HIyGL/PFAqBeq2rnWT
 M19fFFbx5vizap4ICbpviLmZ5AOywCoBmbPBT79eMAJ53rOqHUJhU1y/3DoiVzxu
 LJZI4wmGBQZVivOWOqyEZcNtTAagPLhyrLhMzYulBLwAjfFJmUHdSOxYtx+2Ysvr
 SDIPN31FKWrvifTXTqJHDmaaXusi2CNZUOPzVSe2I14SbX+ZX2ny9DltlbRgPNuc
 hGJagI5cZc0ngd4mAIzjjNmgBS2B+dSc8dOo71dRNJRLtQLiNHcAyQNJyFme+4xI
 NpvpkvzxBAs8rG2X0YIR/Cz3W3yZoCYuQNcoPk7+F/bUTK47VocQCS+gLucHVLbT
 H4286EV5n4nZ7E01oJ6uWnDnslPvrx9Sz2fxsrWYkBDR+xrz0EprrGsftFaILprz
 Ic43uXfd
 =LuHM
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.15

- Page ownership tracking between host EL1 and EL2

- Rely on userspace page tables to create large stage-2 mappings

- Fix incompatibility between pKVM and kmemleak

- Fix the PMU reset state, and improve the performance of the virtual PMU

- Move over to the generic KVM entry code

- Address PSCI reset issues w.r.t. save/restore

- Preliminary rework for the upcoming pKVM fixed feature

- A bunch of MM cleanups

- a vGIC fix for timer spurious interrupts

- Various cleanups
2021-09-06 06:34:48 -04:00
Linus Torvalds
58ca241587 Tracing updates for 5.15:
- Simplifying the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
 
  - bootconfig now can start histograms
 
  - bootconfig supports group/all enabling
 
  - histograms now can put values in linear size buckets
 
  - execnames can be passed to synthetic events
 
  - Introduction of "event probes" that attach to other events and
    can retrieve data from pointers of fields, or record fields
    as different types (a pointer to a string as a string instead
    of just a hex number)
 
  - Various fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYTJDixQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnPLAP9XviWrZD27uFj6LU/Vp2umbq8la1aC
 oW8o9itUGpLoHQD+OtsMpQXsWrxoNw/JD1OWCH4J0YN+TnZAUUG2E9e0twA=
 =OZXG
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT

 - bootconfig can now start histograms

 - bootconfig supports group/all enabling

 - histograms now can put values in linear size buckets

 - execnames can be passed to synthetic events

 - introduce "event probes" that attach to other events and can retrieve
   data from pointers of fields, or record fields as different types (a
   pointer to a string as a string instead of just a hex number)

 - various fixes and clean ups

* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
  tracing/doc: Fix table format in histogram code
  selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
  selftests/ftrace: Add selftest for testing eprobe events on synthetic events
  selftests/ftrace: Add test case to test adding and removing of event probe
  selftests/ftrace: Fix requirement check of README file
  selftests/ftrace: Add clear_dynamic_events() to test cases
  tracing: Add a probe that attaches to trace events
  tracing/probes: Reject events which have the same name of existing one
  tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
  tracing/probe: Change traceprobe_set_print_fmt() to take a type
  tracing/probes: Use struct_size() instead of defining custom macros
  tracing/probes: Allow for dot delimiter as well as slash for system names
  tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
  tracing: Have dynamic events have a ref counter
  tracing: Add DYNAMIC flag for dynamic events
  tracing: Replace deprecated CPU-hotplug functions.
  MAINTAINERS: Add an entry for os noise/latency
  tracepoint: Fix kerneldoc comments
  bootconfig/tracing/ktest: Update ktest example for boot-time tracing
  tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
  ...
2021-09-05 11:50:41 -07:00
Linus Torvalds
b250e6d141 Kbuild updates for v5.15
- Add -s option (strict mode) to merge_config.sh to make it fail when
    any symbol is redefined.
 
  - Show a warning if a different compiler is used for building external
    modules.
 
  - Infer --target from ARCH for CC=clang to let you cross-compile the
    kernel without CROSS_COMPILE.
 
  - Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
 
  - Add <linux/stdarg.h> to the kernel source instead of borrowing
    <stdarg.h> from the compiler.
 
  - Add Nick Desaulniers as a Kbuild reviewer.
 
  - Drop stale cc-option tests.
 
  - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
    to handle symbols in inline assembly.
 
  - Show a warning if 'FORCE' is missing for if_changed rules.
 
  - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmExXHoVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGAZwP/iHdEZzuQ4cz2uXUaV0fevj9jjPU
 zJ8wrrNabAiT6f5x861DsARQSR4OSt3zN0tyBNgZwUdotbe7ED5GegrgIUBMWlML
 QskhTEIZj7TexAX/20vx671gtzI3JzFg4c9BuriXCFRBvychSevdJPr65gMDOesL
 vOJnXe+SGXG2+fPWi/PxrcOItNRcveqo2GiWHT3g0Cv/DJUulu81gEkz3hrufnMR
 cjMeSkV0nJJcvI755OQBOUnEuigW64k4m2WxHPG24tU8cQOCqV6lqwOfNQBAn4+F
 OoaCMyPQT9gvGYwGExQMCXGg0wbUt1qnxzOVoA2qFCwbo+MFhqjBvPXab6VJm7CE
 mY3RrTtvxSqBdHI6EGcYeLjhycK9b+LLoJ1qc3S9FK8It6NoFFp4XV0R6ItPBls7
 mWi9VSpyI6k0AwLq+bGXEHvaX/bnnf/vfqn8H+w6mRZdXjFV8EB2DiOSRX/OqjVG
 RnvTtXzWWThLyXvWR3Jox4+7X6728oL7akLemoeZI6oTbJDm7dQgwpz5HbSyHXLh
 d+gUF3Y/6lqxT5N9GSVDxpD1bEMh2I7nGQ4M7WGbGas/3yUemF8wbBqGQo4a+YeD
 d9vGAUxDp2PQTtL2sjFo5Gd4PZEM9g7vwWzRvHe0o5NxKEXcBg25b8cD1hxrN9Y4
 Y1AAnc0kLO+My3PC
 =lw3M
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add -s option (strict mode) to merge_config.sh to make it fail when
   any symbol is redefined.

 - Show a warning if a different compiler is used for building external
   modules.

 - Infer --target from ARCH for CC=clang to let you cross-compile the
   kernel without CROSS_COMPILE.

 - Make the integrated assembler default (LLVM_IAS=1) for CC=clang.

 - Add <linux/stdarg.h> to the kernel source instead of borrowing
   <stdarg.h> from the compiler.

 - Add Nick Desaulniers as a Kbuild reviewer.

 - Drop stale cc-option tests.

 - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
   to handle symbols in inline assembly.

 - Show a warning if 'FORCE' is missing for if_changed rules.

 - Various cleanups

* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
  kbuild: redo fake deps at include/ksym/*.h
  kbuild: clean up objtool_args slightly
  modpost: get the *.mod file path more simply
  checkkconfigsymbols.py: Fix the '--ignore' option
  kbuild: merge vmlinux_link() between ARCH=um and other architectures
  kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
  kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
  kbuild: remove stale *.symversions
  kbuild: remove unused quiet_cmd_update_lto_symversions
  gen_compile_commands: extract compiler command from a series of commands
  x86: remove cc-option-yn test for -mtune=
  arc: replace cc-option-yn uses with cc-option
  s390: replace cc-option-yn uses with cc-option
  ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
  sparc: move the install rule to arch/sparc/Makefile
  security: remove unneeded subdir-$(CONFIG_...)
  kbuild: sh: remove unused install script
  kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
  kbuild: Switch to 'f' variants of integrated assembler flag
  kbuild: Shuffle blank line to improve comment meaning
  ...
2021-09-03 15:33:47 -07:00
Linus Torvalds
7cca308cfd powerpc updates for 5.15
- Convert pseries & powernv to use MSI IRQ domains.
 
  - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are
    given a CPU number on the same node as previously, when possible.
 
  - Add support for a new more flexible device-tree format for specifying NUMA distances.
 
  - Convert powerpc to GENERIC_PTDUMP.
 
  - Retire sbc8548 and sbc8641d board support.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater,
 Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R.
 Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo
 Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor,
 Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian
 Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEyHTYTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDo3D/9aXMVP2wsEMNB0XhTiJ1UUdi311Uq9
 PvkAaGZH14ZqZLVigeiD3gt6YzTH0cEuGj6qgwsJrPDjF8FESnMbBsprMLr5/qE1
 itWRGMAMCFaeTcB9ogYVJkzwg6RN2ZgIqoq4NVswNSXoAQGWb+1bvXq3RnXXNuGR
 TQmLL02poNC6nX0YbRaQoT1Xx4nfUTiKHhU+Aok9uOCMJIyYZVATR6Qafb7/j7tO
 UvjwOHztbu84lcJOGmSnw4LcmwNORLuP9IwR0r+O1M3ijEZqDo9TPkvtSz8HZwjU
 mxdJwhrUmN0euMcghuiFxW+1XG2eM49ugsdJugiezG2RaIijbIp0nAIvdeaKAgT1
 OSSwvWCQ0fkTPyLXE+O6tVqMhlUMdqQlRcyNwmN9svIip9VnwGNq3vA4ePlJm6Fi
 i0i/tLqVNlJwFokZ7blW5g8SRgGRuFfXd5XUYLFvy5Teez+/7b1mW95gPQZSJ8kV
 Tbx2e0nHAPX4hCAxJ1AB3/zTlnjY+4+WJ9bD5XdgXkeVE8PPh1BEkulhMi1R1OMj
 57D1W6OgsBu/Pze78wjAvwO8+NAb1T/2mv2Bd/LY6Q+7hNDqOOhuajyBTxbH41FG
 sqx5bKjKOwgTybfV9A0Eo0e4FQBX07yXltBFHaPlyA4sOsIhM59+PxNrEwN1eZrQ
 LVVsdBXg8pHxrw==
 =EbN0
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Convert pseries & powernv to use MSI IRQ domains.

 - Rework the pseries CPU numbering so that CPUs that are removed, and
   later re-added, are given a CPU number on the same node as
   previously, when possible.

 - Add support for a new more flexible device-tree format for specifying
   NUMA distances.

 - Convert powerpc to GENERIC_PTDUMP.

 - Retire sbc8548 and sbc8641d board support.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard,
Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas,
Fangrui Song, Finn Thain, Gautham R.  Shenoy, Hari Bathini, Joel
Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas
Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan
Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R.
Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan
Jiabing, Xiongwei Song, and Zheng Yongjun.

* tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits)
  powerpc/bug: Cast to unsigned long before passing to inline asm
  powerpc/ptdump: Fix generic ptdump for 64-bit
  KVM: PPC: Fix clearing never mapped TCEs in realmode
  powerpc/pseries/iommu: Rename "direct window" to "dma window"
  powerpc/pseries/iommu: Make use of DDW for indirect mapping
  powerpc/pseries/iommu: Find existing DDW with given property name
  powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  powerpc/pseries/iommu: Replace hard-coded page shift
  powerpc/numa: Update cpu_cpu_map on CPU online/offline
  powerpc/numa: Print debug statements only when required
  powerpc/numa: convert printk to pr_xxx
  powerpc/numa: Drop dbg in favour of pr_debug
  powerpc/smp: Enable CACHE domain for shared processor
  powerpc/smp: Update cpu_core_map on all PowerPc systems
  ...
2021-09-03 11:22:50 -07:00
Linus Torvalds
3de18c865f Merge branch 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
 "A new feature called restricted DMA pools. It allows SWIOTLB to
  utilize per-device (or per-platform) allocated memory pools instead of
  using the global one.

  The first big user of this is ARM Confidential Computing where the
  memory for DMA operations can be set per platform"

* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
  swiotlb: use depends on for DMA_RESTRICTED_POOL
  of: restricted dma: Don't fail device probe on rmem init failure
  of: Move of_dma_set_restricted_buffer() into device.c
  powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
  s390/pv: fix the forcing of the swiotlb
  swiotlb: Free tbl memory in swiotlb_exit()
  swiotlb: Emit diagnostic in swiotlb_exit()
  swiotlb: Convert io_default_tlb_mem to static allocation
  of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
  swiotlb: add overflow checks to swiotlb_bounce
  swiotlb: fix implicit debugfs declarations
  of: Add plumbing for restricted DMA pool
  dt-bindings: of: Add restricted DMA pool
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  ...
2021-09-03 10:34:44 -07:00
Linus Torvalds
14726903c8 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
2021-09-03 10:08:28 -07:00
Suren Baghdasaryan
dce4910396 mm: wire up syscall process_mrelease
Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20210809185259.405936-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:17 -07:00
Ohhoon Kwon
fc1f5e980a mm: sparse: pass section_nr to find_memory_block
With CONFIG_SPARSEMEM_EXTREME enabled, __section_nr() which converts
mem_section to section_nr could be costly since it iterates all section
roots to check if the given mem_section is in its range.

On the other hand, __nr_to_section() which converts section_nr to
mem_section can be done in O(1).

Let's pass section_nr instead of mem_section ptr to find_memory_block() in
order to reduce needless iterations.

Link: https://lkml.kernel.org/r/20210707150212.855-3-ohoono.kwon@samsung.com
Signed-off-by: Ohhoon Kwon <ohoono.kwon@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:14 -07:00
Michael Ellerman
a3314262ee Merge branch 'fixes' into next
Merge our fixes branch into next.

That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c.

Between cbc06f051c ("powerpc/xive: Do not skip CPU-less nodes when
creating the IPIs"), which moved request_irq() out of xive_init_ipis(),
and 17df41fec5 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added
IRQF_NO_DEBUG to that request_irq() call, which has now moved.
2021-09-03 22:54:12 +10:00
Linus Torvalds
4a3bb4200a dma-mapping updates for Linux 5.15
- fix debugfs initialization order (Anthony Iliopoulos)
  - use memory_intersects() directly (Kefeng Wang)
  - allow to return specific errors from ->map_sg
    (Logan Gunthorpe, Martin Oliveira)
  - turn the dma_map_sg return value into an unsigned int (me)
  - provide a common global coherent pool іmplementation (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmEvY+8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPaehAAsgnBzzzoLHO83pgs0KL92c+0DiMNHYmaMCJOvZXk
 x2Irv+O74WikRJc4S7uQ26p2spjmUxjmiOjld+8+NN0liD4QO9BQ/SZpIp8emuKS
 /yPG6Xh86xSl/OrPL1y7kGeHkRi5sm3mRhcTdILFQFPLcSReupe++GRfnvrpbOPk
 tj3pBGXluD6iJH12BBt00ushUVzZ0F2xaF6xUDAs94RSZ3tlqsfx6c928Y1KxSZh
 f89q/KuaokyogFG7Ujj/nYgIUETaIs2W6UmxBfRzdEMJFSffwomUMbw+M+qGJ7/d
 2UjamFYRX16FReE8WNsndbX1E6k5JBW12E1qwV3dUwatlNLWEaRq3PNiWkF7zcFH
 LDkpDYN6s5bIDPTfDp21XfPygoH8KQhnD9lVf0aB7n04uu8VJrGB9+10PpkCJVXD
 0b2dcuSwCO7hAfTfNGVV8f3EI/1XPflr1hJvMgcVtY53CR96ldp+4QaElzWLXumN
 MyptirmrVITNVyVwGzhGAblXBLWdarXD0EXudyiaF4Xbrj3AkIOSUCghEwKLpjQf
 UwMFFwSE8yGxKTRK4HfU5gMzy6G751fU7TUe5lmxZLovDflQoSXMWgHE8e7r0Qel
 o5v6lmUzoWz2fAISf3xjauo2ncgmfWMwYM6C7OJy5nG73QXLQId9J+ReXbmrgrrN
 DgI=
 =spje
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - fix debugfs initialization order (Anthony Iliopoulos)

 - use memory_intersects() directly (Kefeng Wang)

 - allow to return specific errors from ->map_sg (Logan Gunthorpe,
   Martin Oliveira)

 - turn the dma_map_sg return value into an unsigned int (me)

 - provide a common global coherent pool іmplementation (me)

* tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits)
  hexagon: use the generic global coherent pool
  dma-mapping: make the global coherent pool conditional
  dma-mapping: add a dma_init_global_coherent helper
  dma-mapping: simplify dma_init_coherent_memory
  dma-mapping: allow using the global coherent pool for !ARM
  ARM/nommu: use the generic dma-direct code for non-coherent devices
  dma-direct: add support for dma_coherent_default_memory
  dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
  dma-mapping: disallow .map_sg operations from returning zero on error
  dma-mapping: return error code from dma_dummy_map_sg()
  x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR
  x86/amd_gart: return error code from gart_map_sg()
  xen: swiotlb: return error code from xen_swiotlb_map_sg()
  parisc: return error code from .map_sg() ops
  sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  sparc/iommu: return error codes from .map_sg() ops
  s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
  s390/pci: return error code from s390_dma_map_sg()
  powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  powerpc/iommu: return error code from .map_sg() ops
  ...
2021-09-02 10:32:06 -07:00
Linus Torvalds
df43d90382 printk changes for 5.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmEt+hwACgkQUqAMR0iA
 lPLppBAAiyrUNVmqqtdww+IJajEs1uD/4FqPsysHRwroHBFymJeQG1XCwUpDZ7jj
 6gXT0chxyjQE18gT/W9nf+PSmA9XvIVA1WSR+WCECTNW3YoZXqtgwiHfgnitXYku
 HlmoZLthYeuoXWw2wn+hVLfTRh6VcPHYEaC21jXrs6B1pOXHbvjJ5eTLHlX9oCfL
 UKSK+jFTHAJcn/GskRzviBe0Hpe8fqnkRol2XX13ltxqtQ73MjaGNu7imEH6/Pa7
 /MHXWtuWJtOvuYz17aztQP4Qwh1xy+kakMy3aHucdlxRBTP4PTzzTuQI3L/RYi6l
 +ttD7OHdRwqFAauBLY3bq3uJjYb5v/64ofd8DNnT2CJvtznY8wrPbTdFoSdPcL2Q
 69/opRWHcUwbU/Gt4WLtyQf3Mk0vepgMbbVg1B5SSy55atRZaXMrA2QJ/JeawZTB
 KK6D/mE7ccze/YFzsySunCUVKCm0veoNxEAcakCCZKXSbsvd1MYcIRC0e+2cv6e5
 2NEH7gL4dD+5tqu5nzvIuKDn3NrDQpbi28iUBoFbkxRgcVyvHJ9AGSa62wtb5h3D
 OgkqQMdVKBbjYNeUodPlQPzmXZDasytavyd0/BC/KENOcBvU/8gW++2UZTfsh/1A
 dLjgwFBdyJncQcCS9Abn20/EKntbIMEX8NLa97XWkA3fuzMKtak=
 =yEVq
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Optionally, provide an index of possible printk messages via
   <debugfs>/printk/index/. It can be used when monitoring important
   kernel messages on a farm of various hosts. The monitor has to be
   updated when some messages has changed or are not longer available by
   a newly deployed kernel.

 - Add printk.console_no_auto_verbose boot parameter. It allows to
   generate crash dump even with slow consoles in a reasonable time
   frame.

 - Remove printk_safe buffers. The messages are always stored directly
   to the main logbuffer, even in NMI or recursive context. Also it
   allows to serialize syslog operations by a mutex instead of a spin
   lock.

 - Misc clean up and build fixes.

* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/index: Fix -Wunused-function warning
  lib/nmi_backtrace: Serialize even messages about idle CPUs
  printk: Add printk.console_no_auto_verbose boot parameter
  printk: Remove console_silent()
  lib/test_scanf: Handle n_bits == 0 in random tests
  printk: syslog: close window between wait and read
  printk: convert @syslog_lock to mutex
  printk: remove NMI tracking
  printk: remove safe buffers
  printk: track/limit recursion
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: Move the printk() kerneldoc comment to its new home
  printk/index: Fix warning about missing prototypes
  MIPS/asm/printk: Fix build failure caused by printk
  printk: index: Add indexing support to dev_printk
  printk: Userspace format indexing support
  printk: Rework parse_prefix into printk_parse_prefix
  printk: Straighten out log_flags into printk_info_flags
  string_helpers: Escape double quotes in escape_special
  printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01 18:41:13 -07:00
Linus Torvalds
4cdc4cc2ad asm-generic changes for 5.15
The main content for 5.15 is a series that cleans up the handling of
 strncpy_from_user() and strnlen_user(), removing a lot of slightly
 incorrect versions of these in favor of the lib/strn*.c helpers
 that implement these correctly and more efficiently.
 
 The only architectures that retain a private version now are
 mips, ia64, um and parisc. I had offered to convert those at all,
 but Thomas Bogendoerfer wanted to keep the mips version for the
 moment until he had a chance to do regression testing.
 
 The branch also contains two patches for bitops and for ffs().
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAYS82fGCrR//JCVInAQL9AxAAruOge7r8vzXQC8ehR4iw4/pCyzsLWdjh
 bLvTCovhD6y1KXb0cU3qMI2SUESwy/w9YteyLs4Edh5Yhm9uWIXz2WO6zTNDuW1g
 eNd6lcmoOLOXFxCUX3TZqvnxaEEiedjEJjOTicTBRv8c79Kw+2DTFYEwi8MIWlbx
 gGdGLOJ2SORl6HeE+wn8bfMPCChisMod75koi+Vnp3kp9+aw8VIi0RVMjtZ4HI3v
 z9H0DD0jDAy1eaXnC2+dsaIyrAq8/Lo/pqVBvUJRoBFaV/FHvNH2M0yl15yJYx1V
 1KNJlBhoedc0PiMO9OnsRS1GMq1kEeo+u9gJPqphZQWooAQotD5C0sXsPnsghGo0
 IrsVANy4H0k2h0AazRZd3KwV03aJ6FWHz3qyvbglLAQjKU1MgZTgroF5Q6R2FMtV
 /VtswpGB707+oGtmFvHc1lVgRYZTfduGT1jjBgwUuTUmLhI3/yRIlnodd6dXneX6
 FOK3WbxlhUuIaSZLObLved/yNBgoOajP3vHIUc4c9HrsPEvkjKPB1g/VpbqqWVXe
 vF5/MeUN+b3Rq+h1GnnZQmhiOPIydZmK3qK7zYzp5Da+Ke4I2zWv/Et0/eFSZmh8
 rS/cNMLshSOKMbaPvdopUnWhLspUh82wWDNjDFJx2XNlStVpFkMikKtSY4TrtbV+
 zzHxZpLyQxc=
 =NB0a
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "The main content for 5.15 is a series that cleans up the handling of
  strncpy_from_user() and strnlen_user(), removing a lot of slightly
  incorrect versions of these in favor of the lib/strn*.c helpers that
  implement these correctly and more efficiently.

  The only architectures that retain a private version now are mips,
  ia64, um and parisc. I had offered to convert those at all, but Thomas
  Bogendoerfer wanted to keep the mips version for the moment until he
  had a chance to do regression testing.

  The branch also contains two patches for bitops and for ffs()"

* tag 'asm-generic-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  bitops/non-atomic: make @nr unsigned to avoid any DIV
  asm-generic: ffs: Drop bogus reference to ffz location
  asm-generic: reverse GENERIC_{STRNCPY_FROM,STRNLEN}_USER symbols
  asm-generic: remove extra strn{cpy_from,len}_user declarations
  asm-generic: uaccess: remove inline strncpy_from_user/strnlen_user
  s390: use generic strncpy/strnlen from_user
  microblaze: use generic strncpy/strnlen from_user
  csky: use generic strncpy/strnlen from_user
  arc: use generic strncpy/strnlen from_user
  hexagon: use generic strncpy/strnlen from_user
  h8300: remove stale strncpy_from_user
  asm-generic/uaccess.h: remove __strncpy_from_user/__strnlen_user
2021-09-01 15:13:02 -07:00
Linus Torvalds
bcfeebbff3 Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
 "In preparation of doing something about PTRACE_EVENT_EXIT I have
  started cleaning up various pieces of code related to do_exit. Most of
  that code I did not manage to get tested and reviewed before the merge
  window opened but a handful of very useful cleanups are ready to be
  merged.

  The first change is simply the removal of the bdflush system call. The
  code has now been disabled long enough that even the oldest userspace
  working userspace setups anyone can find to test are fine with the
  bdflush system call being removed.

  Changing m68k fsp040_die to use force_sigsegv(SIGSEGV) instead of
  calling do_exit directly is interesting only in that it is nearly the
  most difficult of the incorrect uses of do_exit to remove.

  The change to the seccomp code to simply send a signal instead of
  calling do_coredump directly is a very nice little cleanup made
  possible by realizing the existing signal sending helpers were missing
  a little bit of functionality that is easy to provide"

* 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal/seccomp: Dump core when there is only one live thread
  signal/seccomp: Refactor seccomp signal and coredump generation
  signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die
  exit/bdflush: Remove the deprecated bdflush system call
2021-09-01 14:52:05 -07:00
Linus Torvalds
c6c3c5704b Driver core update for 5.15-rc1
Here is the big set of driver core patches for 5.15-rc1.
 
 These do change a number of different things across different
 subsystems, and because of that, there were 2 stable tags created that
 might have already come into your tree from different pulls that did the
 following
 	- changed the bus remove callback to return void
 	- sysfs iomem_get_mapping rework
 
 The latter one will cause a tiny merge issue with your tree, as there
 was a last-minute fix for this in 5.14 in your tree, but the fixup
 should be "obvious".  If you want me to provide a fixed merge for this,
 please let me know.
 
 Other than those two things, there's only a few small things in here:
 	- kernfs performance improvements for huge numbers of sysfs
 	  users at once
 	- tiny api cleanups
 	- other minor changes
 
 All of these have been in linux-next for a while with no reported
 problems, other than the before-mentioned merge issue.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYS+FLQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylXuACfWECnysDtXNe66DdETCFs1a1RToYAoMokWeU5
 s8VFP1NY2BjmxJbkebLL
 =8kVu
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core patches for 5.15-rc1.

  These do change a number of different things across different
  subsystems, and because of that, there were 2 stable tags created that
  might have already come into your tree from different pulls that did
  the following

   - changed the bus remove callback to return void

   - sysfs iomem_get_mapping rework

  Other than those two things, there's only a few small things in here:

   - kernfs performance improvements for huge numbers of sysfs users at
     once

   - tiny api cleanups

   - other minor changes

  All of these have been in linux-next for a while with no reported
  problems, other than the before-mentioned merge issue"

* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
  MAINTAINERS: Add dri-devel for component.[hc]
  driver core: platform: Remove platform_device_add_properties()
  ARM: tegra: paz00: Handle device properties with software node API
  bitmap: extend comment to bitmap_print_bitmask/list_to_buf
  drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
  topology: use bin_attribute to break the size limitation of cpumap ABI
  lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
  cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
  sysfs: Rename struct bin_attribute member to f_mapping
  sysfs: Invoke iomem_get_mapping() from the sysfs open callback
  debugfs: Return error during {full/open}_proxy_open() on rmmod
  zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
  zorro: Simplify remove callback
  sh: superhyway: Simplify check in remove callback
  nubus: Simplify check in remove callback
  nubus: Make struct nubus_driver::remove return void
  kernfs: dont call d_splice_alias() under kernfs node lock
  kernfs: use i_lock to protect concurrent inode updates
  kernfs: switch kernfs to use an rwsem
  kernfs: use VFS negative dentry caching
  ...
2021-09-01 08:44:42 -07:00
Michael Ellerman
e432fe97f3 powerpc/bug: Cast to unsigned long before passing to inline asm
In commit 1e688dd2a3 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
it would take the warning condition, x, and double negate it before
converting the result to int, and passing that int to the underlying
inline asm. ie:

  #define WARN_ON(x) ({
  	int __ret_warn_on = !!(x);
  	if (__builtin_constant_p(__ret_warn_on)) {
  	...
  	} else {
  		BUG_ENTRY(PPC_TLNEI " %4, 0",
  			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
  			  "r" (__ret_warn_on));

The asm then does a full register width comparison with zero and traps
if it is non-zero (PPC_TLNEI).

The new code instead passes the full expression, x, with some arbitrary
type, to the inline asm:

  #define WARN_ON(x) ({
	...
	do {
		if (__builtin_constant_p((x))) {
		...
		} else {
			...
			WARN_ENTRY(PPC_TLNEI " %4, 0",
				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
				   __label_warn_on, "r" (x));

As reported[1] by Nathan, when building with clang this can cause
spurious warnings to fire repeatedly at boot:

  WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
  NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
  REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
  MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
  CFAR: c00000000090a034 IRQMASK: 0
  GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
  GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
  GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
  GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
  GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
  NIP .klist_add_tail+0x3c/0x110
  LR  .bus_add_driver+0x148/0x290
  Call Trace:
    0xc0000000073c35d0 (unreliable)
    .bus_add_driver+0x148/0x290
    .driver_register+0xb8/0x190
    .__hid_register_driver+0x70/0xd0
    .redragon_driver_init+0x34/0x58
    .do_one_initcall+0x130/0x3b0
    .do_initcall_level+0xd8/0x188
    .do_initcalls+0x7c/0xdc
    .kernel_init_freeable+0x178/0x21c
    .kernel_init+0x34/0x220
    .ret_from_kernel_thread+0x58/0x60
  Instruction dump:
  fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
  fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024

The instruction dump shows that we are trapping because r30 is not zero:
  tdnei   r30,0

Where r30 = c000000007de72c8

The WARN_ON() comes from:

  static void knode_set_klist(struct klist_node *knode, struct klist *klist)
  {
  	knode->n_klist = klist;
  	/* no knode deserves to start its life dead */
  	WARN_ON(knode_dead(knode));
  		      ^^^^^^^^^^^^^^^^^

Where:

  #define KNODE_DEAD		1LU

  static bool knode_dead(struct klist_node *knode)
  {
  	return (unsigned long)knode->n_klist & KNODE_DEAD;
  }

The full disassembly shows that clang has not generated any code to
apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.

Nathan filed an LLVM bug [2], in which Eli Friedman explained that clang
believes it is only passing a single bit to the asm (ie. a bool) and so
the mask of bit 0 with 1 can be omitted, and suggested that if we want
the full 64-bit value passed to the inline asm we should cast to a
64-bit type (or 32-bit on 32-bits).

In fact we already do that for BUG_ENTRY(), which was added to fix a
possibly similar bug in 2005 in commit 32818c2eb6 ("[PATCH] ppc64: Fix
issue with gcc 4.0 compiled kernels").

So cast the value we pass to the inline asm to long.

For GCC this appears to have no effect on code generation, other than
causing sign extension in some cases.

[1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
[2]: https://bugs.llvm.org/show_bug.cgi?id=51634

Fixes: 1e688dd2a3 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901112522.1085134-1-mpe@ellerman.id.au
2021-09-01 21:25:43 +10:00
Michael Ellerman
b14b8b1ed0 powerpc/ptdump: Fix generic ptdump for 64-bit
Since the conversion to generic ptdump we see crashes on 64-bit:

  BUG: Unable to handle kernel data access on read at 0xc0eeff7f00000000
  Faulting instruction address: 0xc00000000045e5fc
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP __walk_page_range+0x2bc/0xce0
  LR  __walk_page_range+0x240/0xce0
  Call Trace:
    __walk_page_range+0x240/0xce0 (unreliable)
    walk_page_range_novma+0x74/0xb0
    ptdump_walk_pgd+0x98/0x170
    ptdump_check_wx+0x88/0xd0
    mark_rodata_ro+0x48/0x80
    kernel_init+0x74/0x1a0
    ret_from_kernel_thread+0x5c/0x64

What's happening is that have walked off the end of the kernel page
tables, and started dereferencing junk values.

That happens because we initialised the ptdump_range to span all the way
up to 0xffffffffffffffff:

static struct ptdump_range ptdump_range[] __ro_after_init = {
	{TASK_SIZE_MAX, ~0UL},

But the kernel page tables don't span that far. So on 64-bit set the end
of the range to be the address immediately past the end of the kernel
page tables, to limit the page table walk to valid addresses.

Fixes: e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210831135151.886620-1-mpe@ellerman.id.au
2021-09-01 16:52:53 +10:00
Linus Torvalds
9e9fb7655e Core:
- Enable memcg accounting for various networking objects.
 
 BPF:
 
  - Introduce bpf timers.
 
  - Add perf link and opaque bpf_cookie which the program can read
    out again, to be used in libbpf-based USDT library.
 
  - Add bpf_task_pt_regs() helper to access user space pt_regs
    in kprobes, to help user space stack unwinding.
 
  - Add support for UNIX sockets for BPF sockmap.
 
  - Extend BPF iterator support for UNIX domain sockets.
 
  - Allow BPF TCP congestion control progs and bpf iterators to call
    bpf_setsockopt(), e.g. to switch to another congestion control
    algorithm.
 
 Protocols:
 
  - Support IOAM Pre-allocated Trace with IPv6.
 
  - Support Management Component Transport Protocol.
 
  - bridge: multicast: add vlan support.
 
  - netfilter: add hooks for the SRv6 lightweight tunnel driver.
 
  - tcp:
     - enable mid-stream window clamping (by user space or BPF)
     - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
     - more accurate DSACK processing for RACK-TLP
 
  - mptcp:
     - add full mesh path manager option
     - add partial support for MP_FAIL
     - improve use of backup subflows
     - optimize option processing
 
  - af_unix: add OOB notification support.
 
  - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by
          the router.
 
  - mac80211: Target Wake Time support in AP mode.
 
  - can: j1939: extend UAPI to notify about RX status.
 
 Driver APIs:
 
  - Add page frag support in page pool API.
 
  - Many improvements to the DSA (distributed switch) APIs.
 
  - ethtool: extend IRQ coalesce uAPI with timer reset modes.
 
  - devlink: control which auxiliary devices are created.
 
  - Support CAN PHYs via the generic PHY subsystem.
 
  - Proper cross-chip support for tag_8021q.
 
  - Allow TX forwarding for the software bridge data path to be
    offloaded to capable devices.
 
 Drivers:
 
  - veth: more flexible channels number configuration.
 
  - openvswitch: introduce per-cpu upcall dispatch.
 
  - Add internet mix (IMIX) mode to pktgen.
 
  - Transparently handle XDP operations in the bonding driver.
 
  - Add LiteETH network driver.
 
  - Renesas (ravb):
    - support Gigabit Ethernet IP
 
  - NXP Ethernet switch (sja1105)
    - fast aging support
    - support for "H" switch topologies
    - traffic termination for ports under VLAN-aware bridge
 
  - Intel 1G Ethernet
     - support getcrosststamp() with PCIe PTM (Precision Time
       Measurement) for better time sync
     - support Credit-Based Shaper (CBS) offload, enabling HW traffic
       prioritization and bandwidth reservation
 
  - Broadcom Ethernet (bnxt)
     - support pulse-per-second output
     - support larger Rx rings
 
  - Mellanox Ethernet (mlx5)
     - support ethtool RSS contexts and MQPRIO channel mode
     - support LAG offload with bridging
     - support devlink rate limit API
     - support packet sampling on tunnels
 
  - Huawei Ethernet (hns3):
     - basic devlink support
     - add extended IRQ coalescing support
     - report extended link state
 
  - Netronome Ethernet (nfp):
     - add conntrack offload support
 
  - Broadcom WiFi (brcmfmac):
     - add WPA3 Personal with FT to supported cipher suites
     - support 43752 SDIO device
 
  - Intel WiFi (iwlwifi):
     - support scanning hidden 6GHz networks
     - support for a new hardware family (Bz)
 
  - Xen pv driver:
     - harden netfront against malicious backends
 
  - Qualcomm mobile
     - ipa: refactor power management and enable automatic suspend
     - mhi: move MBIM to WWAN subsystem interfaces
 
 Refactor:
 
  - Ambient BPF run context and cgroup storage cleanup.
 
  - Compat rework for ndo_ioctl.
 
 Old code removal:
 
  - prism54 remove the obsoleted driver, deprecated by the p54 driver.
 
  - wan: remove sbni/granch driver.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEukBYACgkQMUZtbf5S
 IrsyHA//TO8dw18NYts4n9LmlJT2naJ7yBUUSSXK/M+DtW0MQ9nnHhqzPm5uJdRl
 IgQTNJrW3dYzRwgqaWZqEwO1t5/FI+f87ND1Nsekg7x9tF66a6ov5WxU26TwwSba
 U+si/inQ/4chuQ+LxMQobqCDxaLE46I2dIoRl+YfndJ24DRzYSwAEYIPPbSdfyU+
 +/l+3s4GaxO4k/hLciPAiOniyxLoUNiGUTNh+2yqRBXelSRJRKVnl+V22ANFrxRW
 nTEiplfVKhlPU1e4iLuRtaxDDiePHhw9I3j/lMHhfeFU2P/gKJIvz4QpGV0CAZg2
 1VvDU32WEx1GQLXJbKm0KwoNRUq1QSjOyyFti+BO7ugGaYAR4gKhShOqlSYLzUtB
 tbtzQhSNLWOGqgmSJOztZb5kFDm2EdRSll5/lP2uyFlPkIsIp0QbscJVzNTnS74b
 Xz15ZOw41Z4TfWPEMWgfrx6Zkm7pPWkly+7WfUkPcHa1gftNz6tzXXxSXcXIBPdi
 yQ5JCzzxrM5573YHuk5YedwZpn6PiAt4A/muFGk9C6aXP60TQAOS/ppaUzZdnk4D
 NfOk9mj06WEULjYjPcKEuT3GGWE6kmjb8Pu0QZWKOchv7vr6oZly1EkVZqYlXELP
 AfhcrFeuufie8mqm0jdb4LnYaAnqyLzlb1J4Zxh9F+/IX7G3yoc=
 =JDGD
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Enable memcg accounting for various networking objects.

  BPF:

   - Introduce bpf timers.

   - Add perf link and opaque bpf_cookie which the program can read out
     again, to be used in libbpf-based USDT library.

   - Add bpf_task_pt_regs() helper to access user space pt_regs in
     kprobes, to help user space stack unwinding.

   - Add support for UNIX sockets for BPF sockmap.

   - Extend BPF iterator support for UNIX domain sockets.

   - Allow BPF TCP congestion control progs and bpf iterators to call
     bpf_setsockopt(), e.g. to switch to another congestion control
     algorithm.

  Protocols:

   - Support IOAM Pre-allocated Trace with IPv6.

   - Support Management Component Transport Protocol.

   - bridge: multicast: add vlan support.

   - netfilter: add hooks for the SRv6 lightweight tunnel driver.

   - tcp:
       - enable mid-stream window clamping (by user space or BPF)
       - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
       - more accurate DSACK processing for RACK-TLP

   - mptcp:
       - add full mesh path manager option
       - add partial support for MP_FAIL
       - improve use of backup subflows
       - optimize option processing

   - af_unix: add OOB notification support.

   - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
     router.

   - mac80211: Target Wake Time support in AP mode.

   - can: j1939: extend UAPI to notify about RX status.

  Driver APIs:

   - Add page frag support in page pool API.

   - Many improvements to the DSA (distributed switch) APIs.

   - ethtool: extend IRQ coalesce uAPI with timer reset modes.

   - devlink: control which auxiliary devices are created.

   - Support CAN PHYs via the generic PHY subsystem.

   - Proper cross-chip support for tag_8021q.

   - Allow TX forwarding for the software bridge data path to be
     offloaded to capable devices.

  Drivers:

   - veth: more flexible channels number configuration.

   - openvswitch: introduce per-cpu upcall dispatch.

   - Add internet mix (IMIX) mode to pktgen.

   - Transparently handle XDP operations in the bonding driver.

   - Add LiteETH network driver.

   - Renesas (ravb):
       - support Gigabit Ethernet IP

   - NXP Ethernet switch (sja1105):
       - fast aging support
       - support for "H" switch topologies
       - traffic termination for ports under VLAN-aware bridge

   - Intel 1G Ethernet
       - support getcrosststamp() with PCIe PTM (Precision Time
         Measurement) for better time sync
       - support Credit-Based Shaper (CBS) offload, enabling HW traffic
         prioritization and bandwidth reservation

   - Broadcom Ethernet (bnxt)
       - support pulse-per-second output
       - support larger Rx rings

   - Mellanox Ethernet (mlx5)
       - support ethtool RSS contexts and MQPRIO channel mode
       - support LAG offload with bridging
       - support devlink rate limit API
       - support packet sampling on tunnels

   - Huawei Ethernet (hns3):
       - basic devlink support
       - add extended IRQ coalescing support
       - report extended link state

   - Netronome Ethernet (nfp):
       - add conntrack offload support

   - Broadcom WiFi (brcmfmac):
       - add WPA3 Personal with FT to supported cipher suites
       - support 43752 SDIO device

   - Intel WiFi (iwlwifi):
       - support scanning hidden 6GHz networks
       - support for a new hardware family (Bz)

   - Xen pv driver:
       - harden netfront against malicious backends

   - Qualcomm mobile
       - ipa: refactor power management and enable automatic suspend
       - mhi: move MBIM to WWAN subsystem interfaces

  Refactor:

   - Ambient BPF run context and cgroup storage cleanup.

   - Compat rework for ndo_ioctl.

  Old code removal:

   - prism54 remove the obsoleted driver, deprecated by the p54 driver.

   - wan: remove sbni/granch driver"

* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
  net: Add depends on OF_NET for LiteX's LiteETH
  ipv6: seg6: remove duplicated include
  net: hns3: remove unnecessary spaces
  net: hns3: add some required spaces
  net: hns3: clean up a type mismatch warning
  net: hns3: refine function hns3_set_default_feature()
  ipv6: remove duplicated 'net/lwtunnel.h' include
  net: w5100: check return value after calling platform_get_resource()
  net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
  net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
  net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
  fou: remove sparse errors
  ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
  octeontx2-af: Set proper errorcode for IPv4 checksum errors
  octeontx2-af: Fix static code analyzer reported issues
  octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
  octeontx2-af: Fix loop in free and unmap counter
  af_unix: fix potential NULL deref in unix_dgram_connect()
  dpaa2-eth: Replace strlcpy with strscpy
  octeontx2-af: Use NDC TX for transmit packet data
  ...
2021-08-31 16:43:06 -07:00
Linus Torvalds
7d6e3fa87e Updates to the interrupt core and driver subsystems:
Core changes:
 
    - The usual set of small fixes and improvements all over the place, but nothing
      outstanding
 
 MSI changes:
 
    - Further consolidation of the PCI/MSI interrupt chip code
 
    - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts
      of platform devices in the same way as PCI exposes them.
 
 Driver changes:
 
    - Support for ARM GICv3 EPPI partitions
 
    - Treewide conversion to generic_handle_domain_irq() for all chained
      interrupt controllers
 
    - Conversion to bitmap_zalloc() throughout the irq chip drivers
 
    - The usual set of small fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsnpsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoS+/EACQdpRkzl3IDIYqThxVZ8KQzp2rKKVn
 qisAQiWg/6koNJx/yYy62KNAUyKjCIObNtRnWi7OAOx6OvNtQTD2WOLAwkh3Pgw1
 8ePYYl55k+yCs8VoITsZM9jYeO+Tk878pU2A6R943zR+g6G7bskGJrxEyZ9TbzIe
 qKfusNKnRY9/jMQaRALUAAtA9VIVR867GqORX5X8hKz8yE2rqlpb4y+1CFba5BTV
 Vlxw7cIXvXBn7BKAom5diRqEGDNJEbX+56jJ7yDZshgLo7m11D7QLw72kmb6TNVC
 g7PchvFi4afpc1ifEAAp0tk4RiSIAQ91nS3n0+jLcLbodOjIkl14eY02ZCJGAP29
 uslyzUbmy1wgejG6CA63JtZ4MYdrf/OSMGuoN78qnOKYcIsWFzOvlJmBWWNW34qW
 LCaUF9QdJ/slXu6B4vIx30GfN9q4myml8bFUobE5q9mBRrEk4R0B7iyBvPu1xKYr
 ZEan67prI5VEu+afJGpp4r294m4HNVkMLfl3nYmE5+y4MoLeMNKDY3IPTvI9iP4G
 kaFgoPvQo23WnuclNYpJ+CaA4aRASlB2nTY+oAXIYfehbey9EW5vq4/EK864ek6w
 oyUTepxxNhE81tG2jpQbf2tR4COsEHy986clxqPP4AvsZXcbypCw8O2FcflpQbHO
 5DLEAfTmp7cziQ==
 =qyll
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates to the interrupt core and driver subsystems:

  Core changes:

   - The usual set of small fixes and improvements all over the place,
     but nothing stands out

  MSI changes:

   - Further consolidation of the PCI/MSI interrupt chip code

   - Make MSI sysfs code independent of PCI/MSI and expose the MSI
     interrupts of platform devices in the same way as PCI exposes them.

  Driver changes:

   - Support for ARM GICv3 EPPI partitions

   - Treewide conversion to generic_handle_domain_irq() for all chained
     interrupt controllers

   - Conversion to bitmap_zalloc() throughout the irq chip drivers

   - The usual set of small fixes and improvements"

* tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  platform-msi: Add ABI to show msi_irqs of platform devices
  genirq/msi: Move MSI sysfs handling from PCI to MSI core
  genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
  irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy
  irqdomain: Export irq_domain_disconnect_hierarchy()
  irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
  irqchip/apple-aic: Fix irq_disable from within irq handlers
  pinctrl/rockchip: drop the gpio related codes
  gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type
  gpio/rockchip: support next version gpio controller
  gpio/rockchip: use struct rockchip_gpio_regs for gpio controller
  gpio/rockchip: add driver for rockchip gpio
  dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank
  pinctrl/rockchip: add pinctrl device to gpio bank struct
  pinctrl/rockchip: separate struct rockchip_pin_bank to a head file
  pinctrl/rockchip: always enable clock for gpio controller
  genirq: Fix kernel doc indentation
  EDAC/altera: Convert to generic_handle_domain_irq()
  powerpc: Bulk conversion to generic_handle_domain_irq()
  nios2: Bulk conversion to generic_handle_domain_irq()
  ...
2021-08-30 14:38:37 -07:00
Petr Mladek
c985aafb60 Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
Alexey Kardashevskiy
1d78dfde33 KVM: PPC: Fix clearing never mapped TCEs in realmode
Since commit e1a1ef84cd ("KVM: PPC: Book3S: Allocate guest TCEs on
demand too"), pages for TCE tables for KVM guests are allocated only
when needed. This allows skipping any update when clearing TCEs. This
works mostly fine as TCE updates are handled when the MMU is enabled.
The realmode handlers fail with H_TOO_HARD when pages are not yet
allocated, except when clearing a TCE in which case KVM prints a warning
and proceeds to dereference a NULL pointer, which crashes the host OS.

This has not been caught so far as the change in commit e1a1ef84cd is
reasonably new, and POWER9 runs mostly radix which does not use realmode
handlers. With hash, the default TCE table is memset() by QEMU when the
machine is reset which triggers page faults and the KVM TCE device's
kvm_spapr_tce_fault() handles those with MMU on. And the huge DMA
windows are not cleared by VMs which instead successfully create a DMA
window big enough to map the VM memory 1:1 and then VMs just map
everything without clearing.

This started crashing now as commit 381ceda88c ("powerpc/pseries/iommu:
Make use of DDW for indirect mapping") added a mode when a dymanic DMA
window not big enough to map the VM memory 1:1 but it is used anyway,
and the VM now is the first (i.e. not QEMU) to clear a just created
table. Note that upstream QEMU needs to be modified to trigger the VM to
trigger the host OS crash.

This replaces WARN_ON_ONCE_RM() with a check and return, and adds
another warning if TCE is not being cleared.

Fixes: e1a1ef84cd ("KVM: PPC: Book3S: Allocate guest TCEs on demand too")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210827040706.517652-1-aik@ozlabs.ru
2021-08-30 13:51:43 +10:00
Thomas Gleixner
47fb0cfdb7 irqchip updates for Linux 5.15
API updates:
 
 - Treewide conversion to generic_handle_domain_irq() for anything
   that looks like a chained interrupt controller
 
 - Update the irqdomain documentation
 
 - Use of bitmap_zalloc() throughout the tree
 
 New functionalities:
 
 - Support for GICv3 EPPI partitions
 
 Fixes:
 
 - Qualcomm PDC hierarchy fixes
 
 - Yet another priority decoding fix for the GICv3 pseudo-NMIs
 
 - Fix the apple-aic driver irq_eoi() callback to always unmask
   the interrupt
 
 - Properly handle edge interrupts on loongson-pch-pic
 
 - Let the mtk-sysirq driver advertise IRQCHIP_SKIP_SET_WAKE
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmEqIhgPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDjPkP/Rtp6WNZ1QUfJWmHovnh/Wc6ob1DXcBwi9nX
 hy4miIJ1SWuez9G49RlQAiXZoB28B6KKCKKmouiqu7ke7WUhifS0K1ej188wjxRX
 dqRG+m9yBAqKSr0lyWLB5VVCc8XBz4oZTc28n585gHiXfAPv7u0EzW+zNrnloLU5
 NrAj6ppGFUzVT0VxRqcurbymE6OwRWjc3D+z/PhtHZ4SFOhft95CXgsdvMklqyLj
 wwiuZ0Dhj5EruSP/Z7DzbbXnMNmte3HC2/cUNPYkho4/rk+2gVnYv5kVdfPHKQCY
 Wjti/kvuPC3hdTvdw8g7VQfP63R3clZhcQ8s+myoeX5LWzyAHpoxAtdsbX7oVsgs
 aKyrFhddEFVuiFizYyweS89pL0kCkTob8/zlGeuhRiVRTZ3+kG7Zf2UTTnN1ZdLw
 2lMolghiGk4LYJfr83+CDZyYP/VGHDCthfrmd//l39P2wJhkuCDbbeKaElLGWvUt
 abnPf0buCRqMAJe7vh0GHCx7290nEh2IqyHR4AYRVhRaN7bfAXdZH9Xp6ZGT25Fz
 uORgUbGAyhd5Ics/7twE4qeOkfJ6fxwgXsOlx90EfgVYyDJ1sBHNx8Buo2z8Bl/2
 rwCsW49kU7yX/wp11sJctR72RuLKC23dxS6z7aSWkRc6k3u+8xl2eeLIN59FNrKZ
 ToTdbXEQ
 =bdm2
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

- API updates:

  - Treewide conversion to generic_handle_domain_irq() for anything
    that looks like a chained interrupt controller

  - Update the irqdomain documentation

  - Use of bitmap_zalloc() throughout the tree

- New functionalities:

  - Support for GICv3 EPPI partitions

- Fixes:

  - Qualcomm PDC hierarchy fixes

  - Yet another priority decoding fix for the GICv3 pseudo-NMIs

  - Fix the apple-aic driver irq_eoi() callback to always unmask
    the interrupt

  - Properly handle edge interrupts on loongson-pch-pic

  - Let the mtk-sysirq driver advertise IRQCHIP_SKIP_SET_WAKE

Link: https://lore.kernel.org/r/20210828121013.2647964-1-maz@kernel.org
2021-08-29 21:19:50 +02:00
Linus Torvalds
9f73eacde7 powerpc fixes for 5.14 #7
- Fix scv implicit soft-mask table for relocated (eg. kdump) kernels.
  - Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK, which was disabled due to a typo.
 
 Thanks to: Lukas Bulwahn, Nicholas Piggin, Daniel Axtens.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEqLccTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDzUEACmeaRL+SPtrfQQI5vYhCfKn6oD48bn
 52rkjV30vZIgckmJ6FBTbpka670T6GsXccH3///Cnf3KWE+q1IQVh7o8Sr3gWs/T
 +5IlLH+RKw09lFHgo0WBKa053ImBaOVweoNsVIKrhpAYdwk61AhOOagtqVtLhE7Y
 KkzvfP5QEmWsTKst1NTipGB6OqjqO/lTUq4edAHM2Wt8uxEU9MKp8AhntCo1Y6k8
 yL+Ssx5X64VxznSdZVBD7PDiA0dUbCuS7QlelKSn95N++5f8KijjMDSVIHKEbHqK
 KWxMvyUnvS8FoBZtXZqCpOXCa0jBg1zHlWoZAvpYxK1Fk31sSBzQFC5258TtQ9Y9
 gEyVORtq11JNPOMRd+jz+sPTgvaj120cGMXJP8KVyxF0vMRQzavIsKQVXMtn1bRf
 s5WHiycziLsdCl146KKuA2GSA34OX+GmHF5ab/URrlHYsPDfXxyrSzXtll/dMWCn
 svfSdpTs4vXQfHr5LCXghYrK/ZCRbcrSVBMe0IwBFlqy7O1bannNgoQ896eLObYV
 Kt0SbYtAGdAs0HPg0rBrb9K8CnS2uqVaY+nEOIBS9YIbwhugonRkG3XiS0U0qDfU
 jFE18d8mrq/qvVOUvR/XLLgCGl8qLoBYYCpYjoGiIiycnNezv5GH4x8EZaJatBS0
 l4sPiLKfiH2FrA==
 =8mxr
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.14-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix scv implicit soft-mask table for relocated (eg. kdump) kernels

 - Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK, which was disabled due to a
   typo

Thanks to Lukas Bulwahn, Nicholas Piggin, and Daniel Axtens.

* tag 'powerpc-5.14-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix scv implicit soft-mask table for relocated kernels
  powerpc: Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
2021-08-28 10:40:41 -07:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Leonardo Bras
57dbbe590f powerpc/pseries/iommu: Rename "direct window" to "dma window"
A previous change introduced the usage of DDW as a bigger indirect DMA
mapping when the DDW available size does not map the whole partition.

As most of the code that manipulates direct mappings was reused for
indirect mappings, it's necessary to rename all names and debug/info
messages to reflect that it can be used for both kinds of mapping.

This should cause no behavioural change, just adjust naming.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-12-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
381ceda88c powerpc/pseries/iommu: Make use of DDW for indirect mapping
So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

By using DDW, indirect mapping  can get more TCEs than available for the
default DMA window, and also get access to using much larger pagesizes
(16MB as implemented in qemu vs 4k from default DMA window), causing a
significant increase on the maximum amount of memory that can be IOMMU
mapped at the same time.

Indirect mapping will only be used if direct mapping is not a
possibility.

For indirect mapping, it's necessary to re-create the iommu_table with
the new DMA window parameters, so iommu_alloc() can use it.

Removing the default DMA window for using DDW with indirect mapping
is only allowed if there is no current IOMMU memory allocated in
the iommu_table. enable_ddw() is aborted otherwise.

Even though there won't be both direct and indirect mappings at the
same time, we can't reuse the DIRECT64_PROPNAME property name, or else
an older kexec()ed kernel can assume direct mapping, and skip
iommu_alloc(), causing undesirable behavior.
So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
was created to represent a DDW that does not allow direct mapping.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-11-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
8599395d34 powerpc/pseries/iommu: Find existing DDW with given property name
At the moment pseries stores information about created directly mapped
DDW window in DIRECT64_PROPNAME.

With the objective of implementing indirect DMA mapping with DDW, it's
necessary to have another propriety name to make sure kexec'ing into older
kernels does not break, as it would if we reuse DIRECT64_PROPNAME.

In order to have this, find_existing_ddw_windows() needs to be able to
look for different property names.

Extract find_existing_ddw_windows() into find_existing_ddw_windows_named()
and calls it with current property name.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-10-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
a5fd95120c powerpc/pseries/iommu: Update remove_dma_window() to accept property name
Update remove_dma_window() so it can be used to remove DDW with a given
property name.

This enables the creation of new property names for DDW, so we can
have different usage for it, like indirect mapping.

Also, add return values to it so we can check if the property was found
while removing the active DDW. This allows skipping the remaining property
names while reducing the impact of multiple property names.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-9-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
fc8cba8f98 powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
Add a new helper _iommu_table_setparms(), and use it in
iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
code.

Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
so move it to the new helper. Since we need the iommu_table_ops to be
declared before used, declare iommu_table_lpar_multi_ops and
iommu_table_pseries_ops to before their respective iommu_table_setparms*().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-8-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
7ed2ed2db2 powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
Code used to create a ddw property that was previously scattered in
enable_ddw() is now gathered in ddw_property_create(), which deals with
allocation and filling the property, letting it ready for
of_property_add(), which now occurs in sequence.

This created an opportunity to reorganize the second part of enable_ddw():

Without this patch enable_ddw() does, in order:
kzalloc() property & members, create_ddw(), fill ddwprop inside property,
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
of_add_property(), and list_add().

With this patch enable_ddw() does, in order:
create_ddw(), ddw_property_create(), of_add_property(),
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
and list_add().

This change requires of_remove_property() in case anything fails after
of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
in all memory, which looks the most expensive operation, only if
everything else succeeds.

Also, the error path got remove_ddw() replaced by a new helper
__remove_dma_window(), which only removes the new DDW with an rtas-call.
For this, a new helper clean_dma_window() was needed to clean anything
that could left if walk_system_ram_range() fails.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-7-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
2ca73c54ce powerpc/pseries/iommu: Allow DDW windows starting at 0x00
enable_ddw() currently returns the address of the DMA window, which is
considered invalid if has the value 0x00.

Also, it only considers valid an address returned from find_existing_ddw
if it's not 0x00.

Changing this behavior makes sense, given the users of enable_ddw() only
need to know if direct mapping is possible. It can also allow a DMA window
starting at 0x00 to be used.

This will be helpful for using a DDW with indirect mapping, as the window
address will be different than 0x00, but it will not map the whole
partition.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-6-leobras.c@gmail.com
2021-08-27 00:56:55 +10:00
Leonardo Bras
92a2321929 powerpc/pseries/iommu: Add ddw_list_new_entry() helper
There are two functions creating direct_window_list entries in a
similar way, so create a ddw_list_new_entry() to avoid duplicity and
simplify those functions.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-5-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Leonardo Bras
4ff8677a0b powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.

This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-4-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Leonardo Bras
3c33066a21 powerpc/kernel/iommu: Add new iommu_table_in_use() helper
Having a function to check if the iommu table has any allocation helps
deciding if a tbl can be reset for using a new DMA window.

It should be enough to replace all instances of !bitmap_empty(tbl...).

iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before testing. This causes iommu_table_release_pages() to
become unnecessary, given it is only used to remove reserved memory for
testing.

Also, only allow storing reserved memory values in tbl if they are valid
in the table, so there is no need to check it in the new helper.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-3-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Leonardo Bras
0c634bafe3 powerpc/pseries/iommu: Replace hard-coded page shift
Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.

In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.

IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().

Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since callers not always have a
tbl struct, so adding a tceshift parameter seems the right thing to do.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-2-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Srikar Dronamraju
9a245d0e1f powerpc/numa: Update cpu_cpu_map on CPU online/offline
cpu_cpu_map holds all the CPUs in the DIE. However in PowerPC, when
onlining/offlining of CPUs, this mask doesn't get updated.  This mask
is however updated when CPUs are added/removed. So when both
operations like online/offline of CPUs and adding/removing of CPUs are
done simultaneously, then cpumaps end up broken.

WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898
build_sched_domains+0xd48/0x1720
Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag
udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag
bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq
uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg
ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash
dm_log dm_mod fuse
CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28
Workqueue: events cpuset_hotplug_workfn
NIP:  c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec
REGS: c00000005596f220 TRAP: 0700   Not tainted  (5.13.0-rc6+)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48828222  XER:
00000009
CFAR: c0000000001ea698 IRQMASK: 0
GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036
GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90
GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8
GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800
GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18
GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400
GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060
GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d
NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720
LR [c0000000001caac4] build_sched_domains+0xd44/0x1720
Call Trace:
[c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable)
[c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0
[c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0
[c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70
[c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10
[c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590
[c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620
[c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0
[c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120
7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000

Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU
online/offline.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com
2021-08-27 00:56:54 +10:00
Srikar Dronamraju
544a09ee74 powerpc/numa: Print debug statements only when required
Currently, a debug message gets printed every time an attempt to
add(remove) a CPU. However this is redundant if the CPU is already added
(removed) from the node.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100521.412639-4-srikar@linux.vnet.ibm.com
2021-08-27 00:56:54 +10:00
Srikar Dronamraju
506c2075ff powerpc/numa: convert printk to pr_xxx
Convert the remaining printk to pr_xxx
One advantage would be all prints will now have prefix "numa:" from
pr_fmt().

[ convert printk(KERN_ERR) to pr_warn : Suggested by Laurent Dufour ]
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Rebase onto powerpc/next, s/WARNING/Warning/]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100521.412639-3-srikar@linux.vnet.ibm.com
2021-08-27 00:56:54 +10:00
Srikar Dronamraju
544af64297 powerpc/numa: Drop dbg in favour of pr_debug
powerpc supported numa=debug which is not documented. This option was
used to print early debug output. However something more flexible can be
achieved by using CONFIG_DYNAMIC_DEBUG.

Hence drop dbg (and numa=debug) in favour of pr_debug

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Rebase on to powerpc/next form2 affinity changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100521.412639-2-srikar@linux.vnet.ibm.com
2021-08-27 00:56:53 +10:00
Srikar Dronamraju
5bf63497b8 powerpc/smp: Enable CACHE domain for shared processor
Currently CACHE domain is not enabled on shared processor mode PowerVM
LPARS. On PowerVM systems, 'ibm,thread-group' device-tree property 2
under cpu-device-node indicates which all CPUs share L2-cache. However
'ibm,thread-group' device-tree property 2 is a relatively new property.
In absence of 'ibm,thread-group' property 2, 'l2-cache' device property
under cpu-device-node could help system to identify CPUs sharing L2-cache.
However this property is not exposed by PhyP in shared processor mode
configurations.

In absence of properties that inform OS about which CPUs share L2-cache,
fallback on core boundary.

Here are some stats from Power9 shared LPAR with the changes.

$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  8
Core(s) per socket:  1
Socket(s):           3
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   16-23
NUMA node1 CPU(s):   0-15,24-31
Physical sockets:    2
Physical chips:      1
Physical cores/chip: 10

Before patch
$ grep -r . /sys/kernel/debug/sched/domains/cpu0/domain*/name
Before
/sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
/sys/kernel/debug/sched/domains/cpu0/domain1/name:DIE
/sys/kernel/debug/sched/domains/cpu0/domain2/name:NUMA

After
/sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
/sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE
/sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE
/sys/kernel/debug/sched/domains/cpu0/domain3/name:NUMA

$  awk '/domain/{print $1, $2}' /proc/schedstat | sort -u | sed -e 's/00000000,//g'
Before
domain0 00000055
domain0 000000aa
domain0 00005500
domain0 0000aa00
domain0 00550000
domain0 00aa0000
domain0 55000000
domain0 aa000000
domain1 00ff0000
domain1 ff00ffff
domain2 ffffffff

After
domain0 00000055
domain0 000000aa
domain0 00005500
domain0 0000aa00
domain0 00550000
domain0 00aa0000
domain0 55000000
domain0 aa000000
domain1 000000ff
domain1 0000ff00
domain1 00ff0000
domain1 ff000000
domain2 ff00ffff
domain2 ffffffff
domain3 ffffffff

(Lower is better)
perf stat -a -r 5 -n perf bench sched pipe  | tail -n 2
Before
           153.798 +- 0.142 seconds time elapsed  ( +-  0.09% )

After
           111.545 +- 0.652 seconds time elapsed  ( +-  0.58% )

which is an improvement of 27.47%

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100401.412519-4-srikar@linux.vnet.ibm.com
2021-08-27 00:56:53 +10:00
Srikar Dronamraju
b8b9280303 powerpc/smp: Update cpu_core_map on all PowerPc systems
lscpu() uses core_siblings to list the number of sockets in the
system. core_siblings is set using topology_core_cpumask.

While optimizing the powerpc bootup path, Commit 4ca234a9cb
("powerpc/smp: Stop updating cpu_core_mask").  it was found that
updating cpu_core_mask() ended up taking a lot of time. It was thought
that on Powerpc, cpu_core_mask() would always be same as
cpu_cpu_mask() i.e number of sockets will always be equal to number of
nodes. As an optimization, cpu_core_mask() was made a snapshot of
cpu_cpu_mask().

However that was found to be false with PowerPc KVM guests, where each
node could have more than one socket. So with Commit c47f892d7a
("powerpc/smp: Reintroduce cpu_core_mask"), cpu_core_mask was updated
based on chip_id but in an optimized way using some mask manipulations
and chip_id caching.

However on non-PowerNV and non-pseries KVM guests (i.e not
implementing cpu_to_chip_id(), continued to use a copy of
cpu_cpu_mask().

There are two issues that were noticed on such systems
1. lscpu would report one extra socket.
On a IBM,9009-42A (aka zz system) which has only 2 chips/ sockets/
nodes, lscpu would report
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              160
On-line CPU(s) list: 0-159
Thread(s) per core:  8
Core(s) per socket:  6
Socket(s):           3                <--------------
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-79
NUMA node1 CPU(s):   80-159

2. Currently cpu_cpu_mask is updated when a core is
added/removed. However its not updated when smt mode switching or on
CPUs are explicitly offlined. However all other percpu masks are
updated to ensure only active/online CPUs are in the masks.
This results in build_sched_domain traces since there will be CPUs in
cpu_cpu_mask() but those CPUs are not present in SMT / CACHE / MC /
NUMA domains. A loop of threads running smt mode switching and core
add/remove will soon show this trace.
Hence cpu_cpu_mask has to be update at smt mode switch.

This will have impact on cpu_core_mask(). cpu_core_mask() is a
snapshot of cpu_cpu_mask. Different CPUs within the same socket will
end up having different cpu_core_masks since they are snapshots at
different points of time. This means when lscpu will start reporting
many more sockets than the actual number of sockets/ nodes / chips.

Different ways to handle this problem:
A. Update the snapshot aka cpu_core_mask for all CPUs whenever
   cpu_cpu_mask is updated. This would a non-optimal solution.
B. Instead of a cpumask_var_t, make cpu_core_map a cpumask pointer
   pointing to cpu_cpu_mask. However percpu cpumask pointer is frowned
   upon and we need a clean way to handle PowerPc KVM guest which is
   not a snapshot.
C. Update cpu_core_masks all PowerPc systems like in PowerPc KVM
guests using mask manipulations. This approach is relatively simple
and unifies with the existing code.
D. On top of 3, we could also resurrect get_physical_package_id which
   could return a nid for the said CPU. However this is not needed at this
   time.

Option C is the preferred approach for now.

While this is somewhat a revert of Commit 4ca234a9cb ("powerpc/smp:
Stop updating cpu_core_mask").

1. Plain revert has some conflicts
2. For chip_id == -1, the cpu_core_mask is made identical to
cpu_cpu_mask, unlike previously where cpu_core_mask was set to a core
if chip_id doesn't exist.

This goes by the principle that if chip_id is not exposed, then
sockets / chip / node share the same set of CPUs.

With the fix, lscpu o/p would be
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              160
On-line CPU(s) list: 0-159
Thread(s) per core:  8
Core(s) per socket:  6
Socket(s):           2                     <--------------
NUMA node(s):        2
Model:               2.2 (pvr 004e 0202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-79
NUMA node1 CPU(s):   80-159

Fixes: 4ca234a9cb ("powerpc/smp: Stop updating cpu_core_mask")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100401.412519-3-srikar@linux.vnet.ibm.com
2021-08-27 00:56:53 +10:00
Srikar Dronamraju
8efd249bab powerpc/smp: Fix a crash while booting kvm guest with nr_cpus=2
Aneesh reported a crash with a fairly recent upstream kernel when
booting kernel whose commandline was appended with nr_cpus=2

1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c000000008a67bd0]
    pc: c00000000002557c: cpu_to_chip_id+0x3c/0x100
    lr: c000000000058380: start_secondary+0x460/0xb00
    sp: c000000008a67e70
   msr: 8000000000001033
   dar: 10
 dsisr: 80000
  current = 0xc00000000891bb00
  paca    = 0xc0000018ff981f80   irqmask: 0x03   irq_happened: 0x01
    pid   = 0, comm = swapper/1
Linux version 5.13.0-rc3-15704-ga050a6d2b7e8 (kvaneesh@ltc-boston8) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #433 SMP Tue May 25 02:38:49 CDT 2021
1:mon> t
[link register   ] c000000000058380 start_secondary+0x460/0xb00
[c000000008a67e70] c000000008a67eb0 (unreliable)
[c000000008a67eb0] c0000000000589d4 start_secondary+0xab4/0xb00
[c000000008a67f90] c00000000000c654 start_secondary_prolog+0x10/0x14

Current code assumes that num_possible_cpus() is always greater than
threads_per_core. However this may not be true when using nr_cpus=2 or
similar options. Handle the case where num_possible_cpus() is not an
exact multiple of  threads_per_core.

Fixes: c1e53367da ("powerpc/smp: Cache CPU to chip lookup")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Debugged-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100401.412519-2-srikar@linux.vnet.ibm.com
2021-08-27 00:56:53 +10:00
Joel Stanley
3e18e27118 powerpc/configs/microwatt: Enable options for systemd
When booting with systemd these options are required.

This increases the image by about 50KB, or 2%.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826122653.3236867-4-joel@jms.id.au
2021-08-27 00:56:53 +10:00
Joel Stanley
ef4fcaf99c powerpc/configs/microwattt: Enable Liteeth
Liteeth is the network device used by Microwatt.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826122653.3236867-3-joel@jms.id.au
2021-08-27 00:56:53 +10:00
Joel Stanley
602d0f9656 powerpc/microwatt: Add Ethernet to device tree
The liteeth network device is used in the Microwatt soc.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826122653.3236867-2-joel@jms.id.au
2021-08-27 00:56:53 +10:00
Christophe Leroy
8149238ffd powerpc: Redefine HMT_xxx macros as empty on PPC32
HMT_xxx macros are macros for adjusting thread priority
(hardware multi-threading) are macros inherited from PPC64
via commit 5f7c690728 ("[PATCH] powerpc: Merged ppc_asm.h")

Those instructions are pointless on PPC32, but some common
fonctions like arch_cpu_idle() use them.

So make them empty on PPC32 to avoid those instructions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c5a07fadea33d640ad10cecf0ac8faaec1c524e0.1629898474.git.christophe.leroy@csgroup.eu
2021-08-27 00:56:52 +10:00
Michael Ellerman
627e66f29a Merge changes from Paul Gortmaker
Merge the changes to retire the legacy WR sbc8548 and sbc8641 platforms
from Paul. These were sent as a pull request, but I rebased them onto
rc2 so as not to pull too many unrelated changes in to my next.

Description from Paul's pull request follows:

In v2.6.27 (2008, 917f0af9e5) the sbc8260 support was implicitly
retired by not being carried forward through the ppc --> powerpc
device tree transition.

Then, in v3.6 (2012, b048b4e17c) we retired the support for the
sbc8560 boards.

Next, in v4.18 (2017, 3bc6cf5a86) we retired the support for the
2006 vintage sbc834x boards.

The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the
sbc834x boards, but it is also 3+ years later, so it makes sense to
now retire them as well - which is what is done here.

These two remaining WR boards were based on the Freescale MPC8548-CDS
and the MPC8641D-HPCN reference board implementations.  Having had the
chance to use these and many other Fsl ref boards, I know this:  The
Freescale reference boards were typically produced in limited quantity
and primarily available to BSP developers and hardware designers, and
not likely to have found a 2nd life with hobbyists and/or collectors.

It was good to have that BSP code subjected to mainline review and
hence also widely available back in the day. But given the above, we
should probably also be giving serious consideration to retiring
additional similar age/type reference board platforms as well.

I've always felt it is important for us to be proactive in retiring
old code, since it has a genuine non-zero carrying cost, as described
in the 930d52c012 merge log.  But for the here and now, we just
clean up the remaining BSP code that I had added for SBC platforms.

Link: https://lore.kernel.org/r/20210824174209.GB160508@windriver.com
2021-08-27 00:49:06 +10:00
Paul Gortmaker
d7c1814f2f powerpc: retire sbc8641d board support
The support was for this was added to mainline over 12 years ago, in
v2.6.26 [4e8aae89a3] just around the ppc --> powerpc migration.

I believe the board was introduced shortly after the sbc8548 board,
making it roughly a 14 year old platform - with the CPU speed and
memory size typical for that era.

I haven't had one of these boards for several years, and availability
was discontinued several years before that.

Given that, there is no point in adding a burden to testing coverage
that builds all possible defconfigs, so it makes sense to remove it.

Of course it will remain in the git history forever, for anyone who
happens to find a functional board and wants to tinker with it.

Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-08-27 00:48:18 +10:00
Paul Gortmaker
c12adb0678 powerpc: retire sbc8548 board support
The support was for this was mainlined 13 years ago, in v2.6.25
[0e0fffe887] just around the ppc --> powerpc migration.

I believe the board was introduced a year or two before that, so it
is roughly a 15 year old platform - with the CPU speed and memory size
that was typical for that era.

I haven't had one of these boards for several years, and availability
was discontinued several years before that.

Given that, there is no point in adding a burden to testing coverage
that builds all possible defconfigs, so it makes sense to remove it.

Of course it will remain in the git history forever, for anyone who
happens to find a functional board and wants to tinker with it.

Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-08-27 00:48:11 +10:00
Michael Ellerman
465e333e77 Merge branch 'topic/ppc-kvm' into next
Merge some KVM patches we are keeping in a topic branch in case there
are any merge conflicts that need resolving.
2021-08-26 21:21:11 +10:00
Christophe Leroy
806c0e6e7e powerpc: Refactor verification of MSR_RI
40x and BOOKE don't have MSR_RI therefore all tests involving
MSR_RI may be problematic on those plateforms.

Create helpers to check or set MSR_RI in regs, and use them
in common code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2fb93708196734f4176dda334aaa3055f213b89.1629707037.git.christophe.leroy@csgroup.eu
2021-08-26 21:21:07 +10:00
Christophe Leroy
133c17a178 powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
In those hot functions that are called at every interrupt, any saved
cycle is worth it.

interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
called from three places:
- From entry_32.S
- From interrupt_64.S
- From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()

In entry_32.S, there are inambiguously called based on MSR_PR:

	interrupt_return:
		lwz	r4,_MSR(r1)
		addi	r3,r1,STACK_FRAME_OVERHEAD
		andi.	r0,r4,MSR_PR
		beq	.Lkernel_interrupt_return
		bl	interrupt_exit_user_prepare
	...
	.Lkernel_interrupt_return:
		bl	interrupt_exit_kernel_prepare

In interrupt_64.S, that's similar:

	interrupt_return_\srr\():
		ld	r4,_MSR(r1)
		andi.	r0,r4,MSR_PR
		beq	interrupt_return_\srr\()_kernel
	interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	interrupt_exit_user_prepare
	...
	interrupt_return_\srr\()_kernel:
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	interrupt_exit_kernel_prepare

In interrupt_exit_user_restart() and interrupt_exit_kernel_restart(),
MSR_PR is verified respectively by BUG_ON(!user_mode(regs)) and
BUG_ON(user_mode(regs)) prior to calling interrupt_exit_user_prepare()
and interrupt_exit_kernel_prepare().

The verification in interrupt_exit_user_prepare() and
interrupt_exit_kernel_prepare() are therefore useless and can be removed.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/385ead49ccb66a259b25fee3eebf0bd4094068f3.1629707037.git.christophe.leroy@csgroup.eu
2021-08-26 21:21:07 +10:00
Xiongwei Song
d9db6e4202 powerpc/64e: Get dear offset with _DEAR macro
Use _DEAR to get the offset of dear register in pr_regs for 64e cpus.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-5-sxwjean@me.com
2021-08-26 21:21:07 +10:00
Xiongwei Song
4872cbd0ca powerpc: Add dear as a synonym for pt_regs.dar register
Create an anonymous union for dar and dear regsiters, we can reference
dear to get the effective address when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dar. This makes code more clear.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Reword commit title]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-4-sxwjean@me.com
2021-08-26 21:21:07 +10:00
Xiongwei Song
cfa47772ca powerpc/64e: Get esr offset with _ESR macro
Use _ESR to get the offset of esr register in pr_regs for 64e cpus.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-3-sxwjean@me.com
2021-08-26 21:21:07 +10:00
Xiongwei Song
4f8e78c075 powerpc: Add esr as a synonym for pt_regs.dsisr
Create an anonymous union for dsisr and esr regsiters, we can reference
esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dsisr. This makes code more clear.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Reword commit title]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-2-sxwjean@me.com
2021-08-26 21:21:06 +10:00
Kajol Jain
3c69a5f222 powerpc/perf: Fix the check for SIAR value
Incase of random sampling, there can be scenarios where
Sample Instruction Address Register(SIAR) may not latch
to the sampled instruction and could result in
the value of 0. In these scenarios it is preferred to
return regs->nip. These corner cases are seen in the
previous generation (p9) also.

Patch adds the check for SIAR value along with regs_use_siar
and siar_valid checks so that the function will return
regs->nip incase SIAR is zero.

Patch drops the code under PPMU_P10_DD1 flag check
which handles SIAR 0 case only for Power10 DD1.

Fixes: 2ca13a4cc5 ("powerpc/perf: Use regs->nip when SIAR is zero")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-3-kjain@linux.ibm.com
2021-08-25 22:38:19 +10:00
Kajol Jain
cc90c6742e powerpc/perf: Drop the case of returning 0 as instruction pointer
Drop the case of returning 0 as instruction pointer since kernel
never executes at 0 and userspace almost never does either.

Fixes: e6878835ac ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-2-kjain@linux.ibm.com
2021-08-25 22:38:19 +10:00
Kajol Jain
b1643084d1 powerpc/perf: Use stack siar instead of mfspr
Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.

Fixes: 75382aa72f ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-1-kjain@linux.ibm.com
2021-08-25 22:38:18 +10:00
Nicholas Piggin
0c8fb653d4 powerpc/64s: Remove WORT SPR from POWER9/10
This register is not architected and not implemented in POWER9 or 10,
it just reads back zeroes for compatibility.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Link: https://lore.kernel.org/r/20210811160134.904987-11-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Nicholas Piggin
1782663897 KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
After the L1 saves its PMU SPRs but before loading the L2's PMU SPRs,
switch the pmcregs_in_use field in the L1 lppaca to the value advertised
by the L2 in its VPA. On the way out of the L2, set it back after saving
the L2 PMU registers (if they were in-use).

This transfers the PMU liveness indication between the L1 and L2 at the
points where the registers are not live.

This fixes the nested HV bug for which a workaround was added to the L0
HV by commit 63279eeb7f ("KVM: PPC: Book3S HV: Always save guest pmu
for guest capable of nesting"), which explains the problem in detail.
That workaround is no longer required for guests that include this bug
fix.

Fixes: 360cae3137 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Link: https://lore.kernel.org/r/20210811160134.904987-10-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Fabiano Rosas
f2e29db156 KVM: PPC: Book3S HV Nested: save_hv_return_state does not require trap argument
vcpu is already anargument so vcpu->arch.trap can be used directly.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-9-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Fabiano Rosas
7c3ded5735 KVM: PPC: Book3S HV Nested: Stop forwarding all HFUs to L1
If the nested hypervisor has no access to a facility because it has
been disabled by the host, it should also not be able to see the
Hypervisor Facility Unavailable that arises from one of its guests
trying to access the facility.

This patch turns a HFU that happened in L2 into a Hypervisor Emulation
Assistance interrupt and forwards it to L1 for handling. The ones that
happened because L1 explicitly disabled the facility for L2 are still
let through, along with the corresponding Cause bits in the HFSCR.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
[np: move handling into kvmppc_handle_nested_exit]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-8-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Nicholas Piggin
8b210a880b KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible
When the L0 runs a nested L2, there are several permutations of HFSCR
that can be relevant. The HFSCR that the L1 vcpu L1 requested, the
HFSCR that the L1 vcpu may use, and the HFSCR that is actually being
used to run the L2.

The L1 requested HFSCR is not accessible outside the nested hcall
handler, so copy that into a new kvm_nested_guest.hfscr field.

The permitted HFSCR is taken from the HFSCR that the L1 runs with,
which is also not accessible while the hcall is being made. Move
this into a new kvm_vcpu_arch.hfscr_permitted field.

These will be used by the next patch to improve facility handling
for nested guests, and later by facility demand faulting patches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-7-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Fabiano Rosas
7487cabc7e KVM: PPC: Book3S HV Nested: Sanitise vcpu registers
As one of the arguments of the H_ENTER_NESTED hypercall, the nested
hypervisor (L1) prepares a structure containing the values of various
hypervisor-privileged registers with which it wants the nested guest
(L2) to run. Since the nested HV runs in supervisor mode it needs the
host to write to these registers.

To stop a nested HV manipulating this mechanism and using a nested
guest as a proxy to access a facility that has been made unavailable
to it, we have a routine that sanitises the values of the HV registers
before copying them into the nested guest's vcpu struct.

However, when coming out of the guest the values are copied as they
were back into L1 memory, which means that any sanitisation we did
during guest entry will be exposed to L1 after H_ENTER_NESTED returns.

This patch alters this sanitisation to have effect on the vcpu->arch
registers directly before entering and after exiting the guest,
leaving the structure that is copied back into L1 unchanged (except
when we really want L1 to access the value, e.g the Cause bits of
HFSCR).

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20210811160134.904987-6-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Nicholas Piggin
d82b392d9b KVM: PPC: Book3S HV Nested: Fix TM softpatch HFAC interrupt emulation
Have the TM softpatch emulation code set up the HFAC interrupt and
return -1 in case an instruction was executed with HFSCR bits clear,
and have the interrupt exit handler fall through to the HFAC handler.
When the L0 is running a nested guest, this ensures the HFAC interrupt
is correctly passed up to the L1.

The "direct guest" exit handler will turn these into PROGILL program
interrupts so functionality in practice will be unchanged. But it's
possible an L1 would want to handle these in a different way.

Also rearrange the FAC interrupt emulation code to match the HFAC format
while here (mainly, adding the FSCR_INTR_CAUSE mask).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-5-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Nicholas Piggin
4782e0cd0d KVM: PPC: Book3S HV P9: Fixes for TM softpatch interrupt NIP
The softpatch interrupt sets HSRR0 to the faulting instruction +4, so
it should subtract 4 for the faulting instruction address in the case
it is a TM softpatch interrupt (the instruction was not executed) and
it was not emulated.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-4-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Nicholas Piggin
daac40e8d7 KVM: PPC: Book3S HV: Remove TM emulation from POWER7/8 path
TM fake-suspend emulation is only used by POWER9. Remove it from the old
code path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-3-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Nicholas Piggin
fd42b7b09c KVM: PPC: Book3S HV: Initialise vcpu MSR with MSR_ME
It is possible to create a VCPU without setting the MSR before running
it, which results in a warning in kvmhv_vcpu_entry_p9() that MSR_ME is
not set. This is pretty harmless because the MSR_ME bit is added to
HSRR1 before HRFID to guest, and a normal qemu guest doesn't hit it.

Initialise the vcpu MSR with MSR_ME set.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-2-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Christophe Leroy
9401f4e46c powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
Force the eh flag at 0 on PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:49 +10:00
Christophe Leroy
19e932eb6e powerpc/ptrace: Make user_mode() common to PPC32 and PPC64
Today we have:

	#ifdef __powerpc64__
	#define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1)
	#else
	#define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
	#endif

With ppc64_defconfig, we get:

	if (!user_mode(regs))
    14b4:	e9 3e 01 08 	ld      r9,264(r30)
    14b8:	71 29 40 00 	andi.   r9,r9,16384
    14bc:	41 82 07 a4 	beq     1c60 <.emulate_instruction+0x7d0>

If taking the ppc32 definition of user_mode(), the exact same code
is generated for ppc64_defconfig.

So, only keep one version of user_mode(), preferably the one not
using MSR_PR_LG which should be kept internal to reg.h.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/000a28c51808bbd802b505af42d2cb316c2be7d3.1629216000.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:49 +10:00
Christophe Leroy
316389e904 powerpc/syscalls: Simplify do_mmap2()
When shift is nul, operations remain valid so no test needed.

And 'ret' is unnecessary.

And use IS_ALIGNED() to check alignment, that's more clear.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/373ec500f386374bc5735007df3d3869eac47be1.1624618701.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:48 +10:00
Christophe Leroy
e084728393 powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP
This patch converts powerpc to the generic PTDUMP implementation.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/03166d569526be70214fe9370a7bad219d2f41c8.1625762907.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:48 +10:00
Christophe Leroy
cf98d2b6ee powerpc/ptdump: Reduce level numbers by 1 in note_page() and add p4d level
Do the same as commit f8f0d0b6fa ("mm: ptdump: reduce level numbers
by 1 in note_page()") and add missing p4d level.

This will align powerpc to the users of generic ptdump.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d76495c574132b197b445a1f133755cca4b912a4.1625762906.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:48 +10:00
Christophe Leroy
64b87b0c70 powerpc/ptdump: Remove unused 'page_size' parameter
note_page_update_state() doesn't use page_size. Remove it.

Could also be removed to note_page() but as a following patch
will remove all current users of note_page(), just leave it as
is for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2f80d052001155251bfe009c360d0c5d9242c6b.1625762906.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:48 +10:00
Christophe Leroy
11f27a7fa4 powerpc/ptdump: Use DEFINE_SHOW_ATTRIBUTE()
Use DEFINE_SHOW_ATTRIBUTE() instead of open coding
open() and fops.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b864a92693ca8413ef0b19f0c12065c212899b6e.1625762905.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:48 +10:00
Christophe Leroy
33e1402435 powerpc: Avoid link stack corruption in misc asm functions
bl;mflr is used at several places to get code position.

Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6eabb4fb6c156f75d56dcbcc6f243e5ac0fba42.1629791763.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:47 +10:00
Christophe Leroy
f5007dbf4d powerpc/booke: Avoid link stack corruption in several places
Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9fbc285eceb720e6c0e032ef47fe8b05f669b48.1629791751.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:47 +10:00
Christophe Leroy
113ec9ccc8 powerpc/32: indirect function call use bctrl rather than blrl in ret_from_kernel_thread
Copied from commit 89bbe4c798 ("powerpc/64: indirect function call
use bctrl rather than blrl in ret_from_kernel_thread")

blrl is not recommended to use as an indirect function call, as it may
corrupt the link stack predictor.

This is not a performance critical path but this should be fixed for
consistency.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/91b1d242525307ceceec7ef6e832bfbacdd4501b.1629436472.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:47 +10:00
Fabiano Rosas
0eb596f1e6 KVM: PPC: Book3S HV: Stop exporting symbols from book3s_64_mmu_radix
The book3s_64_mmu_radix.o object is not part of the KVM builtins and
all the callers of the exported symbols are in the same kvm-hv.ko
module so we should not need to export any symbols.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805212616.2641017-4-farosas@linux.ibm.com
2021-08-24 23:14:06 +10:00
Fabiano Rosas
c232461c0c KVM: PPC: Book3S HV: Add sanity check to copy_tofrom_guest
Both paths into __kvmhv_copy_tofrom_guest_radix ensure that we arrive
with an effective address that is smaller than our total addressable
space and addresses quadrant 0.

- The H_COPY_TOFROM_GUEST hypercall path rejects the call with
H_PARAMETER if the effective address has any of the twelve most
significant bits set.

- The kvmhv_copy_tofrom_guest_radix path clears the top twelve bits
before calling the internal function.

Although the callers make sure that the effective address is sane, any
future use of the function is exposed to a programming error, so add a
sanity check.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805212616.2641017-3-farosas@linux.ibm.com
2021-08-24 23:14:06 +10:00
Fabiano Rosas
5d7d6dac8f KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines
The __kvmhv_copy_tofrom_guest_radix function was introduced along with
nested HV guest support. It uses the platform's Radix MMU quadrants to
provide a nested hypervisor with fast access to its nested guests
memory (H_COPY_TOFROM_GUEST hypercall). It has also since been added
as a fast path for the kvmppc_ld/st routines which are used during
instruction emulation.

The commit def0bfdbd6 ("powerpc: use probe_user_read() and
probe_user_write()") changed the low level copy function from
raw_copy_from_user to probe_user_read, which adds a check to
access_ok. In powerpc that is:

 static inline bool __access_ok(unsigned long addr, unsigned long size)
 {
         return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr;
 }

and TASK_SIZE_MAX is 0x0010000000000000UL for 64-bit, which means that
setting the two MSBs of the effective address (which correspond to the
quadrant) now cause access_ok to reject the access.

This was not caught earlier because the most common code path via
kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to
succeed for L1 guests. For nested guests there is no fallback.

Another issue is that probe_user_read (now __copy_from_user_nofault)
does not return the number of bytes not copied in case of failure, so
the destination memory is not being cleared anymore in
kvmhv_copy_from_guest_radix:

 ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n);
 if (ret > 0)                            <-- always false!
         memset(to + (n - ret), 0, ret);

This patch fixes both issues by skipping access_ok and open-coding the
low level __copy_to/from_user_inatomic.

Fixes: def0bfdbd6 ("powerpc: use probe_user_read() and probe_user_write()")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210805212616.2641017-2-farosas@linux.ibm.com
2021-08-24 23:14:06 +10:00
Cédric Le Goater
cc47ad409b powerpc/compat_sys: Declare syscalls
This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210823090039.166120-3-clg@kaod.org
2021-08-23 19:28:21 +10:00
Cédric Le Goater
3accc0faef powerpc/prom: Fix unused variable ‘reserve_map’ when CONFIG_PPC32 is not set
This fixes a compile error with W=1.

arch/powerpc/kernel/prom.c: In function ‘early_reserve_mem’:
arch/powerpc/kernel/prom.c:625:10: error: variable ‘reserve_map’ set but not used [-Werror=unused-but-set-variable]
  __be64 *reserve_map;
          ^~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210823090039.166120-2-clg@kaod.org
2021-08-23 19:28:21 +10:00
Christophe Leroy
a00ea5b6f2 powerpc/syscalls: Remove __NR__exit
__NR__exit is nowhere used. On most architectures it was removed by
commit 135ab6ec8f ("[PATCH] remove remaining errno and
__KERNEL_SYSCALLS__ references") but not on powerpc.

powerpc removed __KERNEL_SYSCALLS__ in commit 3db03b4afb ("[PATCH]
rename the provided execve functions to kernel_execve"), but __NR__exit
was left over.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6457eb4f327313323ed1f70e540bbb4ddc9178fa.1629701106.git.christophe.leroy@csgroup.eu
2021-08-23 19:28:20 +10:00
Linus Torvalds
1bdc3d5be7 powerpc fixes for 5.14 #6
- Fix random crashes on some 32-bit CPUs by adding isync() after locking/unlocking KUEP
  - Fix intermittent crashes when loading modules with strict module RWX
  - Fix a section mismatch introduce by a previous fix.
 
 Thanks to: Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo Opsfelder Araújo,
 Nathan Chancellor, Stan Johnson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEhjVUTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM/VD/4o5D9d2Xppt+T0zdnKomq+3ffkC33/
 zBK4vqOVOXbnlRpChqIqsHB3LNxMNMvTVaoLvxgy3ZQ57+rnirSDaFOaj4Nbazdx
 STwWmyxW9xPshqvj8tz8uHadSkvbrCClFy59FXtJf4H/iztTnQORKnXI9r3wxXS+
 wBhw8Nhquuqg4O5h4q6yLLRIAaskus7uymDzYHVZkHO0RhfPLEZJwfCxydc29ukK
 wIRB6qojFBbWm/UscY1w6FiYrBn4Y5F3DzoTzJ7xlO6l1NYaE+58aun/oTGU7922
 /8fXYs34TnkF6sA9qGhOtOc1MXfH8meFoH9s/fY3Z3O88xTe8k15wo2Ujlk/u0X1
 1Gzv9FZI0RnpPtSLPiiu72/zS/vFOxAVCFMTvcodlte9RN90fW5Qwt/O1ya22vWt
 Ea3O9iNmYgQ+lV7ZZYDtKQ22WHIublg6cY5d3NDyj5HrzN/vGyp3QJFb2dnWoEpx
 k/KkK16oiIlduLGiFoYjn1ELyHUBTvp483y7zmspA4fCb0ue6W8b2zt8FszH0hI7
 N4uroGXuk9OyhNsLWR8UHUR0s6Gi0XSaQ0O4XgWfoDAAvdev4oZiCqw0q5552OvX
 eE/Ogxc7INCiaoeLwOhYhCKjr+jBP8fhqyQzquyqMgUqEbxLtcFZCJ09bpXHjjiH
 OlAvZwlzOhwcKg==
 =K2B0
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix random crashes on some 32-bit CPUs by adding isync() after
   locking/unlocking KUEP

 - Fix intermittent crashes when loading modules with strict module RWX

 - Fix a section mismatch introduce by a previous fix.

Thanks to Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo
Opsfelder Araújo, Nathan Chancellor, and Stan Johnson.

h# -----BEGIN PGP SIGNATURE-----

* tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fix set_memory_*() against concurrent accesses
  powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
  powerpc/xive: Do not mark xive_request_ipi() as __init
2021-08-22 09:49:31 -07:00
Jing Zhang
8ccba534a1 KVM: stats: Add halt polling related histogram stats
Add three log histogram stats to record the distribution of time spent
on successful polling, failed polling and VCPU wait.
halt_poll_success_hist: Distribution of spent time for a successful poll.
halt_poll_fail_hist: Distribution of spent time for a failed poll.
halt_wait_hist: Distribution of time a VCPU has spent on waiting.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-6-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 16:06:33 -04:00
Jing Zhang
87bcc5fa09 KVM: stats: Add halt_wait_ns stats for all architectures
Add simple stats halt_wait_ns to record the time a VCPU has spent on
waiting for all architectures (not just powerpc).

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-5-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 16:06:33 -04:00
Jing Zhang
f95937ccf5 KVM: stats: Support linear and logarithmic histogram statistics
Add new types of KVM stats, linear and logarithmic histogram.
Histogram are very useful for observing the value distribution
of time or size related stats.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 16:06:32 -04:00
Christophe Leroy
770cec16cd powerpc/audit: Simplify syscall_get_arch()
Make use of is_32bit_task() and CONFIG_CPU_LITTLE_ENDIAN
to simplify syscall_get_arch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4be53b9187a4d8c163968f4d224267e41a7fcc33.1629451479.git.christophe.leroy@csgroup.eu
2021-08-20 22:38:13 +10:00
Christophe Leroy
898a1ef06a powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()
Use is_32bit_task() which already handles CONFIG_COMPAT.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba49cdd574558a0363300c3f6b5b062b397cb071.1629451483.git.christophe.leroy@csgroup.eu
2021-08-20 22:38:08 +10:00
Nicholas Piggin
787c70f2f9 powerpc/64s: Fix scv implicit soft-mask table for relocated kernels
The implict soft-mask table addresses get relocated if they use a
relative symbol like a label. This is right for code that runs relocated
but not for unrelocated. The scv interrupt vectors run unrelocated, so
absolute addresses are required for their soft-mask table entry.

This fixes crashing with relocated kernels, usually an asynchronous
interrupt hitting in the scv handler, then hitting the trap that checks
whether r1 is in userspace.

Fixes: 325678fd05 ("powerpc/64s: add a table of implicit soft-masked addresses")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210820103431.1701240-1-npiggin@gmail.com
2021-08-20 22:35:18 +10:00
Cédric Le Goater
b352ddae7b KVM: PPC: Book3S PR: Remove unused variable
This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-5-clg@kaod.org
2021-08-20 22:17:41 +10:00
Cédric Le Goater
cb53a93e33 KVM: PPC: Book3S PR: Declare kvmppc_handle_exit_pr()
This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-4-clg@kaod.org
2021-08-20 22:17:38 +10:00
Cédric Le Goater
4cb266074a powerpc/pseries/vas: Declare pseries_vas_fault_thread_fn() as static
This fixes a compile error with W=1.

Fixes: 6d0aaf5e0d ("powerpc/pseries/vas: Setup IRQ and fault handling")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-3-clg@kaod.org
2021-08-20 22:17:18 +10:00
Kajol Jain
f9addd85fb powerpc/perf/hv-gpci: Fix counter value parsing
H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. Result buffer has specific format defined in the PAPR
specification. One of the fields is counter offset and width of the
counter data returned.

Counter data are returned in a unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
byte at a time. But commit 220a0c609a ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counters values are reported incorrectly.

In particular this can lead to counters go backwards which messes up the
counter prev vs now calculation and leads to huge counter value
reporting:

  #: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
           -C 0 -I 1000
        time             counts unit events
     1.000078854 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000213293                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     3.000320107                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     4.000428392                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     5.000537864                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     6.000649087                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     7.000760312                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     8.000865218             16,448      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     9.000978985 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    10.001088891             16,384      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    11.001201435                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    12.001307937 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Fix the shifting logic to correct match the format, ie. read bytes in
big endian order.

Fixes: e4f226b158 ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
2021-08-20 17:00:54 +10:00
Finn Thain
6cd717fe9b powerpc/tau: Add 'static' storage qualifier to 'tau_work' definition
This patch prevents the following sparse warning.

arch/powerpc/kernel/tau_6xx.c:199:1: sparse: sparse: symbol 'tau_work'
was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/44ab381741916a51e783c4a50d0b186abdd8f280.1629334014.git.fthain@linux-m68k.org
2021-08-20 17:00:53 +10:00
Lukas Bulwahn
310d2e83cb powerpc: Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
Commit 66f24fa766 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
broke PMD split page table lock for powerpc.

It selects the non-existent config ARCH_ENABLE_PMD_SPLIT_PTLOCK in
arch/powerpc/platforms/Kconfig.cputype, but clearly intended to
select ARCH_ENABLE_SPLIT_PMD_PTLOCK (notice the word swapping!), as
that commit did for all other architectures.

Fix it by selecting the correct symbol ARCH_ENABLE_SPLIT_PMD_PTLOCK.

Fixes: 66f24fa766 ("mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Reword change log to make it clear this is a bug fix]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819113954.17515-3-lukas.bulwahn@gmail.com
2021-08-20 12:38:50 +10:00
Lukas Bulwahn
c26d4c5d4f powerpc/kvm: Remove obsolete and unneeded select
Commit a278e7ea60 ("powerpc: Fix compile issue with force DAWR")
selects the non-existing config PPC_DAWR_FORCE_ENABLE for config
KVM_BOOK3S_64_HANDLER. As this commit also introduces a config PPC_DAWR
and this config PPC_DAWR is selected with PPC if PPC64, there is no
need for any further select in the KVM_BOOK3S_64_HANDLER.

Remove an obsolete and unneeded select in config KVM_BOOK3S_64_HANDLER.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: a278e7ea60 ("powerpc: Fix compile issue with force DAWR")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819113954.17515-2-lukas.bulwahn@gmail.com
2021-08-20 12:37:18 +10:00
Jakub Kicinski
f444fea789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/ptp/Kconfig:
  55c8fca1da ("ptp_pch: Restore dependency on PCI")
  e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 18:09:18 -07:00
Alexey Dobriyan
c0891ac15f isystem: ship and use stdarg.h
Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>.
stdarg.h is the only userspace header commonly used in the kernel.

GPL 2 version of <stdarg.h> can be extracted from
http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-19 09:02:55 +09:00
Alexey Dobriyan
39f75da7bc isystem: trim/fixup stdarg.h and other headers
Delete/fixup few includes in anticipation of global -isystem compile
option removal.

Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition
of uintptr_t error (one definition comes from <stddef.h>, another from
<linux/types.h>).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-19 09:02:55 +09:00
Michael Ellerman
9f7853d760 powerpc/mm: Fix set_memory_*() against concurrent accesses
Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:

  kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xc008000004073278
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
  CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
  Workqueue: events control_work_handler [virtio_console]
  NIP:  c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
  REGS: c00000002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
  MSR:  800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822 XER: 200400cf
  ...
  NIP fill_queue+0xf0/0x210 [virtio_console]
  LR  fill_queue+0xf0/0x210 [virtio_console]
  Call Trace:
    fill_queue+0xb4/0x210 [virtio_console] (unreliable)
    add_port+0x1a8/0x470 [virtio_console]
    control_work_handler+0xbc/0x1e8 [virtio_console]
    process_one_work+0x290/0x590
    worker_thread+0x88/0x620
    kthread+0x194/0x1a0
    ret_from_kernel_thread+0x5c/0x64

Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.

Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page, if another CPU tries to access the
page at that time we see something like the fault above.

We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].

But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table
correctly.

[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r.fsf@linux.ibm.com/

Fixes: 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Murilo Opsfelder Araújo <muriloo@linux.ibm.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818120518.3603172-1-mpe@ellerman.id.au
2021-08-19 09:41:54 +10:00
Christophe Leroy
ef486bf448 powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
Commit b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
removed the 'isync' instruction after adding/removing NX bit in user
segments. The reasoning behind this change was that when setting the
NX bit we don't mind it taking effect with delay as the kernel never
executes text from userspace, and when clearing the NX bit this is
to return to userspace and then the 'rfi' should synchronise the
context.

However, it looks like on book3s/32 having a hash page table, at least
on the G3 processor, we get an unexpected fault from userspace, then
this is followed by something wrong in the verification of MSR_PR
at end of another interrupt.

This is fixed by adding back the removed isync() following update
of NX bit in user segment registers. Only do it for cores with an
hash table, as 603 cores don't exhibit that problem and the two isync
increase ./null_syscall selftest by 6 cycles on an MPC 832x.

First problem: unexpected WARN_ON() for mysterious PROTFAULT

  WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
  REGS: e2d09e40 TRAP: 0700   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00021032 <ME,IR,DR,RI>  CR: 42d04f30  XER: 20000000
  GPR00: c000424c e2d09f00 c301b680 e2d09f40 0000001e 42000000 00cba028 00000000
  GPR08: 08000000 48000010 c301b680 e2d09f30 22d09f30 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000000 c1efa6c0 00cba02c 00000300 e2d09f40
  NIP [c001b5c8] do_page_fault+0x6c/0x5b0
  LR [c001b6f8] do_page_fault+0x19c/0x5b0
  Call Trace:
  [e2d09f00] [e2d09f04] 0xe2d09f04 (unreliable)
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4
  --- interrupt: 300 at 0xa7a261dc
  NIP:  a7a261dc LR: a7a253bc CTR: 00000000
  REGS: e2d09f40 TRAP: 0300   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 228428e2  XER: 20000000
  DAR: 00cba02c DSISR: 42000000
  GPR00: a7a27448 afa6b0e0 a74c35c0 a7b7b614 0000001e a7b7b614 00cba028 00000000
  GPR08: 00020fd9 00000031 00cb9ff8 a7a273b0 220028e2 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000002 0000001e a7b7b614 a7b7aff4 00000030
  NIP [a7a261dc] 0xa7a261dc
  LR [a7a253bc] 0xa7a253bc
  --- interrupt: 300
  Instruction dump:
  7c4a1378 810300a0 75278410 83820298 83a300a4 553b018c 551e0036 4082038c
  2e1b0000 40920228 75280800 41820220 <0fe00000> 3b600000 41920214 81420594

Second problem: MSR PR is seen unset allthough the interrupt frame shows it set

  kernel BUG at arch/powerpc/kernel/interrupt.c:458!
  Oops: Exception in kernel mode, sig: 5 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Tainted: G        W         5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c0011434 LR: c001629c CTR: 00000000
  REGS: e2d09e70 TRAP: 0700   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42d09f30  XER: 00000000
  GPR00: 00000000 e2d09f30 c301b680 e2d09f40 83440000 c44d0e68 e2d09e8c 00000000
  GPR08: 00000002 00dc228a 00004000 e2d09f30 22d09f30 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [c0011434] interrupt_exit_kernel_prepare+0x10/0x70
  LR [c001629c] interrupt_return+0x9c/0x144
  Call Trace:
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 (unreliable)
  --- interrupt: 300 at 0xa09be008
  NIP:  a09be008 LR: a09bdfe8 CTR: a09bdfc0
  REGS: e2d09f40 TRAP: 0300   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 420028e2  XER: 20000000
  DAR: a539a308 DSISR: 0a000000
  GPR00: a7b90d50 afa6b2d0 a74c35c0 a0a8b690 a0a8b698 a5365d70 a4fa82a8 00000004
  GPR08: 00000000 a09bdfc0 00000000 a5360000 a09bde7c 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [a09be008] 0xa09be008
  LR [a09bdfe8] 0xa09bdfe8
  --- interrupt: 300
  Instruction dump:
  80010024 83e1001c 7c0803a6 4bffff80 3bc00800 4bffffd0 486b42fd 4bffffcc
  81430084 71480002 41820038 554a0462 <0f0a0000> 80620060 74630001 40820034

Fixes: b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4856f5574906e2aec0522be17bf3848a22b2cd0b.1629269345.git.christophe.leroy@csgroup.eu
2021-08-19 09:41:54 +10:00
Nathan Chancellor
3f78c90f9e powerpc/xive: Do not mark xive_request_ipi() as __init
Compiling ppc64le_defconfig with clang-14 shows a modpost warning:

WARNING: modpost: vmlinux.o(.text+0xa74e0): Section mismatch in
reference from the function xive_setup_cpu_ipi() to the function
.init.text:xive_request_ipi()
The function xive_setup_cpu_ipi() references
the function __init xive_request_ipi().
This is often because xive_setup_cpu_ipi lacks a __init
annotation or the annotation of xive_request_ipi is wrong.

xive_request_ipi() is called from xive_setup_cpu_ipi(), which is not
__init, so xive_request_ipi() should not be marked __init. Remove the
attribute so there is no more warning.

Fixes: cbc06f051c ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210816185711.21563-1-nathan@kernel.org
2021-08-18 23:52:13 +10:00
Christophe Leroy
51ed00e71f powerpc/32: Remove unneccessary calculations in load_up_{fpu/altivec}
No need to re-read SPRN_THREAD, we can calculate thread address
from current (r2).

And remove a reload of value 1 into r4 as r4 is already 1.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c04cce578b97a76a9e69a096698b1d89f721768a.1629276437.git.christophe.leroy@csgroup.eu
2021-08-18 23:49:55 +10:00
Wan Jiabing
e225c4d6bc powerpc: Remove duplicate includes
interrupt.c: asm/interrupt.h has been included at line 12, so remove the
duplicate one at line 10.

time.c: linux/sched/clock.h has been included at line 33,so remove the
duplicate one at line 56 and move sched/cputime.h under sched including
segament.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210323062916.295346-1-wanjiabing@vivo.com
2021-08-18 09:25:23 +10:00
Joel Stanley
87e0d46bf6 powerpc/configs: Regenerate mpc885_ads_defconfig
Regenerate atop v5.14-rc6 by doing a make savedefconfig.

The changes a re-ordering except for the following (which are still set
indirectly):

 - CONFIG_DEBUG_KERNEL=y selected by EXPERT

 - CONFIG_PPC_EARLY_DEBUG_CPM_ADDR=0xff002008 which is the default
   setting

Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-4-joel@jms.id.au
2021-08-17 22:52:19 +10:00
Joel Stanley
d0e28a6145 powerpc/config: Renable MTD_PHYSMAP_OF
CONFIG_MTD_PHYSMAP_OF is not longer enabled as it depends on
MTD_PHYSMAP which is not enabled.

This is a regression from commit 642b1e8dbe ("mtd: maps: Merge
physmap_of.c into physmap-core.c"), which added the extra dependency.
Add CONFIG_MTD_PHYSMAP=y so this stays in the config, as Christophe said
it is useful for build coverage.

Fixes: 642b1e8dbe ("mtd: maps: Merge physmap_of.c into physmap-core.c")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-3-joel@jms.id.au
2021-08-17 22:52:19 +10:00
Joel Stanley
c5ac55b6cb powerpc/config: Fix IPV6 warning in mpc855_ads
When building this config there's a warning:

  79⚠️ override: reassigning to symbol IPV6

Commit 9a1762a4a4 ("powerpc/8xx: Update mpc885_ads_defconfig to
improve CI") added CONFIG_IPV6=y, but left '# CONFIG_IPV6 is not set'
in.

IPV6 is default y, so remove both to clean up the build.

Fixes: 9a1762a4a4 ("powerpc/8xx: Update mpc885_ads_defconfig to improve CI")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817045407.2445664-2-joel@jms.id.au
2021-08-17 22:52:19 +10:00
Michael Ellerman
e95ad5f216 powerpc/head_check: Fix shellcheck errors
Replace "cat file | grep pattern" with "grep pattern file", and quote a
few variables. Together that fixes all shellcheck errors.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817125154.3369884-1-mpe@ellerman.id.au
2021-08-17 22:52:02 +10:00
Masahiro Yamada
4aae683f13 tracing: Refactor TRACE_IRQFLAGS_SUPPORT in Kconfig
Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
having many defines.

Link: https://lkml.kernel.org/r/20210731052233.4703-2-masahiroy@kernel.org

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>   #arch/arc
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:21 -04:00
Randy Dunlap
47c258d71e powerpc/head_check: use stdout for error messages
Prefer stderr instead of stdout for error messages.
This is a good practice and can help CI error detecting and
reporting (0day in this case).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210815222334.9575-1-rdunlap@infradead.org
2021-08-16 16:28:14 +10:00
Michael Ellerman
8b893ef190 powerpc/pseries: Fix build error when NUMA=n
As reported by lkp, if NUMA=n we see a build error:

   arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 'pseries_cpu_hotplug_init':
   arch/powerpc/platforms/pseries/hotplug-cpu.c:1022:8: error: 'node_to_cpumask_map' undeclared
    1022 |        node_to_cpumask_map[node]);

Use cpumask_of_node() which has an empty stub for NUMA=n, and when
NUMA=y does a lookup from node_to_cpumask_map[].

Fixes: bd1dd4c5f5 ("powerpc/pseries: Prevent free CPU ids being reused on another node")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210816041032.2839343-1-mpe@ellerman.id.au
2021-08-16 14:11:00 +10:00
Linus Torvalds
ecf9343196 powerpc fixes for 5.14 #5
- Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).
  - Fix critical and debug interrupts on BookE, seen as crashes when using ptrace.
  - Fix an oops when running an SMP kernel on a UP system.
  - Update pseries LPAR security flavor after partition migration.
  - Fix an oops when using kprobes on BookE.
  - Fix oops on 32-bit pmac by not calling do_IRQ() from timer_interrupt().
  - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).
 
 Thanks to: Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika Moolchandani, Laurent
 Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui, Radu Rendec, Srikar Dronamraju, Stan
 Johnson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEY/6QTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgHNBD/9zsObrReMj+lKbgEBw5u8kbERqWjQ4
 otJuxkT3mrQTA2YsJZ4QpUE7C+78h7P7aS3LpfpeONkI+WSxbuq/j+47538mpiLu
 LmasKZVVdLP3b+3eww2pOEYKF1qACkBxsy6gBy0DAzoWAjczVQkdpoe1pXyIQjz2
 j3UyuuFvyE76eKHn7aSfOHO1PiNfO0ZXghum9gc5kXsOsqg9eaFbbJ4HUD2FHd6V
 UmIl+njlt03TS6TBXkZwpcplfZWhcks7ZY/VqylrWSlbUx75J2aJ2hb0G1iU3l9S
 51AepEOQmZnkhOGA19PJhVudtUBc8pw5RCwYPeqv71tgo8hayCVgjBy+kmHqAvFI
 u0iFqA1dZjCPaFlm9Pcgq/DZdzD2xFLilpY/e4qwyDrQ1TsXM4CdJpEkaSsZ2IZ/
 HQbvjx1D4U7qZTPCMGSG4IQNtxtSVrZO8CzKoRUTDVDLPdjW/259abLQQTpY7x8z
 N5M5KeCk6xNk1ZYzxpzRKk+qSwiueIrqyP5GMMfzOCtJwBe7Q+vWtN1RbNQ2pBVO
 TUzQ0b7WYqiweNUFahXzgeUBUXP6HixG3Ay7z8bnUaWgWSgD8agbyx0gX1Jtj/cJ
 GAnKOH+GygnqIsijonohXpS+TPOHTR7hAP2w3G7ONJhXiaBKFHp4PKJwSO5tuiR3
 NZqm9NYZEsf6CQ==
 =GOBK
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).

 - Fix critical and debug interrupts on BookE, seen as crashes when
   using ptrace.

 - Fix an oops when running an SMP kernel on a UP system.

 - Update pseries LPAR security flavor after partition migration.

 - Fix an oops when using kprobes on BookE.

 - Fix oops on 32-bit pmac by not calling do_IRQ() from
   timer_interrupt().

 - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).

Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika
Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui,
Radu Rendec, Srikar Dronamraju, and Stan Johnson.

* tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
  powerpc/interrupt: Do not call single_step_exception() from other exceptions
  powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
  powerpc/kprobes: Fix kprobe Oops happens in booke
  powerpc/pseries: Fix update of LPAR security flavor after LPM
  powerpc/smp: Fix OOPS in topology_init()
  powerpc/32: Fix critical and debug interrupts on BOOKE
  powerpc/32s: Fix napping restore in data storage interrupt (DSI)
2021-08-15 06:57:43 -10:00
Fangrui Song
0355785313 powerpc: Add "-z notext" flag to disable diagnostic
Object files used to link .tmp_vmlinux.kallsyms1 have many
R_PPC64_ADDR64 relocations in non-SHF_WRITE sections. There are many
text relocations (e.g. in .rela___ksymtab_gpl+* and .rela__mcount_loc
sections) in a -pie link and are disallowed by LLD:

  ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
  >>> defined in arch/powerpc/kernel/head_64.o
  >>> referenced by arch/powerpc/kernel/head_64.o:(__restart_table+0x10)

Newer GNU ld configured with "--enable-textrel-check=error" will report
an error as well:

  $ ld-new -EL -m elf64lppc -pie ... -o .tmp_vmlinux.kallsyms1 ...
  ld-new: read-only segment has dynamic relocations

Add "-z notext" to suppress the errors. Non-CONFIG_RELOCATABLE builds
use the default -no-pie mode and thus R_PPC64_ADDR64 relocations can be
resolved at link-time.

Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813200511.1905703-1-morbo@google.com
2021-08-15 13:49:39 +10:00
Christophe Leroy
1e688dd2a3 powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

	unsigned long test(struct pt_regs *regs)
	{
		int ret;

		WARN_ON(regs->msr & MSR_PR);

		return regs->gpr[3];
	}

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0
	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
	 3c4:	4e 80 00 20 	blr

	0000000000000bf0 <.test9w>:
	 bf0:	7c 89 00 74 	cntlzd  r9,r4
	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
	 bf8:	0b 09 00 00 	tdnei   r9,0
	 bfc:	2c 24 00 00 	cmpdi   r4,0
	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
	 c04:	7c 63 23 92 	divdu   r3,r3,r4
	 c08:	4e 80 00 20 	blr

	 c0c:	38 60 00 00 	li      r3,0
	 c10:	4e 80 00 20 	blr

After the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0

	0000000000000c50 <.test9w>:
	 c50:	7c 89 00 74 	cntlzd  r9,r4
	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
	 c58:	0b 09 00 00 	tdnei   r9,0
	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
	 c60:	4e 80 00 20 	blr

	 c70:	38 60 00 00 	li      r3,0
	 c74:	4e 80 00 20 	blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu
2021-08-15 13:49:24 +10:00
Christophe Leroy
db87a71992 powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
forces GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implement zero-cycle branch,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.

On PPC64 we don't want it because we want to be able to use CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	0000046c <test9w>:
	 46c:	7c 89 00 34 	cntlzw  r9,r4
	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
	 474:	0f 09 00 00 	twnei   r9,0
	 478:	2c 04 00 00 	cmpwi   r4,0
	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
	 480:	7c 63 23 96 	divwu   r3,r3,r4
	 484:	4e 80 00 20 	blr

	 488:	38 60 00 00 	li      r3,0
	 48c:	4e 80 00 20 	blr

After the patch:

	00000468 <test9w>:
	 468:	2c 04 00 00 	cmpwi   r4,0
	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
	 470:	7c 63 23 96 	divwu   r3,r3,r4
	 474:	4e 80 00 20 	blr

	 478:	0f e0 00 00 	twui    r0,0
	 47c:	38 60 00 00 	li      r3,0
	 480:	4e 80 00 20 	blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), allthough less impressing.

With the patch:

	00000000 <system_call_exception>:
	   0:	81 6a 00 84 	lwz     r11,132(r10)
	   4:	90 6a 00 88 	stw     r3,136(r10)
	   8:	71 60 00 02 	andi.   r0,r11,2
	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
	  10:	71 60 40 00 	andi.   r0,r11,16384
	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
	  18:	71 6b 80 00 	andi.   r11,r11,32768
	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
	  20:	94 21 ff e0 	stwu    r1,-32(r1)
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7d 8c 42 e6 	mftb    r12
	...
	  7c:	0f e0 00 00 	twui    r0,0
	  80:	0f e0 00 00 	twui    r0,0
	  84:	0f e0 00 00 	twui    r0,0

Without the patch:

	00000000 <system_call_exception>:
	   0:	94 21 ff e0 	stwu    r1,-32(r1)
	   4:	93 e1 00 1c 	stw     r31,28(r1)
	   8:	90 6a 00 88 	stw     r3,136(r10)
	   c:	81 6a 00 84 	lwz     r11,132(r10)
	  10:	69 60 00 02 	xori    r0,r11,2
	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
	  18:	0f 00 00 00 	twnei   r0,0
	  1c:	69 60 40 00 	xori    r0,r11,16384
	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
	  24:	0f 00 00 00 	twnei   r0,0
	  28:	69 6b 80 00 	xori    r11,r11,32768
	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
	  30:	0f 0b 00 00 	twnei   r11,0
	  34:	7d 8c 42 e6 	mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu
2021-08-14 22:00:01 +10:00
Aneesh Kumar K.V
1c6b5a7e74 powerpc/pseries: Add support for FORM2 associativity
PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V
ef31cb83d1 powerpc/pseries: Add a helper for form1 cpu distance
This helper is only used with the dispatch trace log collection.
A later patch will add Form2 affinity support and this change helps
in keeping that simpler. Also add a comment explaining we don't expect
the code to be called with FORM0

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V
8ddc6448ec powerpc/pseries: Consolidate different NUMA distance update code paths
The associativity details of the newly added resourced are collected from
the hypervisor via "ibm,configure-connector" rtas call. Update the numa
distance details of the newly added numa node after the above call.

Instead of updating NUMA distance every time we lookup a node id
from the associativity property, add helpers that can be used
during boot which does this only once. Also remove the distance
update from node id lookup helpers.

Currently, we duplicate parsing code for ibm,associativity and
ibm,associativity-lookup-arrays in the kernel. The associativity array provided
by these device tree properties are very similar and hence can use
a helper to parse the node id and numa distance details.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V
0eacd06bb8 powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
Also make related code cleanup that will allow adding FORM2_AFFINITY in
later patches. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V
7e35ef662c powerpc/pseries: rename min_common_depth to primary_domain_index
No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-2-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V
dbf77fed8b powerpc: rename powerpc_debugfs_root to arch_debugfs_dir
No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V
3e188b1ae8 powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry
Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling.
Also add a debugfs entry for tlb_local_single_page_flush_ceiling.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-1-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Emmanuel Gil Peyrot
140a89b7bf powerpc: wii_defconfig: Enable OTP by default
This selects the nintendo-otp module when building for this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-6-linkmauve@linkmauve.fr
2021-08-13 22:04:25 +10:00
Emmanuel Gil Peyrot
562a610b4c powerpc: wii.dts: Expose the OTP on this platform
This can be used by the newly-added nintendo-otp nvmem module.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-5-linkmauve@linkmauve.fr
2021-08-13 22:04:25 +10:00
Emmanuel Gil Peyrot
b11748e693 powerpc: wii.dts: Reduce the size of the control area
This is wrong, but needed in order to avoid overlapping ranges with the
OTP area added in the next commit.  A refactor of this part of the
device tree is needed: according to Wiibrew[1], this area starts at
0x0d800000 and spans 0x400 bytes (that is, 0x100 32-bit registers),
encompassing PIC and GPIO registers, amongst the ones already exposed in
this device tree, which should become children of the control@d800000
node.

[1] https://wiibrew.org/wiki/Hardware/Hollywood_Registers

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210801073822.12452-4-linkmauve@linkmauve.fr
2021-08-13 22:04:25 +10:00
Cédric Le Goater
cbc06f051c powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
runtime. Today, the IPI is not created for such nodes, and hot-plugged
CPUs use a bogus IPI, which leads to soft lockups.

We can not directly allocate and request the IPI on demand because
bringup_up() is called under the IRQ sparse lock. The alternative is
to allocate the IPIs for all possible nodes at startup and to request
the mapping on demand when the first CPU of a node is brought up.

Fixes: 7dcc37b3ef ("powerpc/xive: Map one IPI interrupt per node")
Cc: stable@vger.kernel.org # v5.13
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807072057.184698-1-clg@kaod.org
2021-08-12 22:31:41 +10:00
Christophe Leroy
01fcac8e4d powerpc/interrupt: Do not call single_step_exception() from other exceptions
single_step_exception() is called by emulate_single_step() which
is called from (at least) alignment exception() handler and
program_check_exception() handler.

Redefine it as a regular __single_step_exception() which is called
by both single_step_exception() handler and emulate_single_step()
function.

Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aed174f5cbc06f2cf95233c071d8aac948e46043.1628611921.git.christophe.leroy@csgroup.eu
2021-08-12 22:22:57 +10:00
Christophe Leroy
98694166c2 powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
An interrupt handler shall not be called from another interrupt
handler otherwise this leads to problems like the following:

  Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000)
  ------------[ cut here ]------------
  Bug: Write fault blocked by KUAP!
  WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720
  Modules linked in:
  CPU: 0 PID: 1617 Comm: sshd Tainted: G        W         5.13.0-pmac-00010-g8393422eb77 #7
  NIP:  c001b77c LR: c001b77c CTR: 00000000
  REGS: cb9e5bc0 TRAP: 0700   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  00021032 <ME,IR,DR,RI>  CR: 24942424  XER: 00000000

  GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c
  GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90
  GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0
  NIP [c001b77c] do_page_fault+0x484/0x720
  LR [c001b77c] do_page_fault+0x484/0x720
  Call Trace:
  [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable)
  [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4
  --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c
  NIP:  c001f9b4 LR: c03250a0 CTR: 00000004
  REGS: cb9e5cc0 TRAP: 0300   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 48028468  XER: 20000000
  DAR: afd4fa84 DSISR: 0a000000
  GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000
  GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90
  GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56
  NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c
  LR [c03250a0] _copy_to_iter+0x144/0x990
  --- interrupt: 300
  [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable)
  [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4
  [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340
  [cb9e5f00] [c01571ac] ksys_read+0x70/0x118
  [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28
  --- interrupt: c00 at 0xa7855c88
  NIP:  a7855c88 LR: a7855c5c CTR: 00000000
  REGS: cb9e5f40 TRAP: 0c00   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 2402446c  XER: 00000000

  GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800
  GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0
  GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0
  NIP [a7855c88] 0xa7855c88
  LR [a7855c5c] 0xa7855c5c
  --- interrupt: c00
  Instruction dump:
  3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079
  3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b
  ---[ end trace fd69b91a8046c2e5 ]---

Here the problem is that by re-enterring an exception handler,
kuap_save_and_lock() is called a second time with this time KUAP
access locked, leading to regs->kuap being overwritten hence
KUAP not being unlocked at exception exit as expected.

Do not call do_IRQ() from timer_interrupt() directly. Instead,
redefine do_IRQ() as a standard function named __do_IRQ(), and
call it from both do_IRQ() and time_interrupt() handlers.

Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.1628611927.git.christophe.leroy@csgroup.eu
2021-08-12 22:21:57 +10:00
Marc Zyngier
2c89965830 powerpc: Bulk conversion to generic_handle_domain_irq()
Wherever possible, replace constructs that match either
generic_handle_irq(irq_find_mapping()) or
generic_handle_irq(irq_linear_revmap()) to a single call to
generic_handle_domain_irq().

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-12 11:39:41 +01:00
Marc Zyngier
1bce542500 powerpc: Bulk conversion to generic_handle_domain_irq()
Wherever possible, replace constructs that match either
generic_handle_irq(irq_find_mapping()) or
generic_handle_irq(irq_linear_revmap()) to a single call to
generic_handle_domain_irq().

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210802162630.2219813-13-maz@kernel.org
2021-08-10 23:15:02 +10:00
Cédric Le Goater
f5af0a9787 KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore
On P10, the feature doing an automatic "save & restore" of a VCPU
interrupt context is set by default in OPAL. When a VP context is
pulled out, the state of the interrupt registers are saved by the XIVE
interrupt controller under the internal NVP structure representing the
VP. This saves a costly store/load in guest entries and exits.

If OPAL advertises the "save & restore" feature in the device tree,
it should also have set the 'H' bit in the CAM line. Check that when
vCPUs are connected to their ICP in KVM before going any further.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
2021-08-10 23:15:02 +10:00
Cédric Le Goater
b68c6646cc KVM: PPC: Book3S HV: XIVE: Add a 'flags' field
Use it to hold platform specific features. P9 DD2 introduced
single-escalation support. P10 will add others.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720134209.256133-2-clg@kaod.org
2021-08-10 23:15:01 +10:00
Cédric Le Goater
17df41fec5 powerpc: use IRQF_NO_DEBUG for IPIs
There is no need to use the lockup detector ("noirqdebug") for IPIs.
The ipistorm benchmark measures a ~10% improvement on high systems
when this flag is set.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210719130614.195886-1-clg@kaod.org
2021-08-10 23:15:01 +10:00
Cédric Le Goater
59b2bc18b1 powerpc/xive: Use XIVE domain under xmon and debugfs
The default domain of the PCI/MSIs is not the XIVE domain anymore. To
list the IRQ mappings under XMON and debugfs, query the IRQ data from
the low level XIVE domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-32-clg@kaod.org
2021-08-10 23:15:01 +10:00
Cédric Le Goater
1753081f2d KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts
PCI MSIs now live in an MSI domain but the underlying calls, which
will EOI the interrupt in real mode, need an HW IRQ number mapped in
the XICS IRQ domain. Grab it there.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-31-clg@kaod.org
2021-08-10 23:15:01 +10:00