Commit Graph

25142 Commits

Author SHA1 Message Date
Christophe Leroy
2fd986377d powerpc: Prepare func_desc_t for refactorisation
In preparation of making func_desc_t generic, change the ELFv2
version to a struct containing 'addr' element.

This allows using single helpers common to ELFv1 and ELFv2 and
reduces the amount of #ifdef's

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5c36105e08b27b98450535bff48d71b690c19739.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
0a9c5ae279 powerpc: Remove 'struct ppc64_opd_entry'
'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h

It was initially in module_64.c and commit 2d291e9027 ("Fix compile
failure with non modular builds") moved it into asm/elf.h

But it was by mistake added outside of __KERNEL__ section,
therefore commit c3617f7203 ("UAPI: (Scripted) Disintegrate
arch/powerpc/include/asm") moved it to uapi/asm/elf.h

Now that it is not used anymore by the kernel, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c309ccee65ec2e3802df7a7fe761d0a298584809.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
d3e32b997a powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry'
'struct ppc64_opd_entry' is somehow redundant with 'struct func_desc',
the later is more correct/complete as it includes the third
field which is unused.

So use 'struct func_desc' instead of 'struct ppc64_opd_entry'

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/34e76bac6cbe95a63ecd37df69fb7feb93b0ea7c.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Christophe Leroy
5b23cb8cc6 powerpc: Move and rename func_descr_t
There are three architectures with function descriptors, try to
have common names for the address they contain in order to
refactor some functions into generic functions later.

powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'

Vote for 'addr' and update 'func_descr_t' accordingly.

Move it in asm/elf.h to have it at the same place on all
three architectures, remove the typedef which hides its real
type, and change it to a smoother name 'struct func_desc'.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/529b2ba1d001e8f628ef0d30e8044c9b3d0a4921.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Christophe Leroy
81df21de8f powerpc: Fix 'sparse' checking on PPC64le
'sparse' is architecture agnostic and knows nothing about ELF ABI
version.

Just like it gets arch and powerpc type and endian from Makefile,
it also need to get _CALL_ELF from there, otherwise it won't set
PPC64_ELF_ABI_v2 macro for PPC64le and won't check the correct code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ac1312f2451aa558bb2a8806b4d0aa2020f0c176.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Vaibhav Jain
bbbca72352 powerpc/papr_scm: Implement initial support for injecting smart errors
Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult as simulating NVDIMM health related events need a hacked up
qemu version.

To solve this problem this patch proposes simulating certain set of
NVDIMM health related events in papr_scm. Specifically 'fatal' health
state and 'dirty' shutdown state. These error can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
corresponding ndctl patches following command flow is expected:

$ sudo ndctl list -DH -d nmem0
...
      "health_state":"ok",
      "shutdown_state":"clean",
...
 # inject unsafe shutdown and fatal health error
$ sudo ndctl inject-smart nmem0 -Uf
...
      "health_state":"fatal",
      "shutdown_state":"dirty",
...
 # uninject all errors
$ sudo ndctl inject-smart nmem0 -N
...
      "health_state":"ok",
      "shutdown_state":"clean",
...

The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv which is then bitwise ANDed to the health bitmap fetched from the
hypervisor. The value for 'health_bitmap_inject_mask' is accessible from sysfs
at nmemX/papr/health_bitmap_inject.

A new PDSM named 'SMART_INJECT' is proposed that accepts newly
introduced 'struct nd_papr_pdsm_smart_inject' as payload thats
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When the processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair or 'inject_mask' and 'clear_mask' bitmaps from the payload
and bit-blt it to the 'health_bitmap_inject_mask'. This ensures the after being
fetched from the hypervisor, the health_bitmap reflects requested smart-error
states.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com
2022-02-16 23:10:47 +11:00
Christophe Leroy
76b372814b powerpc/ftrace: Style cleanup in ftrace_mprofile.S
Add some line breaks to better match the file's style, add
some space after comma and fix a couple of misplaced blanks.

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/973506292d0c7b05c06530c8e11803ce38e5eda2.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
fc75f87337 powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set
When FL_SAVE_REGS is not set we get here via ftrace_caller()
which doesn't save all registers.

ftrace_caller() explicitely clears regs.msr, so we can rely
on it to know where we come from. We don't expect MSR register
to be 0 at all when involving ftrace.

Fixes: 40b035efe2 ("powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2f9a7e898c93cc7438ef5ccd47cb9c3a9c5b53ef.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
df45a55788 powerpc/ftrace: Add recursion protection in prepare_ftrace_return()
The function_graph_enter() does not provide any recursion protection.

Add a protection in prepare_ftrace_return() in case
function_graph_enter() calls something that gets
function graph traced.

Fixes: 830213786c ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/74edf2ff0a60e66b0d9225a137100a86a0557032.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
34d8dac807 powerpc/ftrace: Also save r1 in ftrace_caller()
Also save r1 in ftrace_caller()

r1 is needed during unwinding when the function_graph tracer
is active.

Fixes: 830213786c ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ff535e86d3a69376a6d89168511d4e403835f18b.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Anders Roxell
fe663df782 powerpc/lib/sstep: fix 'ptesync' build error
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:2088: Error: unrecognized opcode: `ptesync'
  make[3]: *** [/builds/linux/scripts/Makefile.build:287: arch/powerpc/lib/sstep.o] Error 1

Add the 'ifdef CONFIG_PPC64' around the 'ptesync' in function
'emulate_update_regs()' to like it is in 'analyse_instr()'. Since it looks like
it got dropped inadvertently by commit 3cdfcbfd32 ("powerpc: Change
analyse_instr so it doesn't modify *regs").

A key detail is that analyse_instr() will never recognise lwsync or
ptesync on 32-bit (because of the existing ifdef), and as a result
emulate_update_regs() should never be called with an op specifying
either of those on 32-bit. So removing them from emulate_update_regs()
should be a nop in terms of runtime behaviour.

Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Cc: stable@vger.kernel.org # v4.14+
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[mpe: Add last paragraph of change log mentioning analyse_instr() details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220211005113.1361436-1-anders.roxell@linaro.org
2022-02-15 22:31:35 +11:00
Paul Menzel
cb7356986d powerpc/boot: Add otheros-too-big.bld to .gitignore
Currently, `git status` lists the file as untracked by git, so tell git
to ignore it.

Fixes: aa3bc365ee ("powerpc/ps3: Add check for otheros image size")
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220214065543.198992-1-pmenzel@molgen.mpg.de
2022-02-15 22:29:52 +11:00
Christophe Leroy
38a1756861 powerpc: Don't allow the use of EMIT_BUG_ENTRY with BUGFLAG_WARNING
Warnings in assembly must use EMIT_WARN_ENTRY in order to generate
the necessary entry in exception table.

Check in EMIT_BUG_ENTRY that flags don't include BUGFLAG_WARNING.

This change avoids problems like the one fixed by
commit fd1eaaaaa6 ("powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug
warnings").

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ddcb422102a37eb45f57694c7ef0ec6187964dff.1644742951.git.christophe.leroy@csgroup.eu
2022-02-14 13:06:43 +11:00
Michael Ellerman
5a72345e6a powerpc: Fix STACKTRACE=n build
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit
1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.

  arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
  arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
    171 |  nmi_cpu_backtrace(regs);
        |  ^~~~~~~~~~~~~~~~~
  arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
  arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
    226 |  nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
        |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().

The code in question doesn't actually depend on STACKTRACE=y, that was
just added because arch_trigger_cpumask_backtrace() lived in
stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.

Fixes: 1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au
2022-02-12 22:47:44 +11:00
Aneesh Kumar K.V
2354ad252b powerpc/mm: Update default hugetlb size early
commit: d9c2340052 ("Do not depend on MAX_ORDER when grouping pages by mobility")
introduced pageblock_order which will be used to group pages better.
The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
should be set before we call set_pageblock_order.

set_pageblock_order happens early in the boot and default hugetlb page size
should be initialized before that to compute the right pageblock_order value.

Currently, default hugetlbe page size is set via arch_initcalls which happens
late in the boot as shown via the below callstack:

[c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
[c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
[c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
[c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
[c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64

and the pageblock_order initialization is done early during the boot.

[c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
[c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
[c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
[c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
[c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
[c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98

delaying default hugetlb page size initialization implies the kernel will
initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
value for mobility grouping. IIUC we always had this issue. But it was not
a problem for hash translation mode because (MAX_ORDER - 1) is the same as
HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
5 instead of 8.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220211065215.101767-1-aneesh.kumar@linux.ibm.com
2022-02-12 22:47:44 +11:00
Nathan Lynch
92e6dc257b powerpc/pseries: make pseries_devicetree_update() static
pseries_devicetree_update() has only one call site, in the same file in
which it is defined. Make it static.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220207221247.354454-1-nathanl@linux.ibm.com
2022-02-12 22:47:44 +11:00
Christophe Leroy
692b21d780 powerpc/vdso: Move cvdso_call macro into gettimeofday.S
Now that gettimeofday.S is unique, move cvdso_call macro
into that file which is the only user.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/72720359d4c58e3a3b96dd74952741225faac3de.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:44 +11:00
Christophe Leroy
9b97bea900 powerpc/vdso: Remove cvdso_call_time macro
cvdso_call_time macro is very similar to cvdso_call macro.

Add a call_time argument to cvdso_call which is 0 by default
and set to 1 when using cvdso_call to call __c_kernel_time().

Return returned value as is with CR[SO] cleared when it is used
for time().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/837a260ad86fc1ce297a562c2117fd69be5f7b5c.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
fd1feade75 powerpc/vdso: Merge vdso64 and vdso32 into a single directory
merge vdso64 into vdso32 and rename it vdso.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4dbe05cc130f6a0858d09ac72e436c373cb08b70.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
d88378d8d2 powerpc/vdso: Rework VDSO32 makefile to add a prefix to object files
In order to merge vdso32 and vdso64 build in following patch, rework
Makefile is order to add -32 suffix to VDSO32 object files.

Also change sigtramp.S to sigtramp32.S as VDSO64 sigtramp.S is too
different to be squashed into VDSO32 sigtramp.S at the first place.

gen_vdso_offsets.sh also becomes gen_vdso32_offsets.sh

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0c421b704a57b228e75a891512568339c53667ad.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
f061fb03ee powerpc/vdso: augment VDSO32 functions to support 64 bits build
VDSO64 cacheflush.S datapage.S gettimeofday.S and vgettimeofday.c
are very similar to their VDSO32 counterpart.

VDSO32 counterpart is already more complete than the VDSO64 version
as it supports both PPC32 vdso and 32 bits VDSO for PPC64.

Use compat macros wherever necessary in PPC32 files
so that they can also be used to build VDSO64.

vdso64/note.S is already a link to vdso32/note.S so
no change is required.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2cbb8f046b7efc251053521dc39b752795e26b7.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
6836f09903 powerpc/lib/sstep: use truncate_if_32bit()
Use truncate_if_32bit() when possible instead of open coding.

truncate_if_32bit() returns an unsigned long, so don't use it when
a signed value is expected.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7e1c07123f13156d4a27991a2e2694fb584bc068.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
7c3bba9199 powerpc/lib/sstep: Remove unneeded #ifdef __powerpc64__
MSR_64BIT is always defined, no need to hide code using MSR_64BIT
inside an #ifdef __powerpc64__

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ee61b693bc7e046eed1abb7a34909eb4878a9442.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
67484e0de9 powerpc/lib/sstep: Use l1_dcache_bytes() instead of opencoding
Don't opencode dcache size retrieval based on whether that's ppc32 or ppc64.

Use l1_dcache_bytes()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6c608fd4795e2d8ea1a0a449405a0087f76d8bb3.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
9d44d1bd93 powerpc: Use the newly added is_tsk_32bit_task() macro
Two places deserve using the macro is_tsk_32bit_task() added by
commit 252745240b ("powerpc/audit: Fix syscall_get_arch()")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7304a889dbe885aefad8a8333673c81ee4b8f7a6.1642751874.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
0670010f3b powerpc/32s: Enable STRICT_MODULE_RWX for the 603 core
The book3s/32 MMU doesn't support per page execution protection and
doesn't support RO protection for kernel pages.

However, on the 603 which implements software loaded TLBs, execution
protection is honored by the TLB Miss handler which doesn't load
Instruction TLB for non executable pages. And RO protection is
honored by clearing the C bit for RO pages, leading to DSI.

So on the 603, STRICT_MODULE_RWX is possible without much effort.
Don't disable STRICT_MODULE_RWX on book3s/32 and print a warning
in case STRICT_MODULE_RWX has been selected and the platform has
a Hardware HASH MMU.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e6162f334167e75f1140082932e3a354b16daba.1642413973.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
a8936569a0 powerpc/bpf: Always reallocate BPF_REG_5, BPF_REG_AX and TMP_REG when possible
BPF_REG_5, BPF_REG_AX and TMP_REG are mapped on non volatile registers
because there are not enough volatile registers, but they don't need
to be preserved on function calls.

So when some volatile registers become available, those registers can
always be reallocated regardless of whether SEEN_FUNC is set or not.

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b04c246874b716911139c04bc004b3b14eed07ef.1641817763.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
f222ab83df powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
set_memory_attr() was implemented by commit 4d1755b6a7 ("powerpc/mm:
implement set_memory_attr()") because the set_memory_xx() couldn't
be used at that time to modify memory "on the fly" as explained it
the commit.

But set_memory_attr() uses set_pte_at() which leads to warnings when
CONFIG_DEBUG_VM is selected, because set_pte_at() is unexpected for
updating existing page table entries.

The check could be bypassed by using __set_pte_at() instead,
as it was the case before commit c988cfd38e ("powerpc/32:
use set_memory_attr()") but since commit 9f7853d760 ("powerpc/mm:
Fix set_memory_*() against concurrent accesses") it is now possible
to use set_memory_xx() functions to update page table entries
"on the fly" because the update is now atomic.

For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT.
Add set_memory_np() and set_memory_p() for that.

Replace all uses of set_memory_attr() by the relevant set_memory_xx()
and remove set_memory_attr().

Fixes: c988cfd38e ("powerpc/32: use set_memory_attr()")
Cc: stable@vger.kernel.org
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Depends-on: 9f7853d760 ("powerpc/mm: Fix set_memory_*() against concurrent accesses")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cda2b44b55c96f9ac69fa92e68c01084ec9495c5.1640344012.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
a4c182ecf3 powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
Commit 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
included a spin_lock() to change_page_attr() in order to
safely perform the three step operations. But then
commit 9f7853d760 ("powerpc/mm: Fix set_memory_*() against
concurrent accesses") modify it to use pte_update() and do
the operation safely against concurrent access.

In the meantime, Maxime reported some spinlock recursion.

[   15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
[   15.357540]  lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
[   15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
[   15.373350] Workqueue: events do_free_init
[   15.377615] Call Trace:
[   15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
[   15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
[   15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
[   15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
[   15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
[   15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
[   15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
[   15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
[   15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
[   15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
[   15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
[   15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
[   15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
[   15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
[   15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
[   15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c

Remove the read / modify / write sequence to make the operation atomic
and remove the spin_lock() in change_page_attr().

To do the operation atomically, we can't use pte modification helpers
anymore. Because all platforms have different combination of bits, it
is not easy to use those bits directly. But all have the
_PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare
two sets to know which bits are set or cleared.

For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you
know which bit gets cleared and which bit get set when changing exec
permission.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/
Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.1640344012.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
4ee83a2cfb powerpc/ftrace: Remove ftrace_32.S
Functions in ftrace_32.S are common with PPC64.

Reuse the ones defined for PPC64 with slight modification
when required.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Squash in fixup diff from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5e837fc190504c4ef834272e70d60ae33f175d49.1640017960.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:28 +11:00
Ard Biesheuvel
297565aa22 lib/xor: make xor prototypes more friendly to compiler vectorization
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.

So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-11 20:39:39 +11:00
Jakub Kicinski
1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
Song Liu
0f350231b5 bpf: Fix leftover header->pages in sparc and powerpc code.
Replace header->pages * PAGE_SIZE with new header->size.

Fixes: ed2d9e1a26 ("bpf: Use size instead of pages in bpf_binary_header")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220208220509.4180389-2-song@kernel.org
2022-02-08 14:52:05 -08:00
Christophe Leroy
41315494be powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32
PPC64 mprofile versions and PPC32 are very similar.

Modify PPC64 version so that if can be reused for PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/82a732915dc71ee766e31809350939331944006d.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
830213786c powerpc/ftrace: directly call of function graph tracer by ftrace caller
Modify function graph tracer to be handled directly by the standard
ftrace caller.

This is made possible as powerpc now supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS.

This change simplifies the call of function graph ftrace.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/04d196585ff81bde06a000bd9c633a33a5b21130.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
0c81ed5ed4 powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller
ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() have common code.

They will have even more common code after following patch.

Refactor into a single ftrace_modify_ftrace_graph_caller() function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f37785a531f1a8f201e1b3da45997a5c77e9d820.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
40b035efe2 powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS
Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5831f711a778fcd6eb51eb5898f1faae4378b35b.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
c75388a8ce powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller()
to handle LIVEPATCH the same way as frace_caller_regs().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/850817333cc76593699032e8e9a70d8c36e1af1e.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
d95bf254be powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change
ftrace_caller() stack layout to match struct pt_regs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/da9734eba504998fb914aca12131c9f6bf6120a8.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
7bdb478c1d powerpc/ftrace: Simplify PPC32's return_to_handler()
return_to_handler() was copied from PPC64. For PPC32 it
just needs to save r3 and r4, and doesn't require any nop
after the bl.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aab39b77b34fb2c4ed08ed01c547b6ed13643788.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
7875bc9b07 powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32
PPC32 mcount() caller already saves LR on stack,
no need to save it again.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eadcfc770b4f1e35535ffb85e28e858a2c31dec4.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
a4520b2527 powerpc/ftrace: Add support for livepatch to PPC32
PPC64 needs some special logic to properly set up the TOC.
See commit 85baa09549 ("powerpc/livepatch: Add live patching support
on ppc64le") for details.

PPC32 doesn't have TOC so it doesn't need that logic, so adding
LIVEPATCH support is straight forward.

Add CONFIG_LIVEPATCH_64 and move livepatch stack logic into that item.

Livepatch sample modules all work.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/63cb094125b6a6038c65eeac2abaabbabe63addd.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
0c850965d6 powerpc/module_32: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.

R_PPC_ADDR16_LO, R_PPC_ADDR16_HI, R_PPC_ADDR16_HA and R_PPC_REL24 are
the types generated by the kpatch-build userspace tool or klp-convert
kernel tree observed applying a relocation to a post-init module.

Use patch_instruction() to patch those relocations.

Commit 8734b41b3e ("powerpc/module_64: Fix livepatching for
RO modules") did similar change in module_64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5697157cb7dba3927e19aa17c915a83bc550bb2.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
27e21e8f12 powerpc/32: Remove _ENTRY() macro
_ENTRY() is now redundant with _GLOBAL(). Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62a35f8dde2bb74c8d0d7a5430cce07a5a3a6fb6.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
1231816373 powerpc/32: Remove remaining .stabs annotations
STABS debug format has been superseded long time ago by DWARF.

Remove the few remaining .stabs annotations from old 32 bits code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/68932ec2ba6b868d35006b96e90f0890f3da3c05.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
66ada29078 powerpc/corenet: Change criteria to set MPIC_ENABLE_COREINT
Don't use ppc_md function comparison.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8ef82ee5f2713f4c36eb5d2d49b0905c7472801.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
fae65a9ac8 powerpc/mpc86xx_hpcn: Remove obsolete statement
Comment says "Delete this in 2.6.27".

Do so now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a47bb6a69c68156bc2d555152dab5a23733856b7.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:09 +11:00
Christophe Leroy
e6d03ac156 powerpc/machdep: Move sys_ctrler_t definition into pmac_feature.h
sys_ctrler_t definitions are tied to pmac. Move it into pmac_feature.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Move to pmac_feature.h to fix some build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7dd5ead4bbca749e2da089ff6fe2b1878d6bf40e.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:02:20 +11:00
Christophe Leroy
9bb162fa26 powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Allthough kernel text is always mapped with BATs, we still have
inittext mapped with pages, so TLB miss handling is required
when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.

The final solution should be to set a BAT that also maps inittext
but that BAT then needs to be cleared at end of init, and it will
require more changes to be able to do it properly.

As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
deal so let's fix it simply for now to enable easy stable application.

Fixes: 035b19a15a ("powerpc/32s: Always map kernel text and rodata with BATs")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu
2022-02-07 17:18:47 +11:00
Christophe Leroy
d6a6c725a2 powerpc/machdep: Remove CONFIG_PPC_HAS_FEATURE_CALLS
Last user was removed by commit 7bbd827750 ("[PATCH] ppc64: very
basic desktop g5 sound support").

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/803779fffb4ee0801746b2173d37cea3b273f821.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 16:43:35 +11:00
Sourabh Jain
7c5ed82b80 powerpc: Set crashkernel offset to mid of RMA region
On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
2022-02-07 15:26:12 +11:00
Christophe Leroy
4291d085b0 powerpc/32s: Make pte_update() non atomic on 603 core
On 603 core, TLB miss handler don't do any change to the
page tables so pte_update() doesn't need to be atomic.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cc89d3c11fc9c742d0df3454a657a3a00be24046.1643538554.git.christophe.leroy@csgroup.eu
2022-02-03 22:57:19 +11:00
Christophe Leroy
535bda36db powerpc/nohash: Remove pte_same()
arch/powerpc/include/asm/nohash/{32/64}/pgtable.h has

	#define __HAVE_ARCH_PTE_SAME
	#define pte_same(A,B)      ((pte_val(A) ^ pte_val(B)) == 0)

include/linux/pgtable.h has

	#ifndef __HAVE_ARCH_PTE_SAME
	static inline int pte_same(pte_t pte_a, pte_t pte_b)
	{
		return pte_val(pte_a) == pte_val(pte_b);
	}
	#endif

Remove the powerpc version which is similar to the generic one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/83c97bd58a3596ef1b0ff28b1e41fd492d005520.1643616989.git.christophe.leroy@csgroup.eu
2022-02-03 22:57:00 +11:00
Christophe Leroy
4634bf4455 powerpc/603: Clear C bit when PTE is read only
On book3s/32 MMU, PP bits don't offer kernel RO protection,
kernel pages are always RW.

However, on the 603 a page fault is always generated when the
C bit (change bit = dirty bit) is not set.

Enforce kernel RO protection by clearing C bit in TLB miss
handler when the page doesn't have _PAGE_RW flag.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu
2022-02-03 22:56:44 +11:00
Christophe Leroy
9872cbfb45 powerpc/603: Remove outdated comment
Since commit 84de6ab0e9 ("powerpc/603: don't handle PAGE_ACCESSED
in TLB miss handlers.") page table is not updated anymore by
TLB miss handlers.

Remove the comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu
2022-02-03 22:38:13 +11:00
Chen Jingwen
dd75080aa8 powerpc/kasan: Fix early region not updated correctly
The shadow's page table is not updated when PTE_RPN_SHIFT is 24
and PAGE_SHIFT is 12. It not only causes false positives but
also false negative as shown the following text.

Fix it by bringing the logic of kasan_early_shadow_page_entry here.

1. False Positive:
==================================================================
BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50
Write of size 16 at addr f57f3be0 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-12267-gdebe436e77c7 #1
Call Trace:
[c80d1c20] [c07fe7b8] dump_stack_lvl+0x4c/0x6c (unreliable)
[c80d1c40] [c02ff668] print_address_description.constprop.0+0x88/0x300
[c80d1c70] [c02ff45c] kasan_report+0x1ec/0x200
[c80d1cb0] [c0300b20] kasan_check_range+0x160/0x2f0
[c80d1cc0] [c03018a4] memset+0x34/0x90
[c80d1ce0] [c0280108] pcpu_alloc+0x508/0xa50
[c80d1d40] [c02fd7bc] __kmem_cache_create+0xfc/0x570
[c80d1d70] [c0283d64] kmem_cache_create_usercopy+0x274/0x3e0
[c80d1db0] [c2036580] init_sd+0xc4/0x1d0
[c80d1de0] [c00044a0] do_one_initcall+0xc0/0x33c
[c80d1eb0] [c2001624] kernel_init_freeable+0x2c8/0x384
[c80d1ef0] [c0004b14] kernel_init+0x24/0x170
[c80d1f10] [c001b26c] ret_from_kernel_thread+0x5c/0x64

Memory state around the buggy address:
 f57f3a80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 f57f3b00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
>f57f3b80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                                               ^
 f57f3c00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 f57f3c80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

2. False Negative (with KASAN tests):
==================================================================
Before fix:
    ok 45 - kmalloc_double_kzfree
    # vmalloc_oob: EXPECTATION FAILED at lib/test_kasan.c:1039
    KASAN failure expected in "((volatile char *)area)[3100]", but none occurred
    not ok 46 - vmalloc_oob
    not ok 1 - kasan

==================================================================
After fix:
    ok 1 - kasan

Fixes: cbd18991e2 ("powerpc/mm: Fix an Oops in kasan_mmu_init()")
Cc: stable@vger.kernel.org # 5.4.x
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211229035226.59159-1-chenjingwen6@huawei.com
2022-02-03 22:37:44 +11:00
Christophe JAILLET
e414e2938e powerpc/xive: Add some error handling code to 'xive_spapr_init()'
'xive_irq_bitmap_add()' can return -ENOMEM.
In this case, we should free the memory already allocated and return
'false' to the caller.

Also add an error path which undoes the 'tima = ioremap(...)'

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/564998101804886b151235c8a9f93020923bfd2c.1643718324.git.christophe.jaillet@wanadoo.fr
2022-02-03 22:36:59 +11:00
Athira Rajeev
0198322379 powerpc/perf: Don't use perf_hw_context for trace IMC PMU
Trace IMC (In-Memory collection counters) in powerpc is useful for
application level profiling.

For trace_imc, presently task context (task_ctx_nr) is set to
perf_hw_context. But perf_hw_context should only be used for CPU PMU.
See commit 2665784850 ("perf/core: Verify we have a single
perf_hw_context PMU").

So for trace_imc, even though it is per thread PMU, it is preferred to
use sw_context in order to be able to do application level monitoring.
Hence change the task_ctx_nr to use perf_sw_context.

Fixes: 012ae24484 ("powerpc/perf: Trace imc PMU functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Update subject & incorporate notes into change log, reflow comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com
2022-02-03 22:32:54 +11:00
Wedson Almeida Filho
d4be60fe66 powerpc/module_64: use module_init_section instead of patching names
Without this patch, module init sections are disabled by patching their
names in arch-specific code when they're loaded (which prevents code in
layout_sections from finding init sections). This patch uses the new
arch-specific module_init_section instead.

This allows modules that have .init_array sections to have the
initialisers properly called (on load, before init). Without this patch,
the initialisers are not called because .init_array is renamed to
_init_array, and thus isn't found by code in find_module_sections().

Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com
2022-02-03 22:20:37 +11:00
Julia Lawall
925f76c557 powerpc/spufs: adjust list element pointer type
Other uses of &gang->aff_list_head, eg in spufs_assert_affinity, indicate
that the list elements have type spu_context, not spu as used here.  Change
the type of tmp accordingly.

This has no impact on the execution, because tmp is not used in the body of
the loop.

Fixes: c5fc8d2a92 ("[CELL] cell: add placement computation for scheduling of affinity contexts")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr
2022-02-03 21:40:32 +11:00
Bhaskar Chowdhury
a1c4140933 powerpc/epapr: Fix parmeters typo
s/parmeters/parameters/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210320213932.22697-1-unixbhaskar@gmail.com
2022-02-03 21:35:56 +11:00
Fabiano Rosas
b53c861059 powerpc: Fix debug print in smp_setup_cpu_maps
When figuring out the number of threads, the debug message prints "1
thread" for the first iteration of the loop, instead of the actual
number of threads calculated from the length of the
"ibm,ppc-interrupt-server#s" property.

  * /cpus/PowerPC,POWER8@20...
    ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG
    thread 0 -> cpu 0 (hard id 32)
    thread 1 -> cpu 1 (hard id 33)
    thread 2 -> cpu 2 (hard id 34)
    thread 3 -> cpu 3 (hard id 35)
    thread 4 -> cpu 4 (hard id 36)
    thread 5 -> cpu 5 (hard id 37)
    thread 6 -> cpu 6 (hard id 38)
    thread 7 -> cpu 7 (hard id 39)
  * /cpus/PowerPC,POWER8@28...
    ibm,ppc-interrupt-server#s -> 8 threads
    thread 0 -> cpu 8 (hard id 40)
    thread 1 -> cpu 9 (hard id 41)
    thread 2 -> cpu 10 (hard id 42)
    thread 3 -> cpu 11 (hard id 43)
    thread 4 -> cpu 12 (hard id 44)
    thread 5 -> cpu 13 (hard id 45)
    thread 6 -> cpu 14 (hard id 46)
    thread 7 -> cpu 15 (hard id 47)
(...)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
2022-02-03 20:24:59 +11:00
Fabiano Rosas
4feb74aa64 KVM: PPC: Decrement module refcount if init_vm fails
We increment the reference count for KVM-HV/PR before the call to
kvmppc_core_init_vm. If that function fails we need to decrement the
refcount.

Also remove the check on kvm_ops->owner because try_module_get can
handle a NULL module.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-5-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
175be7e580 KVM: PPC: Book3S HV: Free allocated memory if module init fails
The module's exit function is not called when the init fails, we need
to do cleanup before returning.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-4-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
c5d0d77b45 KVM: PPC: Book3S HV: Delay setting of kvm ops
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-3-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
69ab6ac380 KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f494267 ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-2-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Michael Ellerman
961f649fb3 powerpc/ptdump: Fix sparse warning in hashpagetable.c
As reported by sparse:

  arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    got restricted __be64 [usertype] v
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    got restricted __be64 [usertype] r

The values returned by plpar_pte_read_4() are CPU endian, not __be64, so
assigning them to struct hash_pte confuses sparse. As a minimal fix open
code a struct to hold the values with CPU endian types.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Michael Ellerman
2e7f1e2b30 powerpc/64: Move paca allocation later in boot
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.

The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.

  early_init_devtree:                       <- early_radix_enabled() = false
    early_init_dt_scan_cpus:                <- early_radix_enabled() = false
        ...
        check_cpu_pa_features:              <- early_radix_enabled() = false
        ...                               ^ <- early_radix_enabled() = TRUE
        allocate_paca:                    | <- early_radix_enabled() = TRUE
            ...                           |
            ppc64_bolted_size:            | <- early_radix_enabled() = TRUE
                if (early_radix_enabled())| <- early_radix_enabled() = TRUE
                    return ULONG_MAX;     |
        ...                               |
    ...                                   | <- early_radix_enabled() = TRUE
    ...                                   | <- early_radix_enabled() = TRUE
    mmu_early_init_devtree()              V
    ...                                     <- early_radix_enabled() = false

This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.

The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.

At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.

Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.

Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().

The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.

[1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com
[2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Maxim Kiselev
5ebb747492 powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
On board rev A, the network interface labels for the switch ports
written on the front panel are different than on rev B and later.

This patch fixes network interface names for the switch ports according
to labels that are written on the front panel of the board rev B.
They start from ETH3 and end at ETH10.

This patch also introduces a separate device tree for rev A.
The main device tree is supposed to cover rev B and later.

Fixes: e69eb0824d ("powerpc: dts: t1040rdb: add ports for Seville Ethernet switch")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220121091447.3412907-1-bigunclemax@gmail.com
2022-02-02 20:32:10 +11:00
Laurent Dufour
eddaa9a402 powerpc/pseries: read the lpar name from the firmware
The LPAR name may be changed after the LPAR has been started in the HMC.
In that case lparstat command is not reporting the updated value because
it reads it from the device tree which is read at boot time.

However this value could be read from RTAS.

Adding this value in the /proc/powerpc/lparcfg output allows to read the
updated value.

However the hypervisor, like Qemu/KVM, may not support this RTAS
parameter. In that case the value reported in lparcfg is read from the
device tree and so is not updated accordingly.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop doc-comment syntax, change RTAS/DT to lower case, use of_root
      to fix missing of_node_put(), use of_property_read_string()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220106161339.74656-1-ldufour@linux.ibm.com
2022-02-02 20:32:10 +11:00
Jason Wang
8e0f353a44 powerpc/kvm: no need to initialise statics to 0
Static variables do not need to be initialised to 0, because compiler
will initialise all uninitialised statics to 0. Thus, remove the
unneeded initialization.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211220030243.603435-1-wangborong@cdjrlc.com
2022-02-02 20:31:25 +11:00
Alexey Kardashevskiy
faf01aef05 KVM: PPC: Merge powerpc's debugfs entry content into generic entry
At the moment KVM on PPC creates 4 types of entries under the kvm debugfs:
1) "%pid-%fd" per a KVM instance (for all platforms);
2) "vm%pid" (for PPC Book3s HV KVM);
3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM);
4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS);

The problem with this is that multiple VMs per process is not allowed for
2) and 3) which makes it possible for userspace to trigger errors when
creating duplicated debugfs entries.

This merges all these into 1).

This defines kvm_arch_create_kvm_debugfs() similar to
kvm_arch_create_vcpu_debugfs().

This defines 2 hooks in kvmppc_ops that allow specific KVM implementations
add necessary entries, this adds the _e500 suffix to
kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for.

This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC.

This removes no more used debugfs_dir pointers from PPC kvm_arch structs.

This stops removing vcpu entries as once created vcpus stay around
for the entire life of a VM and removed when the KVM instance is closed,
see commit d56f5136b0 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU
debugfs directories").

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220111005404.162219-1-aik@ozlabs.ru
2022-02-02 20:30:26 +11:00
Thierry Reding
d5342fdd16 powerpc: dts: Fix some I2C unit addresses
The unit-address for the Maxim MAX1237 ADCs on XPedite5200 boards don't
match the value in the "reg" property and cause a DTC warning.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211220134036.683309-1-thierry.reding@gmail.com
2022-01-31 13:45:24 +11:00
Maxim Kiselev
17846485df powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
T1040RDB has two RTL8211E-VB phys which requires setting
of internal delays for correct work.

Changing the phy-connection-type property to `rgmii-id`
will fix this issue.

Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com
2022-01-31 13:45:24 +11:00
Tobias Waldekranz
f529edd1b6 powerpc/e500/qemu-e500: allow core to idle without waiting
This means an idle guest won't needlessly consume an entire core on
the host, waiting for work to show up.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220112112459.1033754-1-troglobit@gmail.com
2022-01-31 13:45:24 +11:00
Michal Suchanek
b2a6f60435 powerpc: add link stack flush mitigation status in debugfs.
The link stack flush status is not visible in debugfs. It can be enabled
even when count cache flush is disabled. Add separate file for its
status.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
[mpe: Update for change to link_stack_flush_type]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
2022-01-31 13:45:23 +11:00
Sachin Sant
279d1a72c0 powerpc/xive: Export XIVE IPI information for online-only processors.
Cédric pointed out that XIVE IPI information exported via sysfs
(debug/powerpc/xive) display empty lines for processors which are
not online.

Switch to using for_each_online_cpu() so that information is
displayed for online-only processors.

Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchset@MacBook-Pro.local
2022-01-31 13:45:23 +11:00
Fabiano Rosas
c1c8a66367 KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure
MMIO emulation can fail if the guest uses an instruction that we are
not prepared to emulate. Since these instructions can be and most
likely are valid ones, this is (slightly) closer to an access fault
than to an illegal instruction, so deliver a Data Storage interrupt
instead of a Program interrupt.

BookE ignores bad faults, so it will keep using a Program interrupt
because a DSI would cause a fault loop in the guest.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-6-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
349fbfe9b9 KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57e ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295 ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Also change the message to a ratelimited one since we're letting the
guest run and it could flood the host logs.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-5-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
3f83150448 KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, remove the "bad MMIO" messages. The caller already has an
error message.)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-4-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
b99234b918 KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305 ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-3-farosas@linux.ibm.com
2022-01-31 13:42:59 +11:00
Fabiano Rosas
36d014d37d KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-2-farosas@linux.ibm.com
2022-01-31 13:42:59 +11:00
Al Viro
0c9dceb9bb asm/user.h: killed unused macros
Some of them used to be used by libbfd for a.out coredump handling.
Seeing that
	* libbfd has their copies anyway
	* we don't export them into userland headers
	* we don't support a.out coredumps anymore
let's bury the definitions.  They never had in-kernel
users anyway...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-01-30 21:17:00 -05:00
Arnd Bergmann
3e17314c22 agp: define proper stubs for empty helpers
The empty unmap_page_from_agp() macro causes a warning when
building with 'make W=1' on a couple of architectures:

drivers/char/agp/generic.c: In function 'agp_generic_destroy_page':
drivers/char/agp/generic.c:1265:28: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
 1265 |   unmap_page_from_agp(page);

Change the definitions to a 'do { } while (0)' construct to
make these more reliable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-29 22:24:25 +01:00
Nicholas Piggin
8defc2a5dd powerpc/64s/interrupt: Fix decrementer storm
The decrementer exception can fail to be cleared when the interrupt
returns in the case where the decrementer wraps with the next timer
still beyond decrementer_max. This results in a decrementer interrupt
storm. This is triggerable with small decrementer system with hard
and soft watchdogs disabled.

Fix this by always programming the decrementer if there was no timer.

Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com
2022-01-25 16:50:10 +11:00
Nicholas Piggin
22f7ff0dea KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
The L0 is storing HFSCR requested by the L1 for the L2 in struct
kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest
is not a per-vCPU structure. Hilarity ensues.

Fix it by moving the nested hfscr into the vCPU structure together with
the other per-vCPU nested fields.

Fixes: 8b210a880b ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122105530.3477250-1-npiggin@gmail.com
2022-01-25 16:39:41 +11:00
Athira Rajeev
fb6433b48a powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel
triggered below warning:

[  172.851380] ------------[ cut here ]------------
[  172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280
[  172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
[  172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2
[  172.851451] NIP:  c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180
[  172.851458] REGS: c000000017687860 TRAP: 0700   Not tainted  (5.16.0-rc5-03218-g798527287598)
[  172.851465] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48004884  XER: 20040000
[  172.851482] CFAR: c00000000013d5b4 IRQMASK: 1
[  172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004
[  172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000
[  172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68
[  172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000
[  172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0
[  172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003
[  172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600
[  172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8
[  172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280
[  172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280
[  172.851565] Call Trace:
[  172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable)
[  172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60
[  172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660
[  172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0
[  172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140
[  172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40
[  172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380
[  172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268

The warning indicates that MSR_EE being set(interrupt enabled) when
there was an overflown PMC detected. This could happen in
power_pmu_disable since it runs under interrupt soft disable
condition ( local_irq_save ) and not with interrupts hard disabled.
commit 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear
pending PMI before resetting an overflown PMC") intended to clear
PMI pending bit in Paca when disabling the PMU. It could happen
that PMC gets overflown while code is in power_pmu_disable
callback function. Hence add a check to see if PMI pending bit
is set in Paca before clearing it via clear_pmi_pending.

Fixes: 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122033429.25395-1-atrajeev@linux.vnet.ibm.com
2022-01-24 17:31:15 +11:00
Christophe Leroy
aec982603a powerpc/fixmap: Fix VM debug warning on unmap
Unmapping a fixmap entry is done by calling __set_fixmap()
with FIXMAP_PAGE_CLEAR as flags.

Today, powerpc __set_fixmap() calls map_kernel_page().

map_kernel_page() is not happy when called a second time
for the same page.

	WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0xc/0x1e8
	CPU: 0 PID: 1 Comm: swapper Not tainted 5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty #682
	NIP:  c0017cd4 LR: c00187f0 CTR: 00000010
	REGS: e1011d50 TRAP: 0700   Not tainted  (5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty)
	MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42000208  XER: 00000000

	GPR00: c0165fec e1011e10 c14c0000 c0ee2550 ff800000 c0f3d000 00000000 c001686c
	GPR08: 00001000 b00045a9 00000001 c0f58460 c0f50000 00000000 c0007e10 00000000
	GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	GPR24: 00000000 00000000 c0ee2550 00000000 c0f57000 00000ff8 00000000 ff800000
	NIP [c0017cd4] set_pte_at+0xc/0x1e8
	LR [c00187f0] map_kernel_page+0x9c/0x100
	Call Trace:
	[e1011e10] [c0736c68] vsnprintf+0x358/0x6c8 (unreliable)
	[e1011e30] [c0165fec] __set_fixmap+0x30/0x44
	[e1011e40] [c0c13bdc] early_iounmap+0x11c/0x170
	[e1011e70] [c0c06cb0] ioremap_legacy_serial_console+0x88/0xc0
	[e1011e90] [c0c03634] do_one_initcall+0x80/0x178
	[e1011ef0] [c0c0385c] kernel_init_freeable+0xb4/0x250
	[e1011f20] [c0007e34] kernel_init+0x24/0x140
	[e1011f30] [c0016268] ret_from_kernel_thread+0x5c/0x64
	Instruction dump:
	7fe3fb78 48019689 80010014 7c630034 83e1000c 5463d97e 7c0803a6 38210010
	4e800020 81250000 712a0001 41820008 <0fe00000> 9421ffe0 93e1001c 48000030

Implement unmap_kernel_page() which clears an existing pte.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b0b752f6f6ecc60653e873f385c6f0dce4e9ab6a.1638789098.git.christophe.leroy@csgroup.eu
2022-01-24 17:29:05 +11:00
Linus Torvalds
dd81e1c7d5 powerpc fixes for 5.17 #2
- A series of bpf fixes, including an oops fix and some codegen fixes.
 
  - Fix a regression in syscall_get_arch() for compat processes.
 
  - Fix boot failure on some 32-bit systems with KASAN enabled.
 
  - A couple of other build/minor fixes.
 
 Thanks to: Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh,
 Maxime Bizon, Naveen N. Rao, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHtOJgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLMeD/9eBHqMy+yhXfTw9mjAf8rVlsdY6Xjq
 2LYgHCJpeP0VjxBm8GcyFtAfJc778LyykU18xsi0WVsuwPmVdVwDfZpzxWJrYqeq
 6AuK9uRZAbEx6gZVCsWlk3Em7GyUEqHm5PTxKsDfbzh+hBTNQ8CcKKbCVmEZ+loA
 nWdaWJOZMpE4hQOx0Ph5JRAos/ZUEt2I57qZkRgvIEV+hroFrmdP1rnncdalnLHN
 pXsEn380Q+rpH+48HIpaTpl5+R8CvgFnnlI6/luZC6HoA8K6jph8FdOkkI9eU1v6
 TgdNMiNVIe1wRFzVp+nH7SGqL3l9X/UPS57hL52X+ccpHoioBW9CLPha06g/+UKT
 kJU3mYZg3jvQGrM9H/2R3P97UH9SwHE5SH0xcXj3ItH1NpOKpMO0h5UuWx+Ze8Wo
 zO80xbYLbALtEfx8JMCf95rnUVFL0n55xVJN+7+fs2hU3Be/ZpIyHfDujlHlbd1m
 o3KRhkg1a0RyGsun6Gd2vwLGaBbwYNWvgHM9IDeiAFdJ7HFpp1YrVGpQRkAIUbJS
 bzYiRczjbSof7c9WqLyKNatgNFO/JPI5vi1iNQ3u9Oh9fzjpv5Zok4vK534mOnqU
 QnhiqLJbY+yBpoAvSePGQTCCzDAnJq0r53n0d1O/2T1GcpX9MKM4YlL+Wu4m0yLn
 RcbOCTkmoqWZEw==
 =KjWT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - A series of bpf fixes, including an oops fix and some codegen fixes.

 - Fix a regression in syscall_get_arch() for compat processes.

 - Fix boot failure on some 32-bit systems with KASAN enabled.

 - A couple of other build/minor fixes.

Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin.

* tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Mask SRR0 before checking against the masked NIP
  powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
  powerpc/32s: Fix kasan_init_region() for KASAN
  powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
  powerpc/audit: Fix syscall_get_arch()
  powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
  tools/bpf: Rename 'struct event' to avoid naming conflict
  powerpc/bpf: Update ldimm64 instructions during extra pass
  powerpc32/bpf: Fix codegen for bpf-to-bpf calls
  bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-23 17:52:42 +02:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Muchun Song
359745d783 proc: remove PDE_DATA() completely
Remove PDE_DATA() completely and replace it with pde_data().

[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]

Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00
Linus Torvalds
75242f31db RTC for 5.17
New driver:
  - Sunplus SP7021 RTC
  - Nintendo GameCube, Wii and Wii U RTC
 
 Drivers:
  - cmos: refactor UIP handling and presence check, fix century
  - rs5c372: offset correction support, report low voltage
  - rv8803: Epson RX8804 support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmHp+jkACgkQY6TcMGxw
 OjJS+hAArVbXHT0ceVHY5mzqkBXCyVxhXdyLUw/VmedO9NvCTk1+O9VhBwKTRdQX
 FLuvPkyjZlkAk3CEfaHBp4u5AW3Xumdkuh9XwG8b3qRs3UGYBgKiNfXQFtOWHs7w
 VO+lD7+f2GNNTYbYH2dLPs6qZwVIN0rL7JTwUH7Po2mqBVztppSOYhmo3vmOqqux
 FI5jFVDoE8u+h359bIKcaax3/dne9m1uyD1WWxOJFRtY+apbECpUBfo1tmauAEXP
 gg+v7On+gboDqVe9/lwqB+xfKzWFJKwYIu6CM8+Mf/dxhtRNRXCgszoiCSFZHUwG
 GghD7Kb96xWuGX1pRe6xx5/7wixes1wCkIVGa5JQLb5E/GQ3O9y8D+Tdk6lyAP5m
 XUVPWGC/qByj6z9pBHtbHmq9lhd1vPHp3SU0ske1xlCQd3WnPq7bQ3MEw345gf4Z
 RGrtpnUPIfKzHWID+2zCzTj6TluW9FnnOjcm2U8jF/kwB+d1/H2O5xeWR7eQYbUJ
 BY5gjDRVIaM3aQC9QiiFMUorqv8Q3oE9FtjCSuLhUx6WEg4S0mtYsXtRUaLT7pRw
 RRsoeCJd8Abnf1Nl/imZ2IrRCcbNhzLAq2Aw8cHbiroAk8lSFAH2NzvcS8213JfG
 muOrIB2Q3XBu+6hq452ZDE1155FGWFLwKnAjvSRy5yRbsN79/YA=
 =nwD6
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Two new drivers this cycle and a significant rework of the CMOS driver
  make the bulk of the changes.

  I also carry powerpc changes with the agreement of Michael.

  New drivers:
   - Sunplus SP7021 RTC
   - Nintendo GameCube, Wii and Wii U RTC

  Driver updates:
   - cmos: refactor UIP handling and presence check, fix century
   - rs5c372: offset correction support, report low voltage
   - rv8803: Epson RX8804 support"

* tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
  rtc: sunplus: fix return value in sp_rtc_probe()
  rtc: cmos: Evaluate century appropriate
  rtc: gamecube: Fix an IS_ERR() vs NULL check
  rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
  dt-bindings: rtc: qcom-pm8xxx-rtc: update register numbers
  rtc: pxa: fix null pointer dereference
  rtc: ftrtc010: Use platform_get_irq() to get the interrupt
  rtc: Move variable into switch case statement
  rtc: pcf2127: Fix typo in comment
  dt-bindings: rtc: Add Sunplus RTC json-schema
  rtc: Add driver for RTC in Sunplus SP7021
  rtc: rs5c372: fix incorrect oscillation value on r2221tl
  rtc: rs5c372: add offset correction support
  rtc: cmos: avoid UIP when writing alarm time
  rtc: cmos: avoid UIP when reading alarm time
  rtc: mc146818-lib: refactor mc146818_does_rtc_work
  rtc: mc146818-lib: refactor mc146818_get_time
  rtc: mc146818-lib: extract mc146818_avoid_UIP
  rtc: mc146818-lib: fix RTC presence check
  rtc: Check return value from mc146818_get_time()
  ...
2022-01-21 13:13:35 +02:00
Linus Torvalds
fa2e1ba3e9 Networking fixes for 5.17-rc1, including fixes from netfilter, bpf.
Current release - regressions:
 
  - fix memory leaks in the skb free deferral scheme if upper layer
    protocols are used, i.e. in-kernel TCP readers like TLS
 
 Current release - new code bugs:
 
  - nf_tables: fix NULL check typo in _clone() functions
 
  - change the default to y for Vertexcom vendor Kconfig
 
  - a couple of fixes to incorrect uses of ref tracking
 
  - two fixes for constifying netdev->dev_addr
 
 Previous releases - regressions:
 
  - bpf:
    - various verifier fixes mainly around register offset handling
      when passed to helper functions
    - fix mount source displayed for bpffs (none -> bpffs)
 
  - bonding:
    - fix extraction of ports for connection hash calculation
    - fix bond_xmit_broadcast return value when some devices are down
 
  - phy: marvell: add Marvell specific PHY loopback
 
  - sch_api: don't skip qdisc attach on ingress, prevent ref leak
 
  - htb: restore minimal packet size handling in rate control
 
  - sfp: fix high power modules without diagnostic monitoring
 
  - mscc: ocelot:
    - don't let phylink re-enable TX PAUSE on the NPI port
    - don't dereference NULL pointers with shared tc filters
 
  - smsc95xx: correct reset handling for LAN9514
 
  - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
 
  - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
    avoid races with the interrupt
 
 Previous releases - always broken:
 
  - xdp: check prog type before updating BPF link
 
  - smc: resolve various races around abnormal connection termination
 
  - sit: allow encapsulated IPv6 traffic to be delivered locally
 
  - axienet: fix init/reset handling, add missing barriers,
    read the right status words, stop queues correctly
 
  - add missing dev_put() in sock_timestamping_bind_phc()
 
 Misc:
 
  - ipv4: prevent accidentally passing RTO_ONLINK to
    ip_route_output_key_hash() by sanitizing flags
 
  - ipv4: avoid quadratic behavior in netns dismantle
 
  - stmmac: dwmac-oxnas: add support for OX810SE
 
  - fsl: xgmac_mdio: add workaround for erratum A-009885
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHoS14ACgkQMUZtbf5S
 IrtMQA/6AxhWuj2JsoNhvTzBCi4vkeo53rKU941bxOaST9Ow8dqDc7yAT8YeJU2B
 lGw6/pXx+Fm9twGsRkkQ0vX7piIk25vKzEwnlCYVVXLAnE+lPu9qFH49X1HO5Fwy
 K+frGDC524MrbJFb+UbZfJG4UitsyHoqc58Mp7ZNBe2gn12DcHotsiSJikzdd02F
 rzQZhvwRKsDS2prcIHdvVAxva380cn99mvaFqIPR9MemhWKOzVa3NfkiC3tSlhW/
 OphG3UuOfKCVdofYAO5/oXlVQcDKx0OD9Sr2q8aO0mlME0p0ounKz+LDcwkofaYQ
 pGeMY2pEAHujLyRewunrfaPv8/SIB/ulSPcyreoF28TTN20M+4onvgTHvVSyzLl7
 MA4kYH7tkPgOfbW8T573OFPdrqsy4WTrFPFovGqvDuiE8h65Pll/gTcAqsWjF/xw
 CmfmtICcsBwVGMLUzpUjKAWuB0/voa/sQUuQoxvQFsgCteuslm1suLY5EfSIhdu8
 nvhySJjPXRHicZQNflIwKTiOYYWls7yYVGe76u9hqjyD36peJXYjUjyyENIfLiFA
 0XclGIfSBMGWMGmxvGYIZDwGOKK0j+s0PipliXVjP2otLrPYUjma5Co37KW8SiSV
 9TT673FAXJNB0IJ7xiT7nRUZ/fjRrweP1glte/6d148J1Lf9MTQ=
 =XM4Y
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf.

  Quite a handful of old regression fixes but most of those are
  pre-5.16.

  Current release - regressions:

   - fix memory leaks in the skb free deferral scheme if upper layer
     protocols are used, i.e. in-kernel TCP readers like TLS

  Current release - new code bugs:

   - nf_tables: fix NULL check typo in _clone() functions

   - change the default to y for Vertexcom vendor Kconfig

   - a couple of fixes to incorrect uses of ref tracking

   - two fixes for constifying netdev->dev_addr

  Previous releases - regressions:

   - bpf:
      - various verifier fixes mainly around register offset handling
        when passed to helper functions
      - fix mount source displayed for bpffs (none -> bpffs)

   - bonding:
      - fix extraction of ports for connection hash calculation
      - fix bond_xmit_broadcast return value when some devices are down

   - phy: marvell: add Marvell specific PHY loopback

   - sch_api: don't skip qdisc attach on ingress, prevent ref leak

   - htb: restore minimal packet size handling in rate control

   - sfp: fix high power modules without diagnostic monitoring

   - mscc: ocelot:
      - don't let phylink re-enable TX PAUSE on the NPI port
      - don't dereference NULL pointers with shared tc filters

   - smsc95xx: correct reset handling for LAN9514

   - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account

   - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
     avoid races with the interrupt

  Previous releases - always broken:

   - xdp: check prog type before updating BPF link

   - smc: resolve various races around abnormal connection termination

   - sit: allow encapsulated IPv6 traffic to be delivered locally

   - axienet: fix init/reset handling, add missing barriers, read the
     right status words, stop queues correctly

   - add missing dev_put() in sock_timestamping_bind_phc()

  Misc:

   - ipv4: prevent accidentally passing RTO_ONLINK to
     ip_route_output_key_hash() by sanitizing flags

   - ipv4: avoid quadratic behavior in netns dismantle

   - stmmac: dwmac-oxnas: add support for OX810SE

   - fsl: xgmac_mdio: add workaround for erratum A-009885"

* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  ipv4: avoid quadratic behavior in netns dismantle
  net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
  powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
  dt-bindings: net: Document fsl,erratum-a009885
  net/fsl: xgmac_mdio: Add workaround for erratum A-009885
  net: mscc: ocelot: fix using match before it is set
  net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
  net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
  nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
  net: axienet: increase default TX ring size to 128
  net: axienet: fix for TX busy handling
  net: axienet: fix number of TX ring slots for available check
  net: axienet: Fix TX ring slot available check
  net: axienet: limit minimum TX ring size
  net: axienet: add missing memory barriers
  net: axienet: reset core on initialization prior to MDIO access
  net: axienet: Wait for PhyRstCmplt after core reset
  net: axienet: increase reset timeout
  bpf, selftests: Add ringbuf memory type confusion test
  ...
2022-01-20 10:57:05 +02:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Kefeng Wang
20c0357646 mm: percpu: add generic pcpu_populate_pte() function
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte, this patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but it is overridden on x86, which has its own
implementation.

Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
23f917169e mm: percpu: add generic pcpu_fc_alloc/free funciton
With the previous patch, we could add a generic pcpu first chunk
allocate and free function to cleanup the duplicated definations on each
architecture.

Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
1ca3fb3abd mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu
first chunk allocation will call it to alloc memblock on the
corresponding node by it, this is prepare for the next patch.

Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
7ecd19cfdf mm: percpu: generalize percpu related config
Patch series "mm: percpu: Cleanup percpu first chunk function".

When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator.  This patchset is aimed to cleanup them and should no function
change.

The currently supported status about 'embed' and 'page' in Archs shows
below,

	embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
	page:  NEED_PER_CPU_EMBED_FIRST_CHUNK

		embed	page
	------------------------
	arm64	  Y	 Y
	mips	  Y	 N
	powerpc	  Y	 Y
	riscv	  Y	 N
	sparc	  Y	 Y
	x86	  Y	 Y
	------------------------

There are two interfaces about percpu first chunk allocator,

 extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

 extern int __init pcpu_page_first_chunk(size_t reserved_size,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn,
-                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().

1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
   provided when archs supported NUMA.

2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
   a generic pcpu_populate_pte() which marked '__weak' is provided, if you
   need a different function to populate pte on the arch(like x86), please
   provide its own implementation.

[1] https://github.com/kevin78/linux.git percpu-cleanup

This patch (of 4):

The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.

Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.

Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Tobias Waldekranz
0d375d610f powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
This block is used in (at least) T1024 and T1040, including their
variants like T1023 etc.

Fixes: d55ad2967d ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19 08:14:18 -08:00
Linus Torvalds
fd6f57bfda Kbuild updates for v5.17
- Add new kconfig target 'make mod2noconfig', which will be useful to
    speed up the build and test iteration.
 
  - Raise the minimum supported version of LLVM to 11.0.0
 
  - Refactor certs/Makefile
 
  - Change the format of include/config/auto.conf to stop double-quoting
    string type CONFIG options.
 
  - Fix ARCH=sh builds in dash
 
  - Separate compression macros for general purposes (cmd_bzip2 etc.) and
    the ones for decompressors (cmd_bzip2_with_size etc.)
 
  - Misc Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmHnFNIVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGiQEP/1tkt9IHP7vFvkN9xChQI8HQ7HOC
 mPIxBAUzHIp1V2IALb0lfojjnpkzcMNpJZVlmqjgyYShLEPPBFwKVXs1War6GViX
 aprUMz7w1zR/vZJ2fplFmrkNwSxNp3+LSE6sHVmsliS4Vfzh7CjHb8DnaKjBvQLZ
 M+eQugjHsWI3d3E81/qtRG5EaVs6q8osF3b0Km59mrESWVYKqwlUP3aUyQCCUGFK
 mI+zC4SrHH6EAIZd//VpaleXxVtDcjjadb7Iru5MFhFdCBIRoSC3d1IWPUNUKNnK
 i0ocDXuIoAulA/mROgrpyAzLXg10qYMwwTmX+tplkHA055gKcY/v4aHym6ypH+TX
 6zd34UMTLM32LSjs8hssiQT8BiZU0uZoa/m2E9IBaiExA2sTsRZxgQMKXFFaPQJl
 jn4cRiG0K1NDeRKtq4xh2WO46OS4sPlR6zW9EXDEsS/bI05Y7LpUz7Flt6iA2Mq3
 0g8uYIYr/9drl96X83tFgTkxxB6lpB29tbsmsrKJRGxvrCDnAhXlXhPCkMajkm2Q
 PjJfNtMFzwemSZWq09+F+X5BgCjzZtroOdFI9FTMNhGWyaUJZXCtcXQ6UTIKnTHO
 cDjcURvh+l56eNEQ5SMTNtAkxB+pX8gPUmyO1wLwRUT4YodxylkTUXGyBBR9tgTn
 Yks1TnPD06ld364l
 =8BQf
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new kconfig target 'make mod2noconfig', which will be useful to
   speed up the build and test iteration.

 - Raise the minimum supported version of LLVM to 11.0.0

 - Refactor certs/Makefile

 - Change the format of include/config/auto.conf to stop double-quoting
   string type CONFIG options.

 - Fix ARCH=sh builds in dash

 - Separate compression macros for general purposes (cmd_bzip2 etc.) and
   the ones for decompressors (cmd_bzip2_with_size etc.)

 - Misc Makefile cleanups

* tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  kbuild: add cmd_file_size
  arch: decompressor: remove useless vmlinux.bin.all-y
  kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
  kbuild: drop $(size_append) from cmd_zstd
  sh: rename suffix-y to suffix_y
  doc: kbuild: fix default in `imply` table
  microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV}
  certs: move scripts/extract-cert to certs/
  kbuild: do not quote string values in include/config/auto.conf
  kbuild: do not include include/config/auto.conf from shell scripts
  certs: simplify $(srctree)/ handling and remove config_filename macro
  kbuild: stop using config_filename in scripts/Makefile.modsign
  certs: remove misleading comments about GCC PR
  certs: refactor file cleaning
  certs: remove unneeded -I$(srctree) option for system_certificates.o
  certs: unify duplicated cmd_extract_certs and improve the log
  certs: use $< and $@ to simplify the key generation rule
  kbuild: remove headers_check stub
  kbuild: move headers_check.pl to usr/include/
  certs: use if_changed to re-generate the key when the key type is changed
  ...
2022-01-19 11:15:19 +02:00
Nicholas Piggin
aee101d7b9 powerpc/64s: Mask SRR0 before checking against the masked NIP
Commit 314f6c23dd ("powerpc/64s: Mask NIP before checking against
SRR0") masked off the low 2 bits of the NIP value in the interrupt
stack frame in case they are non-zero and mis-compare against a SRR0
register value of a CPU which always reads back 0 from the 2 low bits
which are reserved.

This now causes the opposite problem that an implementation which does
implement those bits in SRR0 will mis-compare against the masked NIP
value in which they have been cleared. QEMU is one such implementation,
and this is allowed by the architecture.

This can be triggered by sigfuz by setting low bits of PT_NIP in the
signal context.

Fix this for now by masking the SRR0 bits as well. Cleaner is probably
to sanitise these values before putting them in registers or stack, but
this is the quick and backportable fix.

Fixes: 314f6c23dd ("powerpc/64s: Mask NIP before checking against SRR0")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com
2022-01-18 10:25:18 +11:00
Athira Rajeev
429a64f6e9 powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
power_pmu_wants_prompt_pmi() is used to decide if PMIs should be taken
promptly. This is valid only for ppc64 and is used only if
CONFIG_PPC_BOOK3S_64=y. Hence include the function under config check
for PPC64.

Fixes warning for 32-bit compilation:

  arch/powerpc/perf/core-book3s.c:2455:6: warning: no previous prototype for 'power_pmu_wants_prompt_pmi'
    2455 | bool power_pmu_wants_prompt_pmi(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 5a7745b96f ("powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move inside existing CONFIG_PPC64 ifdef block]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220114031355.87480-1-atrajeev@linux.vnet.ibm.com
2022-01-17 15:04:13 +11:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Christophe Leroy
d37823c352 powerpc/32s: Fix kasan_init_region() for KASAN
It has been reported some configuration where the kernel doesn't
boot with KASAN enabled.

This is due to wrong BAT allocation for the KASAN area:

	---[ Data Block Address Translation ]---
	0: 0xc0000000-0xcfffffff 0x00000000       256M Kernel rw      m
	1: 0xd0000000-0xdfffffff 0x10000000       256M Kernel rw      m
	2: 0xe0000000-0xefffffff 0x20000000       256M Kernel rw      m
	3: 0xf8000000-0xf9ffffff 0x2a000000        32M Kernel rw      m
	4: 0xfa000000-0xfdffffff 0x2c000000        64M Kernel rw      m

A BAT must have both virtual and physical addresses alignment matching
the size of the BAT. This is not the case for BAT 4 above.

Fix kasan_init_region() by using block_size() function that is in
book3s32/mmu.c. To be able to reuse it here, make it non static and
change its name to bat_block_size() in order to avoid name conflict
with block_size() defined in <linux/blkdev.h>

Also reuse find_free_bat() to avoid an error message from setbat()
when no BAT is available.

And allocate memory outside of linear memory mapping to avoid
wasting that precious space.

With this change we get correct alignment for BATs and KASAN shadow
memory is allocated outside the linear memory space.

	---[ Data Block Address Translation ]---
	0: 0xc0000000-0xcfffffff 0x00000000       256M Kernel rw
	1: 0xd0000000-0xdfffffff 0x10000000       256M Kernel rw
	2: 0xe0000000-0xefffffff 0x20000000       256M Kernel rw
	3: 0xf8000000-0xfbffffff 0x7c000000        64M Kernel rw
	4: 0xfc000000-0xfdffffff 0x7a000000        32M Kernel rw

Fixes: 7974c47326 ("powerpc/32s: Implement dedicated kasan_init_region()")
Cc: stable@vger.kernel.org
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7a50ef902494d1325227d47d33dada01e52e5518.1641818726.git.christophe.leroy@csgroup.eu
2022-01-16 20:51:05 +11:00
Christophe Leroy
87b9d74fb0 powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
CC      arch/powerpc/kernel/time.o
	In file included from <command-line>:
	./arch/powerpc/include/asm/hw_irq.h: In function 'do_hard_irq_enable':
	././include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_35' declared with attribute error: BUILD_BUG failed
	  335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	      |                                             ^
	././include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
	  316 |                         prefix ## suffix();                             \
	      |                         ^~~~~~
	././include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
	  335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	      |         ^~~~~~~~~~~~~~~~~~~
	./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
	   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
	      |                                     ^~~~~~~~~~~~~~~~~~
	./include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
	   59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
	      |                     ^~~~~~~~~~~~~~~~
	./arch/powerpc/include/asm/hw_irq.h:483:9: note: in expansion of macro 'BUILD_BUG'
	  483 |         BUILD_BUG();
	      |         ^~~~~~~~~

should_hard_irq_enable() returns false on PPC32 so this BUILD_BUG() shouldn't trigger.

Force inlining of should_hard_irq_enable()

Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/247e01e0e10f4dbc59b5ff89e81702eb1ee7641e.1641828571.git.christophe.leroy@csgroup.eu
2022-01-16 20:50:20 +11:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Yury Norov
b5c7e7ec7d all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Yury Norov
47d8c15615 include: move find.h from asm_generic to linux
find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-01-15 08:47:31 -08:00
Aneesh Kumar K.V
21b084fdf2 mm/mempolicy: wire up syscall set_mempolicy_home_node
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
Qi Zheng
36ef159f44 mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Since commit 4064b98270 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:

	flags & FAULT_FLAG_ALLOW_RETRY

So just remove it.

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Christophe Leroy
252745240b powerpc/audit: Fix syscall_get_arch()
Commit 770cec16cd ("powerpc/audit: Simplify syscall_get_arch()")
and commit 898a1ef06a ("powerpc/audit: Avoid unneccessary #ifdef
in syscall_get_arguments()")
replaced test_tsk_thread_flag(task, TIF_32BIT)) by is_32bit_task().

But is_32bit_task() applies on current task while be want the test
done on task 'task'

So add a new macro is_tsk_32bit_task() to check any task.

Fixes: 770cec16cd ("powerpc/audit: Simplify syscall_get_arch()")
Fixes: 898a1ef06a ("powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()")
Cc: stable@vger.kernel.org
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c55cddb8f65713bf5859ed675d75a50cb37d5995.1642159570.git.christophe.leroy@csgroup.eu
2022-01-15 12:21:26 +11:00
Naveen N. Rao
3f5f766d5f powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
Johan reported the below crash with test_bpf on ppc64 e5500:

  test_bpf: #296 ALU_END_FROM_LE 64: 0x0123456789abcdef -> 0x67452301 jited:1
  Oops: Exception in kernel mode, sig: 4 [#1]
  BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 76 Comm: insmod Not tainted 5.14.0-03771-g98c2059e008a-dirty #1
  NIP:  8000000000061c3c LR: 80000000006dea64 CTR: 8000000000061c18
  REGS: c0000000032d3420 TRAP: 0700   Not tainted (5.14.0-03771-g98c2059e008a-dirty)
  MSR:  0000000080089000 <EE,ME>  CR: 88002822  XER: 20000000 IRQMASK: 0
  <...>
  NIP [8000000000061c3c] 0x8000000000061c3c
  LR [80000000006dea64] .__run_one+0x104/0x17c [test_bpf]
  Call Trace:
   .__run_one+0x60/0x17c [test_bpf] (unreliable)
   .test_bpf_init+0x6a8/0xdc8 [test_bpf]
   .do_one_initcall+0x6c/0x28c
   .do_init_module+0x68/0x28c
   .load_module+0x2460/0x2abc
   .__do_sys_init_module+0x120/0x18c
   .system_call_exception+0x110/0x1b8
   system_call_common+0xf0/0x210
  --- interrupt: c00 at 0x101d0acc
  <...>
  ---[ end trace 47b2bf19090bb3d0 ]---

  Illegal instruction

The illegal instruction turned out to be 'ldbrx' emitted for
BPF_FROM_[L|B]E, which was only introduced in ISA v2.06. Guard use of
the same and implement an alternative approach for older processors.

Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1e51c6fdf572062cf3009a751c3406bda01b832.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:25 +11:00
Naveen N. Rao
f9320c4999 powerpc/bpf: Update ldimm64 instructions during extra pass
These instructions are updated after the initial JIT, so redo codegen
during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify
that this is more than just subprog calls.

Fixes: 69c087ba62 ("bpf: Add bpf_for_each_map_elem() helper")
Cc: stable@vger.kernel.org # v5.15
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cc162af77ba918eb3ecd26ec9e7824bc44b1fae.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:24 +11:00
Naveen N. Rao
fab07611fb powerpc32/bpf: Fix codegen for bpf-to-bpf calls
Pad instructions emitted for BPF_CALL so that the number of instructions
generated does not change for different function addresses. This is
especially important for calls to other bpf functions, whose address
will only be known during extra pass.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/52d8fe51f7620a6f27f377791564d79d75463576.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:24 +11:00
Linus Torvalds
29ec39fcf1 powerpc updates for 5.17
- Optimise radix KVM guest entry/exit by 2x on Power9/Power10.
 
  - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10
    or later CPUs.
 
  - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.
 
  - Several fixes and improvements to our hard lockup watchdog.
 
  - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.
 
  - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only.
 
  - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).
 
  - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10.
 
  - A series of small performance improvements to 64-bit interrupt entry.
 
  - Several commits fixing issues when building with the clang integrated assembler.
 
  - Many other small features and fixes.
 
 Thanks to: Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann,
 Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig,
 Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren,
 Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent
 Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju,
 Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei
 Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean
 Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang
 wangx, Yang Guang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHhVFMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKwzD/9UUEZzWyzMVRJvP9FPZByN2M8czxHJ
 tuqEuVqnfks8ad8tfm2ebng5t8ZuVASBQU2fpPA1+lpdvprgZN5RFGMRh729vskn
 2aHQPmFvFObNbXOgCoXzk+C5xYi3zoRMVM968neSPBneYo+xDicn/zN5CHAgsjhX
 +baemJQ7/xzwLiZgTHe8fWw3nTk3IbPBpha59SdTvR8Moy6I4O8CDPIYEm3U3/J3
 x14ZRETqjksL7YOzEBk0avm1dDZRw/johz29oRYSmCj7dyy5OqrkPwokJiRY90eA
 1lVdofDc0zElaSWkVGzKdSWRUIXjKIVdtejvDeEvl6H/mI6q4TVZE8rFmn+3Rvgf
 9q0iKtmw5Kn11cqgY/pgEGmxnQtIdAodNfI/t939E7+O5LbcznuYUiy0J/kTD/vl
 Xduotg2dsCI+5ukf1wrk2wt9LhqZL+ziOeaBhyDM4orV8T3HBYL6zWBptun//IGO
 lK6TvvCHSYnGqY4bnrAmiOnbbEtnP6nN3zbcXgSvPM0wCRHPIEqd0NRXtfISo32d
 vBPq1neXWo4wrRJj9X3yOuP+5fEA4I+hB3yrCJOkcEcz+8NhlboQXU7raVsJL+bd
 kze75H8hwX7kE71oJFFl13LbSNABgiLFARTBXKfvdQA2iLdR0Snvm+OouvwWRPo/
 Po7Nm3zqdLc/1A==
 =BxhQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Optimise radix KVM guest entry/exit by 2x on Power9/Power10.

 - Allow firmware to tell us whether to disable the entry and uaccess
   flushes on Power10 or later CPUs.

 - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.

 - Several fixes and improvements to our hard lockup watchdog.

 - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.

 - Allow building the 64-bit Book3S kernel without hash MMU support, ie.
   Radix only.

 - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).

 - Add new encodings for perf_mem_data_src.mem_hops field, and use them
   on Power10.

 - A series of small performance improvements to 64-bit interrupt entry.

 - Several commits fixing issues when building with the clang integrated
   assembler.

 - Many other small features and fixes.

Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell,
Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET,
Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard
Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason
Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour,
Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh
Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child,
Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring,
Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool,
Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang
Guang.

* tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits)
  powerpc/xmon: Dump XIVE information for online-only processors.
  powerpc/opal: use default_groups in kobj_type
  powerpc/cacheinfo: use default_groups in kobj_type
  powerpc/sched: Remove unused TASK_SIZE_OF
  powerpc/xive: Add missing null check after calling kmalloc
  powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
  selftests/powerpc: Add a test of sigreturning to an unaligned address
  powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
  powerpc/64s: Mask NIP before checking against SRR0
  powerpc/perf: Fix spelling of "its"
  powerpc/32: Fix boot failure with GCC latent entropy plugin
  powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
  powerpc/code-patching: Move code patching selftests in its own file
  powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
  powerpc/code-patching: Move patch_exception() outside code-patching.c
  powerpc/code-patching: Use test_trampoline for prefixed patch test
  powerpc/code-patching: Fix patch_branch() return on out-of-range failure
  powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
  powerpc/code-patching: Fix unmap_patch_area() error handling
  powerpc/code-patching: Fix error handling in do_patch_instruction()
  ...
2022-01-14 15:17:26 +01:00
Linus Torvalds
feb7a43de5 Rework of the MSI interrupt infrastructure:
Treewide cleanup and consolidation of MSI interrupt handling in
   preparation for further changes in this area which are necessary to:
 
   - address existing shortcomings in the VFIO area
 
   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+SETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobzGD/wNEFl5qQo5mNZ9thP6JSJFOItm7zMc
 2QgzCYOqNwAv4jL6Dqo+EHtbShYqDyWzKdKccgqNjmdIqgW8q7/fubN1OPzRsClV
 CZG997AsXDGXYlQcE3tXZjkeCWnWEE2AGLnygSkFV1K/r9ALAtFfTBJAWB+UD+Zc
 1P8Kxo0q0Jg+DQAMAA5bWfSSjo/Pmpr/1AFjY7+GA8BBeJJgWOyW7H1S+GYEWVOE
 RaQP81Sbd6x1JkopxkNqSJ/lbNJfnPJxi2higB56Y0OYn5CuSarYbZUM7oQ2V61t
 jN7pcEEvTpjLd6SJ93ry8WOcJVMTbccCklVfD0AfEwwGUGw2VM6fSyNrZfnrosUN
 tGBEO8eflBJzGTAwSkz1EhiGKna4o1NBDWpr0sH2iUiZC5G6V2hUDbM+0PQJhDa8
 bICwguZElcUUPOprwjS0HXhymnxghTmNHyoEP1yxGoKLTrwIqkH/9KGustWkcBmM
 hNtOCwQNqxcOHg/r3MN0KxttTASgoXgNnmFliAWA7XwseRpLWc95XPQFa5sptRhc
 EzwumEz17EW1iI5/NyZQcY+jcZ9BdgCqgZ9ECjZkyN4U+9G6iACUkxVaHUUs77jl
 a0ISSEHEvJisFOsOMYyFfeWkpIKGIKP/bpLOJEJ6kAdrUWFvlRGF3qlav3JldXQl
 ypFjPapDeB5guw==
 =vKzd
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI irq updates from Thomas Gleixner:
 "Rework of the MSI interrupt infrastructure.

  This is a treewide cleanup and consolidation of MSI interrupt handling
  in preparation for further changes in this area which are necessary
  to:

   - address existing shortcomings in the VFIO area

   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space"

* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  genirq/msi: Populate sysfs entry only once
  PCI/MSI: Unbreak pci_irq_get_affinity()
  genirq/msi: Convert storage to xarray
  genirq/msi: Simplify sysfs handling
  genirq/msi: Add abuse prevention comment to msi header
  genirq/msi: Mop up old interfaces
  genirq/msi: Convert to new functions
  genirq/msi: Make interrupt allocation less convoluted
  platform-msi: Simplify platform device MSI code
  platform-msi: Let core code handle MSI descriptors
  bus: fsl-mc-msi: Simplify MSI descriptor handling
  soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
  soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
  NTB/msi: Convert to msi_on_each_desc()
  PCI: hv: Rework MSI handling
  powerpc/mpic_u3msi: Use msi_for_each-desc()
  powerpc/fsl_msi: Use msi_for_each_desc()
  powerpc/pasemi/msi: Convert to msi_on_each_dec()
  powerpc/cell/axon_msi: Convert to msi_on_each_desc()
  powerpc/4xx/hsta: Rework MSI handling
  ...
2022-01-13 09:05:29 -08:00
Linus Torvalds
4eb766f64d Devicetree updates for v5.17:
Bindings:
 - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command
   DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C
   and BPMP, pwm-vibrator, Arm DSU, and Cadence macb
 
 - DT schema conversions for Broadcom platforms: interrupt controllers,
   STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe,
   Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC,
   Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA
 
 - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,
 
 - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues
 
 - More fixes due to 'unevaluatedProperties' enabling
 
 - Data type fixes and clean-ups of binding examples found in preparation
   to move to validating DTB files directly (instead of intermediate YAML
   representation.
 
 - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus
 
 - Add various new compatible strings
 
 DT core:
 - Silence a warning for overlapping reserved memory regions
 
 - Reimplement unittest overlay tracking
 
 - Fix stack frame size warning in unittest
 
 - Clean-ups of early FDT scanning functions
 
 - Fix handling of "linux,usable-memory-range" on EFI booted systems
 
 - Add support for 'fail' status on CPU nodes
 
 - Improve error message in of_phandle_iterator_next()
 
 - kbuild: Disable duplicate unit-address warnings for disabled nodes
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmHfCdcQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw+UZD/0ZMQQ6VF20MW7Gg0bOutd8Q6Q6opjrCG5c
 nLW5mv8Q+um3sI1ZpwdMI4zAfCmTfeL13ZM9KtJKlJ0o41bgId+kZsezy4I2rN9+
 sE1CwA4TninKTJsUkmyQX4fgJRUZ95Eubryfb07sy7nbK3LZQ+t18R5tzVBDpzy4
 7hy4eM6mlMxgIJDi7EUboLZslkMM4TGGutLsk5C5T5V5lcWSt3Jj5WZtl5k4Wykq
 j4i9mU+GGTZi0nGAJQ7lNoLPatZDSVQx5tzNV/Wi8hSwZbn0Kycu+IuWZyihILz/
 9lzB/7tv8fl+xkTaJ5xxaY05HcDeX02yCLzh3PfAHRYdbQ2EkFoaKqJ81SLfAq5t
 aH87v41wFSrjzynxpppqswXOdqI/jofrHrGlQldnw0VHGTjEfDbyZGRQFPHmuzTG
 gXaSNKCxppG7ThpXarfu7D4TdYV75n+cBOsC/BBopYgIS2+xmjDA3t5Scks1/4NX
 1Hfq9IMF9iYJYc/GNXBWcOrLn9d1ILYt6HrKRQar1NIEFH1Lt0c2aw5WsyvOZ4zx
 aLHLSbEwnl+2wleyGB9YQkFaaF7N6qcid3u9KFRJP6nTojoaeQaIi3MR9F3LVReZ
 LV5YfWEcij1zc+lzwgHc6+8bbgFxrKgOC2IL/B6u93u/BO0wmF/54kbEZKaLyX8d
 a7Iii4IYFw==
 =2g8v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "Bindings:

   - DT schema conversions for Samsung clocks, RNG bindings, Qcom
     Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl,
     Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb

   - DT schema conversions for Broadcom platforms: interrupt
     controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux,
     iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON,
     SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB
     arbiter, and SATA

   - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,

   - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues

   - More fixes due to 'unevaluatedProperties' enabling

   - Data type fixes and clean-ups of binding examples found in
     preparation to move to validating DTB files directly (instead of
     intermediate YAML representation.

   - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus

   - Add various new compatible strings

  DT core:

   - Silence a warning for overlapping reserved memory regions

   - Reimplement unittest overlay tracking

   - Fix stack frame size warning in unittest

   - Clean-ups of early FDT scanning functions

   - Fix handling of "linux,usable-memory-range" on EFI booted systems

   - Add support for 'fail' status on CPU nodes

   - Improve error message in of_phandle_iterator_next()

   - kbuild: Disable duplicate unit-address warnings for disabled nodes"

* tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits)
  dt-bindings: net: mdio: Drop resets/reset-names child properties
  dt-bindings: clock: samsung: convert S5Pv210 to dtschema
  dt-bindings: clock: samsung: convert Exynos5410 to dtschema
  dt-bindings: clock: samsung: convert Exynos5260 to dtschema
  dt-bindings: clock: samsung: extend Exynos7 bindings with UFS
  dt-bindings: clock: samsung: convert Exynos7 to dtschema
  dt-bindings: clock: samsung: convert Exynos5433 to dtschema
  dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712
  dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes
  dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp'
  dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example
  dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example
  dt-bindings: clock: imx5: Drop clock consumer node from example
  dt-bindings: Drop required 'interrupt-parent'
  dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance'
  dt-bindings: net: wireless: mt76: Fix 8-bit property sizes
  dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema
  dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry
  dt-bindings: net: stm32-dwmac: Make each example a separate entry
  dt-bindings: net: Cleanup MDIO node schemas
  ...
2022-01-12 16:47:05 -08:00
Linus Torvalds
daadb3bd0e Peter Zijlstra says:
"Lots of cleanups and preparation; highlights:
 
  - futex: Cleanup and remove runtime futex_cmpxchg detection
 
  - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
 
  - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
    annotate the racy owner->on_cpu access *once*.
 
  - atomic64: Dead-Code-Elemination"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
 VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
 w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
 87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
 rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
 d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
 44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
 K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
 39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
 zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
 HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
 NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
 =vrYy
 -----END PGP SIGNATURE-----

Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Borislav Petkov:
 "Lots of cleanups and preparation. Highlights:

   - futex: Cleanup and remove runtime futex_cmpxchg detection

   - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure

   - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
     annotate the racy owner->on_cpu access *once*.

   - atomic64: Dead-Code-Elemination"

[ Description above by Peter Zijlstra ]

* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: atomic64: Remove unusable atomic ops
  futex: Fix additional regressions
  locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
  x86/mm: Include spinlock_t definition in pgtable.
  locking: Mark racy reads of owner->on_cpu
  locking: Make owner_on_cpu() into <linux/sched.h>
  lockdep/selftests: Adapt ww-tests for PREEMPT_RT
  lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
  lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
  lockdep/selftests: Avoid using local_lock_{acquire|release}().
  lockdep: Remove softirq accounting on PREEMPT_RT.
  locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
  locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
  locking: Remove rt_rwlock_is_contended().
  sched: Trigger warning if ->migration_disabled counter underflows.
  futex: Fix sparc32/m68k/nds32 build regression
  futex: Remove futex_cmpxchg detection
  futex: Ensure futex_atomic_cmpxchg_inatomic() is present
  kernel/locking: Use a pointer in ww_mutex_trylock().
2022-01-11 17:24:45 -08:00
Linus Torvalds
5c947d0dba Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Algorithms:

   - Drop alignment requirement for data in aesni

   - Use synchronous seeding from the /dev/random in DRBG

   - Reseed nopr DRBGs every 5 minutes from /dev/random

   - Add KDF algorithms currently used by security/DH

   - Fix lack of entropy on some AMD CPUs with jitter RNG

  Drivers:

   - Add support for the D1 variant in sun8i-ce

   - Add SEV_INIT_EX support in ccp

   - PFVF support for GEN4 host driver in qat

   - Compression support for GEN4 devices in qat

   - Add cn10k random number generator support"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (145 commits)
  crypto: af_alg - rewrite NULL pointer check
  lib/mpi: Add the return value check of kcalloc()
  crypto: qat - fix definition of ring reset results
  crypto: hisilicon - cleanup warning in qm_get_qos_value()
  crypto: kdf - select SHA-256 required for self-test
  crypto: x86/aesni - don't require alignment of data
  crypto: ccp - remove unneeded semicolon
  crypto: stm32/crc32 - Fix kernel BUG triggered in probe()
  crypto: s390/sha512 - Use macros instead of direct IV numbers
  crypto: sparc/sha - remove duplicate hash init function
  crypto: powerpc/sha - remove duplicate hash init function
  crypto: mips/sha - remove duplicate hash init function
  crypto: sha256 - remove duplicate generic hash init function
  crypto: jitter - add oversampling of noise source
  MAINTAINERS: update SEC2 driver maintainers list
  crypto: ux500 - Use platform_get_irq() to get the interrupt
  crypto: hisilicon/qm - disable qm clock-gating
  crypto: omap-aes - Fix broken pm_runtime_and_get() usage
  MAINTAINERS: update caam crypto driver maintainers list
  crypto: octeontx2 - prevent underflow in get_cores_bmap()
  ...
2022-01-11 10:21:35 -08:00
Linus Torvalds
8efd0d9c31 Networking changes for 5.17.
Core
 ----
 
  - Defer freeing TCP skbs to the BH handler, whenever possible,
    or at least perform the freeing outside of the socket lock section
    to decrease cross-CPU allocator work and improve latency.
 
  - Add netdevice refcount tracking to locate sources of netdevice
    and net namespace refcount leaks.
 
  - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
    all queues from a single CPU removing latency spikes.
 
  - Various small optimizations throughout the stack from Eric Dumazet.
 
  - Make netdev->dev_addr[] constant, force modifications to go via
    appropriate helpers to allow us to keep addresses in ordered data
    structures.
 
  - Replace unix_table_lock with per-hash locks, improving performance
    of bind() calls.
 
  - Extend skb drop tracepoint with a drop reason.
 
  - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
 
 BPF
 ---
 
  - New helpers:
    - bpf_find_vma(), find and inspect VMAs for profiling use cases
    - bpf_loop(), runtime-bounded loop helper trading some execution
      time for much faster (if at all converging) verification
    - bpf_strncmp(), improve performance, avoid compiler flakiness
    - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
      for tracing programs, all inlined by the verifier
 
  - Support BPF relocations (CO-RE) in the kernel loader.
 
  - Further the support for BTF_TYPE_TAG annotations.
 
  - Allow access to local storage in sleepable helpers.
 
  - Convert verifier argument types to a composable form with different
    attributes which can be shared across types (ro, maybe-null).
 
  - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
    creating new, extensible ones where missing and deprecating those
    to be removed.
 
 Protocols
 ---------
 
  - WiFi (mac80211/cfg80211):
    - notify user space about long "come back in N" AP responses,
      allow it to react to such temporary rejections
    - allow non-standard VHT MCS 10/11 rates
    - use coarse time in airtime fairness code to save CPU cycles
 
  - Bluetooth:
    - rework of HCI command execution serialization to use a common
      queue and work struct, and improve handling errors reported
      in the middle of a batch of commands
    - rework HCI event handling to use skb_pull_data, avoiding packet
      parsing pitfalls
    - support AOSP Bluetooth Quality Report
 
  - SMC:
    - support net namespaces, following the RDMA model
    - improve connection establishment latency by pre-clearing buffers
    - introduce TCP ULP for automatic redirection to SMC
 
  - Multi-Path TCP:
    - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
    - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
      IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    - support cmsgs: TCP_INQ
    - improvements in the data scheduler (assigning data to subflows)
    - support fastclose option (quick shutdown of the full MPTCP
      connection, similar to TCP RST in regular TCP)
 
  - MCTP (Management Component Transport) over serial, as defined by
    DMTF spec DSP0253 - "MCTP Serial Transport Binding".
 
 Driver API
 ----------
 
  - Support timestamping on bond interfaces in active/passive mode.
 
  - Introduce generic phylink link mode validation for drivers which
    don't have any quirks and where MAC capability bits fully express
    what's supported. Allow PCS layer to participate in the validation.
    Convert a number of drivers.
 
  - Add support to set/get size of buffers on the Rx rings and size of
    the tx copybreak buffer via ethtool.
 
  - Support offloading TC actions as first-class citizens rather than
    only as attributes of filters, improve sharing and device resource
    utilization.
 
  - WiFi (mac80211/cfg80211):
    - support forwarding offload (ndo_fill_forward_path)
    - support for background radar detection hardware
    - SA Query Procedures offload on the AP side
 
 New hardware / drivers
 ----------------------
 
  - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
    real-time requirements for isochronous communication with protocols
    like OPC UA Pub/Sub.
 
  - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
    integrated into many older Qualcomm SoCs, e.g. MSM8916 or
    MSM8974 (qcom_bam_dmux).
 
  - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
    driver with support for bridging, VLANs and multicast forwarding
    (lan966x).
 
  - iwlmei driver for co-operating between Intel's WiFi driver and
    Intel's Active Management Technology (AMT) devices.
 
  - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
 
  - Bluetooth:
    - MediaTek MT7921 SDIO devices
    - Foxconn MT7922A
    - Realtek RTL8852AE
 
 Drivers
 -------
 
  - Significantly improve performance in the datapaths of:
    lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
 
  - Intel Ethernet NICs:
    - igb: support PTP/time PEROUT and EXTTS SDP functions on
      82580/i354/i350 adapters
    - ixgbevf: new PF -> VF mailbox API which avoids the risk of
      mailbox corruption with ESXi
    - iavf: support configuration of VLAN features of finer granularity,
      stacked tags and filtering
    - ice: PTP support for new E822 devices with sub-ns precision
    - ice: support firmware activation without reboot
 
  - Mellanox Ethernet NICs (mlx5):
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
    - support TC forwarding when tunnel encap and decap happen between
      two ports of the same NIC
    - dynamically size and allow disabling various features to save
      resources for running in embedded / SmartNIC scenarios
 
  - Broadcom Ethernet NICs (bnxt):
    - use page frag allocator to improve Rx performance
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
 
  - Other Ethernet NICs:
    - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
 
  - Microsoft cloud/virtual NIC (mana):
    - add XDP support (PASS, DROP, TX)
 
  - Mellanox Ethernet switches (mlxsw):
    - initial support for Spectrum-4 ASICs
    - VxLAN with IPv6 underlay
 
  - Marvell Ethernet switches (prestera):
    - support flower flow templates
    - add basic IP forwarding support
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - support Per-Stream Filtering and Policing (PSFP)
    - enable cut-through forwarding between ports by default
    - support FDMA to improve packet Rx/Tx to CPU
 
  - Other embedded switches:
    - hellcreek: improve trapping management (STP and PTP) packets
    - qca8k: support link aggregation and port mirroring
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - qca6390, wcn6855: enable 802.11 power save mode in station mode
    - BSS color change support
    - WCN6855 hw2.1 support
    - 11d scan offload support
    - scan MAC address randomization support
    - full monitor mode, only supported on QCN9074
    - qca6390/wcn6855: report signal and tx bitrate
    - qca6390: rfkill support
    - qca6390/wcn6855: regdb.bin support
 
  - Intel WiFi (iwlwifi):
    - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
      in cooperation with the BIOS
    - support for Optimized Connectivity Experience (OCE) scan
    - support firmware API version 68
    - lots of preparatory work for the upcoming Bz device family
 
  - MediaTek WiFi (mt76):
    - Specific Absorption Rate (SAR) support
    - mt7921: 160 MHz channel support
 
  - RealTek WiFi (rtw88):
    - Specific Absorption Rate (SAR) support
    - scan offload
 
  - Other WiFi NICs
    - ath10k: support fetching (pre-)calibration data from nvmem
    - brcmfmac: configure keep-alive packet on suspend
    - wcn36xx: beacon filter support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
 IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
 h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
 i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
 AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
 Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
 NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
 yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
 cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
 z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
 r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
 =KLm3
 -----END PGP SIGNATURE-----

Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Linus Torvalds
48a60bdb2b - Add a set of thread_info.flags accessors which snapshot it before
accesing it in order to prevent any potential data races, and convert
 all users to those new accessors
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcgFoACgkQEsHwGGHe
 VUqXeRAAvcNEfFw6BvXeGfFTxKmOrsRtu2WCkAkjvamyhXMCrjBqqHlygLJFCH5i
 2mc6HBohzo4vBFcgi3R5tVkGazqlthY1KUM9Jpk7rUuUzi0phTH7n/MafZOm9Es/
 BHYcAAyT/NwZRbCN0geccIzBtbc4xr8kxtec7vkRfGDx8B9/uFN86xm7cKAaL62G
 UDs0IquDPKEns3A7uKNuvKztILtuZWD1WcSkbOULJzXgLkb+cYKO1Lm9JK9rx8Ds
 8tjezrJgOYGLQyyv0i3pWelm3jCZOKUChPslft0opvVUbrNd8piehvOm9CWopHcB
 QsYOWchnULTE9o4ZAs/1PkxC0LlFEWZH8bOLxBMTDVEY+xvmDuj1PdBUpncgJbOh
 dunHzsvaWproBSYUXA9nKhZWTVGl+CM8Ks7jXjl3IPynLd6cpYZ/5gyBVWEX7q3e
 8htG95NzdPPo7doxMiNSKGSmSm0Np1TJ/i89vsYeGfefsvsq53Fyjhu7dIuTWHmU
 2YUe6qHs6dF9x1bkHAAZz6T9Hs4BoGQBcXUnooT9JbzVdv2RfTPsrawdu8dOnzV1
 RhwCFdFcll0AIEl0T9fCYzUI/Ga8ZS0roXs5NZ4wl0lwr0BGFwiU8WC1FUdGsZo9
 0duaa0Tpv0OWt6rIMMB/E9QsqCDsQ4CMHuQpVVw+GOO5ux9kMms=
 =v6Xn
 -----END PGP SIGNATURE-----

Merge tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull thread_info flag accessor helper updates from Borislav Petkov:
 "Add a set of thread_info.flags accessors which snapshot it before
  accesing it in order to prevent any potential data races, and convert
  all users to those new accessors"

* tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Snapshot thread flags
  powerpc: Avoid discarding flags in system_call_exception()
  openrisc: Snapshot thread flags
  microblaze: Snapshot thread flags
  arm64: Snapshot thread flags
  ARM: Snapshot thread flags
  alpha: Snapshot thread flags
  sched: Snapshot thread flags
  entry: Snapshot thread flags
  x86: Snapshot thread flags
  thread_info: Add helpers to snapshot thread flags
2022-01-10 11:34:10 -08:00
Linus Torvalds
9b9e211360 arm64 updates for 5.17:
- KCSAN enabled for arm64.
 
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 
 - Some more SVE clean-ups and refactoring in preparation for SME support
   (scalable matrix extensions).
 
 - BTI clean-ups (SYM_FUNC macros etc.)
 
 - arm64 atomics clean-up and codegen improvements.
 
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
   square root estimate).
 
 - Use SHA3 instructions to speed-up XOR.
 
 - arm64 unwind code refactoring/unification.
 
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
   (potentially set by a hypervisor; user-space already does this).
 
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 
 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
 XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
 39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
 87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
 CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
 zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
 NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
 opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
 amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
 VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
 IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
 BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
 =ecyi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.

 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.

 - Some more SVE clean-ups and refactoring in preparation for SME
   support (scalable matrix extensions).

 - BTI clean-ups (SYM_FUNC macros etc.)

 - arm64 atomics clean-up and codegen improvements.

 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and
   reciprocal square root estimate).

 - Use SHA3 instructions to speed-up XOR.

 - arm64 unwind code refactoring/unification.

 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
   1 (potentially set by a hypervisor; user-space already does this).

 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.

 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
2022-01-10 08:49:37 -08:00
Linus Torvalds
a7ac314061 asm-generic: cleanups for 5.17
A few minor cleanups for cross-architecture code: Alexandre Ghiti
 deals with removing some leftovers from drivers and features that
 have been removed, and Wasin Thonkaew has a cosmetic change.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmHEj8UACgkQmmx57+YA
 GNnchRAAigY0M7hCG0qV/dRgAuxfnEyUtaV1zcx245tKjfQlVwUgecD33G/rwloO
 bFWkJBzTWjJSGNyHQwYvi6QF9JyLImT8EtyvHYXgNEjRfAxQj8ayRR9glUcEiOTX
 00OagUe9NC0Ui/NdX34J7b0ZMMm1XFij1jKQxaoy+U/I97mKqHHrRqJSHpRw77sK
 31948zf3mzJJD10Lo6CBvLngyN/gj5mVPYyrkrerdZ7X7QzzAsoO+gYTeEUDbkie
 4tj3tLwUNbqAUoJNO4lkN6BeXGph2G36tNTPgvJATyx9fZ7M5ayeK6I4QwxtlvBK
 xhmkHTCzc4m1xlwTPjdYO/g/UZHP8pa+WxOkTTSxdQgcgQh/p+zZl3MRxh+5lR/B
 iXcBy7qQfovDkgX7jW81KCJRJZztaFuLlbmQgskP53xsk/zj2A3oiH2pT3X07gGf
 qaZQ3xzeBJl4ax0U03QbC9WTjSqvZprprAbKrM52rzhehDCLHn3kYVMBvIeGI6eh
 m+mlZYBFEzvNh3ak9D5+oOB6iOctqK4h8giIEgvr6HJzxtbLsT1WZ6T4YEfD03fI
 hdttqkt4dBl06Ofql5TE88SbwEvXikG+nFZ8wIEgeRj9EErozCGjKbAd13SAEsJk
 u/ySXHJbwuoT+71SdFlqgj74A9PJWE6FJuVNoEZjAD5CxlVI1tA=
 =ZaDF
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "A few minor cleanups for cross-architecture code: Alexandre Ghiti
  deals with removing some leftovers from drivers and features that have
  been removed, and Wasin Thonkaew has a cosmetic change"

* tag 'asm-generic-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/error-injection.h: fix a spelling mistake, and a coding style issue
  arch: Remove leftovers from prism54 wireless driver
  arch: Remove leftovers from mandatory file locking
  Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
  Documentation, arch: Remove leftovers from raw device
2022-01-10 08:42:28 -08:00
Masahiro Yamada
129ab0d2d9 kbuild: do not quote string values in include/config/auto.conf
The previous commit fixed up all shell scripts to not include
include/config/auto.conf.

Now that include/config/auto.conf is only included by Makefiles,
we can change it into a more Make-friendly form.

Previously, Kconfig output string values enclosed with double-quotes
(both in the .config and include/config/auto.conf):

    CONFIG_X="foo bar"

Unlike shell, Make handles double-quotes (and single-quotes as well)
verbatim. We must rip them off when used.

There are some patterns:

  [1] $(patsubst "%",%,$(CONFIG_X))
  [2] $(CONFIG_X:"%"=%)
  [3] $(subst ",,$(CONFIG_X))
  [4] $(shell echo $(CONFIG_X))

These are not only ugly, but also fragile.

[1] and [2] do not work if the value contains spaces, like
   CONFIG_X=" foo bar "

[3] does not work correctly if the value contains double-quotes like
   CONFIG_X="foo\"bar"

[4] seems to work better, but has a cost of forking a process.

Anyway, quoted strings were always PITA for our Makefiles.

This commit changes Kconfig to stop quoting in include/config/auto.conf.

These are the string type symbols referenced in Makefiles or scripts:

    ACPI_CUSTOM_DSDT_FILE
    ARC_BUILTIN_DTB_NAME
    ARC_TUNE_MCPU
    BUILTIN_DTB_SOURCE
    CC_IMPLICIT_FALLTHROUGH
    CC_VERSION_TEXT
    CFG80211_EXTRA_REGDB_KEYDIR
    EXTRA_FIRMWARE
    EXTRA_FIRMWARE_DIR
    EXTRA_TARGETS
    H8300_BUILTIN_DTB
    INITRAMFS_SOURCE
    LOCALVERSION
    MODULE_SIG_HASH
    MODULE_SIG_KEY
    NDS32_BUILTIN_DTB
    NIOS2_DTB_SOURCE
    OPENRISC_BUILTIN_DTB
    SOC_CANAAN_K210_DTB_SOURCE
    SYSTEM_BLACKLIST_HASH_LIST
    SYSTEM_REVOCATION_KEYS
    SYSTEM_TRUSTED_KEYS
    TARGET_CPU
    UNUSED_KSYMS_WHITELIST
    XILINX_MICROBLAZE0_FAMILY
    XILINX_MICROBLAZE0_HW_VER
    XTENSA_VARIANT_NAME

I checked them one by one, and fixed up the code where necessary.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-08 18:03:57 +09:00
Sachin Sant
f1aa0e47c2 powerpc/xmon: Dump XIVE information for online-only processors.
dxa command in XMON debugger iterates through all possible processors.
As a result, empty lines are printed even for processors which are not
online.

CPU 47:pp=00 CPPR=ff IPI=0x0040002f PQ=-- EQ idx=699 T=0 00000000 00000000
CPU 48:
CPU 49:

Restrict XIVE information(dxa) to be displayed for online processors only.

Signed-off-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/164139226833.12930.272224382183014664.sendpatchset@MacBook-Pro.local
2022-01-06 21:47:00 +11:00
Greg Kroah-Hartman
32a1bda4b1 powerpc/opal: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the powerpc opal dump and elog sysfs code to use
default_groups field which has been the preferred way since aa30f47cf6
("kobject: Add support for default attribute groups to kobj_type") so
that we can soon get rid of the obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220104161318.1306023-1-gregkh@linuxfoundation.org
2022-01-05 10:59:52 +11:00
Greg Kroah-Hartman
2bdf3f9e9d powerpc/cacheinfo: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the powerpc cacheinfo sysfs code to use default_groups
field which has been the preferred way since aa30f47cf6 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220104155450.1291277-1-gregkh@linuxfoundation.org
2022-01-05 10:58:23 +11:00
Guo Ren
08035a67f3 powerpc/sched: Remove unused TASK_SIZE_OF
This macro isn't used in Linux sched, now. Delete in
include/linux/sched.h and arch's include/asm.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211228064730.2882351-5-guoren@kernel.org
2022-01-04 23:12:27 +11:00
Ammar Faizi
18dbfcdedc powerpc/xive: Add missing null check after calling kmalloc
Commit 930914b7d5 ("powerpc/xive: Add a debugfs file to dump
internal XIVE state") forgot to add a null check.

Add it.

Fixes: 930914b7d5 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
Signed-off-by: Ammar Faizi <ammarfaizi2@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211226135314.251221-1-ammar.faizi@intel.com
2022-01-04 16:03:05 +11:00
Christophe JAILLET
e57c2fd6cd powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.

Some reasons why this API should be removed have been given by Julia
Lawall in [2].

A coccinelle script has been used to perform the needed transformation
Only relevant parts are given below.

@@ @@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@ @@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9e24eedeab44cbb840598bb188561a48811de845.1641119338.git.christophe.jaillet@wanadoo.fr
2022-01-04 16:00:59 +11:00
Tianjia Zhang
41ea0f6c19 crypto: powerpc/sha - remove duplicate hash init function
sha*_base_init() series functions has implemented the initialization
of the hash context, this commit use sha*_base_init() function to
replace repeated implementations.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-12-31 18:10:55 +11:00
Jakub Kicinski
aec53e60e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
  commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30 12:12:12 -08:00
Linus Torvalds
f651faaaba powerpc fixes for 5.16 #5
Fix DEBUG_WX never reporting any WX mappings, due to use of an incorrect config symbol
 since we converted to using generic ptdump.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHK5joTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOC0EACbVv34c9dt2ODchr45RHJFmfDdp/oh
 xhQPpSFWQZltfqsLcMwsSRdbVWOvDUDb5eUVJF6PdBhK9BhO30nq3vP8477N5Whp
 PJX237ygZVjFWVbhb0cQB7nmWdMAJhJ7inqIvHxHcgkmzf/M9D21AV/YM3zrZLT9
 NFgboT4TUHG7ll6s8xZce2AM+1Iq/3J8anFkh0AZJPPCdaefOh3JRAkAf7LDTYrV
 ARiVAImXW6zbtKcxZ7Qu5bDjPhHSB/wd00AKY3eqkGIEDwp78TDVItrIGVWX5Y+R
 U6abKgxkZwlMC1K+PCN0q53lg/eSX59Z4FEY/aiWZ2/zfNHU1CSIAVRu7SgL+JVN
 aypVpLuChp7TrWL0asngkJ8/ztFRx8SkBg0oxpLXOPkGU91KEhuedtBBlL60sfoG
 Xh55PvRxgjWUaNwiDX+f+G6Mx2zzaiGOUUeiwwexCtsji9Yt7Fe3NH2eteYLt45r
 S9hcyX/jfa/HMFTIoStqsb//pcQXi5fGA6xU/16aGNpIiRP8+nvrMcGSTuTO4CON
 RCO0mf6yd1Cdq+OrD7vX6lS8iXVhGJtfc49KQ6nPtMHkcMStEYS52jANlzfU3CKO
 ne3+mrX5raty7VkG0HRCfMauhgVPhHqbdlK1JNjSUagL0xXxf+88s3vVWuwRyIb2
 32tNV4XvWtysYw==
 =Pt/J
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Fix DEBUG_WX never reporting any WX mappings, due to use of an
  incorrect config symbol since we converted to using generic ptdump"

* tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
2021-12-28 11:42:01 -08:00
Michael Ellerman
fd1eaaaaa6 powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning
from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing
BUGFLAG_WARNING.

However that fails to create an exception table entry for the warning,
and so do_program_check() fails the exception table search and proceeds
to call _exception(), resulting in an oops like:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P                  5.16.0-rc2-00194-g91ca3d4f77c5 #12
  NIP:  c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000
  ...
  NIP [c00000000000c5b0] system_call_common+0x150/0x268
  LR [0000000000000000] 0x0
  Call Trace:
  [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable)
  ...
  Instruction dump:
  7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178
  7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088

We should instead use EMIT_WARN_ENTRY, which creates an exception table
entry for the warning, allowing the warning to be correctly recognised,
and the code to resume after printing the warning.

Note however that because this warning is buried deep in the interrupt
return path, we are not able to recover from it (due to MSR_RI being
clear), so we still end up in die() with an unrecoverable exception.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au
2021-12-25 10:56:01 +11:00
Michael Ellerman
314f6c23dd powerpc/64s: Mask NIP before checking against SRR0
When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when
returning from interrupts. This can trigger falsely if NIP has either of
its two low bits set via sigreturn or ptrace, while SRR0 has its low two
bits masked in hardware.

As a quick fix make sure to mask the low bits before doing the check.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au
2021-12-25 10:55:55 +11:00
Randy Dunlap
5b09250cca powerpc/perf: Fix spelling of "its"
Use the possessive "its" instead of the contraction of "it is" (it's).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211223003942.22098-1-rdunlap@infradead.org
2021-12-23 22:36:58 +11:00
Christophe Leroy
bba496656a powerpc/32: Fix boot failure with GCC latent entropy plugin
Boot fails with GCC latent entropy plugin enabled.

This is due to early boot functions trying to access 'latent_entropy'
global data while the kernel is not relocated at its final
destination yet.

As there is no way to tell GCC to use PTRRELOC() to access it,
disable latent entropy plugin in early_32.o and feature-fixups.o and
code-patching.o

Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217
Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
309a0a6018 powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
The purpose of selftests is to check that instructions are
properly formed. Not to check that they properly run.

For that test it uses normal memory, not special test
memory.

In preparation of a future patch enforcing patch_instruction()
to be used only on valid text areas, implement a ppc_inst_write()
instruction which is the complement of ppc_inst_read(). This
new function writes the formated instruction in valid kernel
memory and doesn't bother about icache.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cf5335cc07ca9b6f8cdaa20ca9887fce4df3bea.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
f30a578d76 powerpc/code-patching: Move code patching selftests in its own file
Code patching selftests are half of code-patching.c.
As they are guarded by CONFIG_CODE_PATCHING_SELFTESTS,
they'd be better in their own file.

Also add a missing __init for instr_is_branch_to_addr()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c0c30504f04eb546a48ff77127a8bccd12a3d809.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
31acc59956 powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
To enable moving selftests in their own C file in following patch,
move instr_is_branch_iform() and instr_is_branch_bform()
to code-patching.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fca0f3b191211b3681020885a611bf73eef20563.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
29562a9da2 powerpc/code-patching: Move patch_exception() outside code-patching.c
patch_exception() is dedicated to book3e/64 is nothing more than
a normal use of patch_branch(), so move it into a place dedicated
to book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0968622b98b1fb51838c35b844c42ad6609de62e.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:55 +11:00
Christophe Leroy
ff14a9c09f powerpc/code-patching: Use test_trampoline for prefixed patch test
Use the dedicated test_trampoline function for testing prefixed
patching like other tests and remove the hand coded assembly stuff.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a450ef3f8653f75e1bd9aaf7a3889d379752f33b.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:25 +11:00
Christophe Leroy
d5937db114 powerpc/code-patching: Fix patch_branch() return on out-of-range failure
Do not silentely ignore a failure of create_branch() in
patch_branch(). Return -ERANGE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8540cb64b1f06710eaf41e3835c7ba3e21fa2b05.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
6b21af7449 powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
Split do_patch_instruction() in two functions, the caller doing the
spin locking and the callee doing everything else.

And remove a few unnecessary initialisations and intermediate
variables.

This allows the callee to return from anywhere in the function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dbc85980a0d2a935731b272e8907e8bb1d8fc8c5.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
a3483c3dd1 powerpc/code-patching: Fix unmap_patch_area() error handling
pXd_offset() doesn't return NULL. When the base is NULL, it
still adds the offset.

Use pXd_none() to check validity instead. It also improves
performance by folding out none existing levels as pXd_none()
always returns 0 in that case.

Such an error is unexpected, use WARN_ON() so that the caller
doesn't have to worry about it, and drop the returned value.

And now that unmap_patch_area() doesn't return error, we can
take into account the error returned by __patch_instruction().

While at it, remove the 'inline' property which is useless.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/299804b117fae35c786c827536c91f25352e279b.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
285672f993 powerpc/code-patching: Fix error handling in do_patch_instruction()
Use real errors instead of using -1 as error, so that errors
returned by callees can be used towards callers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/85259d894069e47f915ea580b169e1adbeec7a61.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
af5304a750 powerpc/code-patching: Remove init_mem_is_free
A new state has been added by commit d2635f2012 ("mm: create a new
system state and fix core_kernel_text()"). That state tells when
initmem is about to be released and is redundant with init_mem_is_free.

Remove init_mem_is_free.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ad8c3ccb39c8edaa89fd3eda1cc7218baea1cde5.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
edecd2d6d6 powerpc/code-patching: Remove pr_debug()/pr_devel() messages and fix check()
code-patching has been working for years now, time has come to
remove debugging messages.

Change useful message to KERN_INFO and remove other ones.

Also add KERN_ERR to check() macro and change it into a do/while
to make checkpatch happy.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3ff9823c0a812a8a145d979a9600a6d4591b80ee.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Alexey Kardashevskiy
62479e6e26 powerpc/mm/book3s64/hash: Switch pre 2.06 tlbiel to .long
The llvm integrated assembler does not recognise the ISA 2.05 tlbiel
version. Work around it by switching to .long when an old arch level
detected.

Signed-off-by: Daniel Axtens <dja@axtens.net>
[aik: did "Eventually do this more smartly"]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-7-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Alexey Kardashevskiy
d51f86cfd8 powerpc/mm: Switch obsolete dssall to .long
The dssall ("Data Stream Stop All") instruction is obsolete altogether
with other Data Cache Instructions since ISA 2.03 (year 2006).

LLVM IAS does not support it but PPC970 seems to be using it.
This switches dssall to .long as there is no much point in fixing LLVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Daniel Axtens
d72c4a36d7 powerpc/64/asm: Do not reassign labels
The LLVM integrated assembler really does not like us reassigning things
to the same label:

<instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label'

This happens across a bunch of platforms:
https://github.com/ClangBuiltLinux/linux/issues/1043
https://github.com/ClangBuiltLinux/linux/issues/1008
https://github.com/ClangBuiltLinux/linux/issues/920
https://github.com/ClangBuiltLinux/linux/issues/1050

There is no hope of getting this fixed in LLVM (see
https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200
and https://bugs.llvm.org/show_bug.cgi?id=47798#c1 )
so if we want to build with LLVM_IAS, we need to hack
around it ourselves.

For us the big problem comes from this:

\#define USE_FIXED_SECTION(sname)				\
	fs_label = start_##sname;				\
	fs_start = sname##_start;				\
	use_ftsec sname;

\#define USE_TEXT_SECTION()
	fs_label = start_text;					\
	fs_start = text_start;					\
	.text

and in particular fs_label.

This works around it by not setting those 'variables' and requiring
that users of the variables instead track for themselves what section
they are in. This isn't amazing, by any stretch, but it gets us further
in the compilation.

Note that even though users have to keep track of the section, using
a wrong one produces an error with both binutils and llvm which prevents
from using wrong section at the compile time:

llvm error example:

AS      arch/powerpc/kernel/head_64.o
<unknown>:0: error: Cannot represent a difference across sections
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

binutils error example:

/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_r
eal_vectors' {.head.text.real_vectors section}
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alexey Kardashevskiy
fd98395797 powerpc/64/asm: Inline BRANCH_TO_C000
It is used just once and does not really help with readability, remove it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Daniel Axtens
f5140cab44 powerpc: check for support for -Wa,-m{power4,any}
LLVM's integrated assembler does not like either -Wa,-mpower4
or -Wa,-many. So just don't pass them if they're not supported.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-3-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alan Modra
a3ad84da07 powerpc/toc: Future proof kernel toc
This patch future-proofs the kernel against linker changes that might
put the toc pointer at some location other than .got+0x8000, by
replacing __toc_start+0x8000 with .TOC. throughout.  If the kernel's
idea of the toc pointer doesn't agree with the linker, bad things
happen.

prom_init.c code relocating its toc is also changed so that a symbolic
__prom_init_toc_start toc-pointer relative address is calculated
rather than assuming that it is always at toc-pointer - 0x8000.  The
length calculations loading values from the toc are also avoided.
It's a little incestuous to do that with unreloc_toc picking up
adjusted values (which is fine in practice, they both adjust by the
same amount if all goes well).

I've also changed the way .got is aligned in vmlinux.lds and
zImage.lds, mostly so that dumping out section info by objdump or
readelf plainly shows the alignment is 256.  This linker script
feature was added 2005-09-27, available in FSF binutils releases from
2.17 onwards.  Should be safe to use in the kernel, I think.

Finally, put *(.got) before the prom_init.o entry which only needs
*(.toc), so that the GOT header goes in the correct place.  I don't
believe this makes any difference for the kernel as it would for
dynamic objects being loaded by ld.so.  That change is just to stop
lusers who blindly copy kernel scripts being led astray.  Of course,
this change needs the prom_init.c changes.

Some notes on .toc and .got.

.toc is a compiler generated section of addresses.  .got is a linker
generated section of addresses, generally built when the linker sees
R_*_*GOT* relocations.  In the case of powerpc64 ld.bfd, there are
multiple generated .got sections, one per input object file.  So you
can somewhat reasonably write in a linker script an input section
statement like *prom_init.o(.got .toc) to mean "the .got and .toc
section for files matching *prom_init.o".  On other architectures that
doesn't make sense, because the linker generally has just one .got
section.  Even on powerpc64, note well that the GOT entries for
prom_init.o may be merged with GOT entries from other objects.  That
means that if prom_init.o references, say, _end via some GOT
relocation, and some other object also references _end via a GOT
relocation, the GOT entry for _end may be in the range
__prom_init_toc_start to __prom_init_toc_end and if the kernel does
something special to GOT/TOC entries in that range then the value of
_end as seen by objects other than prom_init.o will be affected.  On
the other hand the GOT entry for _end may not be in the range
__prom_init_toc_start to __prom_init_toc_end.  Which way it turns out
is deterministic but a detail of linker operation that should not be
relied on.

A feature of ld.bfd is that input .toc (and .got) sections matching
one linker input section statement may be sorted, to put entries used
by small-model code first, near the toc base.  This is why scripts for
powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since
the first form allows more freedom to sort.

Another feature of ld.bfd is that indirect addressing sequences using
the GOT/TOC may be edited by the linker to relative addressing.  In
many cases relative addressing would be emitted by gcc for
-mcmodel=medium if you appropriately decorate variable declarations
with non-default visibility.

The original patch is here:
https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/

Signed-off-by: Alan Modra <amodra@au1.ibm.com>
[aik: removed non-relocatable which is gone in 24d33ac5b8]
[aik: added <=2.24 check]
[aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23 22:35:01 +11:00
Nick Child
7da1d1ddd1 cuda/pmu: Make find_via_cuda/pmu init functions
Make `find_via_cuda` and `find_via_pmu` initialization functions.
Previously, their definitions in `drivers/macintosh/via-cuda.h` include
the `__init` attribute but their alternative definitions in
`arch/powerpc/powermac/sectup./c` and prototypes in `include/linux/
cuda.h` and `include/linux/pmu.h` do not use the `__init` macro. Since,
only initialization functions call `find_via_cuda` and `find_via_pmu`
it is safe to label these functions with `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-21-nick.child@ibm.com
2021-12-23 22:35:00 +11:00
Nick Child
2493a24271 powerpc/512x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/512x' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-20-nick.child@ibm.com
2021-12-23 22:33:19 +11:00
Nick Child
407454cafd powerpc/85xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/85xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-19-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
f4a88b0ef5 powerpc/83xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/83xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-18-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
c0dc225ae7 powerpc/embedded6xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/embedded6xx' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-17-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1ee969be25 powerpc/44x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/44x/' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-16-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1e3d992d21 powerpc/4xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/4xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-15-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
f1ba9b9474 powerpc/ps3: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/ps3' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-14-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
e14ff96d08 powerpc/pseries: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pseries' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
e5913db1ef powerpc/powernv: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powernv' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-12-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
b346f57100 powerpc/powermac: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powermac` are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-11-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
e37e06af9b powerpc/pasemi: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pasemi' are deserving
of an `__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-10-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
d3aa3c5edf powerpc/chrp: Add __init attribute to eligible functions
The function `Enable_SRAM` defined in 'arch/powerpc/platforms/chrp' is
deserving of an `__init` macro attribute. This function is only called by
other initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-9-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
7c1ab16b2d powerpc/cell: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/cell' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-8-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
456e8eb324 powerpc/xmon: Add __init attribute to eligible functions
`xmon_register_spus` defined in 'arch/powerpc/xmon' is deserving of an
`__init` macro attribute. This functions is only called by other
initialization functions and therefore should inherit the attribute.
Also, change the function declaration in the header file to include
`__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-7-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
6c552983d0 powerpc/sysdev: Add __init attribute to eligible functions
Some files functions in 'arch/powerpc/sysdev' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-6-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
c49f5d88ff powerpc/perf: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/perf' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
c13f2b2bb5 powerpc/mm: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/mm' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-4-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
ce0c6be9c6 powerpc/lib: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/lib' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-3-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Nick Child
d276960d92 powerpc/kernel: Add __init attribute to eligible functions
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/
kexec`) are deserving of an `__init` macro attribute. These functions are
only called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Rob Herring
9cbbe6bae9 powerpc/dts: Remove "spidev" nodes
"spidev" is not a real device, but a Linux implementation detail. It has
never been documented either. The kernel has WARNed on the use of it for
over 6 years. Time to remove its usage from the tree.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211217221400.3667133-1-robh@kernel.org
2021-12-21 20:17:58 +11:00
Michael Ellerman
8d84fca437 powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
In note_prot_wx() we bail out without reporting anything if
CONFIG_PPC_DEBUG_WX is disabled.

But CONFIG_PPC_DEBUG_WX was removed in the conversion to generic ptdump,
we now need to use CONFIG_DEBUG_WX instead.

Fixes: e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20211203124112.2912562-1-mpe@ellerman.id.au
2021-12-21 13:05:59 +11:00
Nicholas Piggin
467ba14e16 powerpc/64s/radix: Fix huge vmap false positive
pmd_huge() is defined to false when HUGETLB_PAGE is not configured, but
the vmap code still installs huge PMDs. This leads to false bad PMD
errors when vunmapping because it is not seen as a huge PTE, and the bad
PMD check catches it. The end result may not be much more serious than
some bad pmd warning messages, because the pmd_none_or_clear_bad() does
what we wanted and clears the huge PTE anyway.

Fix this by checking pmd_is_leaf(), which checks for a PTE regardless of
config options. The whole huge/large/leaf stuff is a tangled mess but
that's kernel-wide and not something we can improve much in arch/powerpc
code.

pmd_page(), pud_page(), etc., called by vmalloc_to_page() on huge vmaps
can similarly trigger a false VM_BUG_ON when CONFIG_HUGETLB_PAGE=n, so
those checks are adjusted. The checks were added by commit d6eacedd1f
("powerpc/book3s: Use config independent helpers for page table walk"),
while implementing a similar fix for other page table walking functions.

Fixes: d909f9109c ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216103342.609192-1-npiggin@gmail.com
2021-12-20 12:13:32 +11:00
Yang Guang
a605b39e8e powerpc: use swap() to make code cleaner
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
[mpe: Add include of linux/minmax.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71a702c2189b16c152affd8a8cda1d84ce32741c.1639792543.git.yang.guang5@zte.com.cn
2021-12-20 12:01:21 +11:00
Christophe JAILLET
2fe4ca6ad7 powerpc/mpic: Use bitmap_zalloc() when applicable
'mpic->protected' is a bitmap. So use 'bitmap_zalloc()' to simplify
code and improve the semantic, instead of hand writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa145f674e08044c98f13f1a985faa9cc29c3708.1639777976.git.christophe.jaillet@wanadoo.fr
2021-12-20 11:54:05 +11:00
Linus Torvalds
713ab911f2 powerpc fixes for 5.16 #4
Fix a recently introduced oops at boot on 85xx in some configurations.
 
 Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX.
 
 Thanks to: Joe Lawrence, Russell Currey, Xiaoming Ni.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmG+q+UTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG5sEACVsb39+Q8w0GKJxUGR0Wri1InJTkLC
 UWpd9PVzG6n8+Epl0f+H5cMlv5V6xaDzTsNXEf27xIACMlPSVHVLDW8u/RFPizdp
 fUDUweGUvK/M0GQW89nRV29DjdNjMmUq1KQFM6eEgYN/0nS3HqF27jamiaFhiZ+e
 pJl1g2dy/DTWrFz3V/8b0nv6kyJE8GlpnCmTchDsPNDbzWQVJXVfM7IB/9N+EVna
 MQAZCyKSOafTgNRPOW+Sm4oF64z+Yn4cxURVKfzbM+0Cd8pNIcGZ7McXrnp0DR4t
 bLGeYEzyR1bPc3VTt7b+jafDP41AakRllKAFz65+pdYOcV1hTRM4inbLVPT5Bmz6
 7ANlX/w4BwYWFk3mWW14BoxU02FlRKRddXhusq7x2wIqvtn1KI8Qm1xpneiGpcW+
 GEe+k9ONGowXp/bc47/Co6dPSHKRRHUC3MsgbDv2Y7M5G9vsetckdQzA2bIqYSx8
 r1ZXNiMwQarPeB6FrGvNbekyCsIhJkk6chhtL0k27BVOFqYiHZtaqBs+RCQYPdC8
 Aih8WOEqdldbItFI3MihyKULucgkVpdnm80Ja67qRthuJICF9OML3mdrt83bxEIp
 fu4Bsw8dygVKJWScZj8AUC/ZBxOrrTGyZR1RvsHpaAiEfinMkxxTnXjqNDiSZWXt
 DKKLx6pfSFrokw==
 =Nmp5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix a recently introduced oops at boot on 85xx in some configurations.

  Fix crashes when loading some livepatch modules with
  STRICT_MODULE_RWX.

  Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"

* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/module_64: Fix livepatching for RO modules
  powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
2021-12-19 11:31:14 -08:00
Paolo Bonzini
5a213b9220 Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD
Fix conflicts between memslot overhaul and commit 511d25d6b7 ("KVM:
PPC: Book3S: Suppress warnings when allocating too big memory slots")
from the powerpc tree.
2021-12-19 15:27:21 +01:00
Alexandre Ghiti
e0cb56546d arch: Remove leftovers from prism54 wireless driver
This driver was removed so remove all references to it.

Fixes: d249ff28b1 ("intersil: remove obsolete prism54 wireless driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:09 +01:00
Alexandre Ghiti
2ac7069ad7 Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
This config was removed so remove all references to it.

Fixes: 76a3c92ec9 ("cifs: remove support for NTLM and weaker authentication algorithms")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:03 +01:00
Alexandre Ghiti
473dcf0ffc Documentation, arch: Remove leftovers from raw device
Raw device interface was removed so remove all references to configs
related to it.

Fixes: 603e4922f1 ("remove the raw driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:07:52 +01:00
Rob Herring
1f012283e9 of/fdt: Rework early_init_dt_scan_memory() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_memory() to be called directly and use libfdt.

Cc: John Crispin <john@phrozen.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211215150102.1303588-1-robh@kernel.org
2021-12-16 16:07:52 -06:00
Rob Herring
d665881d21 of/fdt: Rework early_init_dt_scan_root() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root()
to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-3-robh@kernel.org
2021-12-16 16:07:48 -06:00
Rob Herring
60f20d84dc of/fdt: Rework early_init_dt_scan_chosen() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_chosen() to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-16 16:07:41 -06:00
Thomas Gleixner
706b585a1b powerpc/mpic_u3msi: Use msi_for_each-desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.576162169@linutronix.de
2021-12-16 22:22:19 +01:00
Thomas Gleixner
ab430e7437 powerpc/fsl_msi: Use msi_for_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.522641685@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
e22b0d1bbf powerpc/pasemi/msi: Convert to msi_on_each_dec()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.468512783@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
3c46658bd7 powerpc/cell/axon_msi: Convert to msi_on_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.414712173@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
85dabc2f72 powerpc/4xx/hsta: Rework MSI handling
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.359766435@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
651b39c488 powerpc/pseries/msi: Let core code check for contiguous entries
Set the domain info flag and remove the check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.720998720@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
173ffad79d PCI/MSI: Use msi_desc::msi_index
The usage of msi_desc::pci::entry_nr is confusing at best. It's the index
into the MSI[X] descriptor table.

Use msi_desc::msi_index which is shared between all MSI incarnations
instead of having a PCI specific storage for no value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211210221814.602911509@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
ed1533b581 powerpc/pseries/msi: Use PCI device properties
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221813.556202506@linutronix.de
2021-12-16 22:16:38 +01:00
Thomas Gleixner
d8a530578b powerpc/cell/axon_msi: Use PCI device property
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211210221813.493922179@linutronix.de
2021-12-16 22:16:38 +01:00
Emmanuel Gil Peyrot
c636783d59 powerpc: wii_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-6-linkmauve@linkmauve.fr
2021-12-16 21:49:51 +01:00
Nicholas Piggin
3b54c71537 powerpc/pseries: use slab context cpumask allocation in CPU hotplug init
Slab is up at this point, using the bootmem allocator triggers a
warning. Switch to using the regular cpumask allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105132923.1582514-1-npiggin@gmail.com
2021-12-16 21:31:46 +11:00
Nicholas Piggin
af47d79b04 powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But the
important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ecb1057c0f powerpc/64/interrupt: reduce expensive debug tests
Move the assertions requiring restart table searches under
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-6-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0faf20a1ad powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
5a7745b96f powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI
Interrupt code enables MSR[EE] in some irq handlers while keeping local
irqs disabled via soft-mask, allowing PMI interrupts to be taken as
soft-NMI to improve profiling of irq handlers.

When perf is not enabled, there is no point to doing this, it's
additional overhead. So provide a function that can say if PMIs should
be taken promptly if possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ff0b0d6e1a powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable
MSR[EE] in interrupt entry code, for those interrupts which enable EE.
This helps performance of important synchronous interrupts (e.g., page
faults).

This is similar to what commit dd152f70bd ("powerpc/64s: system call
avoid setting MSR[RI] until we set MSR[EE]") does for system calls.

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is
set.

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they have
interrupted a caller that is hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), it will not require
another mtmsrd because MSR[EE] was already enabled here.

This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles on POWER9. And for kernel-mode interrupts, both
synchronous and asynchronous, this saves an additional 40 cycles due to
the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
4423eb5ae3 powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible
Make synchronous interrupt handler entry wrappers enable MSR[EE] if
MSR[EE] was enabled in the interrupted context. IRQs are soft-disabled
at this point so there is no change to high level code, but it's a
masked interrupt could fire.

This is a performance disadvantage for interrupts which do not later
call interrupt_cond_local_irq_enable(), because an an additional mtmsrd
or wrtee instruction is executed. However the important synchronous
interrupts (e.g., page fault) do enable interrupts, so the performance
disadvantage is mostly avoided.

In the next patch, MSR[RI] enabling can be combined with MSR[EE]
enabling, which mitigates the performance drop for the former and gives
a performance advanage for the latter interrupts, on 64s machines. 64e
is coming along for the ride for now to avoid divergences with 64s in
this tricky code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-2-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0a006ace63 powerpc/pseries/vas: Don't print an error when VAS is unavailable
KVM does not support VAS so guests always print a useless error on boot

    vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000

Change this to only print the message if the error is not H_FUNCTION.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126052133.1664375-1-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Kajol Jain
6ed05a8efd powerpc/perf: Add data source encodings for power10 platform
The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Kajol Jain
4a20ee1061 powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
The code represent data coming from L1/L2/L3 cache hits based on
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.

Add data source encodings to represent L1/L2/L3 cache hits based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms

Result in power9 system without patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result in power9 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Emmanuel Gil Peyrot
57bd7d3565 powerpc: gamecube_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-5-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Emmanuel Gil Peyrot
5479618e1e powerpc: wii.dts: Expose HW_SRNPROT on this platform
This Hollywood register isn’t properly understood, but can allow or
reject access to the SRAM, which we need to set for RTC usage if it
isn’t previously set correctly beforehand.

See https://wiibrew.org/wiki/Hardware/Hollywood_Registers#HW_SRNPROT

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-4-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Michael Ellerman
708da3ff1d Merge branch 'topic/ppc-kvm' into next
Bring in some more KVM commits from our KVM topic branch.
2021-12-15 11:29:53 +11:00
Russell Currey
8734b41b3e powerpc/module_64: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-14 23:13:03 +11:00
Sean Christopherson
63fa47ba88 KVM: PPC: Book3S HV P9: Use kvm_arch_vcpu_get_wait() to get rcuwait object
Use kvm_arch_vcpu_get_wait() to get a vCPU's rcuwait object instead of
using vcpu->wait directly in kvmhv_run_single_vcpu().  Functionally, this
is a nop as vcpu->arch.waitp is guaranteed to point at vcpu->wait.  But
that is not obvious at first glance, and a future change coming in via
the KVM tree, commit 510958e997 ("KVM: Force PPC to define its own
rcuwait object"), will hide vcpu->wait from architectures that define
__KVM_HAVE_ARCH_WQP to prevent generic KVM from attepting to wake a vCPU
with the wrong rcuwait object.

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211213174556.3871157-1-seanjc@google.com
2021-12-14 22:49:36 +11:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Ingo Molnar
6773cc31a9 Linux 5.16-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
 w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
 HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
 St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
 iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
 VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
 1m2tVNY=
 =7wRd
 -----END PGP SIGNATURE-----

Merge tag 'v5.16-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-12-13 10:48:46 +01:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Peter Zijlstra
1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
David Woodhouse
5f33868af8 KVM: powerpc: Use Makefile.kvm for common files
It's all fairly baroque but in the end, I don't think there's any reason
for $(KVM)/irqchip.o to have been handled differently, as they all end
up in $(kvm-y) in the end anyway, regardless of whether they get there
via $(common-objs-y) and the CPU-specific object lists.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Message-Id: <20211121125451.9489-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09 13:06:01 -05:00
Christophe Leroy
b149d5d45a powerpc/powermac: Add additional missing lockdep_register_key()
Commit df1f679d19 ("powerpc/powermac: Add missing
lockdep_register_key()") fixed a problem that was causing a WARNING.

There are two other places in the same file with the same problem
originating from commit 9e607f7274 ("i2c_powermac: shut up lockdep
warning").

Add missing lockdep_register_key()

Fixes: 9e607f7274 ("i2c_powermac: shut up lockdep warning")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: df1f679d19 ("powerpc/powermac: Add missing lockdep_register_key()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200055
Link: https://lore.kernel.org/r/2c7e421874e21b2fb87813d768cf662f630c2ad4.1638984999.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:22 +11:00
Hari Bathini
06e629c25d powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dff
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3841 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Hari Bathini
219572d2fc powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
Kdump can be triggered after panic_notifers since commit f06e5153f4
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced crash_kexec_post_notifiers option.
But using this option would mean smp_send_stop(), that marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save other CPUs' registers. To fix this, kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c9 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Anders Roxell
e89257e28e powerpc/cell: Fix clang -Wimplicit-fallthrough warning
Clang warns:

arch/powerpc/platforms/cell/pervasive.c:81:2: error: unannotated fall-through between switch labels
        case SRR1_WAKEEE:
        ^
arch/powerpc/platforms/cell/pervasive.c:81:2: note: insert 'break;' to avoid fall-through
        case SRR1_WAKEEE:
        ^
        break;
1 error generated.

Clang is more pedantic than GCC, which does not warn when failing
through to a case that is just break or return. Clang's version is more
in line with the kernel's own stance in deprecated.rst. Add athe missing
break to silence the warning.

Fixes: 6e83985b0f ("powerpc/cbe: Do not process external or decremeter interrupts from sreset")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207110228.698956-1-anders.roxell@linaro.org
2021-12-09 22:41:21 +11:00
Christophe Leroy
0d76914a4c powerpc/inst: Optimise copy_inst_from_kernel_nofault()
copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to
copy one or two 32bits words. This means calling an out-of-line
function which itself calls back copy_from_kernel_nofault_allowed()
then performs a generic copy with loops.

Rewrite copy_inst_from_kernel_nofault() to do everything at a
single place and use __get_kernel_nofault() directly to perform
single accesses without loops.

Allthough the generic function uses pagefault_disable(), it is not
required on powerpc because do_page_fault() bails earlier when a
kernel mode fault happens on a kernel address.

As the function has now become very small, inline it.

With this change, on an 8xx the time spent in the loop in
ftrace_replace_code() is reduced by 23% at function tracer activation
and 27% at nop tracer activation.
The overall time to activate function tracer (measured with shell
command 'time') is 570ms before the patch and 470ms after the patch.

Even vmlinux size is reduced (by 152 instruction).

Before the patch:

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	94 21 ff e0 	stwu    r1,-32(r1)
	  1c:	7c 08 02 a6 	mflr    r0
	  20:	38 a0 00 04 	li      r5,4
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7c 7f 1b 78 	mr      r31,r3
	  2c:	38 61 00 08 	addi    r3,r1,8
	  30:	90 01 00 24 	stw     r0,36(r1)
	  34:	48 00 00 01 	bl      34 <copy_inst_from_kernel_nofault+0x1c>
				34: R_PPC_REL24	copy_from_kernel_nofault
	  38:	2c 03 00 00 	cmpwi   r3,0
	  3c:	40 82 00 0c 	bne     48 <copy_inst_from_kernel_nofault+0x30>
	  40:	81 21 00 08 	lwz     r9,8(r1)
	  44:	91 3f 00 00 	stw     r9,0(r31)
	  48:	80 01 00 24 	lwz     r0,36(r1)
	  4c:	83 e1 00 1c 	lwz     r31,28(r1)
	  50:	38 21 00 20 	addi    r1,r1,32
	  54:	7c 08 03 a6 	mtlr    r0
	  58:	4e 80 00 20 	blr

After the patch (before inlining):

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	3d 20 b0 00 	lis     r9,-20480
	  1c:	7c 04 48 40 	cmplw   r4,r9
	  20:	7c 69 1b 78 	mr      r9,r3
	  24:	41 80 00 14 	blt     38 <copy_inst_from_kernel_nofault+0x20>
	  28:	81 44 00 00 	lwz     r10,0(r4)
	  2c:	38 60 00 00 	li      r3,0
	  30:	91 49 00 00 	stw     r10,0(r9)
	  34:	4e 80 00 20 	blr

	  38:	38 60 ff de 	li      r3,-34
	  3c:	4e 80 00 20 	blr
	  40:	38 60 ff f2 	li      r3,-14
	  44:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add clang workaround, with version check as suggested by Nathan]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d5b12183d5176dd702d29ad94c39c384e51c78f.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
9b307576f3 powerpc/inst: Move ppc_inst_t definition in asm/reg.h
Because of circular inclusion of asm/hw_breakpoint.h, we
need to move definition of asm/reg.h outside of inst.h
so that asm/hw_breakpoint.h gets it without including
asm/inst.h

Also remove asm/inst.h from asm/uprobes.h as it's not
needed anymore.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4b79f1491118af96b1ac0735e74aeca02ea4c04e.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
07b863aef5 powerpc/inst: Define ppc_inst_t as u32 on PPC32
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined
as a struct, even when the struct has a single simple element.

To avoid that, define ppc_inst_t as u32 on PPC32.

Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that
sparse can perform type checking.

Also revert commit 511eea5e2c ("powerpc/kprobes: Fix Oops by passing
ppc_inst as a pointer to emulate_step() on ppc32") as now the
instruction to be emulated is passed as a register to emulate_step().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
c545b9f040 powerpc/inst: Define ppc_inst_t
In order to stop using 'struct ppc_inst' on PPC32,
define a ppc_inst_t typedef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
3261d99adb powerpc/inst: Refactor ___get_user_instr()
PPC64 version of ___get_user_instr() can be used for PPC32 as well,
by simply disabling the suffix part with IS_ENABLED(CONFIG_PPC64).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f0ede830ccb33a659119a55cb590820c27004db.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
37eb7ca91b powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
Today we have the following IBATs allocated:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128K Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128K Kernel   x     m
	6:         -
	7:         -

The two 128K should be a single 256K instead.

When _etext is not aligned to 128Kbytes, the system will allocate
all necessary BATs to the lower 128Kbytes boundary, then allocate
an additional 128Kbytes BAT for the remaining block.

Instead, align the top to 128Kbytes so that the function directly
allocates a 256Kbytes last block:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc07bffff 0x00780000       256K Kernel   x     m
	5:         -
	6:         -
	7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.1637930421.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
dede19be51 powerpc: Remove CONFIG_PPC_HAVE_KUAP and CONFIG_PPC_HAVE_KUEP
All platforms now have KUAP and KUEP so remove CONFIG_PPC_HAVE_KUAP
and CONFIG_PPC_HAVE_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a3c007ad0951965199e6ab2ef1035966bc66e771.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
57bc963837 powerpc/kuap: Wire-up KUAP on book3e/64
This adds KUAP support to book3e/64.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2c2c9375afd4bbc06aa904d0103a5f5102a2b1a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
4f6a025201 powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.
This adds KUAP support to 85xx in 32 bits mode.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
fcf9bb6d32 powerpc/kuap: Wire-up KUAP on 40x
This adds KUAP support to 40x. This is done by checking
the content of SPRN_PID at the time user pgtable is loaded.

40x doesn't have KUEP, but KUAP implies KUEP because when the
PID doesn't match the page's PID, the page cannot be read nor
executed.

So KUEP is now automatically selected when KUAP is selected and
disabled when KUAP is disabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
f6fad4fb55 powerpc/kuap: Wire-up KUAP on 44x
This adds KUAP support to 44x. This is done by checking
the content of SPRN_PID at the time it is read and written
into SPRN_MMUCR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
43afcf8f01 powerpc: Add KUAP support for BOOKE and 40x
On booke/40x we don't have segments like book3s/32.
On booke/40x we don't have access protection groups like 8xx.

Use the PID register to provide user access protection.
Kernel address space can be accessed with any PID.
User address space has to be accessed with the PID of the user.
User PID is always not null.

Everytime the kernel is entered, set PID register to 0 and
restore PID register when returning to user.

Everytime kernel needs to access user data, PID is restored
for the access.

In TLB miss handlers, check the PID and bail out to data storage
exception when PID is 0 and accessed address is in user space.

Note that also forbids execution of user text by kernel except
when user access is unlocked. But this shouldn't be a problem
as the kernel is not supposed to ever run user text.

This patch prepares the infrastructure but the real activation of KUAP
is done by following patches for each processor type one by one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
e3c02f25b4 powerpc/kuap: Make PPC_KUAP_DEBUG depend on PPC_KUAP only
PPC_KUAP_DEBUG is supported by all platforms doing PPC_KUAP,
it doesn't depend on Radix on book3s/64.

This will avoid adding one more dependency when implementing
KUAP on book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5ff6228a36e51783b83d8c10d058db76e450f63.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
42e03bc524 powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64
Also call kuap_lock() and kuap_save_and_lock() from
interrupt functions with CONFIG_PPC64.

For book3s/64 we keep them empty as it is done in assembly.

Also do the locked assert when switching task unless it is
book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
047a6fd401 powerpc/config: Add CONFIG_BOOKE_OR_40x
We have many functionnalities common to 40x and BOOKE, it leads to
many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x).

We are going to add a few more with KUAP for booke/40x, so create
a new symbol which is defined when either BOOKE or 40x is defined.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
25ae981faf powerpc/nohash: Move setup_kuap out of 8xx.c
In order to reuse it on booke/4xx, move KUAP
setup routine out of 8xx.c

Make them usable on SMP by removing the __init tag
as it is called for each CPU.

And use __prevent_user_access() instead of hard
coding initial lock.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae35eec3426509efc2b8ae69586c822e2fe2642a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
937fb7003e powerpc/kuap: Add kuap_lock()
Add kuap_lock() and call it when entering interrupts from user.

It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.

However book3s/32 already have a kuap_lock(). Rename it
kuap_lock_addr().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
2341964e27 powerpc/kuap: Remove __kuap_assert_locked()
__kuap_assert_locked() is redundant with
__kuap_get_and_assert_locked().

Move the verification of CONFIG_PPC_KUAP_DEBUG in kuap_assert_locked()
and make it call __kuap_get_and_assert_locked() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a60198a25d2ba38a37f1b92bc7d096435df4224.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
c252f3846d powerpc/kuap: Check KUAP activation in generic functions
Today, every platform checks that KUAP is not de-activated
before doing the real job.

Move the verification out of platform specific functions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/894f110397fcd248e125fb855d1e863e4e633a0d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
ba454f9c8e powerpc/kuap: Add a generic intermediate layer
Make the following functions generic to all platforms.
- bad_kuap_fault()
- kuap_assert_locked()
- kuap_save_and_lock() (PPC32 only)
- kuap_kernel_restore()
- kuap_get_and_assert_locked()

And for all platforms except book3s/64
- allow_user_access()
- prevent_user_access()
- prevent_user_access_return()
- restore_user_access()

Prepend __ in front of the name of platform specific ones.

For now the generic just calls the platform specific, but
next patch will move redundant parts of specific functions
into the generic one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaef143a8dae7288cd34565ffa7b49c16aee1ec3.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
6754862249 powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64
Deactivating KUEP at boot time is unrelevant for PPC32 and BOOK3E/64.

Remove it.

It allows to refactor setup_kuep() via a __weak function
that only PPC64s will overide for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_PPC_BOOKS_64 -> CONFIG_PPC_BOOK3S_64 typo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4c36df18b41c988c4512f45d96220486adbe4c99.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
70428da94c powerpc/32s: Save content of sr0 to avoid 'mfsr'
Calling 'mfsr' to get the content of segment registers is heavy,
in addition it requires clearing of the 'reserved' bits.

In order to avoid this operation, save it in mm context and in
thread struct.

The saved sr0 is the one used by kernel, this means that on
locking entry it can be used as is.

For unlocking, the only thing to do is to clear SR_NX.

This improves null_syscall selftest by 12 cycles, ie 4%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
526d4a4c77 powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly
When interrupt and syscall entries where converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.

However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).

A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.

In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is counter
performant here.

So let's do it in assembly, when we have full control of which
register can be used. It also has the advantage of locking earlier
and unlocking later and it helps GCC generating less tricky code.
The only drawback is to make boot time deactivation less straight
forward and require 'hand' instruction patching.

Group 'mtsr's by 4.

With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
df415cd758 powerpc/32s: Remove capability to disable KUEP at boottime
Disabling KUEP at boottime makes things unnecessarily complex.

Still allow disabling KUEP at build time, but when it's built-in
it is always there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f583f82423a29a4205c60b9721079111b35567.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
dc3a0e5b83 powerpc/book3e: Activate KUEP at all time
On book3e,
- When using 64 bits PTE: User pages don't have the SX bit defined
so KUEP is always active.
- When using 32 bits PTE: Implement KUEP by clearing SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
ee2631603f powerpc/44x: Activate KUEP at all time
On 44x, KUEP is implemented by clearing SX bit during TLB miss
for user pages. The impact is minimal and not worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
13dac4e31e powerpc/8xx: Activate KUEP at all time
On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups with contain all user pages as having swapped
protection rights, in extenso EX for user and NA for supervisor.

Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
6c1fa60d36 Revert "powerpc: Inline setup_kup()"
This reverts commit 1791ebd131.

setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.

But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.

In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.

Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Christophe Leroy
06e7cbc29e powerpc/40x: Map 32Mbytes of memory at startup
As reported by Carlo, 16Mbytes is not enough with modern kernels
that tend to be a bit big, so map another 16M page at boot.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Nicholas Piggin
31284f703d powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU
Microwatt implements a subset of ISA v3.0 (which is equivalent to
the POWER9_CPU option). It is radix-only, so does not require hash
MMU support.

This saves 20kB compressed dtbImage and 56kB vmlinux size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-19-npiggin@gmail.com
2021-12-09 22:41:16 +11:00
Nicholas Piggin
387e220a2e powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09 22:41:13 +11:00
Nicholas Piggin
c28573744b powerpc/64s: Make hash MMU support configurable
This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
debeda0171 powerpc/64s: Always define arch unmapped area calls
To avoid any functional changes to radix paths when building with hash
MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
arch get_unmapped_area calls on 64s platforms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
af3a0ea41c powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear
There are a few places that require MMU_FTR_HPTE_TABLE to be set even
when running in radix mode. Fix those up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
8dbfc0092b powerpc/64e: remove mmu_linear_psize
mmu_linear_psize is only set at boot once on 64e, is not necessarily
the correct size of the linear map pages, and is never used anywhere.
Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain the extern, so we can use IS_ENABLED() for related code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com
2021-12-09 22:39:39 +11:00
Thomas Gleixner
e58f2259b9 genirq/msi, treewide: Use a named struct for PCI/MSI attributes
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211206210224.374863119@linutronix.de
2021-12-09 11:52:21 +01:00
Cédric Le Goater
eca213152a powerpc/4xx: Complete removal of MSI support
Finish the work by removing all references to the PPC4xx_MSI config
and the associated device nodes in the DTs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/e92f2bb3-b5e1-c870-8151-3917a789a640@kaod.org
2021-12-09 11:52:20 +01:00
Thomas Gleixner
4f1d038b5e powerpc/4xx: Remove MSI support which never worked
This code is broken since day one. ppc4xx_setup_msi_irqs() has the
following gems:

 1) The handling of the result of msi_bitmap_alloc_hwirqs() is completely
    broken:
    
    When the result is greater than or equal 0 (bitmap allocation
    successful) then the loop terminates and the function returns 0
    (success) despite not having installed an interrupt.

    When the result is less than 0 (bitmap allocation fails), it prints an
    error message and continues to "work" with that error code which would
    eventually end up in the MSI message data.

 2) On every invocation the file global pp4xx_msi::msi_virqs bitmap is
    allocated thereby leaking the previous one.

IOW, this has never worked and for more than 10 years nobody cared. Remove
the gunk.

Fixes: 3fb7933850 ("powerpc/4xx: Adding PCIe MSI support")
Fixes: 247540b03b ("powerpc/44x: Fix PCI MSI support for Maui APM821xx SoC and Bluestone board")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210223.872249537@linutronix.de
2021-12-09 11:52:20 +01:00
Paolo Bonzini
93b350f884 Merge branch 'kvm-on-hv-msrbm-fix' into HEAD
Merge bugfix for enlightened MSR Bitmap, before adding support
to KVM for exposing the feature to nested guests.
2021-12-08 05:30:48 -05:00
Sean Christopherson
91b99ea706 KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting
the actual "block" sequences into a separate helper (to be named
kvm_vcpu_block()).  x86 will use the standalone block-only path to handle
non-halt cases where the vCPU is not runnable.

Rename block_ns to halt_ns to match the new function name.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:51 -05:00
Sean Christopherson
005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Sean Christopherson
510958e997 KVM: Force PPC to define its own rcuwait object
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and
instead force the architecture (PPC) to define its own rcuwait object.
Allowing common KVM to directly access vcpu->wait without a guard makes
it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(),
kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not
the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for
PPC.

Due to PPC's shenanigans with respect to callbacks and waits (it switches
to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether
or not this fixes any bugs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:46 -05:00
Maciej S. Szmigiero
a54d806688 KVM: Keep memslots in tree-based structures instead of array-based ones
The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of them.

Because the memslot array that is currently in use cannot be modified
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make copy of the
memslot that is being modified, copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of actual memslots
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should be also kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy.  After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used since memslots cannot
overlap in the guest address space and so this data structure is
sufficient for ensuring that lookups are done quickly.

The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
that keep track of the last found-by-gfn memslot, are still present in the
new code.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:34 -05:00
Maciej S. Szmigiero
ed922739c9 KVM: Use interval tree to do fast hva lookup in memslots
The current memslots implementation only allows quick binary search by gfn,
quick lookup by hva is not possible - the implementation has to do a linear
scan of the whole memslots array, even though the operation being performed
might apply just to a single memslot.

This significantly hurts performance of per-hva operations with higher
memslot counts.

Since hva ranges can overlap between memslots an interval tree is needed
for tracking them.

[sean: handle interval tree updates in kvm_replace_memslot()]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:32 -05:00
Sean Christopherson
6a99c6e3f5 KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now
that its use has been removed in all architectures.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:25 -05:00
Sean Christopherson
eaaaed137e KVM: PPC: Avoid referencing userspace memory region in memslot updates
For PPC HV, get the number of pages directly from the new memslot instead
of computing the same from the userspace memory region, and explicitly
check for !DELETE instead of inferring the same when toggling mmio_update.
The motivation for these changes is to avoid referencing the @mem param
so that it can be dropped in a future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:22 -05:00
Sean Christopherson
537a17b314 KVM: Let/force architectures to deal with arch specific memslot data
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch
code to handle propagating arch specific data from "new" to "old" when
necessary.  This is a baby step towards dynamically allocating "new" from
the get go, and is a (very) minor performance boost on x86 due to not
unnecessarily copying arch data.

For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE
and FLAGS_ONLY.  This is functionally a nop as the previous behavior
would overwrite the pointer for CREATE, and eventually discard/ignore it
for DELETE.

For x86, copy the arch data only for FLAGS_ONLY changes.  Unlike PPC HV,
x86 needs to reallocate arch data in the MOVE case as the size of x86's
allocations depend on the alignment of the memslot's gfn.

Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to
match the "commit" prototype.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
[mss: add missing RISCV kvm_arch_prepare_memory_region() change]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:20 -05:00
Marc Zyngier
46808a4cb8 KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:15 -05:00
Marc Zyngier
27592ae8db KVM: Move wiping of the kvm->vcpus array to common code
All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.
The locking is dropped altogether, as this should only be called
when there is no further references on the kvm structure.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-2-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:13 -05:00
Sebastian Andrzej Siewior
77993b595a locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes the
spinlock_types.h.

PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so
its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and
atomic.h which leads to recursive includes where defines are missing.

By including only the raw_spinlock_t defines it avoids the atomic.h
related includes at this stage.

An example on powerpc:

|  CALL    scripts/atomic/check-atomics.sh
|In file included from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|arch/powerpc/include/asm/page_32.h: In function ‘clear_page’:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|   87 |    __WARN();    \
|      |    ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeofâ€=99 to incomplete type ‘struct bug_entryâ€=99
|   58 |     "i" (sizeof(struct bug_entry)), \
|      |                 ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRYâ€=99
|   89 |   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
|      |   ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|In file included from arch/powerpc/include/asm/ptrace.h:298,
|                 from arch/powerpc/include/asm/hw_irq.h:12,
|                 from arch/powerpc/include/asm/irqflags.h:12,
|                 from include/linux/irqflags.h:16,
|                 from include/asm-generic/cmpxchg-local.h:6,
|                 from arch/powerpc/include/asm/cmpxchg.h:526,
|                 from arch/powerpc/include/asm/atomic.h:11,
|                 from include/linux/atomic.h:7,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/asm-generic/bug.h:22,
|                 from arch/powerpc/include/asm/bug.h:109,
|                 from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99:
|include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|  210 |  WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
|      |  ^~~~

The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
(from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
statements.

On POWERPC64 there are missing atomic64 defines while building 32bit
VDSO:
|  VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
|In file included from include/linux/atomic.h:80,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/linux/kernel.h:19,
|                 from arch/powerpc/include/asm/page.h:11,
|                 from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
|                 from include/vdso/datapage.h:137,
|                 from lib/vdso/gettimeofday.c:5,
|                 from <command-line>:
|include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_incâ€=99:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=3Dimpl
|icit-function-declaration]
| 1447 |  arch_atomic64_add(1, v);
|      |  ^~~~~~~~~~~~~~~~~
|      |  arch_atomic_add

The generic fallback is not included, atomics itself are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.

Allow asm/spinlock_types.h to be included from
linux/spinlock_types_raw.h.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
2021-12-07 15:14:12 +01:00
Nicholas Piggin
20626177c9 powerpc: make memremap_compat_align 64s-only
memremap_compat_align is only relevant when ZONE_DEVICE is selected.
ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected
by PPC_BOOK3S_64.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com
2021-12-02 22:57:24 +11:00
Nicholas Piggin
ffbe5d21d1 powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make paths for these platforms to be explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
f43d2ffb47 powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c
This file contains functions and data common to radix, so rename it to
remove the hash_ prefix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
bdad5d57df powerpc/64s: move page size definitions from hash specific file
The radix code uses some of the psize variables. Move the common
ones from hash_utils.c to pgtable.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
310dce6201 powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change to make hash support configurable to not
introduce runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
162b0889bb powerpc/64s: move THP trace point creation out of hash specific file
In preparation for making hash MMU support configurable, move THP
trace point function definitions out of an otherwise hash-specific
file.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
3d3282fd34 powerpc/pseries: lparcfg don't include slb_size line in radix mode
This avoids a change in behaviour in the later patch making hash
support configurable. This is possibly a user interface change, so
the alternative would be a hard-coded slb_size=0 here.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-7-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
0c7cc15e92 powerpc/pseries: move process table registration away from hash-specific code
This reduces ifdefs in a later change which makes hash support configurable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-6-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
935b534c24 powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
a4135cbebd powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
7ebc49031d powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE
PPC_NATIVE now only controls the native HPT code, so rename it to be
more descriptive. Restrict it to Book3S only.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Nicholas Piggin
79b74a6848 powerpc: Remove unused FW_FEATURE_NATIVE references
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Alexey Kardashevskiy
792020907b KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
in __alloc_pages() by constructing a tiny VM which only does
H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy).

This silences the warning by adding __GFP_NOWARN.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084550.1658699-1-aik@ozlabs.ru
2021-12-02 22:56:08 +11:00
Alexey Kardashevskiy
511d25d6b7 KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
The userspace can trigger "vmalloc size %lu allocation failure: exceeds
total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.

This silences the warning by checking the limit before calling vzalloc()
and returns ENOMEM if failed.

This does not call underlying valloc helpers as __vmalloc_node() is only
exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is
not exported at all.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Use 'size' for the variable rather than 'cb']
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084512.1658628-1-aik@ozlabs.ru
2021-12-02 22:55:10 +11:00
Nicholas Piggin
f6a1987773 KVM: PPC: Book3S HV P9: Remove unused ri_set local variable
ri_set is set and never used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201052112.2137167-1-npiggin@gmail.com
2021-12-02 10:42:40 +11:00
Cédric Le Goater
2a2ac8a701 powerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.
The automatic "save & restore" of interrupt context is a POWER10/XIVE2
feature exploited by KVM under the PowerNV platform. It is not
available under pSeries and the associated toggle should not be
exposed under the XIVE debugfs directory.

Introduce a platform handler for debugfs initialization and move the
'save-restore' entry under the native (PowerNV) backend to fix compile
when !CONFIG_PPC_POWERNV.

Fixes: 1e7684dc4f ("powerpc/xive: Add a debugfs toggle for save-restore")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201165418.1041842-1-clg@kaod.org
2021-12-02 10:40:38 +11:00
Kees Cook
62ea67e319 powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-12-02 10:39:00 +11:00
Mark Rutland
985faa7868 powerpc: Snapshot thread flags
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.

To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.

Convert them all to the new flag accessor helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-11-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Mark Rutland
08b0af5b2a powerpc: Avoid discarding flags in system_call_exception()
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Thus, when setting flags we must use
an atomic operation rather than a plain read-modify-write sequence, as a
plain read-modify-write may discard flags which are concurrently set by a
remote thread, e.g.

	// task A			// task B
	tmp = A->thread_info.flags;
					set_tsk_thread_flag(A, NEWFLAG_B);
	tmp |= NEWFLAG_A;
	A->thread_info.flags = tmp;

arch/powerpc/kernel/interrupt.c's system_call_exception() sets
_TIF_RESTOREALL in the thread info flags with a read-modify-write, which
may result in other flags being discarded.

Elsewhere in the file it uses clear_bits() to atomically remove flag bits,
so use set_bits() here for consistency with those.

There may be reasons (e.g. instrumentation) that prevent the use of
set_thread_flag() and clear_thread_flag() here, which would otherwise be
preferable.

Fixes: ae7aaecc3f ("powerpc/64s: system call rfscv workaround for TM bugs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eirik Fuller <efuller@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-10-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Christophe Leroy
af11dee436 powerpc/32s: Fix shift-out-of-bounds in KASAN init
================================================================================
UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9
Call Trace:
[c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
[c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c
[c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138
[c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c
[c214bf30] [c1c0ed84] kasan_init+0xc0/0x198
[c214bf70] [c1c08024] setup_arch+0x18/0x54c
[c214bfc0] [c1c037f0] start_kernel+0x90/0x33c
[c214bff0] [00003610] 0x3610
================================================================================

This happens when the directly mapped memory is a power of 2.

Fix it by checking the shift and set the result to 0 when shift is -1

Fixes: 7974c47326 ("powerpc/32s: Implement dedicated kasan_init_region()")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169
Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
df1f679d19 powerpc/powermac: Add missing lockdep_register_key()
KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000
BUG: key c2d00cbc has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9
NIP:  c01a9428 LR: c01a9428 CTR: 00000000
REGS: e1033cf0 TRAP: 0700   Not tainted  (5.15.5-gentoo-PowerMacG4)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24002002  XER: 00000000

GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73
GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003
GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00
GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000
NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
Call Trace:
[e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable)
[e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424
[e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c
[e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c
[e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4
[e1033f10] [c000946c] kernel_init+0x28/0x154
[e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c

Add missing lockdep_register_key()

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
f1797e4de1 powerpc/modules: Don't WARN on first module allocation attempt
module_alloc() first tries to allocate module text within 24 bits direct
jump from kernel text, and tries a wider allocation if first one fails.

When first allocation fails the following is observed in kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add __GFP_NOWARN flag to first allocation so that no warning appears
when it fails.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df167 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:32 +11:00
Nicholas Piggin
5402e239d0 powerpc/64s: Get LPID bit width from device tree
Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet, this is mainly required
to get the PTCR register values correct (so KVM will run but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2021-11-30 22:27:07 +11:00
Athira Rajeev
2c9ac51b85 powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Running perf fuzzer showed below in dmesg logs:
  "Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.

Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. We could get
IPIs run before perf interrupt is replayed and the PMU events could
be deleted or stopped. This will change the PMU SPR values and resets
the counters. Snippet of ftrace log showing PMU callbacks invoked in
__do_irq():

  <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
  <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
  <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
  <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
  <<>>
  <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
  <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
  <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
  <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
  <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
  <<>>
  <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
  <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMI's masked during local_* operations, example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.

Fix it by updating the PMU callback function power_pmu_disable() to
check for pending perf interrupt. If there is an overflown PMC and
pending perf interrupt indicated in paca, clear the PMI bit in paca to
drop that sample. Clearing of PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there are more than one event running in the PMU, there is
a chance that we clear the PMI bit for the event which is not getting
deleted/stopped. The other events may still remain active. Hence to make
sure we don't drop valid sample in such cases, another check is added in
power_pmu_enable. This checks if there is an overflown PMC found among
the active events and if so enable back the PMI bit. Two new helper
functions are introduced to clear/set the PMI, ie
clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function
pmi_irq_pending() is introduced to give a warning if there is pending
PMI bit in paca, but no PMC is overflown.

Also there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
since PMXE bit is not cleared along with disabling of other MMCR0 bits
in the pmu_disable. Such PMI's could leave the PMU running and could
trigger PMI again which will set MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. Example, a timer after
power_pmu_del() which will re-enable interrupts and triggers a PMI again
since PMAO bit is still set. But fails to find valid overflow since PMC
was cleared in power_pmu_del(). Fix that by disabling PMXE along with
disabling of other MMCR0 bits in power_pmu_disable().

We can't just replay PMI any time. Hence this approach is preferred
rather than replaying PMI before resetting overflown PMC. Patch also
documents core-book3s on a race condition which can trigger these PMC
messages during idle path in PowerNV.

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-30 17:15:49 +11:00
Christophe Leroy
f05cab0034 powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero() which fallsback to
atomic_add_unless() which itself fallsback to
atomic_fetch_add_unless() which now handles immediate operands.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
41d65207de powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations:

	c001bb2c:	39 20 00 01 	li      r9,1
	c001bb30:	7d 40 18 28 	lwarx   r10,0,r3
	c001bb34:	7d 09 50 50 	subf    r8,r9,r10
	c001bb38:	7d 00 19 2d 	stwcx.  r8,0,r3

	c001c7a8:	39 40 00 01 	li      r10,1
	c001c7ac:	7d 00 18 28 	lwarx   r8,0,r3
	c001c7b0:	7c ea 42 14 	add     r7,r10,r8
	c001c7b4:	7c e0 19 2d 	stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

	c001bb2c:	7d 20 18 28 	lwarx   r9,0,r3
	c001bb30:	39 49 ff ff 	addi    r10,r9,-1
	c001bb34:	7d 40 19 2d 	stwcx.  r10,0,r3
	--
	c001c7a4:	7d 40 18 28 	lwarx   r10,0,r3
	c001c7a8:	39 0a 00 01 	addi    r8,r10,1
	c001c7ac:	7d 00 19 2d 	stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
fb350784d8 powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like
set or clear bit:

	c0009fe0:	39 40 08 00 	li      r10,2048
	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8

	c000d568:	39 00 18 00 	li      r8,6144
	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
	c000d570:	7c c6 40 78 	andc    r6,r6,r8
	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10

	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:50 +11:00
Nicholas Piggin
aebd1fb45c powerpc: flexible GPR range save/restore macros
Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and need to do mental arithmetic while reading the
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29 23:15:20 +11:00
Nicholas Piggin
e012c49998 powerpc/watchdog: help remote CPUs to flush NMI printk output
The printk layer at the moment does not seem to have a good way to force
flush printk messages that are created in NMI context, except in the
panic path.

NMI-context printk messages normally get to the console with irq_work,
but that won't help if the CPU is stuck with irqs disabled, as can be
the case for hard lockup watchdog messages.

The watchdog currently flushes the printk buffers after detecting a
lockup on remote CPUs, but they may not have processed their NMI IPI
yet by that stage, or they may have self-detected a lockup in which
case they won't go via this NMI IPI path.

Improve the situation by having NMI-context mark a flag if it called
printk, and have watchdog timer interrupts check if that flag was set
and try to flush if it was. Latency is not a big problem because we
were already stuck for a while, just need to try to make sure the
messages eventually make it out.

Depends-on: 5d5e4522a7 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29 23:08:43 +11:00
Christophe Leroy
57dd3a7bdf powerpc: Don't bother about .data..Lubsan sections
Since commit 9a427556fb ("vmlinux.lds.h: catch compound literals
into data and BSS") .data..Lubsan sections are taken into account
in DATA_MAIN which is included in DATA_DATA macro.

No need to take care of them anymore in powerpc vmlinux.lds.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
cdc81aece8 powerpc/ptdump: Fix display a BAT's size unit
We have wrong units on BAT's sizes (G instead of M, M instead of ...)

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4G Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2G Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1G Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512M Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128M Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128M Kernel   x     m
	6:         -
	7:         -

This is because pt_dump_size() expects a size in Kbytes but
bat_show_603() gives the size in bytes.

To avoid risk of confusion, change pt_dump_size() to take bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
7dfbfb87c2 powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32
Unlike PPC64, PPC32 doesn't require any special compiler option
to get _mcount() call not clobbering registers.

Provide ftrace_regs_caller() and ftrace_regs_call() and activate
HAVE_DYNAMIC_FTRACE_WITH_REGS.

That's heavily copied from ftrace_64_mprofile.S

For the time being leave livepatching aside, it will come with
following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
c93d4f6ecf powerpc/ftrace: Add module_trampoline_target() for PPC32
module_trampoline_target() is used by __ftrace_modify_call().

Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS
can be activated on PPC32 as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
88670fdb26 powerpc/ftrace: No need to read LR from stack in _mcount()
All functions calling _mcount do it exactly the same way, with the
following sequence of instructions:

	c07de788:       7c 08 02 a6     mflr    r0
	c07de78c:       90 01 00 04     stw     r0,4(r1)
	c07de790:       4b 84 13 65     bl      c001faf4 <_mcount>

Allthough LR is pushed on stack, it is still in r0 while entering
_mcount().

Function arguments are in r3-r10, so r11 and r12 are still available
at that point.

Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved
and doesn't need to be restored from the stack.

While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Michael Ellerman
ab85a27395 powerpc: Mark probe_machine() __init and static
Prior to commit b1923caa6e ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29 22:49:26 +11:00
Michael Ellerman
a4ac0d249a powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled.

Fixes the following W=1 warning when CONFIG_PROFILING=n:
  linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2021-11-29 22:49:23 +11:00
Michael Ellerman
ff47a95d1a powerpc/mm: Move tlbcam_sz() and make it static
Building with W=1 we see a warning:
  linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’

tlbcam_sz() is not used outside this file, so we can make it static.
However it's only used inside #ifdef CONFIG_PPC32, so move it within
that ifdef, otherwise we would get a defined but not used error.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au
2021-11-29 22:49:20 +11:00
Michael Ellerman
d9150d5bb5 powerpc/85xx: Make c293_pcie_pic_init() static
To fix the W=1 warning:
  linux/arch/powerpc/platforms/85xx/c293pcie.c:22:13: error: no previous prototype for ‘c293_pcie_pic_init’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-3-mpe@ellerman.id.au
2021-11-29 22:49:17 +11:00
Michael Ellerman
84a61fb43f powerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() static
To fix the W=1 warning:
  arch/powerpc/platforms/85xx/smp.c:369:6: error: no previous prototype for ‘mpc85xx_smp_kexec_cpu_down’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-2-mpe@ellerman.id.au
2021-11-29 22:49:14 +11:00
Michael Ellerman
4ea9e321c2 powerpc/85xx: Fix no previous prototype warning for mpc85xx_setup_pmc()
Fixes the following W=1 warning:
  arch/powerpc/platforms/85xx/mpc85xx_pm_ops.c:89:12: warning: no previous prototype for 'mpc85xx_setup_pmc'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-1-mpe@ellerman.id.au
2021-11-29 22:49:09 +11:00
Nicholas Piggin
2eafc4748b powerpc: select CPUMASK_OFFSTACK if NR_CPUS >= 8192
Some core kernel code starts to go beyond the 2048 byte stack size
warning at NR_CPUS=8192, so select CPUMASK_OFFSTACK in that case.
x86 does similarly for very large NR_CPUS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-2-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Nicholas Piggin
b350111bf7 powerpc: remove cpu_online_cores_map function
This function builds the cores online map with on-stack cpumasks which
can cause high stack usage with large NR_CPUS.

It is not used in any performance sensitive paths, so instead just check
for first thread sibling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-1-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Xiaoming Ni
3dc709e518 powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare
in the mpc85xx_pm_ops structure. As a result, oops is triggered in
smp_85xx_start_cpu().

  smp: Bringing up secondary CPUs ...
  kernel tried to execute user page (0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [00000000] 0x0
  LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
  Call Trace:
  [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
  [c1051de8] [c0011460] __cpu_up+0xc0/0x228
  [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
  [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
  [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
  [c1051eb8] [c07e67bc] smp_init+0x30/0x78
  [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
  [c1051f18] [c00032d8] kernel_init+0x14/0x124
  [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c

Fixes: c45361abb9 ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n")
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Tested-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126041153.16926-1-nixiaoming@huawei.com
2021-11-29 17:47:05 +11:00
Michael Ellerman
af3fdce4ab Revert "powerpc/code-patching: Improve verification of patchability"
This reverts commit 8b8a8f0ab3.

As reported[1] by Sachin this causes problems with ftrace, and it also
causes the code patching selftests to fail as reported[2] by Stephen.

So revert it for now.

1: https://lore.kernel.org/linuxppc-dev/3668743C-09DF-4673-B15C-2FFE2A57F7D7@linux.vnet.ibm.com/
2: https://lore.kernel.org/linuxppc-dev/20211126161747.1f7795b0@canb.auug.org.au/

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-11-29 17:41:52 +11:00
Linus Torvalds
7b65b798a6 powerpc fixes for 5.16 #3
Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being
 incorrectly invalidated and a subsequent host crash.
 
 Fix kernel hardlockup on vmap stack overflow on 32-bit.
 
 Thanks to: Christophe Leroy, Nicholas Piggin, Fabiano Rosas.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGh96MTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGHDEACu37n/NLFp3qzHyrk4fHN6sUQ0wGsE
 oroKeIrTP2+JPsoEDd6Jd9oQmqmVLmVTp4y/Lu6AAJbK4p1He1iPIxIbZn12NwqY
 /siCihV76PXR8748g1j8r3a3UYDmmvmllAIf9JdAn2I4kKPE3J37/1NEGSqZyJ80
 HzkheTczUweANhVuG1RDh3WvnAKMuOvtrT/YgetVILig23oiUORQkZH5wcpScaCN
 qShnqKSuRt8HWaF3uRh6D0sf4o0BWfZ6Pjyi7R/T7s2dQVgz0uCtu4MuyGirx/0J
 uU3yDEZD24y6doyi6HUgAFGeGYqJXky2XDOMrgV6CNGFty9HFp6GgN6gtX353pZa
 9Xplw4kg9UJwGcyk1BvgdQBHdc83y5zmr8c2PfhgiP/5rfEyCTYtHgZKUu5LLW8Y
 ECYkSNnlAT68SancCmiXgos1heIiwB7922cRWJANfGEjItNBFldtb2QZio2bWOUX
 g4gZ8HN2P0OmOxV1nnNkmIe+ZKLeHPN2N/FR7pUdkFk2+BLWFx+SbrceCCicwMF4
 CUUzll8FMFmVAGjFivG/axzqlDWMY5lrDsFZamaWN3VQEo7TwL6l8/cDcFZF4QBq
 viLWbi7sfQm6M3tZXFS465sG4SpUQ4lI8PKptuz5DMdr6WKisTLR7OF0klxsEwZj
 ZcT8a/wzUIZ53w==
 =MBp+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
  to the host SLB being incorrectly invalidated and a subsequent host
  crash.

  Fix kernel hardlockup on vmap stack overflow on 32-bit.

  Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"

* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix hardlockup on vmap stack overflow
  KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
2021-11-27 10:06:15 -08:00
Linus Torvalds
b501b85957 asm-generic: syscall table updates
André Almeida sends an update for the newly added futex_waitv
 syscall that was initially only added to a few architectures.
 
 Some additional ones have since made it through architecture
 maintainer trees, this finishes the remaining ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGfrmIACgkQmmx57+YA
 GNl0eg/9Eu2iliGxiQPzPw93a3DCYr7eiylbhHbgzvCbUtDL4cTE/0fcFGCYUoVu
 k5UT/5ja3Q4z8wE+MUbxXf6igr+ANKJNwomKksTJV1Q6wKF23k5vUEd2ZXGyvrSk
 F+2X0bpB0wZzmMBh98L6dWsEaDJ//8iRZNW4ErmPR4mp25iDqhzZ1k2RoRg9mYoM
 /Lw6u5zlBiqAw5nnMc6BxPo8gJGwgcKDu4VDngpGDVGlvBHgd7Kf1AKi1+aBumml
 WzVXEEdUp3LTj7O/8oU3dKPuZMhl+k/mKUvHn/cieG00FxfMpOnxj0gdaUGgG0O4
 imhQquXtPueq/W3Uod+MhVulKR6AziEOsPhJayMGopLReE0spKDxcplC/OSbMhGj
 0iA6auLdHpHBCg3KWZDj0Sjf2NwL9UCFVDrk8XAQQo7bY2U5LQbcbr0PS7iLlpzN
 gIb+r2k3Azwk/Iqnvl2/SR1cZTx5KxfxCydT03vXaYQXhc5l+u4RvjOZcSGgZXqd
 gzC78vd6fJGKyVeaL2xb6/w7qmA4DFd6sQAbgMlVFRRS6AIJrIdDEZAQztSJ+uYK
 p9A2qzeXz+ftzonySbPHmV4RBGjSVVecUBjRmmDfKNbqmFUjjnCwCv+LeDXh4IS6
 mjmSA/9liYdkHUaujwCLhOEskuCx9ZPeAsqmzXMER0BDzReVlH4=
 =Vy2A
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic syscall table update from Arnd Bergmann:
 "André Almeida sends an update for the newly added futex_waitv syscall
  that was initially only added to a few architectures.

  Some additional ones have since made it through architecture
  maintainer trees, this finishes the remaining ones"

* tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  futex: Wireup futex_waitv syscall
2021-11-25 10:41:28 -08:00
André Almeida
a0eb2da92b futex: Wireup futex_waitv syscall
Wireup futex_waitv syscall for all remaining archs.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-25 14:26:12 +01:00
Nicholas Piggin
3d030e3018 powerpc/watchdog: Fix wd_smp_last_reset_tb reporting
wd_smp_last_reset_tb now gets reset by watchdog_smp_panic() as part of
marking CPUs stuck and removing them from the pending mask before it
begins any printing. This causes last reset times reported to be off.

Fix this by reading it into a local variable before it gets reset.

Fixes: 76521c4b02 ("powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211125103346.1188958-1-npiggin@gmail.com
2021-11-25 21:48:39 +11:00
Michael Ellerman
4afc78eae1 powerpc/microwatt: Make microwatt_get_random_darn() static
Make microwatt_get_random_darn() static, because it can be.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118004415.1706863-1-mpe@ellerman.id.au
2021-11-25 11:25:34 +11:00
Nicholas Piggin
1f01bf9076 powerpc/watchdog: read TB close to where it is used
When taking watchdog actions, printing messages, comparing and
re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use
and under wd_smp_lock or printing lock (if applicable).

This should keep timebase mostly monotonic with kernel log messages, and
could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to
something a long way in the past, and causing other CPUs to appear to be
stuck.

These additional TB reads are all slowpath (lockup has been detected),
so performance does not matter.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-5-npiggin@gmail.com
2021-11-25 11:25:34 +11:00
Nicholas Piggin
76521c4b02 powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi
There is a deadlock with the console_owner lock and the wd_smp_lock:

CPU x takes the console_owner lock
CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
CPU y detects deadlock, tries to print something and spins on console_owner
-> deadlock

Change the watchdog locking scheme so wd_smp_lock protects the watchdog
internal data, but "reporting" (printing, issuing NMI IPIs, taking any
action outside of watchdog) uses a non-waiting exclusion. If a CPU detects
a problem but can not take the reporting lock, it just returns because
something else is already reporting. It will try again at some point.

Typically hard lockup watchdog report usefulness is not impacted due to
failure to spewing a large enough amount of data in as short a time as
possible, but by messages getting garbled.

Laurent debugged this and found the deadlock, and this patch is based on
his general approach to avoid expensive operations while holding the lock.
With the addition of the reporting exclusion.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[np: rework to add reporting exclusion update changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
858c93c315 powerpc/watchdog: tighten non-atomic read-modify-write access
Most updates to wd_smp_cpus_pending are under lock except the watchdog
interrupt bit clear.

This can race with non-atomic RMW updates to the mask under lock, which
can happen in two instances:

Firstly, if another CPU detects this one is stuck, removes it from the
mask, mask becomes empty and is re-filled with non-atomic stores. This
is okay because it would re-fill the mask with this CPU's bit clear
anyway (because this CPU is now stuck), so it doesn't matter that the
bit clear update got "lost". Add a comment for this.

Secondly, if another CPU detects a different CPU is stuck and removes it
from the pending mask with a non-atomic store to bytes which also
include the bit of this CPU. This case can result in the bit clear being
lost and the end result being the bit is set. This should be so rare it
hardly matters, but to make things simpler to reason about just avoid
the non-atomic access for that case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-3-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
5dad4ba68a powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
It is possible for all CPUs to miss the pending cpumask becoming clear,
and then nobody resetting it, which will cause the lockup detector to
stop working. It will eventually expire, but watchdog_smp_panic will
avoid doing anything if the pending mask is clear and it will never be
reset.

Order the cpumask clear vs the subsequent test to close this race.

Add an extra check for an empty pending mask when the watchdog fires and
finds its bit still clear, to try to catch any other possible races or
bugs here and keep the watchdog working. The extra test in
arch_touch_nmi_watchdog is required to prevent the new warning from
firing off.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Debugged-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Peiwei Hu
869fb7e5ae powerpc/prom_init: Fix improper check of prom_getprop()
prom_getprop() can return PROM_ERROR. Binary operator can not identify
it.

Fixes: 94d2dde738 ("[POWERPC] Efika: prune fixups and make them more carefull")
Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
dd5cde457a powerpc/rtas: rtas_busy_delay_time() kernel-doc
Provide API documentation for rtas_busy_delay_time(), explaining why we
return the same value for 9900 and -2.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
38f7b7067d powerpc/rtas: rtas_busy_delay() improvements
Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

    int rc;

    do {
        rc = rtas_call(rtas_token("some-function"), ...);
    } while (rtas_busy_delay(rc));

    /* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

*  It performs msleep() for all 990x and even when no delay is
   suggested (-2), but this is understood to actually sleep for two jiffies
   minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
   appear to be the most common extended delay statuses, and the
   oversleeping measurably lengthens DLPAR operations, which perform
   many RTAS calls.

*  It does not sleep on 990x unless need_resched() is true, causing code
   like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

*  When passed RTAS_BUSY, perform cond_resched() and return without
   sleeping. The caller should reattempt immediately

*  Always sleep when passed an extended delay status, using usleep_range()
   for precise shorter sleeps. Limit the sleep time to one second even
   though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             1,935      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
-            609.99 msec task-clock                #    0.183 CPUs utilized            ( +-  0.19% )
+             1,956      probe:rtas_call           #    0.003 M/sec                    ( +-  0.17% )
+            618.56 msec task-clock                #    0.278 CPUs utilized            ( +-  0.11% )

-            3.3322 +- 0.0670 seconds time elapsed  ( +-  2.01% )
+            2.2222 +- 0.0416 seconds time elapsed  ( +-  1.87% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             6,224      probe:rtas_call           #    0.008 M/sec                    ( +-  2.57% )
-            750.36 msec task-clock                #    0.190 CPUs utilized            ( +-  2.01% )
+               843      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
+            250.66 msec task-clock                #    0.068 CPUs utilized            ( +-  0.17% )

-            3.9394 +- 0.0890 seconds time elapsed  ( +-  2.26% )
+             3.678 +- 0.113 seconds time elapsed  ( +-  3.07% )

With all CPUs 100% busy (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             2,979      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
-          1,096.62 msec task-clock                #    0.105 CPUs utilized            ( +-  0.10% )
+             2,981      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
+          1,095.26 msec task-clock                #    0.154 CPUs utilized            ( +-  0.21% )

-            10.476 +- 0.104 seconds time elapsed  ( +-  1.00% )
+            7.1124 +- 0.0865 seconds time elapsed  ( +-  1.22% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             2,702      probe:rtas_call           #    0.004 M/sec                    ( +-  4.00% )
-            722.71 msec task-clock                #    0.067 CPUs utilized            ( +-  2.41% )
+             1,246      probe:rtas_call           #    0.003 M/sec                    ( +-  0.25% )
+            487.73 msec task-clock                #    0.049 CPUs utilized            ( +-  0.20% )

-            10.829 +- 0.163 seconds time elapsed  ( +-  1.51% )
+            9.9887 +- 0.0866 seconds time elapsed  ( +-  0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
22887f319a powerpc/pseries: delete scanlog
Remove the pseries scanlog driver.

This code supports functions from Power4-era servers that are not present
on targets currently supported by arch/powerpc. System manuals from this
time have this description:

  Scan Dump data is a set of chip data that the service processor gathers
  after a system malfunction. It consists of chip scan rings, chip trace
  arrays, and Scan COM (SCOM) registers. This data is stored in the
  scan-log partition of the system’s Nonvolatile Random Access
  Memory (NVRAM).

PowerVM partition firmware development doesn't recognize the associated
function call or property, and they don't see any references to them in
their codebase. It seems to have been specific to non-virtualized pseries.

References:

Historical Linux commit from February 2003 (interesting to note this seems
to be the source of non-GPL exports for rtas_call etc):
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

IntelliStation and pSeries docs which refer to the feature:
http://ps-2.retropc.se/basil.holloway/ALL%20PDF/380635.pdf
http://ps-2.kev009.com/rs6000/manuals/p/p615-6C3-6E3/6C3_and_6E3_Users_Guide_SA38-0629.pdf

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210920173203.1800475-1-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
53cadf7dee powerpc/rtas: kernel-doc fixes
Fix the following issues reported by kernel-doc:

$ scripts/kernel-doc -v -none arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/rtas.c:810: info: Scanning doc for function rtas_activate_firmware
arch/powerpc/kernel/rtas.c:818: warning: contents before sections
arch/powerpc/kernel/rtas.c:841: info: Scanning doc for function rtas_call_reentrant
arch/powerpc/kernel/rtas.c:893: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Find a specific pseries error log in an RTAS extended event log.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116215806.928235-1-nathanl@linux.ibm.com
2021-11-25 11:25:32 +11:00
Christophe Leroy
8b8a8f0ab3 powerpc/code-patching: Improve verification of patchability
Today, patch_instruction() assumes that it is called exclusively on
valid addresses, and only checks that it is not called on an init
address after init section has been freed.

Improve verification by calling kernel_text_address() instead.

kernel_text_address() already includes a verification of
initmem release.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc683d499a411730504b132a924de0ccc2ef1f79.1636971137.git.christophe.leroy@csgroup.eu
2021-11-25 11:25:32 +11:00
Jason Wang
a3bcfc182b powerpc/tsi108: make EXPORT_SYMBOL follow its function immediately
EXPORT_SYMBOL(foo); should immediately follow its function/variable.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211114115616.493815-1-wangborong@cdjrlc.com
2021-11-25 11:25:32 +11:00
Hari Bathini
e919c0b232 bpf ppc32: Access only if addr is kernel address
With KUAP enabled, any kernel code which wants to access userspace
needs to be surrounded by disable-enable KUAP. But that is not
happening for BPF_PROBE_MEM load instruction. Though PPC32 does not
support read protection, considering the fact that PTR_TO_BTF_ID
(which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer
or NULL but should never be a pointer to userspace address, execute
BPF_PROBE_MEM load only if addr is kernel address, otherwise set
dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-9-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
23b51916ee bpf ppc32: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| lwz   r28,4(r4)  |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r24)   |
             | lwz  r4,4(r24)   |
             |                  |
             |                  |
             |------------------|
   0x4278 -->| li  r28,0        |  \
             | li  r27,0        |  | fixup entry
             | b   0x4024       |  /
   0x4284 -->| li  r4,0         |
             | li  r3,0         |
             | b   0x40b4       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffe4 |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffe8 |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-8-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
9c70c7147f bpf ppc64: Access only if addr is kernel address
On PPC64 with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d0000000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr is kernel address, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-7-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
983bdc0245 bpf ppc64: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| ld   r27,4(r3)   |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r4)    |
             |                  |
             |                  |
             |------------------|
   0x4280 -->| li  r27,0        |  \ fixup entry
             | b   0x4024       |  /
   0x4288 -->| li  r3,0         |
             | b   0x40b0       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffec |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffec |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-6-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
f15a71b388 powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-5-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Hari Bathini
efa95f031b bpf powerpc: refactor JIT compiler code
Refactor powerpc LDX JITing code to simplify adding BPF_PROBE_MEM
support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-4-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
04c04205bc bpf powerpc: Remove extra_pass from bpf_jit_build_body()
In case of extra_pass, usual JIT passes are always skipped. So,
extra_pass is always false while calling bpf_jit_build_body() and
can be removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-3-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
c9ce7c36e4 bpf powerpc: Remove unused SEEN_STACK
SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-2-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Oliver O'Halloran
157616f3c2 powerpc/eeh: Use a goto for recovery failures
The EEH recovery logic in eeh_handle_normal_event() has some pretty strange
flow control. If we remove all the actual recovery logic we're left with
the following skeleton:

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result == PCI_ERS_RESULT_NONE) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_NEED_RESET) {
		...
	}

	if ((result == PCI_ERS_RESULT_RECOVERED) ||
	    (result == PCI_ERS_RESULT_NONE)) {
		...
		goto out;
	}

	/*
	 * unsuccessful recovery / PCI_ERS_RESULT_DISCONECTED
	 * handling is here.
	 */
	...

	out:
	...

Most of the "if () { ... }" blocks above change "result" to
PCI_ERS_RESULT_DISCONNECTED if an error occurs in that recovery step. This
makes the control flow a bit confusing since it breaks the early-exit
pattern that is generally used in Linux. In any case we end up handling the
error in the final else block so why not just jump there directly? Doing so
also allows us to de-indent a bunch of code.

No functional changes.

[dja: rebase on top of linux-next + my preceeding refactor,
      move clearing the EEH_DEV_NO_HANDLER bit above the first goto so that
      it is always clear in the error handler code as it was before.]

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-2-dja@axtens.net
2021-11-25 11:25:31 +11:00
Daniel Axtens
10b34ece13 powerpc/eeh: Small refactor of eeh_handle_normal_event()
The control flow of eeh_handle_normal_event() is a bit tricky.

Break out one of the error handling paths - rather than be in an else
block, we'll make it part of the regular body of the function and put a
'goto out;' in the true limb of the if.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-1-dja@axtens.net
2021-11-25 11:25:31 +11:00
Cédric Le Goater
1e7684dc4f powerpc/xive: Add a debugfs toggle for save-restore
On POWER10, the automatic "save & restore" of interrupt context is
always available. Provide a way to deactivate it for tests or
performance.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-11-clg@kaod.org
2021-11-25 11:25:31 +11:00
Cédric Le Goater
c21ee04f11 powerpc/xive: Add a kernel parameter for StoreEOI
StoreEOI is activated by default on platforms supporting the feature
(POWER10) and will be used as soon as firmware advertises its
availability. The kernel parameter provides a way to deactivate its
use. It can be still be reactivated through debugfs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-10-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
d7bc1e376c powerpc/xive: Add a debugfs toggle for StoreEOI
It can be used to deactivate temporarily StoreEOI for tests or
performance on platforms supporting the feature (POWER10)

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-9-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
08f3f61021 powerpc/xive: Add a debugfs file to dump EQs
The XIVE driver under Linux uses a single interrupt priority and only
one event queue is configured per CPU. Expose the contents under
a 'xive/eqs/cpuX' debugfs file.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-8-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
33e1d4a152 powerpc/xive: Rename the 'cpus' debugfs file to 'ipis'
and remove the EQ entries output which is not very useful since only
the next two events of the queue are taken into account. We will
improve the dump of the EQ in the next patches.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-7-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
baed14de78 powerpc/xive: Change the debugfs file 'xive' into a directory
Use a 'cpus' file to dump CPU states and 'interrupts' to dump IRQ states.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-6-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
412877dfae powerpc/xive: Introduce xive_core_debugfs_create()
and fix some compile issues when !CONFIG_DEBUG_FS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add empty stub to fix !CONFIG_DEBUG_FS build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-5-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
756c52c632 powerpc/xive: Activate StoreEOI on P10
StoreEOI (the capability to EOI with a store) requires load-after-store
ordering in some cases to be reliable. P10 introduced a new offset for
load operations to enforce correct ordering and the XIVE driver has
the required support since kernel 5.8, commit b1f9be9392
("powerpc/xive: Enforce load-after-store ordering when StoreEOI is active")

Since skiboot v7, StoreEOI support is advertised on P10 with a new flag
on the PowerNV platform. See skiboot commit 4bd7d84afe46 ("xive/p10:
Introduce a new OPAL_XIVE_IRQ_STORE_EOI2 flag"). When detected,
activate the feature.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-4-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
bd5b00c6cf powerpc/xive: Introduce an helper to print out interrupt characteristics
and extend output of debugfs and xmon with addresses of the ESB
management and trigger pages.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-3-clg@kaod.org
2021-11-25 11:25:29 +11:00
Cédric Le Goater
44b9c8ddcb powerpc/xive: Replace pr_devel() by pr_debug() to ease debug
These routines are not on hot code paths and pr_debug() is easier to
activate. Also add a '0x' prefix to hex printed values (HW IRQ number).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-2-clg@kaod.org
2021-11-25 11:25:29 +11:00
Nicholas Piggin
d02fa40d75 powerpc/powernv: Remove POWER9 PVR version check for entry and uaccess flushes
These aren't necessarily POWER9 only, and it's not to say some new
vulnerability may not get discovered on other processors for which
we would like the flexibility of having the workaround enabled by
firmware.

Remove the restriction that the workarounds only apply to POWER9.

However POWER7 and POWER8 are not affected, and they may not have
older firmware that does not advertise this, so clear these workarounds
manually.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
[mpe: Incorporate changes from Nick, reword comment slightly.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-5-npiggin@gmail.com
2021-11-25 11:25:29 +11:00
Julia Lawall
a1d2b210ff powerpc/btext: add missing of_node_put
for_each_node_by_type performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression n;
expression e;
@@

 for_each_node_by_type(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-6-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
a841fd009e powerpc/cell: add missing of_node_put
for_each_node_by_name performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression e,e1;
local idexpression n;
@@

 for_each_node_by_name(n, e1) {
   ... when != of_node_put(n)
       when != e = n
(
   return n;
|
+  of_node_put(n);
?  return ...;
)
   ...
 }
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-7-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
7d405a939c powerpc/powernv: add missing of_node_put
for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression n;
expression e;
@@

 for_each_compatible_node(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-4-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Julia Lawall
f6e82647ff powerpc/6xx: add missing of_node_put
for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression e;
local idexpression n;
@@

@@
local idexpression n;
expression e;
@@

 for_each_compatible_node(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-2-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00
Nicholas Piggin
9c5a432a55 KVM: PPC: Book3S HV P9: Remove subcore HMI handling
On POWER9 and newer, rather than the complex HMI synchronisation and
subcore state, have each thread un-apply the guest TB offset before
calling into the early HMI handler.

This allows the subcore state to be avoided, including subcore enter
/ exit guest, which includes an expensive divide that shows up
slightly in profiles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-54-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
6398326b9b KVM: PPC: Book3S HV P9: Stop using vc->dpdes
The P9 path uses vc->dpdes only for msgsndp / SMT emulation. This adds
an ordering requirement between vcpu->doorbell_request and vc->dpdes for
no real benefit. Use vcpu->doorbell_request directly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-53-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
617326ff01 KVM: PPC: Book3S HV P9: Tidy kvmppc_create_dtl_entry
This goes further to removing vcores from the P9 path. Also avoid the
memset in favour of explicitly initialising all fields.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-52-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin
ecb6a7207f KVM: PPC: Book3S HV P9: Remove most of the vcore logic
The P9 path always uses one vcpu per vcore, so none of the vcore, locks,
stolen time, blocking logic, shared waitq, etc., is required.

Remove most of it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-51-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
434398ab5e KVM: PPC: Book3S HV P9: Avoid cpu_in_guest atomics on entry and exit
cpu_in_guest is set to determine if a CPU needs to be IPI'ed to exit
the guest and notice the need_tlb_flush bit.

This can be implemented as a global per-CPU pointer to the currently
running guest instead of per-guest cpumasks, saving 2 atomics per
entry/exit. P7/8 doesn't require cpu_in_guest, nor does a nested HV
(only the L0 does), so move it to the P9 HV path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-50-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
4c9a68914e KVM: PPC: Book3S HV P9: Add unlikely annotation for !mmu_ready
The mmu will almost always be ready.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-49-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
f08cbf5c7d KVM: PPC: Book3S HV P9: Avoid changing MSR[RI] in entry and exit
kvm_hstate.in_guest provides the equivalent of MSR[RI]=0 protection,
and it covers the existing MSR[RI]=0 section in late entry and early
exit, so clearing and setting MSR[RI] in those cases does not
actually do anything useful.

Remove the RI manipulation and replace it with comments. Make the
in_guest memory accesses a bit closer to a proper critical section
pattern. This speeds up guest entry/exit performance.

This also removes the MSR[RI] warnings which aren't very interesting
and would cause crashes if they hit due to causing an interrupt in
non-recoverable code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-48-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
241d1f19f0 KVM: PPC: Book3S HV P9: Optimise hash guest SLB saving
slbmfee/slbmfev instructions are very expensive, moreso than a regular
mfspr instruction, so minimising them significantly improves hash guest
exit performance. The slbmfev is only required if slbmfee found a valid
SLB entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-47-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
b49c65c5f9 KVM: PPC: Book3S HV P9: Improve mfmsr performance on entry
Rearrange the MSR saving on entry so it does not follow the mtmsrd to
disable interrupts, avoiding a possible RAW scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-46-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
46dea77f79 KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry
mftb() is expensive and one can be avoided on nested guest dispatch.

If the time checking code distinguishes between the L0 timer and the
nested HV timer, then both can be tested in the same place with the
same mftb() value.

This also nicely illustrates the relationship between the L0 and nested
HV timers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-45-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin
d5c0e8332d KVM: PPC: Book3S HV P9: Avoid tlbsync sequence on radix guest exit
Use the existing TLB flushing logic to IPI the previous CPU and run the
necessary barriers before running a guest vCPU on a new physical CPU,
to do the necessary radix GTSE barriers for handling the case of an
interrupted guest tlbie sequence.

This requires the vCPU TLB flush sequence that is currently just done
on one thread, to be expanded to ensure the other threads execute a
ptesync, because causing them to exit the guest will no longer cause a
ptesync by itself.

This results in more IPIs than the TLB flush logic requires, but it's
a significant win for common case scheduling when the vCPU remains on
the same physical CPU.

This saves about 520 cycles (nearly 10%) on a guest entry+exit micro
benchmark on a POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-44-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
0ba0e5d5a6 KVM: PPC: Book3S HV: Split P8 from P9 path guest vCPU TLB flushing
This creates separate functions for old and new paths for vCPU TLB
flushing, which will reduce complexity of the next change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-43-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
a089a6869e KVM: PPC: Book3S HV P9: Don't restore PSSCR if not needed
This also moves the PSSCR update in nested entry to avoid a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-42-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
9c75f65f35 KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR SPRs
Some of the DAWR SPR access is already predicated on dawr_enabled(),
apply this to the remainder of the accesses.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-41-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
cf3b16cfa6 KVM: PPC: Book3S HV P9: Comment and fix MMU context switching code
Tighten up partition switching code synchronisation and comments.

In particular, hwsync ; isync is required after the last access that is
performed in the context of a partition, before the partition is
switched away from.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-40-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
5236756d04 KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRs
Linux implements SPR save/restore including storage space for registers
in the task struct for process context switching. Make use of this
similarly to the way we make use of the context switching fp/vec save
restore.

This improves code reuse, allows some stack space to be saved, and helps
with avoiding VRSAVE updates if they are not required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-39-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
022ecb960c KVM: PPC: Book3S HV P9: Demand fault TM facility registers
Use HFSCR facility disabling to implement demand faulting for TM, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-38-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
a3e18ca8ab KVM: PPC: Book3S HV P9: Demand fault EBB facility registers
Use HFSCR facility disabling to implement demand faulting for EBB, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-37-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin
34e02d555d KVM: PPC: Book3S HV P9: More SPR speed improvements
This avoids more scoreboard stalls and reduces mtSPRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-36-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
d55b1eccc7 KVM: PPC: Book3S HV P9: Restrict DSISR canary workaround to processors that require it
Use CPU_FTR_P9_RADIX_PREFETCH_BUG to apply the workaround, to test for
DD2.1 and below processors. This saves a mtSPR in guest entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-35-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
3e7b337902 KVM: PPC: Book3S HV P9: Switch PMU to guest as late as possible
This moves PMU switch to guest as late as possible in entry, and switch
back to host as early as possible at exit. This helps the host get the
most perf coverage of KVM entry/exit code as possible.

This is slightly suboptimal for SPR scheduling point of view when the
PMU is enabled, but when perf is disabled there is no real difference.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-34-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
3f9e2966d1 KVM: PPC: Book3S HV P9: Implement TM fastpath for guest entry/exit
If TM is not active, only TM register state needs to be saved and
restored, avoiding several mfmsr/mtmsrd instructions and improving
performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-33-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
d5f4801945 KVM: PPC: Book3S HV P9: Move remaining SPR and MSR access into low level entry
Move register saving and loading from kvmhv_p9_guest_entry() into the HV
and nested entry handlers.

Accesses are scheduled to reduce mtSPR / mfSPR interleaving which
reduces SPR scoreboard stalls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-32-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
08b3f08af5 KVM: PPC: Book3S HV P9: Move nested guest entry into its own function
Move the part of the guest entry which is specific to nested HV into its
own function. This is just refactoring.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-31-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
aabcaf6ae2 KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in
Move the P9 guest/host register switching functions to the built-in
P9 entry code, and export it for nested to use as well.

This allows more flexibility in scheduling these supervisor privileged
SPR accesses with the HV privileged and PR SPR accesses in the low level
entry code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-30-npiggin@gmail.com
2021-11-24 21:09:00 +11:00
Nicholas Piggin
516b334210 KVM: PPC: Book3S HV P9: Move vcpu register save/restore into functions
This should be no functional difference but makes the caller easier
to read.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-29-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
0f3b6c4851 KVM: PPC: Book3S HV P9: Juggle SPR switching around
This juggles SPR switching on the entry and exit sides to be more
symmetric, which makes the next refactoring patch possible with no
functional change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-28-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
9dfe7aa7bc KVM: PPC: Book3S HV P9: Only execute mtSPR if the value changed
Keep better track of the current SPR value in places where
they are to be loaded with a new context, to reduce expensive
mtSPR operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-27-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
9a1e530bbb KVM: PPC: Book3S HV P9: Avoid SPR scoreboard stalls
Avoid interleaving mfSPR and mtSPR to reduce SPR scoreboard stalls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-26-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
cb2553a093 KVM: PPC: Book3S HV P9: Optimise timebase reads
Reduce the number of mfTB executed by passing the current timebase
around entry and exit code rather than read it multiple times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-25-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
6547af3eba KVM: PPC: Book3S HV P9: Move TB updates
Move the TB updates between saving and loading guest and host SPRs,
to improve scheduling by keeping issue-NTC operations together as
much as possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-24-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
3c1a4322bb KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase
Change dec_expires to be relative to the guest timebase, and allow
it to be moved into low level P9 guest entry functions, to improve
SPR access scheduling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-23-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
cf99dedb4b KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match kvmppc_start_thread
Small cleanup makes it a bit easier to match up entry and exit
operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-22-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin
2251fbe763 KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable
Moving the mtmsrd after the host SPRs are saved and before the guest
SPRs start to be loaded can prevent an SPR scoreboard stall (because
the mtmsrd is L=1 type which does not cause context synchronisation.

This is also now more convenient to combined with the mtmsrd L=0
instruction to enable facilities just below, but that is not done yet.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-21-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
34e119c96b KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs
This reduces the number of mtmsrd required to enable facility bits when
saving/restoring registers, by having the KVM code set all bits up front
rather than using individual facility functions that set their particular
MSR bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-20-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
174a3ab633 KVM: PPC: Book3S HV P9: Move SPRG restore to restore_p9_host_os_sprs
Move the SPR update into its relevant helper function. This will
help with SPR scheduling improvements in later changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-19-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
a1a19e1154 KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write
Processors that support KVM HV do not require read-modify-write of
the CTRL SPR to set/clear their thread's runlatch. Just write 1 or 0
to it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-18-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
b1adcf57ce KVM: PPC: Book3S HV P9: Factor out yield_count increment
Factor duplicated code into a helper function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-17-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
9d3ddb86d9 KVM: PPC: Book3S HV P9: Demand fault PMU SPRs when marked not inuse
The pmcregs_in_use field in the guest VPA can not be trusted to reflect
what the guest is doing with PMU SPRs, so the PMU must always be managed
(stopped) when exiting the guest, and SPR values set when entering the
guest to ensure it can't cause a covert channel or otherwise cause other
guests or the host to misbehave.

So prevent guest access to the PMU with HFSCR[PM] if pmcregs_in_use is
clear, and avoid the PMU SPR access on every partition switch. Guests
that set pmcregs_in_use incorrectly or when first setting it and using
the PMU will take a hypervisor facility unavailable interrupt that will
bring in the PMU SPRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-16-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
401e1ae372 KVM: PPC: Book3S HV P9: Factor PMU save/load into context switch functions
Rather than guest/host save/retsore functions, implement context switch
functions that take care of details like the VPA update for nested.

The reason to split these kind of helpers into explicit save/load
functions is mainly to schedule SPR access nicely, but PMU is a special
case where the load requires mtSPR (to stop counters) and other
difficulties, so there's less possibility to schedule those nicely. The
SPR accesses also have side-effects if the PMU is running, and in later
changes we keep the host PMU running as long as possible so this code
can be better profiled, which also complicates scheduling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-15-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
57dc0eed73 KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
Implement the P9 path PMU save/restore code in C, and remove the
POWER9/10 code from the P7/8 path assembly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-14-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin
0a4b4327ce powerpc/64s: Implement PMU override command line option
It can be useful in simulators (with very constrained environments)
to allow some PMCs to run from boot so they can be sampled directly
by a test harness, rather than having to run perf.

A previous change freezes counters at boot by default, so provide
a boot time option to un-freeze (plus a bit more flexibility).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
245ebf8e73 powerpc/64s: Always set PMU control registers to frozen/disabled when not in use
KVM PMU management code looks for particular frozen/disabled bits in
the PMU registers so it knows whether it must clear them when coming
out of a guest or not. Setting this up helps KVM make these optimisations
without getting confused. Longer term the better approach might be to
move guest/host PMU switching to the perf subsystem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-12-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
d3c8a2d374 KVM: PPC: Book3S HV: Don't always save PMU for guest capable of nesting
Provide a config option that controls the workaround added by commit
63279eeb7f ("KVM: PPC: Book3S HV: Always save guest pmu for guest
capable of nesting"). The option defaults to y for now, but is expected
to go away within a few releases.

Nested capable guests running with the earlier commit 1782663897
("KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest
SPRs are live") will now indicate the PMU in-use status of their guests,
which means the parent does not need to unconditionally save the PMU for
nested capable guests.

After this latest round of performance optimisations, this option costs
about 540 cycles or 10% entry/exit performance on a POWER9 nested-capable
guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
References: 1782663897 ("KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-11-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
46f9caf1a2 powerpc/64s: Keep AMOR SPR a constant ~0 at runtime
This register controls supervisor SPR modifications, and as such is only
relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never
restores it coming back out to the host, so it can be kept constant and
avoid the mtSPR in KVM guest entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-10-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
eacc818864 KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests
HV interrupts may be taken with the MMU enabled when radix guests are
running. Enable LPCR[HAIL] on ISA v3.1 processors for radix guests.
Make this depend on the host LPCR[HAIL] being enabled. Currently that is
always enabled, but having this test means any issue that might require
LPCR[HAIL] to be disabled in the host will not have to be duplicated in
KVM.

This optimisation takes 1380 cycles off a NULL hcall entry+exit micro
benchmark on a POWER10.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-9-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
25aa145856 powerpc/time: add API for KVM to re-arm the host timer/decrementer
Rather than have KVM look up the host timer and fiddle with the
irq-work internal details, have the powerpc/time.c code provide a
function for KVM to re-arm the Linux timer code when exiting a
guest.

This is implementation has an improvement over existing code of
marking a decrementer interrupt as soft-pending if a timer has
expired, rather than setting DEC to a -ve value, which tended to
cause host timers to take two interrupts (first hdec to exit the
guest, then the immediate dec).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-8-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
34bf08a207 KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit
mftb is serialising (dispatch next-to-complete) so it is heavy weight
for a mfspr. Avoid reading it multiple times in the entry or exit paths.
A small number of cycles delay to timers is tolerable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-7-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
9581991a60 KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0,
this could help reduce needless guest exits due to leftover exceptions on
entering the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-6-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin
4ebbd075bc KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer read
There is no need to save away the host DEC value, as it is derived
from the host timer subsystem which maintains the next timer time,
so it can be restored from there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-5-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
5955c7469a KMV: PPC: Book3S HV P9: Use set_dec to set decrementer to host
The host Linux timer code arms the decrementer with the value
'decrementers_next_tb - current_tb' using set_dec(), which stores
val - 1 on Book3S-64, which is not quite the same as what KVM does
to re-arm the host decrementer when exiting the guest.

This shouldn't be a significant change, but it makes the logic match
and avoids this small extra change being brought into the next patch.

Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-4-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
736df58fd5 powerpc/64s: guard optional TIDR SPR with CPU ftr test
The TIDR SPR only exists on POWER9. Avoid accessing it when the
feature bit for it is not set.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-3-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Nicholas Piggin
f53884b1bf powerpc/64s: Remove WORT SPR from POWER9/10 (take 2)
This removes a missed remnant of the WORT SPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-2-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Christophe Leroy
5bb60ea611 powerpc/32: Fix hardlockup on vmap stack overflow
Since the commit c118c7303a ("powerpc/32: Fix vmap stack - Do not
activate MMU before reading task struct") a vmap stack overflow
results in a hard lockup. This is because emergency_ctx is still
addressed with its virtual address allthough data MMU is not active
anymore at that time.

Fix it by using a physical address instead.

Fixes: c118c7303a ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce30364fb7ccda489272af4a1612b6aa147e1d23.1637227521.git.christophe.leroy@csgroup.eu
2021-11-24 21:00:51 +11:00
Nicholas Piggin
cf0b0e3712 KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a
reserved value on POWER7/8. On POWER8 this invalidates the SLB entries
above index 0, similarly to SLBIA IH=0.

If the SLB entries are invalidated, and then the guest is bypassed, the
host SLB does not get re-loaded, so the bolted entries above 0 will be
lost. This can result in kernel stack access causing a SLB fault.

Kernel stack access causing a SLB fault was responsible for the infamous
mega bug (search "Fix SLB reload bug"). Although since commit
48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") that
starts using the kernel stack in the SLB miss handler, it might only
result in an infinite loop of SLB faults. In any case it's a bug.

Fix this by only executing the instruction on >= POWER9 where IH=7 is
defined not to invalidate the SLB. POWER7/8 don't require this ERAT
flush.

Fixes: 5008711259 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
2021-11-24 21:00:36 +11:00
Linus Torvalds
75603b14ed powerpc fixes for 5.16 #2
Fix a bug in copying of sigset_t for 32-bit systems, which caused X to not start.
 
 Fix handling of shared LSIs (rare) with the xive interrupt controller (Power9/10).
 
 Fix missing TOC setup in some KVM code, which could result in oopses depending on kernel
 data layout.
 
 Fix DMA mapping when we have persistent memory and only one DMA window available.
 
 Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a recent fix.
 
 A couple of other minor fixes.
 
 Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater, Christian Zigotzky,
 Christophe Leroy, Daniel Axtens, Finn Thain, Greg Kurz, Masahiro Yamada, Nicholas Piggin,
 Uwe Kleine-König.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGZzGMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBrRD/4qE1A3+nXe+uZRJM3H5F8C/Ui2I/1G
 JPekyfW9aZklsv8SMlz8BotDTlK8vNwiEtkAuwqLOfPXPi1p/Y1do4sPtXAjUpuX
 mXZP3G9K2xXmALLedXMjJNO6YJjTT5LE7OT42QziSfY1ScS7iqfGNANg1zRjkCRW
 yf2cpBbMRnWdDhCgWyE/V/V4xdPyOTTnnWn3d4F3qNshV0luKgTJl/9yo0OmQrGe
 /T4Cw8jG5p+pSblNyFaACnYlKWF4bYTQIl5NWsvJY0A2cg3I5ah6+hexdGRN/JdI
 K3PWpJ8rx5RjICkTFE4cADI6xIF1bHhjMh3ytcaMH5USBMmW3fTUUfcFwjRkRDHa
 b8Z6V631mgK1v3L0RlrAn+PZ9R212wpupvQT6YOf4pFb5+BzOyaCQCzyQv+BnwoI
 Fwran0HEO6NUODq4off9MADEpNTjwhV2mDFojxiCJ9eb1oCIilLbs8BOUWRSHYe0
 1S22pdj9XSR7yxXt5DnjQBwhR47ZS7D3jXf9gjbmJ/qn6cRPAFzt/m/woSY2Vv7T
 UrZVjz5lb+skjij7vxw+L9jUIwLBd99cvBiHzJpWUNc0RTQeBlAh4QBK/1MNixCP
 93LTN7tsRdGknLRTJ5yfRhEhwuhTTH8SEPp3H+qOZj9sXwq3Bftl4Nm40AgoATHO
 G4kPlgrCMQBcRQ==
 =Ss4y
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc fixes from Michael Ellerman:

 - Fix a bug in copying of sigset_t for 32-bit systems, which caused X
   to not start.

 - Fix handling of shared LSIs (rare) with the xive interrupt controller
   (Power9/10).

 - Fix missing TOC setup in some KVM code, which could result in oopses
   depending on kernel data layout.

 - Fix DMA mapping when we have persistent memory and only one DMA
   window available.

 - Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a
   recent fix.

 - A couple of other minor fixes.

Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater,
Christian Zigotzky, Christophe Leroy, Daniel Axtens, Finn Thain, Greg
Kurz, Masahiro Yamada, Nicholas Piggin, and Uwe Kleine-König.

* tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Change IRQ domain to a tree domain
  powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
  powerpc/signal32: Fix sigset_t copy
  powerpc/book3e: Fix TLBCAM preset at boot
  powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
  powerpc/pseries/ddw: simplify enable_ddw()
  powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
  powerpc/pseries: Fix numa FORM2 parsing fallback code
  powerpc/pseries: rename numa_dist_table to form2_distances
  powerpc: clean vdso32 and vdso64 directories
  powerpc/83xx/mpc8349emitx: Drop unused variable
  KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
2021-11-21 10:26:35 -08:00
Linus Torvalds
7af959b5d5 Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit-vs-signal handling fixes from Eric Biederman:
 "This is a small set of changes where debuggers were no longer able to
  intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
  cleanups.

  This is essentially the change you suggested with all of i's dotted
  and the t's crossed so that ptrace can intercept all of the cases it
  has been able to intercept the past, and all of the cases that made it
  to exit without giving ptrace a chance still don't give ptrace a
  chance"

* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Replace force_fatal_sig with force_exit_sig when in doubt
  signal: Don't always set SA_IMMUTABLE for forced signals
2021-11-19 11:33:31 -08:00
Eric W. Biederman
fcb116bc43 signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.

In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.

Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19 09:15:58 -06:00
Linus Torvalds
c46e8ece96 Selftest changes:
* Cleanups for the perf test infrastructure and mapping hugepages
 
 * Avoid contention on mmap_sem when the guests start to run
 
 * Add event channel upcall support to xen_shinfo_test
 
 x86 changes:
 
 * Fixes for Xen emulation
 
 * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
 
 * Fixes for migration of 32-bit nested guests on 64-bit hypervisor
 
 * Compilation fixes
 
 * More SEV cleanups
 
 Generic:
 
 * Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
 and num_online_cpus().  Most architectures were only using one of the two.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGV/PAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrogf/eAyilGRQL7lLETn3DTVlgLVv82+z
 giX11HlUhUmATHIDluj/wVQUjVcY6AO4SnvFaudX7B+mibndkw4L19IubP/koQZu
 xnKSJTn+mVANdzz3UdsHl0ujbPdQJaFCIPW6iewbn2GRRZMwA5F3vMK/H09XRApL
 I7kq8CPA6sC0I3TPzPN3ROxigexzYunZmGQ4qQe0GUdtxHrJOYQN++ddmWbQoEIC
 gdFTyF7CUQ+lmJe0b/Y88yhISFAJCEBuKFlg9tOTuxSfwvPX6lUu+pi+utEx9M+O
 ckTSQli/apZ4RVcSzxMIwX/BciYqhqOz5uMG+w4DRlJixtGSHtjiEVxGxw==
 =Iij4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Selftest changes:

   - Cleanups for the perf test infrastructure and mapping hugepages

   - Avoid contention on mmap_sem when the guests start to run

   - Add event channel upcall support to xen_shinfo_test

  x86 changes:

   - Fixes for Xen emulation

   - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache

   - Fixes for migration of 32-bit nested guests on 64-bit hypervisor

   - Compilation fixes

   - More SEV cleanups

  Generic:

   - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
     and num_online_cpus(). Most architectures were only using one of
     the two"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
  KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
  KVM: x86: Assume a 64-bit hypercall for guests with protected state
  selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
  riscv: kvm: fix non-kernel-doc comment block
  KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
  KVM: SEV: Drop a redundant setting of sev->asid during initialization
  KVM: SEV: WARN if SEV-ES is marked active but SEV is not
  KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
  KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
  KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
  KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
  KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
  KVM: x86/xen: Use sizeof_field() instead of open-coding it
  KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
  KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
  ...
2021-11-18 12:05:22 -08:00
Linus Torvalds
7d5775d49e printk fixup for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmGWF6YACgkQUqAMR0iA
 lPJJ+RAAm9pi/EElKKl+lOlBl+ehJlKuNLnPQWFmmaRc9xd0ruUipp1nsaktLJ8f
 R/PkSwR/YWpBWlF8P4o+x9sOFyTNyLasoHtqsinEcAJI4lb7d1KOrPliTXyr15Ai
 A303djwJmwCw5KxAPOjkG/nMBlpMIAQRee9GDWs1ykfSlIsI4jp7isVbCFNCQNVF
 auHYq1bfJ5MJYPjxIDZUt+NF7kg7dD4k4g+VCVjaH1u8pGeaCUCtnNjMFOk1XfU8
 yFQnaDcrAu4zJPq3d74z4eN9Bk+su8+DhnfrAEFjuFxGTgYc2MyRt0gGFeiUtNs4
 rvST/eHBO4zeuL18S8G+fLcig/9ZYE73xzjdOCzRzLDjn0VQr9t06ez1QqJOb4D6
 A4SSufwek5NIqYKMlhV/az2EceQYK8Wv3KAz8w98KDfwvVVhUSgE23MbTCO0hvQU
 PWF35d3hQ+9oH0ZGYRumb8OpXtKJ+2KmzyN8Z0xhivHFBIKlcW6IBGhWRANclJO8
 jNAM3jiwi8fRDVM2wI1fmgeEmMhG+WuTI3dJVu3tu4vI923FW5GdY6ev5EvH0Ts0
 khTwIjtmCHUJGSeWajy3Gi9irdyhPyPNRMqgal4GS+gGpVU2mMMKTG+NzxxtCRKR
 BUgfCjFDoDJWrNWIzzOwNqgF0Y+V9GgCZOkb73u/y+xVx0Rmc6U=
 =wbBy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Try to flush backtraces from other CPUs also on the local one. This
   was a regression caused by printk_safe buffers removal.

 - Remove header dependency warning.

* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Remove printk.h inclusion in percpu.h
  printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
2021-11-18 10:50:45 -08:00
Petr Mladek
bf6d0d1e1a Merge branch 'rework/printk_safe-removal' into for-linus 2021-11-18 10:03:47 +01:00
Vitaly Kuznetsov
b7915d55b1 KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:12:14 -05:00
Cédric Le Goater
8e80a73fa9 powerpc/xive: Change IRQ domain to a tree domain
Commit 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains
exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the
'nomap' domains still in use under the powerpc arch. With this new
flag, the revmap_tree of the IRQ domain is not used anymore. This
change broke the support of shared LSIs [1] in the XIVE driver because
it was relying on a lookup in the revmap_tree to query previously
mapped interrupts. Linux now creates two distinct IRQ mappings on the
same HW IRQ which can lead to unexpected behavior in the drivers.

The XIVE IRQ domain is not a direct mapping domain and its HW IRQ
interrupt number space is rather large : 1M/socket on POWER9 and
POWER10, change the XIVE driver to use a 'tree' domain type instead.

[1] For instance, a linux KVM guest with virtio-rng and virtio-balloon
    devices.

Fixes: 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains exclusive")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116134022.420412-1-clg@kaod.org
2021-11-17 21:55:42 +11:00
Tiezhu Yang
ebf7f6f0a6 bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance.
We can see the historical evolution from commit 04fd61ab36 ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b
("bpf: Undo off-by-one in interpreter tail call count limit"). In order
to avoid changing existing behavior, the actual limit is 33 now, this is
reasonable.

After commit 874be05f52 ("bpf, tests: Add tail call test suite"), we can
see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

Although the above failed testcase has been fixed in commit 18935a72eb
("bpf/tests: Fix error in tail call limit tests"), it would still be good
to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code
more readable.

The 32-bit x86 JIT was using a limit of 32, just fix the wrong comments and
limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT updated. For the
mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh.
For the riscv JIT, use RV_REG_TCC directly to save one register move as
suggested by Björn Töpel. For the other implementations, no function changes,
it does not change the current limit 33, the new value of MAX_TAIL_CALL_CNT
can reflect the actual max tail call count, the related tail call testcases
in test_bpf module and selftests can work well for the interpreter and the
JIT.

Here are the test results on x86_64:

 # uname -m
 x86_64
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
 # rmmod test_bpf
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
 # rmmod test_bpf
 # ./test_progs -t tailcalls
 #142 tailcalls:OK
 Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
2021-11-16 14:03:15 +01:00
Christophe Leroy
1e35eba405 powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
As spotted and explained in commit c12ab8dbc4 ("powerpc/8xx: Fix
Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection
of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted
the lack of the DIRTY bit in the pinned kernel data TLBs.

This problem should have been detected a lot earlier if things had
been working as expected. But due to an incredible level of chance or
mishap, this went undetected because of a set of bugs: In fact the
DTLBs were not pinned, because instead of setting the reserve bit
in MD_CTR, it was set in MI_CTR that is the register for ITLBs.

But then, another huge bug was there: the physical address was
reset to 0 at the boundary between RO and RW areas, leading to the
same physical space being mapped at both 0xc0000000 and 0xc8000000.
This had by miracle no consequence until now because the entry was
not really pinned so it was overwritten soon enough to go undetected.

Of course, now that we really pin the DTLBs, it must be fixed as well.

Fixes: f76c8f6d25 ("powerpc/8xx: Add function to set pinned TLBs")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: c12ab8dbc4 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu
2021-11-16 21:37:10 +11:00
Christophe Leroy
5499802b22 powerpc/signal32: Fix sigset_t copy
The conversion from __copy_from_user() to __get_user() by
commit d3ccc97815 ("powerpc/signal: Use __get_user() to copy
sigset_t") introduced a regression in __get_user_sigset() for
powerpc/32. The bug was subsequently moved into
unsafe_get_user_sigset().

The bug is due to the copied 64 bit value being truncated to
32 bits while being assigned to dst->sig[0]

The regression was reported by users of the Xorg packages distributed in
Debian/powerpc --

    "The symptoms are that the fb screen goes blank, with the backlight
    remaining on and no errors logged in /var/log; wdm (or startx) run
    with no effect (I tried logging in in the blind, with no effect).
    And they are hard to kill, requiring 'kill -KILL ...'"

Fix the regression by copying each word of the sigset, not only the
first one.

__get_user_sigset() was tentatively optimised to copy 64 bits at once
in order to minimise KUAP unlock/lock impact, but the unsafe variant
doesn't suffer that, so it can just copy words.

Fixes: 887f3ceb51 ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Finn Thain <fthain@linux-m68k.org>
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu
2021-11-16 21:24:16 +11:00
Christophe Leroy
5b54860943 powerpc/book3e: Fix TLBCAM preset at boot
Commit 52bda69ae8 ("powerpc/fsl_booke: Tell map_mem_in_cams() if
init is done") was supposed to just add an additional parameter to
map_mem_in_cams() and always set it to 'true' at that time.

But a few call sites were messed up. Fix them.

Fixes: 52bda69ae8 ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d319f2a9367d4d08fd2154e506101bd5f100feeb.1636967119.git.christophe.leroy@csgroup.eu
2021-11-16 21:20:59 +11:00
Alexey Kardashevskiy
ad3976025b powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
There is a possibility of having just one DMA window available with
a limited capacity which the existing code does not handle that well.
If the window is big enough for the system RAM but less than
MAX_PHYSMEM_BITS (which we want when persistent memory is present),
we create 1:1 window and leave persistent memory without DMA.

This disables 1:1 mapping entirely if there is persistent memory and
either:
- the huge DMA window does not cover the entire address space;
- the default DMA window is removed.

This relies on reverted 54fc3c681d
("powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory")
to return the actual amount RAM in ddw_memory_hotplug_max() (posted
separately).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-4-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Alexey Kardashevskiy
fb4ee2b30c powerpc/pseries/ddw: simplify enable_ddw()
This drops rather useless ddw_enabled flag as direct_mapping implies
it anyway.

While at this, fix indents in enable_ddw().

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-3-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Alexey Kardashevskiy
2d33f55044 powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
This reverts commit 54fc3c681d
which does not allow 1:1 mapping even for the system RAM which
is usually possible.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
2021-11-15 15:46:46 +11:00
Nicholas Piggin
302039466f powerpc/pseries: Fix numa FORM2 parsing fallback code
In case the FORM2 distance table from firmware is not the expected size,
there is fallback code that just populates the lookup table as local vs
remote.

However it then continues on to use the distance table. Fix.

Fixes: 1c6b5a7e74 ("powerpc/pseries: Add support for FORM2 associativity")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com
2021-11-15 15:46:46 +11:00
Nicholas Piggin
0bd81274e3 powerpc/pseries: rename numa_dist_table to form2_distances
The name of the local variable holding the "form2" property address
conflicts with the numa_distance_table global.

This patch does 's/numa_dist_table/form2_distances/g' over the function,
which also renames numa_dist_table_length to form2_distances_length.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com
2021-11-15 15:46:46 +11:00
Masahiro Yamada
964c33cd0b powerpc: clean vdso32 and vdso64 directories
Since commit bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of
vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the
arch/powerpc/kernel/{vdso32,vdso64} directories.

Use the subdir- trick to let "make clean" descend into them.

Fixes: bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org
2021-11-15 15:46:45 +11:00
Uwe Kleine-König
2da516d7ed powerpc/83xx/mpc8349emitx: Drop unused variable
Commit 5d354dc35e ("powerpc/83xx/mpc8349emitx: Make
mcu_gpiochip_remove() return void") removed the usage of the variable
ret, but failed to remove the variable itself, resulting in:

	arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c: In function ‘mcu_remove’:
	arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c:189:6: error: unused variable ‘ret’ [-Werror=unused-variable]
	  189 |  int ret;
	      |      ^~~

So remove the variable now.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110110739.1072634-1-u.kleine-koenig@pengutronix.de
2021-11-15 15:46:45 +11:00
Michael Ellerman
dae5818646 KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into
it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because
kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable.

When called from hcall_try_real_mode() we have the kernel TOC in r2,
established near the start of kvmppc_interrupt_hv(), so there is no
issue.

But they can also be called from kvmppc_pseries_do_hcall() which is
module code, so the access ends up happening with the kvm-hv module's
r2, which will not point at dawr_force_enable and could even cause a
fault.

With the current code layout and compilers we haven't observed a fault
in practice, the load hits somewhere in kvm-hv.ko and silently returns
some bogus value.

Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses
h_set_dabr() to test if sc1 works correctly, see SLOF's
lib/libhvcall/brokensc1.c.

Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au
2021-11-15 15:46:45 +11:00
Alistair Popple
ab09243aa9 mm/migrate.c: remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect().  If it
wasn't then the a second attempt is made to lock the page.  However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity.  So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Link: https://lkml.kernel.org/r/20211025041608.289017-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-11 09:34:35 -08:00
Linus Torvalds
5147da902e Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
 "While looking at some issues related to the exit path in the kernel I
  found several instances where the code is not using the existing
  abstractions properly.

  This set of changes introduces force_fatal_sig a way of sending a
  signal and not allowing it to be caught, and corrects the misuse of
  the existing abstractions that I found.

  A lot of the misuse of the existing abstractions are silly things such
  as doing something after calling a no return function, rolling BUG by
  hand, doing more work than necessary to terminate a kernel thread, or
  calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).

  In the review a deficiency in force_fatal_sig and force_sig_seccomp
  where ptrace or sigaction could prevent the delivery of the signal was
  found. I have added a change that adds SA_IMMUTABLE to change that
  makes it impossible to interrupt the delivery of those signals, and
  allows backporting to fix force_sig_seccomp

  And Arnd found an issue where a function passed to kthread_run had the
  wrong prototype, and after my cleanup was failing to build."

* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
  soc: ti: fix wkup_m3_rproc_boot_thread return type
  signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
  signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
  exit/r8188eu: Replace the macro thread_exit with a simple return 0
  exit/rtl8712: Replace the macro thread_exit with a simple return 0
  exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
  signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
  signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
  signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
  exit/syscall_user_dispatch: Send ordinary signals on failure
  signal: Implement force_fatal_sig
  exit/kthread: Have kernel threads return instead of calling do_exit
  signal/s390: Use force_sigsegv in default_trap_handler
  signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
  signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
  signal/sparc: In setup_tsb_params convert open coded BUG into BUG
  signal/powerpc: On swapcontext failure force SIGSEGV
  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  signal/sparc32: Remove unreachable do_exit in do_sparc_fault
  ...
2021-11-10 16:15:54 -08:00
Linus Torvalds
e8f023caee asm-generic: asm/syscall.h cleanup
This is a single cleanup from Peter Collingbourne, removing
 some dead code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
 GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
 0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
 JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
 vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
 xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
 b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
 XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
 jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
 kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
 Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
 ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
 =8EFP
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "This is a single cleanup from Peter Collingbourne, removing some dead
  code"

* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: remove unused function syscall_set_arguments()
2021-11-10 11:22:03 -08:00
Nicholas Piggin
5d5e4522a7 printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
printk from NMI context relies on irq work being raised on the local CPU
to print to console. This can be a problem if the NMI was raised by a
lockup detector to print lockup stack and regs, because the CPU may not
enable irqs (because it is locked up).

Introduce printk_trigger_flush() that can be called another CPU to try
to get those messages to the console, call that where printk_safe_flush
was previously called.

Fixes: 93d102f094 ("printk: remove safe buffers")
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com
2021-11-10 16:12:00 +01:00
Linus Torvalds
59a2ceeef6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "87 patches.

  Subsystems affected by this patch series: mm (pagecache and hugetlb),
  procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
  init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
  sysvfs, kcov, gdb, resource, selftests, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits)
  ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
  ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
  selftests/kselftest/runner/run_one(): allow running non-executable files
  virtio-mem: disallow mapping virtio-mem memory via /dev/mem
  kernel/resource: disallow access to exclusive system RAM regions
  kernel/resource: clean up and optimize iomem_is_exclusive()
  scripts/gdb: handle split debug for vmlinux
  kcov: replace local_irq_save() with a local_lock_t
  kcov: avoid enable+disable interrupts if !in_task()
  kcov: allocate per-CPU memory on the relevant node
  Documentation/kcov: define `ip' in the example
  Documentation/kcov: include types.h in the example
  sysv: use BUILD_BUG_ON instead of runtime check
  kernel/fork.c: unshare(): use swap() to make code cleaner
  seq_file: fix passing wrong private data
  seq_file: move seq_escape() to a header
  signal: remove duplicate include in signal.h
  crash_dump: remove duplicate include in crash_dump.h
  crash_dump: fix boolreturn.cocci warning
  hfs/hfsplus: use WARN_ON for sanity check
  ...
2021-11-09 10:11:53 -08:00
Kefeng Wang
843a1ffaf6 powerpc/mm: use core_kernel_text() helper
Use core_kernel_text() helper to simplify code, also drop etext, _stext,
_sinittext, _einittext declaration which already declared in section.h.

Link: https://lkml.kernel.org/r/20210930071143.63410-10-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:51 -08:00
Linus Torvalds
dd72945c43 cxl for v5.16
- Fix support for platforms that do not enumerate every ACPI0016 (CXL
   Host Bridge) in the CHBS (ACPI Host Bridge Structure).
 
 - Introduce a common pci_find_dvsec_capability() helper, clean up open
   coded implementations in various drivers.
 
 - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test'
   is a module built from tools/testing/cxl/ that mocks up a CXL topology
   to augment the nascent support for emulation of CXL devices in QEMU.
 
 - Convert libnvdimm to use the uuid API.
 
 - Complete the definition of CXL namespace labels in libnvdimm.
 
 - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
   mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.
 
 - Continue to sort and refactor functionality into distinct driver and
   core-infrastructure buckets. For example, mailbox handling is now a
   generic core capability consumed by the PCI and cxl_test drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYYRtyQAKCRDfioYZHlFs
 Z7UsAP9DzUN6IWWnYk1R95YXYVxFriRtRsBjujAqTg49EMghawEAoHaA9lxO3Hho
 l25TLYUOmB/zFTlUbe6YQptMJZ5YLwY=
 =im9j
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dan Williams:
 "More preparation and plumbing work in the CXL subsystem.

  From an end user perspective the highlight here is lighting up the CXL
  Persistent Memory related commands (label read / write) with the
  generic ioctl() front-end in LIBNVDIMM.

  Otherwise, the ability to instantiate new persistent and volatile
  memory regions is still on track for v5.17.

  Summary:

   - Fix support for platforms that do not enumerate every ACPI0016 (CXL
     Host Bridge) in the CHBS (ACPI Host Bridge Structure).

   - Introduce a common pci_find_dvsec_capability() helper, clean up
     open coded implementations in various drivers.

   - Add 'cxl_test' for regression testing CXL subsystem ABIs.
     'cxl_test' is a module built from tools/testing/cxl/ that mocks up
     a CXL topology to augment the nascent support for emulation of CXL
     devices in QEMU.

   - Convert libnvdimm to use the uuid API.

   - Complete the definition of CXL namespace labels in libnvdimm.

   - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
     mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.

   - Continue to sort and refactor functionality into distinct driver
     and core-infrastructure buckets. For example, mailbox handling is
     now a generic core capability consumed by the PCI and cxl_test
     drivers"

* tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
  ocxl: Use pci core's DVSEC functionality
  cxl/pci: Use pci core's DVSEC functionality
  PCI: Add pci_find_dvsec_capability to find designated VSEC
  cxl/pci: Split cxl_pci_setup_regs()
  cxl/pci: Add @base to cxl_register_map
  cxl/pci: Make more use of cxl_register_map
  cxl/pci: Remove pci request/release regions
  cxl/pci: Fix NULL vs ERR_PTR confusion
  cxl/pci: Remove dev_dbg for unknown register blocks
  cxl/pci: Convert register block identifiers to an enum
  cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS
  cxl/pci: Disambiguate cxl_pci further from cxl_mem
  Documentation/cxl: Add bus internal docs
  cxl/core: Split decoder setup into alloc + add
  tools/testing/cxl: Introduce a mock memory device + driver
  cxl/mbox: Move command definitions to common location
  cxl/bus: Populate the target list at decoder create
  tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
  cxl/pmem: Add support for multiple nvdimm-bridge objects
  cxl/pmem: Translate NVDIMM label commands to CXL label commands
  ...
2021-11-08 11:49:48 -08:00
Linus Torvalds
1e9ed9360f Kbuild updates for v5.16
- Remove the global -isystem compiler flag, which was made possible by
    the introduction of <linux/stdarg.h>
 
  - Improve the Kconfig help to print the location in the top menu level
 
  - Fix "FORCE prerequisite is missing" build warning for sparc
 
  - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which generate
    a zstd-compressed tarball
 
  - Prevent gen_init_cpio tool from generating a corrupted cpio when
    KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmGGkysVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGgZkQAIX4i9Tt6pyl/2xGDGkzUqjprfoH
 QUIo1DoUclLUygoakrrrX3EnZLWrslgPTKjQxdiV6RA6xHfe4cYgNTSq8zM9lsPT
 lu+B4nEDqoXQ5gyLxMlnjS3FRQTNYIeBZEhSAIiW8TENdLKlKc+NYdoj7th50dO0
 SkXRa2dpWHa6t7ZRqHIHMpUWA7gm0w22ZbgQmyUv1CDGO4IHPLqe2b2PMsrzhSZ1
 yypP1l6aQVKuP0hN9aytbTRqDxUd0uOzBf00PK5zx23hjdwZ9wmZrFTKDf9fAu/+
 nR7gBsa5YoYNQh3UkayZXjR5dClmgsCXZ25OXI7YucQp/8OJ5fadfn1NFpJHsw56
 n5cckbHIXgnFUcel5YlkR6qTHjpzdr9vHm90MmiuX99b3oy9czl6pY3qkNfRkllQ
 v7ME5L1qlw3P3ia1KA+H4zW/LIJ8p5cbKBwaY22m3kY3bTx7PiOfMlep4UVqxXSb
 0/OqxSsmYg5LlmwEQ0SSsx45hE0o9nG/cdjkHu1jUOUHxYfpt1T4MTILeGUwmjzd
 TydJym5MZyXBawu4NVB3QLoKm5Jt2BXtyaWOtq74VSrs77roNCdYuQWJ+1aBf2Pg
 0s4CVC2cC7KlxJDImoqswZATGXPMfbiVDcuVSSukYRgBMeCBPUzRhB8YP36BZyD3
 9vFYmqSujtUU7nWb
 =ATFN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove the global -isystem compiler flag, which was made possible by
   the introduction of <linux/stdarg.h>

 - Improve the Kconfig help to print the location in the top menu level

 - Fix "FORCE prerequisite is missing" build warning for sparc

 - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which
   generate a zstd-compressed tarball

 - Prevent gen_init_cpio tool from generating a corrupted cpio when
   KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later

 - Misc cleanups

* tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  kbuild: use more subdir- for visiting subdirectories while cleaning
  sh: remove meaningless archclean line
  initramfs: Check timestamp to prevent broken cpio archive
  kbuild: split DEBUG_CFLAGS out to scripts/Makefile.debug
  gen_init_cpio: add static const qualifiers
  kbuild: Add make tarzst-pkg build option
  scripts: update the comments of kallsyms support
  sparc: Add missing "FORCE" target when using if_changed
  kconfig: refactor conf_touch_dep()
  kconfig: refactor conf_write_dep()
  kconfig: refactor conf_write_autoconf()
  kconfig: add conf_get_autoheader_name()
  kconfig: move sym_escape_string_value() to confdata.c
  kconfig: refactor listnewconfig code
  kconfig: refactor conf_write_symbol()
  kconfig: refactor conf_write_heading()
  kconfig: remove 'const' from the return type of sym_escape_string_value()
  kconfig: rename a variable in the lexer to a clearer name
  kconfig: narrow the scope of variables in the lexer
  kconfig: Create links to main menu items in search
  ...
2021-11-08 09:15:45 -08:00
Linus Torvalds
0c5c62ddf8 pci-v5.16-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGFXBkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vx6Tg/7BsGWm8f+uw/mr9lLm47q2mc4XyoO
 7bR9KDp5NM84W/8ZOU7dqqqsnY0ddrSOLBRyhJJYMW3SwJd1y1ajTBsL1Ujqv+eN
 z+JUFmhq4Laqm4k6Spc9CEJE+Ol5P6gGUtxLYo6PM2R0VxnSs/rDxctT5i7YOpCi
 COJ+NVT/mc/by2loz1kLTSR9GgtBBgd+Y8UA33GFbHKssROw02L0OI3wffp81Oba
 EhMGPoD+0FndAniDw+vaOSoO+YaBuTfbM92T/O00mND69Fj1PWgmNWZz7gAVgsXb
 3RrNENUFxgw6CDt7LZWB8OyT04iXe0R2kJs+PA9gigFCGbypwbd/Nbz5M7e9HUTR
 ray+1EpZib6+nIksQBL2mX8nmtyHMcLiM57TOEhq0+ECDO640MiRm8t0FIG/1E8v
 3ZYd9w20o/NxlFNXHxxpZ3D/osGH5ocyF5c5m1rfB4RGRwztZGL172LWCB0Ezz9r
 eHB8sWxylxuhrH+hp2BzQjyddg7rbF+RA4AVfcQSxUpyV01hoRocKqknoDATVeLH
 664nJIINFxKJFwfuL3E6OhrInNe1LnAhCZsHHqbS+NNQFgvPRznbixBeLkI9dMf5
 Yf6vpsWO7ur8lHHbRndZubVu8nxklXTU7B/w+C11sq6k9LLRJSHzanr3Fn9WA80x
 sznCxwUvbTCu1r0=
 =nsMh
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Conserve IRQs by setting up portdrv IRQs only when there are users
     (Jan Kiszka)
   - Rework and simplify _OSC negotiation for control of PCIe features
     (Joerg Roedel)
   - Remove struct pci_dev.driver pointer since it's redundant with the
     struct device.driver pointer (Uwe Kleine-König)

  Resource management:
   - Coalesce contiguous host bridge apertures from _CRS to accommodate
     BARs that cover more than one aperture (Kai-Heng Feng)

  Sysfs:
   - Check CAP_SYS_ADMIN before parsing user input (Krzysztof
     Wilczyński)
   - Return -EINVAL consistently from "store" functions (Krzysztof
     Wilczyński)
   - Use sysfs_emit() in endpoint "show" functions to avoid buffer
     overruns (Kunihiko Hayashi)

  PCIe native device hotplug:
   - Ignore Link Down/Up caused by resets during error recovery so
     endpoint drivers can remain bound to the device (Lukas Wunner)

  Virtualization:
   - Avoid bus resets on Atheros QCA6174, where they hang the device
     (Ingmar Klein)
   - Work around Pericom PI7C9X2G switch packet drop erratum by using
     store and forward mode instead of cut-through (Nathan Rossi)
   - Avoid trying to enable AtomicOps on VFs; the PF setting applies to
     all VFs (Selvin Xavier)

  MSI:
   - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
     interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
     Song)

  VPD:
   - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
     in the possible VPD space; use these to simplify the cxgb3 driver
     (Heiner Kallweit)

  Peer-to-peer DMA:
   - Add (not subtract) the bus offset when calculating DMA address
     (Wang Lu)

  ASPM:
   - Re-enable LTR at Downstream Ports so they don't report Unsupported
     Requests when reset or hot-added devices send LTR messages
     (Mingchuang Qiao)

  Apple PCIe controller driver:
   - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
     Zyngier)

  Cadence PCIe controller driver:
   - Return success when probe succeeds instead of falling into error
     path (Li Chen)

  HiSilicon Kirin PCIe controller driver:
   - Reorganize PHY logic and add support for external PHY drivers
     (Mauro Carvalho Chehab)
   - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
     Carvalho Chehab)
   - Add Kirin 970 support (Mauro Carvalho Chehab)
   - Make driver removable (Mauro Carvalho Chehab)

  Intel VMD host bridge driver:
   - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
     enabled (Adrian Huang)
   - Number each controller so we can tell them apart in
     /proc/interrupts (Chunguang Xu)
   - Avoid building on UML because VMD depends on x86 bare metal APIs
     (Johannes Berg)

  Marvell Aardvark PCIe controller driver:
   - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
   - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
   - Downgrade PIO Response Status messages to debug level (Marek Behún)
   - Preserve CRS SV (Config Request Retry Software Visibility) bit in
     emulated Root Control register (Pali Rohár)
   - Fix issue in configuring reference clock (Pali Rohár)
   - Don't clear status bits for masked interrupts (Pali Rohár)
   - Don't mask unused interrupts (Pali Rohár)
   - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
   - Retry config accesses on CRS response (Pali Rohár)
   - Simplify emulated Root Capabilities initialization (Pali Rohár)
   - Fix several link training issues (Pali Rohár)
   - Fix link-up checking via LTSSM (Pali Rohár)
   - Fix reporting of Data Link Layer Link Active (Pali Rohár)
   - Fix emulation of W1C bits (Marek Behún)
   - Fix MSI domain .alloc() method to return zero on success (Marek
     Behún)
   - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
     (Marek Behún)
   - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
     at startup; PCI core will set those as necessary (Pali Rohár)
   - When operating as a Root Port, set class code to "PCI Bridge"
     instead of the default "Mass Storage Controller" (Pali Rohár)
   - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
     implement this per spec (Pali Rohár)
   - Add emulation of option ROM BAR since aardvark doesn't implement
     this per spec (Pali Rohár)

  MediaTek MT7621 PCIe controller driver:
   - Add MediaTek MT7621 PCIe host controller driver and DT binding
     (Sergio Paracuellos)

  Qualcomm PCIe controller driver:
   - Add SC8180x compatible string (Bjorn Andersson)
   - Add endpoint controller driver and DT binding (Manivannan
     Sadhasivam)
   - Restructure to use of_device_get_match_data() (Prasad Malisetty)
   - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)

  Renesas R-Car PCIe controller driver:
   - Remove unnecessary includes (Geert Uytterhoeven)

  Rockchip DesignWare PCIe controller driver:
   - Add DT binding (Simon Xue)

  Socionext UniPhier Pro5 controller driver:
   - Serialize INTx masking/unmasking (Kunihiko Hayashi)

  Synopsys DesignWare PCIe controller driver:
   - Run dwc .host_init() method before registering MSI interrupt
     handler so we can deal with pending interrupts left by bootloader
     (Bjorn Andersson)
   - Clean up Kconfig dependencies (Andy Shevchenko)
   - Export symbols to allow more modular drivers (Luca Ceresoli)

  TI DRA7xx PCIe controller driver:
   - Allow host and endpoint drivers to be modules (Luca Ceresoli)
   - Enable external clock if present (Luca Ceresoli)

  TI J721E PCIe driver:
   - Disable PHY when probe fails after initializing it (Christophe
     JAILLET)

  MicroSemi Switchtec management driver:
   - Return error to application when command execution fails because an
     out-of-band reset has cleared the device BARs, Memory Space Enable,
     etc (Kelvin Cao)
   - Fix MRPC error status handling issue (Kelvin Cao)
   - Mask out other bits when reading of management VEP instance ID
     (Kelvin Cao)
   - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
     (Kelvin Cao)
   - Add check of event support (Logan Gunthorpe)

  Miscellaneous:
   - Remove unused pci_pool wrappers, which have been replaced by
     dma_pool (Cai Huoqing)
   - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
     Wilczyński)
   - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
     Wilczyński)
   - Fix some sscanf(), sprintf() format mismatches (Krzysztof
     Wilczyński)
   - Update PCI subsystem information in MAINTAINERS (Krzysztof
     Wilczyński)
   - Correct some misspellings (Krzysztof Wilczyński)"

* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
  PCI: Add ACS quirk for Pericom PI7C9X2G switches
  PCI: apple: Configure RID to SID mapper on device addition
  iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
  PCI: apple: Implement MSI support
  PCI: apple: Add INTx and per-port interrupt support
  PCI: kirin: Allow removing the driver
  PCI: kirin: De-init the dwc driver
  PCI: kirin: Disable clkreq during poweroff sequence
  PCI: kirin: Move the power-off code to a common routine
  PCI: kirin: Add power_off support for Kirin 960 PHY
  PCI: kirin: Allow building it as a module
  PCI: kirin: Add MODULE_* macros
  PCI: kirin: Add Kirin 970 compatible
  PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
  PCI: apple: Set up reference clocks when probing
  PCI: apple: Add initial hardware bring-up
  PCI: of: Allow matching of an interrupt-map local to a PCI device
  of/irq: Allow matching of an interrupt-map local to an interrupt controller
  irqdomain: Make of_phandle_args_to_fwspec() generally available
  PCI: Do not enable AtomicOps on VFs
  ...
2021-11-06 14:36:12 -07:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Stephen Kitt
53944f171a mm: remove HARDENED_USERCOPY_FALLBACK
This has served its purpose and is no longer used.  All usercopy
violations appear to have been handled by now, any remaining instances
(or new bugs) will cause copies to be rejected.

This isn't a direct revert of commit 2d891fbc3b ("usercopy: Allow
strict enforcement of whitelists"); since usercopy_fallback is
effectively 0, the fallback handling is removed too.

This also removes the usercopy_fallback module parameter on slab_common.

Link: https://github.com/KSPP/linux/issues/153
Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>	[defconfig change]
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E . Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
David Hildenbrand
50f9481ed9 mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Zhenguo Yao
b5389086ad hugetlbfs: extend the definition of hugepages parameter to support node allocation
We can specify the number of hugepages to allocate at boot.  But the
hugepages is balanced in all nodes at present.  In some scenarios, we
only need hugepages in one node.  For example: DPDK needs hugepages
which are in the same node as NIC.

If DPDK needs four hugepages of 1G size in node1 and system has 16 numa
nodes we must reserve 64 hugepages on the kernel cmdline.  But only four
hugepages are used.  The others should be free after boot.  If the
system memory is low(for example: 64G), it will be an impossible task.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example add following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
4421cca0a3 memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()

The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.

    @@
    identifier vaddr;
    expression size;
    @@
    (
    - memblock_phys_free(__pa(vaddr), size);
    + memblock_free(vaddr, size);
    |
    - memblock_free_ptr(vaddr, size);
    + memblock_free(vaddr, size);
    )

[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au

Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Mike Rapoport
fa27717110 memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().

Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().

Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Christophe Leroy
e012a25d81 powerpc: use generic version of arch_is_kernel_initmem_freed()
The generic version of arch_is_kernel_initmem_freed() now does the same
as powerpc version.

Remove the powerpc version.

Link: https://lkml.kernel.org/r/c53764eb45d41491e2b21da2e7812239897dbebb.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:38 -07:00
Bjorn Helgaas
d03c426f7a Merge branch 'pci/driver'
- Drop the struct pci_dev.driver pointer, which is redundant with the
  struct device.driver pointer (Uwe Kleine-König)

* pci/driver:
  PCI: Remove struct pci_dev->driver
  PCI: Use to_pci_driver() instead of pci_dev->driver
  x86/pci/probe_roms: Use to_pci_driver() instead of pci_dev->driver
  perf/x86/intel/uncore: Use to_pci_driver() instead of pci_dev->driver
  powerpc/eeh: Use to_pci_driver() instead of pci_dev->driver
  usb: xhci: Use to_pci_driver() instead of pci_dev->driver
  cxl: Use to_pci_driver() instead of pci_dev->driver
  cxl: Factor out common dev->driver expressions
  xen/pcifront: Use to_pci_driver() instead of pci_dev->driver
  xen/pcifront: Drop pcifront_common_process() tests of pcidev, pdrv
  nfp: use dev_driver_string() instead of pci_dev->driver->name
  mlxsw: pci: Use dev_driver_string() instead of pci_dev->driver->name
  net: marvell: prestera: use dev_driver_string() instead of pci_dev->driver->name
  net: hns3: use dev_driver_string() instead of pci_dev->driver->name
  crypto: hisilicon - use dev_driver_string() instead of pci_dev->driver->name
  powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->name
  ssb: Use dev_driver_string() instead of pci_dev->driver->name
  bcma: simplify reference to driver name
  crypto: qat - simplify adf_enable_aer()
  scsi: message: fusion: Remove unused mpt_pci driver .probe() 'id' parameter
  PCI/ERR: Factor out common dev->driver expressions
  PCI: Drop pci_device_probe() test of !pci_dev->driver
  PCI: Drop pci_device_remove() test of pci_dev->driver
  PCI: Return NULL for to_pci_driver(NULL)
2021-11-05 11:28:43 -05:00