Commit Graph

25142 Commits

Author SHA1 Message Date
Naveen N. Rao
1d4866d565 powerpc64/bpf: Use r12 for constant blinding
In preparation for preserving kernel toc in r2, switch BPF_REG_AX from
r2 to r12. r12 is not used by bpf JIT except during external helper/bpf
calls, or with BPF_NOSPEC. These sequences aren't emitted when
BPF_REG_AX is used for constant blinding and other purposes.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e109f98617eacb4512c17a48525e94eda42889e6.1644834730.git.naveen.n.rao@linux.vnet.ibm.com
2022-03-08 00:04:57 +11:00
Naveen N. Rao
c2067f7f88 powerpc64/bpf: Do not save/restore LR on each call to bpf_stf_barrier()
Instead of saving and restoring LR before each invocation to
bpf_stf_barrier(), set SEEN_FUNC flag so that we save/restore LR in
prologue/epilogue.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4446f25478d82a2a4ac9dab2ebdfd88ddf923eb7.1644834730.git.naveen.n.rao@linux.vnet.ibm.com
2022-03-08 00:04:57 +11:00
Naveen N. Rao
0ffdbce6f4 powerpc/bpf: Handle large branch ranges with BPF_EXIT
In some scenarios, it is possible that the program epilogue is outside
the branch range for a BPF_EXIT instruction. Instead of rejecting such
programs, emit epilogue as an alternate exit point from the program.
Track the location of the same so that subsequent exits can take either
of the two paths.

Reported-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/33aa2e92645a92712be23b18035a2c6dcb92ff8d.1644834730.git.naveen.n.rao@linux.vnet.ibm.com
2022-03-08 00:04:57 +11:00
Naveen N. Rao
bafb5898de powerpc/bpf: Emit a single branch instruction for known short branch ranges
PPC_BCC() emits two instructions to accommodate scenarios where we need
to branch outside the range of a conditional branch. PPC_BCC_SHORT()
emits a single branch instruction and can be used when the branch is
known to be within a conditional branch range.

Convert some of the uses of PPC_BCC() in the powerpc BPF JIT over to
PPC_BCC_SHORT() where we know the branch range.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/edbca01377d1d5f472868bf6d8962b0a0d85b96f.1644834730.git.naveen.n.rao@linux.vnet.ibm.com
2022-03-08 00:04:57 +11:00
Naveen N. Rao
acd7408d27 powerpc/bpf: Skip branch range validation during first pass
During the first pass, addrs[] is still being populated. So, all
branches to following instructions will appear to be going to the start
of the JIT program. Ignore branch range validation for such instructions
and assume those to be in range. Branch range validation will happen
during the second pass after addrs[] is setup properly.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc517413d11636e20dbfc88503dad14bcbe391e2.1644834730.git.naveen.n.rao@linux.vnet.ibm.com
2022-03-08 00:04:57 +11:00
Michael Ellerman
591b4b2684 powerpc/code-patching: Pre-map patch area
Paul reported a warning with DEBUG_ATOMIC_SLEEP=y:

  BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
  preempt_count: 0, expected: 0
  ...
  Call Trace:
    dump_stack_lvl+0xa0/0xec (unreliable)
    __might_resched+0x2f4/0x310
    kmem_cache_alloc+0x220/0x4b0
    __pud_alloc+0x74/0x1d0
    hash__map_kernel_page+0x2cc/0x390
    do_patch_instruction+0x134/0x4a0
    arch_jump_label_transform+0x64/0x78
    __jump_label_update+0x148/0x180
    static_key_enable_cpuslocked+0xd0/0x120
    static_key_enable+0x30/0x50
    check_kvm_guest+0x60/0x88
    pSeries_smp_probe+0x54/0xb0
    smp_prepare_cpus+0x3e0/0x430
    kernel_init_freeable+0x20c/0x43c
    kernel_init+0x30/0x1a0
    ret_from_kernel_thread+0x5c/0x64

Peter pointed out that this is because do_patch_instruction() has
disabled interrupts, but then map_patch_area() calls map_kernel_page()
then hash__map_kernel_page() which does a sleeping memory allocation.

We only see the warning in KVM guests with SMT enabled, which is not
particularly common, or on other platforms if CONFIG_KPROBES is
disabled, also not common. The reason we don't see it in most
configurations is that another path that happens to have interrupts
enabled has allocated the required page tables for us, eg. there's a
path in kprobes init that does that. That's just pure luck though.

As Christophe suggested, the simplest solution is to do a dummy
map/unmap when we initialise the patching, so that any required page
table levels are pre-allocated before the first call to
do_patch_instruction(). This works because the unmap doesn't free any
page tables that were allocated by the map, it just clears the PTE,
leaving the page table levels there for the next map.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Debugged-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220223015821.473097-1-mpe@ellerman.id.au
2022-03-08 00:04:57 +11:00
Michael Ellerman
d4679ac8ea powerpc/64s: Don't use DSISR for SLB faults
Since commit 46ddcb3950 ("powerpc/mm: Show if a bad page fault on data
is read or write.") we use page_fault_is_write(regs->dsisr) in
__bad_page_fault() to determine if the fault is for a read or write, and
change the message printed accordingly.

But SLB faults, aka Data Segment Interrupts, don't set DSISR (Data
Storage Interrupt Status Register) to a useful value. All ISA versions
from v2.03 through v3.1 specify that the Data Segment Interrupt sets
DSISR "to an undefined value". As far as I can see there's no mention of
SLB faults setting DSISR in any BookIV content either.

This manifests as accesses that should be a read being incorrectly
reported as writes, for example, using the xmon "dump" command:

  0:mon> d 0x5deadbeef0000000
  5deadbeef0000000
  [359526.415354][    C6] BUG: Unable to handle kernel data access on write at 0x5deadbeef0000000
  [359526.415611][    C6] Faulting instruction address: 0xc00000000010a300
  cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf400]
      pc: c00000000010a300: mread+0x90/0x190

If we disassemble the PC, we see a load instruction:

  0:mon> di c00000000010a300
  c00000000010a300 89490000      lbz     r10,0(r9)

We can also see in exceptions-64s.S that the data_access_slb block
doesn't set IDSISR=1, which means it doesn't load DSISR into pt_regs. So
the value we're using to determine if the fault is a read/write is some
stale value in pt_regs from a previous page fault.

Rework the printing logic to separate the SLB fault case out, and only
print read/write in the cases where we can determine it.

The result looks like eg:

  0:mon> d 0x5deadbeef0000000
  5deadbeef0000000
  [  721.779525][    C6] BUG: Unable to handle kernel data access at 0x5deadbeef0000000
  [  721.779697][    C6] Faulting instruction address: 0xc00000000014cbe0
  cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390]

  0:mon> d 0
  0000000000000000
  [  742.793242][    C6] BUG: Kernel NULL pointer dereference at 0x00000000
  [  742.793316][    C6] Faulting instruction address: 0xc00000000014cbe0
  cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390]

Fixes: 46ddcb3950 ("powerpc/mm: Show if a bad page fault on data is read or write.")
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20220222113449.319193-1-mpe@ellerman.id.au
2022-03-08 00:04:56 +11:00
Jakob Koschel
fa1321b11b powerpc/sysdev: fix incorrect use to determine if list is empty
'gtm' will *always* be set by list_for_each_entry().
It is incorrect to assume that the iterator value will be NULL if the
list is empty.

Instead of checking the pointer it should be checked if
the list is empty.

Fixes: 83ff9dcf37 ("powerpc/sysdev: implement FSL GTM support")
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220228142434.576226-1-jakobkoschel@gmail.com
2022-03-08 00:04:56 +11:00
Haren Myneni
37e6764895 powerpc/pseries/vas: Add VAS migration handler
Since the VAS windows belong to the VAS hardware resource, the
hypervisor expects the partition to close them on source partition
and reopen them after the partition migrated on the destination
machine.

This handler is called before pseries_suspend() to close these
windows and again invoked after migration. All active windows
for both default and QoS types will be closed and mark them
inactive and reopened after migration with this handler.
During the migration, the user space receives paste instruction
failure if it issues copy/paste on these inactive windows.

The current migration implementation does not freeze the user
space and applications can continue to open VAS windows while
migration is in progress. So when the migration_in_progress flag
is set, VAS open window API returns -EBUSY.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05e45ff4f8babd2490ccb7ae923884f4aa21a7e5.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
716d7a2e37 powerpc/pseries/vas: Modify reconfig open/close functions for migration
VAS is a hardware engine stays on the chip. So when the partition
migrates, all VAS windows on the source system have to be closed
and reopen them on the destination after migration.

The kernel has to consider both DLPAR CPU and migration events to
take action on VAS windows. So using VAS_WIN_NO_CRED_CLOSE and
VAS_WIN_MIGRATE_CLOSE status bits and windows will be reopened
after migration only after both status bits are cleared.

This patch make changes to the current reconfig_open/close_windows
functions to support migration:
- Set VAS_WIN_MIGRATE_CLOSE to the window status when closes and
  reopen windows with the same status during resume.
- Continue to close all windows even if deallocate HCALL failed
  (should not happen) since no way to stop migration with the
  current LPM implementation.
- If the DLPAR CPU event happens while migration is in progress,
  set VAS_WIN_NO_CRED_CLOSE to the window status. Close window
  happens with the first event (migration or DLPAR) and Reopen
  window happens only with the last event (migration or DLPAR).

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0aad580387cb58379496b4cbbd7c5596e9ea70be.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
278fe1cc22 powerpc/pseries/vas: Define global hv_cop_caps struct
The coprocessor capabilities struct is used to get default and
QoS capabilities from the hypervisor during init, DLPAR event and
migration. So instead of allocating this struct for each event,
define global struct and reuse it which allows the migration code
to avoid adding an error path.

Also disable copy/paste feature flag if any capabilities HCALL
is failed.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/57da6a270fcb9308cd57be7c88037029343080f7.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
45f06eac30 powerpc/pseries/vas: Add 'update_total_credits' entry for QoS capabilities
pseries supports two types of credits - Default (uses normal priority
FIFO) and Qality of service (QoS uses high priority FIFO). The user
decides the number of QoS credits and sets this value with HMC
interface. The total credits for QoS capabilities can be changed
dynamically with HMC interface which invokes drmgr to communicate
to the kernel.

This patch creats 'update_total_credits' entry for QoS capabilities
so that drmgr command can write the new target QoS credits in sysfs.
Instead of using this value, the kernel gets the new QoS capabilities
from the hypervisor whenever update_total_credits is updated to make
sure sync with the QoS target credits in the hypervisor.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b01ef31a0f964686d00243e7de7f09c73c07e69e.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
b903737bc5 powerpc/pseries/vas: sysfs interface to export capabilities
The hypervisor provides the available VAS GZIP capabilities such
as default or QoS window type and the target available credits in
each type. This patch creates sysfs entries and exports the target,
used and the available credits for each feature.

This interface can be used by the user space to determine the credits
usage or to set the target credits in the case of QoS type (for DLPAR).

/sys/devices/vas/vas0/gzip/default_capabilities (default GZIP capabilities)
	nr_total_credits /* Total credits available. Can be
			 /* changed with DLPAR operation */
	nr_used_credits  /* Used credits */

/sys/devices/vas/vas0/gzip/qos_capabilities (QoS GZIP capabilities)
	nr_total_credits
	nr_used_credits

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/702d8b626ebfac2b52f4995eebeafe1c9a6fcb75.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
c656cfe571 powerpc/pseries/vas: Reopen windows with DLPAR core add
VAS windows can be closed in the hypervisor due to lost credits
when the core is removed and the kernel gets fault for NX
requests on these inactive windows. If the NX requests are
issued on these inactive windows, OS gets page faults and the
paste failure will be returned to the user space. If the lost
credits are available later with core add, reopen these windows
and set them active. Later when the OS sees page faults on these
active windows, it creates mapping on the new paste address.
Then the user space can continue to use these windows and send
HW compression requests to NX successfully.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9f360e21355e6826142c81146acfa9b60bc7ecc.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
8ef7b9e176 powerpc/pseries/vas: Close windows with DLPAR core removal
The hypervisor assigns vas credits (windows) for each LPAR based
on the number of cores configured in that system. The OS is
expected to release credits when cores are removed, and may
allocate more when cores are added. So there is a possibility of
using excessive credits (windows) in the LPAR and the hypervisor
expects the system to close the excessive windows so that NX load
can be equally distributed across all LPARs in the system.

When the OS closes the excessive windows in the hypervisor,
it sets the window status inactive and invalidates window
virtual address mapping. The user space receives paste instruction
failure if any NX requests are issued on the inactive window.
Then the user space can use with the available open windows or
retry NX requests until this window active again.

This patch also adds the notifier for core removal/add to close
windows in the hypervisor if the system lost credits (core
removal) and reopen windows in the hypervisor when the previously
lost credits are available.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/108928f9c00a48cc6a722315d482d07cf66acf5a.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
6a8d4ca891 powerpc/vas: Map paste address only if window is active
The paste address mapping is done with mmap() after the window is
opened with ioctl. The partition has to close VAS windows in the
hypervisor if it lost credits due to DLPAR core removal. But the
kernel marks these windows inactive until the previously lost
credits are available later. If the window is inactive due to
DLPAR after this mmap(), the paste instruction returns failure
until the the OS reopens this window again.

Before the user space issuing mmap(), there is a possibility of
happening DLPAR core removal event which causes the corresponding
window inactive. So if the window is not active, return mmap()
failure with -EACCES and expects the user space reissue mmap()
when the window is active or open a new window when the credit
is available.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbb203c26b324534e25658cb1dbbcb5160a2f93a.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
b5c63d90cc powerpc/vas: Return paste instruction failure if no active window
The VAS window may not be active if the system looses credits and
the NX generates page fault when it receives request on unmap
paste address.

The kernel handles the fault by remap new paste address if the
window is active again, Otherwise return the paste instruction
failure if the executed instruction that caused the fault was
a paste.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/492b9aefd593061d51dda67ee4d2fc449c000dce.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
1fe3a33ba0 powerpc/vas: Add paste address mmap fault handler
The user space opens VAS windows and issues NX requests by pasting
CRB on the corresponding paste address mmap. When the system lost
credits due to core removal, the kernel has to close the window in
the hypervisor and make the window inactive by unmapping this paste
address. Also the OS has to handle NX request page faults if the user
space issue NX requests.

This handler maps the new paste address with the same VMA when the
window is active again (due to core add with DLPAR). Otherwise
returns paste failure.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3956e1c1fdfde69127055ff1c0256c7d71104030.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
976410cd2c powerpc/pseries/vas: Save PID in pseries_vas_window struct
The kernel sets the VAS window with PID when it is opened in
the hypervisor. During DLPAR operation, windows can be closed and
reopened in the hypervisor when the credit is available. So saves
this PID in pseries_vas_window struct when the window is opened
initially and reuse it later during DLPAR operation.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a57cbe6d292fe49ad55a0b49c5679d6a24d8fe73.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
40562fe4fa powerpc/pseries/vas: Use common names in VAS capability structure
nr_total/nr_used_credits provides credits usage to user space
via sysfs and the same interface can be used on PowerNV in
future. Changed with proper naming so that applicable on both
pseries and PowerNV.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f4313e9f198ee4f8d4fa4d015d8d1873e17851e6.camel@linux.ibm.com
2022-03-08 00:04:54 +11:00
Michael Ellerman
9ef78b6293 Merge branch 'topic/ppc-kvm' into next
Merge our topic branch containing powerpc KVM related commits.

Alexey Kardashevskiy (1):
      KVM: PPC: Merge powerpc's debugfs entry content into generic entry

Fabiano Rosas (9):
      KVM: PPC: Book3S HV: Stop returning internal values to userspace
      KVM: PPC: Fix vmx/vsx mixup in mmio emulation
      KVM: PPC: mmio: Reject instructions that access more than mmio.data size
      KVM: PPC: mmio: Return to guest after emulation failure
      KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure
      KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
      KVM: PPC: Book3S HV: Delay setting of kvm ops
      KVM: PPC: Book3S HV: Free allocated memory if module init fails
      KVM: PPC: Decrement module refcount if init_vm fails

Jason Wang (1):
      powerpc/kvm: no need to initialise statics to 0

Nour-eddine Taleb (1):
      KVM: PPC: Book3S HV: remove unnecessary casts
2022-03-08 00:02:29 +11:00
Michael Ellerman
4bc06c59f6 Merge branch 'topic/func-desc-lkdtm' into next
Merge a topic branch we are maintaining with some cross-architecture
changes to function descriptor handling and their use in LKDTM.

From Christophe's cover letter:

Fix LKDTM for PPC64/IA64/PARISC

PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work on those
three architectures because LKDTM messes up function descriptors with
functions.

This series does some cleanup in the three architectures and refactors
function descriptors so that it can then easily use it in a generic way
in LKDTM.
2022-03-07 23:34:32 +11:00
Nicholas Piggin
c7fa848ff0 KVM: PPC: Book3S HV P9: Fix "lost kick" race
When new work is created that requires attention from the hypervisor
(e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
pull the target vcpu out of the guest if it may have been running.

Therefore the work creation side looks like this:

  vcpu->arch.doorbell_request = 1;
  kvmppc_fast_vcpu_kick_hv(vcpu) {
    smp_mb();
    cpu = vcpu->cpu;
    if (cpu != -1)
        send_ipi(cpu);
  }

And the guest entry side *should* look like this:

  vcpu->cpu = smp_processor_id();
  smp_mb();
  if (vcpu->arch.doorbell_request) {
    // do something (abort entry or inject doorbell etc)
  }

But currently the store and load are flipped, so it is possible for the
entry to see no doorbell pending, and the doorbell creation misses the
store to set cpu, resulting lost work (or at least delayed until the
next guest exit).

Fix this by reordering the entry operations and adding a smp_mb
between them. The P8 path appears to have a similar race which is
commented but not addressed yet.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-2-npiggin@gmail.com
2022-03-07 13:14:30 +11:00
Michael Ellerman
48015b632f powerpc: Fix STACKTRACE=n build
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit
1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.

  arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
  arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
    171 |  nmi_cpu_backtrace(regs);
        |  ^~~~~~~~~~~~~~~~~
  arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
  arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
    226 |  nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
        |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().

The code in question doesn't actually depend on STACKTRACE=y, that was
just added because arch_trigger_cpumask_backtrace() lived in
stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.

Fixes: 1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
[mpe: Cherry pick of 5a72345e6a from next into fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au
2022-03-07 10:26:20 +11:00
Murilo Opsfelder Araujo
58dbe9b373 powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not
set:

    arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’:
    arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’?
      811 |                 if (mmu_linear_psize == MMU_PAGE_4K)
          |                     ^~~~~~~~~~~~~~~~
          |                     mmu_virtual_psize
    arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in

Move the declaration of mmu_linear_psize outside of
CONFIG_PPC_64S_HASH_MMU ifdef.

After the above is fixed, it fails later with the following error:

    ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe':
    file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range'

Fix that, too, by conditioning add_htab_mem_range() symbol to
CONFIG_PPC_64S_HASH_MMU.

Fixes: 387e220a2e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567
Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
2022-03-05 20:42:21 +11:00
Nour-eddine Taleb
e40b38a41c KVM: PPC: Book3S HV: remove unnecessary casts
Remove unnecessary casts, from "void *" to "struct kvmppc_xics *"

Signed-off-by: Nour-eddine Taleb <kernel.noureddine@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303143416.201851-1-kernel.noureddine@gmail.com
2022-03-04 12:58:46 +11:00
Christoph Hellwig
27674ef6c7 mm: remove the extra ZONE_DEVICE struct page refcount
ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE pages.

Note that this excludes the special idle page wakeup for fsdax pages,
which still happens at refcount 1.  This is a separate issue and will
be sorted out later.  Given that only fsdax pages require the
notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
symbol can go away and be replaced with a FS_DAX check for this hook
in the put_page fastpath.

Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>.

Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Christoph Hellwig
dc90f0846d mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.

Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Anders Roxell
8219d31eff powerpc/lib/sstep: Fix build errors with newer binutils
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:10576: Error: unrecognized opcode: `stbcx.'
  {standard input}:10680: Error: unrecognized opcode: `lharx'
  {standard input}:10694: Error: unrecognized opcode: `lbarx'

Rework to add assembler directives [1] around the instruction.  The
problem with this might be that we can trick a power6 into
single-stepping through an stbcx. for instance, and it will execute that
in kernel mode.

[1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Fixes: 350779a29f ("powerpc: Handle most loads and stores in instruction emulation code")
Cc: stable@vger.kernel.org # v4.14+
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-3-anders.roxell@linaro.org
2022-03-01 23:51:09 +11:00
Anders Roxell
8667d0d64d powerpc: Fix build errors with newer binutils
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:1190: Error: unrecognized opcode: `stbcix'
  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
  {standard input}:1453: Error: unrecognized opcode: `stbcix'
  {standard input}:1460: Error: unrecognized opcode: `stwcix'
  {standard input}:1596: Error: unrecognized opcode: `stbcix'
  ...

Rework to add assembler directives [1] around the instruction. Going
through them one by one shows that the changes should be safe.  Like
__get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
which according to the name is specific to power9.  And __raw_rm_read*()
are only called in things that are powernv or book3s_hv specific.

[1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Cc: stable@vger.kernel.org
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Make commit subject more descriptive]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-2-anders.roxell@linaro.org
2022-03-01 23:51:08 +11:00
Anders Roxell
a633cb1edd powerpc/lib/sstep: Fix 'sthcx' instruction
Looks like there been a copy paste mistake when added the instruction
'stbcx' twice and one was probably meant to be 'sthcx'. Changing to
'sthcx' from 'stbcx'.

Fixes: 350779a29f ("powerpc: Handle most loads and stores in instruction emulation code")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-1-anders.roxell@linaro.org
2022-03-01 23:51:08 +11:00
Michael Ellerman
2863dd2db2 powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
When CONFIG_GENERIC_CPU=y (true for all our defconfigs) we pass
-mcpu=powerpc64 to the compiler, even when we're building a 32-bit
kernel.

This happens because we have an ifdef CONFIG_PPC_BOOK3S_64/else block in
the Makefile that was written before 32-bit supported GENERIC_CPU. Prior
to that the else block only applied to 64-bit Book3E.

The GCC man page says -mcpu=powerpc64 "[specifies] a pure ... 64-bit big
endian PowerPC ... architecture machine [type], with an appropriate,
generic processor model assumed for scheduling purposes."

It's unclear how that interacts with -m32, which we are also passing,
although obviously -m32 is taking precedence in some sense, as the
32-bit kernel only contains 32-bit instructions.

This was noticed by inspection, not via any bug reports, but it does
affect code generation. Comparing before/after code generation, there
are some changes to instruction scheduling, and the after case (with
-mcpu=powerpc64 removed) the compiler seems more keen to use r8.

Fix it by making the else case only apply to Book3E 64, which excludes
32-bit.

Fixes: 0e00a8c9fd ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220215112858.304779-1-mpe@ellerman.id.au
2022-03-01 23:51:04 +11:00
Daniel Henrique Barboza
749ed4a206 powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
Executing node_set_online() when nid = NUMA_NO_NODE results in an
undefined behavior. node_set_online() will call node_set_state(), into
__node_set(), into set_bit(), and since NUMA_NO_NODE is -1 we'll end up
doing a negative shift operation inside
arch/powerpc/include/asm/bitops.h. This potential UB was detected
running a kernel with CONFIG_UBSAN.

The behavior was introduced by commit 10f78fd0da ("powerpc/numa: Fix a
regression on memoryless node 0"), where the check for nid > 0 was
removed to fix a problem that was happening with nid = 0, but the result
is that now we're trying to online NUMA_NO_NODE nids as well.

Checking for nid >= 0 will allow node 0 to be onlined while avoiding
this UB with NUMA_NO_NODE.

Fixes: 10f78fd0da ("powerpc/numa: Fix a regression on memoryless node 0")
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224182312.1012527-1-danielhb413@gmail.com
2022-03-01 23:41:01 +11:00
Christophe Leroy
973e2e6462 powerpc/interrupt: Remove struct interrupt_state
Since commit ceff77efa4 ("powerpc/64e/interrupt: Use new interrupt
context tracking scheme") struct interrupt_state has been empty and
unused.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1d862ce3eab3da6ca7ac47d4a78a18f154462511.1645806970.git.christophe.leroy@csgroup.eu
2022-03-01 23:41:00 +11:00
Hari Bathini
607451ce0a powerpc/fadump: register for fadump as early as possible
Crash recovery (fadump) is setup in the userspace by some service. This
service rebuilds initrd with dump capture capability, if it is not
already dump capture capable before proceeding to register for firmware
assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the
kernel with crash recovery support does not have to wait for userspace
configuration. So, register for fadump while setting it up itself. This
can at worst lead to a scenario, where /proc/vmcore is ready afer crash
but the initrd does not know how/where to offload it, which is always
better than not having a /proc/vmcore at all due to incomplete
configuration in the userspace at the time of crash.

Commit 0823c68b05 ("powerpc/fadump: re-register firmware-assisted dump
if already registered") ensures this change does not break userspace.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[mpe: Reword comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220201105305.155511-1-hbathini@linux.ibm.com
2022-03-01 23:41:00 +11:00
Kees Cook
2792d84e6d usercopy: Check valid lifetime via stack depth
One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
an object that is about to be copied to/from userspace is overlapping
the stack at all. If it is, it performs a number of inexpensive
bounds checks. One of the finer-grained checks is whether an object
crosses stack frames within the stack region. Doing this on x86 with
CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
heavy, and was left out (a while ago), leaving the courser whole-stack
check.

The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
try to exercise these cross-frame cases to validate the defense is
working. They have been failing ever since ORC was added (which was
expected). While Muhammad was investigating various LKDTM failures[1],
he asked me for additional details on them, and I realized that when
exact stack frame boundary checking is not available (i.e. everything
except x86 with FRAME_POINTER), it could check if a stack object is at
least "current depth valid", in the sense that any object within the
stack region but not between start-of-stack and current_stack_pointer
should be considered unavailable (i.e. its lifetime is from a call no
longer present on the stack).

Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
have actually implemented the common global register alias.

Additionally report usercopy bounds checking failures with an offset
from current_stack_pointer, which may assist with diagnosing failures.

The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
(once slightly adjusted in a separate patch) pass again with this fixed.

[1] https://github.com/kernelci/kernelci-project/issues/84

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
v4: - improve commit log (akpm)
2022-02-25 18:20:11 -08:00
Arnd Bergmann
dd865f090f
Merge branch 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic into asm-generic
Christoph Hellwig and a few others spent a huge effort on removing
set_fs() from most of the important architectures, but about half the
other architectures were never completed even though most of them don't
actually use set_fs() at all.

I did a patch for microblaze at some point, which turned out to be fairly
generic, and now ported it to most other architectures, using new generic
implementations of access_ok() and __{get,put}_kernel_nocheck().

Three architectures (sparc64, ia64, and sh) needed some extra work,
which I also completed.

* 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  uaccess: fix integer overflow on access_ok()
2022-02-25 11:16:58 +01:00
Arnd Bergmann
12700c17fc uaccess: generalize access_ok()
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.

Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.

For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.

Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.

Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
23fc539e81 uaccess: fix type mismatch warnings from access_ok()
On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
34737e2698 uaccess: add generic __{get,put}_kernel_nofault
Nine architectures are still missing __{get,put}_kernel_nofault:
alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa.

Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Jakub Kicinski
aaa25a2fa7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/mptcp/mptcp_join.sh
  34aa6e3bcc ("selftests: mptcp: add ip mptcp wrappers")

  857898eb4b ("selftests: mptcp: add missing join check")
  6ef84b1517 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/

drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
  fb7e76ea3f ("net/mlx5e: TC, Skip redundant ct clear actions")
  c63741b426 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")

  09bf979232 ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
  84ba8062e3 ("net/mlx5e: Test CT and SAMPLE on flow attr")
  efe6f961cd ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
  3b49a7edec ("net/mlx5e: TC, Reject rules with multiple CT actions")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24 17:54:25 -08:00
Guo Zhengkui
8a0edc72be powerpc/module_64: fix array_size.cocci warning
Fix following coccicheck warning:
./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE.

ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that arr
is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and more
standard.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220223075426.20939-1-guozhengkui@vivo.com
2022-02-24 17:53:55 +11:00
Nicholas Piggin
8b91cee5ea powerpc/64s/hash: Make hash faults work in NMI context
Hash faults are not resoved in NMI context, instead causing the access
to fail. This is done because perf interrupts can get backtraces
including walking the user stack, and taking a hash fault on those could
deadlock on the HPTE lock if the perf interrupt hits while the same HPTE
lock is being held by the hash fault code. The user-access for the stack
walking will notice the access failed and deal with that in the perf
code.

The reason to allow perf interrupts in is to better profile hash faults.

The problem with this is any hash fault on a kernel access that happens
in NMI context will crash, because kernel accesses must not fail.

Hard lockups, system reset, machine checks that access vmalloc space
including modules and including stack backtracing and symbol lookup in
modules, per-cpu data, etc could all run into this problem.

Fix this by disallowing perf interrupts in the hash fault code (the
direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs
to extend to the preload case). This simplifies the tricky logic in hash
faults and perf, at the cost of reduced profiling of hash faults.

perf can still latch addresses when interrupts are disabled, it just
won't get the stack trace at that point, so it would still find hot
spots, just sometimes with confusing stack chains.

An alternative could be to allow perf interrupts here but always do the
slowpath stack walk if we are in nmi context, but that slows down all
perf interrupt stack walking on hash though and it does not remove as
much tricky code.

Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com
2022-02-24 12:46:54 +11:00
Christophe Leroy
406a8c1d8f powerpc: Remove remaining stab codes
Following commit 1231816373 ("powerpc/32: Remove remaining .stabs
annotations"), stabs code are not used anymore.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8b33342d7454f6ca4f368f5206896558dfa06f4.1645538722.git.christophe.leroy@csgroup.eu
2022-02-23 14:49:27 +11:00
Pali Rohár
904b10fb18 PCI: Add defines for normal and subtractive PCI bridges
Add these PCI class codes to pci_ids.h:

  PCI_CLASS_BRIDGE_PCI_NORMAL
  PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE

Use these defines in all kernel code for describing PCI class codes for
normal and subtractive PCI bridges.

[bhelgaas: similar change in pci-mvebu.c]
Link: https://lore.kernel.org/r/20220214114109.26809-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-02-17 15:29:35 -06:00
Masahiro Yamada
4a3233c1a6
shmbuf.h: add asm/shmbuf.h to UAPI compile-test coverage
asm/shmbuf.h is currently excluded from the UAPI compile-test because of
the errors like follows:

    HDRTEST usr/include/asm/shmbuf.h
  In file included from ./usr/include/asm/shmbuf.h:6,
                   from <command-line>:
  ./usr/include/asm-generic/shmbuf.h:26:33: error: field ‘shm_perm’ has incomplete type
     26 |         struct ipc64_perm       shm_perm;       /* operation perms */
        |                                 ^~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:27:9: error: unknown type name ‘size_t’
     27 |         size_t                  shm_segsz;      /* size of segment (bytes) */
        |         ^~~~~~
  ./usr/include/asm-generic/shmbuf.h:40:9: error: unknown type name ‘__kernel_pid_t’
     40 |         __kernel_pid_t          shm_cpid;       /* pid of creator */
        |         ^~~~~~~~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:41:9: error: unknown type name ‘__kernel_pid_t’
     41 |         __kernel_pid_t          shm_lpid;       /* pid of last operator */
        |         ^~~~~~~~~~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t and by
including proper headers.

Then, remove the no-header-test entry from user/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:37 +01:00
Masahiro Yamada
72113d0a7d
signal.h: add linux/signal.h and asm/signal.h to UAPI compile-test coverage
linux/signal.h and asm/signal.h are currently excluded from the UAPI
compile-test because of the errors like follows:

    HDRTEST usr/include/asm/signal.h
  In file included from <command-line>:
  ./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’
    103 |         size_t ss_size;
        |         ^~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t.

Then, remove the no-header-test entries from user/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:36 +01:00
Christophe Leroy
e1478d8eaf asm-generic: Refactor dereference_[kernel]_function_descriptor()
dereference_function_descriptor() and
dereference_kernel_function_descriptor() are identical on the
three architectures implementing them.

Make them common and put them out-of-line in kernel/extable.c
which is one of the users and has similar type of functions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/449db09b2eba57f4ab05f80102a67d8675bc8bcd.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
0dc690e4ef asm-generic: Define 'func_desc_t' to commonly describe function descriptors
We have three architectures using function descriptors, each with its
own type and name.

Add a common typedef that can be used in generic code.

Also add a stub typedef for architecture without function descriptors,
to avoid a forest of #ifdefs.

It replaces the similar 'func_desc_t' previously defined in
arch/powerpc/kernel/module_64.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f1f91b142b3c1082bdc1586ce71c9bac1e75213c.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
a257cacc38 asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS
Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option
named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of
'dereference_function_descriptor' macro to know whether an
arch has function descriptors.

To limit churn in one of the following patches, use
an #ifdef/#else construct with empty first part
instead of an #ifndef in asm-generic/sections.h

On powerpc, make sure the config option matches the ABI used
by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2
when calling 'sparse' so that sparse sees the same piece of
code as GCC.

And include a helper to check whether an arch has function
descriptors or not : have_function_descriptors()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
2fd986377d powerpc: Prepare func_desc_t for refactorisation
In preparation of making func_desc_t generic, change the ELFv2
version to a struct containing 'addr' element.

This allows using single helpers common to ELFv1 and ELFv2 and
reduces the amount of #ifdef's

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5c36105e08b27b98450535bff48d71b690c19739.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
0a9c5ae279 powerpc: Remove 'struct ppc64_opd_entry'
'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h

It was initially in module_64.c and commit 2d291e9027 ("Fix compile
failure with non modular builds") moved it into asm/elf.h

But it was by mistake added outside of __KERNEL__ section,
therefore commit c3617f7203 ("UAPI: (Scripted) Disintegrate
arch/powerpc/include/asm") moved it to uapi/asm/elf.h

Now that it is not used anymore by the kernel, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c309ccee65ec2e3802df7a7fe761d0a298584809.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
d3e32b997a powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry'
'struct ppc64_opd_entry' is somehow redundant with 'struct func_desc',
the later is more correct/complete as it includes the third
field which is unused.

So use 'struct func_desc' instead of 'struct ppc64_opd_entry'

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/34e76bac6cbe95a63ecd37df69fb7feb93b0ea7c.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Christophe Leroy
5b23cb8cc6 powerpc: Move and rename func_descr_t
There are three architectures with function descriptors, try to
have common names for the address they contain in order to
refactor some functions into generic functions later.

powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'

Vote for 'addr' and update 'func_descr_t' accordingly.

Move it in asm/elf.h to have it at the same place on all
three architectures, remove the typedef which hides its real
type, and change it to a smoother name 'struct func_desc'.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/529b2ba1d001e8f628ef0d30e8044c9b3d0a4921.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Christophe Leroy
81df21de8f powerpc: Fix 'sparse' checking on PPC64le
'sparse' is architecture agnostic and knows nothing about ELF ABI
version.

Just like it gets arch and powerpc type and endian from Makefile,
it also need to get _CALL_ELF from there, otherwise it won't set
PPC64_ELF_ABI_v2 macro for PPC64le and won't check the correct code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ac1312f2451aa558bb2a8806b4d0aa2020f0c176.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:10 +11:00
Vaibhav Jain
bbbca72352 powerpc/papr_scm: Implement initial support for injecting smart errors
Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult as simulating NVDIMM health related events need a hacked up
qemu version.

To solve this problem this patch proposes simulating certain set of
NVDIMM health related events in papr_scm. Specifically 'fatal' health
state and 'dirty' shutdown state. These error can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
corresponding ndctl patches following command flow is expected:

$ sudo ndctl list -DH -d nmem0
...
      "health_state":"ok",
      "shutdown_state":"clean",
...
 # inject unsafe shutdown and fatal health error
$ sudo ndctl inject-smart nmem0 -Uf
...
      "health_state":"fatal",
      "shutdown_state":"dirty",
...
 # uninject all errors
$ sudo ndctl inject-smart nmem0 -N
...
      "health_state":"ok",
      "shutdown_state":"clean",
...

The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv which is then bitwise ANDed to the health bitmap fetched from the
hypervisor. The value for 'health_bitmap_inject_mask' is accessible from sysfs
at nmemX/papr/health_bitmap_inject.

A new PDSM named 'SMART_INJECT' is proposed that accepts newly
introduced 'struct nd_papr_pdsm_smart_inject' as payload thats
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When the processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair or 'inject_mask' and 'clear_mask' bitmaps from the payload
and bit-blt it to the 'health_bitmap_inject_mask'. This ensures the after being
fetched from the hypervisor, the health_bitmap reflects requested smart-error
states.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com
2022-02-16 23:10:47 +11:00
Christophe Leroy
76b372814b powerpc/ftrace: Style cleanup in ftrace_mprofile.S
Add some line breaks to better match the file's style, add
some space after comma and fix a couple of misplaced blanks.

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/973506292d0c7b05c06530c8e11803ce38e5eda2.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
fc75f87337 powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set
When FL_SAVE_REGS is not set we get here via ftrace_caller()
which doesn't save all registers.

ftrace_caller() explicitely clears regs.msr, so we can rely
on it to know where we come from. We don't expect MSR register
to be 0 at all when involving ftrace.

Fixes: 40b035efe2 ("powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2f9a7e898c93cc7438ef5ccd47cb9c3a9c5b53ef.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
df45a55788 powerpc/ftrace: Add recursion protection in prepare_ftrace_return()
The function_graph_enter() does not provide any recursion protection.

Add a protection in prepare_ftrace_return() in case
function_graph_enter() calls something that gets
function graph traced.

Fixes: 830213786c ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/74edf2ff0a60e66b0d9225a137100a86a0557032.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Christophe Leroy
34d8dac807 powerpc/ftrace: Also save r1 in ftrace_caller()
Also save r1 in ftrace_caller()

r1 is needed during unwinding when the function_graph tracer
is active.

Fixes: 830213786c ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ff535e86d3a69376a6d89168511d4e403835f18b.1644949750.git.christophe.leroy@csgroup.eu
2022-02-16 23:09:47 +11:00
Anders Roxell
fe663df782 powerpc/lib/sstep: fix 'ptesync' build error
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:2088: Error: unrecognized opcode: `ptesync'
  make[3]: *** [/builds/linux/scripts/Makefile.build:287: arch/powerpc/lib/sstep.o] Error 1

Add the 'ifdef CONFIG_PPC64' around the 'ptesync' in function
'emulate_update_regs()' to like it is in 'analyse_instr()'. Since it looks like
it got dropped inadvertently by commit 3cdfcbfd32 ("powerpc: Change
analyse_instr so it doesn't modify *regs").

A key detail is that analyse_instr() will never recognise lwsync or
ptesync on 32-bit (because of the existing ifdef), and as a result
emulate_update_regs() should never be called with an op specifying
either of those on 32-bit. So removing them from emulate_update_regs()
should be a nop in terms of runtime behaviour.

Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Cc: stable@vger.kernel.org # v4.14+
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[mpe: Add last paragraph of change log mentioning analyse_instr() details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220211005113.1361436-1-anders.roxell@linaro.org
2022-02-15 22:31:35 +11:00
Paul Menzel
cb7356986d powerpc/boot: Add otheros-too-big.bld to .gitignore
Currently, `git status` lists the file as untracked by git, so tell git
to ignore it.

Fixes: aa3bc365ee ("powerpc/ps3: Add check for otheros image size")
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220214065543.198992-1-pmenzel@molgen.mpg.de
2022-02-15 22:29:52 +11:00
Christophe Leroy
38a1756861 powerpc: Don't allow the use of EMIT_BUG_ENTRY with BUGFLAG_WARNING
Warnings in assembly must use EMIT_WARN_ENTRY in order to generate
the necessary entry in exception table.

Check in EMIT_BUG_ENTRY that flags don't include BUGFLAG_WARNING.

This change avoids problems like the one fixed by
commit fd1eaaaaa6 ("powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug
warnings").

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ddcb422102a37eb45f57694c7ef0ec6187964dff.1644742951.git.christophe.leroy@csgroup.eu
2022-02-14 13:06:43 +11:00
Michael Ellerman
5a72345e6a powerpc: Fix STACKTRACE=n build
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit
1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.

  arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
  arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
    171 |  nmi_cpu_backtrace(regs);
        |  ^~~~~~~~~~~~~~~~~
  arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
  arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
    226 |  nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
        |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().

The code in question doesn't actually depend on STACKTRACE=y, that was
just added because arch_trigger_cpumask_backtrace() lived in
stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.

Fixes: 1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au
2022-02-12 22:47:44 +11:00
Aneesh Kumar K.V
2354ad252b powerpc/mm: Update default hugetlb size early
commit: d9c2340052 ("Do not depend on MAX_ORDER when grouping pages by mobility")
introduced pageblock_order which will be used to group pages better.
The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
should be set before we call set_pageblock_order.

set_pageblock_order happens early in the boot and default hugetlb page size
should be initialized before that to compute the right pageblock_order value.

Currently, default hugetlbe page size is set via arch_initcalls which happens
late in the boot as shown via the below callstack:

[c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
[c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
[c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
[c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
[c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64

and the pageblock_order initialization is done early during the boot.

[c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
[c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
[c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
[c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
[c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
[c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98

delaying default hugetlb page size initialization implies the kernel will
initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
value for mobility grouping. IIUC we always had this issue. But it was not
a problem for hash translation mode because (MAX_ORDER - 1) is the same as
HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
5 instead of 8.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220211065215.101767-1-aneesh.kumar@linux.ibm.com
2022-02-12 22:47:44 +11:00
Nathan Lynch
92e6dc257b powerpc/pseries: make pseries_devicetree_update() static
pseries_devicetree_update() has only one call site, in the same file in
which it is defined. Make it static.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220207221247.354454-1-nathanl@linux.ibm.com
2022-02-12 22:47:44 +11:00
Christophe Leroy
692b21d780 powerpc/vdso: Move cvdso_call macro into gettimeofday.S
Now that gettimeofday.S is unique, move cvdso_call macro
into that file which is the only user.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/72720359d4c58e3a3b96dd74952741225faac3de.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:44 +11:00
Christophe Leroy
9b97bea900 powerpc/vdso: Remove cvdso_call_time macro
cvdso_call_time macro is very similar to cvdso_call macro.

Add a call_time argument to cvdso_call which is 0 by default
and set to 1 when using cvdso_call to call __c_kernel_time().

Return returned value as is with CR[SO] cleared when it is used
for time().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/837a260ad86fc1ce297a562c2117fd69be5f7b5c.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
fd1feade75 powerpc/vdso: Merge vdso64 and vdso32 into a single directory
merge vdso64 into vdso32 and rename it vdso.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4dbe05cc130f6a0858d09ac72e436c373cb08b70.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
d88378d8d2 powerpc/vdso: Rework VDSO32 makefile to add a prefix to object files
In order to merge vdso32 and vdso64 build in following patch, rework
Makefile is order to add -32 suffix to VDSO32 object files.

Also change sigtramp.S to sigtramp32.S as VDSO64 sigtramp.S is too
different to be squashed into VDSO32 sigtramp.S at the first place.

gen_vdso_offsets.sh also becomes gen_vdso32_offsets.sh

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0c421b704a57b228e75a891512568339c53667ad.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
f061fb03ee powerpc/vdso: augment VDSO32 functions to support 64 bits build
VDSO64 cacheflush.S datapage.S gettimeofday.S and vgettimeofday.c
are very similar to their VDSO32 counterpart.

VDSO32 counterpart is already more complete than the VDSO64 version
as it supports both PPC32 vdso and 32 bits VDSO for PPC64.

Use compat macros wherever necessary in PPC32 files
so that they can also be used to build VDSO64.

vdso64/note.S is already a link to vdso32/note.S so
no change is required.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2cbb8f046b7efc251053521dc39b752795e26b7.1642782130.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
6836f09903 powerpc/lib/sstep: use truncate_if_32bit()
Use truncate_if_32bit() when possible instead of open coding.

truncate_if_32bit() returns an unsigned long, so don't use it when
a signed value is expected.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7e1c07123f13156d4a27991a2e2694fb584bc068.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
7c3bba9199 powerpc/lib/sstep: Remove unneeded #ifdef __powerpc64__
MSR_64BIT is always defined, no need to hide code using MSR_64BIT
inside an #ifdef __powerpc64__

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ee61b693bc7e046eed1abb7a34909eb4878a9442.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:43 +11:00
Christophe Leroy
67484e0de9 powerpc/lib/sstep: Use l1_dcache_bytes() instead of opencoding
Don't opencode dcache size retrieval based on whether that's ppc32 or ppc64.

Use l1_dcache_bytes()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6c608fd4795e2d8ea1a0a449405a0087f76d8bb3.1642752375.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
9d44d1bd93 powerpc: Use the newly added is_tsk_32bit_task() macro
Two places deserve using the macro is_tsk_32bit_task() added by
commit 252745240b ("powerpc/audit: Fix syscall_get_arch()")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7304a889dbe885aefad8a8333673c81ee4b8f7a6.1642751874.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
0670010f3b powerpc/32s: Enable STRICT_MODULE_RWX for the 603 core
The book3s/32 MMU doesn't support per page execution protection and
doesn't support RO protection for kernel pages.

However, on the 603 which implements software loaded TLBs, execution
protection is honored by the TLB Miss handler which doesn't load
Instruction TLB for non executable pages. And RO protection is
honored by clearing the C bit for RO pages, leading to DSI.

So on the 603, STRICT_MODULE_RWX is possible without much effort.
Don't disable STRICT_MODULE_RWX on book3s/32 and print a warning
in case STRICT_MODULE_RWX has been selected and the platform has
a Hardware HASH MMU.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e6162f334167e75f1140082932e3a354b16daba.1642413973.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
a8936569a0 powerpc/bpf: Always reallocate BPF_REG_5, BPF_REG_AX and TMP_REG when possible
BPF_REG_5, BPF_REG_AX and TMP_REG are mapped on non volatile registers
because there are not enough volatile registers, but they don't need
to be preserved on function calls.

So when some volatile registers become available, those registers can
always be reallocated regardless of whether SEEN_FUNC is set or not.

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b04c246874b716911139c04bc004b3b14eed07ef.1641817763.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
f222ab83df powerpc: Add set_memory_{p/np}() and remove set_memory_attr()
set_memory_attr() was implemented by commit 4d1755b6a7 ("powerpc/mm:
implement set_memory_attr()") because the set_memory_xx() couldn't
be used at that time to modify memory "on the fly" as explained it
the commit.

But set_memory_attr() uses set_pte_at() which leads to warnings when
CONFIG_DEBUG_VM is selected, because set_pte_at() is unexpected for
updating existing page table entries.

The check could be bypassed by using __set_pte_at() instead,
as it was the case before commit c988cfd38e ("powerpc/32:
use set_memory_attr()") but since commit 9f7853d760 ("powerpc/mm:
Fix set_memory_*() against concurrent accesses") it is now possible
to use set_memory_xx() functions to update page table entries
"on the fly" because the update is now atomic.

For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT.
Add set_memory_np() and set_memory_p() for that.

Replace all uses of set_memory_attr() by the relevant set_memory_xx()
and remove set_memory_attr().

Fixes: c988cfd38e ("powerpc/32: use set_memory_attr()")
Cc: stable@vger.kernel.org
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Depends-on: 9f7853d760 ("powerpc/mm: Fix set_memory_*() against concurrent accesses")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cda2b44b55c96f9ac69fa92e68c01084ec9495c5.1640344012.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
a4c182ecf3 powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
Commit 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
included a spin_lock() to change_page_attr() in order to
safely perform the three step operations. But then
commit 9f7853d760 ("powerpc/mm: Fix set_memory_*() against
concurrent accesses") modify it to use pte_update() and do
the operation safely against concurrent access.

In the meantime, Maxime reported some spinlock recursion.

[   15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
[   15.357540]  lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
[   15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
[   15.373350] Workqueue: events do_free_init
[   15.377615] Call Trace:
[   15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
[   15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
[   15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
[   15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
[   15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
[   15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
[   15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
[   15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
[   15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
[   15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
[   15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
[   15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
[   15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
[   15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
[   15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
[   15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c

Remove the read / modify / write sequence to make the operation atomic
and remove the spin_lock() in change_page_attr().

To do the operation atomically, we can't use pte modification helpers
anymore. Because all platforms have different combination of bits, it
is not easy to use those bits directly. But all have the
_PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare
two sets to know which bits are set or cleared.

For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you
know which bit gets cleared and which bit get set when changing exec
permission.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/
Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.1640344012.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:42 +11:00
Christophe Leroy
4ee83a2cfb powerpc/ftrace: Remove ftrace_32.S
Functions in ftrace_32.S are common with PPC64.

Reuse the ones defined for PPC64 with slight modification
when required.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Squash in fixup diff from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5e837fc190504c4ef834272e70d60ae33f175d49.1640017960.git.christophe.leroy@csgroup.eu
2022-02-12 22:47:28 +11:00
Ard Biesheuvel
297565aa22 lib/xor: make xor prototypes more friendly to compiler vectorization
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.

So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-11 20:39:39 +11:00
Jakub Kicinski
1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
Song Liu
0f350231b5 bpf: Fix leftover header->pages in sparc and powerpc code.
Replace header->pages * PAGE_SIZE with new header->size.

Fixes: ed2d9e1a26 ("bpf: Use size instead of pages in bpf_binary_header")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220208220509.4180389-2-song@kernel.org
2022-02-08 14:52:05 -08:00
Christophe Leroy
41315494be powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32
PPC64 mprofile versions and PPC32 are very similar.

Modify PPC64 version so that if can be reused for PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/82a732915dc71ee766e31809350939331944006d.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
830213786c powerpc/ftrace: directly call of function graph tracer by ftrace caller
Modify function graph tracer to be handled directly by the standard
ftrace caller.

This is made possible as powerpc now supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS.

This change simplifies the call of function graph ftrace.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/04d196585ff81bde06a000bd9c633a33a5b21130.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
0c81ed5ed4 powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller
ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() have common code.

They will have even more common code after following patch.

Refactor into a single ftrace_modify_ftrace_graph_caller() function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f37785a531f1a8f201e1b3da45997a5c77e9d820.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
40b035efe2 powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS
Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5831f711a778fcd6eb51eb5898f1faae4378b35b.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
c75388a8ce powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller()
to handle LIVEPATCH the same way as frace_caller_regs().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/850817333cc76593699032e8e9a70d8c36e1af1e.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
d95bf254be powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change
ftrace_caller() stack layout to match struct pt_regs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/da9734eba504998fb914aca12131c9f6bf6120a8.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
7bdb478c1d powerpc/ftrace: Simplify PPC32's return_to_handler()
return_to_handler() was copied from PPC64. For PPC32 it
just needs to save r3 and r4, and doesn't require any nop
after the bl.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aab39b77b34fb2c4ed08ed01c547b6ed13643788.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy
7875bc9b07 powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32
PPC32 mcount() caller already saves LR on stack,
no need to save it again.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eadcfc770b4f1e35535ffb85e28e858a2c31dec4.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
a4520b2527 powerpc/ftrace: Add support for livepatch to PPC32
PPC64 needs some special logic to properly set up the TOC.
See commit 85baa09549 ("powerpc/livepatch: Add live patching support
on ppc64le") for details.

PPC32 doesn't have TOC so it doesn't need that logic, so adding
LIVEPATCH support is straight forward.

Add CONFIG_LIVEPATCH_64 and move livepatch stack logic into that item.

Livepatch sample modules all work.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/63cb094125b6a6038c65eeac2abaabbabe63addd.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
0c850965d6 powerpc/module_32: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.

R_PPC_ADDR16_LO, R_PPC_ADDR16_HI, R_PPC_ADDR16_HA and R_PPC_REL24 are
the types generated by the kpatch-build userspace tool or klp-convert
kernel tree observed applying a relocation to a post-init module.

Use patch_instruction() to patch those relocations.

Commit 8734b41b3e ("powerpc/module_64: Fix livepatching for
RO modules") did similar change in module_64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5697157cb7dba3927e19aa17c915a83bc550bb2.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
27e21e8f12 powerpc/32: Remove _ENTRY() macro
_ENTRY() is now redundant with _GLOBAL(). Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62a35f8dde2bb74c8d0d7a5430cce07a5a3a6fb6.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
1231816373 powerpc/32: Remove remaining .stabs annotations
STABS debug format has been superseded long time ago by DWARF.

Remove the few remaining .stabs annotations from old 32 bits code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/68932ec2ba6b868d35006b96e90f0890f3da3c05.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
66ada29078 powerpc/corenet: Change criteria to set MPIC_ENABLE_COREINT
Don't use ppc_md function comparison.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8ef82ee5f2713f4c36eb5d2d49b0905c7472801.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy
fae65a9ac8 powerpc/mpc86xx_hpcn: Remove obsolete statement
Comment says "Delete this in 2.6.27".

Do so now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a47bb6a69c68156bc2d555152dab5a23733856b7.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:09 +11:00
Christophe Leroy
e6d03ac156 powerpc/machdep: Move sys_ctrler_t definition into pmac_feature.h
sys_ctrler_t definitions are tied to pmac. Move it into pmac_feature.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Move to pmac_feature.h to fix some build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7dd5ead4bbca749e2da089ff6fe2b1878d6bf40e.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 21:02:20 +11:00
Christophe Leroy
9bb162fa26 powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Allthough kernel text is always mapped with BATs, we still have
inittext mapped with pages, so TLB miss handling is required
when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.

The final solution should be to set a BAT that also maps inittext
but that BAT then needs to be cleared at end of init, and it will
require more changes to be able to do it properly.

As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
deal so let's fix it simply for now to enable easy stable application.

Fixes: 035b19a15a ("powerpc/32s: Always map kernel text and rodata with BATs")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu
2022-02-07 17:18:47 +11:00
Christophe Leroy
d6a6c725a2 powerpc/machdep: Remove CONFIG_PPC_HAS_FEATURE_CALLS
Last user was removed by commit 7bbd827750 ("[PATCH] ppc64: very
basic desktop g5 sound support").

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/803779fffb4ee0801746b2173d37cea3b273f821.1630667612.git.christophe.leroy@csgroup.eu
2022-02-07 16:43:35 +11:00
Sourabh Jain
7c5ed82b80 powerpc: Set crashkernel offset to mid of RMA region
On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
2022-02-07 15:26:12 +11:00
Christophe Leroy
4291d085b0 powerpc/32s: Make pte_update() non atomic on 603 core
On 603 core, TLB miss handler don't do any change to the
page tables so pte_update() doesn't need to be atomic.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cc89d3c11fc9c742d0df3454a657a3a00be24046.1643538554.git.christophe.leroy@csgroup.eu
2022-02-03 22:57:19 +11:00
Christophe Leroy
535bda36db powerpc/nohash: Remove pte_same()
arch/powerpc/include/asm/nohash/{32/64}/pgtable.h has

	#define __HAVE_ARCH_PTE_SAME
	#define pte_same(A,B)      ((pte_val(A) ^ pte_val(B)) == 0)

include/linux/pgtable.h has

	#ifndef __HAVE_ARCH_PTE_SAME
	static inline int pte_same(pte_t pte_a, pte_t pte_b)
	{
		return pte_val(pte_a) == pte_val(pte_b);
	}
	#endif

Remove the powerpc version which is similar to the generic one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/83c97bd58a3596ef1b0ff28b1e41fd492d005520.1643616989.git.christophe.leroy@csgroup.eu
2022-02-03 22:57:00 +11:00
Christophe Leroy
4634bf4455 powerpc/603: Clear C bit when PTE is read only
On book3s/32 MMU, PP bits don't offer kernel RO protection,
kernel pages are always RW.

However, on the 603 a page fault is always generated when the
C bit (change bit = dirty bit) is not set.

Enforce kernel RO protection by clearing C bit in TLB miss
handler when the page doesn't have _PAGE_RW flag.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu
2022-02-03 22:56:44 +11:00
Christophe Leroy
9872cbfb45 powerpc/603: Remove outdated comment
Since commit 84de6ab0e9 ("powerpc/603: don't handle PAGE_ACCESSED
in TLB miss handlers.") page table is not updated anymore by
TLB miss handlers.

Remove the comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu
2022-02-03 22:38:13 +11:00
Chen Jingwen
dd75080aa8 powerpc/kasan: Fix early region not updated correctly
The shadow's page table is not updated when PTE_RPN_SHIFT is 24
and PAGE_SHIFT is 12. It not only causes false positives but
also false negative as shown the following text.

Fix it by bringing the logic of kasan_early_shadow_page_entry here.

1. False Positive:
==================================================================
BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50
Write of size 16 at addr f57f3be0 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-12267-gdebe436e77c7 #1
Call Trace:
[c80d1c20] [c07fe7b8] dump_stack_lvl+0x4c/0x6c (unreliable)
[c80d1c40] [c02ff668] print_address_description.constprop.0+0x88/0x300
[c80d1c70] [c02ff45c] kasan_report+0x1ec/0x200
[c80d1cb0] [c0300b20] kasan_check_range+0x160/0x2f0
[c80d1cc0] [c03018a4] memset+0x34/0x90
[c80d1ce0] [c0280108] pcpu_alloc+0x508/0xa50
[c80d1d40] [c02fd7bc] __kmem_cache_create+0xfc/0x570
[c80d1d70] [c0283d64] kmem_cache_create_usercopy+0x274/0x3e0
[c80d1db0] [c2036580] init_sd+0xc4/0x1d0
[c80d1de0] [c00044a0] do_one_initcall+0xc0/0x33c
[c80d1eb0] [c2001624] kernel_init_freeable+0x2c8/0x384
[c80d1ef0] [c0004b14] kernel_init+0x24/0x170
[c80d1f10] [c001b26c] ret_from_kernel_thread+0x5c/0x64

Memory state around the buggy address:
 f57f3a80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 f57f3b00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
>f57f3b80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                                               ^
 f57f3c00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 f57f3c80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

2. False Negative (with KASAN tests):
==================================================================
Before fix:
    ok 45 - kmalloc_double_kzfree
    # vmalloc_oob: EXPECTATION FAILED at lib/test_kasan.c:1039
    KASAN failure expected in "((volatile char *)area)[3100]", but none occurred
    not ok 46 - vmalloc_oob
    not ok 1 - kasan

==================================================================
After fix:
    ok 1 - kasan

Fixes: cbd18991e2 ("powerpc/mm: Fix an Oops in kasan_mmu_init()")
Cc: stable@vger.kernel.org # 5.4.x
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211229035226.59159-1-chenjingwen6@huawei.com
2022-02-03 22:37:44 +11:00
Christophe JAILLET
e414e2938e powerpc/xive: Add some error handling code to 'xive_spapr_init()'
'xive_irq_bitmap_add()' can return -ENOMEM.
In this case, we should free the memory already allocated and return
'false' to the caller.

Also add an error path which undoes the 'tima = ioremap(...)'

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/564998101804886b151235c8a9f93020923bfd2c.1643718324.git.christophe.jaillet@wanadoo.fr
2022-02-03 22:36:59 +11:00
Athira Rajeev
0198322379 powerpc/perf: Don't use perf_hw_context for trace IMC PMU
Trace IMC (In-Memory collection counters) in powerpc is useful for
application level profiling.

For trace_imc, presently task context (task_ctx_nr) is set to
perf_hw_context. But perf_hw_context should only be used for CPU PMU.
See commit 2665784850 ("perf/core: Verify we have a single
perf_hw_context PMU").

So for trace_imc, even though it is per thread PMU, it is preferred to
use sw_context in order to be able to do application level monitoring.
Hence change the task_ctx_nr to use perf_sw_context.

Fixes: 012ae24484 ("powerpc/perf: Trace imc PMU functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Update subject & incorporate notes into change log, reflow comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com
2022-02-03 22:32:54 +11:00
Wedson Almeida Filho
d4be60fe66 powerpc/module_64: use module_init_section instead of patching names
Without this patch, module init sections are disabled by patching their
names in arch-specific code when they're loaded (which prevents code in
layout_sections from finding init sections). This patch uses the new
arch-specific module_init_section instead.

This allows modules that have .init_array sections to have the
initialisers properly called (on load, before init). Without this patch,
the initialisers are not called because .init_array is renamed to
_init_array, and thus isn't found by code in find_module_sections().

Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com
2022-02-03 22:20:37 +11:00
Julia Lawall
925f76c557 powerpc/spufs: adjust list element pointer type
Other uses of &gang->aff_list_head, eg in spufs_assert_affinity, indicate
that the list elements have type spu_context, not spu as used here.  Change
the type of tmp accordingly.

This has no impact on the execution, because tmp is not used in the body of
the loop.

Fixes: c5fc8d2a92 ("[CELL] cell: add placement computation for scheduling of affinity contexts")
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr
2022-02-03 21:40:32 +11:00
Bhaskar Chowdhury
a1c4140933 powerpc/epapr: Fix parmeters typo
s/parmeters/parameters/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210320213932.22697-1-unixbhaskar@gmail.com
2022-02-03 21:35:56 +11:00
Fabiano Rosas
b53c861059 powerpc: Fix debug print in smp_setup_cpu_maps
When figuring out the number of threads, the debug message prints "1
thread" for the first iteration of the loop, instead of the actual
number of threads calculated from the length of the
"ibm,ppc-interrupt-server#s" property.

  * /cpus/PowerPC,POWER8@20...
    ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG
    thread 0 -> cpu 0 (hard id 32)
    thread 1 -> cpu 1 (hard id 33)
    thread 2 -> cpu 2 (hard id 34)
    thread 3 -> cpu 3 (hard id 35)
    thread 4 -> cpu 4 (hard id 36)
    thread 5 -> cpu 5 (hard id 37)
    thread 6 -> cpu 6 (hard id 38)
    thread 7 -> cpu 7 (hard id 39)
  * /cpus/PowerPC,POWER8@28...
    ibm,ppc-interrupt-server#s -> 8 threads
    thread 0 -> cpu 8 (hard id 40)
    thread 1 -> cpu 9 (hard id 41)
    thread 2 -> cpu 10 (hard id 42)
    thread 3 -> cpu 11 (hard id 43)
    thread 4 -> cpu 12 (hard id 44)
    thread 5 -> cpu 13 (hard id 45)
    thread 6 -> cpu 14 (hard id 46)
    thread 7 -> cpu 15 (hard id 47)
(...)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
2022-02-03 20:24:59 +11:00
Fabiano Rosas
4feb74aa64 KVM: PPC: Decrement module refcount if init_vm fails
We increment the reference count for KVM-HV/PR before the call to
kvmppc_core_init_vm. If that function fails we need to decrement the
refcount.

Also remove the check on kvm_ops->owner because try_module_get can
handle a NULL module.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-5-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
175be7e580 KVM: PPC: Book3S HV: Free allocated memory if module init fails
The module's exit function is not called when the init fails, we need
to do cleanup before returning.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-4-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
c5d0d77b45 KVM: PPC: Book3S HV: Delay setting of kvm ops
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-3-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Fabiano Rosas
69ab6ac380 KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f494267 ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125155735.1018683-2-farosas@linux.ibm.com
2022-02-03 16:50:44 +11:00
Michael Ellerman
961f649fb3 powerpc/ptdump: Fix sparse warning in hashpagetable.c
As reported by sparse:

  arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:267:36:    got restricted __be64 [usertype] v
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types)
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    expected unsigned long long [usertype]
  arch/powerpc/mm/ptdump/hashpagetable.c:268:36:    got restricted __be64 [usertype] r

The values returned by plpar_pte_read_4() are CPU endian, not __be64, so
assigning them to struct hash_pte confuses sparse. As a minimal fix open
code a struct to hold the values with CPU endian types.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Michael Ellerman
2e7f1e2b30 powerpc/64: Move paca allocation later in boot
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.

The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.

  early_init_devtree:                       <- early_radix_enabled() = false
    early_init_dt_scan_cpus:                <- early_radix_enabled() = false
        ...
        check_cpu_pa_features:              <- early_radix_enabled() = false
        ...                               ^ <- early_radix_enabled() = TRUE
        allocate_paca:                    | <- early_radix_enabled() = TRUE
            ...                           |
            ppc64_bolted_size:            | <- early_radix_enabled() = TRUE
                if (early_radix_enabled())| <- early_radix_enabled() = TRUE
                    return ULONG_MAX;     |
        ...                               |
    ...                                   | <- early_radix_enabled() = TRUE
    ...                                   | <- early_radix_enabled() = TRUE
    mmu_early_init_devtree()              V
    ...                                     <- early_radix_enabled() = false

This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.

The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.

At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.

Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.

Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().

The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.

[1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com
[2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Maxim Kiselev
5ebb747492 powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
On board rev A, the network interface labels for the switch ports
written on the front panel are different than on rev B and later.

This patch fixes network interface names for the switch ports according
to labels that are written on the front panel of the board rev B.
They start from ETH3 and end at ETH10.

This patch also introduces a separate device tree for rev A.
The main device tree is supposed to cover rev B and later.

Fixes: e69eb0824d ("powerpc: dts: t1040rdb: add ports for Seville Ethernet switch")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220121091447.3412907-1-bigunclemax@gmail.com
2022-02-02 20:32:10 +11:00
Laurent Dufour
eddaa9a402 powerpc/pseries: read the lpar name from the firmware
The LPAR name may be changed after the LPAR has been started in the HMC.
In that case lparstat command is not reporting the updated value because
it reads it from the device tree which is read at boot time.

However this value could be read from RTAS.

Adding this value in the /proc/powerpc/lparcfg output allows to read the
updated value.

However the hypervisor, like Qemu/KVM, may not support this RTAS
parameter. In that case the value reported in lparcfg is read from the
device tree and so is not updated accordingly.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop doc-comment syntax, change RTAS/DT to lower case, use of_root
      to fix missing of_node_put(), use of_property_read_string()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220106161339.74656-1-ldufour@linux.ibm.com
2022-02-02 20:32:10 +11:00
Jason Wang
8e0f353a44 powerpc/kvm: no need to initialise statics to 0
Static variables do not need to be initialised to 0, because compiler
will initialise all uninitialised statics to 0. Thus, remove the
unneeded initialization.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211220030243.603435-1-wangborong@cdjrlc.com
2022-02-02 20:31:25 +11:00
Alexey Kardashevskiy
faf01aef05 KVM: PPC: Merge powerpc's debugfs entry content into generic entry
At the moment KVM on PPC creates 4 types of entries under the kvm debugfs:
1) "%pid-%fd" per a KVM instance (for all platforms);
2) "vm%pid" (for PPC Book3s HV KVM);
3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM);
4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS);

The problem with this is that multiple VMs per process is not allowed for
2) and 3) which makes it possible for userspace to trigger errors when
creating duplicated debugfs entries.

This merges all these into 1).

This defines kvm_arch_create_kvm_debugfs() similar to
kvm_arch_create_vcpu_debugfs().

This defines 2 hooks in kvmppc_ops that allow specific KVM implementations
add necessary entries, this adds the _e500 suffix to
kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for.

This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC.

This removes no more used debugfs_dir pointers from PPC kvm_arch structs.

This stops removing vcpu entries as once created vcpus stay around
for the entire life of a VM and removed when the KVM instance is closed,
see commit d56f5136b0 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU
debugfs directories").

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220111005404.162219-1-aik@ozlabs.ru
2022-02-02 20:30:26 +11:00
Thierry Reding
d5342fdd16 powerpc: dts: Fix some I2C unit addresses
The unit-address for the Maxim MAX1237 ADCs on XPedite5200 boards don't
match the value in the "reg" property and cause a DTC warning.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211220134036.683309-1-thierry.reding@gmail.com
2022-01-31 13:45:24 +11:00
Maxim Kiselev
17846485df powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
T1040RDB has two RTL8211E-VB phys which requires setting
of internal delays for correct work.

Changing the phy-connection-type property to `rgmii-id`
will fix this issue.

Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com
2022-01-31 13:45:24 +11:00
Tobias Waldekranz
f529edd1b6 powerpc/e500/qemu-e500: allow core to idle without waiting
This means an idle guest won't needlessly consume an entire core on
the host, waiting for work to show up.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220112112459.1033754-1-troglobit@gmail.com
2022-01-31 13:45:24 +11:00
Michal Suchanek
b2a6f60435 powerpc: add link stack flush mitigation status in debugfs.
The link stack flush status is not visible in debugfs. It can be enabled
even when count cache flush is disabled. Add separate file for its
status.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
[mpe: Update for change to link_stack_flush_type]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
2022-01-31 13:45:23 +11:00
Sachin Sant
279d1a72c0 powerpc/xive: Export XIVE IPI information for online-only processors.
Cédric pointed out that XIVE IPI information exported via sysfs
(debug/powerpc/xive) display empty lines for processors which are
not online.

Switch to using for_each_online_cpu() so that information is
displayed for online-only processors.

Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchset@MacBook-Pro.local
2022-01-31 13:45:23 +11:00
Fabiano Rosas
c1c8a66367 KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure
MMIO emulation can fail if the guest uses an instruction that we are
not prepared to emulate. Since these instructions can be and most
likely are valid ones, this is (slightly) closer to an access fault
than to an illegal instruction, so deliver a Data Storage interrupt
instead of a Program interrupt.

BookE ignores bad faults, so it will keep using a Program interrupt
because a DSI would cause a fault loop in the guest.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-6-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
349fbfe9b9 KVM: PPC: mmio: Return to guest after emulation failure
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57e ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295 ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Also change the message to a ratelimited one since we're letting the
guest run and it could flood the host logs.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-5-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
3f83150448 KVM: PPC: mmio: Reject instructions that access more than mmio.data size
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, remove the "bad MMIO" messages. The caller already has an
error message.)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-4-farosas@linux.ibm.com
2022-01-31 13:43:00 +11:00
Fabiano Rosas
b99234b918 KVM: PPC: Fix vmx/vsx mixup in mmio emulation
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305 ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-3-farosas@linux.ibm.com
2022-01-31 13:42:59 +11:00
Fabiano Rosas
36d014d37d KVM: PPC: Book3S HV: Stop returning internal values to userspace
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220125215655.1026224-2-farosas@linux.ibm.com
2022-01-31 13:42:59 +11:00
Al Viro
0c9dceb9bb asm/user.h: killed unused macros
Some of them used to be used by libbfd for a.out coredump handling.
Seeing that
	* libbfd has their copies anyway
	* we don't export them into userland headers
	* we don't support a.out coredumps anymore
let's bury the definitions.  They never had in-kernel
users anyway...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-01-30 21:17:00 -05:00
Arnd Bergmann
3e17314c22 agp: define proper stubs for empty helpers
The empty unmap_page_from_agp() macro causes a warning when
building with 'make W=1' on a couple of architectures:

drivers/char/agp/generic.c: In function 'agp_generic_destroy_page':
drivers/char/agp/generic.c:1265:28: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
 1265 |   unmap_page_from_agp(page);

Change the definitions to a 'do { } while (0)' construct to
make these more reliable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-29 22:24:25 +01:00
Nicholas Piggin
8defc2a5dd powerpc/64s/interrupt: Fix decrementer storm
The decrementer exception can fail to be cleared when the interrupt
returns in the case where the decrementer wraps with the next timer
still beyond decrementer_max. This results in a decrementer interrupt
storm. This is triggerable with small decrementer system with hard
and soft watchdogs disabled.

Fix this by always programming the decrementer if there was no timer.

Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com
2022-01-25 16:50:10 +11:00
Nicholas Piggin
22f7ff0dea KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
The L0 is storing HFSCR requested by the L1 for the L2 in struct
kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest
is not a per-vCPU structure. Hilarity ensues.

Fix it by moving the nested hfscr into the vCPU structure together with
the other per-vCPU nested fields.

Fixes: 8b210a880b ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122105530.3477250-1-npiggin@gmail.com
2022-01-25 16:39:41 +11:00
Athira Rajeev
fb6433b48a powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel
triggered below warning:

[  172.851380] ------------[ cut here ]------------
[  172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280
[  172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
[  172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2
[  172.851451] NIP:  c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180
[  172.851458] REGS: c000000017687860 TRAP: 0700   Not tainted  (5.16.0-rc5-03218-g798527287598)
[  172.851465] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48004884  XER: 20040000
[  172.851482] CFAR: c00000000013d5b4 IRQMASK: 1
[  172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004
[  172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000
[  172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68
[  172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000
[  172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0
[  172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003
[  172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600
[  172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8
[  172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280
[  172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280
[  172.851565] Call Trace:
[  172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable)
[  172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60
[  172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660
[  172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0
[  172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140
[  172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40
[  172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380
[  172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268

The warning indicates that MSR_EE being set(interrupt enabled) when
there was an overflown PMC detected. This could happen in
power_pmu_disable since it runs under interrupt soft disable
condition ( local_irq_save ) and not with interrupts hard disabled.
commit 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear
pending PMI before resetting an overflown PMC") intended to clear
PMI pending bit in Paca when disabling the PMU. It could happen
that PMC gets overflown while code is in power_pmu_disable
callback function. Hence add a check to see if PMI pending bit
is set in Paca before clearing it via clear_pmi_pending.

Fixes: 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122033429.25395-1-atrajeev@linux.vnet.ibm.com
2022-01-24 17:31:15 +11:00
Christophe Leroy
aec982603a powerpc/fixmap: Fix VM debug warning on unmap
Unmapping a fixmap entry is done by calling __set_fixmap()
with FIXMAP_PAGE_CLEAR as flags.

Today, powerpc __set_fixmap() calls map_kernel_page().

map_kernel_page() is not happy when called a second time
for the same page.

	WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0xc/0x1e8
	CPU: 0 PID: 1 Comm: swapper Not tainted 5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty #682
	NIP:  c0017cd4 LR: c00187f0 CTR: 00000010
	REGS: e1011d50 TRAP: 0700   Not tainted  (5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty)
	MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42000208  XER: 00000000

	GPR00: c0165fec e1011e10 c14c0000 c0ee2550 ff800000 c0f3d000 00000000 c001686c
	GPR08: 00001000 b00045a9 00000001 c0f58460 c0f50000 00000000 c0007e10 00000000
	GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	GPR24: 00000000 00000000 c0ee2550 00000000 c0f57000 00000ff8 00000000 ff800000
	NIP [c0017cd4] set_pte_at+0xc/0x1e8
	LR [c00187f0] map_kernel_page+0x9c/0x100
	Call Trace:
	[e1011e10] [c0736c68] vsnprintf+0x358/0x6c8 (unreliable)
	[e1011e30] [c0165fec] __set_fixmap+0x30/0x44
	[e1011e40] [c0c13bdc] early_iounmap+0x11c/0x170
	[e1011e70] [c0c06cb0] ioremap_legacy_serial_console+0x88/0xc0
	[e1011e90] [c0c03634] do_one_initcall+0x80/0x178
	[e1011ef0] [c0c0385c] kernel_init_freeable+0xb4/0x250
	[e1011f20] [c0007e34] kernel_init+0x24/0x140
	[e1011f30] [c0016268] ret_from_kernel_thread+0x5c/0x64
	Instruction dump:
	7fe3fb78 48019689 80010014 7c630034 83e1000c 5463d97e 7c0803a6 38210010
	4e800020 81250000 712a0001 41820008 <0fe00000> 9421ffe0 93e1001c 48000030

Implement unmap_kernel_page() which clears an existing pte.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b0b752f6f6ecc60653e873f385c6f0dce4e9ab6a.1638789098.git.christophe.leroy@csgroup.eu
2022-01-24 17:29:05 +11:00
Linus Torvalds
dd81e1c7d5 powerpc fixes for 5.17 #2
- A series of bpf fixes, including an oops fix and some codegen fixes.
 
  - Fix a regression in syscall_get_arch() for compat processes.
 
  - Fix boot failure on some 32-bit systems with KASAN enabled.
 
  - A couple of other build/minor fixes.
 
 Thanks to: Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh,
 Maxime Bizon, Naveen N. Rao, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHtOJgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLMeD/9eBHqMy+yhXfTw9mjAf8rVlsdY6Xjq
 2LYgHCJpeP0VjxBm8GcyFtAfJc778LyykU18xsi0WVsuwPmVdVwDfZpzxWJrYqeq
 6AuK9uRZAbEx6gZVCsWlk3Em7GyUEqHm5PTxKsDfbzh+hBTNQ8CcKKbCVmEZ+loA
 nWdaWJOZMpE4hQOx0Ph5JRAos/ZUEt2I57qZkRgvIEV+hroFrmdP1rnncdalnLHN
 pXsEn380Q+rpH+48HIpaTpl5+R8CvgFnnlI6/luZC6HoA8K6jph8FdOkkI9eU1v6
 TgdNMiNVIe1wRFzVp+nH7SGqL3l9X/UPS57hL52X+ccpHoioBW9CLPha06g/+UKT
 kJU3mYZg3jvQGrM9H/2R3P97UH9SwHE5SH0xcXj3ItH1NpOKpMO0h5UuWx+Ze8Wo
 zO80xbYLbALtEfx8JMCf95rnUVFL0n55xVJN+7+fs2hU3Be/ZpIyHfDujlHlbd1m
 o3KRhkg1a0RyGsun6Gd2vwLGaBbwYNWvgHM9IDeiAFdJ7HFpp1YrVGpQRkAIUbJS
 bzYiRczjbSof7c9WqLyKNatgNFO/JPI5vi1iNQ3u9Oh9fzjpv5Zok4vK534mOnqU
 QnhiqLJbY+yBpoAvSePGQTCCzDAnJq0r53n0d1O/2T1GcpX9MKM4YlL+Wu4m0yLn
 RcbOCTkmoqWZEw==
 =KjWT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - A series of bpf fixes, including an oops fix and some codegen fixes.

 - Fix a regression in syscall_get_arch() for compat processes.

 - Fix boot failure on some 32-bit systems with KASAN enabled.

 - A couple of other build/minor fixes.

Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin.

* tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Mask SRR0 before checking against the masked NIP
  powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
  powerpc/32s: Fix kasan_init_region() for KASAN
  powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
  powerpc/audit: Fix syscall_get_arch()
  powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
  tools/bpf: Rename 'struct event' to avoid naming conflict
  powerpc/bpf: Update ldimm64 instructions during extra pass
  powerpc32/bpf: Fix codegen for bpf-to-bpf calls
  bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-23 17:52:42 +02:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Muchun Song
359745d783 proc: remove PDE_DATA() completely
Remove PDE_DATA() completely and replace it with pde_data().

[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]

Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00
Linus Torvalds
75242f31db RTC for 5.17
New driver:
  - Sunplus SP7021 RTC
  - Nintendo GameCube, Wii and Wii U RTC
 
 Drivers:
  - cmos: refactor UIP handling and presence check, fix century
  - rs5c372: offset correction support, report low voltage
  - rv8803: Epson RX8804 support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmHp+jkACgkQY6TcMGxw
 OjJS+hAArVbXHT0ceVHY5mzqkBXCyVxhXdyLUw/VmedO9NvCTk1+O9VhBwKTRdQX
 FLuvPkyjZlkAk3CEfaHBp4u5AW3Xumdkuh9XwG8b3qRs3UGYBgKiNfXQFtOWHs7w
 VO+lD7+f2GNNTYbYH2dLPs6qZwVIN0rL7JTwUH7Po2mqBVztppSOYhmo3vmOqqux
 FI5jFVDoE8u+h359bIKcaax3/dne9m1uyD1WWxOJFRtY+apbECpUBfo1tmauAEXP
 gg+v7On+gboDqVe9/lwqB+xfKzWFJKwYIu6CM8+Mf/dxhtRNRXCgszoiCSFZHUwG
 GghD7Kb96xWuGX1pRe6xx5/7wixes1wCkIVGa5JQLb5E/GQ3O9y8D+Tdk6lyAP5m
 XUVPWGC/qByj6z9pBHtbHmq9lhd1vPHp3SU0ske1xlCQd3WnPq7bQ3MEw345gf4Z
 RGrtpnUPIfKzHWID+2zCzTj6TluW9FnnOjcm2U8jF/kwB+d1/H2O5xeWR7eQYbUJ
 BY5gjDRVIaM3aQC9QiiFMUorqv8Q3oE9FtjCSuLhUx6WEg4S0mtYsXtRUaLT7pRw
 RRsoeCJd8Abnf1Nl/imZ2IrRCcbNhzLAq2Aw8cHbiroAk8lSFAH2NzvcS8213JfG
 muOrIB2Q3XBu+6hq452ZDE1155FGWFLwKnAjvSRy5yRbsN79/YA=
 =nwD6
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "Two new drivers this cycle and a significant rework of the CMOS driver
  make the bulk of the changes.

  I also carry powerpc changes with the agreement of Michael.

  New drivers:
   - Sunplus SP7021 RTC
   - Nintendo GameCube, Wii and Wii U RTC

  Driver updates:
   - cmos: refactor UIP handling and presence check, fix century
   - rs5c372: offset correction support, report low voltage
   - rv8803: Epson RX8804 support"

* tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
  rtc: sunplus: fix return value in sp_rtc_probe()
  rtc: cmos: Evaluate century appropriate
  rtc: gamecube: Fix an IS_ERR() vs NULL check
  rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
  dt-bindings: rtc: qcom-pm8xxx-rtc: update register numbers
  rtc: pxa: fix null pointer dereference
  rtc: ftrtc010: Use platform_get_irq() to get the interrupt
  rtc: Move variable into switch case statement
  rtc: pcf2127: Fix typo in comment
  dt-bindings: rtc: Add Sunplus RTC json-schema
  rtc: Add driver for RTC in Sunplus SP7021
  rtc: rs5c372: fix incorrect oscillation value on r2221tl
  rtc: rs5c372: add offset correction support
  rtc: cmos: avoid UIP when writing alarm time
  rtc: cmos: avoid UIP when reading alarm time
  rtc: mc146818-lib: refactor mc146818_does_rtc_work
  rtc: mc146818-lib: refactor mc146818_get_time
  rtc: mc146818-lib: extract mc146818_avoid_UIP
  rtc: mc146818-lib: fix RTC presence check
  rtc: Check return value from mc146818_get_time()
  ...
2022-01-21 13:13:35 +02:00
Linus Torvalds
fa2e1ba3e9 Networking fixes for 5.17-rc1, including fixes from netfilter, bpf.
Current release - regressions:
 
  - fix memory leaks in the skb free deferral scheme if upper layer
    protocols are used, i.e. in-kernel TCP readers like TLS
 
 Current release - new code bugs:
 
  - nf_tables: fix NULL check typo in _clone() functions
 
  - change the default to y for Vertexcom vendor Kconfig
 
  - a couple of fixes to incorrect uses of ref tracking
 
  - two fixes for constifying netdev->dev_addr
 
 Previous releases - regressions:
 
  - bpf:
    - various verifier fixes mainly around register offset handling
      when passed to helper functions
    - fix mount source displayed for bpffs (none -> bpffs)
 
  - bonding:
    - fix extraction of ports for connection hash calculation
    - fix bond_xmit_broadcast return value when some devices are down
 
  - phy: marvell: add Marvell specific PHY loopback
 
  - sch_api: don't skip qdisc attach on ingress, prevent ref leak
 
  - htb: restore minimal packet size handling in rate control
 
  - sfp: fix high power modules without diagnostic monitoring
 
  - mscc: ocelot:
    - don't let phylink re-enable TX PAUSE on the NPI port
    - don't dereference NULL pointers with shared tc filters
 
  - smsc95xx: correct reset handling for LAN9514
 
  - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
 
  - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
    avoid races with the interrupt
 
 Previous releases - always broken:
 
  - xdp: check prog type before updating BPF link
 
  - smc: resolve various races around abnormal connection termination
 
  - sit: allow encapsulated IPv6 traffic to be delivered locally
 
  - axienet: fix init/reset handling, add missing barriers,
    read the right status words, stop queues correctly
 
  - add missing dev_put() in sock_timestamping_bind_phc()
 
 Misc:
 
  - ipv4: prevent accidentally passing RTO_ONLINK to
    ip_route_output_key_hash() by sanitizing flags
 
  - ipv4: avoid quadratic behavior in netns dismantle
 
  - stmmac: dwmac-oxnas: add support for OX810SE
 
  - fsl: xgmac_mdio: add workaround for erratum A-009885
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHoS14ACgkQMUZtbf5S
 IrtMQA/6AxhWuj2JsoNhvTzBCi4vkeo53rKU941bxOaST9Ow8dqDc7yAT8YeJU2B
 lGw6/pXx+Fm9twGsRkkQ0vX7piIk25vKzEwnlCYVVXLAnE+lPu9qFH49X1HO5Fwy
 K+frGDC524MrbJFb+UbZfJG4UitsyHoqc58Mp7ZNBe2gn12DcHotsiSJikzdd02F
 rzQZhvwRKsDS2prcIHdvVAxva380cn99mvaFqIPR9MemhWKOzVa3NfkiC3tSlhW/
 OphG3UuOfKCVdofYAO5/oXlVQcDKx0OD9Sr2q8aO0mlME0p0ounKz+LDcwkofaYQ
 pGeMY2pEAHujLyRewunrfaPv8/SIB/ulSPcyreoF28TTN20M+4onvgTHvVSyzLl7
 MA4kYH7tkPgOfbW8T573OFPdrqsy4WTrFPFovGqvDuiE8h65Pll/gTcAqsWjF/xw
 CmfmtICcsBwVGMLUzpUjKAWuB0/voa/sQUuQoxvQFsgCteuslm1suLY5EfSIhdu8
 nvhySJjPXRHicZQNflIwKTiOYYWls7yYVGe76u9hqjyD36peJXYjUjyyENIfLiFA
 0XclGIfSBMGWMGmxvGYIZDwGOKK0j+s0PipliXVjP2otLrPYUjma5Co37KW8SiSV
 9TT673FAXJNB0IJ7xiT7nRUZ/fjRrweP1glte/6d148J1Lf9MTQ=
 =XM4Y
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf.

  Quite a handful of old regression fixes but most of those are
  pre-5.16.

  Current release - regressions:

   - fix memory leaks in the skb free deferral scheme if upper layer
     protocols are used, i.e. in-kernel TCP readers like TLS

  Current release - new code bugs:

   - nf_tables: fix NULL check typo in _clone() functions

   - change the default to y for Vertexcom vendor Kconfig

   - a couple of fixes to incorrect uses of ref tracking

   - two fixes for constifying netdev->dev_addr

  Previous releases - regressions:

   - bpf:
      - various verifier fixes mainly around register offset handling
        when passed to helper functions
      - fix mount source displayed for bpffs (none -> bpffs)

   - bonding:
      - fix extraction of ports for connection hash calculation
      - fix bond_xmit_broadcast return value when some devices are down

   - phy: marvell: add Marvell specific PHY loopback

   - sch_api: don't skip qdisc attach on ingress, prevent ref leak

   - htb: restore minimal packet size handling in rate control

   - sfp: fix high power modules without diagnostic monitoring

   - mscc: ocelot:
      - don't let phylink re-enable TX PAUSE on the NPI port
      - don't dereference NULL pointers with shared tc filters

   - smsc95xx: correct reset handling for LAN9514

   - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account

   - phy: micrel: use kszphy_suspend/_resume for irq aware devices,
     avoid races with the interrupt

  Previous releases - always broken:

   - xdp: check prog type before updating BPF link

   - smc: resolve various races around abnormal connection termination

   - sit: allow encapsulated IPv6 traffic to be delivered locally

   - axienet: fix init/reset handling, add missing barriers, read the
     right status words, stop queues correctly

   - add missing dev_put() in sock_timestamping_bind_phc()

  Misc:

   - ipv4: prevent accidentally passing RTO_ONLINK to
     ip_route_output_key_hash() by sanitizing flags

   - ipv4: avoid quadratic behavior in netns dismantle

   - stmmac: dwmac-oxnas: add support for OX810SE

   - fsl: xgmac_mdio: add workaround for erratum A-009885"

* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  ipv4: avoid quadratic behavior in netns dismantle
  net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
  powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
  dt-bindings: net: Document fsl,erratum-a009885
  net/fsl: xgmac_mdio: Add workaround for erratum A-009885
  net: mscc: ocelot: fix using match before it is set
  net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
  net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
  nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
  net: axienet: increase default TX ring size to 128
  net: axienet: fix for TX busy handling
  net: axienet: fix number of TX ring slots for available check
  net: axienet: Fix TX ring slot available check
  net: axienet: limit minimum TX ring size
  net: axienet: add missing memory barriers
  net: axienet: reset core on initialization prior to MDIO access
  net: axienet: Wait for PhyRstCmplt after core reset
  net: axienet: increase reset timeout
  bpf, selftests: Add ringbuf memory type confusion test
  ...
2022-01-20 10:57:05 +02:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Kefeng Wang
20c0357646 mm: percpu: add generic pcpu_populate_pte() function
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte, this patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but it is overridden on x86, which has its own
implementation.

Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
23f917169e mm: percpu: add generic pcpu_fc_alloc/free funciton
With the previous patch, we could add a generic pcpu first chunk
allocate and free function to cleanup the duplicated definations on each
architecture.

Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
1ca3fb3abd mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu
first chunk allocation will call it to alloc memblock on the
corresponding node by it, this is prepare for the next patch.

Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang
7ecd19cfdf mm: percpu: generalize percpu related config
Patch series "mm: percpu: Cleanup percpu first chunk function".

When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator.  This patchset is aimed to cleanup them and should no function
change.

The currently supported status about 'embed' and 'page' in Archs shows
below,

	embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
	page:  NEED_PER_CPU_EMBED_FIRST_CHUNK

		embed	page
	------------------------
	arm64	  Y	 Y
	mips	  Y	 N
	powerpc	  Y	 Y
	riscv	  Y	 N
	sparc	  Y	 Y
	x86	  Y	 Y
	------------------------

There are two interfaces about percpu first chunk allocator,

 extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

 extern int __init pcpu_page_first_chunk(size_t reserved_size,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn,
-                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().

1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
   provided when archs supported NUMA.

2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
   a generic pcpu_populate_pte() which marked '__weak' is provided, if you
   need a different function to populate pte on the arch(like x86), please
   provide its own implementation.

[1] https://github.com/kevin78/linux.git percpu-cleanup

This patch (of 4):

The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.

Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.

Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Tobias Waldekranz
0d375d610f powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
This block is used in (at least) T1024 and T1040, including their
variants like T1023 etc.

Fixes: d55ad2967d ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19 08:14:18 -08:00
Linus Torvalds
fd6f57bfda Kbuild updates for v5.17
- Add new kconfig target 'make mod2noconfig', which will be useful to
    speed up the build and test iteration.
 
  - Raise the minimum supported version of LLVM to 11.0.0
 
  - Refactor certs/Makefile
 
  - Change the format of include/config/auto.conf to stop double-quoting
    string type CONFIG options.
 
  - Fix ARCH=sh builds in dash
 
  - Separate compression macros for general purposes (cmd_bzip2 etc.) and
    the ones for decompressors (cmd_bzip2_with_size etc.)
 
  - Misc Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmHnFNIVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGiQEP/1tkt9IHP7vFvkN9xChQI8HQ7HOC
 mPIxBAUzHIp1V2IALb0lfojjnpkzcMNpJZVlmqjgyYShLEPPBFwKVXs1War6GViX
 aprUMz7w1zR/vZJ2fplFmrkNwSxNp3+LSE6sHVmsliS4Vfzh7CjHb8DnaKjBvQLZ
 M+eQugjHsWI3d3E81/qtRG5EaVs6q8osF3b0Km59mrESWVYKqwlUP3aUyQCCUGFK
 mI+zC4SrHH6EAIZd//VpaleXxVtDcjjadb7Iru5MFhFdCBIRoSC3d1IWPUNUKNnK
 i0ocDXuIoAulA/mROgrpyAzLXg10qYMwwTmX+tplkHA055gKcY/v4aHym6ypH+TX
 6zd34UMTLM32LSjs8hssiQT8BiZU0uZoa/m2E9IBaiExA2sTsRZxgQMKXFFaPQJl
 jn4cRiG0K1NDeRKtq4xh2WO46OS4sPlR6zW9EXDEsS/bI05Y7LpUz7Flt6iA2Mq3
 0g8uYIYr/9drl96X83tFgTkxxB6lpB29tbsmsrKJRGxvrCDnAhXlXhPCkMajkm2Q
 PjJfNtMFzwemSZWq09+F+X5BgCjzZtroOdFI9FTMNhGWyaUJZXCtcXQ6UTIKnTHO
 cDjcURvh+l56eNEQ5SMTNtAkxB+pX8gPUmyO1wLwRUT4YodxylkTUXGyBBR9tgTn
 Yks1TnPD06ld364l
 =8BQf
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new kconfig target 'make mod2noconfig', which will be useful to
   speed up the build and test iteration.

 - Raise the minimum supported version of LLVM to 11.0.0

 - Refactor certs/Makefile

 - Change the format of include/config/auto.conf to stop double-quoting
   string type CONFIG options.

 - Fix ARCH=sh builds in dash

 - Separate compression macros for general purposes (cmd_bzip2 etc.) and
   the ones for decompressors (cmd_bzip2_with_size etc.)

 - Misc Makefile cleanups

* tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  kbuild: add cmd_file_size
  arch: decompressor: remove useless vmlinux.bin.all-y
  kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
  kbuild: drop $(size_append) from cmd_zstd
  sh: rename suffix-y to suffix_y
  doc: kbuild: fix default in `imply` table
  microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV}
  certs: move scripts/extract-cert to certs/
  kbuild: do not quote string values in include/config/auto.conf
  kbuild: do not include include/config/auto.conf from shell scripts
  certs: simplify $(srctree)/ handling and remove config_filename macro
  kbuild: stop using config_filename in scripts/Makefile.modsign
  certs: remove misleading comments about GCC PR
  certs: refactor file cleaning
  certs: remove unneeded -I$(srctree) option for system_certificates.o
  certs: unify duplicated cmd_extract_certs and improve the log
  certs: use $< and $@ to simplify the key generation rule
  kbuild: remove headers_check stub
  kbuild: move headers_check.pl to usr/include/
  certs: use if_changed to re-generate the key when the key type is changed
  ...
2022-01-19 11:15:19 +02:00
Nicholas Piggin
aee101d7b9 powerpc/64s: Mask SRR0 before checking against the masked NIP
Commit 314f6c23dd ("powerpc/64s: Mask NIP before checking against
SRR0") masked off the low 2 bits of the NIP value in the interrupt
stack frame in case they are non-zero and mis-compare against a SRR0
register value of a CPU which always reads back 0 from the 2 low bits
which are reserved.

This now causes the opposite problem that an implementation which does
implement those bits in SRR0 will mis-compare against the masked NIP
value in which they have been cleared. QEMU is one such implementation,
and this is allowed by the architecture.

This can be triggered by sigfuz by setting low bits of PT_NIP in the
signal context.

Fix this for now by masking the SRR0 bits as well. Cleaner is probably
to sanitise these values before putting them in registers or stack, but
this is the quick and backportable fix.

Fixes: 314f6c23dd ("powerpc/64s: Mask NIP before checking against SRR0")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com
2022-01-18 10:25:18 +11:00
Athira Rajeev
429a64f6e9 powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
power_pmu_wants_prompt_pmi() is used to decide if PMIs should be taken
promptly. This is valid only for ppc64 and is used only if
CONFIG_PPC_BOOK3S_64=y. Hence include the function under config check
for PPC64.

Fixes warning for 32-bit compilation:

  arch/powerpc/perf/core-book3s.c:2455:6: warning: no previous prototype for 'power_pmu_wants_prompt_pmi'
    2455 | bool power_pmu_wants_prompt_pmi(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 5a7745b96f ("powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move inside existing CONFIG_PPC64 ifdef block]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220114031355.87480-1-atrajeev@linux.vnet.ibm.com
2022-01-17 15:04:13 +11:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Christophe Leroy
d37823c352 powerpc/32s: Fix kasan_init_region() for KASAN
It has been reported some configuration where the kernel doesn't
boot with KASAN enabled.

This is due to wrong BAT allocation for the KASAN area:

	---[ Data Block Address Translation ]---
	0: 0xc0000000-0xcfffffff 0x00000000       256M Kernel rw      m
	1: 0xd0000000-0xdfffffff 0x10000000       256M Kernel rw      m
	2: 0xe0000000-0xefffffff 0x20000000       256M Kernel rw      m
	3: 0xf8000000-0xf9ffffff 0x2a000000        32M Kernel rw      m
	4: 0xfa000000-0xfdffffff 0x2c000000        64M Kernel rw      m

A BAT must have both virtual and physical addresses alignment matching
the size of the BAT. This is not the case for BAT 4 above.

Fix kasan_init_region() by using block_size() function that is in
book3s32/mmu.c. To be able to reuse it here, make it non static and
change its name to bat_block_size() in order to avoid name conflict
with block_size() defined in <linux/blkdev.h>

Also reuse find_free_bat() to avoid an error message from setbat()
when no BAT is available.

And allocate memory outside of linear memory mapping to avoid
wasting that precious space.

With this change we get correct alignment for BATs and KASAN shadow
memory is allocated outside the linear memory space.

	---[ Data Block Address Translation ]---
	0: 0xc0000000-0xcfffffff 0x00000000       256M Kernel rw
	1: 0xd0000000-0xdfffffff 0x10000000       256M Kernel rw
	2: 0xe0000000-0xefffffff 0x20000000       256M Kernel rw
	3: 0xf8000000-0xfbffffff 0x7c000000        64M Kernel rw
	4: 0xfc000000-0xfdffffff 0x7a000000        32M Kernel rw

Fixes: 7974c47326 ("powerpc/32s: Implement dedicated kasan_init_region()")
Cc: stable@vger.kernel.org
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7a50ef902494d1325227d47d33dada01e52e5518.1641818726.git.christophe.leroy@csgroup.eu
2022-01-16 20:51:05 +11:00
Christophe Leroy
87b9d74fb0 powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
CC      arch/powerpc/kernel/time.o
	In file included from <command-line>:
	./arch/powerpc/include/asm/hw_irq.h: In function 'do_hard_irq_enable':
	././include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_35' declared with attribute error: BUILD_BUG failed
	  335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	      |                                             ^
	././include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
	  316 |                         prefix ## suffix();                             \
	      |                         ^~~~~~
	././include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
	  335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	      |         ^~~~~~~~~~~~~~~~~~~
	./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
	   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
	      |                                     ^~~~~~~~~~~~~~~~~~
	./include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
	   59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
	      |                     ^~~~~~~~~~~~~~~~
	./arch/powerpc/include/asm/hw_irq.h:483:9: note: in expansion of macro 'BUILD_BUG'
	  483 |         BUILD_BUG();
	      |         ^~~~~~~~~

should_hard_irq_enable() returns false on PPC32 so this BUILD_BUG() shouldn't trigger.

Force inlining of should_hard_irq_enable()

Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/247e01e0e10f4dbc59b5ff89e81702eb1ee7641e.1641828571.git.christophe.leroy@csgroup.eu
2022-01-16 20:50:20 +11:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Yury Norov
b5c7e7ec7d all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Yury Norov
47d8c15615 include: move find.h from asm_generic to linux
find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-01-15 08:47:31 -08:00
Aneesh Kumar K.V
21b084fdf2 mm/mempolicy: wire up syscall set_mempolicy_home_node
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
Qi Zheng
36ef159f44 mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Since commit 4064b98270 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:

	flags & FAULT_FLAG_ALLOW_RETRY

So just remove it.

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Christophe Leroy
252745240b powerpc/audit: Fix syscall_get_arch()
Commit 770cec16cd ("powerpc/audit: Simplify syscall_get_arch()")
and commit 898a1ef06a ("powerpc/audit: Avoid unneccessary #ifdef
in syscall_get_arguments()")
replaced test_tsk_thread_flag(task, TIF_32BIT)) by is_32bit_task().

But is_32bit_task() applies on current task while be want the test
done on task 'task'

So add a new macro is_tsk_32bit_task() to check any task.

Fixes: 770cec16cd ("powerpc/audit: Simplify syscall_get_arch()")
Fixes: 898a1ef06a ("powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()")
Cc: stable@vger.kernel.org
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c55cddb8f65713bf5859ed675d75a50cb37d5995.1642159570.git.christophe.leroy@csgroup.eu
2022-01-15 12:21:26 +11:00
Naveen N. Rao
3f5f766d5f powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
Johan reported the below crash with test_bpf on ppc64 e5500:

  test_bpf: #296 ALU_END_FROM_LE 64: 0x0123456789abcdef -> 0x67452301 jited:1
  Oops: Exception in kernel mode, sig: 4 [#1]
  BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 76 Comm: insmod Not tainted 5.14.0-03771-g98c2059e008a-dirty #1
  NIP:  8000000000061c3c LR: 80000000006dea64 CTR: 8000000000061c18
  REGS: c0000000032d3420 TRAP: 0700   Not tainted (5.14.0-03771-g98c2059e008a-dirty)
  MSR:  0000000080089000 <EE,ME>  CR: 88002822  XER: 20000000 IRQMASK: 0
  <...>
  NIP [8000000000061c3c] 0x8000000000061c3c
  LR [80000000006dea64] .__run_one+0x104/0x17c [test_bpf]
  Call Trace:
   .__run_one+0x60/0x17c [test_bpf] (unreliable)
   .test_bpf_init+0x6a8/0xdc8 [test_bpf]
   .do_one_initcall+0x6c/0x28c
   .do_init_module+0x68/0x28c
   .load_module+0x2460/0x2abc
   .__do_sys_init_module+0x120/0x18c
   .system_call_exception+0x110/0x1b8
   system_call_common+0xf0/0x210
  --- interrupt: c00 at 0x101d0acc
  <...>
  ---[ end trace 47b2bf19090bb3d0 ]---

  Illegal instruction

The illegal instruction turned out to be 'ldbrx' emitted for
BPF_FROM_[L|B]E, which was only introduced in ISA v2.06. Guard use of
the same and implement an alternative approach for older processors.

Fixes: 156d0e290e ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1e51c6fdf572062cf3009a751c3406bda01b832.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:25 +11:00
Naveen N. Rao
f9320c4999 powerpc/bpf: Update ldimm64 instructions during extra pass
These instructions are updated after the initial JIT, so redo codegen
during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify
that this is more than just subprog calls.

Fixes: 69c087ba62 ("bpf: Add bpf_for_each_map_elem() helper")
Cc: stable@vger.kernel.org # v5.15
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cc162af77ba918eb3ecd26ec9e7824bc44b1fae.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:24 +11:00
Naveen N. Rao
fab07611fb powerpc32/bpf: Fix codegen for bpf-to-bpf calls
Pad instructions emitted for BPF_CALL so that the number of instructions
generated does not change for different function addresses. This is
especially important for calls to other bpf functions, whose address
will only be known during extra pass.

Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/52d8fe51f7620a6f27f377791564d79d75463576.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
2022-01-15 12:21:24 +11:00
Linus Torvalds
29ec39fcf1 powerpc updates for 5.17
- Optimise radix KVM guest entry/exit by 2x on Power9/Power10.
 
  - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10
    or later CPUs.
 
  - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.
 
  - Several fixes and improvements to our hard lockup watchdog.
 
  - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.
 
  - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only.
 
  - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).
 
  - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10.
 
  - A series of small performance improvements to 64-bit interrupt entry.
 
  - Several commits fixing issues when building with the clang integrated assembler.
 
  - Many other small features and fixes.
 
 Thanks to: Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann,
 Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig,
 Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren,
 Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent
 Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju,
 Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei
 Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean
 Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang
 wangx, Yang Guang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHhVFMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKwzD/9UUEZzWyzMVRJvP9FPZByN2M8czxHJ
 tuqEuVqnfks8ad8tfm2ebng5t8ZuVASBQU2fpPA1+lpdvprgZN5RFGMRh729vskn
 2aHQPmFvFObNbXOgCoXzk+C5xYi3zoRMVM968neSPBneYo+xDicn/zN5CHAgsjhX
 +baemJQ7/xzwLiZgTHe8fWw3nTk3IbPBpha59SdTvR8Moy6I4O8CDPIYEm3U3/J3
 x14ZRETqjksL7YOzEBk0avm1dDZRw/johz29oRYSmCj7dyy5OqrkPwokJiRY90eA
 1lVdofDc0zElaSWkVGzKdSWRUIXjKIVdtejvDeEvl6H/mI6q4TVZE8rFmn+3Rvgf
 9q0iKtmw5Kn11cqgY/pgEGmxnQtIdAodNfI/t939E7+O5LbcznuYUiy0J/kTD/vl
 Xduotg2dsCI+5ukf1wrk2wt9LhqZL+ziOeaBhyDM4orV8T3HBYL6zWBptun//IGO
 lK6TvvCHSYnGqY4bnrAmiOnbbEtnP6nN3zbcXgSvPM0wCRHPIEqd0NRXtfISo32d
 vBPq1neXWo4wrRJj9X3yOuP+5fEA4I+hB3yrCJOkcEcz+8NhlboQXU7raVsJL+bd
 kze75H8hwX7kE71oJFFl13LbSNABgiLFARTBXKfvdQA2iLdR0Snvm+OouvwWRPo/
 Po7Nm3zqdLc/1A==
 =BxhQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Optimise radix KVM guest entry/exit by 2x on Power9/Power10.

 - Allow firmware to tell us whether to disable the entry and uaccess
   flushes on Power10 or later CPUs.

 - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.

 - Several fixes and improvements to our hard lockup watchdog.

 - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.

 - Allow building the 64-bit Book3S kernel without hash MMU support, ie.
   Radix only.

 - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).

 - Add new encodings for perf_mem_data_src.mem_hops field, and use them
   on Power10.

 - A series of small performance improvements to 64-bit interrupt entry.

 - Several commits fixing issues when building with the clang integrated
   assembler.

 - Many other small features and fixes.

Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell,
Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET,
Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard
Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason
Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour,
Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh
Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child,
Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring,
Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool,
Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang
Guang.

* tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits)
  powerpc/xmon: Dump XIVE information for online-only processors.
  powerpc/opal: use default_groups in kobj_type
  powerpc/cacheinfo: use default_groups in kobj_type
  powerpc/sched: Remove unused TASK_SIZE_OF
  powerpc/xive: Add missing null check after calling kmalloc
  powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
  selftests/powerpc: Add a test of sigreturning to an unaligned address
  powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
  powerpc/64s: Mask NIP before checking against SRR0
  powerpc/perf: Fix spelling of "its"
  powerpc/32: Fix boot failure with GCC latent entropy plugin
  powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
  powerpc/code-patching: Move code patching selftests in its own file
  powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
  powerpc/code-patching: Move patch_exception() outside code-patching.c
  powerpc/code-patching: Use test_trampoline for prefixed patch test
  powerpc/code-patching: Fix patch_branch() return on out-of-range failure
  powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
  powerpc/code-patching: Fix unmap_patch_area() error handling
  powerpc/code-patching: Fix error handling in do_patch_instruction()
  ...
2022-01-14 15:17:26 +01:00
Linus Torvalds
feb7a43de5 Rework of the MSI interrupt infrastructure:
Treewide cleanup and consolidation of MSI interrupt handling in
   preparation for further changes in this area which are necessary to:
 
   - address existing shortcomings in the VFIO area
 
   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+SETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobzGD/wNEFl5qQo5mNZ9thP6JSJFOItm7zMc
 2QgzCYOqNwAv4jL6Dqo+EHtbShYqDyWzKdKccgqNjmdIqgW8q7/fubN1OPzRsClV
 CZG997AsXDGXYlQcE3tXZjkeCWnWEE2AGLnygSkFV1K/r9ALAtFfTBJAWB+UD+Zc
 1P8Kxo0q0Jg+DQAMAA5bWfSSjo/Pmpr/1AFjY7+GA8BBeJJgWOyW7H1S+GYEWVOE
 RaQP81Sbd6x1JkopxkNqSJ/lbNJfnPJxi2higB56Y0OYn5CuSarYbZUM7oQ2V61t
 jN7pcEEvTpjLd6SJ93ry8WOcJVMTbccCklVfD0AfEwwGUGw2VM6fSyNrZfnrosUN
 tGBEO8eflBJzGTAwSkz1EhiGKna4o1NBDWpr0sH2iUiZC5G6V2hUDbM+0PQJhDa8
 bICwguZElcUUPOprwjS0HXhymnxghTmNHyoEP1yxGoKLTrwIqkH/9KGustWkcBmM
 hNtOCwQNqxcOHg/r3MN0KxttTASgoXgNnmFliAWA7XwseRpLWc95XPQFa5sptRhc
 EzwumEz17EW1iI5/NyZQcY+jcZ9BdgCqgZ9ECjZkyN4U+9G6iACUkxVaHUUs77jl
 a0ISSEHEvJisFOsOMYyFfeWkpIKGIKP/bpLOJEJ6kAdrUWFvlRGF3qlav3JldXQl
 ypFjPapDeB5guw==
 =vKzd
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI irq updates from Thomas Gleixner:
 "Rework of the MSI interrupt infrastructure.

  This is a treewide cleanup and consolidation of MSI interrupt handling
  in preparation for further changes in this area which are necessary
  to:

   - address existing shortcomings in the VFIO area

   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space"

* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  genirq/msi: Populate sysfs entry only once
  PCI/MSI: Unbreak pci_irq_get_affinity()
  genirq/msi: Convert storage to xarray
  genirq/msi: Simplify sysfs handling
  genirq/msi: Add abuse prevention comment to msi header
  genirq/msi: Mop up old interfaces
  genirq/msi: Convert to new functions
  genirq/msi: Make interrupt allocation less convoluted
  platform-msi: Simplify platform device MSI code
  platform-msi: Let core code handle MSI descriptors
  bus: fsl-mc-msi: Simplify MSI descriptor handling
  soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
  soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
  NTB/msi: Convert to msi_on_each_desc()
  PCI: hv: Rework MSI handling
  powerpc/mpic_u3msi: Use msi_for_each-desc()
  powerpc/fsl_msi: Use msi_for_each_desc()
  powerpc/pasemi/msi: Convert to msi_on_each_dec()
  powerpc/cell/axon_msi: Convert to msi_on_each_desc()
  powerpc/4xx/hsta: Rework MSI handling
  ...
2022-01-13 09:05:29 -08:00
Linus Torvalds
4eb766f64d Devicetree updates for v5.17:
Bindings:
 - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command
   DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C
   and BPMP, pwm-vibrator, Arm DSU, and Cadence macb
 
 - DT schema conversions for Broadcom platforms: interrupt controllers,
   STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe,
   Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC,
   Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA
 
 - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,
 
 - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues
 
 - More fixes due to 'unevaluatedProperties' enabling
 
 - Data type fixes and clean-ups of binding examples found in preparation
   to move to validating DTB files directly (instead of intermediate YAML
   representation.
 
 - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus
 
 - Add various new compatible strings
 
 DT core:
 - Silence a warning for overlapping reserved memory regions
 
 - Reimplement unittest overlay tracking
 
 - Fix stack frame size warning in unittest
 
 - Clean-ups of early FDT scanning functions
 
 - Fix handling of "linux,usable-memory-range" on EFI booted systems
 
 - Add support for 'fail' status on CPU nodes
 
 - Improve error message in of_phandle_iterator_next()
 
 - kbuild: Disable duplicate unit-address warnings for disabled nodes
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmHfCdcQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw+UZD/0ZMQQ6VF20MW7Gg0bOutd8Q6Q6opjrCG5c
 nLW5mv8Q+um3sI1ZpwdMI4zAfCmTfeL13ZM9KtJKlJ0o41bgId+kZsezy4I2rN9+
 sE1CwA4TninKTJsUkmyQX4fgJRUZ95Eubryfb07sy7nbK3LZQ+t18R5tzVBDpzy4
 7hy4eM6mlMxgIJDi7EUboLZslkMM4TGGutLsk5C5T5V5lcWSt3Jj5WZtl5k4Wykq
 j4i9mU+GGTZi0nGAJQ7lNoLPatZDSVQx5tzNV/Wi8hSwZbn0Kycu+IuWZyihILz/
 9lzB/7tv8fl+xkTaJ5xxaY05HcDeX02yCLzh3PfAHRYdbQ2EkFoaKqJ81SLfAq5t
 aH87v41wFSrjzynxpppqswXOdqI/jofrHrGlQldnw0VHGTjEfDbyZGRQFPHmuzTG
 gXaSNKCxppG7ThpXarfu7D4TdYV75n+cBOsC/BBopYgIS2+xmjDA3t5Scks1/4NX
 1Hfq9IMF9iYJYc/GNXBWcOrLn9d1ILYt6HrKRQar1NIEFH1Lt0c2aw5WsyvOZ4zx
 aLHLSbEwnl+2wleyGB9YQkFaaF7N6qcid3u9KFRJP6nTojoaeQaIi3MR9F3LVReZ
 LV5YfWEcij1zc+lzwgHc6+8bbgFxrKgOC2IL/B6u93u/BO0wmF/54kbEZKaLyX8d
 a7Iii4IYFw==
 =2g8v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "Bindings:

   - DT schema conversions for Samsung clocks, RNG bindings, Qcom
     Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl,
     Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb

   - DT schema conversions for Broadcom platforms: interrupt
     controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux,
     iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON,
     SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB
     arbiter, and SATA

   - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,

   - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues

   - More fixes due to 'unevaluatedProperties' enabling

   - Data type fixes and clean-ups of binding examples found in
     preparation to move to validating DTB files directly (instead of
     intermediate YAML representation.

   - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus

   - Add various new compatible strings

  DT core:

   - Silence a warning for overlapping reserved memory regions

   - Reimplement unittest overlay tracking

   - Fix stack frame size warning in unittest

   - Clean-ups of early FDT scanning functions

   - Fix handling of "linux,usable-memory-range" on EFI booted systems

   - Add support for 'fail' status on CPU nodes

   - Improve error message in of_phandle_iterator_next()

   - kbuild: Disable duplicate unit-address warnings for disabled nodes"

* tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits)
  dt-bindings: net: mdio: Drop resets/reset-names child properties
  dt-bindings: clock: samsung: convert S5Pv210 to dtschema
  dt-bindings: clock: samsung: convert Exynos5410 to dtschema
  dt-bindings: clock: samsung: convert Exynos5260 to dtschema
  dt-bindings: clock: samsung: extend Exynos7 bindings with UFS
  dt-bindings: clock: samsung: convert Exynos7 to dtschema
  dt-bindings: clock: samsung: convert Exynos5433 to dtschema
  dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712
  dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes
  dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp'
  dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example
  dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example
  dt-bindings: clock: imx5: Drop clock consumer node from example
  dt-bindings: Drop required 'interrupt-parent'
  dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance'
  dt-bindings: net: wireless: mt76: Fix 8-bit property sizes
  dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema
  dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry
  dt-bindings: net: stm32-dwmac: Make each example a separate entry
  dt-bindings: net: Cleanup MDIO node schemas
  ...
2022-01-12 16:47:05 -08:00
Linus Torvalds
daadb3bd0e Peter Zijlstra says:
"Lots of cleanups and preparation; highlights:
 
  - futex: Cleanup and remove runtime futex_cmpxchg detection
 
  - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
 
  - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
    annotate the racy owner->on_cpu access *once*.
 
  - atomic64: Dead-Code-Elemination"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
 VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
 w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
 87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
 rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
 d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
 44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
 K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
 39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
 zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
 HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
 NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
 =vrYy
 -----END PGP SIGNATURE-----

Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Borislav Petkov:
 "Lots of cleanups and preparation. Highlights:

   - futex: Cleanup and remove runtime futex_cmpxchg detection

   - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure

   - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
     annotate the racy owner->on_cpu access *once*.

   - atomic64: Dead-Code-Elemination"

[ Description above by Peter Zijlstra ]

* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: atomic64: Remove unusable atomic ops
  futex: Fix additional regressions
  locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
  x86/mm: Include spinlock_t definition in pgtable.
  locking: Mark racy reads of owner->on_cpu
  locking: Make owner_on_cpu() into <linux/sched.h>
  lockdep/selftests: Adapt ww-tests for PREEMPT_RT
  lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
  lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
  lockdep/selftests: Avoid using local_lock_{acquire|release}().
  lockdep: Remove softirq accounting on PREEMPT_RT.
  locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
  locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
  locking: Remove rt_rwlock_is_contended().
  sched: Trigger warning if ->migration_disabled counter underflows.
  futex: Fix sparc32/m68k/nds32 build regression
  futex: Remove futex_cmpxchg detection
  futex: Ensure futex_atomic_cmpxchg_inatomic() is present
  kernel/locking: Use a pointer in ww_mutex_trylock().
2022-01-11 17:24:45 -08:00
Linus Torvalds
5c947d0dba Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Algorithms:

   - Drop alignment requirement for data in aesni

   - Use synchronous seeding from the /dev/random in DRBG

   - Reseed nopr DRBGs every 5 minutes from /dev/random

   - Add KDF algorithms currently used by security/DH

   - Fix lack of entropy on some AMD CPUs with jitter RNG

  Drivers:

   - Add support for the D1 variant in sun8i-ce

   - Add SEV_INIT_EX support in ccp

   - PFVF support for GEN4 host driver in qat

   - Compression support for GEN4 devices in qat

   - Add cn10k random number generator support"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (145 commits)
  crypto: af_alg - rewrite NULL pointer check
  lib/mpi: Add the return value check of kcalloc()
  crypto: qat - fix definition of ring reset results
  crypto: hisilicon - cleanup warning in qm_get_qos_value()
  crypto: kdf - select SHA-256 required for self-test
  crypto: x86/aesni - don't require alignment of data
  crypto: ccp - remove unneeded semicolon
  crypto: stm32/crc32 - Fix kernel BUG triggered in probe()
  crypto: s390/sha512 - Use macros instead of direct IV numbers
  crypto: sparc/sha - remove duplicate hash init function
  crypto: powerpc/sha - remove duplicate hash init function
  crypto: mips/sha - remove duplicate hash init function
  crypto: sha256 - remove duplicate generic hash init function
  crypto: jitter - add oversampling of noise source
  MAINTAINERS: update SEC2 driver maintainers list
  crypto: ux500 - Use platform_get_irq() to get the interrupt
  crypto: hisilicon/qm - disable qm clock-gating
  crypto: omap-aes - Fix broken pm_runtime_and_get() usage
  MAINTAINERS: update caam crypto driver maintainers list
  crypto: octeontx2 - prevent underflow in get_cores_bmap()
  ...
2022-01-11 10:21:35 -08:00
Linus Torvalds
8efd0d9c31 Networking changes for 5.17.
Core
 ----
 
  - Defer freeing TCP skbs to the BH handler, whenever possible,
    or at least perform the freeing outside of the socket lock section
    to decrease cross-CPU allocator work and improve latency.
 
  - Add netdevice refcount tracking to locate sources of netdevice
    and net namespace refcount leaks.
 
  - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
    all queues from a single CPU removing latency spikes.
 
  - Various small optimizations throughout the stack from Eric Dumazet.
 
  - Make netdev->dev_addr[] constant, force modifications to go via
    appropriate helpers to allow us to keep addresses in ordered data
    structures.
 
  - Replace unix_table_lock with per-hash locks, improving performance
    of bind() calls.
 
  - Extend skb drop tracepoint with a drop reason.
 
  - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
 
 BPF
 ---
 
  - New helpers:
    - bpf_find_vma(), find and inspect VMAs for profiling use cases
    - bpf_loop(), runtime-bounded loop helper trading some execution
      time for much faster (if at all converging) verification
    - bpf_strncmp(), improve performance, avoid compiler flakiness
    - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
      for tracing programs, all inlined by the verifier
 
  - Support BPF relocations (CO-RE) in the kernel loader.
 
  - Further the support for BTF_TYPE_TAG annotations.
 
  - Allow access to local storage in sleepable helpers.
 
  - Convert verifier argument types to a composable form with different
    attributes which can be shared across types (ro, maybe-null).
 
  - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
    creating new, extensible ones where missing and deprecating those
    to be removed.
 
 Protocols
 ---------
 
  - WiFi (mac80211/cfg80211):
    - notify user space about long "come back in N" AP responses,
      allow it to react to such temporary rejections
    - allow non-standard VHT MCS 10/11 rates
    - use coarse time in airtime fairness code to save CPU cycles
 
  - Bluetooth:
    - rework of HCI command execution serialization to use a common
      queue and work struct, and improve handling errors reported
      in the middle of a batch of commands
    - rework HCI event handling to use skb_pull_data, avoiding packet
      parsing pitfalls
    - support AOSP Bluetooth Quality Report
 
  - SMC:
    - support net namespaces, following the RDMA model
    - improve connection establishment latency by pre-clearing buffers
    - introduce TCP ULP for automatic redirection to SMC
 
  - Multi-Path TCP:
    - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
    - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
      IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    - support cmsgs: TCP_INQ
    - improvements in the data scheduler (assigning data to subflows)
    - support fastclose option (quick shutdown of the full MPTCP
      connection, similar to TCP RST in regular TCP)
 
  - MCTP (Management Component Transport) over serial, as defined by
    DMTF spec DSP0253 - "MCTP Serial Transport Binding".
 
 Driver API
 ----------
 
  - Support timestamping on bond interfaces in active/passive mode.
 
  - Introduce generic phylink link mode validation for drivers which
    don't have any quirks and where MAC capability bits fully express
    what's supported. Allow PCS layer to participate in the validation.
    Convert a number of drivers.
 
  - Add support to set/get size of buffers on the Rx rings and size of
    the tx copybreak buffer via ethtool.
 
  - Support offloading TC actions as first-class citizens rather than
    only as attributes of filters, improve sharing and device resource
    utilization.
 
  - WiFi (mac80211/cfg80211):
    - support forwarding offload (ndo_fill_forward_path)
    - support for background radar detection hardware
    - SA Query Procedures offload on the AP side
 
 New hardware / drivers
 ----------------------
 
  - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
    real-time requirements for isochronous communication with protocols
    like OPC UA Pub/Sub.
 
  - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
    integrated into many older Qualcomm SoCs, e.g. MSM8916 or
    MSM8974 (qcom_bam_dmux).
 
  - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
    driver with support for bridging, VLANs and multicast forwarding
    (lan966x).
 
  - iwlmei driver for co-operating between Intel's WiFi driver and
    Intel's Active Management Technology (AMT) devices.
 
  - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
 
  - Bluetooth:
    - MediaTek MT7921 SDIO devices
    - Foxconn MT7922A
    - Realtek RTL8852AE
 
 Drivers
 -------
 
  - Significantly improve performance in the datapaths of:
    lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
 
  - Intel Ethernet NICs:
    - igb: support PTP/time PEROUT and EXTTS SDP functions on
      82580/i354/i350 adapters
    - ixgbevf: new PF -> VF mailbox API which avoids the risk of
      mailbox corruption with ESXi
    - iavf: support configuration of VLAN features of finer granularity,
      stacked tags and filtering
    - ice: PTP support for new E822 devices with sub-ns precision
    - ice: support firmware activation without reboot
 
  - Mellanox Ethernet NICs (mlx5):
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
    - support TC forwarding when tunnel encap and decap happen between
      two ports of the same NIC
    - dynamically size and allow disabling various features to save
      resources for running in embedded / SmartNIC scenarios
 
  - Broadcom Ethernet NICs (bnxt):
    - use page frag allocator to improve Rx performance
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
 
  - Other Ethernet NICs:
    - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
 
  - Microsoft cloud/virtual NIC (mana):
    - add XDP support (PASS, DROP, TX)
 
  - Mellanox Ethernet switches (mlxsw):
    - initial support for Spectrum-4 ASICs
    - VxLAN with IPv6 underlay
 
  - Marvell Ethernet switches (prestera):
    - support flower flow templates
    - add basic IP forwarding support
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - support Per-Stream Filtering and Policing (PSFP)
    - enable cut-through forwarding between ports by default
    - support FDMA to improve packet Rx/Tx to CPU
 
  - Other embedded switches:
    - hellcreek: improve trapping management (STP and PTP) packets
    - qca8k: support link aggregation and port mirroring
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - qca6390, wcn6855: enable 802.11 power save mode in station mode
    - BSS color change support
    - WCN6855 hw2.1 support
    - 11d scan offload support
    - scan MAC address randomization support
    - full monitor mode, only supported on QCN9074
    - qca6390/wcn6855: report signal and tx bitrate
    - qca6390: rfkill support
    - qca6390/wcn6855: regdb.bin support
 
  - Intel WiFi (iwlwifi):
    - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
      in cooperation with the BIOS
    - support for Optimized Connectivity Experience (OCE) scan
    - support firmware API version 68
    - lots of preparatory work for the upcoming Bz device family
 
  - MediaTek WiFi (mt76):
    - Specific Absorption Rate (SAR) support
    - mt7921: 160 MHz channel support
 
  - RealTek WiFi (rtw88):
    - Specific Absorption Rate (SAR) support
    - scan offload
 
  - Other WiFi NICs
    - ath10k: support fetching (pre-)calibration data from nvmem
    - brcmfmac: configure keep-alive packet on suspend
    - wcn36xx: beacon filter support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
 IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
 h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
 i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
 AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
 Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
 NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
 yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
 cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
 z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
 r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
 =KLm3
 -----END PGP SIGNATURE-----

Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Linus Torvalds
48a60bdb2b - Add a set of thread_info.flags accessors which snapshot it before
accesing it in order to prevent any potential data races, and convert
 all users to those new accessors
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcgFoACgkQEsHwGGHe
 VUqXeRAAvcNEfFw6BvXeGfFTxKmOrsRtu2WCkAkjvamyhXMCrjBqqHlygLJFCH5i
 2mc6HBohzo4vBFcgi3R5tVkGazqlthY1KUM9Jpk7rUuUzi0phTH7n/MafZOm9Es/
 BHYcAAyT/NwZRbCN0geccIzBtbc4xr8kxtec7vkRfGDx8B9/uFN86xm7cKAaL62G
 UDs0IquDPKEns3A7uKNuvKztILtuZWD1WcSkbOULJzXgLkb+cYKO1Lm9JK9rx8Ds
 8tjezrJgOYGLQyyv0i3pWelm3jCZOKUChPslft0opvVUbrNd8piehvOm9CWopHcB
 QsYOWchnULTE9o4ZAs/1PkxC0LlFEWZH8bOLxBMTDVEY+xvmDuj1PdBUpncgJbOh
 dunHzsvaWproBSYUXA9nKhZWTVGl+CM8Ks7jXjl3IPynLd6cpYZ/5gyBVWEX7q3e
 8htG95NzdPPo7doxMiNSKGSmSm0Np1TJ/i89vsYeGfefsvsq53Fyjhu7dIuTWHmU
 2YUe6qHs6dF9x1bkHAAZz6T9Hs4BoGQBcXUnooT9JbzVdv2RfTPsrawdu8dOnzV1
 RhwCFdFcll0AIEl0T9fCYzUI/Ga8ZS0roXs5NZ4wl0lwr0BGFwiU8WC1FUdGsZo9
 0duaa0Tpv0OWt6rIMMB/E9QsqCDsQ4CMHuQpVVw+GOO5ux9kMms=
 =v6Xn
 -----END PGP SIGNATURE-----

Merge tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull thread_info flag accessor helper updates from Borislav Petkov:
 "Add a set of thread_info.flags accessors which snapshot it before
  accesing it in order to prevent any potential data races, and convert
  all users to those new accessors"

* tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Snapshot thread flags
  powerpc: Avoid discarding flags in system_call_exception()
  openrisc: Snapshot thread flags
  microblaze: Snapshot thread flags
  arm64: Snapshot thread flags
  ARM: Snapshot thread flags
  alpha: Snapshot thread flags
  sched: Snapshot thread flags
  entry: Snapshot thread flags
  x86: Snapshot thread flags
  thread_info: Add helpers to snapshot thread flags
2022-01-10 11:34:10 -08:00
Linus Torvalds
9b9e211360 arm64 updates for 5.17:
- KCSAN enabled for arm64.
 
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 
 - Some more SVE clean-ups and refactoring in preparation for SME support
   (scalable matrix extensions).
 
 - BTI clean-ups (SYM_FUNC macros etc.)
 
 - arm64 atomics clean-up and codegen improvements.
 
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
   square root estimate).
 
 - Use SHA3 instructions to speed-up XOR.
 
 - arm64 unwind code refactoring/unification.
 
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
   (potentially set by a hypervisor; user-space already does this).
 
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 
 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
 XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
 39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
 87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
 CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
 zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
 NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
 opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
 amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
 VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
 IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
 BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
 =ecyi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.

 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.

 - Some more SVE clean-ups and refactoring in preparation for SME
   support (scalable matrix extensions).

 - BTI clean-ups (SYM_FUNC macros etc.)

 - arm64 atomics clean-up and codegen improvements.

 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and
   reciprocal square root estimate).

 - Use SHA3 instructions to speed-up XOR.

 - arm64 unwind code refactoring/unification.

 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
   1 (potentially set by a hypervisor; user-space already does this).

 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.

 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
2022-01-10 08:49:37 -08:00
Linus Torvalds
a7ac314061 asm-generic: cleanups for 5.17
A few minor cleanups for cross-architecture code: Alexandre Ghiti
 deals with removing some leftovers from drivers and features that
 have been removed, and Wasin Thonkaew has a cosmetic change.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmHEj8UACgkQmmx57+YA
 GNnchRAAigY0M7hCG0qV/dRgAuxfnEyUtaV1zcx245tKjfQlVwUgecD33G/rwloO
 bFWkJBzTWjJSGNyHQwYvi6QF9JyLImT8EtyvHYXgNEjRfAxQj8ayRR9glUcEiOTX
 00OagUe9NC0Ui/NdX34J7b0ZMMm1XFij1jKQxaoy+U/I97mKqHHrRqJSHpRw77sK
 31948zf3mzJJD10Lo6CBvLngyN/gj5mVPYyrkrerdZ7X7QzzAsoO+gYTeEUDbkie
 4tj3tLwUNbqAUoJNO4lkN6BeXGph2G36tNTPgvJATyx9fZ7M5ayeK6I4QwxtlvBK
 xhmkHTCzc4m1xlwTPjdYO/g/UZHP8pa+WxOkTTSxdQgcgQh/p+zZl3MRxh+5lR/B
 iXcBy7qQfovDkgX7jW81KCJRJZztaFuLlbmQgskP53xsk/zj2A3oiH2pT3X07gGf
 qaZQ3xzeBJl4ax0U03QbC9WTjSqvZprprAbKrM52rzhehDCLHn3kYVMBvIeGI6eh
 m+mlZYBFEzvNh3ak9D5+oOB6iOctqK4h8giIEgvr6HJzxtbLsT1WZ6T4YEfD03fI
 hdttqkt4dBl06Ofql5TE88SbwEvXikG+nFZ8wIEgeRj9EErozCGjKbAd13SAEsJk
 u/ySXHJbwuoT+71SdFlqgj74A9PJWE6FJuVNoEZjAD5CxlVI1tA=
 =ZaDF
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "A few minor cleanups for cross-architecture code: Alexandre Ghiti
  deals with removing some leftovers from drivers and features that have
  been removed, and Wasin Thonkaew has a cosmetic change"

* tag 'asm-generic-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/error-injection.h: fix a spelling mistake, and a coding style issue
  arch: Remove leftovers from prism54 wireless driver
  arch: Remove leftovers from mandatory file locking
  Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
  Documentation, arch: Remove leftovers from raw device
2022-01-10 08:42:28 -08:00
Masahiro Yamada
129ab0d2d9 kbuild: do not quote string values in include/config/auto.conf
The previous commit fixed up all shell scripts to not include
include/config/auto.conf.

Now that include/config/auto.conf is only included by Makefiles,
we can change it into a more Make-friendly form.

Previously, Kconfig output string values enclosed with double-quotes
(both in the .config and include/config/auto.conf):

    CONFIG_X="foo bar"

Unlike shell, Make handles double-quotes (and single-quotes as well)
verbatim. We must rip them off when used.

There are some patterns:

  [1] $(patsubst "%",%,$(CONFIG_X))
  [2] $(CONFIG_X:"%"=%)
  [3] $(subst ",,$(CONFIG_X))
  [4] $(shell echo $(CONFIG_X))

These are not only ugly, but also fragile.

[1] and [2] do not work if the value contains spaces, like
   CONFIG_X=" foo bar "

[3] does not work correctly if the value contains double-quotes like
   CONFIG_X="foo\"bar"

[4] seems to work better, but has a cost of forking a process.

Anyway, quoted strings were always PITA for our Makefiles.

This commit changes Kconfig to stop quoting in include/config/auto.conf.

These are the string type symbols referenced in Makefiles or scripts:

    ACPI_CUSTOM_DSDT_FILE
    ARC_BUILTIN_DTB_NAME
    ARC_TUNE_MCPU
    BUILTIN_DTB_SOURCE
    CC_IMPLICIT_FALLTHROUGH
    CC_VERSION_TEXT
    CFG80211_EXTRA_REGDB_KEYDIR
    EXTRA_FIRMWARE
    EXTRA_FIRMWARE_DIR
    EXTRA_TARGETS
    H8300_BUILTIN_DTB
    INITRAMFS_SOURCE
    LOCALVERSION
    MODULE_SIG_HASH
    MODULE_SIG_KEY
    NDS32_BUILTIN_DTB
    NIOS2_DTB_SOURCE
    OPENRISC_BUILTIN_DTB
    SOC_CANAAN_K210_DTB_SOURCE
    SYSTEM_BLACKLIST_HASH_LIST
    SYSTEM_REVOCATION_KEYS
    SYSTEM_TRUSTED_KEYS
    TARGET_CPU
    UNUSED_KSYMS_WHITELIST
    XILINX_MICROBLAZE0_FAMILY
    XILINX_MICROBLAZE0_HW_VER
    XTENSA_VARIANT_NAME

I checked them one by one, and fixed up the code where necessary.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-08 18:03:57 +09:00
Sachin Sant
f1aa0e47c2 powerpc/xmon: Dump XIVE information for online-only processors.
dxa command in XMON debugger iterates through all possible processors.
As a result, empty lines are printed even for processors which are not
online.

CPU 47:pp=00 CPPR=ff IPI=0x0040002f PQ=-- EQ idx=699 T=0 00000000 00000000
CPU 48:
CPU 49:

Restrict XIVE information(dxa) to be displayed for online processors only.

Signed-off-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/164139226833.12930.272224382183014664.sendpatchset@MacBook-Pro.local
2022-01-06 21:47:00 +11:00
Greg Kroah-Hartman
32a1bda4b1 powerpc/opal: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the powerpc opal dump and elog sysfs code to use
default_groups field which has been the preferred way since aa30f47cf6
("kobject: Add support for default attribute groups to kobj_type") so
that we can soon get rid of the obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220104161318.1306023-1-gregkh@linuxfoundation.org
2022-01-05 10:59:52 +11:00
Greg Kroah-Hartman
2bdf3f9e9d powerpc/cacheinfo: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the powerpc cacheinfo sysfs code to use default_groups
field which has been the preferred way since aa30f47cf6 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220104155450.1291277-1-gregkh@linuxfoundation.org
2022-01-05 10:58:23 +11:00
Guo Ren
08035a67f3 powerpc/sched: Remove unused TASK_SIZE_OF
This macro isn't used in Linux sched, now. Delete in
include/linux/sched.h and arch's include/asm.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211228064730.2882351-5-guoren@kernel.org
2022-01-04 23:12:27 +11:00
Ammar Faizi
18dbfcdedc powerpc/xive: Add missing null check after calling kmalloc
Commit 930914b7d5 ("powerpc/xive: Add a debugfs file to dump
internal XIVE state") forgot to add a null check.

Add it.

Fixes: 930914b7d5 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
Signed-off-by: Ammar Faizi <ammarfaizi2@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211226135314.251221-1-ammar.faizi@intel.com
2022-01-04 16:03:05 +11:00
Christophe JAILLET
e57c2fd6cd powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.

Some reasons why this API should be removed have been given by Julia
Lawall in [2].

A coccinelle script has been used to perform the needed transformation
Only relevant parts are given below.

@@ @@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@ @@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9e24eedeab44cbb840598bb188561a48811de845.1641119338.git.christophe.jaillet@wanadoo.fr
2022-01-04 16:00:59 +11:00
Tianjia Zhang
41ea0f6c19 crypto: powerpc/sha - remove duplicate hash init function
sha*_base_init() series functions has implemented the initialization
of the hash context, this commit use sha*_base_init() function to
replace repeated implementations.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-12-31 18:10:55 +11:00
Jakub Kicinski
aec53e60e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
  commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30 12:12:12 -08:00
Linus Torvalds
f651faaaba powerpc fixes for 5.16 #5
Fix DEBUG_WX never reporting any WX mappings, due to use of an incorrect config symbol
 since we converted to using generic ptdump.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHK5joTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOC0EACbVv34c9dt2ODchr45RHJFmfDdp/oh
 xhQPpSFWQZltfqsLcMwsSRdbVWOvDUDb5eUVJF6PdBhK9BhO30nq3vP8477N5Whp
 PJX237ygZVjFWVbhb0cQB7nmWdMAJhJ7inqIvHxHcgkmzf/M9D21AV/YM3zrZLT9
 NFgboT4TUHG7ll6s8xZce2AM+1Iq/3J8anFkh0AZJPPCdaefOh3JRAkAf7LDTYrV
 ARiVAImXW6zbtKcxZ7Qu5bDjPhHSB/wd00AKY3eqkGIEDwp78TDVItrIGVWX5Y+R
 U6abKgxkZwlMC1K+PCN0q53lg/eSX59Z4FEY/aiWZ2/zfNHU1CSIAVRu7SgL+JVN
 aypVpLuChp7TrWL0asngkJ8/ztFRx8SkBg0oxpLXOPkGU91KEhuedtBBlL60sfoG
 Xh55PvRxgjWUaNwiDX+f+G6Mx2zzaiGOUUeiwwexCtsji9Yt7Fe3NH2eteYLt45r
 S9hcyX/jfa/HMFTIoStqsb//pcQXi5fGA6xU/16aGNpIiRP8+nvrMcGSTuTO4CON
 RCO0mf6yd1Cdq+OrD7vX6lS8iXVhGJtfc49KQ6nPtMHkcMStEYS52jANlzfU3CKO
 ne3+mrX5raty7VkG0HRCfMauhgVPhHqbdlK1JNjSUagL0xXxf+88s3vVWuwRyIb2
 32tNV4XvWtysYw==
 =Pt/J
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Fix DEBUG_WX never reporting any WX mappings, due to use of an
  incorrect config symbol since we converted to using generic ptdump"

* tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
2021-12-28 11:42:01 -08:00
Michael Ellerman
fd1eaaaaa6 powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning
from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing
BUGFLAG_WARNING.

However that fails to create an exception table entry for the warning,
and so do_program_check() fails the exception table search and proceeds
to call _exception(), resulting in an oops like:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P                  5.16.0-rc2-00194-g91ca3d4f77c5 #12
  NIP:  c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000
  ...
  NIP [c00000000000c5b0] system_call_common+0x150/0x268
  LR [0000000000000000] 0x0
  Call Trace:
  [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable)
  ...
  Instruction dump:
  7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178
  7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088

We should instead use EMIT_WARN_ENTRY, which creates an exception table
entry for the warning, allowing the warning to be correctly recognised,
and the code to resume after printing the warning.

Note however that because this warning is buried deep in the interrupt
return path, we are not able to recover from it (due to MSR_RI being
clear), so we still end up in die() with an unrecoverable exception.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au
2021-12-25 10:56:01 +11:00
Michael Ellerman
314f6c23dd powerpc/64s: Mask NIP before checking against SRR0
When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when
returning from interrupts. This can trigger falsely if NIP has either of
its two low bits set via sigreturn or ptrace, while SRR0 has its low two
bits masked in hardware.

As a quick fix make sure to mask the low bits before doing the check.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au
2021-12-25 10:55:55 +11:00
Randy Dunlap
5b09250cca powerpc/perf: Fix spelling of "its"
Use the possessive "its" instead of the contraction of "it is" (it's).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211223003942.22098-1-rdunlap@infradead.org
2021-12-23 22:36:58 +11:00
Christophe Leroy
bba496656a powerpc/32: Fix boot failure with GCC latent entropy plugin
Boot fails with GCC latent entropy plugin enabled.

This is due to early boot functions trying to access 'latent_entropy'
global data while the kernel is not relocated at its final
destination yet.

As there is no way to tell GCC to use PTRRELOC() to access it,
disable latent entropy plugin in early_32.o and feature-fixups.o and
code-patching.o

Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217
Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
309a0a6018 powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
The purpose of selftests is to check that instructions are
properly formed. Not to check that they properly run.

For that test it uses normal memory, not special test
memory.

In preparation of a future patch enforcing patch_instruction()
to be used only on valid text areas, implement a ppc_inst_write()
instruction which is the complement of ppc_inst_read(). This
new function writes the formated instruction in valid kernel
memory and doesn't bother about icache.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cf5335cc07ca9b6f8cdaa20ca9887fce4df3bea.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
f30a578d76 powerpc/code-patching: Move code patching selftests in its own file
Code patching selftests are half of code-patching.c.
As they are guarded by CONFIG_CODE_PATCHING_SELFTESTS,
they'd be better in their own file.

Also add a missing __init for instr_is_branch_to_addr()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c0c30504f04eb546a48ff77127a8bccd12a3d809.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
31acc59956 powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
To enable moving selftests in their own C file in following patch,
move instr_is_branch_iform() and instr_is_branch_bform()
to code-patching.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fca0f3b191211b3681020885a611bf73eef20563.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Christophe Leroy
29562a9da2 powerpc/code-patching: Move patch_exception() outside code-patching.c
patch_exception() is dedicated to book3e/64 is nothing more than
a normal use of patch_branch(), so move it into a place dedicated
to book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0968622b98b1fb51838c35b844c42ad6609de62e.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:55 +11:00
Christophe Leroy
ff14a9c09f powerpc/code-patching: Use test_trampoline for prefixed patch test
Use the dedicated test_trampoline function for testing prefixed
patching like other tests and remove the hand coded assembly stuff.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a450ef3f8653f75e1bd9aaf7a3889d379752f33b.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:25 +11:00
Christophe Leroy
d5937db114 powerpc/code-patching: Fix patch_branch() return on out-of-range failure
Do not silentely ignore a failure of create_branch() in
patch_branch(). Return -ERANGE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8540cb64b1f06710eaf41e3835c7ba3e21fa2b05.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
6b21af7449 powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
Split do_patch_instruction() in two functions, the caller doing the
spin locking and the callee doing everything else.

And remove a few unnecessary initialisations and intermediate
variables.

This allows the callee to return from anywhere in the function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dbc85980a0d2a935731b272e8907e8bb1d8fc8c5.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
a3483c3dd1 powerpc/code-patching: Fix unmap_patch_area() error handling
pXd_offset() doesn't return NULL. When the base is NULL, it
still adds the offset.

Use pXd_none() to check validity instead. It also improves
performance by folding out none existing levels as pXd_none()
always returns 0 in that case.

Such an error is unexpected, use WARN_ON() so that the caller
doesn't have to worry about it, and drop the returned value.

And now that unmap_patch_area() doesn't return error, we can
take into account the error returned by __patch_instruction().

While at it, remove the 'inline' property which is useless.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/299804b117fae35c786c827536c91f25352e279b.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
285672f993 powerpc/code-patching: Fix error handling in do_patch_instruction()
Use real errors instead of using -1 as error, so that errors
returned by callees can be used towards callers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/85259d894069e47f915ea580b169e1adbeec7a61.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
af5304a750 powerpc/code-patching: Remove init_mem_is_free
A new state has been added by commit d2635f2012 ("mm: create a new
system state and fix core_kernel_text()"). That state tells when
initmem is about to be released and is redundant with init_mem_is_free.

Remove init_mem_is_free.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ad8c3ccb39c8edaa89fd3eda1cc7218baea1cde5.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Christophe Leroy
edecd2d6d6 powerpc/code-patching: Remove pr_debug()/pr_devel() messages and fix check()
code-patching has been working for years now, time has come to
remove debugging messages.

Change useful message to KERN_INFO and remove other ones.

Also add KERN_ERR to check() macro and change it into a do/while
to make checkpatch happy.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3ff9823c0a812a8a145d979a9600a6d4591b80ee.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23 22:35:24 +11:00
Alexey Kardashevskiy
62479e6e26 powerpc/mm/book3s64/hash: Switch pre 2.06 tlbiel to .long
The llvm integrated assembler does not recognise the ISA 2.05 tlbiel
version. Work around it by switching to .long when an old arch level
detected.

Signed-off-by: Daniel Axtens <dja@axtens.net>
[aik: did "Eventually do this more smartly"]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-7-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Alexey Kardashevskiy
d51f86cfd8 powerpc/mm: Switch obsolete dssall to .long
The dssall ("Data Stream Stop All") instruction is obsolete altogether
with other Data Cache Instructions since ISA 2.03 (year 2006).

LLVM IAS does not support it but PPC970 seems to be using it.
This switches dssall to .long as there is no much point in fixing LLVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Daniel Axtens
d72c4a36d7 powerpc/64/asm: Do not reassign labels
The LLVM integrated assembler really does not like us reassigning things
to the same label:

<instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label'

This happens across a bunch of platforms:
https://github.com/ClangBuiltLinux/linux/issues/1043
https://github.com/ClangBuiltLinux/linux/issues/1008
https://github.com/ClangBuiltLinux/linux/issues/920
https://github.com/ClangBuiltLinux/linux/issues/1050

There is no hope of getting this fixed in LLVM (see
https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200
and https://bugs.llvm.org/show_bug.cgi?id=47798#c1 )
so if we want to build with LLVM_IAS, we need to hack
around it ourselves.

For us the big problem comes from this:

\#define USE_FIXED_SECTION(sname)				\
	fs_label = start_##sname;				\
	fs_start = sname##_start;				\
	use_ftsec sname;

\#define USE_TEXT_SECTION()
	fs_label = start_text;					\
	fs_start = text_start;					\
	.text

and in particular fs_label.

This works around it by not setting those 'variables' and requiring
that users of the variables instead track for themselves what section
they are in. This isn't amazing, by any stretch, but it gets us further
in the compilation.

Note that even though users have to keep track of the section, using
a wrong one produces an error with both binutils and llvm which prevents
from using wrong section at the compile time:

llvm error example:

AS      arch/powerpc/kernel/head_64.o
<unknown>:0: error: Cannot represent a difference across sections
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

binutils error example:

/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_r
eal_vectors' {.head.text.real_vectors section}
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alexey Kardashevskiy
fd98395797 powerpc/64/asm: Inline BRANCH_TO_C000
It is used just once and does not really help with readability, remove it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Daniel Axtens
f5140cab44 powerpc: check for support for -Wa,-m{power4,any}
LLVM's integrated assembler does not like either -Wa,-mpower4
or -Wa,-many. So just don't pass them if they're not supported.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-3-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alan Modra
a3ad84da07 powerpc/toc: Future proof kernel toc
This patch future-proofs the kernel against linker changes that might
put the toc pointer at some location other than .got+0x8000, by
replacing __toc_start+0x8000 with .TOC. throughout.  If the kernel's
idea of the toc pointer doesn't agree with the linker, bad things
happen.

prom_init.c code relocating its toc is also changed so that a symbolic
__prom_init_toc_start toc-pointer relative address is calculated
rather than assuming that it is always at toc-pointer - 0x8000.  The
length calculations loading values from the toc are also avoided.
It's a little incestuous to do that with unreloc_toc picking up
adjusted values (which is fine in practice, they both adjust by the
same amount if all goes well).

I've also changed the way .got is aligned in vmlinux.lds and
zImage.lds, mostly so that dumping out section info by objdump or
readelf plainly shows the alignment is 256.  This linker script
feature was added 2005-09-27, available in FSF binutils releases from
2.17 onwards.  Should be safe to use in the kernel, I think.

Finally, put *(.got) before the prom_init.o entry which only needs
*(.toc), so that the GOT header goes in the correct place.  I don't
believe this makes any difference for the kernel as it would for
dynamic objects being loaded by ld.so.  That change is just to stop
lusers who blindly copy kernel scripts being led astray.  Of course,
this change needs the prom_init.c changes.

Some notes on .toc and .got.

.toc is a compiler generated section of addresses.  .got is a linker
generated section of addresses, generally built when the linker sees
R_*_*GOT* relocations.  In the case of powerpc64 ld.bfd, there are
multiple generated .got sections, one per input object file.  So you
can somewhat reasonably write in a linker script an input section
statement like *prom_init.o(.got .toc) to mean "the .got and .toc
section for files matching *prom_init.o".  On other architectures that
doesn't make sense, because the linker generally has just one .got
section.  Even on powerpc64, note well that the GOT entries for
prom_init.o may be merged with GOT entries from other objects.  That
means that if prom_init.o references, say, _end via some GOT
relocation, and some other object also references _end via a GOT
relocation, the GOT entry for _end may be in the range
__prom_init_toc_start to __prom_init_toc_end and if the kernel does
something special to GOT/TOC entries in that range then the value of
_end as seen by objects other than prom_init.o will be affected.  On
the other hand the GOT entry for _end may not be in the range
__prom_init_toc_start to __prom_init_toc_end.  Which way it turns out
is deterministic but a detail of linker operation that should not be
relied on.

A feature of ld.bfd is that input .toc (and .got) sections matching
one linker input section statement may be sorted, to put entries used
by small-model code first, near the toc base.  This is why scripts for
powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since
the first form allows more freedom to sort.

Another feature of ld.bfd is that indirect addressing sequences using
the GOT/TOC may be edited by the linker to relative addressing.  In
many cases relative addressing would be emitted by gcc for
-mcmodel=medium if you appropriately decorate variable declarations
with non-default visibility.

The original patch is here:
https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/

Signed-off-by: Alan Modra <amodra@au1.ibm.com>
[aik: removed non-relocatable which is gone in 24d33ac5b8]
[aik: added <=2.24 check]
[aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23 22:35:01 +11:00
Nick Child
7da1d1ddd1 cuda/pmu: Make find_via_cuda/pmu init functions
Make `find_via_cuda` and `find_via_pmu` initialization functions.
Previously, their definitions in `drivers/macintosh/via-cuda.h` include
the `__init` attribute but their alternative definitions in
`arch/powerpc/powermac/sectup./c` and prototypes in `include/linux/
cuda.h` and `include/linux/pmu.h` do not use the `__init` macro. Since,
only initialization functions call `find_via_cuda` and `find_via_pmu`
it is safe to label these functions with `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-21-nick.child@ibm.com
2021-12-23 22:35:00 +11:00
Nick Child
2493a24271 powerpc/512x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/512x' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-20-nick.child@ibm.com
2021-12-23 22:33:19 +11:00
Nick Child
407454cafd powerpc/85xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/85xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-19-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
f4a88b0ef5 powerpc/83xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/83xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-18-nick.child@ibm.com
2021-12-23 22:33:18 +11:00
Nick Child
c0dc225ae7 powerpc/embedded6xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/embedded6xx' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-17-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1ee969be25 powerpc/44x: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/44x/' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-16-nick.child@ibm.com
2021-12-23 22:33:17 +11:00
Nick Child
1e3d992d21 powerpc/4xx: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/4xx' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-15-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
f1ba9b9474 powerpc/ps3: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/ps3' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-14-nick.child@ibm.com
2021-12-23 22:33:16 +11:00
Nick Child
e14ff96d08 powerpc/pseries: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pseries' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
e5913db1ef powerpc/powernv: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powernv' are
deserving of an `__init` macro attribute. These functions are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-12-nick.child@ibm.com
2021-12-23 22:33:15 +11:00
Nick Child
b346f57100 powerpc/powermac: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/powermac` are only
called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-11-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
e37e06af9b powerpc/pasemi: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/pasemi' are deserving
of an `__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-10-nick.child@ibm.com
2021-12-23 22:33:14 +11:00
Nick Child
d3aa3c5edf powerpc/chrp: Add __init attribute to eligible functions
The function `Enable_SRAM` defined in 'arch/powerpc/platforms/chrp' is
deserving of an `__init` macro attribute. This function is only called by
other initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-9-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
7c1ab16b2d powerpc/cell: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/platforms/cell' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-8-nick.child@ibm.com
2021-12-23 22:33:13 +11:00
Nick Child
456e8eb324 powerpc/xmon: Add __init attribute to eligible functions
`xmon_register_spus` defined in 'arch/powerpc/xmon' is deserving of an
`__init` macro attribute. This functions is only called by other
initialization functions and therefore should inherit the attribute.
Also, change the function declaration in the header file to include
`__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-7-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
6c552983d0 powerpc/sysdev: Add __init attribute to eligible functions
Some files functions in 'arch/powerpc/sysdev' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-6-nick.child@ibm.com
2021-12-23 22:33:12 +11:00
Nick Child
c49f5d88ff powerpc/perf: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/perf' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
c13f2b2bb5 powerpc/mm: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/mm' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-4-nick.child@ibm.com
2021-12-23 22:33:11 +11:00
Nick Child
ce0c6be9c6 powerpc/lib: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/lib' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-3-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Nick Child
d276960d92 powerpc/kernel: Add __init attribute to eligible functions
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/
kexec`) are deserving of an `__init` macro attribute. These functions are
only called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Rob Herring
9cbbe6bae9 powerpc/dts: Remove "spidev" nodes
"spidev" is not a real device, but a Linux implementation detail. It has
never been documented either. The kernel has WARNed on the use of it for
over 6 years. Time to remove its usage from the tree.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211217221400.3667133-1-robh@kernel.org
2021-12-21 20:17:58 +11:00
Michael Ellerman
8d84fca437 powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
In note_prot_wx() we bail out without reporting anything if
CONFIG_PPC_DEBUG_WX is disabled.

But CONFIG_PPC_DEBUG_WX was removed in the conversion to generic ptdump,
we now need to use CONFIG_DEBUG_WX instead.

Fixes: e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20211203124112.2912562-1-mpe@ellerman.id.au
2021-12-21 13:05:59 +11:00
Nicholas Piggin
467ba14e16 powerpc/64s/radix: Fix huge vmap false positive
pmd_huge() is defined to false when HUGETLB_PAGE is not configured, but
the vmap code still installs huge PMDs. This leads to false bad PMD
errors when vunmapping because it is not seen as a huge PTE, and the bad
PMD check catches it. The end result may not be much more serious than
some bad pmd warning messages, because the pmd_none_or_clear_bad() does
what we wanted and clears the huge PTE anyway.

Fix this by checking pmd_is_leaf(), which checks for a PTE regardless of
config options. The whole huge/large/leaf stuff is a tangled mess but
that's kernel-wide and not something we can improve much in arch/powerpc
code.

pmd_page(), pud_page(), etc., called by vmalloc_to_page() on huge vmaps
can similarly trigger a false VM_BUG_ON when CONFIG_HUGETLB_PAGE=n, so
those checks are adjusted. The checks were added by commit d6eacedd1f
("powerpc/book3s: Use config independent helpers for page table walk"),
while implementing a similar fix for other page table walking functions.

Fixes: d909f9109c ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216103342.609192-1-npiggin@gmail.com
2021-12-20 12:13:32 +11:00
Yang Guang
a605b39e8e powerpc: use swap() to make code cleaner
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
[mpe: Add include of linux/minmax.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71a702c2189b16c152affd8a8cda1d84ce32741c.1639792543.git.yang.guang5@zte.com.cn
2021-12-20 12:01:21 +11:00
Christophe JAILLET
2fe4ca6ad7 powerpc/mpic: Use bitmap_zalloc() when applicable
'mpic->protected' is a bitmap. So use 'bitmap_zalloc()' to simplify
code and improve the semantic, instead of hand writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa145f674e08044c98f13f1a985faa9cc29c3708.1639777976.git.christophe.jaillet@wanadoo.fr
2021-12-20 11:54:05 +11:00
Linus Torvalds
713ab911f2 powerpc fixes for 5.16 #4
Fix a recently introduced oops at boot on 85xx in some configurations.
 
 Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX.
 
 Thanks to: Joe Lawrence, Russell Currey, Xiaoming Ni.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmG+q+UTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG5sEACVsb39+Q8w0GKJxUGR0Wri1InJTkLC
 UWpd9PVzG6n8+Epl0f+H5cMlv5V6xaDzTsNXEf27xIACMlPSVHVLDW8u/RFPizdp
 fUDUweGUvK/M0GQW89nRV29DjdNjMmUq1KQFM6eEgYN/0nS3HqF27jamiaFhiZ+e
 pJl1g2dy/DTWrFz3V/8b0nv6kyJE8GlpnCmTchDsPNDbzWQVJXVfM7IB/9N+EVna
 MQAZCyKSOafTgNRPOW+Sm4oF64z+Yn4cxURVKfzbM+0Cd8pNIcGZ7McXrnp0DR4t
 bLGeYEzyR1bPc3VTt7b+jafDP41AakRllKAFz65+pdYOcV1hTRM4inbLVPT5Bmz6
 7ANlX/w4BwYWFk3mWW14BoxU02FlRKRddXhusq7x2wIqvtn1KI8Qm1xpneiGpcW+
 GEe+k9ONGowXp/bc47/Co6dPSHKRRHUC3MsgbDv2Y7M5G9vsetckdQzA2bIqYSx8
 r1ZXNiMwQarPeB6FrGvNbekyCsIhJkk6chhtL0k27BVOFqYiHZtaqBs+RCQYPdC8
 Aih8WOEqdldbItFI3MihyKULucgkVpdnm80Ja67qRthuJICF9OML3mdrt83bxEIp
 fu4Bsw8dygVKJWScZj8AUC/ZBxOrrTGyZR1RvsHpaAiEfinMkxxTnXjqNDiSZWXt
 DKKLx6pfSFrokw==
 =Nmp5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix a recently introduced oops at boot on 85xx in some configurations.

  Fix crashes when loading some livepatch modules with
  STRICT_MODULE_RWX.

  Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"

* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/module_64: Fix livepatching for RO modules
  powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
2021-12-19 11:31:14 -08:00
Paolo Bonzini
5a213b9220 Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD
Fix conflicts between memslot overhaul and commit 511d25d6b7 ("KVM:
PPC: Book3S: Suppress warnings when allocating too big memory slots")
from the powerpc tree.
2021-12-19 15:27:21 +01:00
Alexandre Ghiti
e0cb56546d arch: Remove leftovers from prism54 wireless driver
This driver was removed so remove all references to it.

Fixes: d249ff28b1 ("intersil: remove obsolete prism54 wireless driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:09 +01:00
Alexandre Ghiti
2ac7069ad7 Documentation, arch: Remove leftovers from CIFS_WEAK_PW_HASH
This config was removed so remove all references to it.

Fixes: 76a3c92ec9 ("cifs: remove support for NTLM and weaker authentication algorithms")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:12:03 +01:00
Alexandre Ghiti
473dcf0ffc Documentation, arch: Remove leftovers from raw device
Raw device interface was removed so remove all references to configs
related to it.

Fixes: 603e4922f1 ("remove the raw driver")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-17 14:07:52 +01:00
Rob Herring
1f012283e9 of/fdt: Rework early_init_dt_scan_memory() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_memory() to be called directly and use libfdt.

Cc: John Crispin <john@phrozen.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211215150102.1303588-1-robh@kernel.org
2021-12-16 16:07:52 -06:00
Rob Herring
d665881d21 of/fdt: Rework early_init_dt_scan_root() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root()
to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-3-robh@kernel.org
2021-12-16 16:07:48 -06:00
Rob Herring
60f20d84dc of/fdt: Rework early_init_dt_scan_chosen() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_chosen() to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-16 16:07:41 -06:00
Thomas Gleixner
706b585a1b powerpc/mpic_u3msi: Use msi_for_each-desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.576162169@linutronix.de
2021-12-16 22:22:19 +01:00
Thomas Gleixner
ab430e7437 powerpc/fsl_msi: Use msi_for_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.522641685@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
e22b0d1bbf powerpc/pasemi/msi: Convert to msi_on_each_dec()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.468512783@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
3c46658bd7 powerpc/cell/axon_msi: Convert to msi_on_each_desc()
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.414712173@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
85dabc2f72 powerpc/4xx/hsta: Rework MSI handling
Replace the about to vanish iterators and make use of the filtering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210748.359766435@linutronix.de
2021-12-16 22:22:18 +01:00
Thomas Gleixner
651b39c488 powerpc/pseries/msi: Let core code check for contiguous entries
Set the domain info flag and remove the check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.720998720@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
173ffad79d PCI/MSI: Use msi_desc::msi_index
The usage of msi_desc::pci::entry_nr is confusing at best. It's the index
into the MSI[X] descriptor table.

Use msi_desc::msi_index which is shared between all MSI incarnations
instead of having a PCI specific storage for no value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211210221814.602911509@linutronix.de
2021-12-16 22:16:40 +01:00
Thomas Gleixner
ed1533b581 powerpc/pseries/msi: Use PCI device properties
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221813.556202506@linutronix.de
2021-12-16 22:16:38 +01:00
Thomas Gleixner
d8a530578b powerpc/cell/axon_msi: Use PCI device property
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211210221813.493922179@linutronix.de
2021-12-16 22:16:38 +01:00
Emmanuel Gil Peyrot
c636783d59 powerpc: wii_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-6-linkmauve@linkmauve.fr
2021-12-16 21:49:51 +01:00
Nicholas Piggin
3b54c71537 powerpc/pseries: use slab context cpumask allocation in CPU hotplug init
Slab is up at this point, using the bootmem allocator triggers a
warning. Switch to using the regular cpumask allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105132923.1582514-1-npiggin@gmail.com
2021-12-16 21:31:46 +11:00
Nicholas Piggin
af47d79b04 powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But the
important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ecb1057c0f powerpc/64/interrupt: reduce expensive debug tests
Move the assertions requiring restart table searches under
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-6-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0faf20a1ad powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
5a7745b96f powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI
Interrupt code enables MSR[EE] in some irq handlers while keeping local
irqs disabled via soft-mask, allowing PMI interrupts to be taken as
soft-NMI to improve profiling of irq handlers.

When perf is not enabled, there is no point to doing this, it's
additional overhead. So provide a function that can say if PMIs should
be taken promptly if possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
ff0b0d6e1a powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable
MSR[EE] in interrupt entry code, for those interrupts which enable EE.
This helps performance of important synchronous interrupts (e.g., page
faults).

This is similar to what commit dd152f70bd ("powerpc/64s: system call
avoid setting MSR[RI] until we set MSR[EE]") does for system calls.

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is
set.

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they have
interrupted a caller that is hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), it will not require
another mtmsrd because MSR[EE] was already enabled here.

This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles on POWER9. And for kernel-mode interrupts, both
synchronous and asynchronous, this saves an additional 40 cycles due to
the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
4423eb5ae3 powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible
Make synchronous interrupt handler entry wrappers enable MSR[EE] if
MSR[EE] was enabled in the interrupted context. IRQs are soft-disabled
at this point so there is no change to high level code, but it's a
masked interrupt could fire.

This is a performance disadvantage for interrupts which do not later
call interrupt_cond_local_irq_enable(), because an an additional mtmsrd
or wrtee instruction is executed. However the important synchronous
interrupts (e.g., page fault) do enable interrupts, so the performance
disadvantage is mostly avoided.

In the next patch, MSR[RI] enabling can be combined with MSR[EE]
enabling, which mitigates the performance drop for the former and gives
a performance advanage for the latter interrupts, on 64s machines. 64e
is coming along for the ride for now to avoid divergences with 64s in
this tricky code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-2-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin
0a006ace63 powerpc/pseries/vas: Don't print an error when VAS is unavailable
KVM does not support VAS so guests always print a useless error on boot

    vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000

Change this to only print the message if the error is not H_FUNCTION.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126052133.1664375-1-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Kajol Jain
6ed05a8efd powerpc/perf: Add data source encodings for power10 platform
The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Kajol Jain
4a20ee1061 powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
The code represent data coming from L1/L2/L3 cache hits based on
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.

Add data source encodings to represent L1/L2/L3 cache hits based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms

Result in power9 system without patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result in power9 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com
2021-12-16 21:31:44 +11:00
Emmanuel Gil Peyrot
57bd7d3565 powerpc: gamecube_defconfig: Enable the RTC driver
This selects the rtc-gamecube driver, which provides a real-time clock
on this platform.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-5-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Emmanuel Gil Peyrot
5479618e1e powerpc: wii.dts: Expose HW_SRNPROT on this platform
This Hollywood register isn’t properly understood, but can allow or
reject access to the SRAM, which we need to set for RTC usage if it
isn’t previously set correctly beforehand.

See https://wiibrew.org/wiki/Hardware/Hollywood_Registers#HW_SRNPROT

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211215175501.6761-4-linkmauve@linkmauve.fr
2021-12-16 10:46:35 +01:00
Michael Ellerman
708da3ff1d Merge branch 'topic/ppc-kvm' into next
Bring in some more KVM commits from our KVM topic branch.
2021-12-15 11:29:53 +11:00
Russell Currey
8734b41b3e powerpc/module_64: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-14 23:13:03 +11:00
Sean Christopherson
63fa47ba88 KVM: PPC: Book3S HV P9: Use kvm_arch_vcpu_get_wait() to get rcuwait object
Use kvm_arch_vcpu_get_wait() to get a vCPU's rcuwait object instead of
using vcpu->wait directly in kvmhv_run_single_vcpu().  Functionally, this
is a nop as vcpu->arch.waitp is guaranteed to point at vcpu->wait.  But
that is not obvious at first glance, and a future change coming in via
the KVM tree, commit 510958e997 ("KVM: Force PPC to define its own
rcuwait object"), will hide vcpu->wait from architectures that define
__KVM_HAVE_ARCH_WQP to prevent generic KVM from attepting to wake a vCPU
with the wrong rcuwait object.

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211213174556.3871157-1-seanjc@google.com
2021-12-14 22:49:36 +11:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Ingo Molnar
6773cc31a9 Linux 5.16-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
 w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
 HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
 St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
 iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
 VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
 1m2tVNY=
 =7wRd
 -----END PGP SIGNATURE-----

Merge tag 'v5.16-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-12-13 10:48:46 +01:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Peter Zijlstra
1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
David Woodhouse
5f33868af8 KVM: powerpc: Use Makefile.kvm for common files
It's all fairly baroque but in the end, I don't think there's any reason
for $(KVM)/irqchip.o to have been handled differently, as they all end
up in $(kvm-y) in the end anyway, regardless of whether they get there
via $(common-objs-y) and the CPU-specific object lists.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Message-Id: <20211121125451.9489-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09 13:06:01 -05:00
Christophe Leroy
b149d5d45a powerpc/powermac: Add additional missing lockdep_register_key()
Commit df1f679d19 ("powerpc/powermac: Add missing
lockdep_register_key()") fixed a problem that was causing a WARNING.

There are two other places in the same file with the same problem
originating from commit 9e607f7274 ("i2c_powermac: shut up lockdep
warning").

Add missing lockdep_register_key()

Fixes: 9e607f7274 ("i2c_powermac: shut up lockdep warning")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: df1f679d19 ("powerpc/powermac: Add missing lockdep_register_key()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200055
Link: https://lore.kernel.org/r/2c7e421874e21b2fb87813d768cf662f630c2ad4.1638984999.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:22 +11:00
Hari Bathini
06e629c25d powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dff
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3841 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Hari Bathini
219572d2fc powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
Kdump can be triggered after panic_notifers since commit f06e5153f4
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced crash_kexec_post_notifiers option.
But using this option would mean smp_send_stop(), that marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save other CPUs' registers. To fix this, kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c9 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Anders Roxell
e89257e28e powerpc/cell: Fix clang -Wimplicit-fallthrough warning
Clang warns:

arch/powerpc/platforms/cell/pervasive.c:81:2: error: unannotated fall-through between switch labels
        case SRR1_WAKEEE:
        ^
arch/powerpc/platforms/cell/pervasive.c:81:2: note: insert 'break;' to avoid fall-through
        case SRR1_WAKEEE:
        ^
        break;
1 error generated.

Clang is more pedantic than GCC, which does not warn when failing
through to a case that is just break or return. Clang's version is more
in line with the kernel's own stance in deprecated.rst. Add athe missing
break to silence the warning.

Fixes: 6e83985b0f ("powerpc/cbe: Do not process external or decremeter interrupts from sreset")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207110228.698956-1-anders.roxell@linaro.org
2021-12-09 22:41:21 +11:00
Christophe Leroy
0d76914a4c powerpc/inst: Optimise copy_inst_from_kernel_nofault()
copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to
copy one or two 32bits words. This means calling an out-of-line
function which itself calls back copy_from_kernel_nofault_allowed()
then performs a generic copy with loops.

Rewrite copy_inst_from_kernel_nofault() to do everything at a
single place and use __get_kernel_nofault() directly to perform
single accesses without loops.

Allthough the generic function uses pagefault_disable(), it is not
required on powerpc because do_page_fault() bails earlier when a
kernel mode fault happens on a kernel address.

As the function has now become very small, inline it.

With this change, on an 8xx the time spent in the loop in
ftrace_replace_code() is reduced by 23% at function tracer activation
and 27% at nop tracer activation.
The overall time to activate function tracer (measured with shell
command 'time') is 570ms before the patch and 470ms after the patch.

Even vmlinux size is reduced (by 152 instruction).

Before the patch:

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	94 21 ff e0 	stwu    r1,-32(r1)
	  1c:	7c 08 02 a6 	mflr    r0
	  20:	38 a0 00 04 	li      r5,4
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7c 7f 1b 78 	mr      r31,r3
	  2c:	38 61 00 08 	addi    r3,r1,8
	  30:	90 01 00 24 	stw     r0,36(r1)
	  34:	48 00 00 01 	bl      34 <copy_inst_from_kernel_nofault+0x1c>
				34: R_PPC_REL24	copy_from_kernel_nofault
	  38:	2c 03 00 00 	cmpwi   r3,0
	  3c:	40 82 00 0c 	bne     48 <copy_inst_from_kernel_nofault+0x30>
	  40:	81 21 00 08 	lwz     r9,8(r1)
	  44:	91 3f 00 00 	stw     r9,0(r31)
	  48:	80 01 00 24 	lwz     r0,36(r1)
	  4c:	83 e1 00 1c 	lwz     r31,28(r1)
	  50:	38 21 00 20 	addi    r1,r1,32
	  54:	7c 08 03 a6 	mtlr    r0
	  58:	4e 80 00 20 	blr

After the patch (before inlining):

	00000018 <copy_inst_from_kernel_nofault>:
	  18:	3d 20 b0 00 	lis     r9,-20480
	  1c:	7c 04 48 40 	cmplw   r4,r9
	  20:	7c 69 1b 78 	mr      r9,r3
	  24:	41 80 00 14 	blt     38 <copy_inst_from_kernel_nofault+0x20>
	  28:	81 44 00 00 	lwz     r10,0(r4)
	  2c:	38 60 00 00 	li      r3,0
	  30:	91 49 00 00 	stw     r10,0(r9)
	  34:	4e 80 00 20 	blr

	  38:	38 60 ff de 	li      r3,-34
	  3c:	4e 80 00 20 	blr
	  40:	38 60 ff f2 	li      r3,-14
	  44:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add clang workaround, with version check as suggested by Nathan]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d5b12183d5176dd702d29ad94c39c384e51c78f.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
9b307576f3 powerpc/inst: Move ppc_inst_t definition in asm/reg.h
Because of circular inclusion of asm/hw_breakpoint.h, we
need to move definition of asm/reg.h outside of inst.h
so that asm/hw_breakpoint.h gets it without including
asm/inst.h

Also remove asm/inst.h from asm/uprobes.h as it's not
needed anymore.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4b79f1491118af96b1ac0735e74aeca02ea4c04e.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
07b863aef5 powerpc/inst: Define ppc_inst_t as u32 on PPC32
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined
as a struct, even when the struct has a single simple element.

To avoid that, define ppc_inst_t as u32 on PPC32.

Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that
sparse can perform type checking.

Also revert commit 511eea5e2c ("powerpc/kprobes: Fix Oops by passing
ppc_inst as a pointer to emulate_step() on ppc32") as now the
instruction to be emulated is passed as a register to emulate_step().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
c545b9f040 powerpc/inst: Define ppc_inst_t
In order to stop using 'struct ppc_inst' on PPC32,
define a ppc_inst_t typedef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy
3261d99adb powerpc/inst: Refactor ___get_user_instr()
PPC64 version of ___get_user_instr() can be used for PPC32 as well,
by simply disabling the suffix part with IS_ENABLED(CONFIG_PPC64).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f0ede830ccb33a659119a55cb590820c27004db.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
37eb7ca91b powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
Today we have the following IBATs allocated:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128K Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128K Kernel   x     m
	6:         -
	7:         -

The two 128K should be a single 256K instead.

When _etext is not aligned to 128Kbytes, the system will allocate
all necessary BATs to the lower 128Kbytes boundary, then allocate
an additional 128Kbytes BAT for the remaining block.

Instead, align the top to 128Kbytes so that the function directly
allocates a 256Kbytes last block:

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
	4: 0xc0780000-0xc07bffff 0x00780000       256K Kernel   x     m
	5:         -
	6:         -
	7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.1637930421.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
dede19be51 powerpc: Remove CONFIG_PPC_HAVE_KUAP and CONFIG_PPC_HAVE_KUEP
All platforms now have KUAP and KUEP so remove CONFIG_PPC_HAVE_KUAP
and CONFIG_PPC_HAVE_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a3c007ad0951965199e6ab2ef1035966bc66e771.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
57bc963837 powerpc/kuap: Wire-up KUAP on book3e/64
This adds KUAP support to book3e/64.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2c2c9375afd4bbc06aa904d0103a5f5102a2b1a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
4f6a025201 powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.
This adds KUAP support to 85xx in 32 bits mode.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
fcf9bb6d32 powerpc/kuap: Wire-up KUAP on 40x
This adds KUAP support to 40x. This is done by checking
the content of SPRN_PID at the time user pgtable is loaded.

40x doesn't have KUEP, but KUAP implies KUEP because when the
PID doesn't match the page's PID, the page cannot be read nor
executed.

So KUEP is now automatically selected when KUAP is selected and
disabled when KUAP is disabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
f6fad4fb55 powerpc/kuap: Wire-up KUAP on 44x
This adds KUAP support to 44x. This is done by checking
the content of SPRN_PID at the time it is read and written
into SPRN_MMUCR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy
43afcf8f01 powerpc: Add KUAP support for BOOKE and 40x
On booke/40x we don't have segments like book3s/32.
On booke/40x we don't have access protection groups like 8xx.

Use the PID register to provide user access protection.
Kernel address space can be accessed with any PID.
User address space has to be accessed with the PID of the user.
User PID is always not null.

Everytime the kernel is entered, set PID register to 0 and
restore PID register when returning to user.

Everytime kernel needs to access user data, PID is restored
for the access.

In TLB miss handlers, check the PID and bail out to data storage
exception when PID is 0 and accessed address is in user space.

Note that also forbids execution of user text by kernel except
when user access is unlocked. But this shouldn't be a problem
as the kernel is not supposed to ever run user text.

This patch prepares the infrastructure but the real activation of KUAP
is done by following patches for each processor type one by one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
e3c02f25b4 powerpc/kuap: Make PPC_KUAP_DEBUG depend on PPC_KUAP only
PPC_KUAP_DEBUG is supported by all platforms doing PPC_KUAP,
it doesn't depend on Radix on book3s/64.

This will avoid adding one more dependency when implementing
KUAP on book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5ff6228a36e51783b83d8c10d058db76e450f63.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
42e03bc524 powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64
Also call kuap_lock() and kuap_save_and_lock() from
interrupt functions with CONFIG_PPC64.

For book3s/64 we keep them empty as it is done in assembly.

Also do the locked assert when switching task unless it is
book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
047a6fd401 powerpc/config: Add CONFIG_BOOKE_OR_40x
We have many functionnalities common to 40x and BOOKE, it leads to
many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x).

We are going to add a few more with KUAP for booke/40x, so create
a new symbol which is defined when either BOOKE or 40x is defined.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
25ae981faf powerpc/nohash: Move setup_kuap out of 8xx.c
In order to reuse it on booke/4xx, move KUAP
setup routine out of 8xx.c

Make them usable on SMP by removing the __init tag
as it is called for each CPU.

And use __prevent_user_access() instead of hard
coding initial lock.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae35eec3426509efc2b8ae69586c822e2fe2642a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
937fb7003e powerpc/kuap: Add kuap_lock()
Add kuap_lock() and call it when entering interrupts from user.

It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.

However book3s/32 already have a kuap_lock(). Rename it
kuap_lock_addr().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy
2341964e27 powerpc/kuap: Remove __kuap_assert_locked()
__kuap_assert_locked() is redundant with
__kuap_get_and_assert_locked().

Move the verification of CONFIG_PPC_KUAP_DEBUG in kuap_assert_locked()
and make it call __kuap_get_and_assert_locked() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a60198a25d2ba38a37f1b92bc7d096435df4224.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
c252f3846d powerpc/kuap: Check KUAP activation in generic functions
Today, every platform checks that KUAP is not de-activated
before doing the real job.

Move the verification out of platform specific functions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/894f110397fcd248e125fb855d1e863e4e633a0d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
ba454f9c8e powerpc/kuap: Add a generic intermediate layer
Make the following functions generic to all platforms.
- bad_kuap_fault()
- kuap_assert_locked()
- kuap_save_and_lock() (PPC32 only)
- kuap_kernel_restore()
- kuap_get_and_assert_locked()

And for all platforms except book3s/64
- allow_user_access()
- prevent_user_access()
- prevent_user_access_return()
- restore_user_access()

Prepend __ in front of the name of platform specific ones.

For now the generic just calls the platform specific, but
next patch will move redundant parts of specific functions
into the generic one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaef143a8dae7288cd34565ffa7b49c16aee1ec3.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
6754862249 powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64
Deactivating KUEP at boot time is unrelevant for PPC32 and BOOK3E/64.

Remove it.

It allows to refactor setup_kuep() via a __weak function
that only PPC64s will overide for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_PPC_BOOKS_64 -> CONFIG_PPC_BOOK3S_64 typo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4c36df18b41c988c4512f45d96220486adbe4c99.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:18 +11:00
Christophe Leroy
70428da94c powerpc/32s: Save content of sr0 to avoid 'mfsr'
Calling 'mfsr' to get the content of segment registers is heavy,
in addition it requires clearing of the 'reserved' bits.

In order to avoid this operation, save it in mm context and in
thread struct.

The saved sr0 is the one used by kernel, this means that on
locking entry it can be used as is.

For unlocking, the only thing to do is to clear SR_NX.

This improves null_syscall selftest by 12 cycles, ie 4%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
526d4a4c77 powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly
When interrupt and syscall entries where converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.

However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).

A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.

In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is counter
performant here.

So let's do it in assembly, when we have full control of which
register can be used. It also has the advantage of locking earlier
and unlocking later and it helps GCC generating less tricky code.
The only drawback is to make boot time deactivation less straight
forward and require 'hand' instruction patching.

Group 'mtsr's by 4.

With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
df415cd758 powerpc/32s: Remove capability to disable KUEP at boottime
Disabling KUEP at boottime makes things unnecessarily complex.

Still allow disabling KUEP at build time, but when it's built-in
it is always there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f583f82423a29a4205c60b9721079111b35567.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
dc3a0e5b83 powerpc/book3e: Activate KUEP at all time
On book3e,
- When using 64 bits PTE: User pages don't have the SX bit defined
so KUEP is always active.
- When using 32 bits PTE: Implement KUEP by clearing SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
ee2631603f powerpc/44x: Activate KUEP at all time
On 44x, KUEP is implemented by clearing SX bit during TLB miss
for user pages. The impact is minimal and not worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
13dac4e31e powerpc/8xx: Activate KUEP at all time
On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups with contain all user pages as having swapped
protection rights, in extenso EX for user and NA for supervisor.

Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy
6c1fa60d36 Revert "powerpc: Inline setup_kup()"
This reverts commit 1791ebd131.

setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.

But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.

In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.

Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Christophe Leroy
06e7cbc29e powerpc/40x: Map 32Mbytes of memory at startup
As reported by Carlo, 16Mbytes is not enough with modern kernels
that tend to be a bit big, so map another 16M page at boot.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Nicholas Piggin
31284f703d powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU
Microwatt implements a subset of ISA v3.0 (which is equivalent to
the POWER9_CPU option). It is radix-only, so does not require hash
MMU support.

This saves 20kB compressed dtbImage and 56kB vmlinux size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-19-npiggin@gmail.com
2021-12-09 22:41:16 +11:00
Nicholas Piggin
387e220a2e powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09 22:41:13 +11:00
Nicholas Piggin
c28573744b powerpc/64s: Make hash MMU support configurable
This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
debeda0171 powerpc/64s: Always define arch unmapped area calls
To avoid any functional changes to radix paths when building with hash
MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
arch get_unmapped_area calls on 64s platforms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
af3a0ea41c powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear
There are a few places that require MMU_FTR_HPTE_TABLE to be set even
when running in radix mode. Fix those up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin
8dbfc0092b powerpc/64e: remove mmu_linear_psize
mmu_linear_psize is only set at boot once on 64e, is not necessarily
the correct size of the linear map pages, and is never used anywhere.
Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain the extern, so we can use IS_ENABLED() for related code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com
2021-12-09 22:39:39 +11:00
Thomas Gleixner
e58f2259b9 genirq/msi, treewide: Use a named struct for PCI/MSI attributes
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211206210224.374863119@linutronix.de
2021-12-09 11:52:21 +01:00
Cédric Le Goater
eca213152a powerpc/4xx: Complete removal of MSI support
Finish the work by removing all references to the PPC4xx_MSI config
and the associated device nodes in the DTs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/e92f2bb3-b5e1-c870-8151-3917a789a640@kaod.org
2021-12-09 11:52:20 +01:00
Thomas Gleixner
4f1d038b5e powerpc/4xx: Remove MSI support which never worked
This code is broken since day one. ppc4xx_setup_msi_irqs() has the
following gems:

 1) The handling of the result of msi_bitmap_alloc_hwirqs() is completely
    broken:
    
    When the result is greater than or equal 0 (bitmap allocation
    successful) then the loop terminates and the function returns 0
    (success) despite not having installed an interrupt.

    When the result is less than 0 (bitmap allocation fails), it prints an
    error message and continues to "work" with that error code which would
    eventually end up in the MSI message data.

 2) On every invocation the file global pp4xx_msi::msi_virqs bitmap is
    allocated thereby leaking the previous one.

IOW, this has never worked and for more than 10 years nobody cared. Remove
the gunk.

Fixes: 3fb7933850 ("powerpc/4xx: Adding PCIe MSI support")
Fixes: 247540b03b ("powerpc/44x: Fix PCI MSI support for Maui APM821xx SoC and Bluestone board")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210223.872249537@linutronix.de
2021-12-09 11:52:20 +01:00
Paolo Bonzini
93b350f884 Merge branch 'kvm-on-hv-msrbm-fix' into HEAD
Merge bugfix for enlightened MSR Bitmap, before adding support
to KVM for exposing the feature to nested guests.
2021-12-08 05:30:48 -05:00
Sean Christopherson
91b99ea706 KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting
the actual "block" sequences into a separate helper (to be named
kvm_vcpu_block()).  x86 will use the standalone block-only path to handle
non-halt cases where the vCPU is not runnable.

Rename block_ns to halt_ns to match the new function name.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:51 -05:00
Sean Christopherson
005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Sean Christopherson
510958e997 KVM: Force PPC to define its own rcuwait object
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and
instead force the architecture (PPC) to define its own rcuwait object.
Allowing common KVM to directly access vcpu->wait without a guard makes
it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(),
kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not
the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for
PPC.

Due to PPC's shenanigans with respect to callbacks and waits (it switches
to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether
or not this fixes any bugs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:46 -05:00
Maciej S. Szmigiero
a54d806688 KVM: Keep memslots in tree-based structures instead of array-based ones
The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of them.

Because the memslot array that is currently in use cannot be modified
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make copy of the
memslot that is being modified, copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of actual memslots
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should be also kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy.  After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used since memslots cannot
overlap in the guest address space and so this data structure is
sufficient for ensuring that lookups are done quickly.

The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
that keep track of the last found-by-gfn memslot, are still present in the
new code.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:34 -05:00
Maciej S. Szmigiero
ed922739c9 KVM: Use interval tree to do fast hva lookup in memslots
The current memslots implementation only allows quick binary search by gfn,
quick lookup by hva is not possible - the implementation has to do a linear
scan of the whole memslots array, even though the operation being performed
might apply just to a single memslot.

This significantly hurts performance of per-hva operations with higher
memslot counts.

Since hva ranges can overlap between memslots an interval tree is needed
for tracking them.

[sean: handle interval tree updates in kvm_replace_memslot()]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:32 -05:00
Sean Christopherson
6a99c6e3f5 KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now
that its use has been removed in all architectures.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:25 -05:00
Sean Christopherson
eaaaed137e KVM: PPC: Avoid referencing userspace memory region in memslot updates
For PPC HV, get the number of pages directly from the new memslot instead
of computing the same from the userspace memory region, and explicitly
check for !DELETE instead of inferring the same when toggling mmio_update.
The motivation for these changes is to avoid referencing the @mem param
so that it can be dropped in a future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:22 -05:00
Sean Christopherson
537a17b314 KVM: Let/force architectures to deal with arch specific memslot data
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch
code to handle propagating arch specific data from "new" to "old" when
necessary.  This is a baby step towards dynamically allocating "new" from
the get go, and is a (very) minor performance boost on x86 due to not
unnecessarily copying arch data.

For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE
and FLAGS_ONLY.  This is functionally a nop as the previous behavior
would overwrite the pointer for CREATE, and eventually discard/ignore it
for DELETE.

For x86, copy the arch data only for FLAGS_ONLY changes.  Unlike PPC HV,
x86 needs to reallocate arch data in the MOVE case as the size of x86's
allocations depend on the alignment of the memslot's gfn.

Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to
match the "commit" prototype.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
[mss: add missing RISCV kvm_arch_prepare_memory_region() change]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:20 -05:00
Marc Zyngier
46808a4cb8 KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:15 -05:00
Marc Zyngier
27592ae8db KVM: Move wiping of the kvm->vcpus array to common code
All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.
The locking is dropped altogether, as this should only be called
when there is no further references on the kvm structure.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-2-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:13 -05:00
Sebastian Andrzej Siewior
77993b595a locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes the
spinlock_types.h.

PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so
its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and
atomic.h which leads to recursive includes where defines are missing.

By including only the raw_spinlock_t defines it avoids the atomic.h
related includes at this stage.

An example on powerpc:

|  CALL    scripts/atomic/check-atomics.sh
|In file included from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|arch/powerpc/include/asm/page_32.h: In function ‘clear_page’:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|   87 |    __WARN();    \
|      |    ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeofâ€=99 to incomplete type ‘struct bug_entryâ€=99
|   58 |     "i" (sizeof(struct bug_entry)), \
|      |                 ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRYâ€=99
|   89 |   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
|      |   ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|In file included from arch/powerpc/include/asm/ptrace.h:298,
|                 from arch/powerpc/include/asm/hw_irq.h:12,
|                 from arch/powerpc/include/asm/irqflags.h:12,
|                 from include/linux/irqflags.h:16,
|                 from include/asm-generic/cmpxchg-local.h:6,
|                 from arch/powerpc/include/asm/cmpxchg.h:526,
|                 from arch/powerpc/include/asm/atomic.h:11,
|                 from include/linux/atomic.h:7,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/asm-generic/bug.h:22,
|                 from arch/powerpc/include/asm/bug.h:109,
|                 from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99:
|include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|  210 |  WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
|      |  ^~~~

The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
(from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
statements.

On POWERPC64 there are missing atomic64 defines while building 32bit
VDSO:
|  VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
|In file included from include/linux/atomic.h:80,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/linux/kernel.h:19,
|                 from arch/powerpc/include/asm/page.h:11,
|                 from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
|                 from include/vdso/datapage.h:137,
|                 from lib/vdso/gettimeofday.c:5,
|                 from <command-line>:
|include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_incâ€=99:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=3Dimpl
|icit-function-declaration]
| 1447 |  arch_atomic64_add(1, v);
|      |  ^~~~~~~~~~~~~~~~~
|      |  arch_atomic_add

The generic fallback is not included, atomics itself are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.

Allow asm/spinlock_types.h to be included from
linux/spinlock_types_raw.h.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
2021-12-07 15:14:12 +01:00
Nicholas Piggin
20626177c9 powerpc: make memremap_compat_align 64s-only
memremap_compat_align is only relevant when ZONE_DEVICE is selected.
ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected
by PPC_BOOK3S_64.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com
2021-12-02 22:57:24 +11:00
Nicholas Piggin
ffbe5d21d1 powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make paths for these platforms to be explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
f43d2ffb47 powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c
This file contains functions and data common to radix, so rename it to
remove the hash_ prefix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
bdad5d57df powerpc/64s: move page size definitions from hash specific file
The radix code uses some of the psize variables. Move the common
ones from hash_utils.c to pgtable.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
310dce6201 powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change to make hash support configurable to not
introduce runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
162b0889bb powerpc/64s: move THP trace point creation out of hash specific file
In preparation for making hash MMU support configurable, move THP
trace point function definitions out of an otherwise hash-specific
file.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
3d3282fd34 powerpc/pseries: lparcfg don't include slb_size line in radix mode
This avoids a change in behaviour in the later patch making hash
support configurable. This is possibly a user interface change, so
the alternative would be a hard-coded slb_size=0 here.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-7-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
0c7cc15e92 powerpc/pseries: move process table registration away from hash-specific code
This reduces ifdefs in a later change which makes hash support configurable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-6-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
935b534c24 powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
a4135cbebd powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin
7ebc49031d powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE
PPC_NATIVE now only controls the native HPT code, so rename it to be
more descriptive. Restrict it to Book3S only.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Nicholas Piggin
79b74a6848 powerpc: Remove unused FW_FEATURE_NATIVE references
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Alexey Kardashevskiy
792020907b KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
in __alloc_pages() by constructing a tiny VM which only does
H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy).

This silences the warning by adding __GFP_NOWARN.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084550.1658699-1-aik@ozlabs.ru
2021-12-02 22:56:08 +11:00
Alexey Kardashevskiy
511d25d6b7 KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
The userspace can trigger "vmalloc size %lu allocation failure: exceeds
total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.

This silences the warning by checking the limit before calling vzalloc()
and returns ENOMEM if failed.

This does not call underlying valloc helpers as __vmalloc_node() is only
exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is
not exported at all.

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Use 'size' for the variable rather than 'cb']
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084512.1658628-1-aik@ozlabs.ru
2021-12-02 22:55:10 +11:00
Nicholas Piggin
f6a1987773 KVM: PPC: Book3S HV P9: Remove unused ri_set local variable
ri_set is set and never used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201052112.2137167-1-npiggin@gmail.com
2021-12-02 10:42:40 +11:00
Cédric Le Goater
2a2ac8a701 powerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.
The automatic "save & restore" of interrupt context is a POWER10/XIVE2
feature exploited by KVM under the PowerNV platform. It is not
available under pSeries and the associated toggle should not be
exposed under the XIVE debugfs directory.

Introduce a platform handler for debugfs initialization and move the
'save-restore' entry under the native (PowerNV) backend to fix compile
when !CONFIG_PPC_POWERNV.

Fixes: 1e7684dc4f ("powerpc/xive: Add a debugfs toggle for save-restore")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201165418.1041842-1-clg@kaod.org
2021-12-02 10:40:38 +11:00
Kees Cook
62ea67e319 powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-12-02 10:39:00 +11:00
Mark Rutland
985faa7868 powerpc: Snapshot thread flags
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.

To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.

Convert them all to the new flag accessor helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-11-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Mark Rutland
08b0af5b2a powerpc: Avoid discarding flags in system_call_exception()
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Thus, when setting flags we must use
an atomic operation rather than a plain read-modify-write sequence, as a
plain read-modify-write may discard flags which are concurrently set by a
remote thread, e.g.

	// task A			// task B
	tmp = A->thread_info.flags;
					set_tsk_thread_flag(A, NEWFLAG_B);
	tmp |= NEWFLAG_A;
	A->thread_info.flags = tmp;

arch/powerpc/kernel/interrupt.c's system_call_exception() sets
_TIF_RESTOREALL in the thread info flags with a read-modify-write, which
may result in other flags being discarded.

Elsewhere in the file it uses clear_bits() to atomically remove flag bits,
so use set_bits() here for consistency with those.

There may be reasons (e.g. instrumentation) that prevent the use of
set_thread_flag() and clear_thread_flag() here, which would otherwise be
preferable.

Fixes: ae7aaecc3f ("powerpc/64s: system call rfscv workaround for TM bugs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eirik Fuller <efuller@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-10-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Christophe Leroy
af11dee436 powerpc/32s: Fix shift-out-of-bounds in KASAN init
================================================================================
UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9
Call Trace:
[c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
[c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c
[c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138
[c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c
[c214bf30] [c1c0ed84] kasan_init+0xc0/0x198
[c214bf70] [c1c08024] setup_arch+0x18/0x54c
[c214bfc0] [c1c037f0] start_kernel+0x90/0x33c
[c214bff0] [00003610] 0x3610
================================================================================

This happens when the directly mapped memory is a power of 2.

Fix it by checking the shift and set the result to 0 when shift is -1

Fixes: 7974c47326 ("powerpc/32s: Implement dedicated kasan_init_region()")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169
Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
df1f679d19 powerpc/powermac: Add missing lockdep_register_key()
KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000
BUG: key c2d00cbc has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9
NIP:  c01a9428 LR: c01a9428 CTR: 00000000
REGS: e1033cf0 TRAP: 0700   Not tainted  (5.15.5-gentoo-PowerMacG4)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24002002  XER: 00000000

GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73
GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003
GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00
GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000
NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
Call Trace:
[e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable)
[e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424
[e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c
[e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c
[e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4
[e1033f10] [c000946c] kernel_init+0x28/0x154
[e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c

Add missing lockdep_register_key()

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:39 +11:00
Christophe Leroy
f1797e4de1 powerpc/modules: Don't WARN on first module allocation attempt
module_alloc() first tries to allocate module text within 24 bits direct
jump from kernel text, and tries a wider allocation if first one fails.

When first allocation fails the following is observed in kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add __GFP_NOWARN flag to first allocation so that no warning appears
when it fails.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df167 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:32 +11:00
Nicholas Piggin
5402e239d0 powerpc/64s: Get LPID bit width from device tree
Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet, this is mainly required
to get the PTCR register values correct (so KVM will run but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2021-11-30 22:27:07 +11:00
Athira Rajeev
2c9ac51b85 powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Running perf fuzzer showed below in dmesg logs:
  "Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.

Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. We could get
IPIs run before perf interrupt is replayed and the PMU events could
be deleted or stopped. This will change the PMU SPR values and resets
the counters. Snippet of ftrace log showing PMU callbacks invoked in
__do_irq():

  <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
  <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
  <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
  <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
  <<>>
  <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
  <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
  <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
  <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
  <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
  <<>>
  <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
  <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMI's masked during local_* operations, example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.

Fix it by updating the PMU callback function power_pmu_disable() to
check for pending perf interrupt. If there is an overflown PMC and
pending perf interrupt indicated in paca, clear the PMI bit in paca to
drop that sample. Clearing of PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there are more than one event running in the PMU, there is
a chance that we clear the PMI bit for the event which is not getting
deleted/stopped. The other events may still remain active. Hence to make
sure we don't drop valid sample in such cases, another check is added in
power_pmu_enable. This checks if there is an overflown PMC found among
the active events and if so enable back the PMI bit. Two new helper
functions are introduced to clear/set the PMI, ie
clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function
pmi_irq_pending() is introduced to give a warning if there is pending
PMI bit in paca, but no PMC is overflown.

Also there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
since PMXE bit is not cleared along with disabling of other MMCR0 bits
in the pmu_disable. Such PMI's could leave the PMU running and could
trigger PMI again which will set MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. Example, a timer after
power_pmu_del() which will re-enable interrupts and triggers a PMI again
since PMAO bit is still set. But fails to find valid overflow since PMC
was cleared in power_pmu_del(). Fix that by disabling PMXE along with
disabling of other MMCR0 bits in power_pmu_disable().

We can't just replay PMI any time. Hence this approach is preferred
rather than replaying PMI before resetting overflown PMC. Patch also
documents core-book3s on a race condition which can trigger these PMC
messages during idle path in PowerNV.

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-30 17:15:49 +11:00
Christophe Leroy
f05cab0034 powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero() which fallsback to
atomic_add_unless() which itself fallsback to
atomic_fetch_add_unless() which now handles immediate operands.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
41d65207de powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations:

	c001bb2c:	39 20 00 01 	li      r9,1
	c001bb30:	7d 40 18 28 	lwarx   r10,0,r3
	c001bb34:	7d 09 50 50 	subf    r8,r9,r10
	c001bb38:	7d 00 19 2d 	stwcx.  r8,0,r3

	c001c7a8:	39 40 00 01 	li      r10,1
	c001c7ac:	7d 00 18 28 	lwarx   r8,0,r3
	c001c7b0:	7c ea 42 14 	add     r7,r10,r8
	c001c7b4:	7c e0 19 2d 	stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

	c001bb2c:	7d 20 18 28 	lwarx   r9,0,r3
	c001bb30:	39 49 ff ff 	addi    r10,r9,-1
	c001bb34:	7d 40 19 2d 	stwcx.  r10,0,r3
	--
	c001c7a4:	7d 40 18 28 	lwarx   r10,0,r3
	c001c7a8:	39 0a 00 01 	addi    r8,r10,1
	c001c7ac:	7d 00 19 2d 	stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy
fb350784d8 powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like
set or clear bit:

	c0009fe0:	39 40 08 00 	li      r10,2048
	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8

	c000d568:	39 00 18 00 	li      r8,6144
	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
	c000d570:	7c c6 40 78 	andc    r6,r6,r8
	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10

	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:50 +11:00
Nicholas Piggin
aebd1fb45c powerpc: flexible GPR range save/restore macros
Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and need to do mental arithmetic while reading the
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29 23:15:20 +11:00
Nicholas Piggin
e012c49998 powerpc/watchdog: help remote CPUs to flush NMI printk output
The printk layer at the moment does not seem to have a good way to force
flush printk messages that are created in NMI context, except in the
panic path.

NMI-context printk messages normally get to the console with irq_work,
but that won't help if the CPU is stuck with irqs disabled, as can be
the case for hard lockup watchdog messages.

The watchdog currently flushes the printk buffers after detecting a
lockup on remote CPUs, but they may not have processed their NMI IPI
yet by that stage, or they may have self-detected a lockup in which
case they won't go via this NMI IPI path.

Improve the situation by having NMI-context mark a flag if it called
printk, and have watchdog timer interrupts check if that flag was set
and try to flush if it was. Latency is not a big problem because we
were already stuck for a while, just need to try to make sure the
messages eventually make it out.

Depends-on: 5d5e4522a7 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29 23:08:43 +11:00
Christophe Leroy
57dd3a7bdf powerpc: Don't bother about .data..Lubsan sections
Since commit 9a427556fb ("vmlinux.lds.h: catch compound literals
into data and BSS") .data..Lubsan sections are taken into account
in DATA_MAIN which is included in DATA_DATA macro.

No need to take care of them anymore in powerpc vmlinux.lds.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
cdc81aece8 powerpc/ptdump: Fix display a BAT's size unit
We have wrong units on BAT's sizes (G instead of M, M instead of ...)

	---[ Instruction Block Address Translation ]---
	0: 0xc0000000-0xc03fffff 0x00000000         4G Kernel   x     m
	1: 0xc0400000-0xc05fffff 0x00400000         2G Kernel   x     m
	2: 0xc0600000-0xc06fffff 0x00600000         1G Kernel   x     m
	3: 0xc0700000-0xc077ffff 0x00700000       512M Kernel   x     m
	4: 0xc0780000-0xc079ffff 0x00780000       128M Kernel   x     m
	5: 0xc07a0000-0xc07bffff 0x007a0000       128M Kernel   x     m
	6:         -
	7:         -

This is because pt_dump_size() expects a size in Kbytes but
bat_show_603() gives the size in bytes.

To avoid risk of confusion, change pt_dump_size() to take bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
7dfbfb87c2 powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32
Unlike PPC64, PPC32 doesn't require any special compiler option
to get _mcount() call not clobbering registers.

Provide ftrace_regs_caller() and ftrace_regs_call() and activate
HAVE_DYNAMIC_FTRACE_WITH_REGS.

That's heavily copied from ftrace_64_mprofile.S

For the time being leave livepatching aside, it will come with
following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
c93d4f6ecf powerpc/ftrace: Add module_trampoline_target() for PPC32
module_trampoline_target() is used by __ftrace_modify_call().

Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS
can be activated on PPC32 as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy
88670fdb26 powerpc/ftrace: No need to read LR from stack in _mcount()
All functions calling _mcount do it exactly the same way, with the
following sequence of instructions:

	c07de788:       7c 08 02 a6     mflr    r0
	c07de78c:       90 01 00 04     stw     r0,4(r1)
	c07de790:       4b 84 13 65     bl      c001faf4 <_mcount>

Allthough LR is pushed on stack, it is still in r0 while entering
_mcount().

Function arguments are in r3-r10, so r11 and r12 are still available
at that point.

Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved
and doesn't need to be restored from the stack.

While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Michael Ellerman
ab85a27395 powerpc: Mark probe_machine() __init and static
Prior to commit b1923caa6e ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29 22:49:26 +11:00
Michael Ellerman
a4ac0d249a powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled.

Fixes the following W=1 warning when CONFIG_PROFILING=n:
  linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2021-11-29 22:49:23 +11:00
Michael Ellerman
ff47a95d1a powerpc/mm: Move tlbcam_sz() and make it static
Building with W=1 we see a warning:
  linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’

tlbcam_sz() is not used outside this file, so we can make it static.
However it's only used inside #ifdef CONFIG_PPC32, so move it within
that ifdef, otherwise we would get a defined but not used error.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au
2021-11-29 22:49:20 +11:00
Michael Ellerman
d9150d5bb5 powerpc/85xx: Make c293_pcie_pic_init() static
To fix the W=1 warning:
  linux/arch/powerpc/platforms/85xx/c293pcie.c:22:13: error: no previous prototype for ‘c293_pcie_pic_init’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-3-mpe@ellerman.id.au
2021-11-29 22:49:17 +11:00
Michael Ellerman
84a61fb43f powerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() static
To fix the W=1 warning:
  arch/powerpc/platforms/85xx/smp.c:369:6: error: no previous prototype for ‘mpc85xx_smp_kexec_cpu_down’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-2-mpe@ellerman.id.au
2021-11-29 22:49:14 +11:00
Michael Ellerman
4ea9e321c2 powerpc/85xx: Fix no previous prototype warning for mpc85xx_setup_pmc()
Fixes the following W=1 warning:
  arch/powerpc/platforms/85xx/mpc85xx_pm_ops.c:89:12: warning: no previous prototype for 'mpc85xx_setup_pmc'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-1-mpe@ellerman.id.au
2021-11-29 22:49:09 +11:00
Nicholas Piggin
2eafc4748b powerpc: select CPUMASK_OFFSTACK if NR_CPUS >= 8192
Some core kernel code starts to go beyond the 2048 byte stack size
warning at NR_CPUS=8192, so select CPUMASK_OFFSTACK in that case.
x86 does similarly for very large NR_CPUS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-2-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Nicholas Piggin
b350111bf7 powerpc: remove cpu_online_cores_map function
This function builds the cores online map with on-stack cpumasks which
can cause high stack usage with large NR_CPUS.

It is not used in any performance sensitive paths, so instead just check
for first thread sibling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-1-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Xiaoming Ni
3dc709e518 powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare
in the mpc85xx_pm_ops structure. As a result, oops is triggered in
smp_85xx_start_cpu().

  smp: Bringing up secondary CPUs ...
  kernel tried to execute user page (0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [00000000] 0x0
  LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
  Call Trace:
  [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
  [c1051de8] [c0011460] __cpu_up+0xc0/0x228
  [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
  [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
  [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
  [c1051eb8] [c07e67bc] smp_init+0x30/0x78
  [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
  [c1051f18] [c00032d8] kernel_init+0x14/0x124
  [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c

Fixes: c45361abb9 ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n")
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Tested-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126041153.16926-1-nixiaoming@huawei.com
2021-11-29 17:47:05 +11:00
Michael Ellerman
af3fdce4ab Revert "powerpc/code-patching: Improve verification of patchability"
This reverts commit 8b8a8f0ab3.

As reported[1] by Sachin this causes problems with ftrace, and it also
causes the code patching selftests to fail as reported[2] by Stephen.

So revert it for now.

1: https://lore.kernel.org/linuxppc-dev/3668743C-09DF-4673-B15C-2FFE2A57F7D7@linux.vnet.ibm.com/
2: https://lore.kernel.org/linuxppc-dev/20211126161747.1f7795b0@canb.auug.org.au/

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-11-29 17:41:52 +11:00
Linus Torvalds
7b65b798a6 powerpc fixes for 5.16 #3
Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being
 incorrectly invalidated and a subsequent host crash.
 
 Fix kernel hardlockup on vmap stack overflow on 32-bit.
 
 Thanks to: Christophe Leroy, Nicholas Piggin, Fabiano Rosas.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGh96MTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGHDEACu37n/NLFp3qzHyrk4fHN6sUQ0wGsE
 oroKeIrTP2+JPsoEDd6Jd9oQmqmVLmVTp4y/Lu6AAJbK4p1He1iPIxIbZn12NwqY
 /siCihV76PXR8748g1j8r3a3UYDmmvmllAIf9JdAn2I4kKPE3J37/1NEGSqZyJ80
 HzkheTczUweANhVuG1RDh3WvnAKMuOvtrT/YgetVILig23oiUORQkZH5wcpScaCN
 qShnqKSuRt8HWaF3uRh6D0sf4o0BWfZ6Pjyi7R/T7s2dQVgz0uCtu4MuyGirx/0J
 uU3yDEZD24y6doyi6HUgAFGeGYqJXky2XDOMrgV6CNGFty9HFp6GgN6gtX353pZa
 9Xplw4kg9UJwGcyk1BvgdQBHdc83y5zmr8c2PfhgiP/5rfEyCTYtHgZKUu5LLW8Y
 ECYkSNnlAT68SancCmiXgos1heIiwB7922cRWJANfGEjItNBFldtb2QZio2bWOUX
 g4gZ8HN2P0OmOxV1nnNkmIe+ZKLeHPN2N/FR7pUdkFk2+BLWFx+SbrceCCicwMF4
 CUUzll8FMFmVAGjFivG/axzqlDWMY5lrDsFZamaWN3VQEo7TwL6l8/cDcFZF4QBq
 viLWbi7sfQm6M3tZXFS465sG4SpUQ4lI8PKptuz5DMdr6WKisTLR7OF0klxsEwZj
 ZcT8a/wzUIZ53w==
 =MBp+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
  to the host SLB being incorrectly invalidated and a subsequent host
  crash.

  Fix kernel hardlockup on vmap stack overflow on 32-bit.

  Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"

* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix hardlockup on vmap stack overflow
  KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
2021-11-27 10:06:15 -08:00
Linus Torvalds
b501b85957 asm-generic: syscall table updates
André Almeida sends an update for the newly added futex_waitv
 syscall that was initially only added to a few architectures.
 
 Some additional ones have since made it through architecture
 maintainer trees, this finishes the remaining ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGfrmIACgkQmmx57+YA
 GNl0eg/9Eu2iliGxiQPzPw93a3DCYr7eiylbhHbgzvCbUtDL4cTE/0fcFGCYUoVu
 k5UT/5ja3Q4z8wE+MUbxXf6igr+ANKJNwomKksTJV1Q6wKF23k5vUEd2ZXGyvrSk
 F+2X0bpB0wZzmMBh98L6dWsEaDJ//8iRZNW4ErmPR4mp25iDqhzZ1k2RoRg9mYoM
 /Lw6u5zlBiqAw5nnMc6BxPo8gJGwgcKDu4VDngpGDVGlvBHgd7Kf1AKi1+aBumml
 WzVXEEdUp3LTj7O/8oU3dKPuZMhl+k/mKUvHn/cieG00FxfMpOnxj0gdaUGgG0O4
 imhQquXtPueq/W3Uod+MhVulKR6AziEOsPhJayMGopLReE0spKDxcplC/OSbMhGj
 0iA6auLdHpHBCg3KWZDj0Sjf2NwL9UCFVDrk8XAQQo7bY2U5LQbcbr0PS7iLlpzN
 gIb+r2k3Azwk/Iqnvl2/SR1cZTx5KxfxCydT03vXaYQXhc5l+u4RvjOZcSGgZXqd
 gzC78vd6fJGKyVeaL2xb6/w7qmA4DFd6sQAbgMlVFRRS6AIJrIdDEZAQztSJ+uYK
 p9A2qzeXz+ftzonySbPHmV4RBGjSVVecUBjRmmDfKNbqmFUjjnCwCv+LeDXh4IS6
 mjmSA/9liYdkHUaujwCLhOEskuCx9ZPeAsqmzXMER0BDzReVlH4=
 =Vy2A
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic syscall table update from Arnd Bergmann:
 "André Almeida sends an update for the newly added futex_waitv syscall
  that was initially only added to a few architectures.

  Some additional ones have since made it through architecture
  maintainer trees, this finishes the remaining ones"

* tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  futex: Wireup futex_waitv syscall
2021-11-25 10:41:28 -08:00
André Almeida
a0eb2da92b futex: Wireup futex_waitv syscall
Wireup futex_waitv syscall for all remaining archs.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-25 14:26:12 +01:00
Nicholas Piggin
3d030e3018 powerpc/watchdog: Fix wd_smp_last_reset_tb reporting
wd_smp_last_reset_tb now gets reset by watchdog_smp_panic() as part of
marking CPUs stuck and removing them from the pending mask before it
begins any printing. This causes last reset times reported to be off.

Fix this by reading it into a local variable before it gets reset.

Fixes: 76521c4b02 ("powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211125103346.1188958-1-npiggin@gmail.com
2021-11-25 21:48:39 +11:00
Michael Ellerman
4afc78eae1 powerpc/microwatt: Make microwatt_get_random_darn() static
Make microwatt_get_random_darn() static, because it can be.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118004415.1706863-1-mpe@ellerman.id.au
2021-11-25 11:25:34 +11:00
Nicholas Piggin
1f01bf9076 powerpc/watchdog: read TB close to where it is used
When taking watchdog actions, printing messages, comparing and
re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use
and under wd_smp_lock or printing lock (if applicable).

This should keep timebase mostly monotonic with kernel log messages, and
could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to
something a long way in the past, and causing other CPUs to appear to be
stuck.

These additional TB reads are all slowpath (lockup has been detected),
so performance does not matter.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-5-npiggin@gmail.com
2021-11-25 11:25:34 +11:00
Nicholas Piggin
76521c4b02 powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi
There is a deadlock with the console_owner lock and the wd_smp_lock:

CPU x takes the console_owner lock
CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
CPU y detects deadlock, tries to print something and spins on console_owner
-> deadlock

Change the watchdog locking scheme so wd_smp_lock protects the watchdog
internal data, but "reporting" (printing, issuing NMI IPIs, taking any
action outside of watchdog) uses a non-waiting exclusion. If a CPU detects
a problem but can not take the reporting lock, it just returns because
something else is already reporting. It will try again at some point.

Typically hard lockup watchdog report usefulness is not impacted due to
failure to spewing a large enough amount of data in as short a time as
possible, but by messages getting garbled.

Laurent debugged this and found the deadlock, and this patch is based on
his general approach to avoid expensive operations while holding the lock.
With the addition of the reporting exclusion.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[np: rework to add reporting exclusion update changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
858c93c315 powerpc/watchdog: tighten non-atomic read-modify-write access
Most updates to wd_smp_cpus_pending are under lock except the watchdog
interrupt bit clear.

This can race with non-atomic RMW updates to the mask under lock, which
can happen in two instances:

Firstly, if another CPU detects this one is stuck, removes it from the
mask, mask becomes empty and is re-filled with non-atomic stores. This
is okay because it would re-fill the mask with this CPU's bit clear
anyway (because this CPU is now stuck), so it doesn't matter that the
bit clear update got "lost". Add a comment for this.

Secondly, if another CPU detects a different CPU is stuck and removes it
from the pending mask with a non-atomic store to bytes which also
include the bit of this CPU. This case can result in the bit clear being
lost and the end result being the bit is set. This should be so rare it
hardly matters, but to make things simpler to reason about just avoid
the non-atomic access for that case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-3-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin
5dad4ba68a powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
It is possible for all CPUs to miss the pending cpumask becoming clear,
and then nobody resetting it, which will cause the lockup detector to
stop working. It will eventually expire, but watchdog_smp_panic will
avoid doing anything if the pending mask is clear and it will never be
reset.

Order the cpumask clear vs the subsequent test to close this race.

Add an extra check for an empty pending mask when the watchdog fires and
finds its bit still clear, to try to catch any other possible races or
bugs here and keep the watchdog working. The extra test in
arch_touch_nmi_watchdog is required to prevent the new warning from
firing off.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Debugged-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Peiwei Hu
869fb7e5ae powerpc/prom_init: Fix improper check of prom_getprop()
prom_getprop() can return PROM_ERROR. Binary operator can not identify
it.

Fixes: 94d2dde738 ("[POWERPC] Efika: prune fixups and make them more carefull")
Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
dd5cde457a powerpc/rtas: rtas_busy_delay_time() kernel-doc
Provide API documentation for rtas_busy_delay_time(), explaining why we
return the same value for 9900 and -2.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
38f7b7067d powerpc/rtas: rtas_busy_delay() improvements
Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

    int rc;

    do {
        rc = rtas_call(rtas_token("some-function"), ...);
    } while (rtas_busy_delay(rc));

    /* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

*  It performs msleep() for all 990x and even when no delay is
   suggested (-2), but this is understood to actually sleep for two jiffies
   minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
   appear to be the most common extended delay statuses, and the
   oversleeping measurably lengthens DLPAR operations, which perform
   many RTAS calls.

*  It does not sleep on 990x unless need_resched() is true, causing code
   like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

*  When passed RTAS_BUSY, perform cond_resched() and return without
   sleeping. The caller should reattempt immediately

*  Always sleep when passed an extended delay status, using usleep_range()
   for precise shorter sleeps. Limit the sleep time to one second even
   though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             1,935      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
-            609.99 msec task-clock                #    0.183 CPUs utilized            ( +-  0.19% )
+             1,956      probe:rtas_call           #    0.003 M/sec                    ( +-  0.17% )
+            618.56 msec task-clock                #    0.278 CPUs utilized            ( +-  0.11% )

-            3.3322 +- 0.0670 seconds time elapsed  ( +-  2.01% )
+            2.2222 +- 0.0416 seconds time elapsed  ( +-  1.87% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             6,224      probe:rtas_call           #    0.008 M/sec                    ( +-  2.57% )
-            750.36 msec task-clock                #    0.190 CPUs utilized            ( +-  2.01% )
+               843      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
+            250.66 msec task-clock                #    0.068 CPUs utilized            ( +-  0.17% )

-            3.9394 +- 0.0890 seconds time elapsed  ( +-  2.26% )
+             3.678 +- 0.113 seconds time elapsed  ( +-  3.07% )

With all CPUs 100% busy (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             2,979      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
-          1,096.62 msec task-clock                #    0.105 CPUs utilized            ( +-  0.10% )
+             2,981      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
+          1,095.26 msec task-clock                #    0.154 CPUs utilized            ( +-  0.21% )

-            10.476 +- 0.104 seconds time elapsed  ( +-  1.00% )
+            7.1124 +- 0.0865 seconds time elapsed  ( +-  1.22% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             2,702      probe:rtas_call           #    0.004 M/sec                    ( +-  4.00% )
-            722.71 msec task-clock                #    0.067 CPUs utilized            ( +-  2.41% )
+             1,246      probe:rtas_call           #    0.003 M/sec                    ( +-  0.25% )
+            487.73 msec task-clock                #    0.049 CPUs utilized            ( +-  0.20% )

-            10.829 +- 0.163 seconds time elapsed  ( +-  1.51% )
+            9.9887 +- 0.0866 seconds time elapsed  ( +-  0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
22887f319a powerpc/pseries: delete scanlog
Remove the pseries scanlog driver.

This code supports functions from Power4-era servers that are not present
on targets currently supported by arch/powerpc. System manuals from this
time have this description:

  Scan Dump data is a set of chip data that the service processor gathers
  after a system malfunction. It consists of chip scan rings, chip trace
  arrays, and Scan COM (SCOM) registers. This data is stored in the
  scan-log partition of the system’s Nonvolatile Random Access
  Memory (NVRAM).

PowerVM partition firmware development doesn't recognize the associated
function call or property, and they don't see any references to them in
their codebase. It seems to have been specific to non-virtualized pseries.

References:

Historical Linux commit from February 2003 (interesting to note this seems
to be the source of non-GPL exports for rtas_call etc):
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

IntelliStation and pSeries docs which refer to the feature:
http://ps-2.retropc.se/basil.holloway/ALL%20PDF/380635.pdf
http://ps-2.kev009.com/rs6000/manuals/p/p615-6C3-6E3/6C3_and_6E3_Users_Guide_SA38-0629.pdf

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210920173203.1800475-1-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch
53cadf7dee powerpc/rtas: kernel-doc fixes
Fix the following issues reported by kernel-doc:

$ scripts/kernel-doc -v -none arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/rtas.c:810: info: Scanning doc for function rtas_activate_firmware
arch/powerpc/kernel/rtas.c:818: warning: contents before sections
arch/powerpc/kernel/rtas.c:841: info: Scanning doc for function rtas_call_reentrant
arch/powerpc/kernel/rtas.c:893: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Find a specific pseries error log in an RTAS extended event log.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116215806.928235-1-nathanl@linux.ibm.com
2021-11-25 11:25:32 +11:00
Christophe Leroy
8b8a8f0ab3 powerpc/code-patching: Improve verification of patchability
Today, patch_instruction() assumes that it is called exclusively on
valid addresses, and only checks that it is not called on an init
address after init section has been freed.

Improve verification by calling kernel_text_address() instead.

kernel_text_address() already includes a verification of
initmem release.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc683d499a411730504b132a924de0ccc2ef1f79.1636971137.git.christophe.leroy@csgroup.eu
2021-11-25 11:25:32 +11:00
Jason Wang
a3bcfc182b powerpc/tsi108: make EXPORT_SYMBOL follow its function immediately
EXPORT_SYMBOL(foo); should immediately follow its function/variable.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211114115616.493815-1-wangborong@cdjrlc.com
2021-11-25 11:25:32 +11:00
Hari Bathini
e919c0b232 bpf ppc32: Access only if addr is kernel address
With KUAP enabled, any kernel code which wants to access userspace
needs to be surrounded by disable-enable KUAP. But that is not
happening for BPF_PROBE_MEM load instruction. Though PPC32 does not
support read protection, considering the fact that PTR_TO_BTF_ID
(which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer
or NULL but should never be a pointer to userspace address, execute
BPF_PROBE_MEM load only if addr is kernel address, otherwise set
dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-9-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
23b51916ee bpf ppc32: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| lwz   r28,4(r4)  |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r24)   |
             | lwz  r4,4(r24)   |
             |                  |
             |                  |
             |------------------|
   0x4278 -->| li  r28,0        |  \
             | li  r27,0        |  | fixup entry
             | b   0x4024       |  /
   0x4284 -->| li  r4,0         |
             | li  r3,0         |
             | b   0x40b4       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffe4 |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffe8 |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-8-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
9c70c7147f bpf ppc64: Access only if addr is kernel address
On PPC64 with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d0000000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr is kernel address, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-7-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Ravi Bangoria
983bdc0245 bpf ppc64: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| ld   r27,4(r3)   |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r4)    |
             |                  |
             |                  |
             |------------------|
   0x4280 -->| li  r27,0        |  \ fixup entry
             | b   0x4024       |  /
   0x4288 -->| li  r3,0         |
             | b   0x40b0       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffec |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffec |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-6-hbathini@linux.ibm.com
2021-11-25 11:25:32 +11:00
Hari Bathini
f15a71b388 powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-5-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Hari Bathini
efa95f031b bpf powerpc: refactor JIT compiler code
Refactor powerpc LDX JITing code to simplify adding BPF_PROBE_MEM
support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-4-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
04c04205bc bpf powerpc: Remove extra_pass from bpf_jit_build_body()
In case of extra_pass, usual JIT passes are always skipped. So,
extra_pass is always false while calling bpf_jit_build_body() and
can be removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-3-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Ravi Bangoria
c9ce7c36e4 bpf powerpc: Remove unused SEEN_STACK
SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-2-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Oliver O'Halloran
157616f3c2 powerpc/eeh: Use a goto for recovery failures
The EEH recovery logic in eeh_handle_normal_event() has some pretty strange
flow control. If we remove all the actual recovery logic we're left with
the following skeleton:

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result == PCI_ERS_RESULT_NONE) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_NEED_RESET) {
		...
	}

	if ((result == PCI_ERS_RESULT_RECOVERED) ||
	    (result == PCI_ERS_RESULT_NONE)) {
		...
		goto out;
	}

	/*
	 * unsuccessful recovery / PCI_ERS_RESULT_DISCONECTED
	 * handling is here.
	 */
	...

	out:
	...

Most of the "if () { ... }" blocks above change "result" to
PCI_ERS_RESULT_DISCONNECTED if an error occurs in that recovery step. This
makes the control flow a bit confusing since it breaks the early-exit
pattern that is generally used in Linux. In any case we end up handling the
error in the final else block so why not just jump there directly? Doing so
also allows us to de-indent a bunch of code.

No functional changes.

[dja: rebase on top of linux-next + my preceeding refactor,
      move clearing the EEH_DEV_NO_HANDLER bit above the first goto so that
      it is always clear in the error handler code as it was before.]

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-2-dja@axtens.net
2021-11-25 11:25:31 +11:00
Daniel Axtens
10b34ece13 powerpc/eeh: Small refactor of eeh_handle_normal_event()
The control flow of eeh_handle_normal_event() is a bit tricky.

Break out one of the error handling paths - rather than be in an else
block, we'll make it part of the regular body of the function and put a
'goto out;' in the true limb of the if.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-1-dja@axtens.net
2021-11-25 11:25:31 +11:00
Cédric Le Goater
1e7684dc4f powerpc/xive: Add a debugfs toggle for save-restore
On POWER10, the automatic "save & restore" of interrupt context is
always available. Provide a way to deactivate it for tests or
performance.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-11-clg@kaod.org
2021-11-25 11:25:31 +11:00
Cédric Le Goater
c21ee04f11 powerpc/xive: Add a kernel parameter for StoreEOI
StoreEOI is activated by default on platforms supporting the feature
(POWER10) and will be used as soon as firmware advertises its
availability. The kernel parameter provides a way to deactivate its
use. It can be still be reactivated through debugfs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-10-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
d7bc1e376c powerpc/xive: Add a debugfs toggle for StoreEOI
It can be used to deactivate temporarily StoreEOI for tests or
performance on platforms supporting the feature (POWER10)

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-9-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
08f3f61021 powerpc/xive: Add a debugfs file to dump EQs
The XIVE driver under Linux uses a single interrupt priority and only
one event queue is configured per CPU. Expose the contents under
a 'xive/eqs/cpuX' debugfs file.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-8-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
33e1d4a152 powerpc/xive: Rename the 'cpus' debugfs file to 'ipis'
and remove the EQ entries output which is not very useful since only
the next two events of the queue are taken into account. We will
improve the dump of the EQ in the next patches.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-7-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
baed14de78 powerpc/xive: Change the debugfs file 'xive' into a directory
Use a 'cpus' file to dump CPU states and 'interrupts' to dump IRQ states.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-6-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
412877dfae powerpc/xive: Introduce xive_core_debugfs_create()
and fix some compile issues when !CONFIG_DEBUG_FS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add empty stub to fix !CONFIG_DEBUG_FS build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-5-clg@kaod.org
2021-11-25 11:25:30 +11:00
Cédric Le Goater
756c52c632 powerpc/xive: Activate StoreEOI on P10
StoreEOI (the capability to EOI with a store) requires load-after-store
ordering in some cases to be reliable. P10 introduced a new offset for
load operations to enforce correct ordering and the XIVE driver has
the required support since kernel 5.8, commit b1f9be9392
("powerpc/xive: Enforce load-after-store ordering when StoreEOI is active")

Since skiboot v7, StoreEOI support is advertised on P10 with a new flag
on the PowerNV platform. See skiboot commit 4bd7d84afe46 ("xive/p10:
Introduce a new OPAL_XIVE_IRQ_STORE_EOI2 flag"). When detected,
activate the feature.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-4-clg@kaod.org
2021-11-25 11:25:30 +11:00