Commit Graph

511 Commits

Author SHA1 Message Date
Xi Ruoyao
e05d4cd9b8 LoongArch: Add support for relocating the kernel with RELR relocation
RELR is a relocation packing format for relative relocations that reduces
the size of relative relocation records.  In a position-independent
executable there are often many relative relocation records, and our
vmlinux is a PIE.

The LLD linker (since 17.0.0) and the BFD linker (since 2.43) support
packing the relocations in the RELR format for LoongArch, with the flag
-z pack-relative-relocs.

Commits 5cf896fb6b ("arm64: Add support for relocating the kernel
with RELR relocations") and ccb2d173b9 ("Makefile: use -z
pack-relative-relocs") have already added the framework to use RELR.
We just need to wire it up and process the RELR relocation records in
relocate_relative() in addition to the RELA relocation records.
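
As an illustration (a sketch, not the exact kernel code), a RELR table is
decoded as follows: an even entry is an address to relocate, an odd entry
is a bitmap covering the 63 pointer-sized slots that follow the previous
address:

  #include <linux/types.h>

  static void relocate_relr(u64 *start, u64 *end, long offset)
  {
          u64 *place = NULL;

          for (u64 *entry = start; entry < end; entry++) {
                  if (!(*entry & 1)) {
                          /* Even entry: an address, relocate it directly. */
                          place = (u64 *)(*entry + offset);
                          *place++ += offset;
                  } else {
                          /* Odd entry: a bitmap for the next 63 slots. */
                          u64 *p = place;

                          for (u64 bits = *entry >> 1; bits; bits >>= 1, p++)
                                  if (bits & 1)
                                          *p += offset;
                          place += 63;
                  }
          }
  }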

A ".p2align 3" directive is added to the la_abs macro, otherwise the BFD
linker cannot pack the relocation records against the .la_abs section (the
". = ALIGN(8);" directive in vmlinux.lds.S takes effect too late in the
linking process).

With defconfig and CONFIG_RELR vmlinux.efi is 2.1 MiB (6%) smaller, and
vmlinuz.efi (using gzip compression) is 384 KiB (2.8%) smaller.

Link: https://groups.google.com/d/topic/generic-abi/bX460iggiKg
Link: https://reviews.llvm.org/D138135#4531389
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d89ecf33ab6d
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:41:07 +08:00
Huacai Chen
f60d251b27 LoongArch: Add architectural preparation for CPUFreq
Add architectural preparation for CPUFreq driver, including: Kconfig,
register definition and platform device registration.

Some LoongArch processors support DVFS; their IOCSR.FEATURES register has
IOCSRF_FREQSCALE set, and they have a micro-core in the package, called the
SMC (System Management Controller), to scale frequency, voltage, etc.
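
A minimal sketch of the platform device part (the IOCSR feature check
follows the description above; the device name and init level here are
assumptions):

  #include <linux/err.h>
  #include <linux/init.h>
  #include <linux/platform_device.h>
  #include <asm/loongarch.h>

  static int __init loongson_cpufreq_dev_init(void)
  {
          /* Only register the device if SMC-based frequency scaling exists. */
          if (!(iocsr_read64(LOONGARCH_IOCSR_FEATURES) & IOCSRF_FREQSCALE))
                  return -ENODEV;

          return PTR_ERR_OR_ZERO(platform_device_register_simple(
                                  "loongson3_cpufreq", -1, NULL, 0));
  }
  arch_initcall(loongson_cpufreq_dev_init);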

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:41:06 +08:00
Huacai Chen
8e02c3b782 LoongArch: Add writecombine support for DMW-based ioremap()
Currently, only TLB-based ioremap() supports writecombine, so add the
counterpart for DMW-based ioremap() with the help of DMW2. The base address
(WRITECOMBINE_BASE) is configured as 0xa000000000000000.
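
Conceptually (a sketch; the helper name is illustrative, only the window
constant comes from the text above), a write-combine mapping through DMW2
is just an offset into that window:

  #include <linux/types.h>

  #define WRITECOMBINE_BASE       0xa000000000000000UL

  /* Illustrative only: map a physical address through the DMW2 window. */
  static inline void __iomem *ioremap_wc_dmw(phys_addr_t offset)
  {
          return (void __iomem *)(unsigned long)(WRITECOMBINE_BASE | offset);
  }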

DMW3 is unused by the kernel now; however, firmware may leave garbage in
it and interfere with the kernel's address mapping, so clear it as necessary.

By the way, centralize the DMW configuration into the SETUP_DMWINS macro.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:40:59 +08:00
Huacai Chen
b7a2750ef2 LoongArch: Add ARCH_HAS_PTE_DEVMAP support
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs.  Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
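
The helpers follow the usual pattern on other architectures; a sketch (the
bit position chosen for _PAGE_DEVMAP below is illustrative, not necessarily
what the port uses):

  /* A free software bit in the PTE; the bit number is illustrative. */
  #define _PAGE_DEVMAP_SHIFT      59
  #define _PAGE_DEVMAP            (_ULCAST_(1) << _PAGE_DEVMAP_SHIFT)

  static inline int pte_devmap(pte_t pte)
  {
          return !!(pte_val(pte) & _PAGE_DEVMAP);
  }

  static inline pte_t pte_mkdevmap(pte_t pte)
  {
          return __pte(pte_val(pte) | _PAGE_DEVMAP);
  }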

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:40:59 +08:00
Huacai Chen
08f417db70 LoongArch: Add irq_work support via self IPIs
Add irq_work support for LoongArch via self IPIs. This makes it possible
to run irq_work callbacks in hardware interrupt context, which is a
prerequisite for NOHZ_FULL.

Implement:
 - arch_irq_work_raise()
 - arch_irq_work_has_interrupt()
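
A minimal sketch of the two hooks (the IPI action name and the mp_ops
plumbing are assumptions about the port):

  void arch_irq_work_raise(void)
  {
          /* Send an IPI to ourselves; the IPI handler runs irq_work_run(). */
          mp_ops.send_ipi_single(smp_processor_id(), ACTION_IRQ_WORK);
  }

  static inline bool arch_irq_work_has_interrupt(void)
  {
          return IS_ENABLED(CONFIG_SMP);
  }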

Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:40:58 +08:00
Huacai Chen
7697a0fe01 LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
The Chromium sandbox apparently wants to deny statx [1] so it can properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there is currently no "fd-only" version of statx, the sandbox has
no way to ensure the path argument is empty without being able to peek
into the sandboxed process's memory. For architectures able to do
newfstatat, though, glibc falls back to newfstatat after getting -ENOSYS
for statx; the respective SIGSYS handler [2] then takes care of inspecting
the path argument, transforming allowed newfstatat calls into fstat, which
is allowed and has the same type of return value.

But, as LoongArch is the first architecture to have neither fstat nor
newfstatat, the LoongArch glibc does not attempt any fallback at all when
it gets -ENOSYS for statx -- and there you see the problem!

Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately, it seems that at that time no one noticed
statx, so besides restoring fstat/newfstatat to the LoongArch uapi (and
postponing the problem further), it seems inevitable that we will need
to tackle seccomp deep argument inspection.

However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].

Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to
upstream asm-generic changes.
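
For reference, the uapi side of this is essentially a one-line define
(sketch):

  /* arch/loongarch/include/uapi/asm/unistd.h */
  #define __ARCH_WANT_NEW_STAT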

For more information, please read this thread [6].

[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal.cx/
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-20 22:40:58 +08:00
Linus Torvalds
70045bfc4c ftrace: Rewrite of function graph tracer
Up until now, the function graph tracer could only have a single user
 attached to it. If another user tried to attach to the function graph
 tracer while one was already attached, it would fail. Allowing function
 graph tracer to have more than one user has been asked for since 2009, but
 it required a rewrite to the logic to pull it off so it never happened.
 Until now!
 
 There are three systems that trace the return of a function: kretprobes,
 the function graph tracer, and BPF. kretprobes and function graph
 tracing both do it similarly. The difference is that kretprobes uses a
 shadow stack per callback and function graph tracer creates a shadow stack
 for all tasks. The function graph tracer method makes it possible to trace
 the return of all functions. As kretprobes now needs that feature too,
 allowing it to use function graph tracer was needed. BPF also wants to
 trace the return of many probes and its method doesn't scale either.
 Having it use function graph tracer would improve that.
 
 Allowing the function graph tracer to have multiple users lets both
 kretprobes and BPF use the function graph tracer in these cases. This will
 allow the kretprobes code to be removed in the future as its version will
 no longer be needed. Note, the function graph tracer is limited to 16
 simultaneous users, due to shadow stack size and allocated slots.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZpbWlxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgtvAP9jxmgEiEhz4Bpe1vRKVSMYK6ozXHTT
 7MFKRMeQqQ8zeAEA2sD5Zrt9l7zKzg0DFpaDLgc3/yh14afIDxzTlIvkmQ8=
 =umuf
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace updates from Steven Rostedt:
 "Rewrite of function graph tracer to allow multiple users

  Up until now, the function graph tracer could only have a single user
  attached to it. If another user tried to attach to the function graph
  tracer while one was already attached, it would fail. Allowing
  function graph tracer to have more than one user has been asked for
  since 2009, but it required a rewrite to the logic to pull it off so
  it never happened. Until now!

  There are three systems that trace the return of a function: kretprobes,
  the function graph tracer, and BPF. kretprobes and function
  graph tracing both do it similarly. The difference is that kretprobes
  uses a shadow stack per callback and function graph tracer creates a
  shadow stack for all tasks. The function graph tracer method makes it
  possible to trace the return of all functions. As kretprobes now needs
  that feature too, allowing it to use function graph tracer was needed.
  BPF also wants to trace the return of many probes and its method
  doesn't scale either. Having it use function graph tracer would
  improve that.

  Allowing the function graph tracer to have multiple users lets both
  kretprobes and BPF use the function graph tracer in these cases. This
  will allow the kretprobes code to be removed in the future as its
  version will no longer be needed.

  Note, the function graph tracer is limited to 16 simultaneous users,
  due to shadow stack size and allocated slots"

* tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits)
  fgraph: Use str_plural() in test_graph_storage_single()
  function_graph: Add READ_ONCE() when accessing fgraph_array[]
  ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct()
  function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it
  function_graph: Fix up ftrace_graph_ret_addr()
  function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE
  function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests
  fgraph: Remove some unused functions
  ftrace: Hide one more entry in stack trace when ftrace_pid is enabled
  function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
  function_graph: Make fgraph_do_direct static key static
  ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
  ftrace: Assign RCU list variable with rcu_assign_ptr()
  ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
  ftrace: Declare function_trace_op in header to quiet sparse warning
  ftrace: Add comments to ftrace_hash_move() and friends
  ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify()
  ftrace: Add comments to ftrace_hash_rec_disable/enable()
  ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update()
  ftrace: Rename dup_hash() and comment it
  ...
2024-07-18 13:36:33 -07:00
Paolo Bonzini
86014c1e20 KVM generic changes for 6.11
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
  - Setup empty IRQ routing when creating a VM to avoid having to synchronize
    SRCU when creating a split IRQCHIP on x86.
 
  - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
    that arch code can use for hooking both sched_in() and sched_out().
 
  - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
    truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
  - Mark a vCPU as preempted if and only if it's scheduled out while in the
    KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
    memory when retrieving guest state during live migration blackout.
 
  - A few minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRuOYACgkQOlYIJqCj
 N/1UnQ/8CI5Qfr+/0gzYgtWmtEMczGG+rMNpzD3XVqPjJjXcMcBiQnplnzUVLhha
 vlPdYVK7vgmEt003XGzV55mik46LHL+DX/v4hI3HEdblfyCeNLW3fKEWVRB44qJe
 o+YUQwSK42SORUp9oXuQINxhA//U9EnI7CQxlJ8w8wenv5IJKfIGr01DefmfGPAV
 PKm9t6WLcNqvhZMEyy/zmzM3KVPCJL0NcwI97x6sHxFpQYIDtL0E/VexA4AFqMoT
 QK7cSDC/2US41Zvem/r/GzM/ucdF6vb9suzZYBohwhxtVhwJe2CDeYQZvtNKJ1U7
 GOHPaKL6nBWdZCm/yyWbbX2nstY1lHqxhN3JD0X8wqU5rNcwm2b8Vfyav0Ehc7H+
 jVbDTshOx4YJmIgajoKjgM050rdBK59TdfVL+l+AAV5q/TlHocalYtvkEBdGmIDg
 2td9UHSime6sp20vQfczUEz4bgrQsh4l2Fa/qU2jFwLievnBw0AvEaMximkSGMJe
 b8XfjmdTjlOesWAejANKtQolfrq14+1wYw0zZZ8PA+uNVpKdoovmcqSOcaDC9bT8
 GO/NFUvoG+lkcvJcIlo1SSl81SmGLosijwxWfGvFAqsgpR3/3l3dYp0QtztoCNJO
 d3+HnjgYn5o5FwufuTD3eUOXH4AFjG108DH0o25XrIkb2Kymy0o=
 =BalU
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 6.11

 - Enable halt poll shrinking by default, as Intel found it to be a clear win.

 - Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.

 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().

 - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.

 - Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.

 - A few minor cleanups
2024-07-16 09:51:36 -04:00
Arnd Bergmann
26a3b85bac loongarch: convert to generic syscall table
The uapi/asm/unistd_64.h and asm/syscall_table_64.h headers can now be
generated from scripts/syscall.tbl, which makes this consistent with
the other architectures that have their own syscall.tbl.

Unlike the other architectures using the asm-generic header, loongarch
uses none of the deprecated system calls at the moment.

Both the user visible side of asm/unistd.h and the internal syscall
table in the kernel should have the same effective contents after this.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10 14:23:38 +02:00
Arnd Bergmann
505d66d1ab clone3: drop __ARCH_WANT_SYS_CLONE3 macro
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.

Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.

Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10 14:23:38 +02:00
Arnd Bergmann
ff96f5c697 loongarch: avoid generating extra header files
The list of generated headers is rather outdated: some of these no longer
exist, while others are already listed in include/asm-generic/Kbuild,
so there is no need to list them here.

As we start validating the list of headers against the files that exist,
the outdated ones now cause a warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10 14:23:38 +02:00
Bibo Mao
03779999ac LoongArch: KVM: Add PV steal time support in guest side
A per-cpu struct kvm_steal_time is added here; its size is 64 bytes and it
is also 64-byte aligned, so that the whole structure sits in one physical
page.

When a vCPU comes online, pv_enable_steal_time() is called. This function
passes the guest physical address of struct kvm_steal_time to the
hypervisor and tells it to enable steal time. When a vCPU goes offline,
the physical address is set to 0, which tells the hypervisor to disable
steal time.
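
A sketch of the guest-side pieces described above (the structure layout
matches the 64-byte note; the hypercall helper, the argument encoding and
the enable bit are assumptions):

  struct kvm_steal_time {
          __u64 steal;            /* stolen time in nanoseconds */
          __u32 version;          /* even: consistent, odd: update in flight */
          __u32 flags;
          __u32 pad[12];          /* pad the structure to 64 bytes */
  };

  static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);

  static int pv_enable_steal_time(void)
  {
          unsigned long addr = per_cpu_ptr_to_phys(this_cpu_ptr(&steal_time));

          /* Bit 0 (assumed) marks the guest physical address as valid. */
          return kvm_hypercall2(KVM_HCALL_FUNC_NOTIFY,
                                KVM_FEATURE_STEAL_TIME, addr | 1);
  }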

Here is the output of vmstat in the guest when there is workload on both
the host and the guest. It shows the steal time statistics (the "st" column).

procs -----------memory---------- -----io---- -system-- ------cpu-----
 r  b   swpd   free  inact active   bi    bo   in   cs us sy id wa st
15  1      0 7583616 184112  72208    20    0  162   52 31  6 43  0 20
17  0      0 7583616 184704  72192    0     0 6318 6885  5 60  8  5 22
16  0      0 7583616 185392  72144    0     0 1766 1081  0 49  0  1 50
16  0      0 7583616 184816  72304    0     0 6300 6166  4 62 12  2 20
18  0      0 7583632 184480  72240    0     0 2814 1754  2 58  4  1 35

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
b4ba157044 LoongArch: KVM: Add PV steal time support in host side
Add the ParaVirt steal time feature on the host side. A VM can query the
supported features provided by the KVM hypervisor, and a feature bit
KVM_FEATURE_STEAL_TIME is added here. Like x86, the steal time structure
is saved in guest memory, and a hypercall function KVM_HCALL_FUNC_NOTIFY
is added to notify KVM to enable this feature.

One CPU attr ioctl command KVM_LOONGARCH_VCPU_PVTIME_CTRL is added to
save and restore the base address of steal time structure when a VM is
migrated.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
8c34704252 LoongArch: KVM: Add dirty bitmap initially all set support
Add KVM_DIRTY_LOG_INITIALLY_SET support for LoongArch; this feature comes
from other architectures like x86 and arm64.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:51 +08:00
Bibo Mao
b5d4e2325d LoongArch: KVM: Delay secondary mmu tlb flush until guest entry
With hardware-assisted virtualization there are two levels of HW MMU: one
is the GVA to GPA mapping, the other is the GPA to HPA mapping, which is
called the secondary MMU in generic code. If there is a page fault in the
secondary MMU, a TLB flush operation indexed by the faulting GPA address
and the VMID is needed. The VMID is stored in the register CSR.GSTAT and
will be reloaded or recalculated before guest entry.

Currently CSR.GSTAT is not saved and restored during vCPU context switch;
instead it is recalculated during guest entry. So CSR.GSTAT is valid only
while a vCPU runs in guest mode, but it may not be valid once the vCPU
exits to host mode. Since the register may be stale, it may record the
VMID of the last scheduled-out vCPU rather than that of the current vCPU.

Function kvm_flush_tlb_gpa() should be called with the real VMID, so move
it to the guest entry path. Also, an arch-specific request id
KVM_REQ_TLB_FLUSH_GPA is added to flush the TLB for the secondary MMU, and
the flush can be skipped when the VMID is updated, since all guest TLB
entries become invalid once the VMID is updated.
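
A sketch of the entry-path check (the flush_gpa bookkeeping field is an
assumption):

  /* Called on the guest-entry path, once CSR.GSTAT (and so the VMID) is
   * known to be current for this vCPU. */
  static void kvm_check_tlb_flush_gpa(struct kvm_vcpu *vcpu)
  {
          if (kvm_check_request(KVM_REQ_TLB_FLUSH_GPA, vcpu))
                  kvm_flush_tlb_gpa(vcpu, vcpu->arch.flush_gpa);
  }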

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-07-09 16:25:50 +08:00
Bang Li
8f65aa3223 mm: implement update_mmu_tlb() using update_mmu_tlb_range()
Let's make update_mmu_tlb() simply a generic wrapper around
update_mmu_tlb_range().  Only the latter can now be overridden by the
architecture.  We can now remove __HAVE_ARCH_UPDATE_MMU_TLB as well.
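
The generic wrapper then boils down to (a sketch consistent with the
changelog):

  static inline void update_mmu_tlb(struct vm_area_struct *vma,
                                    unsigned long address, pte_t *ptep)
  {
          update_mmu_tlb_range(vma, address, ptep, 1);
  }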

Link: https://lkml.kernel.org/r/20240522061204.117421-3-libang.li@antgroup.com
Signed-off-by: Bang Li <libang.li@antgroup.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:57 -07:00
Bang Li
23b1b44e6c mm: add update_mmu_tlb_range()
Patch series "Add update_mmu_tlb_range() to simplify code", v4.

This series of commits mainly adds update_mmu_tlb_range() to batch-update
the TLB over an address range, and implements update_mmu_tlb() using
update_mmu_tlb_range().

After commit 19eaf44954 ("mm: thp: support allocation of anonymous
multi-size THP"), we may need to batch-update the TLB of a certain address
range by calling update_mmu_tlb() in a loop.  Using update_mmu_tlb_range(),
we can simplify the code and possibly avoid executing some unnecessary
code on some architectures.


This patch (of 3):

Add update_mmu_tlb_range(), with which we can batch-update the TLB of an
address range.
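
For example (an illustrative fragment, assuming nr contiguous PTEs starting
at addr/ptep):

  /* before: one call per PTE */
  for (i = 0; i < nr; i++)
          update_mmu_tlb(vma, addr + i * PAGE_SIZE, ptep + i);

  /* after: a single batched call covering the whole range */
  update_mmu_tlb_range(vma, addr, ptep, nr);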

Link: https://lkml.kernel.org/r/20240522061204.117421-1-libang.li@antgroup.com
Link: https://lkml.kernel.org/r/20240522061204.117421-2-libang.li@antgroup.com
Signed-off-by: Bang Li <libang.li@antgroup.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:57 -07:00
Hui Li
c8e57ab099 LoongArch: Trigger user-space watchpoints correctly
In the current code, gdb can set a watchpoint successfully through the
ptrace interface, but the watchpoint will not be triggered.

When debugging the following code using gdb:

lihui@bogon:~$ cat test.c
  #include <stdio.h>
  int a = 0;
  int main()
  {
	a = 1;
	printf("a = %d\n", a);
	return 0;
  }
lihui@bogon:~$ gcc -g test.c -o test
lihui@bogon:~$ gdb test
...
(gdb) watch a
...
(gdb) r
...
a = 1
[Inferior 1 (process 4650) exited normally]

No watchpoints were triggered; the root causes are:

1. The kernel uses the perf_event and hw_breakpoint framework to control
   watchpoints, but the perf_event corresponding to the watchpoint is
   not enabled. So it needs to be enabled according to the MWPnCFG3
   or FWPnCFG3 PLV bit field in ptrace_hbp_set_ctrl(), and the privilege
   is set according to the monitored address in hw_breakpoint_control().
   Furthermore, add a check in ptrace_hbp_set_addr() to ensure a
   kernel-space address cannot be monitored in user mode.

2. The global enable control for all watchpoints is the WE bit of
   CSR.CRMD, and hardware sets its value to 0 when an exception is
   triggered. When the ERTN instruction is executed to return, the
   hardware restores it from the PWE field of CSR.PRMD. So, before a
   thread containing watchpoints is scheduled, the PWE field of CSR.PRMD
   needs to be set to 1. Add this modification in hw_breakpoint_control()
   (a sketch follows below).
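
A sketch of the PWE part of point 2 (register and field names come from the
description above; the surrounding helper is illustrative):

  /* Let the watchpoints take effect once ERTN returns this task to
   * user mode. */
  static void task_set_user_watchpoints(struct task_struct *tsk, bool enable)
  {
          struct pt_regs *regs = task_pt_regs(tsk);

          if (enable)
                  regs->csr_prmd |= CSR_PRMD_PWE;
          else
                  regs->csr_prmd &= ~CSR_PRMD_PWE;
  }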

All changes according to the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#basic-control-and-status-registers

With this patch:

lihui@bogon:~$ gdb test
...
(gdb) watch a
Hardware watchpoint 1: a
(gdb) r
...
Hardware watchpoint 1: a

Old value = 0
New value = 1
main () at test.c:6
6		printf("a = %d\n", a);
(gdb) c
Continuing.
a = 1
[Inferior 1 (process 775) exited normally]

Cc: stable@vger.kernel.org
Signed-off-by: Hui Li <lihui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-21 10:18:40 +08:00
Hui Li
f63a47b34b LoongArch: Fix watchpoint setting error
In the current code, when debugging the following code using gdb, an
"invalid argument ..." message will be displayed.

lihui@bogon:~$ cat test.c
  #include <stdio.h>
  int a = 0;
  int main()
  {
	a = 1;
	return 0;
  }
lihui@bogon:~$ gcc -g test.c -o test
lihui@bogon:~$ gdb test
...
(gdb) watch a
Hardware watchpoint 1: a
(gdb) r
...
Invalid argument setting hardware debug registers

There are mainly two types of issues.

1. Some incorrect judgment conditions existed in the user_watch_state
   argument parsing, causing -EINVAL to be returned.

When setting up a watchpoint, gdb uses the ptrace interface
ptrace(PTRACE_SETREGSET, tid, NT_LOONGARCH_HW_WATCH, (void *) &iov).
The register values in user_watch_state are as follows:

  addr[0] = 0x0, mask[0] = 0x0, ctrl[0] = 0x0
  addr[1] = 0x0, mask[1] = 0x0, ctrl[1] = 0x0
  addr[2] = 0x0, mask[2] = 0x0, ctrl[2] = 0x0
  addr[3] = 0x0, mask[3] = 0x0, ctrl[3] = 0x0
  addr[4] = 0x0, mask[4] = 0x0, ctrl[4] = 0x0
  addr[5] = 0x0, mask[5] = 0x0, ctrl[5] = 0x0
  addr[6] = 0x0, mask[6] = 0x0, ctrl[6] = 0x0
  addr[7] = 0x12000803c, mask[7] = 0x0, ctrl[7] = 0x610

In arch_bp_generic_fields(), -EINVAL is returned when ctrl.len is
LOONGARCH_BREAKPOINT_LEN_8 (0b00), so delete the incorrect judgment there.

In ptrace_hbp_fill_attr_ctrl(), when note_type is NT_LOONGARCH_HW_WATCH
and ctrl[0] == 0x0, the check if ((type & HW_BREAKPOINT_RW) != type) will
return -EINVAL. Here ctrl.type should be set based on note_type, and the
unnecessary judgments can be removed.

2. The watchpoint argument was not set correctly due to unnecessary
   offset and alignment_mask.

Modify ptrace_hbp_fill_attr_ctrl() and hw_breakpoint_arch_parse() to
ensure the watchpoint argument is set correctly.

All changes according to the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints

Cc: stable@vger.kernel.org
Signed-off-by: Hui Li <lihui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-21 10:18:40 +08:00
Sean Christopherson
2a27c43140 KVM: Delete the now unused kvm_arch_sched_in()
Delete kvm_arch_sched_in() now that all implementations are nops.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20240522014013.1672962-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-11 14:18:45 -07:00
Steven Rostedt (Google)
5f7fb89a11 function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it
All architectures that implement function graph also implement
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a
differentiator.

Link: https://lore.kernel.org/linux-trace-kernel/20240611031737.982047614@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-11 11:18:24 -04:00
Jiaxun Yang
1098efd299 LoongArch: Override higher address bits in JUMP_VIRT_ADDR
In JUMP_VIRT_ADDR we are performing an OR operation on the address value
that comes directly from pcaddi.

This will only work if we are currently running from direct 1:1 mapping
addresses, or if the firmware's DMW is configured exactly the same as the
kernel's. Still, we should not rely on such an assumption.

Fix this by overriding the higher bits of the address that comes from
pcaddi, so we can get rid of the OR operation.

Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Jiaxun Yang
3de9c42d02 LoongArch: Add all CPUs enabled by fdt to NUMA node 0
A NUMA-enabled kernel on an FDT-based machine fails to boot because the
CPUs are all in NUMA_NO_NODE and the mm subsystem won't accept that.

Fix this by adding them to the default NUMA node at the FDT parsing phase
and by moving numa_add_cpu(0) to a later point.

Cc: stable@vger.kernel.org
Fixes: 88d4d957ed ("LoongArch: Add FDT booting support from efi system table")
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03 15:45:53 +08:00
Linus Torvalds
c760b3725e - A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings.  We
   fixed all the fallout which we could find; there may still be a few
   stragglers.
 
 - Samuel Holland has developed the series "Unified cross-architecture
   kernel-mode FPU API".  This does a lot of consolidation of
   per-architecture kernel-mode FPU usage and enables the use of newer AMD
   GPUs on RISC-V.
 
 - Tao Su has fixed some selftests build warnings in the series
   "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
   definition".
 
 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZk6OSAAKCRDdBJ7gKXxA
 jpTGAP9hQaZ+g7CO38hKQAtEI8rwcZJtvUAP84pZEGMjYMGLxQD/S8z1o7UHx61j
 DUbnunbOkU/UcPx3Fs/gp4KcJARMEgs=
 =EPi9
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more non-mm updates from Andrew Morton:

 - A series ("kbuild: enable more warnings by default") from Arnd
   Bergmann which enables a number of additional build-time warnings. We
   fixed all the fallout which we could find; there may still be a few
   stragglers.

 - Samuel Holland has developed the series "Unified cross-architecture
   kernel-mode FPU API". This does a lot of consolidation of
   per-architecture kernel-mode FPU usage and enables the use of newer
   AMD GPUs on RISC-V.

 - Tao Su has fixed some selftests build warnings in the series
   "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
   definition".

 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.

* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  nilfs2: make block erasure safe in nilfs_finish_roll_forward()
  selftests/harness: use 1024 in place of LINE_MAX
  Revert "selftests/harness: remove use of LINE_MAX"
  selftests/fpu: allow building on other architectures
  selftests/fpu: move FP code to a separate translation unit
  drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
  drm/amd/display: only use hard-float, not altivec on powerpc
  riscv: add support for kernel-mode FPU
  x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: fix asm/fpu/types.h include guard
  kbuild: enable -Wcast-function-type-strict unconditionally
  kbuild: enable -Wformat-truncation on clang
  ...
2024-05-22 18:59:29 -07:00
Linus Torvalds
4f05e82003 LoongArch changes for v6.10
1, Select some options in Kconfig;
 2, Give a chance to build with !CONFIG_SMP;
 3, Switch to use built-in rustc target;
 4, Add new supported device nodes to dts;
 5, Some bug fixes and other small changes;
 6, Update the default config file.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmZKCycWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeoXWD/9pFhbbJj49T1xiwc2j/XgQL8HI
 s88/h4z5AXEbHFO8XIG1Cpw/Z3a1DsCiWBsOkCogagILzYuN0r7UqcrI02ZoeY6N
 fbuDatB3i+hJWCBzcl1HPkFy/9av4j4EktZs0+X/wVgKkd0aIh78qs8+1RwKhshf
 FoOv+cMu7zFS8Jrt+w16diNCY1JsDv7TCkCVhvJxAodrtGg4oo2NPfrGOrKAP8Dq
 LClvFEqDcXq1kKcipw3Q7BwDlBpJEvLZ0iAl19BnLAmBzI3Wfze9ouoYv8WiUyaY
 br0GPShGf16I3DKtTdHsHH/zmayQ7JSmFzZ9JEHzcBrE4AprfWLuwsUjd2WXDD6U
 wK+p4tWd0AUFf+/h4u1yQB9/rlt+JZ2ny/A2u4YR/BPtthiYqp8SDSH62vpCSFOE
 dByDeTbfjTdJsWr+bsI2gOO0sVwDYpph9SJfAyBn4miKw7v8w+2rI1oqo/ZQkP59
 0SczM9C9jzpgXSGDc4yQbnqoA4KA9U6zljd12mYL5HV/AjhD19va3FmENgByZUuE
 Z7A0RZsiU5T401xEZiOUhwzy9m/USc1O2ivCmeowx9kP/gWic0KeAsmlMiro0jeR
 y9jthci8iOgjjLmCEVC06GWGUojP2roXI/38We6enVevy2GXbEEDRa1QGbQ5ndoJ
 MEPm4NvW1wsBgWIYmg==
 =WR15
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Select some options in Kconfig

 - Give a chance to build with !CONFIG_SMP

 - Switch to use built-in rustc target

 - Add new supported device nodes to dts

 - Some bug fixes and other small changes

 - Update the default config file

* tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Update Loongson-3 default config file
  LoongArch: dts: Add new supported device nodes to Loongson-2K2000
  LoongArch: dts: Add new supported device nodes to Loongson-2K0500
  LoongArch: dts: Remove "disabled" state of clock controller node
  LoongArch: rust: Switch to use built-in rustc target
  LoongArch: Fix callchain parse error with kernel tracepoint events again
  LoongArch: Give a chance to build with !CONFIG_SMP
  LoongArch: Select THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
  LoongArch: Select ARCH_WANT_DEFAULT_BPF_JIT
  LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
  LoongArch: Select ARCH_HAS_FAST_MULTIPLIER
2024-05-22 09:43:07 -07:00
Linus Torvalds
3eb3c33c1d asm-generic cleanups for 6.10
These are a few cross-architecture cleanup patches:
 
  - Thomas Zimmermann works on separating fbdev support from the asm/video.h
    contents that may be used by either the old fbdev drivers or the
    newer drm display code.
 
  - Thorsten Blum contributes cleanups for the generic bitops code
    and asm-generic/bug.h
 
  - I remove the orphaned include/asm-generic/page.h header that used to
    included by long-removed mmu-less architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZLvewACgkQYKtH/8kJ
 UicUEQ//b5WVLOVXkFGlQvAaZkagOLEF8xSTnchA7aKrWQ/C6hSwLN6CQU6MAY7j
 Fe54jYQtjwBwpVIj3jn20xiXP/pZbQp9aldkOx4v8YoGnjNF5UWLHm5510DV1ecE
 0LF/2YIH25vIXGY6MVm6sFq+nkDgWZee6fBFNc3GsCu2y0biD1Gob9xH/ngCHjIj
 tw9KS/j6MivPy/9vJ/Ml2YeutV6+pUA9hNmSrbSVlXSWFh3Wq6IZ+j6bNEftqtZY
 xdnYwdVfReOCIayq6hSHhAgIp/uw8JOqLuE2JNwG/9sSF4zp4ZHLvTaMhqEoCpyB
 3kZYd1qQTwV3eL5PyYtRcW03KvbhfZpMPzZT+wbl9SNPUljC2MSVeSFF30Uqatgb
 yUJ9d/vlb1ynu1yQrFfTZ/kK+U0pPByydwLybcMtEIZ6Hrb1h/eRicvHhUx7bKUB
 H9z/FN/TxGY+tPradx2lqm3J1wNu0ox8DUreXjtlJijKIUZQeAkJrGJgr6i6XLBz
 crwgKzuQUClzEjBcoWzuTVUB7v19jaDuHMsaBBu8O9f1g5FnEIJlItqnXf1J0Dno
 rJy68Mxsg4Dzt4YI3lpOJGDDDPhpOTBXfgsjkuru2MrdFMgZQh+DYLl3qOkJ4DJe
 rdiEJb9PygBaGGQnoXO71oOLf5yQuenj+Fg5GIe9AQrci5fXwRQ=
 =riCs
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "These are a few cross-architecture cleanup patches:

   - separate out fbdev support from the asm/video.h contents that may
     be used by either the old fbdev drivers or the newer drm display
     code (Thomas Zimmermann)

   - cleanups for the generic bitops code and asm-generic/bug.h
     (Thorsten Blum)

   - remove the orphaned include/asm-generic/page.h header that used to
     be included by long-removed mmu-less architectures (me)"

* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: Fix name collision with ACPI's video.o
  bug: Improve comment
  asm-generic: remove unused asm-generic/page.h
  arch: Rename fbdev header and source files
  arch: Remove struct fb_info from video helpers
  arch: Select fbdev helpers with CONFIG_VIDEO
  bitops: Change function return types from long to int
2024-05-20 15:18:34 -07:00
Samuel Holland
372f662345 LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export the
CFLAGS adjustments.
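
A sketch of the new helper (assuming it simply keys off the CPU's FPU
feature bit):

  static inline bool kernel_fpu_available(void)
  {
          return cpu_has_fpu;
  }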

Link: https://lkml.kernel.org/r/20240329072441.591471-8-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Acked-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com> 
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-19 14:36:18 -07:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initialization code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folios".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests in the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

    - Baoquan has also redone some MM initialization code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
      hugetlb functions to work on folios".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

    - David Hildenbrand has disabled some known-to-fail selftests in the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Huacai Chen
d6af2c7639 LoongArch: Fix callchain parse error with kernel tracepoint events again
With commit d3119bc985 ("LoongArch: Fix callchain parse error with
kernel tracepoint events"), perf can parse the kernel callchain, but the
result is incomplete and sometimes wrong. The reason is that LoongArch's
unwinders (guess, prologue and orc) don't really need fp (i.e., regs[22]);
they use sp (i.e., regs[3]) as the frame address rather than as the
current stack pointer.

Fix that by removing the assignment of regs[22] and instead assigning
__builtin_frame_address(0) to regs[3].
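
That is, the arch hook becomes roughly (a sketch; the real macro may set
more state):

  #define perf_arch_fetch_caller_regs(regs, __ip) { \
          (regs)->csr_era = (__ip); \
          (regs)->regs[3] = (unsigned long) __builtin_frame_address(0); \
  }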

Without fix:

  Children      Self  Command        Shared Object      Symbol
  ........  ........  .............  .................  ................
  33.91%    33.91%    swapper        [kernel.vmlinux]   [k] __schedule
            |
            |--33.04%--__schedule
            |
             --0.87%--__arch_cpu_idle
                       __schedule

With this fix:

  Children      Self  Command        Shared Object      Symbol
  ........  ........  .............  .................  ................
  31.16%    31.16%    swapper        [kernel.vmlinux]   [k] __schedule
            |
            |--20.63%--smpboot_entry
            |          cpu_startup_entry
            |          schedule_idle
            |          __schedule
            |
             --10.53%--start_kernel
                       cpu_startup_entry
                       schedule_idle
                       __schedule

Fixes: d3119bc985 ("LoongArch: Fix callchain parse error with kernel tracepoint events")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-14 12:24:18 +08:00
Tiezhu Yang
5685d7fcb5 LoongArch: Give a chance to build with !CONFIG_SMP
In the current code, SMP is selected in Kconfig for LoongArch and users
cannot unset it, which is reasonable for a multi-processor machine. But
as the help text of config SMP says, if you have a system with only one
CPU, say N; on a uni-processor machine, the kernel will run faster if
you say N here.

Loongson-2K0500 is a single-core CPU for applications like industrial
control, printing terminals, and BMC (Baseboard Management Controller).
There are many development boards, products and solutions based on it on
the market, so it is better and necessary to give a chance to build with
!CONFIG_SMP for a uni-processor machine.

First of all, do not select SMP for config LOONGARCH in Kconfig, to make
it possible to unset CONFIG_SMP. Then, make some changes to fix warnings
and errors when CONFIG_SMP is not set.

(1) Define get_ipi_irq() only if CONFIG_SMP is set to fix the warning:
arch/loongarch/kernel/irq.c:90:19: warning: 'get_ipi_irq' defined but not used [-Wunused-function]

(2) Add "#ifdef CONFIG_SMP" in asm/smp.h to fix the warning:
./arch/loongarch/include/asm/smp.h:49:9: warning: "raw_smp_processor_id" redefined
   49 | #define raw_smp_processor_id raw_smp_processor_id
      |         ^~~~~~~~~~~~~~~~~~~~
./include/linux/smp.h:198:9: note: this is the location of the previous definition
  198 | #define raw_smp_processor_id()                  0

(3) Define machine_shutdown() as empty under !CONFIG_SMP to fix the error:
arch/loongarch/kernel/machine_kexec.c: In function 'machine_shutdown':
arch/loongarch/kernel/machine_kexec.c:233:25: error: implicit declaration of function 'cpu_device_up'; did you mean 'put_device'? [-Wimplicit-function-declaration]

(4) Make config SCHED_SMT depend on SMP to fix many errors such as:
kernel/sched/core.c: In function 'sched_core_find':
kernel/sched/core.c:310:43: error: 'struct rq' has no member named 'cpu'

(5) Define cpu_logical_map(cpu) as 0 under !CONFIG_SMP in asm/smp.h (see
the sketch after this list), then include asm/smp.h in asm/acpi.h (because
acpi.h is included in linux/irq.h indirectly) to fix many build errors
under drivers/irqchip such as:
drivers/irqchip/irq-loongson-eiointc.c: In function 'cpu_to_eio_node':
drivers/irqchip/irq-loongson-eiointc.c:59:16: error: implicit declaration of function 'cpu_logical_map' [-Wimplicit-function-declaration]

(6) Do not write per_cpu_offset(0) to PERCPU_BASE_KS when resuming because
the per_cpu_offset(x) macro is defined as (__per_cpu_offset[x]) only
under CONFIG_SMP in include/asm-generic/percpu.h. Just save the value of
PERCPU_BASE_KS when suspending and restore it when resuming to fix the error:
arch/loongarch/power/suspend.c: In function 'loongarch_common_resume':
arch/loongarch/power/suspend.c:47:21: error: implicit declaration of function 'per_cpu_offset' [-Wimplicit-function-declaration]

(7) Fix huge page handling under !CONFIG_SMP in tlbex.S.
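
For instance, the stub from item (5) is just (sketch):

  /* asm/smp.h */
  #ifndef CONFIG_SMP
  #define cpu_logical_map(cpu)    0
  #endif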

When running the UnixBench tests with a "-c 1" single-streamed pass, the
performance improvement is about 9 percent with this patch.

By the way, it is helpful to debug and analyze kernel issues of a
multi-processor system under !CONFIG_SMP.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-14 12:24:18 +08:00
Xi Ruoyao
5125d033c8 LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
This allows compiling a full 128-bit product of two 64-bit integers as a
mul/mulh pair, instead of a nasty long sequence of 20+ instructions.

However, after selecting ARCH_SUPPORTS_INT128, when optimizing for size
the compiler generates calls to __ashlti3, __ashrti3, and __lshrti3 for
shifting __int128 values, causing a link failure:

    loongarch64-unknown-linux-gnu-ld: kernel/sched/fair.o: in
    function `mul_u64_u32_shr':
    <PATH>/include/linux/math64.h:161:(.text+0x5e4): undefined
    reference to `__lshrti3'

So provide implementations of these functions when ARCH_SUPPORTS_INT128
is selected.
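
For clarity, a hedged sketch of what such a helper boils down to, here
the 128-bit logical right shift; the type and function names are
illustrative and this is not the exact kernel implementation:

  typedef unsigned __int128 u128;

  /* Logical right shift of a 128-bit value by 0 <= n < 128 bits,
   * using only 64-bit operations (little-endian member layout).  */
  u128 lshrti3_sketch(u128 v, int n)
  {
          union { u128 whole; struct { unsigned long long lo, hi; } p; } u = { v };

          if (n >= 64) {
                  u.p.lo = u.p.hi >> (n - 64);
                  u.p.hi = 0;
          } else if (n) {
                  u.p.lo = (u.p.lo >> n) | (u.p.hi << (64 - n));
                  u.p.hi >>= n;
          }
          return u.whole;
  }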

Closes: https://lore.kernel.org/loongarch/CAAhV-H5EZ=7OF7CSiYyZ8_+wWuenpo=K2WT8-6mAT4CvzUC_4g@mail.gmail.com/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-14 12:24:18 +08:00
Paolo Bonzini
4232da23d7 Merge tag 'loongarch-kvm-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.10

1. Add ParaVirt IPI support.
2. Add software breakpoint support.
3. Add mmio trace events support.
2024-05-10 13:20:18 -04:00
Bibo Mao
163e9fc695 LoongArch: KVM: Add software breakpoint support
When a VM runs in KVM mode, the system will not exit to host mode when
executing a general software breakpoint instruction such as INSN_BREAK;
the trap exception happens in guest mode rather than host mode. In order
to debug the guest kernel from the host side, some mechanism is needed to
let the VM exit to host mode.

Here a hypercall instruction with a special code is used for software
breakpoints. The VM exits to host mode, the KVM hypervisor identifies the
special hypercall code, sets exit_reason to KVM_EXIT_DEBUG, and then lets
QEMU handle it.

The idea comes from PPC KVM: one API, KVM_REG_LOONGARCH_DEBUG_INST, is
added to get the hypercall code. The VMM needs to get the SW breakpoint
instruction with this API and set the corresponding SW breakpoint for the
guest kernel.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:47 +08:00
Bibo Mao
74c16b2e2b LoongArch: KVM: Add PV IPI support on guest side
The PARAVIRT config option and PV IPI support are added for the guest
side; function pv_ipi_init() is used to add the IPI sending and IPI
receiving hooks. This function first checks whether the system runs in VM
mode, and if so, it calls kvm_para_available() to detect the current
hypervisor type (for now only KVM detection is supported). The paravirt
functions can only work if the current hypervisor type is KVM, since KVM
is the only hypervisor supported on LoongArch at present.

PV IPI uses virtual IPI sender and virtual IPI receiver functions. With
the virtual IPI sender, the IPI message is stored in memory rather than
in emulated HW. IPI multicast is also supported, and 128 vcpus can
receive IPIs at the same time, like the x86 KVM method. A hypercall is
used for IPI sending.

With the virtual IPI receiver, HW SWI0 is used rather than the real IPI
HW. Since each vCPU has a separate HW SWI0, like the HW timer, there is
no trap in the IPI interrupt acknowledge. And since the IPI message is
stored in memory, there is no trap in getting the IPI message.
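
A hedged sketch of the init flow described above; kvm_para_available()
comes from the commit text, while every other name below is an
illustrative stand-in:

  static int running_in_vm_mode(void)  { return 1; }  /* stand-in check */
  static int kvm_para_detected(void)   { return 1; }  /* stand-in for kvm_para_available() */
  static void setup_pv_ipi_hooks(void) { /* install PV send/receive paths */ }

  static void pv_ipi_init_sketch(void)
  {
          if (!running_in_vm_mode())   /* bare metal: keep the native IPI path */
                  return;
          if (!kvm_para_detected())    /* only KVM is detected for now         */
                  return;
          setup_pv_ipi_hooks();        /* hook IPI sending and IPI receiving   */
  }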

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:47 +08:00
Bibo Mao
e33bda7ee5 LoongArch: KVM: Add PV IPI support on host side
On LoongArch systems, the IPI HW uses iocsr registers. There is one
iocsr register access on IPI sending, and two iocsr accesses on IPI
receiving (in the IPI interrupt handler). In VM mode every iocsr access
causes the VM to trap into the hypervisor, so one IPI HW notification
results in three traps.

In this patch PV IPI is added for the VM: a hypercall instruction is
used on the IPI sender side, and the hypervisor injects an SWI into the
destination vcpu. In the SWI interrupt handler, only the CSR.ESTAT
register is written to clear the irq. CSR.ESTAT access does not trap into
the hypervisor, so with PV IPI there is one trap on the sender side and
no trap on the receiver side, i.e. only one trap per IPI notification.

This patch also adds IPI multicast support, with a method similar to
x86. With IPI multicast support, an IPI notification can be sent to at
most 128 vcpus at one time, which greatly reduces the number of traps
into the hypervisor.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:47 +08:00
Bibo Mao
73516e9da5 LoongArch: KVM: Add vcpu mapping from physical cpuid
The physical CPUID is used for interrupt routing by irqchips such as the
ipi, msgint and eiointc interrupt controllers. The physical CPUID is
stored in the CSR register LOONGARCH_CSR_CPUID; it cannot be changed once
the vcpu is created, and two vcpus cannot have the same physical CPUID.

Different irqchips declare different physical CPUID ranges: the max
CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is 512, the
max CPUID supported by the IPI hardware is 1024, while it is 256 for the
eiointc irqchip and 65536 for the msgint irqchip.

The smallest value among all interrupt controllers is selected for now,
so the max cpuid size is defined as 256 by KVM, which comes from the
eiointc irqchip.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:47 +08:00
Bibo Mao
9753d30379 LoongArch: KVM: Add cpucfg area for kvm hypervisor
The cpucfg instruction can be used to get processor features, and it
causes a trap exception when executed in VM mode, so it can also be used
to provide CPU features to the VM. On real hardware only cpucfg areas
0 - 20 are used for now. Here the dedicated area 0x40000000 -- 0x400000ff
is used by the KVM hypervisor to provide PV features, and the area can be
extended for other hypervisors in the future. This area will never be
used by real HW; it is only used by software.
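
The reserved window described above, as a sketch (the address range comes
from the commit text; the macro names are illustrative):

  #define CPUCFG_KVM_BASE     0x40000000UL          /* start of the SW-defined area   */
  #define CPUCFG_KVM_SIZE     0x100                 /* covers 0x40000000 - 0x400000ff */
  #define CPUCFG_KVM_SIG      (CPUCFG_KVM_BASE + 0) /* e.g. hypervisor signature      */
  #define CPUCFG_KVM_FEATURE  (CPUCFG_KVM_BASE + 4) /* e.g. PV feature bits           */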

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:47 +08:00
Bibo Mao
372631bb62 LoongArch: KVM: Add hypercall instruction emulation
On LoongArch systems there is a hypercall instruction dedicated to
virtualization. When this instruction is executed on the host side, an
illegal instruction exception is reported; when it is executed in VM
mode, however, it traps into the host.

When the hypercall is emulated, the A0 register is set to
KVM_HCALL_INVALID_CODE rather than injecting an EXCCODE_INE invalid
instruction exception, so the VM can continue executing the following
code.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:46 +08:00
Bibo Mao
316863cb62 LoongArch/smp: Refine some ipi functions on LoongArch platform
Refine the IPI handling on the LoongArch platform with three
modifications:

1. Add a generic function get_percpu_irq(), replacing percpu irq
functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with
get_percpu_irq().

2. Change the definition of the action parameter passed to
loongson_send_ipi_single() and loongson_send_ipi_mask(): it is now
defined in decimal encoding format on the IPI sender side. Normal decimal
encoding is used rather than binary bitmap encoding for the IPI action:
the IPI HW sender takes the decimal encoding, while the IPI receiver
still gets a binary bitmap encoding, because the IPI HW converts it into
a bitmap in the IPI message buffer.

3. Add a structure smp_ops on the LoongArch platform so that PV IPI can
be used later (a sketch follows this list).
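
A hedged sketch of the ops table from item 3; the member names are
illustrative, the point being that a PV implementation can later install
its own IPI senders:

  struct cpumask;                        /* kernel type, defined elsewhere */

  struct smp_ops_sketch {
          void (*init_ipi)(void);
          void (*send_ipi_single)(int cpu, unsigned int action);
          void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
  };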

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 22:00:46 +08:00
Thomas Zimmermann
2fd001cd36
arch: Rename fbdev header and source files
The per-architecture fbdev code has no dependencies on fbdev and can
be used for any video-related subsystem. Rename the files to 'video'.
Use video-sti.c on parisc as the source file depends on CONFIG_STI_CORE.

On arc, arm, arm64, sh, and um the asm header file is an empty wrapper
around the file in asm-generic. Let Kbuild generate the file. The build
system does this automatically. Only um needs to generate video.h
explicitly, so that it overrides the host architecture's header. The
latter would otherwise interfere with the build.

Further update all include statements, include guards, and Makefiles.
Also update a few strings and comments to refer to video instead of
fbdev.

v3:
- arc, arm, arm64, sh: generate asm header via build system (Sam,
Helge, Arnd)
- um: rename fb.h to video.h
- fix typo in commit message (Sam)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-03 17:07:50 +02:00
Kent Overstreet
0069455bcb fix missing vmalloc.h includes
Patch series "Memory allocation profiling", v6.

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.

Example output:
  root@moria-kvm:~# sort -rn /proc/allocinfo
   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
    14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
    13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
    11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
     9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
     4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
     4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
     3940352      962 mm/memory.c:4214 func:alloc_anon_folio
     2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
     ...

Usage:
kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation

sysctl:
  /proc/sys/vm/mem_profiling

Runtime info:
  /proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)  && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:

                        kmalloc                 pgalloc
(1 baseline)            6.764s                  16.902s
(2 default disabled)    6.793s  (+0.43%)        17.007s (+0.62%)
(3 default enabled)     7.197s  (+6.40%)        23.666s (+40.02%)
(4 runtime enabled)     7.405s  (+9.48%)        23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
(6 def disabled+memcg)  13.332s (+97.10%)       48.105s (+184.61%)
(7 def enabled+memcg)   13.446s (+98.78%)       54.963s (+225.18%)

Memory overhead:
Kernel size:

     text          data        bss         dec         diff
(1)  26515311      18890222    17018880    62424413
(2)  26524728      19423818    16740352    62688898    264485
(3)  26524724      19423818    16740352    62688894    264481
(4)  26524728      19423818    16740352    62688898    264485
(5)  26541782      18964374    16957440    62463596    39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags:           192 kB
PageExts:         262144 kB (256MB)
SlabExts:           9876 kB (9.6MB)
PcpuExts:            512 kB (0.5MB)

Total overhead is 0.2% of total memory.

Benchmarks:

Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
      baseline       disabled profiling           enabled profiling
avg   0.3543         0.3559 (+0.0016)             0.3566 (+0.0023)
stdev 0.0137         0.0188                       0.0077


hackbench -l 10000
      baseline       disabled profiling           enabled profiling
avg   6.4218         6.4306 (+0.0088)             6.5077 (+0.0859)
stdev 0.0933         0.0286                       0.0489

stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/

[2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/


This patch (of 37):

The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.

[kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c]
  Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev
[surenb@google.com: fix arch/x86/mm/numa_32.c]
  Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com
[kent.overstreet@linux.dev: a few places were depending on sizes.h]
  Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev
[arnd@arndb.de: fix mm/kasan/hw_tags.c]
  Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org
[surenb@google.com: fix arc build]
  Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:55:49 -07:00
Huacai Chen
d3119bc985 LoongArch: Fix callchain parse error with kernel tracepoint events
In order to fix perf's callchain parse error for LoongArch, we implement
perf_arch_fetch_caller_regs() which fills several necessary registers
used for callchain unwinding, including sp, fp, and era. This is similar
to the following commits.

commit b3eac0265b:
("arm: perf: Fix callchain parse error with kernel tracepoint events")

commit 5b09a094f2:
("arm64: perf: Fix callchain parse error with kernel tracepoint events")

commit 9a7e8ec0d4:
("riscv: perf: Fix callchain parse error with kernel tracepoint events")

Test with commands:

 perf record -e sched:sched_switch -g --call-graph dwarf
 perf report

Without this patch:

 Children      Self  Command        Shared Object      Symbol
 ........  ........  .............  .................  ....................

 43.41%    43.41%  swapper          [unknown]          [k] 0000000000000000

 10.94%    10.94%  loong-container  [unknown]          [k] 0000000000000000
         |
         |--5.98%--0x12006ba38
         |
         |--2.56%--0x12006bb84
         |
          --2.40%--0x12006b6b8

With this patch, callchain can be parsed correctly:

 Children      Self  Command        Shared Object      Symbol
 ........  ........  .............  .................  ....................

 47.57%    47.57%  swapper          [kernel.vmlinux]   [k] __schedule
         |
         ---__schedule

 26.76%    26.76%  loong-container  [kernel.vmlinux]   [k] __schedule
         |
         |--13.78%--0x12006ba38
         |          |
         |          |--9.19%--__schedule
         |          |
         |           --4.59%--handle_syscall
         |                     do_syscall
         |                     sys_futex
         |                     do_futex
         |                     futex_wait
         |                     futex_wait_queue_me
         |                     hrtimer_start_range_ns
         |                     __schedule
         |
         |--8.38%--0x12006bb84
         |          handle_syscall
         |          do_syscall
         |          sys_epoll_pwait
         |          do_epoll_wait
         |          schedule_hrtimeout_range_clock
         |          hrtimer_start_range_ns
         |          __schedule
         |
          --4.59%--0x12006b6b8
                    handle_syscall
                    do_syscall
                    sys_nanosleep
                    hrtimer_nanosleep
                    do_nanosleep
                    hrtimer_start_range_ns
                    __schedule

Cc: stable@vger.kernel.org
Fixes: b37042b2bb ("LoongArch: Add perf events support")
Reported-by: Youling Tang <tangyouling@kylinos.cn>
Suggested-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24 12:36:07 +08:00
David Hildenbrand
7ab22b5c2a LoongArch: Fix a build error due to __tlb_remove_tlb_entry()
With LLVM=1 and W=1 we get:

  ./include/asm-generic/tlb.h:629:10: error: parameter 'ptep' set
  but not used [-Werror,-Wunused-but-set-parameter]

We fixed a similar issue via Arnd in the introducing commit, but missed
the LoongArch variant. It turns out there is no need for LoongArch to
have a custom variant, so let's just drop it and rely on the asm-generic
one.

Fixes: 4d5bf0b618 ("mm/mmu_gather: add tlb_remove_tlb_entries()")
Closes: https://lkml.kernel.org/r/CANiq72mQh3O9S4umbvrKBgMMorty48UMwS01U22FR0mRyd3cyQ@mail.gmail.com
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24 12:36:07 +08:00
Baoquan He
697f334247 LoongArch: Fix Kconfig item and left code related to CRASH_CORE
In commit 85fcde402d ("kexec: split crashkernel reservation code
out from crash_core.c"), the crashkernel reservation code was split out
from crash_core.c and CRASH_RESERVE was added to control it. Each ARCH's
<asm/crash_core.h> was also renamed to <asm/crash_reserve.h> accordingly.

But the relevant part for LoongArch was missed. Do it now.

Fixes: 85fcde402d ("kexec: split crashkernel reservation code out from crash_core.c")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-24 12:36:07 +08:00
Paolo Bonzini
f3b65bbaed KVM: delete .change_pte MMU notifier callback
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 13:18:27 -04:00
Randy Dunlap
a07c772fa6 LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
LoongArch's include/asm/addrspace.h uses SZ_32M and SZ_16K, so add
<linux/sizes.h> to provide those macros to prevent build errors:

In file included from ../arch/loongarch/include/asm/io.h:11,
                 from ../include/linux/io.h:13,
                 from ../include/linux/io-64-nonatomic-lo-hi.h:5,
                 from ../drivers/cxl/pci.c:4:
../include/asm-generic/io.h: In function 'ioport_map':
../arch/loongarch/include/asm/addrspace.h:124:25: error: 'SZ_32M' undeclared (first use in this function); did you mean 'PS_32M'?
  124 | #define PCI_IOSIZE      SZ_32M

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10 21:08:51 +08:00
Huacai Chen
0ca84aeaee LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
KFENCE changes virt_to_page() to be able to translate TLB-mapped virtual
addresses, but forgets to change virt_to_phys()/phys_to_virt() and other
translation functions as well. This patch fixes that; otherwise some
drivers (such as nvme and virtio-blk) cannot work with KFENCE.

All {virt, phys, page, pfn} translation functions are updated:
1, virt_to_pfn()/pfn_to_virt();
2, virt_to_page()/page_to_virt();
3, virt_to_phys()/phys_to_virt().

DMW/TLB mapped addresses are distinguished by comparing the virtual
address with vm_map_base in virt_to_xyz(), and we define WANT_PAGE_VIRTUAL
in the KFENCE case for the reverse translations, xyz_to_virt().
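
A hedged sketch of the dispatch described above; vm_map_base follows the
commit text, while the helper names are illustrative:

  extern unsigned long vm_map_base;
  unsigned long dmw_virt_to_phys(unsigned long va);   /* illustrative */
  unsigned long tlb_virt_to_phys(unsigned long va);   /* illustrative */

  static inline unsigned long virt_to_phys_sketch(const void *x)
  {
          unsigned long va = (unsigned long)x;

          if (va < vm_map_base)              /* DMW: fixed-offset window     */
                  return dmw_virt_to_phys(va);

          return tlb_virt_to_phys(va);       /* TLB-mapped, e.g. KFENCE pool */
  }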

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-04-10 21:08:51 +08:00
Linus Torvalds
1e3cd03c54 LoongArch changes for v6.9
1, Add objtool support for LoongArch;
 2, Add ORC stack unwinder support for LoongArch;
 3, Add kernel livepatching support for LoongArch;
 4, Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig;
 5, Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig;
 6, Some bug fixes and other small changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmX69KsWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImerjFD/9rAzm1+G4VxFvFzlOiJXEqquNJ
 +Vz2fAZLU3lJhBlx0uUKXFVijvLe8s/DnoLrM9e/M6gk4ivT9eszy3DnqT3NjGDX
 njYFkPUWZhZGACmbkoVk9St80R8sPIdZrwXtW3q7g3T0bC7LXUXrJw52Sh4gmbYx
 RqLsE6GoEWGY0zhhWqeeAM9LkKDuLxxyjH4fYE4g623EhQt7A7hP5okyaC+xHzp+
 qp/4dPFLu61LeqIfeBUKK7nQ6uzno3EWLiME2eHEHiuelYfzmh+BtNMcX9Ugb/En
 j0vLGNsoDGmEYw7xGa6OSRaCR/nCwVJz4SvuH33wbbbHhVAiUKUBVNFR3gmAtLlc
 BSa2dDZbKhHkiWSUCM9K2ihr7WiQNuraTK1kKHwBgfa+RbEVOTu1q8yokAB9XCaT
 T7lijJ8MKQmzHpMvgev7nN41baDB6V5bPIni0Ueh+NhQJKZ2/IxtYA3XzV5D0UgL
 TBovVgYB/VNThS9gzOrlenKuDX9hT+kCQgyudErXaoIo645P6dsPFowOZRQxCEIv
 WnLskZatLTCA8xWl1XyC1bqtGxhp34Gbhg0ZcvUqlNE20luaK/qi8wtW9Mv1Utp+
 aXFO3i7d93z99oAcUT0oc1N83T0x0M/p69Z+rL/2+L0sYQgBf1cwUEiDNRW4OCdI
 h15289rRTxjeL7NZPw==
 =lSkY
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Add objtool support for LoongArch

 - Add ORC stack unwinder support for LoongArch

 - Add kernel livepatching support for LoongArch

 - Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig

 - Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig

 - Some bug fixes and other small changes

* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch/crypto: Clean up useless assignment operations
  LoongArch: Define the __io_aw() hook as mmiowb()
  LoongArch: Remove superfluous flush_dcache_page() definition
  LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
  LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
  LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
  LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
  LoongArch: Add kernel livepatching support
  LoongArch: Add ORC stack unwinder support
  objtool: Check local label in read_unwind_hints()
  objtool: Check local label in add_dead_ends()
  objtool/LoongArch: Enable orc to be built
  objtool/x86: Separate arch-specific and generic parts
  objtool/LoongArch: Implement instruction decoder
  objtool/LoongArch: Enable objtool to be built
2024-03-22 10:22:45 -07:00
Huacai Chen
9c68ece8b2 LoongArch: Define the __io_aw() hook as mmiowb()
Commit fb24ea52f7 ("drivers: Remove explicit invocations of
mmiowb()") removed all mmiowb() calls in drivers, but it says:

"NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation."

The MMIO in radeon_ring_commit() is protected by a mutex rather than a
spinlock, but in the mutex fastpath it behaves similarly to a spinlock.
We could add mmiowb() calls in the radeon driver, but the maintainer says
he doesn't like such a workaround, and radeon is not the only example of
mutex-protected MMIO.

So we would have to extend the mmiowb tracking system from spinlocks to
mutexes, and maybe other locking primitives. This is not easy and is
error prone, so we solve it in the architecture code, by simply defining
the __io_aw() hook as mmiowb(). We then no longer need to override
queued_spin_unlock(), so use the generic definition.
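
Concretely, the hook amounts to the following one-liner (as stated above):

  /* asm/io.h: order the preceding MMIO writes before the upcoming unlock. */
  #define __io_aw()   mmiowb()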

Without this, we get errors like the following when running 'glxgears'
on weakly-ordered architectures such as LoongArch:

radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)

Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
82bf60a6fe LoongArch: Remove superfluous flush_dcache_page() definition
LoongArch doesn't have cache aliases, so flush_dcache_page() is a no-op.
There is a generic implementation for this case in include/asm-generic/
cacheflush.h. So remove the superfluous flush_dcache_page() definition,
which also silences build warnings such as:

   In file included from crypto/scompress.c:12:
   include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone':
   include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable]
      76 |                 struct page *page;
         |                              ^~~~
   crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
>> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable]
     174 |                         struct page *dst_page = sg_page(req->dst);
         |

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/
Suggested-by: Barry Song <baohua@kernel.org>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Max Kellermann
d42ab9af60 LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
These two functions are implemented in pgtable.c, and they are needed
only by the virt_to_page() macro in page.h. Having the prototypes in
pgtable.h causes a circular dependency between page.h and pgtable.h,
because the virt_to_page() macro in page.h needs pgtable.h for these
two functions, while pgtable.h needs various definitions from page.h
(e.g. pte_t and pgd_t).

Let's avoid this circular dependency by moving the function prototypes
to page.h.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Huacai Chen
c87e12e0e8 LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
Since GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
global regs in asms"), global registers are no longer forcibly added to
the def-use chain. As a result, current_thread_info(),
current_stack_pointer and __my_cpu_offset may be lifted out of loops
because they are no longer treated as "volatile variables".

This optimization is still correct for the current_thread_info() and
current_stack_pointer usages because they are associated with a thread.
However it is wrong for __my_cpu_offset because it is associated with a
CPU rather than a thread: if the thread migrates to a different CPU
within the loop, __my_cpu_offset should change.

Change the __my_cpu_offset definition so that it is treated as a
"volatile variable", in order to avoid such mis-optimization.
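
A hedged sketch of what "treat it as a volatile variable" can look like:
an empty asm taking the register as an input/output operand forces the
compiler to re-read it on every use (the exact kernel macro may differ):

  register unsigned long __my_cpu_offset __asm__("$r21");

  #define __my_cpu_offset                                        \
  ({                                                             \
          __asm__ __volatile__("" : "+r" (__my_cpu_offset));     \
          __my_cpu_offset;                                       \
  })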

Cc: stable@vger.kernel.org
Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-19 15:50:34 +08:00
Linus Torvalds
4f712ee0cb S390:
* Changes to FPU handling came in via the main s390 pull request
 
 * Only deliver to the guest the SCLP events that userspace has
   requested.
 
 * More virtual vs physical address fixes (only a cleanup since
   virtual and physical address spaces are currently the same).
 
 * Fix selftests undefined behavior.
 
 x86:
 
 * Fix a restriction that the guest can't program a PMU event whose
   encoding matches an architectural event that isn't included in the
   guest CPUID.  The enumeration of an architectural event only says
   that if a CPU supports an architectural event, then the event can be
   programmed *using the architectural encoding*.  The enumeration does
   NOT say anything about the encoding when the CPU doesn't report support
   for the event *in general*.  It might support it, and it might support it
   using the same encoding that made it into the architectural PMU spec.
 
 * Fix a variety of bugs in KVM's emulation of RDPMC (more details on
   individual commits) and add a selftest to verify KVM correctly emulates
   RDMPC, counter availability, and a variety of other PMC-related
   behaviors that depend on guest CPUID and therefore are easier to
   validate with selftests than with custom guests (aka kvm-unit-tests).
 
 * Zero out PMU state on AMD if the virtual PMU is disabled, it does not
   cause any bug but it wastes time in various cases where KVM would check
   if a PMC event needs to be synthesized.
 
 * Optimize triggering of emulated events, with a nice ~10% performance
   improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
   guest.
 
 * Tighten the check for "PMI in guest" to reduce false positives if an NMI
   arrives in the host while KVM is handling an IRQ VM-Exit.
 
 * Fix a bug where KVM would report stale/bogus exit qualification information
   when exiting to userspace with an internal error exit code.
 
 * Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
 
 * Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
   read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
 
 * Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB).  KVM
   doesn't support yielding in the middle of processing a zap, and 1GiB
   granularity resulted in multi-millisecond lags that are quite impolite
   for CONFIG_PREEMPT kernels.
 
 * Allocate write-tracking metadata on-demand to avoid the memory overhead when
   a kernel is built with i915 virtualization support but the workloads use
   neither shadow paging nor i915 virtualization.
 
 * Explicitly initialize a variety of on-stack variables in the emulator that
   triggered KMSAN false positives.
 
 * Fix the debugregs ABI for 32-bit KVM.
 
 * Rework the "force immediate exit" code so that vendor code ultimately decides
   how and when to force the exit, which allowed some optimization for both
   Intel and AMD.
 
 * Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
   vCPU creation ultimately failed, causing extra unnecessary work.
 
 * Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
 
 * Harden against underflowing the active mmu_notifier invalidation
   count, so that "bad" invalidations (usually due to bugs elsewhere in the
   kernel) are detected earlier and are less likely to hang the kernel.
 
 x86 Xen emulation:
 
 * Overlay pages can now be cached based on host virtual address,
   instead of guest physical addresses.  This removes the need to
   reconfigure and invalidate the cache if the guest changes the
   gpa but the underlying host virtual address remains the same.
 
 * When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.
 
 * Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).
 
 * Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.
 
 RISC-V:
 
 * Support exception and interrupt handling in selftests
 
 * New self test for RISC-V architectural timer (Sstc extension)
 
 * New extension support (Ztso, Zacas)
 
 * Support userspace emulation of random number seed CSRs.
 
 ARM:
 
 * Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers
 
 * Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it
 
 * Conversion of KVM's representation of LPIs to an xarray, utilized to
   address some of the serialization on the LPI injection path
 
 * Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register
 
 * Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
 
 LoongArch:
 
 * Set reserved bits as zero in CPUCFG.
 
 * Start SW timer only when vcpu is blocking.
 
 * Do not restart SW timer when it is expired.
 
 * Remove unnecessary CSR register saving during enter guest.
 
 * Misc cleanups and fixes as usual.
 
 Generic:
 
 * Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always
   true on all architectures except MIPS (where Kconfig determines
   availability depending on CPU capabilities).  It is replaced by an
   architecture-dependent symbol for MIPS, and by IS_ENABLED(CONFIG_KVM)
   everywhere else.
 
 * Factor common "select" statements in common code instead of requiring
   each architecture to specify it
 
 * Remove thoroughly obsolete APIs from the uapi headers.
 
 * Move architecture-dependent stuff to uapi/asm/kvm.h
 
 * Always flush the async page fault workqueue when a work item is being
   removed, especially during vCPU destruction, to ensure that there are no
   workers running in KVM code when all references to KVM-the-module are gone,
   i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
 
 * Grab a reference to the VM's mm_struct in the async #PF worker itself instead
   of gifting the worker a reference, so that there's no need to remember
   to *conditionally* clean up after the worker.
 
 Selftests:
 
 * Reduce boilerplate especially when utilize selftest TAP infrastructure.
 
 * Add basic smoke tests for SEV and SEV-ES, along with a pile of library
   support for handling private/encrypted/protected memory.
 
 * Fix benign bugs where tests neglect to close() guest_memfd files.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
 eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
 6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
 j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
 Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
 5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
 =mqOV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction that the guest can't program a PMU event whose
     encoding matches an architectural event that isn't included in the
     guest CPUID. The enumeration of an architectural event only says
     that if a CPU supports an architectural event, then the event can
     be programmed *using the architectural encoding*. The enumeration
     does NOT say anything about the encoding when the CPU doesn't
     report support for the event *in general*. It might support it, and it
     might support it using the same encoding that made it into the
     architectural PMU spec

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDMPC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled, it does
     not cause any bug but it wastes time in various cases where KVM
     would check if a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Cleanup the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsewhere
     in the kernel) are detected earlier and are less likely to hang the
     kernel

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address,
     instead of guest physical addresses. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New self test for RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address some of the serialization on the LPI injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits as zero in CPUCFG

   - Start SW timer only when vcpu is blocking

   - Do not restart SW timer when it is expired

   - Remove unnecessary CSR register saving during enter guest

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines availability depending on CPU capabilities). It is
     replaced by an architecture-dependent symbol for MIPS, and by
     IS_ENABLED(CONFIG_KVM) everywhere else

   - Factor common "select" statements in common code instead of
     requiring each architecture to specify it

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker

  Selftests:

   - Reduce boilerplate especially when utilize selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...
2024-03-15 13:03:13 -07:00
Jinyang He
199cc14cb4 LoongArch: Add kernel livepatching support
The arch-specific function ftrace_regs_set_instruction_pointer() has
already been implemented in arch/loongarch/include/asm/ftrace.h, so here
we only implement the arch_stack_walk_reliable() function.

Here are the test logs:

[root@linux fedora]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.8.0-rc2 root=/dev/sda3

[root@linux fedora]# modprobe livepatch-sample
[root@linux fedora]# cat /proc/cmdline
this has been live patched

[root@linux fedora]# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
[root@linux fedora]# rmmod livepatch_sample
[root@linux fedora]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.8.0-rc2 root=/dev/sda3

[root@linux fedora]# dmesg -t | tail -5
livepatch: enabling patch 'livepatch_sample'
livepatch: 'livepatch_sample': starting patching transition
livepatch: 'livepatch_sample': patching complete
livepatch: 'livepatch_sample': starting unpatching transition
livepatch: 'livepatch_sample': unpatching complete

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11 22:23:47 +08:00
Tiezhu Yang
cb8a2ef084 LoongArch: Add ORC stack unwinder support
The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the format
of the ORC data is much simpler than DWARF, which in turn allows the ORC
unwinder to be much simpler and faster.

The ORC data consists of unwind tables which are generated by objtool.
After analyzing all the code paths of a .o file, it determines information
about the stack state at each instruction address in the file and outputs
that information to the .orc_unwind and .orc_unwind_ip sections.

The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time. The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.

Most of the logic is similar to x86. In order to get the ra info before
ra is saved onto the stack, add ra_reg and ra_offset to orc_entry. At the
same time, modify some arch-specific code to silence the objtool
warnings.
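
A hedged sketch of the extension; the surrounding fields only loosely
mirror the x86 orc_entry layout, and the exact LoongArch definition may
differ:

  struct orc_entry_sketch {
          short        sp_offset;      /* how to find the previous sp     */
          short        fp_offset;
          short        ra_offset;      /* new: where to recover ra        */
          unsigned int sp_reg : 4;
          unsigned int fp_reg : 4;
          unsigned int ra_reg : 4;     /* new: register still holding ra  */
          unsigned int type   : 3;
          unsigned int signal : 1;
  } __attribute__((packed));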

Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-03-11 22:23:47 +08:00
Paolo Bonzini
7d8942d8e7 KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
    avoid creating ABI that KVM can't sanely support.
 
  - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
    clear that such VMs are purely a development and testing vehicle, and
    come with zero guarantees.
 
  - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
    is to support confidential VMs with deterministic private memory (SNP
    and TDX) only in the TDP MMU.
 
  - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
    when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmXZB/8ACgkQOlYIJqCj
 N/3XlQ//RIsvqr38k7kELSKhCMyWgF4J57itABrHpMqAZu3gaAo5sETX8AGcHEe5
 mxmquxyNQSf4cthhWy1kzxjGCy6+fk+Z0Z7wzfz0Yd5D+FI6vpo3HhkjovLb2gpt
 kSrHuhJyuj2vkftNvdaz0nHX1QalVyIEnXnR3oqTmxUUsg6lp1x/zr5SP0KBXjo8
 ZzJtyFd0fkRXWpA792T7XPRBWrzPV31HYZBLX8sPlYmJATcbIx9rYSThgCN6XuVN
 bfE6wATsC+mwv5BpCoDFpCKmFcqSqamag9NGe5qE5mOby5DQGYTCRMCQB8YXXBR0
 97ppaY9ZJV4nOVjrYJn6IMOSMVNfoG7nTRFfcd0eFP4tlPEgHwGr5BGDaBtQPkrd
 KcgWJw8nS02eCA2iOE+FtCXvGJwKhTTjQ45w7rU4EcfUk603L5J4GO1ddmjMhPcP
 upGGcWDK9vCGrSUFTm8pyWp/NKRJPvAQEiQd/BweSk9+isQHTX2RYCQgPAQnwlTS
 wTg7ZPNSLoUkRYmd6r+TUT32ELJGNc8GLftMnxIwweq6V7AgNMi0HE60eMovuBNO
 7DAWWzfBEZmJv+0mNNZPGXczHVv4YvMWysRdKkhztBc3+sO7P3AL1zWIDlm5qwoG
 LpFeeI3qo3o5ZNaqGzkSop2pUUGNGpWCH46WmP0AG7RpzW/Natw=
 =M0td
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD

KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support.

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09 11:48:35 -05:00
Arnd Bergmann
ba89f9c8cc arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
These four architectures define the same Kconfig symbols for configuring
the page size. Move the logic into a common place where it can be shared
with all other architectures.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-06 19:28:26 +01:00
Linus Torvalds
4356e9f841 work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

But the problem Sean sees may be a third thing, since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

The same old workaround works for this case too, even if it feels a bit
like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 15:57:48 -08:00
Paolo Bonzini
8886640dad kvm: replace __KVM_HAVE_READONLY_MEM with Kconfig symbol
KVM uses __KVM_HAVE_* symbols in the architecture-dependent uapi/asm/kvm.h to mask
unused definitions in include/uapi/linux/kvm.h.  __KVM_HAVE_READONLY_MEM however
was nothing but a misguided attempt to define KVM_CAP_READONLY_MEM only on
architectures where KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM) could possibly
return nonzero.  This however does not make sense, and it prevented userspace
from supporting this architecture-independent feature without recompilation.

Therefore, these days __KVM_HAVE_READONLY_MEM does not mask anything and
is only used in virt/kvm/kvm_main.c.  Userspace does not need to test it
and there should be no need for it to exist.  Remove it and replace it
with a Kconfig symbol within Linux source code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:06 -05:00
Huacai Chen
4551b30525 LoongArch: Change acpi_core_pic[NR_CPUS] to acpi_core_pic[MAX_CORE_PIC]
With the default config, the value of NR_CPUS is 64. When the HW
platform has more than 64 cpus, the system will crash on these platforms.
MAX_CORE_PIC is the maximum cpu number in the MADT table (max physical
number), which can exceed the supported maximum cpu number (NR_CPUS, max
logical number), but the kernel should not crash. The kernel should boot
up to NR_CPUS cpus and let the remaining cpus stay in the BIOS.

The potential crash reason is that the array acpi_core_pic[NR_CPUS] can
be overflowed when parsing the MADT table, and it is obvious that
CORE_PIC should correspond to a physical core rather than a logical core,
so it is better to define the array as acpi_core_pic[MAX_CORE_PIC].
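
A sketch of the resulting declaration (the value 256 is an assumption for
illustration; struct acpi_madt_core_pic comes from the ACPI headers):

  #define MAX_CORE_PIC 256    /* physical-id space of MADT CORE_PIC entries */

  /* was: acpi_core_pic[NR_CPUS] */
  extern struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];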

With this patch, the system can boot 64 vcpus with the qemu parameter
-smp 128; without it, the system crashes with the following message:

[    0.000000] CPU 0 Unable to handle kernel paging request at virtual address 0000420000004259, era == 90000000037a5f0c, ra == 90000000037a46ec
[    0.000000] Oops[#1]:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-rc2+ #192
[    0.000000] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
[    0.000000] pc 90000000037a5f0c ra 90000000037a46ec tp 9000000003c90000 sp 9000000003c93d60
[    0.000000] a0 0000000000000019 a1 9000000003d93bc0 a2 0000000000000000 a3 9000000003c93bd8
[    0.000000] a4 9000000003c93a74 a5 9000000083c93a67 a6 9000000003c938f0 a7 0000000000000005
[    0.000000] t0 0000420000004201 t1 0000000000000000 t2 0000000000000001 t3 0000000000000001
[    0.000000] t4 0000000000000003 t5 0000000000000000 t6 0000000000000030 t7 0000000000000063
[    0.000000] t8 0000000000000014 u0 ffffffffffffffff s9 0000000000000000 s0 9000000003caee98
[    0.000000] s1 90000000041b0480 s2 9000000003c93da0 s3 9000000003c93d98 s4 9000000003c93d90
[    0.000000] s5 9000000003caa000 s6 000000000a7fd000 s7 000000000f556b60 s8 000000000e0a4330
[    0.000000]    ra: 90000000037a46ec platform_init+0x214/0x250
[    0.000000]   ERA: 90000000037a5f0c efi_runtime_init+0x30/0x94
[    0.000000]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[    0.000000]  PRMD: 00000000 (PPLV0 -PIE -PWE)
[    0.000000]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[    0.000000]  ECFG: 00070800 (LIE=11 VS=7)
[    0.000000] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
[    0.000000]  BADV: 0000420000004259
[    0.000000]  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
[    0.000000] Modules linked in:
[    0.000000] Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____))
[    0.000000] Stack : 9000000003c93a14 9000000003800898 90000000041844f8 90000000037a46ec
[    0.000000]         000000000a7fd000 0000000008290000 0000000000000000 0000000000000000
[    0.000000]         0000000000000000 0000000000000000 00000000019d8000 000000000f556b60
[    0.000000]         000000000a7fd000 000000000f556b08 9000000003ca7700 9000000003800000
[    0.000000]         9000000003c93e50 9000000003800898 9000000003800108 90000000037a484c
[    0.000000]         000000000e0a4330 000000000f556b60 000000000a7fd000 000000000f556b08
[    0.000000]         9000000003ca7700 9000000004184000 0000000000200000 000000000e02b018
[    0.000000]         000000000a7fd000 90000000037a0790 9000000003800108 0000000000000000
[    0.000000]         0000000000000000 000000000e0a4330 000000000f556b60 000000000a7fd000
[    0.000000]         000000000f556b08 000000000eaae298 000000000eaa5040 0000000000200000
[    0.000000]         ...
[    0.000000] Call Trace:
[    0.000000] [<90000000037a5f0c>] efi_runtime_init+0x30/0x94
[    0.000000] [<90000000037a46ec>] platform_init+0x214/0x250
[    0.000000] [<90000000037a484c>] setup_arch+0x124/0x45c
[    0.000000] [<90000000037a0790>] start_kernel+0x90/0x670
[    0.000000] [<900000000378b0d8>] kernel_entry+0xd8/0xdc

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-02-06 12:32:05 +08:00
Randy Dunlap
48ef9e87b4 LoongArch: KVM: Add returns to SIMD stubs
The stubs for kvm_own_lsx()/kvm_own_lasx() when CONFIG_CPU_HAS_LSX or
CONFIG_CPU_HAS_LASX is not defined should have a return value since they
return an int, so add "return -EINVAL;" to the stubs.
This fixes the build error:

In file included from ../arch/loongarch/include/asm/kvm_csr.h:12,
                 from ../arch/loongarch/kvm/interrupt.c:8:
../arch/loongarch/include/asm/kvm_vcpu.h: In function 'kvm_own_lasx':
../arch/loongarch/include/asm/kvm_vcpu.h:73:39: error: no return statement in function returning non-void [-Werror=return-type]
   73 | static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { }
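
A sketch of the fixed stubs (the #ifndef guards are inferred from the
description of when the stubs are used):

  #ifndef CONFIG_CPU_HAS_LSX
  static inline int kvm_own_lsx(struct kvm_vcpu *vcpu) { return -EINVAL; }
  #endif

  #ifndef CONFIG_CPU_HAS_LASX
  static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { return -EINVAL; }
  #endif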

Fixes: db1ecca22e ("LoongArch: KVM: Add LSX (128bit SIMD) support")
Fixes: 118e10cd89 ("LoongArch: KVM: Add LASX (256bit SIMD) support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-26 16:22:07 +08:00
Linus Torvalds
24fdd51899 LoongArch changes for v6.8
1, Raise minimum clang version to 18.0.0;
 2, Enable initial Rust support for LoongArch;
 3, Add built-in dtb support for LoongArch;
 4, Use generic interface to support crashkernel=X,[high,low];
 5, Some bug fixes and other small changes;
 6, Update the default config file.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmWnW9cWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImel3CD/0Wnd2VOhoPubJkCXd+v7SdPDFB
 +BlkevAdmKQXkxNVXHRwfirsEBnUdQTfSN/5hMd69ZWUTayYq3WFxOcaPs27AAyn
 cXmGAzxfCjanSj+zxK8Gcmef5kppx3PRSbFdnWgc42Povu0xTOH3M31HXx5WXGtv
 hZK439DspNGHlF1Bsbs3J8xbS76jc/HDZAqnIjLuefQUaWM8nhsYxJIwVeGKUX1T
 IyEgBwhHhsY9ho/86yk8VXgordAN4dnMVmAHbR63HqjLo/8sck4IiPNxWKFCHex8
 vgxp0zGxfBBts284EfSofDQHrSrrWl4+e2fW2QJ81BBDSS0wPCs4TAnzH+x9X7Wb
 MJuh8WIJqhfXdPFxs5fdnUeykEm1V/oWFfkWORk4jbQkpY9aZbk/iv6uxsmRhmhv
 2WPWvjF+7B2zSXtMcjgm71ymb/nU95W2FZO02GlwTnbGJRKA2xLkjn9rCXoHWjd3
 IlxgIgZJ1vkPvFPS/sbekaTUEG+6/qTPGGa2Ol3Q5ZTTLk9serfDa8ay1xCZeOny
 +fRBgLsuQAOGO2pvxfXjs+uvboZNUHeKrAi7XeR61GcbNpQDkjuwNJXQMiMQ+f66
 jWM6H+hV+6sQ/W43KVrGCyBqTX4J9PSN/gX/Cq0PL74Yheop6neYXZTl5uDNYDe9
 WYxiS9j/FoYgj8lxYQ==
 =GzFR
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Raise minimum clang version to 18.0.0

 - Enable initial Rust support for LoongArch

 - Add built-in dtb support for LoongArch

 - Use generic interface to support crashkernel=X,[high,low]

 - Some bug fixes and other small changes

 - Update the default config file.

* tag 'loongarch-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits)
  MAINTAINERS: Add BPF JIT for LOONGARCH entry
  LoongArch: Update Loongson-3 default config file
  LoongArch: BPF: Prevent out-of-bounds memory access
  LoongArch: BPF: Support 64-bit pointers to kfuncs
  LoongArch: Fix definition of ftrace_regs_set_instruction_pointer()
  LoongArch: Use generic interface to support crashkernel=X,[high,low]
  LoongArch: Fix and simplify fcsr initialization on execve()
  LoongArch: Let cores_io_master cover the largest NR_CPUS
  LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE
  LoongArch: Add a missing call to efi_esrt_init()
  LoongArch: Parsing CPU-related information from DTS
  LoongArch: dts: DeviceTree for Loongson-2K2000
  LoongArch: dts: DeviceTree for Loongson-2K1000
  LoongArch: dts: DeviceTree for Loongson-2K0500
  LoongArch: Allow device trees be built into the kernel
  dt-bindings: interrupt-controller: loongson,liointc: Fix dtbs_check warning for interrupt-names
  dt-bindings: interrupt-controller: loongson,liointc: Fix dtbs_check warning for reg-names
  dt-bindings: loongarch: Add Loongson SoC boards compatibles
  dt-bindings: loongarch: Add CPU bindings for LoongArch
  LoongArch: Enable initial Rust support
  ...
2024-01-19 13:30:49 -08:00
Linus Torvalds
09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fails to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertently enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fails to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertently enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Tiezhu Yang
91af17cd7d LoongArch: Fix definition of ftrace_regs_set_instruction_pointer()
The current definition of ftrace_regs_set_instruction_pointer() is not
correct. Obviously, this function is used to set the instruction pointer,
not the return value, so it should call instruction_pointer_set() instead
of regs_set_return_value().
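
A sketch of the corrected helper (assuming ftrace_regs wraps pt_regs as on
the other DYNAMIC_FTRACE_WITH_ARGS architectures):

  static __always_inline void
  ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
                                      unsigned long ip)
  {
          instruction_pointer_set(&fregs->regs, ip);
  }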

There is no side effect for now because it is only used for kernel live-
patching, which is not yet supported, so fix it to avoid failures when
testing livepatch in the future.

Fixes: 6fbff14a63 ("LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-17 12:43:08 +08:00
Youling Tang
78de91b458 LoongArch: Use generic interface to support crashkernel=X,[high,low]
LoongArch already supports two crashkernel regions in kexec-tools, so we
can directly use the common interface to support crashkernel=X,[high,low]
after commit 0ab97169aa ("crash_core: add generic function to do
reservation").

With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:

1) Add a new header file <asm/crash_core.h>, then define CRASH_ALIGN,
   CRASH_ADDR_LOW_MAX and CRASH_ADDR_HIGH_MAX in <asm/crash_core.h>;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
   reserve_crashkernel_generic();

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
   arch/loongarch/Kconfig.
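
As a sketch of step 2), an arch_reserve_crashkernel() wired to the generic
helpers typically looks like the following (modeled on the architectures
that already use the generic interface, not the literal LoongArch diff):

  static void __init arch_reserve_crashkernel(void)
  {
          unsigned long long crash_size, crash_base;
          unsigned long long low_size = 0;
          char *cmdline = boot_command_line;
          bool high = false;
          int ret;

          if (!IS_ENABLED(CONFIG_KEXEC_CORE))
                  return;

          /* parses crashkernel=size[@offset] and crashkernel=X,[high,low] */
          ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
                                  &crash_size, &crash_base, &low_size, &high);
          if (ret)
                  return;

          reserve_crashkernel_generic(cmdline, crash_size, crash_base,
                                      low_size, high);
  }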

One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low". Besides, there are few rules need to
take notice:

1) "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is
   specified.
2) "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed
   and there is enough memory to be allocated under 4G.
3) When allocating crashkernel above 4G and no "crashkernel=X,low" is
   specified, 128M of low memory will be allocated automatically for the
   swiotlb bounce buffer.
See Documentation/admin-guide/kernel-parameters.txt for more information.

Following test cases have been performed as expected:
1) crashkernel=256M                          //low=256M
2) crashkernel=1G                            //low=1G
3) crashkernel=4G                            //high=4G, low=128M(default)
4) crashkernel=4G crashkernel=256M,high      //high=4G, low=128M(default), high is ignored
5) crashkernel=4G crashkernel=256M,low       //high=4G, low=128M(default), low is ignored
6) crashkernel=4G,high                       //high=4G, low=128M(default)
7) crashkernel=256M,low                      //low=0M, invalid
8) crashkernel=4G,high crashkernel=256M,low  //high=4G, low=256M
9) crashkernel=4G,high crashkernel=4G,low    //high=0M, low=0M, invalid
10) crashkernel=512M@2560M                   //low=512M
11) crashkernel=1G,high crashkernel=0M,low   //high=1G, low=0M

Recommended usage in general:
1) In the case of small memory: crashkernel=512M
2) In the case of large memory: crashkernel=1024M,high crashkernel=128M,low

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-17 12:43:08 +08:00
Xi Ruoyao
c239665130 LoongArch: Fix and simplify fcsr initialization on execve()
There has been a lingering bug in LoongArch Linux systems causing some
GCC tests to intermittently fail (see Closes link).  I've made a minimal
reproducer:

    zsh% cat measure.s
    .align 4
    .globl _start
    _start:
        movfcsr2gr  $a0, $fcsr0
        bstrpick.w  $a0, $a0, 16, 16
        beqz        $a0, .ok
        break       0
    .ok:
        li.w        $a7, 93
        syscall     0
    zsh% cc measure.s -o measure -nostdlib
    zsh% echo $((1.0/3))
    0.33333333333333331
    zsh% while ./measure; do ; done

This while loop should not stop as POSIX is clear that execve must set
fenv to the default, where FCSR should be zero.  But in fact it will
just stop after running for a while (normally less than 30 seconds).
Note that "$((1.0/3))" is needed to reproduce this issue because it
raises FE_INVALID and makes fcsr0 non-zero.

The problem is we are currently relying on SET_PERSONALITY2() to reset
current->thread.fpu.fcsr.  But SET_PERSONALITY2() is executed before
start_thread(), which calls lose_fpu(0).  We can see that if kernel
preemption is enabled, we may switch to another thread after
SET_PERSONALITY2() but before lose_fpu(0).  Then a bad thing happens:
during the thread switch the value of the fcsr0 register is stored into
current->thread.fpu.fcsr, making it dirty again.

The issue can be fixed by setting current->thread.fpu.fcsr after
lose_fpu(0), because lose_fpu() clears TIF_USEDFPU and the thread switch
then won't touch current->thread.fpu.fcsr.
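
A minimal sketch of the shape of the fix in start_thread() (not the
literal diff; the real code may use a boot-time FCSR default rather than
the literal 0 shown here):

  void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
  {
          /* ... clear status bits, etc. ... */

          lose_fpu(0);                    /* clears TIF_USEDFPU, so a later thread
                                             switch no longer saves a dirty FCSR */
          current->thread.fpu.fcsr = 0;   /* POSIX default fenv: FCSR is zero */

          regs->csr_era = pc;
          regs->regs[3] = sp;             /* $sp */
  }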

The only other architecture setting FCSR in SET_PERSONALITY2() is MIPS.
I've run a similar test on MIPS with the mainline kernel and it turns out
MIPS is buggy, too.  Anyway, MIPS does this to support different FP
flavors (NaN encodings, etc.) which do not exist on LoongArch.  So for
LoongArch, we can simply remove the current->thread.fpu.fcsr setting
from SET_PERSONALITY2() and do it in start_thread(), after lose_fpu(0).

The while loop, which fails with the mainline kernel, has survived one
hour on LoongArch after this change.

Fixes: 803b0fc5c3 ("LoongArch: Add process management")
Closes: https://github.com/loongson-community/discussions/issues/7
Link: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.camel@xry111.site/
Cc: stable@vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-17 12:43:08 +08:00
Huacai Chen
ce68ff3528 LoongArch: Let cores_io_master cover the largest NR_CPUS
Now loongson_system_configuration::cores_io_master only covers 64 CPUs;
if NR_CPUS > 64 there will be memory corruption. So let cores_io_master
cover the largest NR_CPUS (256).
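
An illustrative sketch of the idea (names other than cores_io_master are
hypothetical): replace the single u64 with a fixed array wide enough for
256 CPUs.

  #define MAX_IO_MASTER_CPUS 256
  u64 cores_io_master[BITS_TO_U64(MAX_IO_MASTER_CPUS)];   /* 4 x u64 = 256 bits */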

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-17 12:43:08 +08:00
Huacai Chen
d23b77953f LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE
LoongArch has hardware page coloring for the L1 cache, so we don't have
cache aliases. But the SFB (Store Fill Buffer) still has aliases, so we
previously defined SHMLBA as SZ_64K. However, there are lots of
applications that use PAGE_SIZE rather than SHMLBA to mmap() file pages
and shared pages. Of course we could fix them one by one, but that is
not easy.

On the other hand, we can simply disable SFB for the 4KB page size to fix
the cache alias problem (there will be a performance decrease, but it is
acceptable), and in the future we will fix SFB in hardware. So we can
safely define SHMLBA as PAGE_SIZE (using the generic shmparam.h) to make
life easier.
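
The generic definition that LoongArch now picks up is simply:

  /* include/asm-generic/shmparam.h */
  #define SHMLBA PAGE_SIZE         /* attach addr a multiple of this */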

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-01-17 12:43:08 +08:00
Linus Torvalds
a7e4c6cf5b EFI updates for v6.8
- Fix a syzbot reported issue in efivarfs where concurrent accesses to
   the file system resulted in list corruption
 
 - Add support for accessing EFI variables via the TEE subsystem (and a
   trusted application in the secure world) instead of via EFI runtime
   firmware running in the OS's execution context
 
 - Avoid linker tricks to discover the image base on LoongArch
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZYVaHQAKCRAwbglWLn0t
 XPm/AQDzX9A6TND00eOLYYWw91kybHnzrVd8GRKOv2EIxGz33AEAgW6nXIJlBRax
 MBq6S/sXdyknuCC3sO7H9FexdD4BzQM=
 =MZUx
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

 - Fix a syzbot reported issue in efivarfs where concurrent accesses to
   the file system resulted in list corruption

 - Add support for accessing EFI variables via the TEE subsystem (and a
   trusted application in the secure world) instead of via EFI runtime
   firmware running in the OS's execution context

 - Avoid linker tricks to discover the image base on LoongArch

* tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: memmap: fix kernel-doc warnings
  efi/loongarch: Directly position the loaded image file
  efivarfs: automatically update super block flag
  efi: Add tee-based EFI variable driver
  efi: Add EFI_ACCESS_DENIED status code
  efi: expose efivar generic ops register function
  efivarfs: Move efivarfs list into superblock s_fs_info
  efivarfs: Free s_fs_info on unmount
  efivarfs: Move efivar availability check into FS context init
  efivarfs: force RO when remounting if SetVariable is not supported
2024-01-09 17:11:27 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintenance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ directory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintenance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against the buffer_head code in the series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY's alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
 jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
 =0NHs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintenance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ directory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintenance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against the buffer_head code in the series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY's alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Kinsey Ho
533c67e635 mm/mglru: add dummy pmd_dirty()
Add dummy pmd_dirty() for architectures that don't provide it.
This is similar to commit 6617da8fb5 ("mm: add dummy pmd_young()
for architectures not having it").
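
The fallback is a one-liner in the generic pgtable header, along these
lines:

  #ifndef pmd_dirty
  static inline int pmd_dirty(pmd_t pmd)
  {
          return 0;
  }
  #endif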

Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Donet Tom <donettom@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
Paolo Bonzini
136292522e LoongArch KVM changes for v6.8
1. Optimization for memslot hugepage checking.
 2. Cleanup and fix some HW/SW timer issues.
 3. Add LSX/LASX (128bit/256bit SIMD) support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmWGu+0WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImesO7D/wOdYP96R+mRzpLBeuTtFxU8e4A
 3n2luxOeP8v1WYtQ9H8M01Wgly+9u6cJ2pgAlv79BQHfmCfC0aWQLmpnCZmk/mYW
 wtQ75ASA3Qg6zOBWEksCkA0LUdPDHfQuaaUXT7RYZ7QtHKSNkkhsw2nMCq6fgrXU
 RnZjGctjuxgYSqQtwzfYO2AjSBAfAq1MjSzCTULJ0KkE8o5Bg0KOoGj8ijC1U+ua
 QWBnqTNzeKmYmqAFfhXoiiFYcuBUq7DEk5RtwDU7SeqqJEV3a8AbbsrWfz+wMemG
 gri95uRxvnhpPZ+6/PrVjIezqexPJmQ9+tjY6mxh/bPRnS5ICFygjV3lt050JUK8
 xIaJEFvl7g88RIz5mnTeM9tU4ibIsCLgA9zj33ps2H7QP5NazUm1dzk1YGAgqPdw
 m5hjwtTFQEujQM6cz1DLfhoi15VDNcYUonJIvGFZMhl7InitDpB3u9sI+AVGIVUG
 yKzBkqGB1L1vbJGnuWmspEqSUo7Z9iYzuVGbOnjc9LKQ/8OpLxj0brymYheA+CKG
 CIdULximQFVEHc2lbE+H+bW4hnrFP4sN9hlTng7KN7ommCIg+FltisM8Nt5NLWID
 9ywLj4Qa0Qrc5vB3FJ8+ksuDe2nD83uVLj247R7B0wxQcYw4ocyW/YU+gayF4EjY
 6azutwllW5ZB+I3hyw==
 =phol
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.8

1. Optimization for memslot hugepage checking.
2. Cleanup and fix some HW/SW timer issues.
3. Add LSX/LASX (128bit/256bit SIMD) support.
2024-01-02 13:16:29 -05:00
Wang Yao
174a0c565c efi/loongarch: Directly position the loaded image file
The use of the 'kernel_offset' variable to position the image file that
has been loaded by UEFI or GRUB is unnecessary, because we can directly
position the loaded image file by using the image_base field of the
efi_loaded_image struct provided by UEFI.

Replace kernel_offset with image_base to position the image file that has
been loaded by UEFI or GRUB.
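
A sketch of the idea (the helper name is hypothetical; the field comes
from the EFI loaded-image protocol):

  /* the firmware records where it actually placed the PE image */
  static unsigned long loaded_kernel_base(efi_loaded_image_t *image)
  {
          return (unsigned long)image->image_base;
  }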

Signed-off-by: Wang Yao <wangyao@lemote.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-19 11:16:37 +01:00
Tianrui Zhao
118e10cd89 LoongArch: KVM: Add LASX (256bit SIMD) support
This patch adds LASX (256bit SIMD) support for LoongArch KVM.

There will be an LASX exception in KVM when the guest uses LASX
instructions. KVM will enable LASX, restore the vector registers for the
guest, and then return to the guest to continue running.
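
An illustrative sketch of the flow (handler name and return conventions
are placeholders, not the exact KVM entry points):

  static int handle_lasx_disabled_exit(struct kvm_vcpu *vcpu)
  {
          /* The guest hit a 256-bit vector instruction with LASX disabled:
           * enable LASX for the guest, restore its vector registers, then
           * resume so the faulting instruction is re-executed. */
          if (kvm_own_lasx(vcpu) < 0)
                  return RESUME_HOST;   /* LASX unavailable: defer to the host */

          return RESUME_GUEST;
  }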

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-19 10:48:28 +08:00
Tianrui Zhao
db1ecca22e LoongArch: KVM: Add LSX (128bit SIMD) support
This patch adds LSX (128bit SIMD) support for LoongArch KVM.

There will be an LSX exception in KVM when the guest uses LSX
instructions. KVM will enable LSX, restore the vector registers for the
guest, and then return to the guest to continue running.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-19 10:48:28 +08:00
Bibo Mao
1ab9c60994 LoongArch: KVM: Remove kvm_acquire_timer() before entering guest
Timer emulation in a VM is switched to the SW timer in two places: one
is during vcpu thread context switch, the other is halt-polling with
idle instruction emulation. SW timer switching has been removed from
halt-polling mode, so it is no longer necessary to disable the SW timer
before entering the guest.

This patch removes the SW timer handling before entering guest mode and
puts it in the HW timer restoring flow when the vcpu thread is scheduled
in. With this patch, VM timer emulation is simpler: there is an SW/HW
timer switch only in the vcpu thread context switch scenario.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-19 10:48:28 +08:00
Bibo Mao
7ab6fb505b LoongArch: KVM: Optimization for memslot hugepage checking
During a shadow MMU page fault, there is a huge page check for the
specified memslot. Page fault handling is a hot path, and the check
logic can be done when the memslot is created. Here two flags are added
for huge page checking, KVM_MEM_HUGEPAGE_CAPABLE and
KVM_MEM_HUGEPAGE_INCAPABLE. Indeed, for an optimized QEMU the memslot
for DRAM is always huge page aligned. The flag is checked first in the
page fault hot path.
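
An illustrative sketch of the classification done at memslot creation
(field and helper names are assumptions, not the exact code):

  static void classify_memslot_hugepage(struct kvm_memory_slot *slot,
                                        unsigned long hva)
  {
          unsigned long size = slot->npages << PAGE_SHIFT;
          unsigned long gpa = slot->base_gfn << PAGE_SHIFT;

          if (IS_ALIGNED(gpa, PMD_SIZE) && IS_ALIGNED(hva, PMD_SIZE) &&
              IS_ALIGNED(size, PMD_SIZE))
                  slot->arch.flags |= KVM_MEM_HUGEPAGE_CAPABLE;
          else
                  slot->arch.flags |= KVM_MEM_HUGEPAGE_INCAPABLE;

          /* the page fault hot path then only tests these two bits */
  }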

For now only the huge page flag is supported; there is still a long way
to go for super page support on LoongArch systems. Since the super page
size is 64G for 16K page size and 1G for 4K page size, 64G physical
addresses are rarely used and the LoongArch kernel would need super page
support for 4K. Also, the memory layout of a LoongArch QEMU VM should be
1G aligned.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-19 10:48:27 +08:00
Linus Torvalds
af2a9c6a83 EFI fixes for v6.7 #2
- Deal with a regression in the recently refactored x86 EFI stub code on
   older Dell systems by disabling randomization of the physical load
   address
 - Use the correct load address for relocatable Loongarch kernels
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZXgvLAAKCRAwbglWLn0t
 XLgKAP9oKLP7v0TD2BJOPGqr4kEtMfZYayV2EUN387VbPYfT0wEAoeDeZmaGUYce
 BuovToERSgjj2FylAWNlZATEh2d35ww=
 =kv9E
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Deal with a regression in the recently refactored x86 EFI stub code
   on older Dell systems by disabling randomization of the physical load
   address

 - Use the correct load address for relocatable Loongarch kernels

* tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/x86: Avoid physical KASLR on older Dell systems
  efi/loongarch: Use load address to calculate kernel entry address
2023-12-13 10:54:50 -08:00
Wang Yao
271f2a4a95 efi/loongarch: Use load address to calculate kernel entry address
efi_relocate_kernel() may load the PIE kernel anywhere, so the load
address may not be equal to the link address or
EFI_KIMG_PREFERRED_ADDRESS.

Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Wang Yao <wangyao@lemote.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-11 11:18:26 +01:00
Hengqi Chen
d6c5f06e46 LoongArch: Preserve syscall nr across execve()
Currently, we store the syscall nr in pt_regs::regs[11], and the execve()
syscall accidentally overwrites it during its execution:

    sys_execve()
      -> do_execve()
        -> do_execveat_common()
          -> bprm_execve()
            -> exec_binprm()
              -> search_binary_handler()
                -> load_elf_binary()
                  -> ELF_PLAT_INIT()

ELF_PLAT_INIT() resets regs[11] to 0, so in syscall_exit_to_user_mode()
we later get a wrong syscall nr. This breaks tools like execsnoop since
they rely on execve() tracepoints.

Skip pt_regs::regs[11] reset in ELF_PLAT_INIT() to fix the issue.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-09 15:49:15 +08:00
Xi Ruoyao
8146c5b349 LoongArch: Slightly clean up drdtime()
As we are just discarding the stable clock ID, simply write it into
$zero instead of allocating a temporary register.
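
A sketch of the cleaned-up helper: the second rdtime.d operand (the
counter ID) is simply written to $zero and discarded.

  static inline u64 drdtime(void)
  {
          u64 val = 0;

          __asm__ __volatile__(
                  "rdtime.d %0, $zero\n\t"
                  : "=r"(val));
          return val;
  }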

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-12-09 15:49:15 +08:00
Huacai Chen
ee2daf7102 LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
When building the kernel with C=1, we get:

arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46:    expected void *ptr
arch/loongarch/kernel/process.c:234:46:    got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46:    expected void *ptr
arch/loongarch/kernel/process.c:234:46:    got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46:    expected void *ptr
arch/loongarch/kernel/process.c:234:46:    got unsigned long [noderef] __percpu *
arch/loongarch/kernel/process.c:234:46: warning: incorrect type in argument 1 (different address spaces)
arch/loongarch/kernel/process.c:234:46:    expected void *ptr
arch/loongarch/kernel/process.c:234:46:    got unsigned long [noderef] __percpu *

Adding the __percpu annotation to __percpu_read()/__percpu_write() avoids
such warnings. __percpu_xchg() and other functions don't need the
annotation because their wrapper, i.e. _pcp_protect(), already suppresses
the warnings.
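
A sketch of the annotated shape (the exact parameter lists are from
memory and may differ; __percpu_write() gets the same treatment):

  static __always_inline unsigned long __percpu_read(void __percpu *ptr,
                                                     int size)
  {
          unsigned long ret = 0;

          /* ... per-size asm access through the annotated per-CPU pointer ... */
          return ret;
  }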

Also adjust the indentations in this file.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311080409.LlOfTR3m-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311080840.Vc2kXhfp-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311081340.3k72KKdg-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311120926.cjYHyoYw-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311152142.g6UyNx1R-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311160339.DbhaH8LX-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202311181454.CTPrSYmQ-lkp@intel.com/
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-11-21 15:03:25 +08:00
WANG Rui
aa0cbc1b50 LoongArch: Record pc instead of offset in la_abs relocation
To clarify, the previous version functioned flawlessly. However, it's
worth noting that LLVM's LoongArch backend currently lacks support
for cross-section label calculations. With this patch, we enable the use
of clang to compile relocatable kernels.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-11-21 15:03:25 +08:00
Paolo Bonzini
6c370dc653 Merge branch 'kvm-guestmemfd' into HEAD
Introduce several new KVM uAPIs to ultimately create a guest-first memory
subsystem within KVM, a.k.a. guest_memfd.  Guest-first memory allows KVM
to provide features, enhancements, and optimizations that are kludgly
or outright impossible to implement in a generic memory subsystem.

The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which
similar to the generic memfd_create(), creates an anonymous file and
returns a file descriptor that refers to it.  Again like "regular"
memfd files, guest_memfd files live in RAM, have volatile storage,
and are automatically released when the last reference is dropped.
The key difference from memfd files (and every other memory subsystem)
is that guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
convert a guest memory area between the shared and guest-private states.

A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to
specify attributes for a given page of guest memory.  In the long term,
it will likely be extended to allow userspace to specify per-gfn RWX
protections, including allowing memory to be writable in the guest
without it also being writable in host userspace.
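
A minimal userspace sketch of how the two ioctls compose (struct and flag
names per my reading of the 6.8 uapi headers; double-check
include/uapi/linux/kvm.h):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int setup_private_range(int vm_fd, __u64 gpa, __u64 size)
  {
          struct kvm_create_guest_memfd gmem = { .size = size };
          struct kvm_memory_attributes attrs = {
                  .address    = gpa,
                  .size       = size,
                  .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
          };
          int gmem_fd;

          /* returns a new file descriptor bound to this VM */
          gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
          if (gmem_fd < 0)
                  return -1;

          /* mark [gpa, gpa + size) as expected to be accessed privately */
          if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
                  return -1;

          /* gmem_fd would next be wired into a memslot (not shown) */
          return gmem_fd;
  }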

The immediate and driving use case for guest_memfd are Confidential
(CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's own pKVM.
For such use cases, being able to map memory into KVM guests without
requiring said memory to be mapped into the host is a hard requirement.
While SEV+ and TDX prevent untrusted software from reading guest private
data by encrypting guest memory, pKVM provides confidentiality and
integrity *without* relying on memory encryption.  In addition, with
SEV-SNP and especially TDX, accessing guest private memory can be fatal
to the host, i.e. KVM must prevent host userspace from accessing
guest memory irrespective of hardware behavior.

Long term, guest_memfd may be useful for use cases beyond CoCo VMs,
for example hardening userspace against unintentional accesses to guest
memory.  As mentioned earlier, KVM's ABI uses userspace VMA protections to
define the allowed guest protections (with an exception granted to mapping
guest memory executable), and similarly KVM currently requires the guest
mapping size to be a strict subset of the host userspace mapping size.
Decoupling the mappings sizes would allow userspace to precisely map
only what is needed and with the required permissions, without impacting
guest performance.

A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to DMA from or into guest memory).

guest_memfd is the result of 3+ years of development and exploration;
taking on memory management responsibilities in KVM was not the first,
second, or even third choice for supporting CoCo VMs.  But after many
failed attempts to avoid KVM-specific backing memory, and looking at
where things ended up, it is quite clear that of all approaches tried,
guest_memfd is the simplest, most robust, and most extensible, and the
right thing to do for KVM and the kernel at-large.

The "development cycle" for this version is going to be very short;
ideally, next week I will merge it as is in kvm/next, taking this through
the KVM tree for 6.8 immediately after the end of the merge window.
The series is still based on 6.6 (plus KVM changes for 6.7) so it
will require a small fixup for changes to get_file_rcu() introduced in
6.7 by commit 0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU").
The fixup will be done as part of the merge commit, and most of the text
above will become the commit message for the merge.

Pending post-merge work includes:
- hugepage support
- looking into using the restrictedmem framework for guest memory
- introducing a testing mechanism to poison memory, possibly using
  the same memory attributes introduced here
- SNP and TDX support

There are two non-KVM patches buried in the middle of this series:

  fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
  mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
2023-11-14 08:31:31 -05:00
Sean Christopherson
f128cf8cfb KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where
appropriate to effectively maintain existing behavior.  Using a proper
Kconfig will simplify building more functionality on top of KVM's
mmu_notifier infrastructure.

Add a forward declaration of kvm_gfn_range to kvm_types.h so that
including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't
generate warnings due to kvm_gfn_range being undeclared.  PPC defines
hooks for PR vs. HV without guarding them via #ifdeffery, e.g.

  bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);

Alternatively, PPC could forward declare kvm_gfn_range, but there's no
good reason not to define it in common KVM.

Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:29:09 -05:00
Linus Torvalds
4eeee6636a LoongArch changes for v6.7
1, Support PREEMPT_DYNAMIC with static keys;
 2, Relax memory ordering for atomic operations;
 3, Support BPF CPU v4 instructions for LoongArch;
 4, Some build and runtime warning fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmVQWXgWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImepDTEACS808EsgSNIM1+JwldhdqKOErt
 XDWlLuIddVpenInx8F+9GnZJzKBU+wl+Ow5ejcVarjcecIJDv5UhoVrbhpeOHkfv
 RszRXQR4p/ZNSFvdraYDjjJ9UX6bp5rq7vMUC2d9bLazMauAfwf7T/HJ5qj9OYZi
 RLlcwaKo2UQHYsT7nJicjh0qpH1YpZQBYTaUUCwzilzB6vAIOTf6X12vFmhtM/i+
 5RIPnesMA1IQSm2ywUODpDHCs7Pirvy8aJvx0CsYdi3xl1yg3pUS6u69Ms61uWlw
 29yYhNbWmVnDikTVLTNISDb/jwto5SAVB2KQKBhF1trF4ZBNE6r7sP4m2tfllYo9
 KXK9tm0U8McS5o46Qd5er6eEnxL7mEeAsc12tNKUYOMe3SIkmHJmj/rZQOtpsiBg
 zqQsYkGUfO2VAwMWiGke8dxPZElOYwZ3UCOpbEpXEXy3NW71VJTIuQFGmsYKJhdy
 3xaAtQxdffE5yUTt2j3Y8Mex2b2oSUBSF263imsZjzWOOxd480iaoejtamf1V779
 bElevzZjMDmbiQ7kiVSf96TWc7iYcSv33jhP4DorKIqnPseYPfrXEeD1xY7JV+IU
 kkvSlO0hAJzVMmQgu5n0PPT1wrVpuvwtbsfcRobIkr1vktZyLaKHRq7rh4R5HTRL
 ZUUm6c0kUDywGT+J4A==
 =bmFe
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - support PREEMPT_DYNAMIC with static keys

 - relax memory ordering for atomic operations

 - support BPF CPU v4 instructions for LoongArch

 - some build and runtime warning fixes

* tag 'loongarch-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  selftests/bpf: Enable cpu v4 tests for LoongArch
  LoongArch: BPF: Support signed mod instructions
  LoongArch: BPF: Support signed div instructions
  LoongArch: BPF: Support 32-bit offset jmp instructions
  LoongArch: BPF: Support unconditional bswap instructions
  LoongArch: BPF: Support sign-extension mov instructions
  LoongArch: BPF: Support sign-extension load instructions
  LoongArch: Add more instruction opcodes and emit_* helpers
  LoongArch/smp: Call rcutree_report_cpu_starting() earlier
  LoongArch: Relax memory ordering for atomic operations
  LoongArch: Mark __percpu functions as always inline
  LoongArch: Disable module from accessing external data directly
  LoongArch: Support PREEMPT_DYNAMIC with static keys
2023-11-12 10:58:08 -08:00
Hengqi Chen
add2802440 LoongArch: Add more instruction opcodes and emit_* helpers
This patch adds more instruction opcodes and their corresponding emit_*
helpers which will be used in later patches.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-11-08 14:12:15 +08:00
WANG Rui
affef66b65 LoongArch: Relax memory ordering for atomic operations
This patch relaxes the implementation while satisfying the memory ordering
requirements for atomic operations, which will help improve performance on
LA664+.

Unixbench with full threads (8)
                                           before       after
  Dhrystone 2 using register variables   203910714.2  203909539.8   0.00%
  Double-Precision Whetstone                 37930.9        37931   0.00%
  Execl Throughput                           29431.5      29545.8   0.39%
  File Copy 1024 bufsize 2000 maxblocks    6645759.5      6676320   0.46%
  File Copy 256 bufsize 500 maxblocks      2138772.4    2144182.4   0.25%
  File Copy 4096 bufsize 8000 maxblocks   11640698.4     11602703  -0.33%
  Pipe Throughput                          8849077.7    8917009.4   0.77%
  Pipe-based Context Switching             1255108.5    1287277.3   2.56%
  Process Creation                           50825.9      50442.1  -0.76%
  Shell Scripts (1 concurrent)               25795.8      25942.3   0.57%
  Shell Scripts (8 concurrent)                3812.6       3835.2   0.59%
  System Call Overhead                     9248212.6    9353348.6   1.14%
                                                                  =======
  System Benchmarks Index Score               8076.6       8114.4   0.47%
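
Not the LoongArch patch itself, but a plain C11 illustration of the
distinction the patch exploits: a read-modify-write that needs no ordering
can be issued relaxed, while one that synchronizes with other threads keeps
acquire-release semantics (the kernel's relaxed vs. fully-ordered atomic
variants split along the same line).

  #include <stdatomic.h>

  /* A pure statistics counter: no ordering is needed, so a relaxed RMW
   * is enough and lets the CPU schedule it freely. */
  static void stat_inc(atomic_long *counter)
  {
          atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
  }

  /* A reference drop that must synchronize with the releasing side keeps
   * acquire-release ordering. */
  static int ref_dec_and_test(atomic_long *refcount)
  {
          return atomic_fetch_sub_explicit(refcount, 1, memory_order_acq_rel) == 1;
  }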

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-11-08 14:12:15 +08:00
Nathan Chancellor
71945968d8 LoongArch: Mark __percpu functions as always inline
A recent change to the optimization pipeline in LLVM reveals some
fragility around the inlining of LoongArch's __percpu functions, which
manifests as a BUILD_BUG() failure:

  In file included from kernel/sched/build_policy.c:17:
  In file included from include/linux/sched/cputime.h:5:
  In file included from include/linux/sched/signal.h:5:
  In file included from include/linux/rculist.h:11:
  In file included from include/linux/rcupdate.h:26:
  In file included from include/linux/irqflags.h:18:
  arch/loongarch/include/asm/percpu.h:97:3: error: call to '__compiletime_assert_51' declared with 'error' attribute: BUILD_BUG failed
     97 |                 BUILD_BUG();
        |                 ^
  include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
     59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
        |                     ^
  include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
     39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
        |                                     ^
  include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
    425 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        |         ^
  include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
    413 |         __compiletime_assert(condition, msg, prefix, suffix)
        |         ^
  include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
    406 |                         prefix ## suffix();                             \
        |                         ^
  <scratch space>:86:1: note: expanded from here
     86 | __compiletime_assert_51
        | ^
  1 error generated.

If these functions are not inlined (which the compiler is free to do
even with functions marked with the standard 'inline' keyword), the
BUILD_BUG() in the default case cannot be eliminated since the compiler
cannot prove it is never used, resulting in a build failure due to the
error attribute.

Mark these functions as __always_inline to guarantee inlining so that
the BUILD_BUG() only triggers when the default case genuinely cannot be
eliminated due to an unexpected size.
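
A hedged, simplified sketch (not the actual percpu.h code) of why forced
inlining matters here: once the helper is always inlined with a constant
size, the switch collapses at compile time and the BUILD_BUG() arm is
discarded; if the compiler outlines the helper, the default arm survives and
the error attribute fires.

  #include <linux/build_bug.h>
  #include <linux/compiler.h>

  static __always_inline unsigned long read_sized(void *ptr, int size)
  {
          switch (size) {
          case 4:
                  return *(volatile unsigned int *)ptr;
          case 8:
                  return *(volatile unsigned long *)ptr;
          default:
                  BUILD_BUG();    /* only provably dead after inlining */
          }
          return 0;
  }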

Cc:  <stable@vger.kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1955
Fixes: 46859ac8af ("LoongArch: Add multi-processor (SMP) support")
Link: 1a2e77cf9e
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-11-08 14:12:15 +08:00
Linus Torvalds
ecae0bd517 Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compaction maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park in the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory" feature,
   to increase the feature's checking coverage: "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - The series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmemmap optimization code.  This provides
   significant boot time improvements when large numbers of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes it possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT"
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In the series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

    - Kemeng Shi has contributed some compaction maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

    - More DAMON/DAMOS maintenance and feature work from SeongJae Park in
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

    - In the series 'Do not try to access unaccepted memory' Adrian
      Hunter provides some fixups for the recently-added 'unaccepted
      memory' feature, to increase the feature's checking coverage: 'Plug
      a few gaps where RAM is exposed without checking if it is
      unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

    - The series 'mm: hugetlb: Skip initialization of gigantic tail
      struct pages if freed by HVO' has improved our handling of gigantic
      pages in the hugetlb vmemmap optimization code. This provides
      significant boot time improvements when large numbers of
      gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
      Roesch makes it possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

    - In the series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
6803bd7956 ARM:
* Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest
 
 * Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table
 
 * Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM
 
 * Guest support for memory operation instructions (FEAT_MOPS)
 
 * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code
 
 * Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use
 
 * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations
 
 * Miscellaneous kernel and selftest fixes
 
 LoongArch:
 
 * New architecture.  The hardware uses the same model as x86, s390
   and RISC-V, where guest/host mode is orthogonal to supervisor/user
   mode.  The virtualization extensions are very similar to MIPS,
   therefore the code also has some similarities but it's been cleaned
   up to avoid some of the historical bogosities that are found in
   arch/mips.  The kernel emulates MMU, timer and CSR accesses, while
   interrupt controllers are only emulated in userspace, at least for
   now.
 
 RISC-V:
 
 * Support for the Smstateen and Zicond extensions
 
 * Support for virtualizing senvcfg
 
 * Support for virtualized SBI debug console (DBCN)
 
 S390:
 
 * Nested page table management can be monitored through tracepoints
   and statistics
 
 x86:
 
 * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC,
   which could result in a dropped timer IRQ
 
 * Avoid WARN on systems with Intel IPI virtualization
 
 * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without
   forcing more common use cases to eat the extra memory overhead.
 
 * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
   SBPB, aka Selective Branch Predictor Barrier).
 
 * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
   creating the original vCPU would cause KVM to try to synchronize the vCPU's
   TSC and thus clobber the correct TSC being set by userspace.
 
 * Compute guest wall clock using a single TSC read to avoid generating an
   inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
 
 * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
   about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
   Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server
   2022.
 
 * Don't apply side effects to Hyper-V's synthetic timer on writes from
   userspace to fix an issue where the auto-enable behavior can trigger
   spurious interrupts, i.e. do auto-enabling only for guest writes.
 
 * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
   without PML enabled.
 
 * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
 
 * Harden the fast page fault path to guard against encountering an invalid
   root when walking SPTEs.
 
 * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
 
 * Use the fast path directly from the timer callback when delivering Xen
   timer events, instead of waiting for the next iteration of the run loop.
   This was not done so far because previously proposed code had races,
   but now care is taken to stop the hrtimer at critical points such as
   restarting the timer or saving the timer information for userspace.
 
 * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
 
 * Optimize injection of PMU interrupts that are simultaneous with NMIs.
 
 * Usual handful of fixes for typos and other warts.
 
 x86 - MTRR/PAT fixes and optimizations:
 
 * Clean up code that deals with honoring guest MTRRs when the VM has
   non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
 
 * Zap EPT entries when non-coherent DMA assignment stops/starts to prevent
   using stale entries with the wrong memtype.
 
 * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
   This was done as a workaround for virtual machine BIOSes that did not
   bother to clear CR0.CD (because ancient KVM/QEMU did not bother to
   set it, in turn), and there's zero reason to extend the quirk to
   also ignore guest PAT.
 
 x86 - SEV fixes:
 
 * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
   running an SEV-ES guest.
 
 * Clean up the recognition of emulation failures on SEV guests, when KVM would
   like to "skip" the instruction but it had already been partially emulated.
   This makes it possible to drop a hack that second guessed the (insufficient)
   information provided by the emulator, and just do the right thing.
 
 Documentation:
 
 * Various updates and fixes, mostly for x86
 
 * MTRR and PAT fixes and optimizations:
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or
 l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH
 hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p
 bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD
 eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL
 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g==
 =UIxm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its
     guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed code
     had races, but now care is taken to stop the hrtimer at critical
     points such as restarting the timer or saving the timer information
     for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/starts to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2023-11-02 15:45:15 -10:00
Paolo Bonzini
ef12ea629e LoongArch KVM changes for v6.7
Add LoongArch's KVM support. Loongson 3A5000/3A6000 supports hardware
 assisted virtualization. With cpu virtualization, there are separate
 hw-supported user mode and kernel mode in guest mode. With memory
 virtualization, there are two-level hw mmu tables for guest mode and host
 mode. Also there is a separate hw cpu timer with constant frequency in
 guest mode, so that VMs can migrate between hosts with different frequencies.
 Currently, we are able to boot LoongArch Linux Guests.
 
 A few key aspects of KVM LoongArch added by this series are:
 1. Enable kvm hardware function when kvm module is loaded.
 2. Implement VM and vcpu related ioctl interface such as vcpu create,
    vcpu run etc. GET_ONE_REG/SET_ONE_REG ioctl commands are used to
    get general registers one by one.
 3. Hardware accesses to the MMU, timer and CSRs are emulated in the kernel.
 4. Hardware such as MMIO and IOCSR devices (IPI, irqchips, PCI devices,
    etc.) are emulated in user space.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmUaKb4WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeuj+EACFIsWmHb8I3q4feviWBIdRcve7
 xzpseO/r2Xfx5IdU6/iAKIW1YNVpHU3c8+nYS20K73uGJnMJnAZ/hmPFf+0pJ4AB
 qZL4aBClRynH4YsdGUeYwBfU7VMKg66ijiaZkqFEa7lSePA81gwYY2MYao58J7Gw
 qMe9IPnxcU/a+UqxTrFs2/G4aSb4SR0cp69GFSIGXZ7uKBc4LCMZ0ujzo/NY/V4G
 NkoJi8I85frF1OQbJaeJ/lGkC1Dx3Qbv7AbXYObA8K/ka1c5lr9RKraha2BOFPif
 mHYDa7lE6qj6e5LX9QVcpkuDnxs2yaA0Zny/p2I2o7AS0xUU/eFGGdnu1bkMqrsw
 42ZuKL5Dtp/Wl5W2EEIkCVmZar+3UOeiWwGXPTk4A4fQHEGayXzJvrdUa/7f+O6Y
 JWYKSVjo6ZHg0ArKFtA13N54KximsrOKTpaB4fuUIP5SXMB/+JVn77/93Oyq6XSY
 9+7yUTzjeMqz+4o8DLPhBIyJPvZ6cxAfx2i4htHiQg7e9ir0rZE3jsOtvJ2pgo3K
 1QtGSJWH9dQMeOFnBfpDIvx3aDBcXPayXHNdaL3t/gCwJCpRuH1g6HeDTYXCJsQc
 qu6FDoeQn5+5IeEzoZawapQIJ2y7iXTk5SoysybKcKmnp8yYKB6olCqW8cxbWeMs
 HSqTUusPMsnJGlJMTw==
 =jaPm
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.7

Add LoongArch's KVM support. Loongson 3A5000/3A6000 supports hardware
assisted virtualization. With cpu virtualization, there are separate
hw-supported user mode and kernel mode in guest mode. With memory
virtualization, there are two-level hw mmu tables for guest mode and host
mode. Also there is a separate hw cpu timer with constant frequency in
guest mode, so that VMs can migrate between hosts with different frequencies.
Currently, we are able to boot LoongArch Linux Guests.

A few key aspects of KVM LoongArch added by this series are:
1. Enable kvm hardware function when kvm module is loaded.
2. Implement VM and vcpu related ioctl interface such as vcpu create,
   vcpu run etc. GET_ONE_REG/SET_ONE_REG ioctl commands are used to
   get general registers one by one.
3. Hardware accesses to the MMU, timer and CSRs are emulated in the kernel.
4. Hardware such as MMIO and IOCSR devices (IPI, irqchips, PCI devices,
   etc.) are emulated in user space.
2023-10-31 09:55:40 -04:00
Linus Torvalds
3cf3fabccb Locking changes in this cycle are:
- Futex improvements:
 
     - Add the 'futex2' syscall ABI, which is an attempt to get away from the
       multiplex syscall and adds a little room for extensions, while lifting
       some limitations.
 
     - Fix futex PI recursive rt_mutex waiter state bug
 
     - Fix inter-process shared futexes on no-MMU systems
 
     - Use folios instead of pages
 
  - Micro-optimizations of locking primitives:
 
     - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
       architectures, to improve lockref code generation.
 
     - Improve the x86-32 lockref_get_not_zero() main loop by adding
       build-time CMPXCHG8B support detection for the relevant lockref code,
       and by better interfacing the CMPXCHG8B assembly code with the compiler.
 
     - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg()
       code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg().
 
     - Micro-optimize rcuref_put_slowpath()
 
  - Locking debuggability improvements:
 
     - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
 
     - Enforce atomicity of sched_submit_work(), which is de-facto atomic but
       was un-enforced previously.
 
     - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics
 
     - Fix ww_mutex self-tests
 
     - Clean up const-propagation in <linux/seqlock.h> and simplify
       the API-instantiation macros a bit.
 
  - RT locking improvements:
 
     - Provide the rt_mutex_*_schedule() primitives/helpers and use them
       in the rtmutex code to avoid recursion vs. rtlock on the PI state.
 
     - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock()
       and rwbase_read_lock().
 
  - Plus misc fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU877IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9jw/+N7rxQ78dmFCYh4UWnLCYvuKP0/ivHErG
 493JcB8MupuA2tfJHIkDdr4aM2mNq2E61w69/WlZAQWWD6pdOhwgF5Xf5eoEcJm0
 vsAhWBGLxihXdtevPuMAx0dEpg3AMp2wc6i5PkN831KdPUgCNsrKq9Bfnfef7/G8
 MQTSHjmtba6jxleyxfEa4tE2xe5PJX825nRfkX2e1cf+stkYua+uJFxVxUfxFWGE
 4pBy70D9OC7MsJ44WWOA1gwkVtMMiBTmRPNjlP8Gz2GQ0f3ERHRwYk3jDHOPHZI6
 0GNt7pE3IMXQn2UuDtfkvv9IFTd+U5qD+APnWIn2ntWXqzGLFqOlmovMrobVn7El
 olYDCyweWPG71m1Qblsb1VK2QjRPQVJ9NAEg8RlDHIu2ThxHbMysDVGPVOYnPFq4
 S8QFpmldzbNoPU4rDJyT1fAmoUIrusBHkl+Us3yGfC74iM+fHnDEvaSoMZbzEdY1
 x/Nocj9XgKEgfXdYzrCWFmZ9xXqHkO25/wDL6yKqBdQtvaEalXuHTT6mQcYxrUPm
 Xx1BPan2Jg7p4u2oOFcVtKewUtRH9KBx8qytr5S+JK4PJbrBsixMnr84HLd/3X2V
 ykYkO+367T5MTYv4TnJDE5vdurzUqekKSCFPY3skPujPJfdLj1vsPzYf9iMkCLdo
 hU2f/R+Wpdk=
 =36Ff
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Futex improvements:

   - Add the 'futex2' syscall ABI, which is an attempt to get away from
      the multiplex syscall and adds a little room for extensions, while
     lifting some limitations.

   - Fix futex PI recursive rt_mutex waiter state bug

   - Fix inter-process shared futexes on no-MMU systems

   - Use folios instead of pages

  Micro-optimizations of locking primitives:

   - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
     architectures, to improve lockref code generation

   - Improve the x86-32 lockref_get_not_zero() main loop by adding
     build-time CMPXCHG8B support detection for the relevant lockref
     code, and by better interfacing the CMPXCHG8B assembly code with
     the compiler

   - Introduce arch_sync_try_cmpxchg() on x86 to improve
     sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
     users to sync_try_cmpxchg().

   - Micro-optimize rcuref_put_slowpath()

  Locking debuggability improvements:

   - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well

   - Enforce atomicity of sched_submit_work(), which is de-facto atomic
     but was un-enforced previously.

   - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
     semantics

   - Fix ww_mutex self-tests

   - Clean up const-propagation in <linux/seqlock.h> and simplify the
     API-instantiation macros a bit

  RT locking improvements:

   - Provide the rt_mutex_*_schedule() primitives/helpers and use them
     in the rtmutex code to avoid recursion vs. rtlock on the PI state.

   - Add nested blocking lockdep asserts to rt_mutex_lock(),
     rtlock_lock() and rwbase_read_lock()

  .. plus misc fixes & cleanups"

* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  futex: Don't include process MM in futex key on no-MMU
  locking/seqlock: Fix grammar in comment
  alpha: Fix up new futex syscall numbers
  locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
  locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
  locking/seqlock: Change __seqprop() to return the function pointer
  locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
  locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
  locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
  locking/seqlock: Fix typo in comment
  futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
  locking/local, arch: Rewrite local_add_unless() as a static inline function
  locking/debug: Fix debugfs API return value checks to use IS_ERR()
  locking/ww_mutex/test: Make sure we bail out instead of livelock
  locking/ww_mutex/test: Fix potential workqueue corruption
  locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
  futex: Add sys_futex_requeue()
  futex: Add flags2 argument to futex_requeue()
  ...
2023-10-30 12:38:48 -10:00
Icenowy Zheng
278be83601 LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
Currently the code disables WUC only for ioremap_wc(), which
is only used when mapping writecombine pages like ioremap() (mapped to
the kernel space). But for VRAM mapped in TTM/GEM, it is mapped with a
crafted pgprot by the pgprot_writecombine() function, in which case WUC
isn't disabled now.

Disable WUC for pgprot_writecombine() (fallback to SUC) if needed, like
ioremap_wc().

This improves the AMDGPU driver's stability (solves some misrendering)
on Loongson-3A5000/3A6000 machines.
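
A hedged sketch of the idea (wc_enabled is a placeholder for however the
kernel tracks the writecombine-disable state; _CACHE_WUC/_CACHE_SUC are the
LoongArch cache-attribute types): the protection value handed back for
"writecombine" simply degrades to the strongly-ordered uncached type when
WUC is disabled, mirroring what ioremap_wc() already did.

  /* Sketch only, not the literal patch. */
  static bool wc_enabled = true;    /* placeholder flag, see above */

  static pgprot_t sketch_pgprot_writecombine(pgprot_t prot)
  {
          unsigned long val = pgprot_val(prot) & ~_CACHE_MASK;

          /* Fall back to SUC when WUC has been disabled, else use WUC. */
          return __pgprot(val | (wc_enabled ? _CACHE_WUC : _CACHE_SUC));
  }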

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-18 08:42:52 +08:00
Tiezhu Yang
00c2ca84c6 LoongArch: Use SYM_CODE_* to annotate exception handlers
As described in include/linux/linkage.h,

  FUNC -- C-like functions (proper stack frame etc.)
  CODE -- non-C code (e.g. irq handlers with different, special stack etc.)

  SYM_FUNC_{START, END} -- use for global functions
  SYM_CODE_{START, END} -- use for non-C (special) functions

So use SYM_CODE_* to annotate exception handlers.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-18 08:42:52 +08:00
Ingo Molnar
fdb8b7a1af Linux 6.6-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmUjFeceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGNCAH/RDI8G44DCV9Ps5U
 rl/FMf6iLUxU6fCS3Wwe8vtppLjPP7Y16AH5HKMumoDIqTfh9ZAUVKhZfT+PTgz3
 /oFXcGzZQLTcdbtH7XK2/zk7N/RI25/rDiCDd1uIJVCNii+hsBKS6Ihc4wXadxaR
 0z3lwoEKp2egeaeqmJWMzJLdjRrYhLs33+SEciVYqTiIvlWsM5QBm/sMvES7V57s
 TXrs5/y7yXtDBZ2PgYNCBRLyBazjqB28x07aQoePOAs6nFXl5N/wWPW/4wirWFHT
 s9LYZlmVo+O+RHWj10ASm/2l+ihgn959ZfRj1VekK2AWU1x/VzSPcuCXKvsrUoa+
 xEjL+vM=
 =efE3
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-09 18:09:23 +02:00
Baolin Wang
55d2a0bd5e mm: add statistics for PUD level pagetable
Recently, we found that cross-die access to pagetable pages on ARM64
machines can cause performance fluctuations in our business.  Currently,
there are no PMU events available to track this situation on our ARM64
machines, so accurate pagetable accounting can help to analyze this issue,
but currently the PUD level pagetable accounting is missing.

So introduce pagetable_pud_ctor/dtor() to help to get accurate PUD
pagetable accounting, as well as converting the architectures which use
generic PUD pagetable allocation to add corresponding PUD pagetable
accounting.  Moreover this patch will mark the PUD level pagetable with
PG_table flag, which will help to do sanity validation in
unpoison_memory().

On my testing machine, I can see more pagetable statistics after the patch
with the page-types tool:

Before patch:
        flags           page-count      MB  symbolic-flags                     long-symbolic-flags
0x0000000004000000           27326      106  __________________________g_________________       pgtable
After patch:
0x0000000004000000           27541      107  __________________________g_________________       pgtable
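
Roughly what a generic allocation helper looks like once it participates in
the accounting (a hedged sketch in the asm-generic style; details are
approximate): the ctor is what bumps the pagetable counters and marks the
page with PG_table, and the dtor undoes it on free.

  static inline pud_t *sketch_pud_alloc_one(struct mm_struct *mm, unsigned long addr)
  {
          struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_USER, 0);

          if (!ptdesc)
                  return NULL;

          pagetable_pud_ctor(ptdesc);     /* account the page + set PG_table */
          return ptdesc_address(ptdesc);
  }

  static inline void sketch_pud_free(struct mm_struct *mm, pud_t *pud)
  {
          struct ptdesc *ptdesc = virt_to_ptdesc(pud);

          pagetable_pud_dtor(ptdesc);     /* undo the accounting */
          pagetable_free(ptdesc);
  }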

Link: https://lkml.kernel.org/r/876c71c03a7e69c17722a690e3225a4f7b172fb2.1695017383.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-06 14:44:10 -07:00
Uros Bizjak
5e0eb67974 locking/local, arch: Rewrite local_add_unless() as a static inline function
Rewrite local_add_unless() as a static inline function with boolean
return value, similar to the arch_atomic_add_unless() arch fallbacks.

The function is currently unused.
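
A hedged sketch in the style of the arch_atomic_add_unless() fallbacks the
commit refers to (per-architecture code may differ in detail): add @a to @l
unless it currently holds @u, and report whether the add happened.

  static inline bool sketch_local_add_unless(local_t *l, long a, long u)
  {
          long c = local_read(l);

          do {
                  if (unlikely(c == u))
                          return false;
          } while (!local_try_cmpxchg(l, &c, c + a));

          return true;
  }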

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230731084458.28096-1-ubizjak@gmail.com
2023-10-04 11:38:11 +02:00
Tianrui Zhao
81efe043a3 LoongArch: KVM: Implement handle iocsr exception
Implement KVM handling of the vcpu IOCSR exception: set the IOCSR info in
vcpu_run and return to user space to handle it.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-02 10:01:28 +08:00
Tianrui Zhao
752e2cd7b4 LoongArch: KVM: Implement kvm mmu operations
Implement the LoongArch KVM MMU, which is used to translate GPA to HPA when
the guest exits because of an address translation exception.

This patch implements: allocating the GPA page table, searching a GPA in it,
and flushing guest GPAs from the table.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-02 10:01:28 +08:00
Tianrui Zhao
dfe3dc07fa LoongArch: KVM: Add vcpu related header files
Add LoongArch vcpu related header files, including vcpu csr information,
irq number definitions, and some vcpu interfaces.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-02 10:01:27 +08:00
Tianrui Zhao
b37e6b680e LoongArch: KVM: Add kvm related header files
Add LoongArch KVM related header files, including kvm.h, kvm_host.h and
kvm_types.h. All of those are about LoongArch virtualization features
and kvm interfaces.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-02 10:01:20 +08:00
Tiezhu Yang
2761498876 LoongArch: Define relocation types for ABI v2.10
The relocation types from 101 to 109 are used by GNU binutils >= 2.41, so
add their definitions so that they can be used in later patches.

Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=include/elf/loongarch.h#l230
Cc: <stable@vger.kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-27 16:19:13 +08:00
Huacai Chen
99e5a2472a LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
As Linus suggested, kasan_mem_to_shadow()/kasan_shadow_to_mem() are not
performance-critical and are too big to inline; inlining them is simply
wrong, so just define them out-of-line.

If they really need to be inlined in future, such as the objtool / SMAP
issue for X86, we should mark them __always_inline.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 14:26:29 +08:00
Huacai Chen
2a86f1b56a kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
As Linus suggested, __HAVE_ARCH_XYZ is "stupid" and "having historical
uses of it doesn't make it good". So migrate __HAVE_ARCH_SHADOW_MAP to
separate macros named after the respective functions.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 14:26:29 +08:00
Andy Shevchenko
3563b477dd LoongArch: Use _UL() and _ULL()
Use _UL() and _ULL() that are provided by const.h.
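
A hedged before/after illustration (SOME_BASE_ADDR and SOME_WIDE_MASK are
made-up constants, not ones from the patch): the const.h macros keep the
same value usable from both C and assembly, where the UL/ULL suffixes would
not parse.

  #include <linux/const.h>

  #define SOME_BASE_ADDR  _UL(0x8000000000000000)    /* was: 0x8000000000000000UL */
  #define SOME_WIDE_MASK  _ULL(0xffffffff00000000)   /* was: 0xffffffff00000000ULL */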

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 14:26:29 +08:00
Bibo Mao
c718a0bad7 LoongArch: Fix some build warnings with W=1
There are some build warnings when building the LoongArch kernel with W=1,
as follows; this patch fixes them (a typical fix pattern is sketched after
the list).

arch/loongarch/kernel/acpi.c:284:13: warning: no previous prototype for ‘acpi_numa_arch_fixup’ [-Wmissing-prototypes]
  284 | void __init acpi_numa_arch_fixup(void) {}
      |             ^~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/time.c:32:13: warning: no previous prototype for ‘constant_timer_interrupt’ [-Wmissing-prototypes]
   32 | irqreturn_t constant_timer_interrupt(int irq, void *data)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/traps.c:496:25: warning: no previous prototype for 'do_fpe' [-Wmissing-prototypes]
  496 | asmlinkage void noinstr do_fpe(struct pt_regs *regs
      |                         ^~~~~~
arch/loongarch/kernel/traps.c:813:22: warning: variable ‘opcode’ set but not used [-Wunused-but-set-variable]
  813 |         unsigned int opcode;
      |                      ^~~~~~
arch/loongarch/kernel/signal.c:895:14: warning: no previous prototype for ‘get_sigframe’ [-Wmissing-prototypes]
  895 | void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
      |              ^~~~~~~~~~~~
arch/loongarch/kernel/syscall.c:21:40: warning: initialized field overwritten [-Woverride-init]
   21 | #define __SYSCALL(nr, call)     [nr] = (call),
      |                                        ^
arch/loongarch/kernel/syscall.c:40:14: warning: no previous prototype for ‘do_syscall’ [-Wmissing-prototypes]
   40 | void noinstr do_syscall(struct pt_regs *regs)
      |              ^~~~~~~~~~
arch/loongarch/kernel/smp.c:502:17: warning: no previous prototype for ‘start_secondary’ [-Wmissing-prototypes]
  502 | asmlinkage void start_secondary(void)
      |                 ^~~~~~~~~~~~~~~
arch/loongarch/kernel/process.c:309:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  309 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:13:5: warning: no previous prototype for ‘arch_register_cpu’ [-Wmissing-prototypes]
   13 | int arch_register_cpu(int cpu)
      |     ^~~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:27:6: warning: no previous prototype for ‘arch_unregister_cpu’ [-Wmissing-prototypes]
   27 | void arch_unregister_cpu(int cpu)
      |      ^~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/module-sections.c:103:5: warning: no previous prototype for ‘module_frob_arch_sections’ [-Wmissing-prototypes]
  103 | int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/mm/hugetlbpage.c:56:5: warning: no previous prototype for ‘is_aligned_hugepage_range’ [-Wmissing-prototypes]
   56 | int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 14:26:28 +08:00
Linus Torvalds
12952b6bbd LoongArch changes for v6.6
1, Allow usage of LSX/LASX in the kernel;
 2, Add SIMD-optimized RAID5/RAID6 routines;
 3, Add Loongson Binary Translation (LBT) extension support;
 4, Add basic KGDB & KDB support;
 5, Add building with kcov coverage;
 6, Add KFENCE (Kernel Electric-Fence) support;
 7, Add KASAN (Kernel Address Sanitizer) support;
 8, Some bug fixes and other small changes;
 9, Update the default config file.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmT5TfMWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeqd3EACjqCaHNlp33kwufSPpGuQw9a8I
 F7JW1KzBOoWELch5nFRjfQClROBWRmM4jN5YnxENBQ5K2F1K6gfxdkfjew+KV2mn
 ki9ByamCfFVJDZXo9wavUD2LBrVakEFmLT+SyXBxdWwJ3fDivHjF6A0qs9ltp7dq
 Bttq4bkw1mZsU6MnViRwPKVROtNUVrd9mwYSTq0iXviVEbWhPHQQTxRizNra9Z6X
 7XWxO0ODHl0WVvdOJU+F16mBRS3Bs1g/HHAIDc41yrYEHFFOeFCEUAQSF/4Nj5wj
 BAfAB8WOa9+vPH8fTnrpCt2RtGJmkz71TM49DdXB7jpGaWIyc4WDi9MXeeBiJ0wE
 vQg8IECc9POC1sH4/6BMwq2qkrWRj2PYFYof0fP66iWNjmodtNUf7GOVHy8MTQan
 xHWizJFAdY/u/bwbF9tRQ+EVeot/844CkjtZxkgTfV8shN6kCMEVAamwBItZ7TXN
 g/oc1ORM6nsKHBDQF3r2LSY0Gbf3OSfMJVL8SLEQ9hAhgGhotmJ36B4bdvyO7T0Q
 gNn//U+p4IIMFRKRxreEz9P0KjTOJrHAAxNzu1oZebhGZd5WI+i0PHYkkBDKZTXc
 7qaEdM2cX8Wd0ePIXOHQnSItwYO7ilrviHyeCM8wd/g2/W/00jvnpF3J+2rk7eJO
 rcfAr8+V5ylYBQzp6Q==
 =NXy2
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Allow usage of LSX/LASX in the kernel, and use them for
   SIMD-optimized RAID5/RAID6 routines

 - Add Loongson Binary Translation (LBT) extension support

 - Add basic KGDB & KDB support

 - Add building with kcov coverage

 - Add KFENCE (Kernel Electric-Fence) support

 - Add KASAN (Kernel Address Sanitizer) support

 - Some bug fixes and other small changes

 - Update the default config file

* tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
  LoongArch: Update Loongson-3 default config file
  LoongArch: Add KASAN (Kernel Address Sanitizer) support
  LoongArch: Simplify the processing of jumping new kernel for KASLR
  kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
  kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
  LoongArch: Add KFENCE (Kernel Electric-Fence) support
  LoongArch: Get partial stack information when providing regs parameter
  LoongArch: mm: Add page table mapped mode support for virt_to_page()
  kfence: Defer the assignment of the local variable addr
  LoongArch: Allow building with kcov coverage
  LoongArch: Provide kaslr_offset() to get kernel offset
  LoongArch: Add basic KGDB & KDB support
  LoongArch: Add Loongson Binary Translation (LBT) extension support
  raid6: Add LoongArch SIMD recovery implementation
  raid6: Add LoongArch SIMD syndrome calculation
  LoongArch: Add SIMD-optimized XOR routines
  LoongArch: Allow usage of LSX/LASX in the kernel
  LoongArch: Define symbol 'fault' as a local label in fpu.S
  LoongArch: Adjust {copy, clear}_user exception handler behavior
  LoongArch: Use static defined zero page rather than allocated
  ...
2023-09-08 12:16:52 -07:00
Qing Zhang
5aa4ac64e6 LoongArch: Add KASAN (Kernel Address Sanitizer) support
1/8 of kernel addresses is reserved for shadow memory. But for LoongArch,
there are a lot of holes between different segments, and the valid address
space (256T available) is insufficient to map all these segments to kasan
shadow memory with the common formula provided by the kasan core, namely

(addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET

So LoongArch has an arch-specific mapping formula: different segments are
mapped individually, and only limited lengths of these specific segments
are mapped to shadow.

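For illustration, the per-segment mapping boils down to something like the
sketch below (not the actual LoongArch code; the segment base and the
per-segment shadow base are hypothetical parameters here):

  /* Illustrative sketch only: one shadow window per segment, instead of a
   * single global KASAN_SHADOW_OFFSET for the whole address space. */
  static inline void *segment_mem_to_shadow(unsigned long addr,
                                            unsigned long seg_start,
                                            unsigned long seg_shadow_base)
  {
          /* Only the offset inside the segment is scaled, so this segment
           * needs just (segment size >> KASAN_SHADOW_SCALE_SHIFT) bytes of
           * shadow, and the holes between segments consume no shadow. */
          return (void *)(((addr - seg_start) >> KASAN_SHADOW_SCALE_SHIFT) +
                          seg_shadow_base);
  }
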
At the early boot stage the whole shadow region is populated with just one
physical page (kasan_early_shadow_page). Later, this page is reused as a
read-only zero shadow for memory that kasan currently doesn't track. After
the physical memory is mapped, pages for shadow memory are allocated and
mapped.

Functions like memset()/memcpy()/memmove() do a lot of memory accesses.
If a bad pointer is passed to one of these functions, it is important that
it be caught. The compiler's instrumentation cannot do this since these
functions are written in assembly.

KASAN replaces these memory functions with manually instrumented variants.
The original functions are declared as weak symbols so that strong
definitions in mm/kasan/kasan.c can replace them. The original functions
have aliases with a '__' prefix in their names, so we can call the
non-instrumented variants if needed.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:54:16 +08:00
Qing Zhang
9fbcc07679 LoongArch: Simplify the processing of jumping new kernel for KASLR
The modified relocate_kernel() doesn't return the new kernel's entry point
but the random_offset. In this way we share the start_kernel() processing
with the normal kernel, which avoids calling 'jr a0' directly and allows
some other operations (e.g., kasan_early_init) before start_kernel() when
KASLR (CONFIG_RANDOMIZE_BASE) is turned on.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:54:16 +08:00
Enze Li
6ad3df56bb LoongArch: Add KFENCE (Kernel Electric-Fence) support
The LoongArch architecture is quite different from other architectures.
When the allocating of KFENCE itself is done, it is mapped to the direct
mapping configuration window [1] by default on LoongArch.  It means that
it is not possible to use the page table mapped mode which required by
the KFENCE system and therefore it should be remapped to the appropriate
region.

This patch adds architecture specific implementation details for KFENCE.
In particular, this implements the required interface in <asm/kfence.h>.

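A sketch of the kind of <asm/kfence.h> hook this needs is shown below
(simplified; the helper and PTE bit names follow common LoongArch/MIPS
conventions and are assumptions, not necessarily the final code):

  static inline bool kfence_protect_page(unsigned long addr, bool protect)
  {
          pte_t *pte = virt_to_kpte(addr);        /* pool must be TLB-mapped */

          if (WARN_ON(!pte) || pte_none(*pte))
                  return false;

          if (protect)
                  set_pte(pte, __pte(pte_val(*pte) & ~(_PAGE_VALID | _PAGE_PRESENT)));
          else
                  set_pte(pte, __pte(pte_val(*pte) | (_PAGE_VALID | _PAGE_PRESENT)));

          preempt_disable();
          local_flush_tlb_one(addr);              /* drop the stale translation */
          preempt_enable();

          return true;
  }
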
Tested this patch by running the test cases; all passed.

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode

Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:54:16 +08:00
Enze Li
8b5cb1cbf3 LoongArch: mm: Add page table mapped mode support for virt_to_page()
According to the LoongArch documentation, there are two types of address
translation modes: direct mapped address translation mode (DMW mode) and
page table mapped address translation mode (TLB mode).

Currently, virt_to_page() only supports direct mapped mode. This patch
determines which mode is used, and adds corresponding handling functions
for both modes.

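Schematically the dispatch looks like the sketch below (helper names are
illustrative assumptions, not necessarily the exact ones added here):

  /* DMW addresses convert with simple arithmetic; anything else needs a
   * kernel page table walk. */
  #define virt_to_page(kaddr)                                         \
  ({                                                                  \
          unsigned long _k = (unsigned long)(kaddr);                  \
          is_dmw_address(_k) ? dmw_virt_to_page(_k)                   \
                             : tlb_virt_to_page(_k);                  \
  })
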
For more details on the two modes, see [1].

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode

Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
Feiyang Chen
b72961f847 LoongArch: Provide kaslr_offset() to get kernel offset
Provide kaslr_offset() to get the kernel offset when KASLR is enabled.

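A minimal sketch of such a helper, assuming the usual pattern of comparing
the runtime _text address with the link-time load address (the load-address
macro name is an assumption):

  static inline unsigned long kaslr_offset(void)
  {
          /* Difference between where the kernel runs and where it was linked. */
          return (unsigned long)&_text - VMLINUX_LOAD_ADDRESS;
  }
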
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
Qing Zhang
e14dd07696 LoongArch: Add basic KGDB & KDB support
KGDB is intended to be used as a source level debugger for the Linux
kernel. It is used along with gdb to debug a Linux kernel. GDB can be
used to "break in" to the kernel to inspect memory, variables and regs
similar to the way an application developer would use GDB to debug an
application. KDB is a frontend of KGDB which is similar to GDB.

By now, in addition to the generic KGDB features, the LoongArch KGDB
implements the following features:
- Hardware breakpoints/watchpoints;
- Software single-step support for KDB.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>   # Framework & CoreFeature
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> # BreakPoint & SingleStep
Signed-off-by: Hui Li <lihui@loongson.cn>           # Some Minor Improvements
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> # Some Build Error Fixes
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
Qi Hu
bd3c579848 LoongArch: Add Loongson Binary Translation (LBT) extension support
Loongson Binary Translation (LBT) is used to accelerate binary translation,
which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags)
and x87 fpu stack pointer (ftop).

This patch adds kernel support for saving/restoring these registers,
handles the LBT exception and maintains the sigcontext.

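Roughly, the per-task state that has to be saved/restored looks like the
illustrative layout below (field names and widths are assumptions):

  struct loongarch_lbt {
          unsigned long scr0, scr1, scr2, scr3;  /* binary-translation scratch regs */
          unsigned int  eflags;                  /* emulated x86/ARM eflags        */
          unsigned int  ftop;                    /* emulated x87 FPU stack pointer */
  };
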
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
WANG Xuerui
75ded18a5e LoongArch: Add SIMD-optimized XOR routines
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

> 8regs           : 12702 MB/sec
> 8regs_prefetch  : 10920 MB/sec
> 32regs          : 12686 MB/sec
> 32regs_prefetch : 10918 MB/sec
> lsx             : 17589 MB/sec
> lasx            : 26116 MB/sec

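In plain C the per-iteration shape looks like the sketch below (illustrative
only, assuming 8-byte unsigned long; the real LSX/LASX code does the same
work with 128-/256-bit vector loads, xors and stores):

  static void xor_64bytes_c(unsigned long bytes,
                            unsigned long *p1, const unsigned long *p2)
  {
          long lines = bytes / 64;        /* one 64-byte cache line per pass */

          do {
                  p1[0] ^= p2[0]; p1[1] ^= p2[1]; p1[2] ^= p2[2]; p1[3] ^= p2[3];
                  p1[4] ^= p2[4]; p1[5] ^= p2[5]; p1[6] ^= p2[6]; p1[7] ^= p2[7];
                  p1 += 8;
                  p2 += 8;
          } while (--lines > 0);
  }
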
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
Bibo Mao
0921af6ccf LoongArch: Use static defined zero page rather than allocated
On LoongArch systems, only one page is needed for the zero page (there are
no cache synonyms), and there is no COLOR_ZERO_PAGE, so zero_page_mask is
useless and the macro __HAVE_COLOR_ZERO_PAGE is not necessary.

Like other popular architectures, it is simpler to define the zero page in
the kernel BSS segment rather than allocating it dynamically.

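The change essentially boils down to the usual pattern used by other
architectures, roughly:

  /* Zero page lives in .bss, page aligned; no boot-time allocation needed. */
  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
  EXPORT_SYMBOL(empty_zero_page);
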
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:10 +08:00
Bibo Mao
2bb20d2926 LoongArch: mm: Introduce unified function populate_kernel_pte()
The functions pcpu_populate_pte() and fixmap_pte() are similar: they
populate one page of the kernel address space. There is also confusion
between pgd and p4d in fixmap_pte(), e.g. pgd_none() always returning
zero. This patch introduces a unified function populate_kernel_pte() and
then replaces pcpu_populate_pte() and fixmap_pte() with it.

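A sketch of such a unified helper (simplified; the exact allocator and
error handling in the real function may differ):

  static pte_t *populate_kernel_pte(unsigned long addr)
  {
          pgd_t *pgd = pgd_offset_k(addr);
          p4d_t *p4d = p4d_offset(pgd, addr);
          pud_t *pud;
          pmd_t *pmd;

          if (p4d_none(*p4d)) {
                  pud = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
                  if (!pud)
                          panic("%s: Failed to allocate pud page\n", __func__);
                  p4d_populate(&init_mm, p4d, pud);
          }
          pud = pud_offset(p4d, addr);

          if (pud_none(*pud)) {
                  pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
                  if (!pmd)
                          panic("%s: Failed to allocate pmd page\n", __func__);
                  pud_populate(&init_mm, pud, pmd);
          }
          pmd = pmd_offset(pud, addr);

          if (!pmd_present(*pmd)) {
                  pte_t *pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
                  if (!pte)
                          panic("%s: Failed to allocate pte page\n", __func__);
                  pmd_populate_kernel(&init_mm, pmd, pte);
          }

          return pte_offset_kernel(pmd, addr);
  }
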
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:09 +08:00
Hongchen Zhang
303be4b335 LoongArch: mm: Add p?d_leaf() definitions
When I ran the LTP tests, test case ksm06 caused a panic at
	break_ksm_pmd_entry
	  -> pmd_leaf (Huge page table but False)
	  -> pte_present (panic)

The reason is that pmd_leaf() is not defined. So, like commit 501b810467
("mips: mm: add p?d_leaf() definitions"), add p?d_leaf() definitions for
LoongArch.

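The added definitions take roughly the following shape (the huge-page bit
name follows LoongArch's page table layout and should be treated as
illustrative):

  /* A leaf entry maps a huge page directly, i.e. it is not a table pointer. */
  #define pmd_leaf(pmd)   ((pmd_val(pmd) & _PAGE_HUGE) != 0)
  #define pud_leaf(pud)   ((pud_val(pud) & _PAGE_HUGE) != 0)
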
Fixes: 09cfefb7fa ("LoongArch: Add memory management")
Cc: stable@vger.kernel.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:09 +08:00
Nathan Chancellor
8ff81bb24f LoongArch: Drop unused parse_r and parse_v macros
When building with CONFIG_LTO_CLANG_FULL, there are several errors due
to the way that parse_r is defined with an __asm__ statement in a
header:

  ld.lld: error: ld-temp.o <inline asm>:105:1: macro 'parse_r' is already defined
  .macro  parse_r var r
  ^

This was an issue for arch/mips as well, which was resolved by commit
67512a8cf5 ("MIPS: Avoid macro redefinitions").

However, parse_r is unused in arch/loongarch after commit 83d8b38967
("LoongArch: Simplify the invtlb wrappers"), so doing the same change
does not make much sense now. Just remove parse_r (and parse_v, which
is also unused) to resolve the redefinition error. If it needs to be
brought back due to an actual use, it should be brought back with the
same changes as the aforementioned arch/mips commit.

Closes: https://github.com/ClangBuiltLinux/linux/issues/1924
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:09 +08:00
Linus Torvalds
df57721f9a Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
 krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
 Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
 gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
 Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
 ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
 rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
 Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
 VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
 AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
 =3UUm
 -----END PGP SIGNATURE-----

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

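As a purely conceptual model (pseudo-hardware written in C, not kernel
code), the CALL/RET checking works like this:

  #include <stdint.h>

  struct cpu_state {
          uint64_t *sp;    /* normal stack pointer: program-writable memory   */
          uint64_t *ssp;   /* shadow stack pointer: special, protected memory */
  };

  static void control_protection_fault(void)
  {
          /* model only: the CPU raises a control-protection fault here */
  }

  static void model_call(struct cpu_state *c, uint64_t ret_addr)
  {
          *--c->sp  = ret_addr;           /* pushed to the normal stack */
          *--c->ssp = ret_addr;           /* and to the shadow stack    */
  }

  static uint64_t model_ret(struct cpu_state *c)
  {
          uint64_t normal = *c->sp++;
          uint64_t shadow = *c->ssp++;

          if (normal != shadow)           /* ROP-style overwrite detected */
                  control_protection_fault();
          return normal;
  }
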
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2023-08-31 12:20:12 -07:00
Linus Torvalds
d68b4b6f30 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options").
 
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h").
 
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands").
 
 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions").
 
 - Switch the handling of kdump from a udev scheme to in-kernel handling,
   by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
   un/plug").
 
 - Many singleton patches to various parts of the tree
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
 juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
 QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
 =WJQa
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series ("mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved the performance of zsmalloc during
   compaction ("zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has done some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixed the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series ("mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved the performance of zsmalloc during
   compaction ("zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has done some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixed the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Linus Torvalds
1a7c611546 Perf events changes for v6.6:
- AMD IBS improvements
 - Intel PMU driver updates
 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
 - Micro-optimize software events and the ring-buffer code
 - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtBscRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hHoQ/+IBQ8Xi/rcdd40n8OqEB/VBWVuSjNT3uN
 3pHHcTl2Pio9CxBeat42NekNijlRILCKJrZ3Lt3JWBmWyWv5l3KFabelj+lDF2xa
 TVCjTnQNe1+HvrODYnF4ECIs5vaoMVjcJ9jg8+VDgAcOQr1nZs4m5TVAd6TLqPpV
 urBEQVULkkzk7ZRhfrugKhw+wrpWFefgGCx0RV8ijZB7TLMHc2wE+Q/sTxKdKceL
 wNaJaDgV33pZh0aImwR9pKUE532hF1FiBdLuehkh61PZa1L82jzAX1xjw2s1hSa4
 eIWemPHJIYfivRlENbJsDWc4N8gk6ijVHwrxGcr4Axu+NN+zPtQ3ddhaGMAyKdTo
 qUKXH3MZSMIl++jI5Fkc6xM+XLvY1rML62epSzMwu/cc7Z5MeyWdQcri0N9YFuO7
 wUUNnFpU00lwQBLbyyUQ3Zi8E0QV7NuPW4axTkmntiIjMpLagaEvVSf6nf8qLpbE
 WTT16s707t19hUZNazNZ7ONmhly4ALbHFQEH65J2KoYn99fYqy9z68Hwk+xnmykw
 bc3qvfhpw0MImQQ+DqHiBwb4n4UuvY2WlkkZI3FfNeSG63DaM2mZikfpElpXYjn6
 9iOIXvx21Wiq/n0cbLhidI2q/ZzFCzYLCk6ikZ320wb+rhvd7EoSlZil6QSzn3pH
 Qdk+NEZgWQY=
 =ZT6+
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Huacai Chen
656f9aec07 LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
This is a port of commit 379eb01c21 ("riscv: Ensure the value
of FP registers in the core dump file is up to date").

The values of the FP/SIMD registers in the core dump file come from the
thread.fpu. However, the kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, the kernel will not have a chance to save the latest
values of the FP/SIMD registers, so their values in the core dump file may
be incorrect. To solve this problem, force fpr_get()/simd_get() to save
the FP/SIMD registers into the thread.fpu if the target task equals the
current task.

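The idea, in illustrative form (the helper names are assumptions):

  /* Ensure thread.fpu holds the live values before they are copied out
   * for the core dump. */
  static void sync_fpu_for_coredump(struct task_struct *target)
  {
          if (target == current && is_fpu_owner())
                  save_fp(current);       /* spill live FP/SIMD regs to thread.fpu */
  }
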
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-26 22:21:57 +08:00
Tiezhu Yang
c337c849ab LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
The initial aim is to silence the following objtool warning:

arch/loongarch/kernel/process.o: warning: objtool: arch_cpu_idle_dead() falls through to next function start_thread()

According to tools/objtool/Documentation/objtool.txt, this is because the
last instruction of arch_cpu_idle_dead() is a call to the noreturn
function play_dead(). One simple way to silence the warning is to add
play_dead() to objtool's hard-coded global_noreturns array, that is to
say, just put "NORETURN(play_dead)" into tools/objtool/noreturns.h; that
works well.

But I noticed that play_dead() is defined only once and called only by
arch_cpu_idle_dead(). So an alternative is to put the body of play_dead()
into its caller arch_cpu_idle_dead() and then remove the noreturn function
play_dead() entirely, which also avoids the overhead of the function call.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-25 23:40:38 +08:00
Tiezhu Yang
8879515e12 LoongArch: Add identifier names to arguments of die() declaration
Add identifier names to arguments of die() declaration in ptrace.h
to fix the following checkpatch warnings:

  WARNING: function definition argument 'const char *' should also have an identifier name
  WARNING: function definition argument 'struct pt_regs *' should also have an identifier name

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-25 23:40:26 +08:00
Tiezhu Yang
6933c11fb5 LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
If notify_die() returns NOTIFY_STOP, honor the return value from the
handler chain invocation in die() and return without killing the task
as, through a debugger, the fault may have been fixed. It makes sense
even if ignoring the event will make the system unstable: by allowing
access through a debugger it has been compromised already anyway. It
makes our port consistent with x86, arm64, riscv and csky.

Commit 20c0d2d440 ("[PATCH] i386: pass proper trap numbers to die
chain handlers") may be the earliest of similar changes.

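Schematically, die() then behaves like the sketch below (heavily
simplified; locking, oops accounting and the exact signal handling are
omitted):

  void die(const char *str, struct pt_regs *regs)
  {
          int ret = notify_die(DIE_OOPS, str, regs, 0, 0, SIGSEGV);

          /* ... print the oops and dump register state here ... */

          if (ret == NOTIFY_STOP)
                  return;         /* a debugger handled (maybe fixed) the fault */

          make_task_dead(SIGSEGV);
  }
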
Link: https://lore.kernel.org/r/43DDF02E.76F0.0078.0@novell.com/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-25 23:40:26 +08:00
Masahiro Yamada
a746ceb1f3 LoongArch: Remove <asm/export.h>
All *.S files under arch/loongarch/ have been converted to include
<linux/export.h> instead of <asm/export.h>.

Remove <asm/export.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-08-25 23:40:26 +08:00
Matthew Wilcox (Oracle)
203b7b6aad mm: rationalise flush_icache_pages() and flush_icache_page()
Move the default (no-op) implementation of flush_icache_pages() to
<linux/cacheflush.h> from <asm-generic/cacheflush.h>.  Remove the
flush_icache_page() wrapper from each architecture into
<linux/cacheflush.h>.

Link: https://lkml.kernel.org/r/20230802151406.3735276-32-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:25 -07:00
Matthew Wilcox (Oracle)
a6d01af08b loongarch: implement the new page table range API
Add update_mmu_cache_range() and change _PFN_SHIFT to PFN_PTE_SHIFT.  It
would probably be more efficient to implement __update_tlb() by flushing
the entire folio instead of calling __update_tlb() N times, but I'll leave
that for someone who understands the architecture better.

Link: https://lkml.kernel.org/r/20230802151406.3735276-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:21 -07:00
Vishal Moola (Oracle)
382739797f loongarch: convert various functions to use ptdescs
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions.  Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

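For example, a page table allocation in this style looks roughly like the
sketch below (the GFP flags shown are illustrative):

  static pgd_t *alloc_pgd_via_ptdesc(void)
  {
          struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);

          if (!ptdesc)
                  return NULL;
          return (pgd_t *)ptdesc_address(ptdesc);  /* table's virtual address */
  }
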
Link: https://lkml.kernel.org/r/20230807230513.102486-22-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:56 -07:00
Douglas Anderson
8d539b84f1 nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU.  This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean.  This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior.  Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

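A usage sketch of the extended API described above (the hardlockup-detector
framing is illustrative):

  /* Dump all CPUs except the one whose backtrace we already printed. */
  static void dump_remaining_cpus(int already_traced_cpu)
  {
          trigger_allbutcpu_cpu_backtrace(already_traced_cpu);
  }
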
[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:19:00 -07:00
Baoquan He
0b1f77e74b asm-generic/iomap.h: remove ARCH_HAS_IOREMAP_xx macros
Patch series "mm: ioremap: Convert architectures to take GENERIC_IOREMAP
way", v8.

Motivation and implementation:
==============================
Currently, many architectures haven't taken the standard GENERIC_IOREMAP
way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but
implement these functions separately under each arch's folder. That
causes a lot of duplicated ioremap() and iounmap() code.

In this patchset, firstly introduce generic_ioremap_prot() and
generic_iounmap() to extract the generic code for GENERIC_IOREMAP.  By
taking the GENERIC_IOREMAP method, the generic generic_ioremap_prot(),
generic_iounmap(), and their generic wrappers ioremap_prot(), ioremap()
and iounmap() are all visible and available to the arch.  An arch needs to
provide wrapper functions to override the generic version if there's
arch-specific handling in its corresponding ioremap_prot(), ioremap() or
iounmap().  With these changes, the duplicated ioremap()/iounmap() code
under the ARCHes is removed, and the equivalent functionality is kept.

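An arch-side override on top of the generic helpers then looks roughly
like the sketch below (signatures follow the description above; details
may differ per architecture):

  void __iomem *ioremap_prot(phys_addr_t phys_addr, size_t size, unsigned long prot)
  {
          /* Arch-specific handling (e.g. fixed/early mappings) would go here. */
          return generic_ioremap_prot(phys_addr, size, __pgprot(prot));
  }

  void iounmap(volatile void __iomem *addr)
  {
          generic_iounmap(addr);
  }
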
Background info:
================

1) Converting more architectures to the GENERIC_IOREMAP way was
   suggested by Christoph in the discussion below:
   https://lore.kernel.org/all/Yp7h0Jv6vpgt6xdZ@infradead.org/T/#u

2) In the previous v1 to v3, it was basically a follow-up after arm64 was
   converted to the GENERIC_IOREMAP way in the patchset below.  It was
   done by adding the hooks ioremap_allowed() and iounmap_allowed() in the
   ARCH code to add ARCH-specific handling in the middle of ioremap_prot()
   and iounmap().

[PATCH v5 0/6] arm64: Cleanup ioremap() and support ioremap_prot()
https://lore.kernel.org/all/20220607125027.44946-1-wangkefeng.wang@huawei.com/T/#u

Later, during the v3 review, Christophe Leroy suggested introducing
generic_ioremap_prot() and generic_iounmap() into the generic code, with
an ARCH providing the wrapper functions ioremap_prot(), ioremap() or
iounmap() if needed.  Christophe made an RFC patchset (below) to
demonstrate his idea.  This is what v4 and now v5 do.

[RFC PATCH 0/8] mm: ioremap: Convert architectures to take GENERIC_IOREMAP way
https://lore.kernel.org/all/cover.1665568707.git.christophe.leroy@csgroup.eu/T/#u

Testing:
========
In v8, I only applied this patchset onto the latest Linus tree to build
and run on arm64 and s390.


This patch (of 19):

Let's use '#define ioremap_xx' and "#ifdef ioremap_xx" instead.

To remove the ARCH_HAS_IOREMAP_xx macros defined in each ARCH's
<asm/io.h>, the ARCH's own ioremap_wc|wt|np definitions need to be above
"#include <asm-generic/iomap.h>".  Otherwise a redefinition error would be
seen during compilation.  So the relevant adjustments are made to avoid
compile errors:

  loongarch:
  - doesn't include <asm-generic/iomap.h>, defining ARCH_HAS_IOREMAP_WC
    is redundant, so simply remove it.

  m68k:
  - selected GENERIC_IOMAP, <asm-generic/iomap.h> has been added in
    <asm-generic/io.h>, and <asm/kmap.h> is included above
    <asm-generic/iomap.h>, so simply remove ARCH_HAS_IOREMAP_WT defining.

  mips:
  - move "#include <asm-generic/iomap.h>" below ioremap_wc definition
    in <asm/io.h>

  powerpc:
  - remove "#include <asm-generic/iomap.h>" in <asm/io.h> because it's
    duplicated with the one in <asm-generic/io.h>, let's rely on the
    latter.

  x86:
  - selected GENERIC_IOMAP, remove #include <asm-generic/iomap.h> in
    the middle of <asm/io.h>. Let's rely on <asm-generic/io.h>.

Link: https://lkml.kernel.org/r/20230706154520.11257-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Helge Deller <deller@gmx.de>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:32 -07:00
Huacai Chen
1e74ae3280 LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_*
In the current configuration, cpu_has_lsx and cpu_has_lasx cannot be
constants. So clean up the __builtin_constant_p() checking to reduce the
complexity.

Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-07-28 10:30:42 +08:00
Rick Edgecombe
2f0584f3f4 mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.

But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.

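Step one, the generic wrapper, is essentially:

  /* In <linux/pgtable.h>: keep existing callers working while each arch
   * only has to provide pte_mkwrite_novma(). */
  #ifndef pte_mkwrite
  static inline pte_t pte_mkwrite(pte_t pte)
  {
          return pte_mkwrite_novma(pte);
  }
  #endif
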
Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
create a new arch config HAS_HUGE_PAGE that can be used to tell if
pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
compiler would not be able to find pmd_mkwrite_novma().

No functional change.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
2023-07-11 14:10:56 -07:00
Uros Bizjak
d6b45484c1 locking/arch: Avoid variable shadowing in local_try_cmpxchg()
Several architectures define the arch_try_local_cmpxchg macro using
internal temporary variables named ___old, __old or _old. Remove the
temporary variable in local_try_cmpxchg to avoid variable shadowing.

No functional change intended.

Fixes: d994f2c8e2 ("locking/arch: Wire up local_try_cmpxchg()")
Closes: https://lore.kernel.org/lkml/CAFGhKbyxtuk=LoW-E3yLXgcmR93m+Dfo5-u9oQA_YC5Fcy_t9g@mail.gmail.com/
Reported-by: Charlemagne Lasse <charlemagnelasse@gmail.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230708090048.63046-1-ubizjak@gmail.com
2023-07-10 09:52:36 +02:00
Linus Torvalds
7b82e90411 asm-generic updates for 6.5
These are cleanups for architecture specific header files:
 
  - the comments in include/linux/syscalls.h have gone out of sync
    and are really pointless, so these get removed
 
  - The asm/bitsperlong.h header no longer needs to be architecture
    specific on modern compilers, so use a generic version for newer
    architectures that use new enough userspace compilers
 
  - A cleanup for virt_to_pfn/virt_to_bus to have proper type
    checking, forcing the use of pointers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSl138ACgkQYKtH/8kJ
 UieqWxAA2WjNVfyuieYckglOVE0PZPs2fzCwyzTY5iUTH3gE5cBFWJDWcg2EnouG
 v3X3htEQcowYWaCF9+rypQXaGiSx4WXi2Bjxnz3D/BcreqWPI4eSQ0fpGG5SURTY
 2zYF72GTt4JGR++l+7/R9MZwPbwYDT9BsD5tkel8PxnyVLM6/c5xFvbjzRSKFE8x
 SMN1jGZ62ITLNf/8coAOEPNxBYtDT6yQyu7P2sx5cd65LAQq9yLKjFklnBBovgWT
 OoCIZAdGkhcNwOh1LjyHcdNdpfNJGceKyqKPqty07IhCQuF2jxiyFYFzuBbeyQfE
 S0itN8o/MIfUmxaQl3e8dPAVb1RlNVr1zfQ6y4tUtWNdkNL2WwSnSQSRHrBfHxCQ
 QCF++PMeFcLhGwMYtqdNJ7XGLQ0PsjD74pRf0vo+vjmqDk2BJsJBP57VU+8MJn5r
 SoxqnJ0WxLvm1TfrNKusV7zMNWquc2duJDW40zsOssP4itjYELSI6qa56qmzlqmX
 zKmRx6mxAlx9RRK8FHXFYHbz3p93vv8z9vTOZV3AjIjjED960CLknUAwCC8FoJyz
 9b5wyMXsLQHQjGt8luAvPc6OiU0EiU9a4SPK+feWcv27serFvnjJlRTS/yG2Z3zd
 BYsUgsXHypsdoud+aE7MeCy7fE8n3mhoyMQQRBkOMFJ7RsG6wAE=
 =S/he
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "These are cleanups for architecture specific header files:

   - the comments in include/linux/syscalls.h have gone out of sync and
     are really pointless, so these get removed

   - The asm/bitsperlong.h header no longer needs to be architecture
     specific on modern compilers, so use a generic version for newer
     architectures that use new enough userspace compilers

   - A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
     forcing the use of pointers"

* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  syscalls: Remove file path comments from headers
  tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
  asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
  m68k/mm: Make pfn accessors static inlines
  arm64: memory: Make virt_to_pfn() a static inline
  ARM: mm: Make virt_to_pfn() a static inline
  asm-generic/page.h: Make pfn accessors static inlines
  xen/netback: Pass (void *) to virt_to_page()
  netfs: Pass a pointer to virt_to_page()
  cifs: Pass a pointer to virt_to_page() in cifsglob
  cifs: Pass a pointer to virt_to_page()
  riscv: mm: init: Pass a pointer to virt_to_page()
  ARC: init: Pass a pointer to virt_to_pfn() in init
  m68k: Pass a pointer to virt_to_pfn() virt_to_page()
  fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
2023-07-06 10:06:04 -07:00
Linus Torvalds
cccf0c2ee5 Tracing updates for 6.5:
- Add new feature to have function graph tracer record the return value.
   Adds a new option: funcgraph-retval ; when set, will show the return
   value of a function in the function graph tracer.
 
 - Also add the option: funcgraph-retval-hex; if it is not set and the
   return value is an error code, then the decimal value of the error code
   is reported, otherwise it still reports the hex value.
 
 - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
   When an application opens it, it becomes the task that the timerlat
   tracer traces. The application can also read this file to find out how
   it's being interrupted.
 
 - Add the file /sys/kernel/tracing/available_filter_functions_addrs
   that works just the same as available_filter_functions but also shows
   the addresses of the functions like kallsyms, except that it gives the
   address of where the fentry/mcount jump/nop is. This is used by BPF to
   make it easier to attach BPF programs to ftrace hooks.
 
 - Replace strlcpy with strscpy in the tracing boot code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJy6ixQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnzRAPsEI2YgjaJSHnuPoGRHbrNil6pq66wY
 LYaLizGI4Jv9BwEAqdSdcYcMiWo1SFBAO8QxEDM++BX3zrRyVgW8ahaTNgs=
 =TF0C
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add new feature to have function graph tracer record the return
   value. Adds a new option: funcgraph-retval ; when set, will show the
   return value of a function in the function graph tracer.

 - Also add the option: funcgraph-retval-hex; if it is not set and the
   return value is an error code, then the decimal value of the error code
   is reported, otherwise it still reports the hex value.

 - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
   When an application opens it, it becomes the task that the timerlat
   tracer traces. The application can also read this file to find out how
   it's being interrupted.

 - Add the file /sys/kernel/tracing/available_filter_functions_addrs
   that works just the same as available_filter_functions but also shows
   the addresses of the functions like kallsyms, except that it gives
   the address of where the fentry/mcount jump/nop is. This is used by
   BPF to make it easier to attach BPF programs to ftrace hooks.

 - Replace strlcpy with strscpy in the tracing boot code.

* tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warnings when building htmldocs for function graph retval
  riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing/boot: Replace strlcpy with strscpy
  tracing/timerlat: Add user-space interface
  tracing/osnoise: Skip running osnoise if all instances are off
  tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
  ftrace: Show all functions with addresses in available_filter_functions_addrs
  selftests/ftrace: Add funcgraph-retval test case
  LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
  function_graph: Support recording and printing the return value of function
  fgraph: Add declaration of "struct fgraph_ret_regs"
2023-06-30 10:33:17 -07:00
Linus Torvalds
112e7e2151 LoongArch changes for v6.5
1, Preliminary ClangBuiltLinux enablement;
 2, Add support to clone a time namespace;
 3, Add vector extensions support;
 4, Add SMT (Simultaneous Multi-Threading) support;
 5, Support dbar with different hints;
 6, Introduce hardware page table walker;
 7, Add jump-label implementation;
 8, Add rethook and uprobes support;
 9, Some bug fixes and other small changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmSdmI4WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeoeDEACd+KFZnQrX6fwpohuxWgQ46YSk
 bmKRnVCVg6jg/yL99WTHloaubMbyncgNL7YNvCRmuXcTQdjFP1zFb3q3ZqxTkaOT
 Kg9EO4R5H8U2Wolz3uqcvTBbPxv6bwDor1gBWzVo8RTO4S7gYt5tLS7pvLiYPWzp
 Jhgko2AHE/Y02Qqg00ARLIzDDLMm9vR5Gdmpj2jhl8wMHMNaqMW5E0r7XaiFoav0
 G1PjIAk+0LIj9QHYUm5e0kcXHh0KzgUMG0LUDawbAanZn2r1KhAk0ouwXX/eu9J7
 NQQRDi1Z02pTI5X3V1VAf+0O7RGnGGnWb/r2K76nr7lZNp88RJfbLtgBM01pzw2U
 NTnIAP7cAomNDaBglzAuS17yeFTS953nxaQlR8/t3UefP+fHmiIJyOOlxrXFMwVM
 jOfW4JAIkcl5DD/8l9lU1t91+zyKMrjsv8IrlGPW1sLsr/UOQjBajdHwnH8veC41
 mL+xjiMb51g33JDRDAl6mloxXms9LvcNzbnKSwqVO1i23GkN1iHJmbq66Teut07C
 7AH2qM3tSAuiXmNF1ntvodK1ukLS8moOiP8ZuuHfS13rr7gxPyRAvYdBiDaNEhF2
 gCYYpIcaly+5rSf6wvDXgl/s3tS4o07AzDfpttyH5jYnY80nVj7CgS8vUU/mg1lW
 QpapBnBHU8wrz+eBqQ==
 =3gt3
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - preliminary ClangBuiltLinux enablement

 - add support to clone a time namespace

 - add vector extensions support

 - add SMT (Simultaneous Multi-Threading) support

 - support dbar with different hints

 - introduce hardware page table walker

 - add jump-label implementation

 - add rethook and uprobes support

 - some bug fixes and other small changes

* tag 'loongarch-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (28 commits)
  LoongArch: Remove five DIE_* definitions in kdebug.h
  LoongArch: Add uprobes support
  LoongArch: Use larch_insn_gen_break() for kprobes
  LoongArch: Add larch_insn_gen_break() to generate break insns
  LoongArch: Check for AMO instructions in insns_not_supported()
  LoongArch: Move three functions from kprobes.c to inst.c
  LoongArch: Replace kretprobe with rethook
  LoongArch: Add jump-label implementation
  LoongArch: Select HAVE_DEBUG_KMEMLEAK to support kmemleak
  LoongArch: Export some arch-specific pm interfaces
  LoongArch: Introduce hardware page table walker
  LoongArch: Support dbar with different hints
  LoongArch: Add SMT (Simultaneous Multi-Threading) support
  LoongArch: Add vector extensions support
  LoongArch: Add support to clone a time namespace
  Makefile: Add loongarch target flag for Clang compilation
  LoongArch: Mark Clang LTO as working
  LoongArch: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation
  LoongArch: vDSO: Use CLANG_FLAGS instead of filtering out '--target='
  LoongArch: Tweak CFLAGS for Clang compatibility
  ...
2023-06-30 08:52:28 -07:00
Linus Torvalds
1b722407a1 drm changes for 6.5-rc1:
core:
 - replace strlcpy with strscpy
 - EDID changes to support further conversion to struct drm_edid
 - Move i915 DSC parameter code to common DRM helpers
 - Add Colorspace functionality
 
 aperture:
 - ignore framebuffers with non-primary devices
 
 fbdev:
 - use fbdev i/o helpers
 - add Kconfig options for fb_ops helpers
 - use new fb io helpers directly in drivers
 
 sysfs:
 - export DRM connector ID
 
 scheduler:
 - Avoid an infinite loop
 
 ttm:
 - store function table in .rodata
 - Add query for TTM mem limit
 - Add NUMA awareness to pools
 - Export ttm_pool_fini()
 
 bridge:
 - fsl-ldb: support i.MX6SX
 - lt9211, lt9611: remove blanking packets
 - tc358768: implement input bus formats, devm cleanups
 - ti-snd65dsi86: implement wait_hpd_asserted
 - analogix: fix endless probe loop
 - samsung-dsim: support swapped clock, fix enabling, support var clock
 - display-connector: Add support for external power supply
 - imx: Fix module linking
 - tc358762: Support reset GPIO
 
 panel:
 - nt36523: Support Lenovo J606F
 - st7703: Support Anbernic RG353V-V2
 - InnoLux G070ACE-L01 support
 - boe-tv101wum-nl6: Improve initialization
 - sharp-ls043t1le001: Mode fixes
 - simple: BOE EV121WXM-N10-1850, S6D7AA0
 - Ampire AM-800480L1TMQW-T00H
 - Rocktech RK043FN48H
 - Starry himax83102-j02
 - Starry ili9882t
 
 amdgpu:
 - add new ctx query flag to handle reset better
 - add new query/set shadow buffer for rdna3
 - DCN 3.2/3.1.x/3.0.x updates
 - Enable DC_FP on loongarch
 - PCIe fix for RDNA2
 - improve DC FAMS/SubVP support for better power management
 - partition support for lots of engines
 - Take NUMA into account when allocating memory
 - Add new DRM_AMDGPU_WERROR config parameter to help with CI
 - Initial SMU13 overdrive support
 - Add support for new colorspace KMS API
 - W=1 fixes
 
 amdkfd:
 - Query TTM mem limit rather than hardcoding it
 - GC 9.4.3 partition support
 - Handle NUMA for partitions
 - Add debugger interface for enabling gdb
 - Add KFD event age tracking
 
 radeon:
 - Fix possible UAF
 
 i915:
 - new getparam for PXP support
 - GSC/MEI proxy driver
 - Meteorlake display enablement
 - avoid clearing preallocated framebuffers with TTM
 - implement framebuffer mmap support
 - Disable sampler indirect state in bindless heap
 - Enable fdinfo for GuC backends
 - GuC loading and firmware table handling fixes
 - Various refactors for multi-tile enablement
 - Define MOCS and PAT tables for MTL
 - GSC/MEI support for Meteorlake
 - PMU multi-tile support
 - Large driver kernel doc cleanup
 - Allow VRR toggling and arbitrary refresh rates
 - Support async flips on linear buffers on display ver 12+
 - Expose CRTC CTM property on ILK/SNB/VLV
 - New debugfs for display clock frequencies
 - Hotplug refactoring
 - Display refactoring
 - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
 - Use large rings for compute contexts
 - HuC loading for MTL
 - Allow user to set cache at BO creation
 - MTL powermanagement enhancements
 - Switch to dedicated workqueues to stop using flush_scheduled_work()
 - Move display runtime init under display/
 - Remove 10bit gamma on desktop gen3 parts, they don't support it
 
 habanalabs:
 - uapi: return 0 for user queries if there was a h/w or f/w error
 - Add pci health check when we lose connection with the firmware. This can be used to
   distinguish between pci link down and firmware getting stuck.
 - Add more info to the error print when a TPC interrupt occurs.
 - Firmware fixes
 
 msm:
 - Adreno A660 bindings
 - SM8350 MDSS bindings fix
 - Added support for DPU on sm6350 and sm6375 platforms
 - Implemented tearcheck support to support vsync on SM150 and newer platforms
 - Enabled missing features (DSPP, DSC, split display) on sc8180x, sc8280xp, sm8450
 - Added support for DSI and 28nm DSI PHY on MSM8226 platform
 - Added support for DSI on sm6350 and sm6375 platforms
 - Added support for display controller on MSM8226 platform
 - A690 GPU support
 - Move cmdstream dumping out of fence signaling path
 - a610 support
 - Support for a6xx devices without GMU
 
 nouveau:
 - NULL ptr before deref fixes
 
 armada:
 - implement fbdev emulation as client
 
 sun4i:
 - fix mipi-dsi dotclock
 - release clocks
 
 vc4:
 - rgb range toggle property
 - BT601 / BT2020 HDMI support
 
 vkms:
 - convert to drmm helpers
 - add reflection and rotation support
 - fix rgb565 conversion
 
 gma500:
 - fix iomem access
 
 shmobile:
 - support renesas soc platform
 - enable fbdev
 
 mxsfb:
 - Add support for i.MX93 LCDIF
 
 stm:
 - dsi: Use devm_ helper
 - ltdc: Fix potential invalid pointer deref
 
 renesas:
 - Group drivers in renesas subdirectory to prepare for new platform
 - Drop deprecated R-Car H3 ES1.x support
 
 meson:
 - Add support for MIPI DSI displays
 
 virtio:
 - add sync object support
 
 mediatek:
 - Add display binding document for MT6795
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmSc3UwACgkQDHTzWXnE
 hr69fQ/+PF9L7FSB/qfjaoqJnk6wJyCehv7pDX2/UK7FUrW0e4EwNVx4KKIRqO/P
 pKSU9wRlC72ViGgqOYnw0pwzuh45630vWo1stbgxipU2cvM6Ywlq8FiQFdymFe+P
 tLYWe5MR55Y+E9Y+bCrKn2yvQ7v+f6EZ6ITIX7mrXL77Bpxhv58VzmZawkxmw5MV
 vwhSqJaaeeWNoyfSIDdN8Oj9fE6ScTyiA0YisOP6jnK/TiQofXQxFrMIdKctCcoA
 HjolfEEPVCDOSBipkV3hLiyN8lXmt47BmuHp9opSL/g1aASteVeD1/GrccTaA4xV
 ah+Jx1hBLcH5sm8CZzbCcHhNu3ILnPCFZFCx8gwflQqmDIOZvoMdL75j7lgqJZG8
 TePEiifG3kYO/ZiDc5TUBdeMfbgeehPOsxbvOlA3LxJrgyxe/5o9oejX2Uvvzhoq
 9fno1PLqeCILqYaMiCocJwyTw/2VKYCCH7Wiypd4o3h0nmAbbqPT3KeZgNOjoa2X
 GXpiIU9rTQ8LZgSmOXdCt2rc9Jb6q+eCiDgrZzAukbP8veQyOvO16Nx1+XzLhOYc
 BfjEOoA7nBJD+UPLWkwj42gKtoEWN7IOMTHgcK11d8jdpGISGupl/1nntGhYk0jO
 +3RRZXMB/Gjwe9ge4K9bFC81pbfuAE7ELQtPsgV9LapMmWHKccY=
 =FmUA
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "There is one set of patches to misc for a i915 gsc/mei proxy driver.

  Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
  of refactoring.

  core:
   - replace strlcpy with strscpy
   - EDID changes to support further conversion to struct drm_edid
   - Move i915 DSC parameter code to common DRM helpers
   - Add Colorspace functionality

  aperture:
   - ignore framebuffers with non-primary devices

  fbdev:
   - use fbdev i/o helpers
   - add Kconfig options for fb_ops helpers
   - use new fb io helpers directly in drivers

  sysfs:
   - export DRM connector ID

  scheduler:
   - Avoid an infinite loop

  ttm:
   - store function table in .rodata
   - Add query for TTM mem limit
   - Add NUMA awareness to pools
   - Export ttm_pool_fini()

  bridge:
   - fsl-ldb: support i.MX6SX
   - lt9211, lt9611: remove blanking packets
   - tc358768: implement input bus formats, devm cleanups
   - ti-snd65dsi86: implement wait_hpd_asserted
   - analogix: fix endless probe loop
   - samsung-dsim: support swapped clock, fix enabling, support var
     clock
   - display-connector: Add support for external power supply
   - imx: Fix module linking
   - tc358762: Support reset GPIO

  panel:
   - nt36523: Support Lenovo J606F
   - st7703: Support Anbernic RG353V-V2
   - InnoLux G070ACE-L01 support
   - boe-tv101wum-nl6: Improve initialization
   - sharp-ls043t1le001: Mode fixes
   - simple: BOE EV121WXM-N10-1850, S6D7AA0
   - Ampire AM-800480L1TMQW-T00H
   - Rocktech RK043FN48H
   - Starry himax83102-j02
   - Starry ili9882t

  amdgpu:
   - add new ctx query flag to handle reset better
   - add new query/set shadow buffer for rdna3
   - DCN 3.2/3.1.x/3.0.x updates
   - Enable DC_FP on loongarch
   - PCIe fix for RDNA2
   - improve DC FAMS/SubVP support for better power management
   - partition support for lots of engines
   - Take NUMA into account when allocating memory
   - Add new DRM_AMDGPU_WERROR config parameter to help with CI
   - Initial SMU13 overdrive support
   - Add support for new colorspace KMS API
   - W=1 fixes

  amdkfd:
   - Query TTM mem limit rather than hardcoding it
   - GC 9.4.3 partition support
   - Handle NUMA for partitions
   - Add debugger interface for enabling gdb
   - Add KFD event age tracking

  radeon:
   - Fix possible UAF

  i915:
   - new getparam for PXP support
   - GSC/MEI proxy driver
   - Meteorlake display enablement
   - avoid clearing preallocated framebuffers with TTM
   - implement framebuffer mmap support
   - Disable sampler indirect state in bindless heap
   - Enable fdinfo for GuC backends
   - GuC loading and firmware table handling fixes
   - Various refactors for multi-tile enablement
   - Define MOCS and PAT tables for MTL
   - GSC/MEI support for Meteorlake
   - PMU multi-tile support
   - Large driver kernel doc cleanup
   - Allow VRR toggling and arbitrary refresh rates
   - Support async flips on linear buffers on display ver 12+
   - Expose CRTC CTM property on ILK/SNB/VLV
   - New debugfs for display clock frequencies
   - Hotplug refactoring
   - Display refactoring
   - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
   - Use large rings for compute contexts
   - HuC loading for MTL
   - Allow user to set cache at BO creation
   - MTL powermanagement enhancements
   - Switch to dedicated workqueues to stop using flush_scheduled_work()
   - Move display runtime init under display/
   - Remove 10bit gamma on desktop gen3 parts, they don't support it

  habanalabs:
   - uapi: return 0 for user queries if there was a h/w or f/w error
   - Add pci health check when we lose connection with the firmware.
     This can be used to distinguish between pci link down and firmware
     getting stuck.
   - Add more info to the error print when a TPC interrupt occurs.
   - Firmware fixes

  msm:
   - Adreno A660 bindings
   - SM8350 MDSS bindings fix
   - Added support for DPU on sm6350 and sm6375 platforms
   - Implemented tearcheck support to support vsync on SM150 and newer
     platforms
   - Enabled missing features (DSPP, DSC, split display) on sc8180x,
     sc8280xp, sm8450
   - Added support for DSI and 28nm DSI PHY on MSM8226 platform
   - Added support for DSI on sm6350 and sm6375 platforms
   - Added support for display controller on MSM8226 platform
   - A690 GPU support
   - Move cmdstream dumping out of fence signaling path
   - a610 support
   - Support for a6xx devices without GMU

  nouveau:
   - NULL ptr before deref fixes

  armada:
   - implement fbdev emulation as client

  sun4i:
   - fix mipi-dsi dotclock
   - release clocks

  vc4:
   - rgb range toggle property
   - BT601 / BT2020 HDMI support

  vkms:
   - convert to drmm helpers
   - add reflection and rotation support
   - fix rgb565 conversion

  gma500:
   - fix iomem access

  shmobile:
   - support renesas soc platform
   - enable fbdev

  mxsfb:
   - Add support for i.MX93 LCDIF

  stm:
   - dsi: Use devm_ helper
   - ltdc: Fix potential invalid pointer deref

  renesas:
   - Group drivers in renesas subdirectory to prepare for new platform
   - Drop deprecated R-Car H3 ES1.x support

  meson:
   - Add support for MIPI DSI displays

  virtio:
   - add sync object support

  mediatek:
   - Add display binding document for MT6795"

* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
  drm/i915: Fix a NULL vs IS_ERR() bug
  drm/i915: make i915_drm_client_fdinfo() reference conditional again
  drm/i915/huc: Fix missing error code in intel_huc_init()
  drm/i915/gsc: take a wakeref for the proxy-init-completion check
  drm/msm/a6xx: Add A610 speedbin support
  drm/msm/a6xx: Add A619_holi speedbin support
  drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  drm/msm/a6xx: Fix some A619 tunables
  drm/msm/a6xx: Add A610 support
  drm/msm/a6xx: Add support for A619_holi
  drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  drm/msm/a6xx: Introduce GMU wrapper support
  drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
  drm/msm/a6xx: Extend and explain UBWC config
  drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  drm/msm/a6xx: Add a helper for software-resetting the GPU
  drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
  drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
  drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
  ...
2023-06-29 11:00:17 -07:00
Tiezhu Yang
5ee35c7696 LoongArch: Remove five DIE_* definitions in kdebug.h
For now, DIE_PAGE_FAULT, DIE_BREAK, DIE_SSTEPBP, DIE_UPROBE and
DIE_UPROBE_XOL are not used by any code, remove them.

Tested-by: Jeff Xie <xiehuan09@gmail.com>
Suggested-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Tiezhu Yang
19bc6cb640 LoongArch: Add uprobes support
Uprobes is the user-space counterpart to kprobes; this patch adds
uprobes support for LoongArch.

Here is a simple example with CONFIG_UPROBE_EVENTS=y:

  # cat test.c
  #include <stdio.h>

  int add(int a, int b)
  {
  	  return a + b;
  }

  int main()
  {
	  return add(2, 7);
  }
  # gcc test.c -o /tmp/test
  # nm /tmp/test | grep add
  0000000120004194 T add
  # cd /sys/kernel/debug/tracing
  # echo > uprobe_events
  # echo "p:myuprobe /tmp/test:0x4194 %r4 %r5" > uprobe_events
  # echo "r:myuretprobe /tmp/test:0x4194 %r4" >> uprobe_events
  # echo 1 > events/uprobes/enable
  # echo 1 > tracing_on
  # /tmp/test
  # cat trace
  ...
  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
  #              | |         |   |||||     |         |
              test-1060    [001] DNZff  1015.770620: myuprobe: (0x120004194) arg1=0x2 arg2=0x7
              test-1060    [001] DNZff  1015.770930: myuretprobe: (0x1200041f0 <- 0x120004194) arg1=0x9

Tested-by: Jeff Xie <xiehuan09@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Tiezhu Yang
6e32036333 LoongArch: Use larch_insn_gen_break() for kprobes
For now, we can use larch_insn_gen_break() to define KPROBE_BP_INSN and
KPROBE_SSTEPBP_INSN. Because larch_insn_gen_break() returns an
instruction word, define kprobe_opcode_t as u32 and make some small
changes related to type conversion; no functional change intended.

Tested-by: Jeff Xie <xiehuan09@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Tiezhu Yang
49ed320da5 LoongArch: Add larch_insn_gen_break() to generate break insns
There exist various break insns such as BRK_KPROBE_BP, BRK_KPROBE_SSTEPBP,
BRK_UPROBE_BP and BRK_UPROBE_XOLBP. Add larch_insn_gen_break() to make
generating break insns simpler; this is preparation for a later patch.
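
As a rough, self-contained illustration of the idea (the break encoding
and code values below are assumptions of this sketch, not taken from the
arch headers), such a helper just packs a software break code into a
fixed opcode, and the probe constants become calls to it:

  #include <stdint.h>
  #include <stdio.h>

  /* Assumed encoding: fixed major opcode + 15-bit code in the low bits. */
  #define BREAK_MAJOR_OPCODE   0x002a0000u   /* assumption */
  #define BRK_KPROBE_BP        10            /* demo value  */
  #define BRK_KPROBE_SSTEPBP   11            /* demo value  */

  static uint32_t insn_gen_break(uint32_t imm15)
  {
          /* Pack the software break code into the immediate field. */
          return BREAK_MAJOR_OPCODE | (imm15 & 0x7fff);
  }

  int main(void)
  {
          /* kprobes/uprobes then define their breakpoint words via it. */
          printf("KPROBE_BP_INSN      = 0x%08x\n", insn_gen_break(BRK_KPROBE_BP));
          printf("KPROBE_SSTEPBP_INSN = 0x%08x\n", insn_gen_break(BRK_KPROBE_SSTEPBP));
          return 0;
  }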

Tested-by: Jeff Xie <xiehuan09@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Tiezhu Yang
b82fad4d5d LoongArch: Check for AMO instructions in insns_not_supported()
Like llsc instructions, the atomic memory access instructions shouldn't
be supported for probing, so check for them in insns_not_supported().

Closes: https://lore.kernel.org/all/SY4P282MB351877A70A0333C790FE85A5C09C9@SY4P282MB3518.AUSP282.PROD.OUTLOOK.COM/
Tested-by: Jeff Xie <xiehuan09@gmail.com>
Reported-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Tiezhu Yang
3d2c3daf82 LoongArch: Move three functions from kprobes.c to inst.c
The three functions insns_not_supported(), insns_need_simulation() and
arch_simulate_insn() will be used for uprobes. Move them from kprobes.c
to inst.c as preparation for a later patch; no functional change.

Tested-by: Jeff Xie <xiehuan09@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Haoran Jiang
7b0a096436 LoongArch: Replace kretprobe with rethook
This is an adaptation of commit f3a112c0c4 ("x86,rethook,kprobes:
Replace kretprobe with rethook on x86") and commit b57c2f1240 ("riscv:
add riscv rethook implementation") to LoongArch, mainly following commit
b57c2f1240 ("riscv: add riscv rethook implementation").

Replace the kretprobe code with rethook on LoongArch. With this patch,
kretprobe on LoongArch uses rethook instead of kretprobe-specific
trampoline code.

Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Youling Tang
f02644e32c LoongArch: Add jump-label implementation
Add support for jump labels based on the ARM64 version.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Yinbo Zhu
31f1a8b0ec LoongArch: Export some arch-specific pm interfaces
Some PMCs (Power Management Controllers) need DTS support and will use
the suspend interfaces, so this patch exports such interfaces for their
use.

Signed-off-by: Yinbo Zhu <zhuyinbo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Huacai Chen
01158487af LoongArch: Introduce hardware page table walker
Loongson-3A6000 and newer processors have hardware page table walker
(PTW) support. PTW can handle all fastpaths of TLBI/TLBL/TLBS/TLBM
exceptions in hardware; software only needs to handle the slowpaths
(page faults).

BTW, PTW doesn't set _PAGE_MODIFIED in page table entries, so we change
pmd_dirty() and pte_dirty() to also check _PAGE_DIRTY for the "dirty"
attribute.

Signed-off-by: Liang Gao <gaoliang@loongson.cn>
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Huacai Chen
e031a5f3f1 LoongArch: Support dbar with different hints
Traditionally, LoongArch uses "dbar 0" (full completion barrier) for
everything. But the full completion barrier is a performance killer, so
Loongson-3A6000 and newer processors have made finer granularity hints
available:

Bit4: ordering or completion (0: completion, 1: ordering)
Bit3: barrier for previous read (0: true, 1: false)
Bit2: barrier for previous write (0: true, 1: false)
Bit1: barrier for succeeding read (0: true, 1: false)
Bit0: barrier for succeeding write (0: true, 1: false)

Hint 0x700: barrier for "read after read" from the same address, which
is needed by LL-SC loops on old models (dbar 0x700 behaves the same as
nop if such reordering is disabled on new models).

This patch makes use of the various new hints for different kinds of
memory barriers. It brings performance improvements on the
Loongson-3A6000 series, while not affecting existing models because all
variants are treated as 'dbar 0' there.
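
A small sketch of how a hint value is composed from the bits listed
above (the example values are illustrative; they are not claimed to be
the exact hints this patch uses):

  #include <stdint.h>
  #include <stdio.h>

  /* A set bit in [3:0] means "skip the barrier" for that direction;
   * bit 4 selects ordering instead of completion. */
  #define DBAR_ORDERING        (1u << 4)
  #define DBAR_SKIP_PREV_READ  (1u << 3)
  #define DBAR_SKIP_PREV_WRITE (1u << 2)
  #define DBAR_SKIP_SUCC_READ  (1u << 1)
  #define DBAR_SKIP_SUCC_WRITE (1u << 0)

  int main(void)
  {
          uint32_t full = 0;  /* "dbar 0": completion, all directions */

          /* Ordering barrier affecting only reads (writes skipped). */
          uint32_t rmb = DBAR_ORDERING | DBAR_SKIP_PREV_WRITE | DBAR_SKIP_SUCC_WRITE;

          /* Ordering barrier affecting only writes (reads skipped). */
          uint32_t wmb = DBAR_ORDERING | DBAR_SKIP_PREV_READ | DBAR_SKIP_SUCC_READ;

          printf("full barrier hint: 0x%02x\n", full);
          printf("rmb-style hint:    0x%02x\n", rmb);
          printf("wmb-style hint:    0x%02x\n", wmb);
          return 0;
  }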

Why override queued_spin_unlock()?
After commit 01e3b958ef ("drivers: Remove explicit invocations
of mmiowb()") we need a completion barrier in queued_spin_unlock(), but
the generic implementation uses smp_store_release(), which only provides
an ordering barrier.

Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:44 +08:00
Huacai Chen
f6f0c9a74a LoongArch: Add SMT (Simultaneous Multi-Threading) support
Loongson-3A6000 has SMT (Simultaneous Multi-Threading) support, each
physical core has two logical cores (threads). This patch adds SMT
probing and scheduler support via ACPI PPTT.

If SCHED_SMT enabled, Loongson-3A6000 is treated as 4 cores, 8 threads;
If SCHED_SMT disabled, Loongson-3A6000 is treated as 8 cores, 8 threads.

Remove smp_num_siblings to support HMP (Heterogeneous Multi-Processing).

Signed-off-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
Huacai Chen
616500232e LoongArch: Add vector extensions support
Add LoongArch's vector extensions support, including the 128-bit LSX
(Loongson SIMD eXtension) and the 256-bit LASX (Loongson Advanced SIMD
eXtension).

The Linux kernel doesn't use vector instructions itself; it only handles
exceptions and context save/restore, so it needs only a subset of these
instructions:

* Vector load/store:   vld vst vldx vstx xvld xvst xvldx xvstx
* 8bit-elements move:  vpickve2gr.b xvpickve2gr.b vinsgr2vr.b xvinsgr2vr.b
* 16bit-elements move: vpickve2gr.h xvpickve2gr.h vinsgr2vr.h xvinsgr2vr.h
* 32bit-elements move: vpickve2gr.w xvpickve2gr.w vinsgr2vr.w xvinsgr2vr.w
* 64bit-elements move: vpickve2gr.d xvpickve2gr.d vinsgr2vr.d xvinsgr2vr.d
* Elements permute:    vpermi.w vpermi.d xvpermi.w xvpermi.d xvpermi.q

Introduce AS_HAS_LSX_EXTENSION and AS_HAS_LASX_EXTENSION to avoid
non-vector toolchains complaining about unsupported instructions.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
Tiezhu Yang
aa5e65dc08 LoongArch: Add support to clone a time namespace
We can see that "Time namespaces are not supported" on LoongArch:

(1) clone3 test
  # cd tools/testing/selftests/clone3 && make && ./clone3
  ...
  # Time namespaces are not supported
  ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
  # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0

(2) timens test
  # cd tools/testing/selftests/timens && make && ./timens
  ...
  1..0 # SKIP Time namespaces are not supported

On LoongArch the current kernel does not support CONFIG_TIME_NS, which
depends on GENERIC_VDSO_TIME_NS, so select GENERIC_VDSO_TIME_NS to
enable CONFIG_TIME_NS and build kernel/time/namespace.c.

Additionally, it needs to define some arch-dependent functions for the
timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
vdso_join_timens().

At the same time, modify the layout of vvar to use one page for the
generic vdso data, add another page for the timens vdso data, and assign
LOONGARCH_VDSO_DATA_SIZE (which may exceed a page if expanded in the
future) for the LoongArch vdso data; finally, add the callback function
vvar_fault() and modify stack_top().

With this patch under CONFIG_TIME_NS:

(1) clone3 test
  # cd tools/testing/selftests/clone3 && make && ./clone3
  ...
  ok 18 [739] Result (0) matches expectation (0)
  # Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

(2) timens test
  # cd tools/testing/selftests/timens && make && ./timens
  ...
  # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Xuerui
38b10b269d LoongArch: Tweak CFLAGS for Clang compatibility
Now that the arch code is mostly ready for LLVM/Clang consumption, it is
time to re-organize the CFLAGS a little to actually enable the LLVM
build.
Namely, all -G0 switches from CFLAGS are removed, and -mexplicit-relocs
and -mdirect-extern-access are now wrapped with cc-option (with the
related asm/percpu.h definition guarded against toolchain combos that
are known to not work).

A build with !RELOCATABLE && !MODULE is confirmed working within a QEMU
environment; support for the two features is currently blocked on
LLVM/Clang and will come later.

Why -G0 can be removed:

In GCC, -G stands for "small data threshold", that instructs the
compiler to put data smaller than the specified threshold in a dedicated
"small data" section (called .sdata on LoongArch and several other
arches).

However, benefiting from this would require ABI cooperation, which is
not the case for LoongArch; and current GCC behaves the same whether -G0
(equivalent to disabling this optimization) is given or not. So, remove -G0
from CFLAGS altogether for one less thing to care about. This also
benefits LLVM/Clang compatibility where the -G switch is not supported.

Why -mexplicit-relocs can now be conditionally applied without
regressions:

Originally -mexplicit-relocs was unconditionally added to CFLAGS in case
of CONFIG_AS_HAS_EXPLICIT_RELOCS, because not having it (i.e. old GCC +
new binutils) would not work: modules would have R_LARCH_ABS_* relocs
inside. But given the rarity of such a toolchain combo in the wild, it
may not be worthwhile to support it, so support for such relocs in
modules was not added back when explicit relocs support was upstreamed,
and -mexplicit-relocs was unconditionally added to fail the build early.

Now that Clang compatibility is desired, given Clang is behaving like
-mexplicit-relocs from day one but without support for the CLI flag, we
must ensure the flag is not passed in case of Clang. However, explicit
compiler flavor checks can be more brittle than feature detection: in
this case what actually matters is support for __attribute__((model))
when building modules. Given neither older GCC nor current Clang support
this attribute, probing for the attribute support and #error'ing out
would allow proper UX without checking for Clang, and would also
automatically work when Clang support for the attribute is added in the
future.
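
A sketch of such a probe (using the __has_attribute preprocessor check,
which the supported GCC and Clang versions provide; in the real header
the unsupported case would be a hard #error rather than a printout):

  #include <stdio.h>

  #ifndef __has_attribute
  #define __has_attribute(x) 0
  #endif

  #if __has_attribute(model)
  #define HAVE_ATTR_MODEL 1
  #else
  #define HAVE_ATTR_MODEL 0   /* a kernel header would #error out here */
  #endif

  int main(void)
  {
          printf("__attribute__((model)) supported: %s\n",
                 HAVE_ATTR_MODEL ? "yes" : "no");
          return 0;
  }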

Why -mdirect-extern-access is now conditionally applied:

This is actually a nice-to-have optimization that can reduce GOT
accesses, but not having it is harmless. Because Clang does not
support the option currently, but might do so in the future, conditional
application via cc-option ensures compatibility with both current and
future Clang versions.

Suggested-by: Xi Ruoyao <xry111@xry111.site> # cc-option changes
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Xuerui
83d8b38967 LoongArch: Simplify the invtlb wrappers
The invtlb instruction has been supported by upstream LoongArch
toolchains from day one, so ditch the raw opcode trickery and just use
plain inline asm for it.

While at it, also make the invtlb asm statements barriers, for proper
modeling of the side effects. The functions are also marked as
__always_inline instead of just "inline", because they cannot work at
all if not inlined: the op argument will not be compile-time const in
that case, thus failing to satisfy the "i" constraint.
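
A minimal sketch of that pattern (compile-only, a LoongArch toolchain is
assumed; the wrapper name and operand choice are illustrative, not this
commit's code):

  #define __always_inline inline __attribute__((__always_inline__))

  /* The "i" constraint requires `op` to be a compile-time constant,
   * which is only guaranteed when the helper is always inlined. */
  static __always_inline void invtlb_all_sketch(unsigned int op)
  {
          __asm__ __volatile__("invtlb %0, $zero, $zero"
                               : /* no outputs */
                               : "i"(op)
                               : "memory");  /* model the side effects */
  }

  static void flush_all_sketch(void)
  {
          invtlb_all_sketch(0x0);  /* op must be a literal/constant */
  }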

The signature of the other more specific invtlb wrappers contain unused
arguments right now, but these are not removed right away in order for
the patch to be focused. In the meantime, assertions are added to ensure
no accidental misuse happens before the refactor. (The more specific
wrappers cannot re-use the generic invtlb wrapper, because the ISA
manual says $zero shall be used in case a particular op does not take
the respective argument: re-using the generic wrapper would mean losing
control over the register usage.)

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Xuerui
53a4858ccd LoongArch: Make the CPUCFG&CSR ops simple aliases of compiler built-ins
In addition to less visual clutter, this also makes Clang happy
regarding the const-ness of arguments. In the original approach, all
Clang gets to see is the incoming arguments whose const-ness cannot be
proven without first being inlined; so Clang errors out here while GCC
is fine.

While at it, tweak several printk format strings because the return type
of csr_read64 becomes effectively unsigned long, instead of unsigned
long long.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Xuerui
38bb46f945 LoongArch: Prepare for assemblers with proper FCSR class support
The GNU assembler (as of 2.40) mis-treats FCSR operands as GPRs, but
the LLVM IAS does not. Probe for this and refer to FCSRs as "$fcsrNN"
if support is present.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Rui
24da0249d9 LoongArch: extable: Also recognize ABI names of registers
When the kernel is compiled with LLVM, the register names being handled
during exception fixup building are ABI names instead of bare $rNN
style. Add mapping for the ABI names for LLVM compatibility.
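
For reference, a small illustrative subset of that mapping (ABI name to
bare $rNN form); only a few entries are shown, and the table below is a
sketch, not the kernel's actual fixup code:

  #include <stdio.h>
  #include <string.h>

  static const struct { const char *abi; const char *raw; } reg_names[] = {
          { "$zero", "$r0" }, { "$ra", "$r1" }, { "$tp", "$r2" },
          { "$sp",   "$r3" }, { "$a0", "$r4" }, { "$a1", "$r5" },
          { "$t0",  "$r12" }, { "$fp", "$r22" }, { "$s0", "$r23" },
  };

  static const char *abi_to_raw(const char *name)
  {
          for (size_t i = 0; i < sizeof(reg_names) / sizeof(reg_names[0]); i++)
                  if (!strcmp(name, reg_names[i].abi))
                          return reg_names[i].raw;
          return name;  /* already bare $rNN (or unknown) */
  }

  int main(void)
  {
          printf("$a0 -> %s\n", abi_to_raw("$a0"));
          printf("$r4 -> %s\n", abi_to_raw("$r4"));
          return 0;
  }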

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:43 +08:00
WANG Rui
0d03e9dce5 LoongArch: Add guard for the larch_insn_gen_xxx functions
Add guard for the larch_insn_gen_xxx functions to verify whether the
immediate operand is within the acceptable range.
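
A sketch of the kind of range check such a guard performs (names and
field widths here are illustrative):

  #include <stdbool.h>
  #include <stdio.h>

  /* Reject an immediate that does not fit the encoding's signed field. */
  static bool signed_imm_fits(long val, unsigned int bits)
  {
          long lo = -(1L << (bits - 1));
          long hi = (1L << (bits - 1)) - 1;

          return val >= lo && val <= hi;
  }

  int main(void)
  {
          printf("32767 fits in s16:  %d\n", signed_imm_fits(32767, 16));
          printf("40000 fits in s16:  %d\n", signed_imm_fits(40000, 16));
          printf("-32768 fits in s16: %d\n", signed_imm_fits(-32768, 16));
          return 0;
  }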

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-29 20:58:42 +08:00
Linus Torvalds
bc6cb4d5bc Locking changes for v6.5:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double().
 
   The cmpxchg128() family of functions is basically & functionally
   the same as cmpxchg_double(), but with a saner interface: instead
   of a 6-parameter horror that forced u128 - u64/u64-halves layout
   details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128 types.
 
 - Restructure the generated atomic headers, and add
   kerneldoc comments for all of the generic atomic{,64,_long}_t
   operations. Generated definitions are much cleaner now,
   and come with documentation.
 
 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering
   when taking multiple locks of the same type. This gets rid of
   one use of lockdep_set_novalidate_class() in the bcache code.
 
 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended
   variable shadowing generating garbage code on Clang on certain
   ARM builds.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSav3wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gDyxAAjCHQjpolrre7fRpyiTDwqzIKT27H04vQ
 zrQVlVc42WBnn9pe8LthGy43/RvYvqlZvLoLONA4fMkuYriM6nSMsoZjeUmE+6Rs
 QAElQC74P5YvEBOa67VNY3/M7sj22ftDe7ODtVV8OrnPjMk1sQNRvaK025Cs3yig
 8MAI//hHGNmyVAp1dPYZMJNqxGCvluReLZ4SaUJFCMrg7YgUXgCBj/5Gi07TlKxn
 sT8BFCssoEW/B9FXkh59B1t6FBCZoSy4XSZfsZe0uVAUJ4XDEOO+zBgaWFCedNQT
 wP323ryBgMrkzUKA8j2/o5d3QnMA1GcBfHNNlvAl/fOfrxWXzDZnOEY26YcaLMa0
 YIuRF/JNbPZlt6DCUVBUEvMPpfNYi18dFN0rat1a6xL2L4w+tm55y3mFtSsg76Ka
 r7L2nWlRrAGXnuA+VEPqkqbSWRUSWOv5hT2Mcyb5BqqZRsxBETn6G8GVAzIO6j6v
 giyfUdA8Z9wmMZ7NtB6usxe3p1lXtnZ/shCE7ZHXm6xstyZrSXaHgOSgAnB9DcuJ
 7KpGIhhSODQSwC/h/J0KEpb9Pr/5jCWmXAQ2DWnZK6ndt1jUfFi8pfK58wm0AuAM
 o9t8Mx3o8wZjbMdt6up9OIM1HyFiMx2BSaZK+8f/bWemHQ0xwez5g4k5O5AwVOaC
 x9Nt+Tp0Ze4=
 =DsYj
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

   The cmpxchg128() family of functions is basically & functionally the
   same as cmpxchg_double(), but with a saner interface.

   Instead of a 6-parameter horror that forced u128 - u64/u64-halves
   layout details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128
   types.

 - Restructure the generated atomic headers, and add kerneldoc comments
   for all of the generic atomic{,64,_long}_t operations.

   The generated definitions are much cleaner now, and come with
   documentation.

 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
   taking multiple locks of the same type.

   This gets rid of one use of lockdep_set_novalidate_class() in the
   bcache code.

 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
   shadowing generating garbage code on Clang on certain ARM builds.

* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
  percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
  locking/atomic: treewide: delete arch_atomic_*() kerneldoc
  locking/atomic: docs: Add atomic operations to the driver basic API documentation
  locking/atomic: scripts: generate kerneldoc comments
  docs: scripts: kernel-doc: accept bitwise negation like ~@var
  locking/atomic: scripts: simplify raw_atomic*() definitions
  locking/atomic: scripts: simplify raw_atomic_long*() definitions
  locking/atomic: scripts: split pfx/name/sfx/order
  locking/atomic: scripts: restructure fallback ifdeffery
  locking/atomic: scripts: build raw_atomic_long*() directly
  locking/atomic: treewide: use raw_atomic*_<op>()
  locking/atomic: scripts: add trivial raw_atomic*_<op>()
  locking/atomic: scripts: factor out order template generation
  locking/atomic: scripts: remove leftover "${mult}"
  locking/atomic: scripts: remove bogus order parameter
  locking/atomic: xtensa: add preprocessor symbols
  locking/atomic: x86: add preprocessor symbols
  locking/atomic: sparc: add preprocessor symbols
  locking/atomic: sh: add preprocessor symbols
  ...
2023-06-27 14:14:30 -07:00
Linus Torvalds
ed3b7923a8 Scheduler changes for v6.5:
- Scheduler SMP load-balancer improvements:
 
     - Avoid unnecessary migrations within SMT domains on hybrid systems.
 
       Problem:
 
         On hybrid CPU systems, (processors with a mixture of higher-frequency
 	SMT cores and lower-frequency non-SMT cores), under the old code
 	lower-priority CPUs pulled tasks from the higher-priority cores if
 	more than one SMT sibling was busy - resulting in many unnecessary
 	task migrations.
 
       Solution:
 
         The new code improves the load balancer to recognize SMT cores with more
         than one busy sibling and allows lower-priority CPUs to pull tasks, which
         avoids superfluous migrations and lets lower-priority cores inspect all SMT
         siblings for the busiest queue.
 
     - Implement the 'runnable boosting' feature in the EAS balancer: consider CPU
       contention in frequency, EAS max util & load-balance busiest CPU selection.
 
       This improves CPU utilization for certain workloads, while leaving other
       key workloads unchanged.
 
 - Scheduler infrastructure improvements:
 
     - Rewrite the scheduler topology setup code by consolidating it
       into the build_sched_topology() helper function and building
       it dynamically on the fly.
 
     - Resolve the local_clock() vs. noinstr complications by rewriting
       the code: provide separate sched_clock_noinstr() and
       local_clock_noinstr() functions to be used in instrumentation code,
       and make sure it is all instrumentation-safe.
 
 - Fixes:
 
     - Fix a kthread_park() race with wait_woken()
 
     - Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
        - Fix UP PREEMPT bug by unifying the SMP and UP implementations.
        - Fix task_struct::saved_state handling.
 
     - Fix various rq clock update bugs, unearthed by turning on the rq clock
       debugging code.
 
     - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger by
       creating enough cgroups, by removing the warning and restricting
       window size triggers to PSI file write-permission or CAP_SYS_RESOURCE.
 
     - Propagate SMT flags in the topology when removing degenerate domain
 
     - Fix grub_reclaim() calculation bug in the deadline scheduler code
 
     - Avoid resetting the min update period when it is unnecessary, in
       psi_trigger_destroy().
 
     - Don't balance a task to its current running CPU in load_balance(),
       which was possible on certain NUMA topologies with overlapping
       groups.
 
     - Fix the sched-debug printing of rq->nr_uninterruptible
 
 - Cleanups:
 
     - Address various -Wmissing-prototype warnings, as a preparation
       to (maybe) enable this warning in the future.
 
     - Remove unused code
 
     - Mark more functions __init
 
     - Fix shadow-variable warnings
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSatWQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j62xAAuGOx1LcDfRGC6WGQzp1zOdlsVQtnDvlS
 qL58zYSHgizprpVQ3j87SBaG4CHCdvd2Bo36yW0lNZS4nd203qdq7fkrMb3hPP/w
 egUQUzMegf5fF6BWldKeMjuHSt+twFQz/ZAKK8iSbAir6CHNAqbNst1oL0i/+Tyk
 o33hBs1hT5tnbFb1NSVZkX4k+qT3LzTW4K2QgjjGtkScr6yHh2BdEVefyigWOjdo
 9s02d00ll9a2r+F5txlN7Dnw6TN7rmTXGMOJU5bZvBE90/anNiAorMXHJdEKCyUR
 u9+JtBdJWiCplGa/tSRcxT16ZW1VdtTnd9q66TDhXREd2UNDFqBEyg5Wl77K4Tlf
 vKFajmj/to+cTbuv6m6TVR+zyXpdEpdL6F04P44U3qiJvDobBqeDNKHHIqpmbHXl
 AXUXcPWTVAzXX1Ce5M+BeAgTBQ1T7C5tELILrTNQHJvO1s9VVBRFZ/l65Ps4vu7T
 wIZ781IFuopk0zWqHovNvgKrJ7oFmOQQZFttQEe8n6nafkjI7u+IZ8FayiGaUMRr
 4GawFGUCEdYh8z9qyslGKe8Q/Rphfk6hxMFRYUJpDmubQ0PkMeDjDGq77jDGl1PF
 VqwSDEyOaBJs7Gqf/mem00JtzBmXhkhm1SEjggHMI2IQbr/eeBXoLQOn3CDapO/N
 PiDbtX760ic=
 =EWQA
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Scheduler SMP load-balancer improvements:

   - Avoid unnecessary migrations within SMT domains on hybrid systems.

     Problem:

        On hybrid CPU systems, (processors with a mixture of
        higher-frequency SMT cores and lower-frequency non-SMT cores),
        under the old code lower-priority CPUs pulled tasks from the
        higher-priority cores if more than one SMT sibling was busy -
        resulting in many unnecessary task migrations.

     Solution:

        The new code improves the load balancer to recognize SMT cores
        with more than one busy sibling and allows lower-priority CPUs
        to pull tasks, which avoids superfluous migrations and lets
        lower-priority cores inspect all SMT siblings for the busiest
        queue.

   - Implement the 'runnable boosting' feature in the EAS balancer:
     consider CPU contention in frequency, EAS max util & load-balance
     busiest CPU selection.

     This improves CPU utilization for certain workloads, while leaving
     other key workloads unchanged.

  Scheduler infrastructure improvements:

   - Rewrite the scheduler topology setup code by consolidating it into
     the build_sched_topology() helper function and building it
     dynamically on the fly.

   - Resolve the local_clock() vs. noinstr complications by rewriting
     the code: provide separate sched_clock_noinstr() and
     local_clock_noinstr() functions to be used in instrumentation code,
     and make sure it is all instrumentation-safe.

  Fixes:

   - Fix a kthread_park() race with wait_woken()

   - Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
       - Fix UP PREEMPT bug by unifying the SMP and UP implementations
       - Fix task_struct::saved_state handling

   - Fix various rq clock update bugs, unearthed by turning on the rq
     clock debugging code.

   - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
      by creating enough cgroups, by removing the warning and restricting
     window size triggers to PSI file write-permission or
     CAP_SYS_RESOURCE.

   - Propagate SMT flags in the topology when removing degenerate domain

   - Fix grub_reclaim() calculation bug in the deadline scheduler code

   - Avoid resetting the min update period when it is unnecessary, in
     psi_trigger_destroy().

   - Don't balance a task to its current running CPU in load_balance(),
     which was possible on certain NUMA topologies with overlapping
     groups.

   - Fix the sched-debug printing of rq->nr_uninterruptible

  Cleanups:

   - Address various -Wmissing-prototype warnings, as a preparation to
     (maybe) enable this warning in the future.

   - Remove unused code

   - Mark more functions __init

   - Fix shadow-variable warnings"

* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
  sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
  sched/core: Fixed missing rq clock update before calling set_rq_offline()
  sched/deadline: Update GRUB description in the documentation
  sched/deadline: Fix bandwidth reclaim equation in GRUB
  sched/wait: Fix a kthread_park race with wait_woken()
  sched/topology: Mark set_sched_topology() __init
  sched/fair: Rename variable cpu_util eff_util
  arm64/arch_timer: Fix MMIO byteswap
  sched/fair, cpufreq: Introduce 'runnable boosting'
  sched/fair: Refactor CPU utilization functions
  cpuidle: Use local_clock_noinstr()
  sched/clock: Provide local_clock_noinstr()
  x86/tsc: Provide sched_clock_noinstr()
  clocksource: hyper-v: Provide noinstr sched_clock()
  clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  x86/vdso: Fix gettimeofday masking
  math64: Always inline u128 version of mul_u64_u64_shr()
  s390/time: Provide sched_clock_noinstr()
  loongarch: Provide noinstr sched_clock_read()
  ...
2023-06-27 14:03:21 -07:00
Linus Torvalds
7cffdbe360 Updates for the x86 boot process:
- Initialize FPU late.
 
    Right now FPU is initialized very early during boot. There is no real
    requirement to do so. The only requirement is to have it done before
    alternatives are patched.
 
    That's done in check_bugs() which does way more than what the function
    name suggests.
 
    So first rename check_bugs() to arch_cpu_finalize_init() which makes it
    clear what this is about.
 
    Move the invocation of arch_cpu_finalize_init() earlier in
    start_kernel() as it has to be done before fork_init() which needs to
    know the FPU register buffer size.
 
    With those prerequisites the FPU initialization can be moved into
    arch_cpu_finalize_init(), which removes it from the early and fragile
    part of the x86 bringup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZdNYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaNBEACWtVd1uhqQldIFgSvZYujsrWXlmkU+
 pok6gDzKQNwZADiXW/tn5fP8SBLWT0pgLM9d+oZ5mEaLaOW7HcZLEHcVrn74e3TT
 53xN8e1zCzyjCJ/x22vrKH4sn/bU+bQyzSNVu9Disqn9Fl+ts37FqAHDv/ExbneD
 DaYXXCLgQsyGbPLD8B7yGOpJTGBUTJxNQS1ZFElBaRsAaw0mYZOEoPvuTFK4o7Uz
 GUB2vGefmeNfX+EgLYKG9QoS0F3SMS9X2IYswy1H76ZnV/eXmTsA1S3u3X9yX7kC
 XBnPtCC+iX+7o3xFkTpa0oQUdzEyGOItExZZgce6jEQu4Fl7NoIJxhlMg9/Y+vcF
 ntipEKSWFLAi1GkZzeKRwSSsoWqRaFxOKLy8qhn9kud09k+UtMBkNrF1CSp9laAz
 QParu3B1oHPEzx/jS0bSOCMN+AQZH8rX7LxRp4kpBOeBSZNCnfaBUzfIvmccPls+
 EJTO/0JUpRm5LsPSDiJhypPRoOOIP26IloR6OoZTcI3p76NrnYblRvisvuFAgDU6
 bk7Belf+GDx0kBZugqQgok7nDaHIBR7vEmca1NV8507UrffVyxLAiI4CiWPcFdOq
 ovhO8K+gP4xvzZx4cXZBwYwusjvl/oxKy8yQiGgoftDiWU4sdUCSrwX3x27+hUYL
 2P1OLDOXSGwESQ==
 =yxMj
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Thomas Gleixner:
 "Initialize FPU late.

  Right now FPU is initialized very early during boot. There is no real
  requirement to do so. The only requirement is to have it done before
  alternatives are patched.

  That's done in check_bugs() which does way more than what the function
  name suggests.

  So first rename check_bugs() to arch_cpu_finalize_init() which makes
  it clear what this is about.

  Move the invocation of arch_cpu_finalize_init() earlier in
  start_kernel() as it has to be done before fork_init() which needs to
  know the FPU register buffer size.

  With those prerequisites the FPU initialization can be moved into
  arch_cpu_finalize_init(), which removes it from the early and fragile
  part of the x86 bringup"

* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
  x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
  x86/fpu: Mark init functions __init
  x86/fpu: Remove cpuinfo argument from init functions
  x86/init: Initialize signal frame size late
  init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
  init: Invoke arch_cpu_finalize_init() earlier
  init: Remove check_bugs() leftovers
  um/cpu: Switch to arch_cpu_finalize_init()
  sparc/cpu: Switch to arch_cpu_finalize_init()
  sh/cpu: Switch to arch_cpu_finalize_init()
  mips/cpu: Switch to arch_cpu_finalize_init()
  m68k/cpu: Switch to arch_cpu_finalize_init()
  loongarch/cpu: Switch to arch_cpu_finalize_init()
  ia64/cpu: Switch to arch_cpu_finalize_init()
  ARM: cpu: Switch to arch_cpu_finalize_init()
  x86/cpu: Switch to arch_cpu_finalize_init()
  init: Provide arch_cpu_finalize_init()
2023-06-26 13:39:10 -07:00
Tiezhu Yang
8386f58f8d asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
Now that we specify the minimum version of GCC as 5.1 and Clang/LLVM as
11.0.0 in Documentation/process/changes.rst, __CHAR_BIT__ and
__SIZEOF_LONG__ are usable, so it is probably fine to unify the
definition of __BITS_PER_LONG as (__CHAR_BIT__ * __SIZEOF_LONG__) in the
asm-generic uapi bitsperlong.h.

In order to stay safe and avoid regressions, only unify uapi
bitsperlong.h for some archs such as arm64, riscv and loongarch, which
use newer toolchains that have the definitions of __CHAR_BIT__ and
__SIZEOF_LONG__.
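
The unified definition boils down to a one-line expression; a quick
stand-alone check of the equivalence:

  #include <limits.h>
  #include <stdio.h>

  #define __BITS_PER_LONG (__CHAR_BIT__ * __SIZEOF_LONG__)

  _Static_assert(__BITS_PER_LONG == sizeof(long) * CHAR_BIT,
                 "__CHAR_BIT__ * __SIZEOF_LONG__ must equal bits per long");

  int main(void)
  {
          printf("__BITS_PER_LONG = %d\n", __BITS_PER_LONG);
          return 0;
  }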

Suggested-by: Xi Ruoyao <xry111@xry111.site>
Link: https://lore.kernel.org/all/d3e255e4746de44c9903c4433616d44ffcf18d1b.camel@xry111.site/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-arch/a3a4f48a-07d4-4ed9-bc53-5d383428bdd2@app.fastmail.com/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-22 17:04:36 +02:00
Donglin Peng
5779e3c0f5 LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for
funcgraph-retval, and this modification makes it available on the
LoongArch platform.

We introduce a new structure called fgraph_ret_regs for the LoongArch
platform to hold return registers and the frame pointer. We then fill
its content in the return_to_handler and pass its address to the
function ftrace_return_to_handler to record the return value.
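
A rough illustration of the shape of that structure and how the return
value is read back from it (the exact field layout is an assumption of
this sketch, not the arch definition):

  #include <stdio.h>

  struct fgraph_ret_regs_sketch {
          unsigned long regs[2];  /* return value register(s) */
          unsigned long fp;       /* frame pointer of the returning frame */
  };

  /* return_to_handler would fill this from assembly, then pass its
   * address on so the return value can be recorded. */
  static unsigned long ret_regs_return_value(struct fgraph_ret_regs_sketch *r)
  {
          return r->regs[0];
  }

  int main(void)
  {
          struct fgraph_ret_regs_sketch r = { .regs = { 42, 0 }, .fp = 0 };

          printf("recorded return value: %lu\n", ret_regs_return_value(&r));
          return 0;
  }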

Link: https://lkml.kernel.org/r/c5462255e435fab363895c2d7433bc0f5a140411.1680954589.git.pengdonglin@sangfor.com.cn

Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-06-20 18:38:38 -04:00
Dave Airlie
cce3b573a5 Linux 6.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSPcdMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWrQH/3KmuvZsWMC4PpJY
 VcF9VfF9i+Zv7DoG8sjD5VpNh47e87RsR6WNOFnKol5SUrM6vsBAb5i2rfQahNIv
 NSj0fPCE4/Nj9LMecKVC9WD8CitxYdbR+CF9Is21AQj1VihUl9eHXGcAWxuaMyhk
 TjPUwmbOOsRVMXXdGJzjX78cvLsxqpSv8A/5OTh16IBimbh7p+YjKJFkbfj/PMWf
 aF1quFkIEXgzJcHCpP6KDZHr2KbpY+jIN9hUENnGKJxHYNso5u+KrIW1kAm8meP1
 x26ETSquM0T70OAzovOWg+BeVkLDac/3Rh30ztLAI4AtajrlSzycvFsU9UNEJCc2
 BnM2IZI=
 =ANT5
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next

Linux 6.4-rc7

Need this to pull in the msm work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-06-19 16:01:25 +10:00
Thomas Gleixner
9841c42316 loongarch/cpu: Switch to arch_cpu_finalize_init()
check_bugs() is about to be phased out. Switch over to the new
arch_cpu_finalize_init() implementation.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.195288218@linutronix.de
2023-06-16 10:15:59 +02:00
Qi Hu
346dc92962 LoongArch: Fix the write_fcsr() macro
The "write_fcsr()" macro uses wrong the positions for val and dest in
asm. Fix it!

Reported-by: Miao HAO <haomiao19@mails.ucas.ac.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-15 14:35:52 +08:00
Hongchen Zhang
ddc1729b07 LoongArch: Let pmd_present() return true when splitting pmd
When we split a pmd into ptes, pmd_present() and pmd_trans_huge() should
return true; otherwise the pmd would be treated as a swap pmd.

This is the same as what arm64 does in commit b65399f611 ("arm64/mm:
Change THP helpers to comply with generic MM semantics"); we also add a
new bit named _PAGE_PRESENT_INVALID for LoongArch.
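
A user-space sketch of the resulting check (bit positions are
placeholders, not the real LoongArch pgtable bits):

  #include <stdbool.h>
  #include <stdio.h>

  #define _PAGE_PRESENT          (1UL << 0)    /* placeholder */
  #define _PAGE_PRESENT_INVALID  (1UL << 60)   /* placeholder */

  typedef struct { unsigned long pmd; } pmd_t;

  /* While a huge pmd is being split the present bit is temporarily
   * cleared; the extra bit keeps pmd_present() true so the entry is
   * not mistaken for a swap pmd. */
  static bool pmd_present(pmd_t pmd)
  {
          return pmd.pmd & (_PAGE_PRESENT | _PAGE_PRESENT_INVALID);
  }

  int main(void)
  {
          pmd_t mapped    = { _PAGE_PRESENT };
          pmd_t splitting = { _PAGE_PRESENT_INVALID };

          printf("mapped pmd present:    %d\n", pmd_present(mapped));
          printf("splitting pmd present: %d\n", pmd_present(splitting));
          return 0;
  }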

Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-06-15 14:35:52 +08:00
Peter Zijlstra
6b10fef09f loongarch: Provide noinstr sched_clock_read()
With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102715.502547082@infradead.org
2023-06-05 21:11:05 +02:00
Mark Rutland
ef558b4b7b locking/atomic: treewide: delete arch_atomic_*() kerneldoc
Currently several architectures have kerneldoc comments for
arch_atomic_*(), which is unhelpful as these live in a shared namespace
where they clash, and the arch_atomic_*() ops are now an implementation
detail of the raw_atomic_*() ops, so no-one should use them directly.

Delete the kerneldoc comments for arch_atomic_*(), along with
pseudo-kerneldoc comments which are in the correct style but are missing
the leading '/**' necessary to be true kerneldoc comments.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-28-mark.rutland@arm.com
2023-06-05 09:57:24 +02:00
Mark Rutland
d12157efc8 locking/atomic: make atomic*_{cmp,}xchg optional
Most architectures define the atomic/atomic64 xchg and cmpxchg
operations in terms of arch_xchg and arch_cmpxchg respectively.

Add fallbacks for these cases and remove the trivial cases from arch
code. On some architectures the existing definitions are kept as these
are used to build other arch_atomic*() operations.
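
The fallback shape, reduced to a stand-alone sketch (simplified
stand-ins for the kernel types and macros; the real fallbacks are
generated by the atomic scripts):

  #include <stdio.h>

  typedef struct { int counter; } atomic_t;

  #define arch_xchg(ptr, new) \
          __atomic_exchange_n((ptr), (new), __ATOMIC_SEQ_CST)

  /* Only provide the atomic op if the architecture has not already
   * defined its own. */
  #ifndef arch_atomic_xchg
  #define arch_atomic_xchg(v, new) arch_xchg(&(v)->counter, (new))
  #endif

  int main(void)
  {
          atomic_t v = { 1 };
          int old = arch_atomic_xchg(&v, 5);

          printf("old=%d new=%d\n", old, v.counter);
          return 0;
  }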

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-5-mark.rutland@arm.com
2023-06-05 09:57:14 +02:00
Thomas Zimmermann
20d54e48d9 fbdev: Rename fb_mem*() helpers
Update the names of the fb_mem*() helpers to be consistent with their
regular counterparts. Hence, fb_memset() now becomes fb_memset_io(),
fb_memcpy_fromfb() now becomes fb_memcpy_fromio() and fb_memcpy_tofb()
becomes fb_memcpy_toio(). No functional changes.

v6:
	* update new file fb_io_fops.c

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Sui Jingfeng <suijingfeng@loongson.cn>
Link: https://patchwork.freedesktop.org/patch/msgid/20230512102444.5438-8-tzimmermann@suse.de
2023-05-18 11:07:54 +02:00
Thomas Zimmermann
8f8eaa1b02 fbdev: Move framebuffer I/O helpers into <asm/fb.h>
Implement framebuffer I/O helpers, such as fb_read*() and fb_write*(),
in the architecture's <asm/fb.h> header file or the generic one.

The common case has been the use of regular I/O functions, such as
__raw_readb() or memset_io(). A few architectures used plain system-
memory reads and writes. Sparc used helpers for its SBus.

The architectures that used special cases provide the same code in
their __raw_*() I/O helpers. So the patch replaces this code with the
__raw_*() functions and moves it to <asm-generic/fb.h> for all
architectures.

v8:
	* remove garbage after commit-message tags
v6:
	* fix fb_readq()/fb_writeq() on 64-bit mips (kernel test robot)
v5:
	* include <linux/io.h> in <asm-generic/fb>; fix s390 build
v4:
	* ia64, loongarch, sparc64: add fb_mem*() to arch headers
	  to keep current semantics (Arnd)
v3:
	* implement all architectures with generic helpers
	* support reordering and native byte order (Geert, Arnd)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Tested-by: Sui Jingfeng <suijingfeng@loongson.cn>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230512102444.5438-7-tzimmermann@suse.de
2023-05-18 11:07:25 +02:00
Maxime Ripard
ff32fcca64
Merge drm/drm-next into drm-misc-next
Start the 6.5 release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2023-05-09 15:03:40 +02:00
Linus Torvalds
b115d85a95 Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal
    primitive, which will be used in perf events ring-buffer code.
 
  - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation.
 
  - Misc cleanups/fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE
 wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl
 LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA
 kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz
 BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj
 thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK
 G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY
 mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej
 lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE
 pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN
 UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr
 ilVwqnOWI2s=
 =mM0A
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce local{,64}_try_cmpxchg() - a slightly more optimal
   primitive, which will be used in perf events ring-buffer code

 - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation

 - Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05 12:56:55 -07:00
Linus Torvalds
611c9d8830 LoongArch changes for v6.4
1, Better backtraces for humanization;
 2, Relay BCE exceptions to userland as SIGSEGV;
 3, Provide kernel fpu functions;
 4, Optimize memory ops (memset/memcpy/memmove);
 5, Optimize checksum and crc32(c) calculation;
 6, Add ARCH_HAS_FORTIFY_SOURCE selection;
 7, Add function error injection support;
 8, Add ftrace with direct call support;
 9, Add basic perf tools support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmRQlUsWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImekCTD/9fc2U+FIXhJOWV5yK9TCjJTRnK
 ASvk0JMYIDA60+fnof3C85tDu9Py9M5Mvt/Ec5pBaHErn16irq85AdD74/OmyCc2
 V4pRFHbYLu0WBFQN77gfNXH0XErgYXdceZvaMXajVz2H6NlSKSWZOVN/9ut5SLi3
 mt0rCwCsyahj92n8+hOjjZeFbDaPfPMCQ/8n9dnadhbBm9iz35fOKY+qIBHJMJ9a
 wPfZ2k3wu5DHs/2+ZjFNhlwrlURTp3RlcVQ7QWDcR1LM3Z4/lEkD8tAI/r8sR9gw
 rxzoBSaQzo/zscUmYo0jh1BoW2w0n+x/GfH70Pyz3iwZky3jwpdP0nRwnB4h+tnE
 wKlpa5K7RfaqUxZExFfGALmlkALtjQgiXPYbORHMsD6l6XwrOMCeyQismm1oo66m
 JBlsdXCms5aracYmWhXnVmTlBqGjAgYAxm62ap62uwlmULy4qUv6kFeW0fERn9NJ
 5bKgbrkcal/WkMBawQqtG03niRkykqpqFooZ95ubj4Lib4VM0BmEvFrREjgXO7AE
 jpLimYsT9ROE3YQJqyWyLYkmc2ShwWj70INTpz2viMtQ2blIRKvRVsxs976bHuwS
 mGsZtiiANjhT2bAUhN7bct2Cf13MtPXiuf0etcJbrNSAtoBIFk+3uRRKHH2rM+CK
 oKYjO+exPyuQ9nSOBg==
 =3aTV
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Better backtraces for humanization

 - Relay BCE exceptions to userland as SIGSEGV

 - Provide kernel fpu functions

 - Optimize memory ops (memset/memcpy/memmove)

 - Optimize checksum and crc32(c) calculation

 - Add ARCH_HAS_FORTIFY_SOURCE selection

 - Add function error injection support

 - Add ftrace with direct call support

 - Add basic perf tools support

* tag 'loongarch-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (24 commits)
  tools/perf: Add basic support for LoongArch
  LoongArch: ftrace: Add direct call trampoline samples support
  LoongArch: ftrace: Add direct call support
  LoongArch: ftrace: Implement ftrace_find_callable_addr() to simplify code
  LoongArch: ftrace: Fix build error if DYNAMIC_FTRACE_WITH_REGS is not set
  LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses
  LoongArch: Add support for function error injection
  LoongArch: Add ARCH_HAS_FORTIFY_SOURCE selection
  LoongArch: crypto: Add crc32 and crc32c hw acceleration
  LoongArch: Add checksum optimization for 64-bit system
  LoongArch: Optimize memory ops (memset/memcpy/memmove)
  LoongArch: Provide kernel fpu functions
  LoongArch: Relay BCE exceptions to userland as SIGSEGV with si_code=SEGV_BNDERR
  LoongArch: Tweak the BADV and CPUCFG.PRID lines in show_regs()
  LoongArch: Humanize the ESTAT line when showing registers
  LoongArch: Humanize the ECFG line when showing registers
  LoongArch: Humanize the EUEN line when showing registers
  LoongArch: Humanize the PRMD line when showing registers
  LoongArch: Humanize the CRMD line when showing registers
  LoongArch: Fix format of CSR lines during show_regs()
  ...
2023-05-04 12:40:16 -07:00
Youling Tang
9cdc3b6a29 LoongArch: ftrace: Add direct call support
Select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
register_ftrace_direct[_multi] interfaces, allowing users to register
a custom trampoline (direct_caller) as the mcount for one or more
target functions. modify_ftrace_direct[_multi] is also provided
for modifying direct_caller.

There are a few cases to distinguish:
- If a direct call ops is the only one tracing a function AND the directly
  called trampoline is within the reach of a 'bl' instruction
  -> the ftrace patchsite jumps to the trampoline
- Else
  -> the ftrace patchsite jumps to the ftrace_regs_caller trampoline, which
     points to ftrace_list_ops so that it iterates over all registered
     ftrace ops, including the direct call ops, and calls its
     call_direct_funcs handler; that handler stores the directly called
     trampoline's address in the ftrace_regs, and the ftrace_regs_caller
     trampoline then returns to that address instead of returning to the
     traced function

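For illustration, a minimal registration sketch loosely modeled on
samples/ftrace/ftrace-direct.c (not part of this patch; the exact
register_ftrace_direct() signature differs between kernel versions, and
my_tramp stands in for an arch-specific assembly trampoline):

  #include <linux/ftrace.h>
  #include <linux/module.h>
  #include <linux/sched.h>

  extern void my_tramp(void);   /* hypothetical assembly trampoline */

  static int __init direct_init(void)
  {
          /* route the wake_up_process() patchsite straight to my_tramp */
          return register_ftrace_direct((unsigned long)wake_up_process,
                                        (unsigned long)my_tramp);
  }

  static void __exit direct_exit(void)
  {
          unregister_ftrace_direct((unsigned long)wake_up_process,
                                   (unsigned long)my_tramp);
  }

  module_init(direct_init);
  module_exit(direct_exit);
  MODULE_LICENSE("GPL");
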
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:53 +08:00
Qing Zhang
6fbff14a63 LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses
Add new ftrace_regs_{get,set}_*() helpers which can be used to manipulate
ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y, these can always
be used on any ftrace_regs, and when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n
they can be used when regs are available.

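For illustration, a hedged usage sketch (not from this patch) of a callback
that only touches ftrace_regs through these accessors, so the same code
works whether or not full pt_regs are populated:

  #include <linux/ftrace.h>
  #include <linux/kernel.h>

  static void notrace my_callback(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *op, struct ftrace_regs *fregs)
  {
          unsigned long pc   = ftrace_regs_get_instruction_pointer(fregs);
          unsigned long arg0 = ftrace_regs_get_argument(fregs, 0);

          trace_printk("traced %ps: pc=%lx arg0=%lx\n", (void *)ip, pc, arg0);
  }
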
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:52 +08:00
Tiezhu Yang
8b5ee2c66d LoongArch: Add support for function error injection
Inspired by commit 42d038c4fb ("arm64: Add support for function
error injection") and commit ee55ff803b ("riscv: Add support for
function error injection"), this patch adds function error injection
support for LoongArch.

It mainly implements two functions:
(1) regs_set_return_value(), which overwrites the return value;
(2) override_function_with_return(), which makes the probed function
return to its caller immediately, without executing its body.

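Roughly, on LoongArch these amount to the following sketch, where $a0 is
regs[4] and $ra is regs[1] (an approximation of the patch, not a copy):

  #include <asm/ptrace.h>

  static inline void regs_set_return_value(struct pt_regs *regs, unsigned long val)
  {
          regs->regs[4] = val;            /* $a0 carries the return value */
  }

  void override_function_with_return(struct pt_regs *regs)
  {
          /* resume at the caller instead of running the probed function's body */
          instruction_pointer_set(regs, regs->regs[1]);   /* $ra */
  }
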
Here is a simple test under CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION:

  # echo sys_clone > /sys/kernel/debug/fail_function/inject
  # echo 100 > /sys/kernel/debug/fail_function/probability
  # dmesg
  bash: fork: Invalid argument
  # dmesg
  ...
  FAULT_INJECTION: forcing a failure.
  name fail_function, interval 1, probability 100, space 0, times 1
  ...
  Call Trace:
  [<90000000002238f4>] show_stack+0x5c/0x180
  [<90000000012e384c>] dump_stack_lvl+0x60/0x88
  [<9000000000b1879c>] should_fail_ex+0x1b0/0x1f4
  [<900000000032ead4>] fei_kprobe_handler+0x28/0x6c
  [<9000000000230970>] kprobe_breakpoint_handler+0xf0/0x118
  [<90000000012e3e60>] do_bp+0x2c4/0x358
  [<9000000002241924>] exception_handlers+0x1924/0x10000
  [<900000000023b7d0>] sys_clone+0x0/0x4
  [<90000000012e4744>] do_syscall+0x7c/0x94
  [<9000000000221e44>] handle_syscall+0xc4/0x160

Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:52 +08:00
Bibo Mao
69e3a6aa6b LoongArch: Add checksum optimization for 64-bit system
The LoongArch platform is a 64-bit system that supports 8-byte memory
accesses, but the generic checksum functions only use 4-byte accesses.
So add an 8-byte-access optimization for the checksum functions on
LoongArch; the code is derived from the arm64 implementation.

When hardware network checksumming is disabled, iperf performance
improves by about 10% with this patch.

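The core idea can be sketched as follows (a simplified illustration that
ignores alignment and tail bytes, not the in-tree do_csum()):

  #include <linux/types.h>

  /* add one 64-bit word into the running sum with end-around carry */
  static u64 accumulate(u64 sum, u64 data)
  {
          __uint128_t tmp = (__uint128_t)sum + data;

          return tmp + (tmp >> 64);
  }

  /* sum 'nwords' aligned 64-bit words, then fold the result down to 16 bits */
  static unsigned int csum_words(const u64 *p, unsigned long nwords)
  {
          u64 sum = 0;

          while (nwords--)
                  sum = accumulate(sum, *p++);

          sum = (sum & 0xffffffff) + (sum >> 32);
          sum = (sum & 0xffffffff) + (sum >> 32);
          sum = (sum & 0xffff) + (sum >> 16);
          sum = (sum & 0xffff) + (sum >> 16);
          return sum;
  }
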
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:43 +08:00
Huacai Chen
2b3bd32ea3 LoongArch: Provide kernel fpu functions
Provide kernel_fpu_begin()/kernel_fpu_end() to allow the kernel itself
to use the FPU. They can be used by some other kernel components, e.g.,
the AMDGPU graphics driver for DCN.

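A hedged usage sketch, assuming the declarations land in <asm/fpu.h> and
that the file doing floating-point math is built with hard-float compiler
flags (as the DCN code is):

  #include <asm/fpu.h>

  void scale_values(float *v, int n, float k)
  {
          int i;

          kernel_fpu_begin();             /* save live FPU state, enable FPU use */
          for (i = 0; i < n; i++)
                  v[i] *= k;
          kernel_fpu_end();               /* restore FPU state for userland */
  }
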
Reported-by: WANG Xuerui <kernel@xen0n.name>
Tested-by: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:27 +08:00
WANG Xuerui
c23e7f01cf LoongArch: Relay BCE exceptions to userland as SIGSEGV with si_code=SEGV_BNDERR
SEGV_BNDERR was introduced initially for supporting the Intel MPX, but
fell into disuse after the MPX support was removed. The LoongArch
bounds-checking instructions behave very differently than MPX, but
overall the interface is still kind of suitable for conveying the
information to userland when bounds-checking assertions trigger, so we
wouldn't have to invent more UAPI. Specifically, when the BCE triggers,
a SEGV_BNDERR is sent to userland, with si_addr set to the out-of-bounds
address or value (in asrt{gt,le}'s case), and one of si_lower or
si_upper set to the configured bound depending on the faulting
instruction. The other bound is set to either 0 or ULONG_MAX to resemble
a range with both lower and upper bounds.

Note that it is possible to have si_addr == si_lower in case of a
failing asrtgt or {ld,st}gt, because those instructions test for strict
greater-than relationship. This should not pose a problem for userland,
though, because the faulting PC is available for the application to
associate back to the exact instruction for figuring out the
expectation.

Example exception context generated by a faulting `asrtgt.d t0, t1`
(assert t0 > t1 or BCE) with t0=100 and t1=200:

> pc 00005555558206a4 ra 00007ffff2d854fc tp 00007ffff2f2f180 sp 00007ffffbf9fb80
> a0 0000000000000002 a1 00007ffffbf9fce8 a2 00007ffffbf9fd00 a3 00007ffff2ed4558
> a4 0000000000000000 a5 00007ffff2f044c8 a6 00007ffffbf9fce0 a7 fffffffffffff000
> t0 0000000000000064 t1 00000000000000c8 t2 00007ffffbfa2d5e t3 00007ffff2f12aa0
> t4 00007ffff2ed6158 t5 00007ffff2ed6158 t6 000000000000002e t7 0000000003d8f538
> t8 0000000000000005 u0 0000000000000000 s9 0000000000000000 s0 00007ffffbf9fce8
> s1 0000000000000002 s2 0000000000000000 s3 00007ffff2f2c038 s4 0000555555820610
> s5 00007ffff2ed5000 s6 0000555555827e38 s7 00007ffffbf9fd00 s8 0000555555827e38
>    ra: 00007ffff2d854fc
>   ERA: 00005555558206a4
>  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>  PRMD: 00000007 (PPLV3 +PIE -PWE)
>  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
>  ECFG: 0007181c (LIE=2-4,11-12 VS=7)
> ESTAT: 000a0000 [BCE] (IS= ECode=10 EsubCode=0)
>  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)

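For illustration, a hedged userland sketch of consuming this information
(bce_handler and the surrounding program are hypothetical):

  #define _GNU_SOURCE
  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  static void bce_handler(int sig, siginfo_t *si, void *ucontext)
  {
          if (si->si_code == SEGV_BNDERR)
                  fprintf(stderr, "bounds check failed: addr=%p lower=%p upper=%p\n",
                          si->si_addr, si->si_lower, si->si_upper);
          _exit(1);
  }

  int main(void)
  {
          struct sigaction sa = { .sa_sigaction = bce_handler, .sa_flags = SA_SIGINFO };

          sigaction(SIGSEGV, &sa, NULL);
          /* ... run code that may trip asrt{gt,le} or {ld,st}{gt,le} ... */
          return 0;
  }
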
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:27 +08:00
WANG Xuerui
aa552254cf LoongArch: Define regular names for BCE/WATCH/HVC/GSPR exceptions
Define them according to the ISA manual, in order to enable matching the
sub-exceptions for humanization purposes later.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:10 +08:00
WANG Xuerui
9e36fa4299 LoongArch: Clean up the architectural interrupt definitions
While interrupts are assigned ECodes `64 + interrupt number`, all
existing use sites of interrupt numbers want the 64 subtracted.
Re-arrange the definitions so that the actual interrupt number is used
everywhere, and make EXCCODE_INT_END inclusive as it is more intuitive
that way.

While at it, according to the asm/loongarch.h definitions, the total
number of architectural interrupts should be 14, but various other
places indicate otherwise (13 or 15). Those places have been adjusted
to 14 as well for consistency.

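A hedged sketch of the resulting shape of the definitions (check the exact
macro names and values against asm/loongarch.h):

  #define EXCCODE_INT_NUM    14                  /* 14 architectural interrupts */
  #define EXCCODE_INT_START  64
  #define EXCCODE_INT_END    (EXCCODE_INT_START + EXCCODE_INT_NUM - 1)   /* inclusive */

  /* converting a raw exception code back to a plain interrupt number: */
  /*   irq = ecode - EXCCODE_INT_START;                                 */
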
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:10 +08:00
Uros Bizjak
d994f2c8e2 locking/arch: Wire up local_try_cmpxchg()
Implement target-specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterparts and provide additional checking of
their input arguments.

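A hedged usage sketch of why try_cmpxchg() is slightly cheaper than the
classic cmpxchg() loop: on failure it refreshes the expected value in
place, so the loop never re-reads the variable (reserve_slot() below is a
hypothetical example in the spirit of the ring-buffer use case):

  #include <asm/local.h>

  static local_t head;

  static long reserve_slot(long len)
  {
          long old = local_read(&head);

          do {
                  /* 'old' is updated by local_try_cmpxchg() on failure */
          } while (!local_try_cmpxchg(&head, &old, old + len));

          return old;     /* start of the reserved region */
  }
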
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-4-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:16 +02:00
Andrzej Hajda
068550631f locking/arch: Rename all internal __xchg() names to __arch_xchg()
Decrease the probability of this internal facility being used by
driver code.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv]
Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:08:44 +02:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Thomas Zimmermann
84998fc1c3 arch/loongarch: Implement <asm/fb.h> with generic helpers
Replace the architecture's fbdev helpers with the generic
ones from <asm-generic/fb.h>. No functional changes.

v2:
	* use default implementation for fb_pgprotect() (Arnd)

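With this change the architecture header likely reduces to little more than
the following sketch (guard name assumed):

  /* arch/loongarch/include/asm/fb.h */
  /* SPDX-License-Identifier: GPL-2.0 */
  #ifndef _ASM_FB_H_
  #define _ASM_FB_H_

  #include <asm-generic/fb.h>

  #endif /* _ASM_FB_H_ */
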
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230417125651.25126-7-tzimmermann@suse.de
2023-04-20 10:04:33 +02:00
Enze Li
213ef669d1 LoongArch: Replace hard-coded values in comments with VALEN
According to LoongArch documentation [1], CSR.PGDL and CSR.PGDH are
concerned with the VA's MSB, which is bit VALEN-1 rather than always bit 47.
Fix comments to avoid misleading others.

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#page-global-directory-base-address-for-lower-half-address-space

Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-19 12:07:27 +08:00
Tiezhu Yang
afca6e0649 LoongArch: Clean up plat_swiotlb_setup() related code
After commit c78c43fe7d ("LoongArch: Use acpi_arch_dma_setup() and
remove ARCH_HAS_PHYS_TO_DMA"), plat_swiotlb_setup() has been deleted,
so clean up the related code.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-19 12:07:27 +08:00
Qing Zhang
ff9f3d7aef LoongArch: Adjust user_watch_state for explicit alignment
This is done in order to easily calculate the number of breakpoints in
hw_break_get()/hw_break_set().

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-19 12:07:27 +08:00
Huacai Chen
93eb1215ed LoongArch: module: set section addresses to 0x0
The got*, plt* and .text.ftrace_trampoline sections specified for
LoongArch have non-zero addresses. Non-zero section addresses in a
relocatable ELF would confuse GDB when it tries to compute the section
offsets and it ends up printing wrong symbol addresses. Therefore, set
them to zero, which mirrors the change in commit 5d8591bc0f
("arm64 module: set plt* section addresses to 0x0").

Cc: stable@vger.kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-18 19:38:58 +08:00
Qing Zhang
6637775ca3 LoongArch: Fix _CONST64_(x) as unsigned
Addresses should all be of unsigned type to avoid unnecessary conversions.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-18 19:38:58 +08:00
Huacai Chen
1cf62488f5 LoongArch: Fix build error if CONFIG_SUSPEND is not set
We can see the following build error on LoongArch if CONFIG_SUSPEND is
not set:

  ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare':
  sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start'

Here is the call trace:

  acpi_pm_prepare()
    __acpi_pm_prepare()
      acpi_sleep_prepare()
        acpi_get_wakeup_address()
          loongarch_wakeup_start()

Root cause: loongarch_wakeup_start() is defined in
arch/loongarch/power/suspend_asm.S, which is only built under
CONFIG_SUSPEND. In order to fix the build error, just let
acpi_get_wakeup_address() return 0 if CONFIG_SUSPEND is not set.

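A hedged sketch of the fix described above (the exact placement and
declaration are assumptions):

  /* arch/loongarch/include/asm/acpi.h (sketch) */
  #ifdef CONFIG_SUSPEND
  extern void loongarch_wakeup_start(void);

  static inline unsigned long acpi_get_wakeup_address(void)
  {
          return (unsigned long)loongarch_wakeup_start;
  }
  #else
  static inline unsigned long acpi_get_wakeup_address(void)
  {
          return 0UL;     /* no wakeup vector without suspend support */
  }
  #endif
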
Fixes: 366bb35a8e ("LoongArch: Add suspend (ACPI S3) support")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/all/11215033-fa3c-ecb1-2fc0-e9aeba47be9b@infradead.org/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-18 19:38:58 +08:00
Huacai Chen
df83033604 LoongArch: Fix probing of the CRC32 feature
Not all LoongArch processors support CRC32 instructions. This feature
is indicated by CPUCFG1.CRC32 (Bit 25), but it was wrongly defined in
previous versions of the ISA manual (and likewise in loongarch.h). The
CRC32 feature is currently set unconditionally, so fix it.

BTW, expose the CRC32 feature in /proc/cpuinfo.

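A hedged sketch of the probing side (cpu_has_crc32_insns() is a
hypothetical helper; the real probe lives in the cpu-probe code):

  #include <linux/types.h>
  #include <asm/loongarch.h>

  static bool cpu_has_crc32_insns(void)
  {
          /* test CPUCFG1 bit 25 instead of assuming the feature exists */
          return read_cpucfg(LOONGARCH_CPUCFG1) & CPUCFG1_CRC32;
  }
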
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-18 19:38:58 +08:00
Huacai Chen
16c52e5030 LoongArch: Make WriteCombine configurable for ioremap()
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is outside the scope of the cache coherency mechanism
for PCIe devices (this is a PCIe protocol violation, which may be fixed
in newer chipsets).

This means WUC can only be used for write-only memory regions now, so
this option is disabled by default, making WUC silently fall back to SUC
for ioremap(). You can enable this option if the kernel is guaranteed to
run on hardware without this bug.

Kernel parameter writecombine=on/off can be used to override the Kconfig
option.

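A hedged sketch of how such an override is typically wired up (the
pgprot_wc variable, the Kconfig symbol and the message text are
assumptions, not a copy of the patch):

  #include <linux/init.h>
  #include <linux/printk.h>
  #include <linux/string.h>
  #include <asm/pgtable.h>

  #ifdef CONFIG_ARCH_WRITECOMBINE
  pgprot_t pgprot_wc = PAGE_KERNEL_WUC;           /* Kconfig default: WUC enabled */
  #else
  pgprot_t pgprot_wc = PAGE_KERNEL_SUC;           /* safe default on affected hardware */
  #endif

  static int __init setup_writecombine(char *p)
  {
          if (!strcmp(p, "on"))
                  pgprot_wc = PAGE_KERNEL_WUC;
          else if (!strcmp(p, "off"))
                  pgprot_wc = PAGE_KERNEL_SUC;
          else
                  pr_warn("Unknown writecombine setting \"%s\".\n", p);

          return 0;
  }
  early_param("writecombine", setup_writecombine);
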
Cc: stable@vger.kernel.org
Suggested-by: WANG Xuerui <kernel@xen0n.name>
Reviewed-by: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-18 19:38:58 +08:00
Josh Poimboeuf
6c0f2d071e loongarch/cpu: Mark play_dead() __noreturn
play_dead() doesn't return.  Annotate it as such.  By extension this
also makes arch_cpu_idle_dead() noreturn.

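A hedged sketch of the annotation shape (declarations simplified):

  #include <linux/compiler.h>

  void __noreturn play_dead(void);        /* dying-CPU path, never returns */

  void __noreturn arch_cpu_idle_dead(void)
  {
          play_dead();
  }
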
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/4da55acfdec8a9132c4e21ffb7edb1f846841193.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-06 15:34:06 -08:00
Linus Torvalds
a8356cdb5b LoongArch changes for v6.3
1, Make -mstrict-align configurable;
 2, Add kernel relocation and KASLR support;
 3, Add single kernel image implementation for kdump;
 4, Add hardware breakpoints/watchpoints support;
 5, Add kprobes/kretprobes/kprobes_on_ftrace support;
 6, Add LoongArch support for some selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmP+9H0WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImerz+D/98MjkLXM4qtgfAxuBKpVdEVA4U
 bzO19UlpqWlwTJbwrhf0GYsRrAis37PTVJG4eNORJairJ/oTkMtEEBPhwq0D9Whc
 URDEh+VrjzFztLsu2OlvzOA9gE7lpg+xAx2LKflP7ixlOELOWeercDLW3octp5/J
 CJDE8wPaw9tJrMHFWuiVybs03yZmY3YFV55JdWL9hY8Ryy4DY5997mruOfzjvHpl
 EfDgQM2zCn2JSQwaD+Kl3MHxHyRx07Tj2wnZAh9ptaGeptK/yplc7nqRwhe7BevS
 QwClhJNPICcOi+evZ7cDUY0PTL4evpw2KRnF1N4zw+58RhZECjVrCEJNdf6L1scj
 muptQngWKrE/TJvn4way3cJr44stSCtT71elPhn629S23my/CauMmFqCqKpYOPOf
 pxwzzCaqDcaZKwMu96qBkZS76tIrhoNeNFntj+C9RS+8ezY3+o144S3vF1A6A9Zb
 M4gwa2NiQuLqnCUwKK6dZkLQVX2NMIMViUkYNKdUStxNWx/K7fFmXcl0ycAFpGYp
 8Q95LLH34jUrpSgqMSCmcylsPvNiN1QnuXFnw8Tu+zDthp5dOzio60tORLPM1ZUq
 gobPeGjeTQInq4eMCf2B5HH8fOMVtJyj6H4K9G1M6HUMg64UtcBp6BvEbwPxTxNN
 sIOFUjDfDnBiIXWF4w==
 =SzL5
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

 - Make -mstrict-align configurable

 - Add kernel relocation and KASLR support

 - Add single kernel image implementation for kdump

 - Add hardware breakpoints/watchpoints support

 - Add kprobes/kretprobes/kprobes_on_ftrace support

 - Add LoongArch support for some selftests.

* tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (23 commits)
  selftests/ftrace: Add LoongArch kprobe args string tests support
  selftests/seccomp: Add LoongArch selftesting support
  tools: Add LoongArch build infrastructure
  samples/kprobes: Add LoongArch support
  LoongArch: Mark some assembler symbols as non-kprobe-able
  LoongArch: Add kprobes on ftrace support
  LoongArch: Add kretprobes support
  LoongArch: Add kprobes support
  LoongArch: Simulate branch and PC* instructions
  LoongArch: ptrace: Add hardware single step support
  LoongArch: ptrace: Add function argument access API
  LoongArch: ptrace: Expose hardware breakpoints to debuggers
  LoongArch: Add hardware breakpoints/watchpoints support
  LoongArch: kdump: Add crashkernel=YM handling
  LoongArch: kdump: Add single kernel image implementation
  LoongArch: Add support for kernel address space layout randomization (KASLR)
  LoongArch: Add support for kernel relocation
  LoongArch: Add la_abs macro implementation
  LoongArch: Add JUMP_VIRT_ADDR macro implementation to avoid using la.abs
  LoongArch: Use la.pcrel instead of la.abs when it's trivially possible
  ...
2023-03-01 09:27:00 -08:00