Commit Graph

7 Commits

Author SHA1 Message Date
Kanglong Wang
63dbd8fb2a LoongArch: Optimize module load time by optimizing PLT/GOT counting
When enabling CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and
CONFIG_PREEMPT_VOLUNTARY at the same time, there will be soft deadlock,
the relevant logs are as follows:

rcu: INFO: rcu_sched self-detected stall on CPU
...
Call Trace:
[<900000000024f9e4>] show_stack+0x5c/0x180
[<90000000002482f4>] dump_stack_lvl+0x94/0xbc
[<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280
[<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88
[<9000000000396c34>] update_process_times+0xb4/0x150
[<90000000003b2474>] tick_nohz_handler+0xf4/0x250
[<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428
[<9000000000399b2c>] hrtimer_interrupt+0x214/0x538
[<9000000000253634>] constant_timer_interrupt+0x64/0x80
[<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0
[<9000000000349a78>] handle_irq_event_percpu+0x18/0x88
[<9000000000354c00>] handle_percpu_irq+0x90/0xf0
[<9000000000348c74>] handle_irq_desc+0x94/0xb8
[<9000000001012b28>] handle_cpu_irq+0x68/0xa0
[<9000000001def8c0>] handle_loongarch_irq+0x30/0x48
[<9000000001def958>] do_vint+0x80/0xd0
[<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0
[<90000000006344f4>] __asan_load8+0x4c/0x120
[<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8
[<90000000003895f0>] load_module+0x9e0/0x2958
[<900000000038b770>] __do_sys_init_module+0x208/0x2d0
[<9000000001df0c34>] do_syscall+0x94/0x190
[<900000000024d6fc>] handle_syscall+0xbc/0x158

After analysis, this is because the slow speed of loading the amdgpu
module leads to the long time occupation of the cpu and then the soft
deadlock.

When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs/GOTs that will be needed to handle all the RELAs. It
will call the count_max_entries() to find in an out-of-order date which
counting algorithm has O(n^2) complexity.

To make it faster, we sort the relocation list by info and addend. That
way, to check for a duplicate relocation, it just needs to compare with
the previous entry. This reduces the complexity of the algorithm to O(n
 log n), as done in commit d4e0340919 ("arm64/module: Optimize module
load time by optimizing PLT counting"). This gives sinificant reduction
in module load time for modules with large number of relocations.

After applying this patch, the soft deadlock problem has been solved,
and the kernel starts normally without "Call Trace".

Using the default configuration to test some modules, the results are as
follows:

Module              Size
ip_tables           36K
fat                 143K
radeon              2.5MB
amdgpu              16MB

Without this patch:
Module              Module load time (ms)	Count(PLTs/GOTs)
ip_tables           18				59/6
fat                 0				162/14
radeon              54				1221/84
amdgpu              1411			4525/1098

With this patch:
Module              Module load time (ms)	Count(PLTs/GOTs)
ip_tables           18				59/6
fat                 0				162/14
radeon              22				1221/84
amdgpu              45				4525/1098

Fixes: fcdfe9d22b ("LoongArch: Add ELF and module support")
Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20 22:23:44 +08:00
Bibo Mao
c718a0bad7 LoongArch: Fix some build warnings with W=1
There are some building warnings when building LoongArch kernel with W=1
as following, this patch fixes them.

arch/loongarch/kernel/acpi.c:284:13: warning: no previous prototype for ‘acpi_numa_arch_fixup’ [-Wmissing-prototypes]
  284 | void __init acpi_numa_arch_fixup(void) {}
      |             ^~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/time.c:32:13: warning: no previous prototype for ‘constant_timer_interrupt’ [-Wmissing-prototypes]
   32 | irqreturn_t constant_timer_interrupt(int irq, void *data)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/traps.c:496:25: warning: no previous prototype for 'do_fpe' [-Wmissing-prototypes]
  496 | asmlinkage void noinstr do_fpe(struct pt_regs *regs
      |                         ^~~~~~
arch/loongarch/kernel/traps.c:813:22: warning: variable ‘opcode’ set but not used [-Wunused-but-set-variable]
  813 |         unsigned int opcode;
      |                      ^~~~~~
arch/loongarch/kernel/signal.c:895:14: warning: no previous prototype for ‘get_sigframe’ [-Wmissing-prototypes]
  895 | void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
      |              ^~~~~~~~~~~~
arch/loongarch/kernel/syscall.c:21:40: warning: initialized field overwritten [-Woverride-init]
   21 | #define __SYSCALL(nr, call)     [nr] = (call),
      |                                        ^
arch/loongarch/kernel/syscall.c:40:14: warning: no previous prototype for ‘do_syscall’ [-Wmissing-prototypes]
   40 | void noinstr do_syscall(struct pt_regs *regs)
      |              ^~~~~~~~~~
arch/loongarch/kernel/smp.c:502:17: warning: no previous prototype for ‘start_secondary’ [-Wmissing-prototypes]
  502 | asmlinkage void start_secondary(void)
      |                 ^~~~~~~~~~~~~~~
arch/loongarch/kernel/process.c:309:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  309 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:13:5: warning: no previous prototype for ‘arch_register_cpu’ [-Wmissing-prototypes]
   13 | int arch_register_cpu(int cpu)
      |     ^~~~~~~~~~~~~~~~~
arch/loongarch/kernel/topology.c:27:6: warning: no previous prototype for ‘arch_unregister_cpu’ [-Wmissing-prototypes]
   27 | void arch_unregister_cpu(int cpu)
      |      ^~~~~~~~~~~~~~~~~~~
arch/loongarch/kernel/module-sections.c:103:5: warning: no previous prototype for ‘module_frob_arch_sections’ [-Wmissing-prototypes]
  103 | int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~
arch/loongarch/mm/hugetlbpage.c:56:5: warning: no previous prototype for ‘is_aligned_hugepage_range’ [-Wmissing-prototypes]
   56 | int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-20 14:26:28 +08:00
Qing Zhang
28ac0a9e04 LoongArch: modules/ftrace: Initialize PLT at load time
This patch implements ftrace trampolines through plt entry.

Tested by forcing ftrace_make_call() to use the module PLT, and then
loading up a module after setting up ftrace with:

| echo ":mod:<module-name>" > set_ftrace_filter;
| echo function > current_tracer;
| modprobe <module-name>

Since FTRACE_ADDR/FTRACE_REGS_ADDR is only defined when CONFIG_DYNAMIC_
FTRACE is selected, we wrap their usage in module_init_ftrace_plt() with
ifdeffery rather than using IS_ENABLED().

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-14 08:41:54 +08:00
Huacai Chen
9151dde403 LoongArch: module: Use got/plt section indices for relocations
Instead of saving a pointer to the .got, .plt and .plt_idx sections to
apply {got,plt}-based relocations, save and use their section indices
instead.

The mod->arch.{core,init}.{got,plt} pointers were problematic for live-
patch because they pointed within temporary section headers (provided by
the module loader via info->sechdrs) that would be freed after module
load. Since livepatch modules may need to apply relocations post-module-
load (for example, to patch a module that is loaded later), using section
indices to offset into the section headers (instead of accessing them
through a saved pointer) allows livepatch modules on LoongArch to pass
in their own copy of the section headers to apply_relocate_add() to
apply delayed relocations.

The method used is same as commit c8ebf64eab ("arm64/module: use plt
section indices for relocations").

Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-14 08:41:53 +08:00
Xi Ruoyao
59b3d4a9b0 LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
external symbols, so we need to add them.

Let the module loader emit GOT entries for data symbols so we would be
able to handle GOT relocations. The GOT entry is just the data's symbol
address.

In module.lds, emit a stub .got section for a section header entry. The
actual content of the section entry will be filled at runtime by module_
frob_arch_sections().

Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-10-12 16:36:14 +08:00
Xi Ruoyao
9bd1e38032 LoongArch: Support PC-relative relocations in modules
Binutils >= 2.40 uses R_LARCH_B26 instead of R_LARCH_SOP_PUSH_PLT_PCREL,
and R_LARCH_PCALA* instead of R_LARCH_SOP_PUSH_PCREL.

Handle R_LARCH_B26 and R_LARCH_PCALA* in the module loader. For R_LARCH_
B26, also create a PLT entry as needed.

Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-10-12 16:36:14 +08:00
Huacai Chen
fcdfe9d22b LoongArch: Add ELF and module support
Add ELF-related definition and module relocation code for basic
LoongArch support.

Cc: Jessica Yu <jeyu@kernel.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-03 20:09:28 +08:00