Compare commits

...

383 Commits

Author SHA1 Message Date
Wentao Guan
44f8c91dbc LoongArch: vDSO: Remove -nostdlib complier flag
Since $(LD) is directly used, hence -nostdlib is unneeded, MIPS has
removed this, we should remove it too.

bdbf2038fb ("MIPS: VDSO: remove -nostdlib compiler flag").

In fact, other architectures also use $(LD) now.

fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
2ff906994b ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")

Cc: stable@vger.kernel.org
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>

------
 arch/loongarch/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Binbin Zhou
ea48b7ff33 LoongArch: dts: Add eMMC/SDIO controller support to Loongson-2K2000
The Loongson-2K2000 integrates one eMMC controller and one SDIO controller.

The module is supported now, enable it.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Binbin Zhou
53944287ab LoongArch: dts: Add SDIO controller support to Loongson-2K1000
The Loongson-2K1000 integrates one SDIO controller for SD storage cards
and SDIO cards.

The module is supported now, enable it.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Binbin Zhou
a79db7e585 LoongArch: dts: Add SDIO controller support to Loongson-2K0500
The Loongson-2K0500 integrates two SDIO controllers for SD storage cards
and SDIO cards, supporting SD storage card boot.

The module is supported now, enable it.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Tiezhu Yang
70f34b6376 LoongArch: BPF: Set bpf_jit_bypass_spec_v1/v4()
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability, it is safe to
set both bpf_jit_bypass_spec_v1/v4(), because there is no speculation
barrier instruction for LoongArch.

Suggested-by: Luis Gerhorst <luis.gerhorst@fau.de>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Haoran Jiang
19713a3a2c LoongArch: BPF: Fix the tailcall hierarchy
In specific use cases combining tailcalls and BPF-to-BPF calls,
MAX_TAIL_CALL_CNT won't work because of missing tail_call_cnt
back-propagation from callee to caller. This patch fixes this
tailcall issue caused by abusing the tailcall in bpf2bpf feature
on LoongArch like the way of "bpf, x64: Fix tailcall hierarchy".

Push tail_call_cnt_ptr and tail_call_cnt into the stack,
tail_call_cnt_ptr is passed between tailcall and bpf2bpf,
uses tail_call_cnt_ptr to increment tail_call_cnt.

Fixes: bb035ef0cc ("LoongArch: BPF: Support mixing bpf2bpf and tailcalls")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Haoran Jiang
8191e973d4 LoongArch: BPF: Fix jump offset calculation in tailcall
The extra pass of bpf_int_jit_compile() skips JIT context initialization
which essentially skips offset calculation leaving out_offset = -1, so
the jmp_offset in emit_bpf_tail_call is calculated by

"#define jmp_offset (out_offset - (cur_offset))"

is a negative number, which is wrong. The final generated assembly are
as follow.

54:	bgeu        	$a2, $t1, -8	    # 0x0000004c
58:	addi.d      	$a6, $s5, -1
5c:	bltz        	$a6, -16	    # 0x0000004c
60:	alsl.d      	$t2, $a2, $a1, 0x3
64:	ld.d        	$t2, $t2, 264
68:	beq         	$t2, $zero, -28	    # 0x0000004c

Before apply this patch, the follow test case will reveal soft lock issues.

cd tools/testing/selftests/bpf/
./test_progs --allow=tailcalls/tailcall_bpf2bpf_1

dmesg:
watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_progs:25056]

Cc: stable@vger.kernel.org
Fixes: 5dc615520c ("LoongArch: Add BPF JIT support")
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Tiezhu Yang
24b2510d08 LoongArch: BPF: Add struct ops support for trampoline
Use BPF_TRAMP_F_INDIRECT flag to detect struct ops and emit proper
prologue and epilogue for this case.

With this patch, all of the struct_ops related testcases (except
struct_ops_multi_pages) passed on LoongArch.

The testcase struct_ops_multi_pages failed is because the actual
image_pages_cnt is 40 which is bigger than MAX_TRAMP_IMAGE_PAGES.

Before:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  WATCHDOG: test case struct_ops_module/struct_ops_load executes for 10 seconds...

After:

  $ sudo ./test_progs -t struct_ops -d struct_ops_multi_pages
  ...
  #15      bad_struct_ops:OK
  ...
  #399     struct_ops_autocreate:OK
  ...
  #400     struct_ops_kptr_return:OK
  ...
  #401     struct_ops_maybe_null:OK
  ...
  #402     struct_ops_module:OK
  ...
  #404     struct_ops_no_cfi:OK
  ...
  #405     struct_ops_private_stack:SKIP
  ...
  #406     struct_ops_refcounted:OK
  Summary: 8/25 PASSED, 3 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:55 +08:00
Chenghao Duan
51d79eb97e LoongArch: BPF: Add basic bpf trampoline support
BPF trampoline is the critical infrastructure of the BPF subsystem,
acting as a mediator between kernel functions and BPF programs. Numerous
important features, such as using BPF program for zero overhead kernel
introspection, rely on this key component.

The related tests have passed, including the following technical points:
1. fentry
2. fmod_ret
3. fexit

The following related testcases passed on LoongArch:
sudo ./test_progs -a fentry_test/fentry
sudo ./test_progs -a fexit_test/fexit
sudo ./test_progs -a fentry_fexit
sudo ./test_progs -a modify_return
sudo ./test_progs -a fexit_sleep
sudo ./test_progs -a test_overhead
sudo ./test_progs -a trampoline_count

This issue was first reported by Geliang Tang in June 2024 while
debugging MPTCP BPF selftests on a LoongArch machine (see commit
eef0532e90 "selftests/bpf: Null checks for links in bpf_tcp_ca").
Geliang, Huacai, and Tiezhu then worked together to drive the
implementation of this feature, encouraging broader collaboration among
Chinese kernel engineers.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507100034.wXofj6VX-lkp@intel.com/
Reported-by: Geliang Tang <geliang@kernel.org>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:50 +08:00
Chenghao Duan
483b1521a7 LoongArch: BPF: Add dynamic code modification support
This commit adds support for BPF dynamic code modification on the
LoongArch architecture:
1. Add bpf_arch_text_copy() for instruction block copying.
2. Add bpf_arch_text_poke() for runtime instruction patching.
3. Add bpf_arch_text_invalidate() for code invalidation.

On LoongArch, since symbol addresses in the direct mapping region can't
be reached via relative jump instructions from the paged mapping region,
we use the move_imm+jirl instruction pair as absolute jump instructions.
These require 2-5 instructions, so we reserve 5 NOP instructions in the
program as placeholders for function jumps.

The larch_insn_text_copy() function is solely used for BPF. And the use
of larch_insn_text_copy() requires PAGE_SIZE alignment. Currently, only
the size of the BPF trampoline is page-aligned.

Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:50 +08:00
Chenghao Duan
32c8bf5ded LoongArch: BPF: Rename and refactor validate_code()
1. Rename the existing validate_code() to validate_ctx()
2. Factor out the code validation handling into a new helper
   validate_code()

Then:

* validate_code() is used to check the validity of code.
* validate_ctx() is used to check both code validity and table entry
  correctness.

The new validate_code() will be used in subsequent changes.

Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:50 +08:00
Chenghao Duan
6ab55e0a9e LoongArch: Add larch_insn_gen_{beq,bne} helpers
Add larch_insn_gen_beq() and larch_insn_gen_bne() helpers which will be
used in BPF trampoline implementation.

Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Co-developed-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:50 +08:00
Thomas Weißschuh
2362e8124e LoongArch: Don't use %pK through printk() in unwinder
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.

Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.

Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and easier to
reason about.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:47 +08:00
Yao Zi
70a2365e18 LoongArch: Avoid in-place string operation on FDT content
In init_cpu_fullname(), a constant pointer to "model" property is
retrieved. It's later modified by the strsep() function, which is
illegal and corrupts kernel's FDT copy. This is shown by dmesg,

	OF: fdt: not creating '/sys/firmware/fdt': CRC check failed

Create a mutable copy of the model property and do in-place operations
on the mutable copy instead. loongson_sysconf.cpuname lives across the
kernel lifetime, thus manually releasing isn't necessary.

Also move the of_node_put() call for the root node after the usage of
its property, since of_node_put() decreases the reference counter thus
usage after the call is unsafe.

Cc: stable@vger.kernel.org
Fixes: 44a01f1f72 ("LoongArch: Parsing CPU-related information from DTS")
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Yao Zi <ziyao@disroot.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:47 +08:00
Ming Wang
243f8de49f LoongArch: Support mem=<size> kernel parameter
The LoongArch mem= parameter parser was previously limited to the
mem=<size>@<start> format. This was inconvenient for the common use
case of simply capping the total system memory, as it forced users to
manually specify a start address. It was also inconsistent with the
behavior on other architectures.

This patch enhances the parser in early_parse_mem() to also support the
more user-friendly mem=<size> format. The implementation now checks for
the presence of the '@' symbol to determine the user's intent:

- If mem=<size> is provided (no '@'), the kernel now calls
  memblock_enforce_memory_limit(). This trims memory from the top down
  to the specified size.
- If mem=<size>@<start> is provided, the original behavior is retained
  for backward compatibility. This allows for defining specific memory
  banks.

This change introduces an important usage rule reflected in the code's
comments: the mem=<size> format should only be specified once on the
kernel command line. It acts as a single, global cap on total memory. In
contrast, the mem=<size>@<start> format can be specified multiple times
to define several distinct memory regions.

Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:47 +08:00
Huacai Chen
a1a81b5477 LoongArch: Make relocate_new_kernel_size be a .quad value
Now relocate_new_kernel_size is a .long value, which means 32bit, so its
high 32bit is undefined. This causes memcpy((void *)reboot_code_buffer,
relocate_new_kernel, relocate_new_kernel_size) in machine_kexec_prepare()
access out of range memories in some cases, and then end up with an ADE
exception.

So make relocate_new_kernel_size be a .quad value, which means 64bit, to
avoid such errors.

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:47 +08:00
Yanteng Si
41fee4f003 LoongArch: Complete KSave registers definition
According to the "LoongArch Reference Manual Volume 1: Basic
Architecture", the KSave registers (SAVE0-SAVE15) are defined in
Section 7.4.16 "Data Save (SAVE)" and listed in Table 7-1 "Control
and Status Registers Overview". These registers occupy the CSR
addresses from 0x30 to 0x3F, with 16 registers in total.

This patch completes the definitions of KS9 to KS15, so as to match
the architecture specification.

Reviewed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-03 22:49:47 +08:00
Huacai Chen
0a91336e28 Merge tag 'bpf-next-6.17' into loongarch-next
LoongArch architecture changes for 6.17 have many bpf features such as
trampoline, so merge 'bpf-next-6.17' to create a base to make bpf work
well.
2025-08-02 11:07:06 +08:00
Alexei Starovoitov
cd7c97f458 Merge branch 'bpf-show-precise-rejected-function-when-attaching-to-__noreturn-and-deny-list-functions'
KaFai Wan says:

====================
bpf: Show precise rejected function when attaching to __noreturn and deny list functions

Show precise rejected function when attaching fexit/fmod_ret to __noreturn
functions.
Add log for attaching tracing programs to functions in deny list.
Add selftest for attaching tracing programs to functions in deny list.
Migrate fexit_noreturns case into tracing_failure test suite.

changes:
v4:
- change tracing_deny case attaching function (Yonghong Song)
- add Acked-by: Yafang Shao and Yonghong Song

v3:
- add tracing_deny case into existing files (Alexei)
- migrate fexit_noreturns into tracing_failure
- change SOB
  https://lore.kernel.org/bpf/20250722153434.20571-1-kafai.wan@linux.dev/

v2:
- change verifier log message (Alexei)
- add missing Suggested-by
  https://lore.kernel.org/bpf/20250714120408.1627128-1-mannkafai@gmail.com/

v1:
 https://lore.kernel.org/all/20250710162717.3808020-1-mannkafai@gmail.com/
---
====================

Link: https://patch.msgid.link/20250724151454.499040-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:30 -07:00
KaFai Wan
51d3750aba selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
Delete fexit_noreturns.c files and migrate the cases into
tracing_failure.c files.

The result:

 $ tools/testing/selftests/bpf/test_progs -t tracing_failure/fexit_noreturns
 #467/4   tracing_failure/fexit_noreturns:OK
 #467     tracing_failure:OK
 Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-5-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:30 -07:00
KaFai Wan
a32f6f17a7 selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
The result:

 $ tools/testing/selftests/bpf/test_progs -t tracing_failure/tracing_deny
 #468/3   tracing_failure/tracing_deny:OK
 #468     tracing_failure:OK
 Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-4-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:29 -07:00
KaFai Wan
863aab3d4d bpf: Add log for attaching tracing programs to functions in deny list
Show the rejected function name when attaching tracing programs to
functions in deny list.

With this change, we know why tracing programs can't attach to functions
like __rcu_read_lock() from log.

$ ./fentry
libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL
libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG --
Attaching tracing programs to function '__rcu_read_lock' is rejected.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:29 -07:00
KaFai Wan
a5a6b29a70 bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
With this change, we know the precise rejected function name when
attaching fexit/fmod_ret to __noreturn functions from log.

$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:29 -07:00
Suchit Karunakaran
5b4c54ac49 bpf: Fix various typos in verifier.c comments
This patch fixes several minor typos in comments within the BPF verifier.
No changes in functionality.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:57 -07:00
Alexei Starovoitov
a9f8d8adcb Merge branch 'bpf-improve-64bits-bounds-refinement'
Paul Chaignon says:

====================
bpf: Improve 64bits bounds refinement

This patchset improves the 64bits bounds refinement when the s64 ranges
crosses the sign boundary. The first patch explains the small addition
to __reg64_deduce_bounds. The last one explains why we need a third
round of __reg_deduce_bounds. The third patch adds a selftest with a
more complete example of the impact on verification. The second and
fourth patches update the existing selftests to take the new refinement
into account.

This patchset should reduce the number of kernel warnings hit by
syzkaller due to invariant violations [1]. It was also tested with
Agni [2] (and Cilium's CI for good measure).

Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Link: https://github.com/bpfverif/agni [2]

Changes in v4:
  - Fixed outdated test comment, noticed by Eduard.
  - Rebased.
Changes in v3:
  - Added a 5th patch to call __reg_deduce_bounds a third time in
    reg_bounds_sync following tests from Eduard.
  - Fixed broken indentations in the first patch.
Changes in v2 (all on Eduard's suggestions):
  - Added two tests to ensure we cover all cases of u64/s64 overlap.
  - Improved tests to check deduced ranges with __msg.
  - Improved code comments.
====================

Link: https://patch.msgid.link/cover.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:13 -07:00
Paul Chaignon
5dbb19b16a bpf: Add third round of bounds deduction
Commit d7f0087381 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.

With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice isn't
enough anymore to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:

  reg_bounds_sync entry:          scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __update_reg_bounds:            scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg_bound_offset:             scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
  __update_reg_bounds:            scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))

In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
   learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
   improves the smax and umin bounds thanks to patch "bpf: Improve
   bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
   tnums). Yet, a better smin32 bound could be learned from the smin
   bound.

__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.

As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with less calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.

Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:13 -07:00
Paul Chaignon
f96841bbf4 selftests/bpf: Test invariants on JSLT crossing sign
The improvement of the u64/s64 range refinement fixed the invariant
violation that was happening on this test for BPF_JSLT when crossing the
sign boundary.

After this patch, we have one test remaining with a known invariant
violation. It's the same test as fixed here but for 32 bits ranges.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:13 -07:00
Paul Chaignon
26e5e346a5 selftests/bpf: Test cross-sign 64bits range refinement
This patch adds coverage for the new cross-sign 64bits range refinement
logic. The three tests cover the cases when the u64 and s64 ranges
overlap (1) in the negative portion of s64, (2) in the positive portion
of s64, and (3) in both portions.

The first test is a simplified version of a BPF program generated by
syzkaller that caused an invariant violation [1]. It looks like
syzkaller could not extract the reproducer itself (and therefore didn't
report it to the mailing list), but I was able to extract it from the
console logs of a crash.

The principle is similar to the invariant violation described in
commit 6279846b9b ("bpf: Forget ranges when refining tnum after
JSET"): the verifier walks a dead branch, uses the condition to refine
ranges, and ends up with inconsistent ranges. In this case, the dead
branch is when we fallthrough on both jumps. The new refinement logic
improves the bounds such that the second jump is properly detected as
always-taken and the verifier doesn't end up walking a dead branch.

The second and third tests are inspired by the first, but rely on
condition jumps to prepare the bounds instead of ALU instructions. An
R10 write is used to trigger a verifier error when the bounds can't be
refined.

Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:12 -07:00
Paul Chaignon
da653de268 selftests/bpf: Update reg_bound range refinement logic
This patch updates the range refinement logic in the reg_bound test to
match the new logic from the previous commit. Without this change, tests
would fail because we end with more precise ranges than the tests
expect.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:12 -07:00
Paul Chaignon
00bf8d0c6c bpf: Improve bounds when s64 crosses sign boundary
__reg64_deduce_bounds currently improves the s64 range using the u64
range and vice versa, but only if it doesn't cross the sign boundary.

This patch improves __reg64_deduce_bounds to cover the case where the
s64 range crosses the sign boundary but overlaps with the u64 range on
only one end. In that case, we can improve both ranges. Consider the
following example, with the s64 range crossing the sign boundary:

    0                                                   U64_MAX
    |  [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx]              |
    |----------------------------|----------------------------|
    |xxxxx s64 range xxxxxxxxx]                       [xxxxxxx|
    0                     S64_MAX S64_MIN                    -1

The u64 range overlaps only with positive portion of the s64 range. We
can thus derive the following new s64 and u64 ranges.

    0                                                   U64_MAX
    |  [xxxxxx u64 range xxxxx]                               |
    |----------------------------|----------------------------|
    |  [xxxxxx s64 range xxxxx]                               |
    0                     S64_MAX S64_MIN                    -1

The same logic can probably apply to the s32/u32 ranges, but this patch
doesn't implement that change.

In addition to the selftests, the __reg64_deduce_bounds change was
also tested with Agni, the formal verification tool for the range
analysis [1].

Link: https://github.com/bpfverif/agni [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:12 -07:00
Linus Torvalds
038d61fd64 Linux 6.16 2025-07-27 14:26:38 -07:00
Paul Chaignon
5345e64760 bpf: Simplify bounds refinement from s32
During the bounds refinement, we improve the precision of various ranges
by looking at other ranges. Among others, we improve the following in
this order (other things happen between 1 and 2):

  1. Improve u32 from s32 in __reg32_deduce_bounds.
  2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds.
  3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds.

In particular, if the s32 range forms a valid u32 range, we will use it
to improve the u32 range in __reg32_deduce_bounds. In
__reg_deduce_mixed_bounds, under the same condition, we will use the s32
range to improve the s/u64 ranges.

If at (1) we were able to learn from s32 to improve u32, we'll then be
able to use that in (2) to improve s/u64. Hence, as (3) happens under
the same precondition as (1), it won't improve s/u64 ranges further than
(1)+(2) did. Thus, we can get rid of (3).

In addition to the extensive suite of selftests for bounds refinement,
this patch was also tested with the Agni formal verification tool [1].

Additionally, Eduard mentioned:

  The argument appears to be as follows:

  Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max`
  __reg32_deduce_bounds produces:

    reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min);
    reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max);

  And then first part of __reg_deduce_mixed_bounds assigns:

    a. reg->umin umax= (reg->umin & ~0xffffffffULL) | max_t(u32, reg->s32_min, reg->u32_min);
    b. reg->umax umin= (reg->umax & ~0xffffffffULL) | min_t(u32, reg->s32_max, reg->u32_max);

  And then second part of __reg_deduce_mixed_bounds assigns:

    c. reg->umin umax= (reg->umin & ~0xffffffffULL) | (u32)reg->s32_min;
    d. reg->umax umin= (reg->umax & ~0xffffffffULL) | (u32)reg->s32_max;

  But assignment (c) is a noop because:

     max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min

  Hence RHS(a) >= RHS(c) and umin= does nothing.

  Also assignment (d) is a noop because:

    min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max

  Hence RHS(b) <= RHS(d) and umin= does nothing.

  Plus the same reasoning for the part dealing with reg->s{min,max}_value:

    e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | max_t(u32, reg->s32_min_value, reg->u32_min_value);
    f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | min_t(u32, reg->s32_max_value, reg->u32_max_value);

      vs

    g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | (u32)reg->s32_min_value;
    h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | (u32)reg->s32_max_value;

      RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing.

  This appears to be correct.

Also, Shung-Hsi:

  Beside going through the reasoning, I also played with CBMC a bit to
  double check that as far as a single run of __reg_deduce_bounds() is
  concerned (and that the register state matches certain handwavy
  expectations), the change indeed still preserve the original behavior.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://github.com/bpfverif/agni [1]
Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com
2025-07-27 19:23:29 +02:00
Linus Torvalds
b711733e89 A single fix for the PTP systemcounter mechanism:
The rework of this mechanism added a 'use_nsec' member to struct
   system_counterval. get_device_system_crosststamp() instantiates that
   struct on the stack and hands a pointer to the driver callback.
 
   Only the drivers which set use_nsec to true, initialize that field, but
   all others ignore it. As get_device_system_crosststamp() does not
   initialize the struct, the use_nsec field contains random stack content
   in those cases. That causes a miscalulation usually resulting in a
   failing range check in the best case.
 
   Initialize the structure before handing it to the drivers to cure that.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGFA4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRcsEACvQI0LmKTOigzSZvBT1CZnGcwpeqYi
 Ez0v/w+tpyfbwQgf9kxR+ZbjNdwqCYFnR8PZPFKFuvsanWRTIcYaTkIQWvDhcEX/
 U4AFI3VkdZUFckCEY/fv7j3/jkp7pbLVHMq001Z9xaMMcE+ox1AlHpEW0Khd3gqL
 VFLXU5S7Q9H6J6ujjFAXAMuhgjk6WOz8q+ew3hnc3dxwyuEBAz83jOScH/be3dTl
 10ydzoxFEa+ZlacAHX+SqZ7nhS7ExxNlwlUuTYj/EkBCQ8UIoS93YLA5bYMcWCao
 W5rs6vFJmMO6NR6lkqwfKmKyjovx79jHMVNKoxydZGvkqcNMtfc/eUfByxAkyCDP
 gmTCFwgKVGdjGsYwkGqafejmJt5OFrD1hMyWfBhGWQ/Z8CXuuJNEa/8trSyUK/CS
 DFD1InOLltbYuw7rY5gRxb+xmgBTxUMj8gF/hXYs7wNzJqNJXXNae/2Sue+Xi+mV
 iieEF8UonmpMe9k9w3+fFGGDWYa4lYnT5O3VMQ0nEjj6dt5RVQqRvjTa+GtQJzUs
 h4fUs+BIKyCkh6DgRKyIsDruzryOSnZ+vqMcGMm0gvPttc3cGYksLiQVlWYjQhxs
 pTFrHNGOSXMT5WBQ7KWKzGypHlf3WYhVWk1+dmPJedrdyr23AfgKAGM6zva1Oqjc
 81w9DvppBL0sOA==
 =j6Po
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the PTP systemcounter mechanism:

  The rework of this mechanism added a 'use_nsec' member to struct
  system_counterval. get_device_system_crosststamp() instantiates that
  struct on the stack and hands a pointer to the driver callback.

  Only the drivers which set use_nsec to true, initialize that field,
  but all others ignore it. As get_device_system_crosststamp() does not
  initialize the struct, the use_nsec field contains random stack
  content in those cases. That causes a miscalulation usually resulting
  in a failing range check in the best case.

  Initialize the structure before handing it to the drivers to cure
  that"

* tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Zero initialize system_counterval when querying time from phc drivers
2025-07-27 09:31:32 -07:00
Linus Torvalds
ec2df43646 spi: Fix for v6.16
One last fix for v6.16, removing some hard coding to avoid data
 corruption on some NAND devices in the QPIC driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmiFSgwACgkQJNaLcl1U
 h9BChwf9Epc80iYZruJgP+o0wZTVMUDSU3gdVP6Zl177M4G7adGn4piW7CamJxgn
 HcrlZNN9PNCLgIjw5n6w6ZRTXGU0Jd/3+gyZ39kB6Hn+1HDQMLJhz4hOibdotolu
 KRbZy/j4beTgDFJSSgG0Vnqo/f8Ew+XGHBzhAflbxp3s/jgju4OCIyCpB8MBo98R
 HLgQ+uJ+vpJp3sDleACF0LrQxYrGe7/GjFaoCGfHA2+pvi0pt17Oc/CVMCjRi/bs
 ojlSm7myg9Os0/Q2wv/3JPCGaLHPvZ/nWQ3blEQjoXsu0fxZbPjz/pWDNLYa8UV1
 CfUlNT4+dmgHrQEcmX4ysVHGpFAqLg==
 =Cu4n
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "One last fix for v6.16, removing some hard coding to avoid data
  corruption on some NAND devices in the QPIC driver"

* tag 'spi-fix-v6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-qpic-snand: don't hardcode ECC steps
2025-07-26 14:38:33 -07:00
Linus Torvalds
513fc69f8f i2c-for-6.16-rc8
qup: avoid potential hang when waiting for bus idle
 tegra: improve ACPI reset error handling
 virtio: use interruptible wait to prevent hang during transfer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmiFQ4YACgkQFA3kzBSg
 KbZO3BAArkTbL/wlJHHheexhXgK8ZJQC8aBR4AfFWgUIU9uPvJ4sR+OFkL62oqDH
 eAbVOaUXmD1KpqC+TwhGWGl6c+QYB8xfuEzI10TYs7S8DqGkU3AhYtFPwmjYL4xq
 LChLGmjvZdt7WeKqvrcCWczcI4NJJFQlBe9K/lEdhbI/CUKUxvUM0AHo0nmmrXoi
 3Rl/EcWyC8T2S5+42oXmUtNn9uvEQKzcSKj3XC3qmEE6kXrtnKthUhkJ/5eOl7OJ
 +Uvccn6DecxwNrL5uwPXT5af+NgPILLkAvnnV2HBdIdR4aanZ4BV7XmOCgG5OmCE
 MMjY52eXg/gG7b+CHsa0GtajHspTo4xzgrrbX0qIyiUO7RRROifBmgy9XhKrU1Tw
 AeH41Z5JrIRdD0D7d2GgLAHro83ckJCZLrsQRB/2cgx02HrWTFngvse7MN+QOSZ9
 RwCez8BKoETd2rf1r5PXYqf+24jKp+wKrG6+H/al3RCUOUKeH1fE/eojNc7rcpYM
 5Ner5BRKTSSztk06KyOo/cT9wiiQTbQ093kKZdBl0cEpyeP48TJFoPlI9WTCdW+t
 e0rIoC2+GVwb2W08KTwlsjLcyWyvr/yHGmSWjKI1LvvLTwsce5RRGXHSpbR+fRrj
 FZBjYklq3rn3eY3mBZeVdeWAb+5lgYVExdtgIMLVzn3ClHzlsLk=
 =rpOM
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

 - qup: avoid potential hang when waiting for bus idle

 - tegra: improve ACPI reset error handling

 - virtio: use interruptible wait to prevent hang during transfer

* tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: qup: jump out of the loop in case of timeout
  i2c: virtio: Avoid hang by using interruptible completion wait
  i2c: tegra: Fix reset error handling with ACPI
2025-07-26 14:25:41 -07:00
Linus Torvalds
874885990b A few Allwinner clk driver fixes:
- Mark Allwinner A523 MBUS clock as critical to avoid system
    stalls
  - Fix names of CSI related clocks on Allwinner V3s. This
    includes changes to the driver, DT bindings and DT files.
  - Fix parents of TCON clock on Allwinner V3s
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmiFLZcUHHN3Ym95ZEBj
 aHJvbWl1bS5vcmcACgkQrQKIl8bklSUyyw//VGy/Am1BDCsALNp4JeLeklq4n+/9
 Ly0Knl3MFXP+pMum/RF2vlVPaur/ry/Lo9NpY/Te4P+R/i4baJlajDyz6NOicw5W
 cIt0aJTt2x9U/YWFofu1NzQkXiK8CntJ8RNy96SwyFWcj8V9+Q9n+lww67daGddk
 zNVGA8GpiXF09Vzwx1xTOBT1n2pWuK8r9jH9Sv3Wei4NZGVdKbC61NEP7fpTKkKR
 K7BwKhRO98qRr8BYiyhanAtWgrslgVW4lJHO0fkCpyCmNkuBo6pyeO39QXcoHejT
 aaSUPE/mb3AAtEa8eiSMmV9KExIX2p9+UNul3aADKrr2408O5eN8usvMI8XYOw1m
 03wLCojmSB0qz8R+BQP9SG+vRdokWxsqQjgi7IXecOslvXjJ3kR7ttQhyae2jjqk
 p/K8ceMv5imA2FrKMRaSNcbiNo6qRjwXaLgQX5w7qpvGmsdPi2WPvJrn96tye5kz
 CM7gfHUflKgpptZnSDeFwk+IU8wjMNtUhvR3CAFtTf4MX3zuI+s60Q8S8z/IJ565
 ILVMEqlso9SiGEiUlUs219CkdgBxavh2JMKxPqSH8IG2VeFP7ZHOAE0La6mlgU9W
 KSfM8CrpJmfHsmIEB/Wz6mXo3o0B2GYCBFL8xqh3lvCU5TQ8LQmrSaPcB4nTC830
 EBNXR311TY7tbeo=
 =Q0zO
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A few Allwinner clk driver fixes:

   - Mark Allwinner A523 MBUS clock as critical to avoid
     system stalls

   - Fix names of CSI related clocks on Allwinner V3s. This
     includes changes to the driver, DT bindings and DT files.

   - Fix parents of TCON clock on Allwinner V3s"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: v3s: Fix TCON clock parents
  clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
  clk: sunxi-ng: v3s: Fix CSI SCLK clock name
  clk: sunxi-ng: a523: Mark MBUS clock as critical
2025-07-26 13:26:33 -07:00
Puranjay Mohan
e9f545d0d3 selftests/bpf: Enable private stack tests for arm64
As arm64 JIT now supports private stack, make sure all relevant tests
run on arm64 architecture.

Relevant tests:

 #415/1   struct_ops_private_stack/private_stack:OK
 #415/2   struct_ops_private_stack/private_stack_fail:OK
 #415/3   struct_ops_private_stack/private_stack_recur:OK
 #415     struct_ops_private_stack:OK
 #549/1   verifier_private_stack/Private stack, single prog:OK
 #549/2   verifier_private_stack/Private stack, subtree > MAX_BPF_STACK:OK
 #549/3   verifier_private_stack/No private stack:OK
 #549/4   verifier_private_stack/Private stack, callback:OK
 #549/5   verifier_private_stack/Private stack, exception in mainprog:OK
 #549/6   verifier_private_stack/Private stack, exception in subprog:OK
 #549/7   verifier_private_stack/Private stack, async callback, not nested:OK
 #549/8   verifier_private_stack/Private stack, async callback, potential nesting:OK
 #549     verifier_private_stack:OK
 Summary: 2/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-4-puranjay@kernel.org
2025-07-26 21:27:15 +02:00
Puranjay Mohan
6c17a882d3 bpf, arm64: JIT support for private stack
The private stack is allocated in bpf_int_jit_compile() with 16-byte
alignment. It includes additional guard regions to detect stack
overflows and underflows at runtime.

Memory layout:

              +------------------------------------------------------+
              |                                                      |
              |  16 bytes padding (overflow guard - stack top)       |
              |  [ detects writes beyond top of stack ]              |
     BPF FP ->+------------------------------------------------------+
              |                                                      |
              |  BPF private stack (sized by verifier)               |
              |  [ 16-byte aligned ]                                 |
              |                                                      |
BPF PRIV SP ->+------------------------------------------------------+
              |                                                      |
              |  16 bytes padding (underflow guard - stack bottom)   |
              |  [ detects accesses before start of stack ]          |
              |                                                      |
              +------------------------------------------------------+

On detection of an overflow or underflow, the kernel emits messages
like:

    BPF private stack overflow/underflow detected for prog <prog_name>

After commit bd737fcb64 ("bpf, arm64: Get rid of fpb"), Jited BPF
programs use the stack in two ways:

1. Via the BPF frame pointer (top of stack), using negative offsets.
2. Via the stack pointer (bottom of stack), using positive offsets in
   LDR/STR instructions.

When a private stack is used, ARM64 callee-saved register x27 replaces
the stack pointer. The BPF frame pointer usage remains unchanged; but
it now points to the top of the private stack.

Relevant tests (Enabled in following patch):

 #415/1   struct_ops_private_stack/private_stack:OK
 #415/2   struct_ops_private_stack/private_stack_fail:OK
 #415/3   struct_ops_private_stack/private_stack_recur:OK
 #415     struct_ops_private_stack:OK
 #549/1   verifier_private_stack/Private stack, single prog:OK
 #549/2   verifier_private_stack/Private stack, subtree > MAX_BPF_STACK:OK
 #549/3   verifier_private_stack/No private stack:OK
 #549/4   verifier_private_stack/Private stack, callback:OK
 #549/5   verifier_private_stack/Private stack, exception in main prog:OK
 #549/6   verifier_private_stack/Private stack, exception in subprog:OK
 #549/7   verifier_private_stack/Private stack, async callback, not nested:OK
 #549/8   verifier_private_stack/Private stack, async callback, potential nesting:OK
 #549     verifier_private_stack:OK
 Summary: 2/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-3-puranjay@kernel.org
2025-07-26 21:26:56 +02:00
Puranjay Mohan
3ba58312e6 bpf: Move bpf_jit_get_prog_name() to core.c
bpf_jit_get_prog_name() will be used by all JITs when enabling support
for private stack. This function is currently implemented in the x86
JIT.

Move the function to core.c so that other JITs can easily use it in
their implementation of private stack.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org
2025-07-26 21:26:51 +02:00
Puranjay Mohan
b114fcee76 bpf, arm64: Fix fp initialization for exception boundary
In the ARM64 BPF JIT when prog->aux->exception_boundary is set for a BPF
program, find_used_callee_regs() is not called because for a program
acting as exception boundary, all callee saved registers are saved.
find_used_callee_regs() sets `ctx->fp_used = true;` when it sees FP
being used in any of the instructions.

For programs acting as exception boundary, ctx->fp_used remains false
even if frame pointer is used by the program and therefore, FP is not
set-up for such programs in the prologue. This can cause the kernel to
crash due to a pagefault.

Fix it by setting ctx->fp_used = true for exception boundary programs as
fp is always saved in such programs.

Fixes: 5d4fa9ec56 ("bpf, arm64: Avoid blindly saving/restoring all callee-saved registers")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20250722133410.54161-2-puranjay@kernel.org
2025-07-26 21:23:38 +02:00
Thomas Weißschuh
b7b3500bd4 umd: Remove usermode driver framework
The code is unused since 98e20e5e13 ("bpfilter: remove bpfilter"),
therefore remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de
2025-07-26 21:03:04 +02:00
Thomas Weißschuh
2b03164eee bpf/preload: Don't select USERMODE_DRIVER
The usermode driver framework is not used anymore by the BPF
preload code.

Fixes: cb80ddc671 ("bpf: Convert bpf_preload.ko to use light skeleton.")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-1-0d0083334382@linutronix.de
2025-07-26 21:02:48 +02:00
Linus Torvalds
302f88ff35 ARM fixes for 6.16
- use an absolute path for asm/unified.h in KBUILD_AFLAGS to solve
   a regression caused by commit d5c8d6e0fa
   ("kbuild: Update assembler calls to use proper flags and language
   target"),
 
 - fix dead code elimination binutils version check again.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmiDnecACgkQ9OeQG+St
 rGRzHxAAjYw0d2oa59q8hDuOfvHw52ZB6sUZBCqydQOl0nfGpCpgqm391NT+y1Hd
 Bnlxk0h0hDT5XDO7cKaq0y5qf6l0i0KKMYv1rUMjsKQeRZbIj+1+1XmLdSNOtUyy
 ON1SxpQEBAnwq9EUa/aBbiteFtv3N0xOrbI6lqnL20EW/VTou96eTwY8Suc6wBaZ
 xQHAPGF3+8JMMs6xEkAAG1TEmvOiHiIBxngm+kaXyL7SgHSYFLfVxqDyKP+bUYkL
 kVrT0nT+XU10W34LLTsNtLUTUTX+Z0gFdJ6Hj7Osnx/K7Bd8wGZBryzg/ywVPVQh
 4isxo9R1XXKrp7pGnz0CVRr3B7F9c9a8oK9AE2qMQuCDWQMx/VcVCZ0CNBmI6yQ0
 QfJuXfGwnLRzCdf1lHogIOAC2rDjjKkOgrGya1lbwt6Bl3haXXK718jm4x5M2ysC
 QK0oacopVxqc+a39GDSyQPzBaf23aPj5tdeGsnSibMxGpZj9rXpPkQQycbEv3uvJ
 xRbZhxJwclAyNAeD6Whgdi8Gyn8W2fRwcqdyZfZx7949Pqi0HXsDSJkckgCJIdPG
 9cO2q3D2S5xSnGy+shPzAhg7ve41WPvRHQWHHAUrn5KHGxTQUGG+qFuZAxy0WPxy
 M2wCYwGxTSF70xyHsmNxxxfrY6OBDVXaMPCqiTuI85mqnzufdn8=
 =ouRB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fixes from Russell King:

 - use an absolute path for asm/unified.h in KBUILD_AFLAGS to solve a
   regression caused by commit d5c8d6e0fa ("kbuild: Update assembler
   calls to use proper flags and language target")

 - fix dead code elimination binutils version check again

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
  ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
2025-07-26 10:21:25 -07:00
Linus Torvalds
6121f69c36 soc: fixes for 6.16, part 3
These are two fixes that came in late, one addresses a regression
 on a rockchips based board, the other is for ensuring a consistent
 dt binding for a device added in 6.16 before the incorrect one
 makes it into a release.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmiEr0kACgkQmmx57+YA
 GNnqng//bMqUF0zvPePqhIuDlrAPyHqczDl+2STs6LM1Y89483QFbqlH1481b5dO
 dKeSu/4G4ePcpUcM5ZST5961mLds7JReSDOjQt2RE1P5q2JQCHOnwJD0ptGH0AwJ
 LCWFQc5lhj1lUNqj+zkJydoeos48HC7KBniP1E2d00d7vfqq5CoEHHjsAiaz5Yu1
 bkd2ATbazFv0eE3W1KEEu6HgPjt+jzz9I8tjhXp9M1oYMA9M7g6Eu08X6Befr7tA
 dWDro81YwnoVdXdSGeCHE58Vl413jcpfzDV+RrUG/rzJ8Jb4/5OrIRg4fiH2di//
 qSSsD6kgAEajuK36ys7Z3h74cjm6eJif8QkDkA/qGWl9t6fL0BI/Vjd6PL6mNhJz
 WmYRq8R/uFeh7Q+rGkaB8vl7fmLhnpBz0UwBpQ0DrV3lU8HnCjWiwEWkk4gXflI5
 1p07OfqnjzZ8bqa1P5MEApRcngWNME6IbzL2Yt07FEvLdfJ9xauly/QhIiSlZ2bw
 5HjLhfutTJYHGRGQwKw632Vqi1bldUkDOttP71fO4ocGqH7Jx2W8Fz/APGjmIvmG
 OEDMSfjQjAQfWGM4KhTIv4zRbndtVJsiMG3xzZl496hSrDJqpBI/cnEJ7bZftfT3
 dCIVN8thTP8W6c4Thh3DQllBLln1jwy7zszao3jycEJEuJSO9MA=
 =qwRh
 -----END PGP SIGNATURE-----

Merge tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "These are two fixes that came in late, one addresses a regression on a
  rockchips based board, the other is for ensuring a consistent dt
  binding for a device added in 6.16 before the incorrect one makes it
  into a release"

* tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: rockchip: Drop netdev led-triggers on NanoPi R5S
  arm64: dts: allwinner: a523: Rename emac0 to gmac0
2025-07-26 10:10:05 -07:00
Martin KaFai Lau
9ea0691e47 Merge branch 'selftests-bpf-fix-a-few-dynptr-test-failures-with-64k-page-size'
Yonghong Song says:

====================
selftests/bpf: Fix a few dynptr test failures with 64K page size

There are a few dynptr test failures with arm64 64K page size.
They are fixed in this patch set and please see individual patches
for details.
====================

Link: https://patch.msgid.link/20250725043425.208128-1-yonghong.song@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-07-25 18:20:44 -07:00
Yonghong Song
4a5dcb3373 selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
For arm64 64K page size, the xdp data size was set to be more than 64K
in one of previous patches. This will cause failure for bpf_dynptr_memset().
Since the failure of bpf_dynptr_memset() is expected with 64K page size,
return success.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250725043440.209266-1-yonghong.song@linux.dev
2025-07-25 18:20:44 -07:00
Yonghong Song
90f791a975 selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
For arm64 64K page size, the bpf_dynptr_copy() in test dynptr/test_dynptr_copy_xdp
will succeed, but the test will failure with 4K page size. This patch made a change
so the test will fail expectedly for both 4K and 64K page sizes.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043435.208974-1-yonghong.song@linux.dev
2025-07-25 18:20:43 -07:00
Yonghong Song
4c82768e41 selftests/bpf: Increase xdp data size for arm64 64K page size
With arm64 64K page size, the following 4 subtests failed:
  #97/25   dynptr/test_probe_read_user_dynptr:FAIL
  #97/26   dynptr/test_probe_read_kernel_dynptr:FAIL
  #97/27   dynptr/test_probe_read_user_str_dynptr:FAIL
  #97/28   dynptr/test_probe_read_kernel_str_dynptr:FAIL

These failures are due to function bpf_dynptr_check_off_len() in
include/linux/bpf.h where there is a test
  if (len > size || offset > size - len)
    return -E2BIG;
With 64K page size, the 'offset' is greater than 'size - len',
which caused the test failure.

For 64KB page size, this patch increased the xdp buffer size from 5000 to
90000. The above 4 test failures are fixed as 'size' value is increased.
But it introduced two new failures:
  #97/4    dynptr/test_dynptr_copy_xdp:FAIL
  #97/12   dynptr/test_dynptr_memset_xdp_chunks:FAIL

These two failures will be addressed in subsequent patches.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043430.208469-1-yonghong.song@linux.dev
2025-07-25 18:20:43 -07:00
Wolfram Sang
31f08841dd i2c-host-fixes for v6.16-rc8
qup: avoid potential hang when waiting for bus idle
 tegra: improve ACPI reset error handling
 virtio: use interruptible wait to prevent hang during transfer
 -----BEGIN PGP SIGNATURE-----
 
 iIwEABYKADQWIQScDfrjQa34uOld1VLaeAVmJtMtbgUCaIOznxYcYW5kaS5zaHl0
 aUBrZXJuZWwub3JnAAoJENp4BWYm0y1uD4ABALfEdKZJzebUmYe8LIchwE7j3n9p
 upZJy0+eQtIgSAQdAP930znDO05ezlUcnxAxA+UDfMFamrKIrpu/I4My9fIBBA==
 =d9vN
 -----END PGP SIGNATURE-----

Merge tag 'i2c-host-fixes-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.16-rc8

qup: avoid potential hang when waiting for bus idle
tegra: improve ACPI reset error handling
virtio: use interruptible wait to prevent hang during transfer
2025-07-26 00:59:39 +02:00
Linus Torvalds
5f33ebd201 drm fixes (part 2) for 6.16-rc8/final
i915:
 - Fix DP 2.7 Gbps DP_LINK_BW value on g4x
 - Fix return value on intel_atomic_commit_fence_wait
 
 xe:
 - Fix build without debugfs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmiD6UwACgkQDHTzWXnE
 hr5DPRAAqXbmjZVfKlPkFYf7flCT29D8TFO7dVbrnol0P8B0JbMzhcTLY9wKOZOl
 dyp5NuTOWOcwV7sgG4yxbkclwUnkFdhPVW7oAqt1b2VwX7Y6f97xQQwxI2CWqsBa
 DK8Z+ej17nQrHUknRfhNxpxe3kfnn6O+4A7i7Iqdc0RUrYbe82wrOP6KFgYpX1Ck
 QmgU1GhiUmvSRN/XZTIlcyB3LNrow6O0XTMy6j7wwKmnmecGGW/C/nSzhdqfMszI
 Qj7GaszS5OyuCk5Grylg0iYGPw+mbGa2qs9hxiwIe6zgR++EP4cVxQtBGoLjmINa
 yZOT/8gXdMmFH3QfGMOes+WJVfFxY77y4lyTzB7L0meeZNdTaLYxbsBjnQp2HSF8
 x3fFfu0e6Jgz2GfpsPW5jm1w3bAYAG24UNCNts/JhOs/9U2Fl9UY2oHJnGx/cCfd
 3xFjJBv6mFD0AmKapmdDKSlS6uBpq0ElfKrPuwwhlrmX01c2jpe5sYT+JxSkeG1R
 efLgWSyyuFZ4IwpT2BkEVgYVwEAi1Kyix+/Yyobc2YJjSpzOhaNmRgdWecBJuWMn
 CMjb7jciXAxJt+wP+y5301Fk0q73uaQWSypDaX5lZhnD7yjvTFtkUQzPFOd2bQ/E
 cHIXzhKolLByaVKZV2r7IAeI+xujJAMT8XLaHe4U/9Vv/CTr7lQ=
 =rf5N
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes (part 2) from Dave Airlie:
 "Just the follow up fixes for i915 and xe, all pretty minor.

  i915:
   - Fix DP 2.7 Gbps DP_LINK_BW value on g4x
   - Fix return value on intel_atomic_commit_fence_wait

  xe:
   - Fix build without debugfs"

* tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe: Fix build without debugfs
  drm/i915/display: Fix dma_fence_wait_timeout() return value handling
  drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
2025-07-25 13:36:35 -07:00
Linus Torvalds
327579671a block-6.16-20250725
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiDdRYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnkWD/9h7VFAPOxDCWvmp8awU3OoKVbJI5lo0656
 Irch2xNlhtCAB6fUYSRPOq5xIZjNQmI5Fvzp7Gyto1fQYmtxsU75Kbgh7gzTOxsI
 j0I2KAwD2RrifozONOJa3aCYS8T18nEdcK32zMvVUegBAPhd9wI474fJJSAKa5t2
 qhXcYMRyiy4Wc1Sz187kD5H7RBljdkgnmO0VcWbplwTW0vPID70tSacDKUW1Jmuf
 kSqDh52jzPaYyt7f2gr/TaiHf1TsUuGKdIS58gdN+CBXEMMo4IKOxrU0qFMytOr9
 N1B2VzG9aEUZjZFqArOnO2BpUnfhHwI1JlqONOvdholpqTCVvdxpDlMIc918cQ+v
 5mYTWOtYCE+ziLRJlp+ttNOipVLMOPemr/Rnb4w9I84Xsdt1dxAAv8MOuB4lGomT
 vSwoK6SLUS5u6PSSTAv8f9I1fgijghbzsXs6TpDwHMYujNQn/MyHJLIYQ4yWhDrJ
 25bjLRYJePR83I1AdjbL/fJqCi6gUtzzRrDfN3xSziMo875mP0XxjPOaQeGLpMXM
 Br1GFrXHtvUZ/2ipvGzbVDL/qs3a5S/rQJ2HNhgQvd/FcSs1ZMirCEbWmTyDtNdj
 MkYu4VGFXwhVxVBXGqnShRRbf6KnLM/MC1GkQVpqfKjhMSBsBaJ68kCXbROVbkGj
 3BSu0SlV0w==
 =Q1wN
 -----END PGP SIGNATURE-----

Merge tag 'block-6.16-20250725' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
 "Just a single fix for regression in this release, where a module
  reference could be leaked"

* tag 'block-6.16-20250725' of git://git.kernel.dk/linux:
  block: fix module reference leak in mq-deadline I/O scheduler
2025-07-25 08:05:17 -07:00
Linus Torvalds
4bb0122091 vfs-6.16-rc8.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM+0AAKCRCRxhvAZXjc
 oopdAP9SviubsueENDGRNvyvjCajdaUcZ481UeWCsvshc1ykmgEAo+Cwu3QxRzF7
 BVjSIJSV9r4ae/fNw/MASbJVwyfCJQc=
 =hGG6
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "Two last-minute fixes for this cycle:

   - Set afs vllist to NULL if addr parsing fails

   - Add a missing check for reaching the end of the string in afs"

* tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Set vllist to NULL if addr parsing fails
  afs: Fix check for NULL terminator
2025-07-25 08:00:48 -07:00
Linus Torvalds
bef3012b2f bcachefs fixes for v6.16
User reported fixes:
 
 - Fix btree node scan on encrypted filesystems by not using btree node
   header fields encrypted
 
 - Fix a race in btree write buffer flush; this caused EROs primarily
   during fsck for some people
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmiC8nIACgkQE6szbY3K
 bnb2Mw//c9gFPpIqCQnMGwG1lXCCDjyKMkaP9OFq/Y2Ul5ERqWr2x3zzj36AYAkn
 Ujo4/xNVBTsdvz8Kj4kG926F87UPnSJPJ64Y3L2Ag8WtOImMDWCu2GBHQ6ILdZYp
 sPaf3Kd5P8GS221s/Rg45DihHlkoaZiXe7nE9/i/SdrvWyeMG7WQBuwgyTkjbVSh
 c95sot66FlKURHDilFxZ1JU7UIf+QLyZRb5d6gQlmCce4c3w7RaRjK/dtnA9wZPQ
 GkLim+Ja5oxUxT+pn4g2ImfLDL1nIGN7xtZR8vg0awekIbQxCsFzv6l7t5vvLEQ/
 l9Q8bCUFhJ5pQ4uMGsde47G3smmVC8OFDJTNeUog9grcbcsyUh1+bN9Ix886oVnd
 14Yji6++WmwekDISw9wH8tKNkGp7PZFMDxRFET8LQf+WReIDr/n1F4j1fhNBL0jr
 UVy0F8u3prGwE3TigUX35cBvDmdbhIpkSZIaXskUewkiOByzcLXh5fF5XlfEtAJ0
 OhMQQPAOKv5vcmLF3+JYMGj0JDch3s6HJy6AfxQ2EfZs4iozL47aOEiMlQ3PdZfw
 CJfMAXwXs/PgznSDBdAdV0iMRM5f43QtHGXfBIsHk+F3Ax05ERvvghtnCYht3ltd
 5Fy80Tm1iGxH4nsItR3bXxAnkFQmhJFDPAZ7RD8TkvgkSrDYJhA=
 =qe+b
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "User reported fixes:

   - Fix btree node scan on encrypted filesystems by not using btree
     node header fields encrypted

   - Fix a race in btree write buffer flush; this caused EROs primarily
     during fsck for some people"

* tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs:
  bcachefs: Add missing snapshots_seen_add_inorder()
  bcachefs: Fix write buffer flushing from open journal entry
  bcachefs: btree_node_scan: don't re-read before initializing found_btree_node
2025-07-25 07:56:38 -07:00
Nathan Chancellor
53e7e1fb81 ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
Commit e7607f7d6d ("ARM: 9443/1: Require linker to support KEEP within
OVERLAY for DCE") accidentally broke the binutils version restriction
that was added in commit 0d437918fb ("ARM: 9414/1: Fix build issue
with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation
fault addressed by that workaround.

Restore the binutils version dependency by using
CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure
that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with
binutils >= 2.36 and ld.lld >= 21.0.0.

Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/
Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/

Cc: stable@vger.kernel.org
Fixes: e7607f7d6d ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE")
Reported-by: Rob Landley <rob@landley.net>
Tested-by: Rob Landley <rob@landley.net>
Reported-by: Martin Wetterwald <martin@wetterwald.eu>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-07-25 15:33:51 +01:00
Nathan Chancellor
87c4e1459e ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
After commit d5c8d6e0fa ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with

  <command-line>: fatal error: asm/unified.h: No such file or directory
  compilation terminated.

because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.

This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150c ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.

Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.

Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg@mail.gmail.com/

Cc: stable@vger.kernel.org
Fixes: d5c8d6e0fa ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot@kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-07-25 15:33:50 +01:00
Kent Overstreet
c37495fe35 bcachefs: Add missing snapshots_seen_add_inorder()
This fixes an infinite loop when repairing "extent past end of inode",
when the extent is an older snapshot than the inode that needs repair.

Without the snaphsots_seen_add_inorder() we keep trying to delete the
same extent, even though it's no longer visible in the inode's snapshot.

Fixes: 63d6e93119 ("bcachefs: bch2_fpunch_snapshot()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-24 22:56:37 -04:00
Kent Overstreet
1831840c2b bcachefs: Fix write buffer flushing from open journal entry
When flushing the btree write buffer, we pull write buffer keys directly
from the journal instead of letting the journal write path copy them to
the write buffer.

When flushing from the currently open journal buffer, we have to block
new reservations and wait for outstanding reservations to complete.

Recheck the reservation state after blocking new reservations:
previously, we were checking the reservation count from before calling
__journal_block().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-24 22:56:37 -04:00
Linus Torvalds
2942242dde 11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels.
 
 7 are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaILYBgAKCRDdBJ7gKXxA
 jo0uAQDvTlAjH6TcgRW/cbqHRIeiRoZ9Bwh/RUlJXM9neDR2LgEA41B+ohTsxUmZ
 OhM3Ce94tiGrHnVlW3SsmVaO+1TjGAU=
 =KUR9
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "11 hotfixes. 9 are cc:stable and the remainder address post-6.15
  issues or aren't considered necessary for -stable kernels.

  7 are for MM"

* tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  sprintf.h requires stdarg.h
  resource: fix false warning in __request_region()
  mm/damon/core: commit damos_quota_goal->nid
  kasan: use vmalloc_dump_obj() for vmalloc error reports
  mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
  mm: update MAINTAINERS entry for HMM
  nilfs2: reject invalid file types when reading inodes
  selftests/mm: fix split_huge_page_test for folio_split() tests
  mailmap: add entry for Senozhatsky
  mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
  mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
2025-07-24 19:13:30 -07:00
Dave Airlie
14e8f8e74d Driver Changes:
- Fix build without debugfs (Lucas)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCaIKVEwAKCRC4FpNVCsYG
 vwvHAQC9TKBBykzhZibcetgAqPgOkZhoANiRhAoK9iUZf2OTpQEA5u3ahxetKxW9
 Z6kUhYQPrCXOuCbO46kpta2Jo8TzIQY=
 =l5ox
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix build without debugfs (Lucas)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aIKWC2RPlbRxZc5o@fedora
2025-07-25 11:03:08 +10:00
Stephen Rothwell
0dec720178 sprintf.h requires stdarg.h
In file included from drivers/crypto/intel/qat/qat_common/adf_pm_dbgfs_utils.c:4:
include/linux/sprintf.h:11:54: error: unknown type name 'va_list'
   11 | __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
      |                                                      ^~~~~~~
include/linux/sprintf.h:1:1: note: 'va_list' is defined in header '<stdarg.h>'; this is probably fixable by adding '#include <stdarg.h>'

Link: https://lkml.kernel.org/r/20250721173754.42865913@canb.auug.org.au
Fixes: 39ced19b9e ("lib/vsprintf: split out sprintf() and friends")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24 17:58:00 -07:00
Akinobu Mita
91a229bb7b resource: fix false warning in __request_region()
A warning is raised when __request_region() detects a conflict with a
resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY.

But this warning is only valid for iomem_resources.
The hmem device resource uses resource.desc as the numa node id, which can
cause spurious warnings.

This warning appeared on a machine with multiple cxl memory expanders. 
One of the NUMA node id is 6, which is the same as the value of
IORES_DESC_DEVICE_PRIVATE_MEMORY.

In this environment it was just a spurious warning, but when I saw the
warning I suspected a real problem so it's better to fix it.

This change fixes this by restricting the warning to only iomem_resource.
This also adds a missing new line to the warning message.

Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com
Fixes: 7dab174e2e ("dax/hmem: Move hmem device registration to dax_hmem.ko")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24 17:57:59 -07:00
SeongJae Park
1aef9df0ee mm/damon/core: commit damos_quota_goal->nid
DAMOS quota goal uses 'nid' field when the metric is
DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP.  But the goal commit function is not
updating the goal's nid field.  Fix it.

Link: https://lkml.kernel.org/r/20250719181932.72944-1-sj@kernel.org
Fixes: 0e1c773b50 ("mm/damon/core: introduce damos quota goal metrics for memory node utilization")	[6.16.x]
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24 17:57:59 -07:00
Dave Airlie
e6b39e516c - Fix DP 2.7 Gbps DP_LINK_BW value on g4x (Ville)
- Fix return value on intel_atomic_commit_fence_wait (Aakash)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmiCRLEACgkQ+mJfZA7r
 E8rQqwf9E2xMW+4F/Lm60wDXOj6u4zUc2Eh/2nYg6y/UR98Xy19u7iPdhOhlQYzf
 UhHYC/5xhaivONRdob5xnkJYAMWJ/CUAQ8iXBDGxKY5uClMWnNDnFYOUdBSwYDdh
 njs0IPPHnH2uehaIvw6v76yyaN2Z60W5WVlidwQFwaCSpIui38nlelxqzOj/JCjP
 XZbBDiFC4wEue3B1KStBAaLqfJYl5RPsS+flu/gbS6pbK4xsbnivC1OofTXuygdo
 4F+dzvp+Wj2Ngaw9DPB9tROpB7fODkcVhzON/fclmkn26ss3Qy+Z+0eOe3as99II
 /co+IJDnf87IHNVnF42Loh0JVabbNg==
 =TXWl
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix DP 2.7 Gbps DP_LINK_BW value on g4x (Ville)
- Fix return value on intel_atomic_commit_fence_wait (Aakash)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aIJE9F-PcCe35PFb@intel.com
2025-07-25 10:57:28 +10:00
Linus Torvalds
94ce1ac2c9 pci-v6.16-fixes-4
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmiCrE0UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyhfRAAv5pB484Q4w1fKw2l3JZeYztBPeFh
 S0I4yZQCI+a1ecTSL1IgTUyRWrRoJDNtvJJResNtOuZKE9sxm5V/9h3d0qhMxCqw
 dOKk1vDB8im4VDlC+IgEcvKY28T54rWdoklqEf6DztJz5tr10vrSRIvu01wDoeI2
 FKH20jSbvhzzhP2JVlIXfYRnJ4jQVjgLbCl37m8oee3ifekzUM213+ZhFoOXR6a9
 fJnHBt6pJAOduBATGlutFMetEyQ6Zn7XzT6OWrm5pSJkahgvF2+7hbPynBYXehV+
 vn9xRJHKuC9WYpfAn3LEhnjNBNkDV+huYu7tv77htpB6M2Xq0HTlJWC0DeJGcozL
 0v9UULZCzV758faUqAyNbGh7MAXt0MAUSOLjKmCM4gK8D0YJRrpVYIG8quQ5sPQO
 zts3v6AYyOeliRarsgs+1MUe6/3hUSWwsy0k0wV5KH8NIXp4gc4qIte781vIEJsV
 G07LjGIoMQUrjVpXRr5ra53XxPWu0pliTd+A684YUnr/5MMqu2I4lKsP/vqxzOoX
 W4G1fDL8itEA3G0VBgTbYHPaZHQBa1+y4yOL6RcG/3gwS4bqkfcJPxN2jhRtHoXb
 N3RRkb8rFuNGjTEIXt217BonXZRFeQxwgrXDLrjGE9SryhxNlsN0u/h+7TCeBC58
 pPO+RyDQSBV0bBQ=
 =ZwhJ
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.16-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Create pwrctrl devices only when we need them, i.e., when
   CONFIG_PCI_PWRCTRL is enabled.

   This allows brcmstb to work around a pwrctrl regression by
   disabling CONFIG_PCI_PWRCTRL (Manivannan Sadhasivam)

* tag 'pci-v6.16-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/pwrctrl: Create pwrctrl devices only when CONFIG_PCI_PWRCTRL is enabled
2025-07-24 15:33:00 -07:00
Lucas De Marchi
99e91521ce drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:

	ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
	drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
	ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
	collect2: error: ld returned 1 exit status

Do not use the gt_reset_failure attribute if debugfs is not enabled.

Fixes: 8f3013e0b2 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4d3bbe9dd28c0a4ca119e4b8823c5f5e9cb3ff90)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-07-24 21:06:52 +02:00
Linus Torvalds
dd9c17322a sound fixes for 6.16-final
Some last-minute fixes.  All changes are device-specific small fixes
 or quirks, safe to apply.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmiCKpcOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+4cg//ZnF1BQVX056UpOEL5QrYnQXHnA5p7g1av6Ek
 k/E155lAxm8GBgfydC5AlOo7KLtwMDBQ485NVAP+pHMd4VrWp8udptcq6lxMCYS+
 5NmCPaePOcTP+fVFKI0miBC60oEfhHDVie15nxARSsCJyIM04edO93KXfrnCyfbx
 +PDynl8YQCvLQ4H9Fz6BCI/oPfw6DIocgGpvnyU7zl6lJWVfHvw3pssOfmVHH7dp
 dlTUZe+S2MDBSApxogdOtx1BsJi6/V7DnNZc43gLarmE4H042ufcHXjatFdzSFQ6
 +X5jWLbRy0mwkRq/gewf8AvPnGKe4jPEup4MZ3yZG0NZz//iLzH3rmDycbgkyEZR
 wc01uAnaBY5lB/1YORsE2szI2AUMN7+g06c1yfEtxUsndaviJjWFX9MOOSy/4O3a
 3GdqxyFLsQujvYUPyiV/iygv6JTGTRf9pbKB4EdpBbK7yHd9vchKxzqlp40UrIfM
 TPd/4L2TZj29xwOl4eBn0Vye4Js3r9OUEqWIUrHMUNXjwx83skCpJ1P5BbH5CD1y
 pmXyFlYmbI6QxR5hzm+13K72tGFMPowpreh2JiICsgxrkbLHrX7KQEwn28s0SdH4
 QmMOhkV4GOuEdzSzO2z4xRjdpI6i4y4qdjXt1+hFr4sJNfs4UI1lqW1MnHHNvfeH
 9JHgTZ0=
 =mkoF
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Some last-minute fixes. All changes are device-specific small fixes or
  quirks, safe to apply"

* tag 'sound-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: mediatek: common: fix device and OF node leak
  ALSA: hda/realtek: Fix mute LED mask on HP OMEN 16 laptop
  ALSA: usb-audio: qcom: Adjust mutex unlock order
  ASoC: SDCA: correct the calculation of the maximum init table size
  ASoC: rt5650: Eliminate the high frequency glitch
  ASoC: SOF: Intel: PTL: Add the sdw_process_wakeen op
  ALSA: hda/realtek - Add mute LED support for HP Pavilion 15-eg0xxx
  ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa0xxx
  ASoC: mediatek: mt8365-dai-i2s: pass correct size to mt8365_dai_set_priv
2025-07-24 09:15:16 -07:00
Linus Torvalds
cef6c8c92f arm64 fixes for 6.16
- Fix broken UUID value for the KVM/arm64 hypervisor SMCCC interface.
 
 - Fix stack corruption on context-switch, primarily seen on (but not
   limited to) configurations with both pNMI and SCS enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmiBBXwQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNFitB/9qIwohXTPcwfFux3FJpiUAMzY0KlP2TIJF
 +wMRNSkYw/EKKorz4nzdEagyvWLvbFqg8eZDaoVTV7+DyQXb5Fz9R7s8yVQwBIpY
 naAPm5A+I9gitiISA8M1Kr44MpTJBUHRpOSIIiTQK0/ijmd02sDkZZJuTUqYRmUP
 1SY7A6Ps3Oz3cfqDtOIKUy8LNi22Wha6n9r3YghO5nhlLgrO1vMf/2uX1fRULTee
 rpv4vICUjy9JK7F3W2osQ88UQkRY+l7vEtEzDdAYREPbg7m0ye1R1rTp2NiQ/mja
 b+YoKudvdce4Mwui3UB5kRCDBB5kJIX7Itvu/AfjJ5e/cr0fI15X
 =tkat
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Two important arm64 fixes ahead of the 6.16 release.

  The first fixes a regression introduced during the merge window where
  the KVM UUID (which is used to advertise KVM-specific hypercalls for
  things like time synchronisation in the guest) was corrupted thanks to
  an endianness bug introduced when converting the code to use the
  UUID_INIT() helper.

  The second fixes a stack-pointer corruption issue during
  context-switch which has been observed in the wild when taking a
  pseudo-NMI with shadow call stack enabled.

  Summary:

   - Fix broken UUID value for the KVM/arm64 hypervisor SMCCC interface

   - Fix stack corruption on context-switch, primarily seen on (but not
     limited to) configurations with both pNMI and SCS enabled"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
  arm64: kvm, smccc: Fix vendor uuid
2025-07-24 08:50:55 -07:00
Linus Torvalds
407c114c98 Including fixes from can and xfrm.
The TI regression notified last week is actually on our net-next tree,
 it does not affect 6.16.
 We are investigating a virtio regression which is quite hard to
 reproduce - currently only our CI sporadically hits it. Hopefully it
 should not be critical, and I'm not sure that an additional week would
 be enough to solve it.
 
 Current release - fix to a fix:
 
   - sched: sch_qfq: avoid sleeping in atomic context in qfq_delete_class
 
 Previous releases - regressions:
 
   - xfrm:
     - set transport header to fix UDP GRO handling
     - delete x->tunnel as we delete x
 
   - eth: mlx5: fix memory leak in cmd_exec()
 
   - eth: i40e: when removing VF MAC filters, avoid losing PF-set MAC
 
   - eth: gve: fix stuck TX queue for DQ queue format
 
 Previous releases - always broken:
 
   - can: fix NULL pointer deref of struct can_priv::do_set_mode
 
   - eth: ice: fix a null pointer dereference in ice_copy_and_init_pkg()
 
   - eth: ism: fix concurrency management in ism_cmd()
 
   - eth: dpaa2: fix device reference count leak in MAC endpoint handling
 
   - eth: icssg-prueth: fix buffer allocation for ICSSG
 
 Misc:
 
   - selftests: mptcp: increase code coverage
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmiCGdASHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkV2YP/idUgK6lAEHGIVvmhYn7TJCeA4Ekbybt
 rjf71bUJRs1w64y2UFaK4/LOdOWoK6ztZIX3tFkkPwMKxvGsvymMcClhghSGU6E5
 8h1jcgSCzvtAV6bSDeulDSfqKu4OJ2KCJomwQKKO0bu3RDQTkQdPNLxb6ysAonml
 6Y0v2kiclnjtIdXJ+emdgQ3CBjLabPLTzXBVy1yuk0J1JLL2UACzK4xmacKvlpN0
 heN7eEontRrPmBVNRY5+rw5yrYBwAg6Y8pa3dF3M8bXiXc0xvoT5c8lA9d0PNRSe
 9TBeXwCASdr9vDFORoL87GwDV82pfrXrSAcHBfdW55ItblgkzLBiIkGsGGt5YynZ
 1ZA8fGJVJwNbV7jFqQkjFob+Te6dCL1wcFNbbrrRtU5V24D0Q0YE56HHcFqgbuht
 dLuDSsxVL1TGlpA5q2f1jylfIekLgMV9d1X8adagY1VRuzK4SzGZpCXEcpBUI2z3
 boTpc/BTQx9t42qjegLZqCu+TmvVwYM6zarUvO8jQtC9LZpMY1sPxpu+qFd5yYJ1
 VyK+EGOq7At6qwxchUBiwVsek08G5vXWsQFVA9qn+sPTi5iKhDB97dJFnnb1aBo/
 aw/lYHFBzn5I31HhLaDKFMqie4JUOfj9gd/Ab7hR4u4jAqUG9tkPjrdCWtLFDrSn
 Jn+o05q8HLwK
 =7rIq
 -----END PGP SIGNATURE-----

Merge tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can and xfrm.

  The TI regression notified last week is actually on our net-next tree,
  it does not affect 6.16.

  We are investigating a virtio regression which is quite hard to
  reproduce - currently only our CI sporadically hits it. Hopefully it
  should not be critical, and I'm not sure that an additional week would
  be enough to solve it.

  Current release - fix to a fix:

   - sched: sch_qfq: avoid sleeping in atomic context in qfq_delete_class

  Previous releases - regressions:

   - xfrm:
      - set transport header to fix UDP GRO handling
      - delete x->tunnel as we delete x

   - eth:
      - mlx5: fix memory leak in cmd_exec()
      - i40e: when removing VF MAC filters, avoid losing PF-set MAC
      - gve: fix stuck TX queue for DQ queue format

  Previous releases - always broken:

   - can: fix NULL pointer deref of struct can_priv::do_set_mode

   - eth:
      - ice: fix a null pointer dereference in ice_copy_and_init_pkg()
      - ism: fix concurrency management in ism_cmd()
      - dpaa2: fix device reference count leak in MAC endpoint handling
      - icssg-prueth: fix buffer allocation for ICSSG

  Misc:

   - selftests: mptcp: increase code coverage"

* tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  net: hns3: default enable tx bounce buffer when smmu enabled
  net: hns3: fixed vf get max channels bug
  net: hns3: disable interrupt when ptp init failed
  net: hns3: fix concurrent setting vlan filter issue
  s390/ism: fix concurrency management in ism_cmd()
  selftests: drv-net: wait for iperf client to stop sending
  MAINTAINERS: Add in6.h to MAINTAINERS
  selftests: netfilter: tone-down conntrack clash test
  can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode
  net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class
  gve: Fix stuck TX queue for DQ queue format
  net: appletalk: Fix use-after-free in AARP proxy probe
  net: bcmasp: Restore programming of TX map vector register
  selftests: mptcp: connect: also cover checksum
  selftests: mptcp: connect: also cover alt modes
  e1000e: ignore uninitialized checksum word on tgp
  e1000e: disregard NVM checksum on tgp when valid checksum bit is not set
  ice: Fix a null pointer dereference in ice_copy_and_init_pkg()
  i40e: When removing VF MAC filters, only check PF-set MAC
  i40e: report VF tx_dropped with tx_errors instead of tx_discards
  ...
2025-07-24 08:44:42 -07:00
Paolo Abeni
291d5dc80e ipsec-2025-07-23
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmiAj9QACgkQrB3Eaf9P
 W7dQEhAAj0ccCgKych/IBx4zlK/TSsPN9PnZmylTgpfWQdClCKjNEa80Umf2wWgc
 0j5rA77B2AJSHkRHMW6cWpmZRbicnWDavAtUOB/2lEcw8EDyMcIEEByzBtRkylvb
 S1slLlpAYyezHsSr9uGu6+Meb3llkL2+pnF2CiA02uWI0MaEXaqtGfEqZ4DZX9BK
 xQcb3St6Mr6pBV0sQ/rlY3Gfe/t/gzUnJNQQtRuKcZmBG4Ls1MLpJ6fYU++k+lfX
 my4JehdI2Sg6QujX4HdlBDF1bCzcKpxzJGQVhkjT9OPx25owtoknMezeWdrK3Gdk
 0pwIFs0+dFCc6hi2x6seabSBPNZJzT7978XZ0U2PneGoKlSw36UuAvk2ckAaBS+N
 I2flFQRGAf7XjxNn53lRailG/nxGYUMEvhlM+uGjy1rujDn8NmdZXAwtQUGP25bJ
 DCcXiQp6KUnclIWJ0MjScuFUG++Afn52DgsM+N5Fmjg9U/F1dVqIWjixcOOM4Lx2
 DOpJy9gi3r7JZ0b1K7KMq1FibxFGszaW4YeABNPqlcEjnWiVhPNQjxKm0S84UQdL
 Qh0s1PiAWhBqt4bvdf6h+r8309JWgo704MWxC2tBodDVJoYgugNXbnPihrr6+wO3
 D9SbUJHSdTLOWTWzhlrFEb2oHVHAXc3KZpIQnk+32XaxqcNHpeY=
 =MBQn
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2025-07-23

1) Premption fixes for xfrm_state_find.
   From Sabrina Dubroca.

2) Initialize offload path also for SW IPsec GRO. This fixes a
   performance regression on SW IPsec offload.
   From Leon Romanovsky.

3) Fix IPsec UDP GRO for IKE packets.
   From Tobias Brunner,

4) Fix transport header setting for IPcomp after decompressing.
   From Fernando Fernandez Mancera.

5)  Fix use-after-free when xfrmi_changelink tries to change
    collect_md for a xfrm interface.
    From Eyal Birger .

6) Delete the special IPcomp x->tunnel state along with the state x
   to avoid refcount problems.
   From Sabrina Dubroca.

Please pull or let me know if there are problems.

* tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  Revert "xfrm: destroy xfrm_state synchronously on net exit path"
  xfrm: delete x->tunnel as we delete x
  xfrm: interface: fix use-after-free after changing collect_md xfrm interface
  xfrm: ipcomp: adjust transport header after decompressing
  xfrm: Set transport header to fix UDP GRO handling
  xfrm: always initialize offload path
  xfrm: state: use a consistent pcpu_id in xfrm_state_find
  xfrm: state: initialize state_ptrs earlier in xfrm_state_find
====================

Link: https://patch.msgid.link/20250723075417.3432644-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 12:30:40 +02:00
Paolo Abeni
89fd905dab Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver

v1: https://lore.kernel.org/all/20250702130901.2879031-1-shaojijie@huawei.com/
====================

Link: https://patch.msgid.link/20250722125423.1270673-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:28 +02:00
Jijie Shao
49ade8630f net: hns3: default enable tx bounce buffer when smmu enabled
The SMMU engine on HIP09 chip has a hardware issue.
SMMU pagetable prefetch features may prefetch and use a invalid PTE
even the PTE is valid at that time. This will cause the device trigger
fake pagefaults. The solution is to avoid prefetching by adding a
SYNC command when smmu mapping a iova. But the performance of nic has a
sharp drop. Then we do this workaround, always enable tx bounce buffer,
avoid mapping/unmapping on TX path.

This issue only affects HNS3, so we always enable
tx bounce buffer when smmu enabled to improve performance.

Fixes: 295ba232a8 ("net: hns3: add device version to replace pci revision")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-5-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:22 +02:00
Jian Shen
b3e75c0bcc net: hns3: fixed vf get max channels bug
Currently, the queried maximum of vf channels is the maximum of channels
supported by each TC. However, the actual maximum of channels is
the maximum of channels supported by the device.

Fixes: 849e460776 ("net: hns3: add ethtool_ops.get_channels support for VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-4-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Yonglong Liu
cde304655f net: hns3: disable interrupt when ptp init failed
When ptp init failed, we'd better disable the interrupt and clear the
flag, to avoid early report interrupt at next probe.

Fixes: 0bf5eb7885 ("net: hns3: add support for PTP")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Jian Shen
4555f8f8b6 net: hns3: fix concurrent setting vlan filter issue
The vport->req_vlan_fltr_en may be changed concurrently by function
hclge_sync_vlan_fltr_state() called in periodic work task and
function hclge_enable_vport_vlan_filter() called by user configuration.
It may cause the user configuration inoperative. Fixes it by protect
the vport->req_vlan_fltr by vport_lock.

Fixes: 2ba306627f ("net: hns3: add support for modify VLAN filter state")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722125423.1270673-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 11:27:21 +02:00
Halil Pasic
897e8601b9 s390/ism: fix concurrency management in ism_cmd()
The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time.  Unfortunately as of today the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.

This problem was discovered based on Aliaksei's bug report which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2 as seen from the logs) after a while and as a consequence
connections handled by the respective function break, and for future
connection requests the ISM device is not considered -- given it is in a
dysfunctional state. During further debugging PEC 3A was observed as
well.

A kernel message like
[ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that a kernel message like
[ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.

On a technical level, without this synchronization, commands (inputs to
the FW) may be partially or fully overwritten (corrupted) by another CPU
trying to issue commands on the same function. There is hard evidence that
this can lead to DMB token values being used as DMB IOVAs, leading to
PEC 2 PCI events indicating invalid DMA. But this is only one of the
failure modes imaginable. In theory even completely losing one command
and executing another one twice and then trying to interpret the outputs
as if the command we intended to execute was actually executed and not
the other one is also possible.  Frankly, I don't feel confident about
providing an exhaustive list of possible consequences.

Fixes: 684b89bc39 ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-24 10:57:26 +02:00
Takashi Iwai
0d57ed922b ASoC: Fixes for v6.16
A few device specific fixes, none especially remarkable though all
 useful.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmiBFdUACgkQJNaLcl1U
 h9CrBwf/S9zAmKv9zfCgpgyxN8GEJv4pwj/8NGLjv1FlNetX3gMsGq6Xp58b+9Ry
 /nPLkKnl3Vgon3QDQCbWNza3wliDDyPFVqqrIsbGrEy3E78UraIKNXIjxlanJkDx
 emZGfi4C8tCdaBMjaHTNGcxomYsH0TGascDfng5yrtVDi9mityg+zeC0gLBUuR/Z
 07qLIakp5iYavHGuxBkY55eHWncperoQpRv3C3efzXlDM/Eu3HcQoRmKX1M3yTx9
 mboJ/xfuLDOSf7ZZJeGnoeZToDgvoUcyM8m6R06uj9th3dq6sP3FraFGHe60E+Mq
 qwbBrtGv/dczbf/LyIBnolqXx1xdfg==
 =Pl6U
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v6.16-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.16

A few device specific fixes, none especially remarkable though all
useful.
2025-07-24 08:45:13 +02:00
Linus Torvalds
25fae0b93d drm fixes for 6.16-rc8
gem:
 - revert all the dma-buf/gem changes
   as there as lifetime issues with it.
 
 nouveau:
 - revert an ioctl change as it causes issues
 - fix NULL ptr on fermi
 
 bridge:
 - remove extra semicolon
 
 sched:
 - remove hang causing optimisation
 
 amdgpu:
 - fix garbage in cleared vram after resume
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmiBfb0ACgkQDHTzWXnE
 hr4ZRBAAiGleBHnZGpdByMmmMIW6LVETISJiaBhZ8Cfx3dgFlW13Edhce9wACJ6t
 R6zP6fGTDVhhRohIEx+H62R8hdZiL3Y3MYdFh47VwwxTWEs9qqjVMJUVWPgl+xS6
 5UTNqb1nm6lrKJHncD2vXCVKsk4XEJaZEXolwrmUpCrVPlv6zz1uxBFRNXBxzaOZ
 Q1zz5UgjE67Jv8qb3MaH5rgysTKUUHrC+6LfKvXpauovGVA3g0FlGK7N5gsJm8A7
 ThPjwAOs9HUTHeaSZDGW96Kv84ltR5yhPIcAX+U0S4RNl1NTUOQaACHTTWrf7OTb
 FXVy1WihM958olRkZeyaa6B8lNRF98eHDJXRa4Kvzr7LmW79J28XyW/9Ht2bjRhq
 e2v/XGOWJRLU46ahsFF4cfRo1nE9ctksHvggX7E0uwclRuXLaHDjbWW5FG9THciE
 geNQEN3AcoBN1gwQ49wzOvF2UKRFM+CIXiXhriAKe5wpqYt559mSyddCLsL8oH5L
 S3ctlcm1R3UYcvWwZ/hzkoFzwGCmj50xslswE2s6lTEFohM9wG84QunZx6u97nNA
 nQRuU45juljMMvgVYhngIgEPQbnnh+Rz2UFyRf6mB2n/OmLxZ74zSCJkD/RxB7+h
 ffSJgwbYIFs9k7HWYp9u/ReFsYNg3SyNOuEEs8aetQG913OO7ZY=
 =4EOG
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This might just be part one, but I'm sending it a bit early as it has
  two sets of reverts for regressions, one is all the gem/dma-buf
  handling and another was a nouveau ioctl change.

  Otherwise there is an amdgpu fix, nouveau fix and a scheduler fix.

  If any other changes come in I'll follow up with another more usual
  Fri/Sat MR.

  gem:
   - revert all the dma-buf/gem changes as there as lifetime issues
     with them

  nouveau:
   - revert an ioctl change as it causes issues
   - fix NULL ptr on fermi

  bridge:
   - remove extra semicolon

  sched:
   - remove hang causing optimisation

  amdgpu:
   - fix garbage in cleared vram after resume"

* tag 'drm-fixes-2025-07-24' of https://gitlab.freedesktop.org/drm/kernel:
  drm/bridge: ti-sn65dsi86: Remove extra semicolon in ti_sn_bridge_probe()
  Revert "drm/nouveau: check ioctl command codes better"
  drm/nouveau/nvif: fix null ptr deref on pre-fermi boards
  Revert "drm/gem-dma: Use dma_buf from GEM object instance"
  Revert "drm/gem-shmem: Use dma_buf from GEM object instance"
  Revert "drm/gem-framebuffer: Use dma_buf from GEM object instance"
  Revert "drm/prime: Use dma_buf from GEM object instance"
  Revert "drm/etnaviv: Use dma_buf from GEM object instance"
  Revert "drm/vmwgfx: Use dma_buf from GEM object instance"
  Revert "drm/virtio: Use dma_buf from GEM object instance"
  drm/sched: Remove optimization that causes hang when killing dependent jobs
  drm/amdgpu: Reset the clear flag in buddy during resume
2025-07-23 18:56:24 -07:00
Nimrod Oren
8694138250 selftests: drv-net: wait for iperf client to stop sending
A few packets may still be sent out during the termination of iperf
processes. These late packets cause failures in rss_ctx.py when they
arrive on queues expected to be empty.

Example failure observed:

  Check failed 2 != 0 traffic on inactive queues (context 1):
    [0, 0, 1, 1, 386385, 397196, 0, 0, 0, 0, ...]

  Check failed 4 != 0 traffic on inactive queues (context 2):
    [0, 0, 0, 0, 2, 2, 247152, 253013, 0, 0, ...]

  Check failed 2 != 0 traffic on inactive queues (context 3):
    [0, 0, 0, 0, 0, 0, 1, 1, 282434, 283070, ...]

To avoid such failures, wait until all client sockets for the requested
port are either closed or in the TIME_WAIT state.

Fixes: 847aa551fa ("selftests: drv-net: rss_ctx: factor out send traffic and check")
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722122655.3194442-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 18:52:12 -07:00
Yang Xiwen
a7982a14b3 i2c: qup: jump out of the loop in case of timeout
Original logic only sets the return value but doesn't jump out of the
loop if the bus is kept active by a client. This is not expected. A
malicious or buggy i2c client can hang the kernel in this case and
should be avoided. This is observed during a long time test with a
PCA953x GPIO extender.

Fix it by changing the logic to not only sets the return value, but also
jumps out of the loop and return to the caller with -ETIMEDOUT.

Fixes: fbfab1ab06 ("i2c: qup: reorganization of driver code to remove polling for qup v1")
Signed-off-by: Yang Xiwen <forbidden405@outlook.com>
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250616-qca-i2c-v1-1-2a8d37ee0a30@outlook.com
2025-07-24 00:38:01 +02:00
Viresh Kumar
a663b3c47a i2c: virtio: Avoid hang by using interruptible completion wait
The current implementation uses wait_for_completion(), which can cause
the caller to hang indefinitely if the transfer never completes.

Switch to wait_for_completion_interruptible() so that the operation can
be interrupted by signals.

Fixes: 84e1d0bf1d ("i2c: virtio: disable timeout handling")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/b8944e9cab8eb959d888ae80add6f2a686159ba2.1751541962.git.viresh.kumar@linaro.org
2025-07-24 00:38:01 +02:00
Akhil R
56344e241c i2c: tegra: Fix reset error handling with ACPI
The acpi_evaluate_object() returns an ACPI error code and not
Linux one. For the some platforms the err will have positive code
which may be interpreted incorrectly. Use device_reset() for
reset control which handles it correctly.

Fixes: bd2fdedbf2 ("i2c: tegra: Add the ACPI support")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Cc: <stable@vger.kernel.org> # v5.17+
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250710131206.2316-2-akhilrajeev@nvidia.com
2025-07-24 00:38:00 +02:00
Kees Cook
14822f7827 MAINTAINERS: Add in6.h to MAINTAINERS
My CC-adding automation returned nothing on a future patch to the
include/linux/in6.h file, and I went looking for why. Add the missed
in6.h to MAINTAINERS.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722165645.work.047-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23 15:33:35 -07:00
Linus Torvalds
f9af7b5d93 x86:
* Fix cleanup mistake (probably a cut-and-paste error) in a Xen
   hypercall.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiBWM8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNC7wf+Oqx6vEB4RyFR2hoLRlQiKiL4uA1y
 TLesrbHefK1VwrMn3rEXAzJQh1u67aUBj03u5ggqvRVvbDVaUztrAPM50qnxLIvY
 G3nK7N+M3FCjDodXGD9xLywfV5+uFaCTlkeHbzL1CvO+cNKD7DPodH3yTJnuv7mN
 4h0dSr1XWhHe+7nTtbkSVObPG1ndlZgbw+yxCel62p/dCBLcQblYjkbfD3XNmv+M
 deJnlZt7p/bOSWCQh8qGY54JvF2hnjn/m11dosSXlSdjLUKBqF/YJMfuiBvYwl/C
 ffDHVHhtXU0wAzyBwm4LJ7HhG65HoakYPDDEJkcOzQj84EH6F1+C2Y9EJQ==
 =gjN5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:

 - Fix cleanup mistake (probably a cut-and-paste error) in a Xen
   hypercall

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Fix cleanup logic in emulation of Xen schedop poll hypercalls
2025-07-23 15:04:27 -07:00
Manuel Andreas
5a53249d14 KVM: x86/xen: Fix cleanup logic in emulation of Xen schedop poll hypercalls
kvm_xen_schedop_poll does a kmalloc_array() when a VM polls the host
for more than one event channel potr (nr_ports > 1).

After the kmalloc_array(), the error paths need to go through the
"out" label, but the call to kvm_read_guest_virt() does not.

Fixes: 92c58965e9 ("KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly")
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Manuel Andreas <manuel.andreas@tum.de>
[Adjusted commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-23 23:48:54 +02:00
Dave Airlie
337666c522 Merge tag 'drm-misc-fixes-2025-07-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc8/final?:
- Revert all uses of drm_gem_object->dmabuf to
  drm_gem_object->import_attach->dmabuf.
- Fix amdgpu returning BIOS cluttered VRAM after resume.
- Scheduler hang fix.
- Revert nouveau ioctl fix as it caused regressions.
- Fix null pointer deref in nouveau.
- Fix unnecessary semicolon in ti_sn_bridge_probe.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/72235afd-c849-49fe-9cc1-2b1781abdf08@linux.intel.com
2025-07-24 06:49:38 +10:00
Linus Torvalds
01a412d06b regression in ufs options parsing
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIED9AAKCRBZ7Krx/gZQ
 69tPAQD4FLiS82oG1LfSjPFa6of9omdkSQBhZptuFBTZ8MyVJwD/T/fDFnIXdZQb
 J4MazfsRCtgRdFlRGH90EQGRygwxdQA=
 =UiOi
 -----END PGP SIGNATURE-----

Merge tag 'pull-ufs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ufs fix from Al Viro:
 "Fix regression in ufs options parsing"

* tag 'pull-ufs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix the regression in ufs options parsing
2025-07-23 08:53:38 -07:00
Al Viro
e09a335a81 fix the regression in ufs options parsing
A really dumb braino on rebasing and a dumber fuckup with managing #for-next

Fixes: b70cb45989 ("ufs: convert ufs to the new mount API")
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-23 11:45:04 -04:00
Aakash Deep Sarkar
fd77b2c1b6
drm/i915/display: Fix dma_fence_wait_timeout() return value handling
dma_fence_wait_timeout returns a long type but the driver is
only using the lower 32 bits of the retval and discarding the
upper 32 bits.

This is particularly problematic if there are already signalled
or stub fences on some of the hw planes. In this case the
dma_fence_wait_timeout function will immediately return with
timeout value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since
the fence is already signalled. If the driver only uses the lower
32 bits of this return value then it'll interpret it as an error
code (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences.

This issue was first observed in the xe driver with the Android
compositor where the GPU composited layer was not properly waited
on when there were stub fences in other overlay planes resulting in
visual artifacts.

Fixes: d59cf7bb73 ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence")
Signed-off-by: Aakash Deep Sarkar <aakash.deep.sarkar@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://lore.kernel.org/r/20250708074540.1948068-1-aakash.deep.sarkar@intel.com
(cherry picked from commit cdb16039515a5ac4d2c923f7a651cf19a803a3fe)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-23 09:12:22 -04:00
Gabor Juhos
f820034864
spi: spi-qpic-snand: don't hardcode ECC steps
NAND devices with different page sizes requires different number
of ECC steps, yet the qcom_spi_ecc_init_ctx_pipelined() function
sets 4 steps in 'ecc_cfg' unconditionally.

The correct number of the steps is calculated earlier in the
function already, so use that instead of the hardcoded value.

Fixes: 7304d19090 ("spi: spi-qpic: add driver for QCOM SPI NAND flash Interface")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Link: https://patch.msgid.link/20250723-qpic-snand-fix-steps-v1-1-d800695dde4c@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-23 13:04:03 +01:00
Johan Hovold
696e123aa3
ASoC: mediatek: common: fix device and OF node leak
Make sure to drop the references to the accdet OF node and platform
device taken by of_parse_phandle() and of_find_device_by_node() after
looking up the sound component during probe.

Fixes: cf536e2622 ("ASoC: mediatek: common: Handle mediatek,accdet property")
Cc: stable@vger.kernel.org	# 6.15
Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20250722092542.32754-1-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-23 13:03:57 +01:00
Edward Adam Davis
8b3c655fa2
afs: Set vllist to NULL if addr parsing fails
syzbot reported a bug in in afs_put_vlserverlist.

  kAFS: bad VL server IP address
  BUG: unable to handle page fault for address: fffffffffffffffa
  ...
  Oops: Oops: 0002 [#1] SMP KASAN PTI
  ...
  RIP: 0010:refcount_dec_and_test include/linux/refcount.h:450 [inline]
  RIP: 0010:afs_put_vlserverlist+0x3a/0x220 fs/afs/vl_list.c:67
  ...
  Call Trace:
   <TASK>
   afs_alloc_cell fs/afs/cell.c:218 [inline]
   afs_lookup_cell+0x12a5/0x1680 fs/afs/cell.c:264
   afs_cell_init+0x17a/0x380 fs/afs/cell.c:386
   afs_proc_rootcell_write+0x21f/0x290 fs/afs/proc.c:247
   proc_simple_write+0x114/0x1b0 fs/proc/generic.c:825
   pde_write fs/proc/inode.c:330 [inline]
   proc_reg_write+0x23d/0x330 fs/proc/inode.c:342
   vfs_write+0x25c/0x1180 fs/read_write.c:682
   ksys_write+0x12a/0x240 fs/read_write.c:736
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Because afs_parse_text_addrs() parses incorrectly, its return value -EINVAL
is assigned to vllist, which results in -EINVAL being used as the vllist
address when afs_put_vlserverlist() is executed.

Set the vllist value to NULL when a parsing error occurs to avoid this
issue.

Fixes: e2c2cb8ef0 ("afs: Simplify cell record handling")
Reported-by: syzbot+5c042fbab0b292c98fc6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5c042fbab0b292c98fc6
Tested-by: syzbot+5c042fbab0b292c98fc6@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/4119365.1753108011@warthog.procyon.org.uk
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-23 13:54:34 +02:00
Leo Stone
9aa6418295
afs: Fix check for NULL terminator
Add a missing check for reaching the end of the string while attempting
to split a command.

Fixes: f94f70d39c ("afs: Provide a way to configure address priorities")
Reported-by: syzbot+7741f872f3c53385a2e2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7741f872f3c53385a2e2
Signed-off-by: Leo Stone <leocstone@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/4119428.1753108152@warthog.procyon.org.uk
Acked-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-23 13:54:05 +02:00
Jakub Kicinski
67e9d0b40b linux-can-fixes-for-6.16-20250722
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmh/bnkTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnD/YB/4r/iJoSGIOjVdSVXB+EediaB9tkS7k
 XRODKHkCwfo/QFC6WIl+lFAhTfd09PTjUERJZoUSbNU0oYOSbFR2lhVniBOHobT+
 cLq7GGWFWxNdQkba/hzxI1gh/J+/YtYeC36aq54/5ICIcckJ6jHwUi/j8NE9sSBU
 A6evv7+MtYxhysT5F7ECQ4v1d2ypppBrCHlllaMqEBm/IV9PhG1epe9fstjR3Im7
 vDd6c5aDaeAi8xwwlQqYZ6ypLtYLhojMM9IwyQ5QQLxdPgSSbsDNO0/tMloOKCl8
 r5ycnBof00FWVxkvF0eMj9MukorXGPGm6clY4wtuTwkjQg4aRTRu7qZF
 =qCya
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.16-20250722' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-07-22

The patch is by me and fixes a potential NULL pointer deref in the CAN
device driver infrastructure. It can be triggered from user space.

* tag 'linux-can-fixes-for-6.16-20250722' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode
====================

Link: https://patch.msgid.link/20250722110059.3664104-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-22 18:39:51 -07:00
Florian Westphal
dca56cc8b5 selftests: netfilter: tone-down conntrack clash test
The test is supposed to observe that the 'clash_resolve' stat counter
incremented (i.e., the code path was covered).
This check was incorrect, 'conntrack -S' needs to be called in the
revevant namespace, not the initial netns.

The clash resolution logic in conntrack is only exercised when multiple
packets with the same udp quadruple race. Depending on kernel config,
number of CPUs, scheduling policy etc.  this might not trigger even
after several retries.  Thus the script eventually returns SKIP if the
retry count is exceeded.

The udpclash tool with also exit with a failure if it did not observe
the expected number of replies.

In the script, make a note of this but do not fail anymore, just check if
the clash resolution logic triggered after all.

Remove the 'single-core' test: while unlikely, with preemptible kernel it
should be possible to also trigger clash resolution logic.

With this change the test will either SKIP or pass.

Hard error could be restored later once its clear whats going on, so
also dump 'conntrack -S' when some packets went missing to see if
conntrack dropped them on insert.

Fixes: 78a5883635 ("selftests: netfilter: add conntrack clash resolution test case")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250721223652.6956-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-22 18:26:54 -07:00
Jakub Kicinski
71c33df471 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-21 (i40e, ice, e1000e)

For i40e:
Dennis Chen adjusts reporting of VF Tx dropped to a more appropriate
field.

Jamie Bainbridge fixes a check which can cause a PF set VF MAC address
to be lost.

For ice:
Haoxiang Li adds an error check in DDP load to prevent NULL pointer
dereference.

For e1000e:
Jacek Kowalski adds workarounds for issues surrounding Tiger Lake
platforms with uninitialized NVMs.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000e: ignore uninitialized checksum word on tgp
  e1000e: disregard NVM checksum on tgp when valid checksum bit is not set
  ice: Fix a null pointer dereference in ice_copy_and_init_pkg()
  i40e: When removing VF MAC filters, only check PF-set MAC
  i40e: report VF tx_dropped with tx_errors instead of tx_discards
====================

Link: https://patch.msgid.link/20250721173733.2248057-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-22 18:24:10 -07:00
Diederik de Haas
912b1f2a79
arm64: dts: rockchip: Drop netdev led-triggers on NanoPi R5S
Sometimes the netdev triggers causes tasks to get blocked for more then
120 seconds, which in turn makes the (WAN) network port on the NanoPi
R5S fail to come up.
This results in the following (partial) trace:

  INFO: task kworker/0:1:11 blocked for more than 120 seconds.
        Not tainted 6.16-rc6+unreleased-arm64-cknow #1 Debian 6.16~rc6-1~exp1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/0:1     state:D stack:0     pid:11    tgid:11    ppid:2      task_flags:0x4208060 flags:0x00000010
  Workqueue: events_power_efficient reg_check_chans_work [cfg80211]
  Call trace:
   __switch_to+0xf8/0x168 (T)
   __schedule+0x3f8/0xda8
   schedule+0x3c/0x120
   schedule_preempt_disabled+0x2c/0x58
   __mutex_lock.constprop.0+0x4d0/0xab8
   __mutex_lock_slowpath+0x1c/0x30
   mutex_lock+0x50/0x68
   rtnl_lock+0x20/0x38
   reg_check_chans_work+0x40/0x478 [cfg80211]
   process_one_work+0x178/0x3e0
   worker_thread+0x260/0x390
   kthread+0x150/0x250
   ret_from_fork+0x10/0x20
  INFO: task kworker/0:1:11 is blocked on a mutex likely owned by task dhcpcd:615.
  task:dhcpcd          state:D stack:0     pid:615   tgid:615   ppid:614    task_flags:0x400140 flags:0x00000018
  Call trace:
   __switch_to+0xf8/0x168 (T)
   __schedule+0x3f8/0xda8
   schedule+0x3c/0x120
   schedule_preempt_disabled+0x2c/0x58
   rwsem_down_write_slowpath+0x1e4/0x750
   down_write+0x98/0xb0
   led_trigger_register+0x134/0x1c0
   phy_led_triggers_register+0xf4/0x258 [libphy]
   phy_attach_direct+0x30c/0x390 [libphy]
   phylink_fwnode_phy_connect+0xb0/0x138 [phylink]
   __stmmac_open+0xec/0x520 [stmmac]
   stmmac_open+0x4c/0xe8 [stmmac]
   __dev_open+0x130/0x2e0
   __dev_change_flags+0x1c4/0x248
   netif_change_flags+0x2c/0x80
   dev_change_flags+0x88/0xc8
   devinet_ioctl+0x35c/0x610
   inet_ioctl+0x204/0x260
   sock_do_ioctl+0x6c/0x140
   sock_ioctl+0x2e4/0x388
   __arm64_sys_ioctl+0xb4/0x120
   invoke_syscall+0x6c/0x100
   el0_svc_common.constprop.0+0x48/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x3c/0x188
   el0t_64_sync_handler+0x10c/0x140
   el0t_64_sync+0x198/0x1a0

In order to not introduce a regression with kernel 6.16, drop the netdev
triggers for now while the problem is being investigated further.

Fixes: 1631cbdb80 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S")
Helped-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Link: https://lore.kernel.org/r/20250722123628.25660-1-didi.debian@cknow.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-07-22 22:33:36 +02:00
Manivannan Sadhasivam
8c493cc91f PCI/pwrctrl: Create pwrctrl devices only when CONFIG_PCI_PWRCTRL is enabled
If devicetree describes power supplies related to a PCI device, we
unnecessarily created a pwrctrl device even if CONFIG_PCI_PWRCTL was not
enabled.

We only need pci_pwrctrl_create_device() when CONFIG_PCI_PWRCTRL is
enabled.  Compile it out when CONFIG_PCI_PWRCTRL is not enabled.

When pci_pwrctrl_create_device() creates and returns a pwrctrl device,
pci_scan_device() doesn't enumerate the PCI device. It assumes the pwrctrl
core will rescan the bus after turning on the power. However, if
CONFIG_PCI_PWRCTRL is not enabled, the rescan never happens, which breaks
PCI enumeration on any system that describes power supplies in devicetree
but does not use pwrctrl.

Jim reported that some brcmstb platforms break this way.  The brcmstb
driver is still broken if CONFIG_PCI_PWRCTRL is enabled, but this commit at
least allows brcmstb to work when it's NOT enabled.

Fixes: 957f40d039 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Reported-by: Jim Quinlan <james.quinlan@broadcom.com>
Link: https://lore.kernel.org/r/CA+-6iNwgaByXEYD3j=-+H_PKAxXRU78svPMRHDKKci8AGXAUPg@mail.gmail.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org	# v6.15
Link: https://patch.msgid.link/20250701064731.52901-1-manivannan.sadhasivam@linaro.org
2025-07-22 13:36:04 -05:00
SHARAN KUMAR M
931837cd92 ALSA: hda/realtek: Fix mute LED mask on HP OMEN 16 laptop
this patch is to fix my previous Commit <e5182305a519> i have fixed mute
led but for by This patch corrects the coefficient mask value introduced
in commit <e5182305a519>, which was intended to enable the mute LED
functionality. During testing, multiple values were evaluated, and
an incorrect value was mistakenly included in the final commit.
This update fixes that error by applying the correct mask value for
proper mute LED behavior.

Tested on 6.15.5-arch1-1

Fixes: e5182305a5 ("ALSA: hda/realtek: Enable Mute LED on HP OMEN 16 Laptop xd000xx")
Signed-off-by: SHARAN KUMAR M <sharweshraajan@gmail.com>
Link: https://patch.msgid.link/20250722172224.15359-1-sharweshraajan@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-07-22 19:57:09 +02:00
Ada Couprie Diaz
d42e6c20de arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
to different stacks along with the Shadow Call Stack if it is enabled.
Those two stack changes cannot be done atomically and both functions
can be interrupted by SErrors or Debug Exceptions which, though unlikely,
is very much broken : if interrupted, we can end up with mismatched stacks
and Shadow Call Stack leading to clobbered stacks.

In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
but x18 stills points to the old task's SCS. When the interrupt handler
tries to save the task's SCS pointer, it will save the old task
SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
clobbering it.

In `call_on_irq_stack()`, it can happen when switching from the task stack
to the IRQ stack and when switching back. In both cases, we can be
interrupted when the SCS pointer points to the IRQ SCS, but SP points to
the task stack. The nested interrupt handler pushes its return addresses
on the IRQ SCS. It then detects that SP points to the task stack,
calls `call_on_irq_stack()` and clobbers the task SCS pointer with
the IRQ SCS pointer, which it will also use !

This leads to tasks returning to addresses on the wrong SCS,
or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
or FPAC if enabled.

This is possible on a default config, but unlikely.
However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
instead the GIC is responsible for filtering what interrupts the CPU
should receive based on priority.
Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
frequently depending on the system configuration and workload, leading
to unpredictable kernel panics.

Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
Do the same in `call_on_irq_stack()`, but restore and mask around
the branch.
Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
of behaviour between all configurations.

Introduce and use an assembly macro for saving and masking DAIF,
as the existing one saves but only masks IF.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Cristian Prundeanu <cpru@amazon.com>
Fixes: 59b37fe52f ("arm64: Stash shadow stack pointer in the task struct on interrupt")
Tested-by: Cristian Prundeanu <cpru@amazon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-22 16:39:30 +01:00
Douglas Anderson
15a7ca747d drm/bridge: ti-sn65dsi86: Remove extra semicolon in ti_sn_bridge_probe()
As reported by the kernel test robot, a recent patch introduced an
unnecessary semicolon. Remove it.

Fixes: 55e8ff8420 ("drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506301704.0SBj6ply-lkp@intel.com/
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250714130631.1.I1cfae3222e344a3b3c770d079ee6b6f7f3b5d636@changeid
2025-07-22 07:46:34 -07:00
Markus Blöchl
67c632b4a7 timekeeping: Zero initialize system_counterval when querying time from phc drivers
Most drivers only populate the fields cycles and cs_id of system_counterval
in their get_time_fn() callback for get_device_system_crosststamp(), unless
they explicitly provide nanosecond values.

When the use_nsecs field was added to struct system_counterval, most
drivers did not care.  Clock sources other than CSID_GENERIC could then get
converted in convert_base_to_cs() based on an uninitialized use_nsecs field,
which usually results in -EINVAL during the following range check.

Pass in a fully zero initialized system_counterval_t to cure that.

Fixes: 6b2e299775 ("timekeeping: Provide infrastructure for converting to/from a base clock")
Signed-off-by: Markus Blöchl <markus@blochl.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250720-timekeeping_uninit_crossts-v2-1-f513c885b7c2@blochl.de
2025-07-22 14:25:21 +02:00
Arnd Bergmann
270b329f7e Revert "drm/nouveau: check ioctl command codes better"
My previous patch ended up causing a regression for the
DRM_IOCTL_NOUVEAU_NVIF ioctl. The intention of my patch was to only
pass ioctl commands that have the correct dir/type/nr bits into the
nouveau_abi16_ioctl() function.

This turned out to be too strict, as userspace does use at least
write-only and write-read direction settings. Checking for both of these
still did not fix the issue, so the best we can do for the 6.16 release
is to revert back to what we've had since linux-3.16.

This version is still fragile, but at least it is known to work with
existing userspace. Fixing this properly requires a better understanding
of what commands are being passed from userspace in practice, and how
that relies on the undocumented (miss)behavior in nouveau_drm_ioctl().

Fixes: e5478166df ("drm/nouveau: check ioctl command codes better")
Reported-by: Satadru Pramanik <satadru@gmail.com>
Closes: https://lore.kernel.org/lkml/CAFrh3J85tsZRpOHQtKgNHUVnn=EG=QKBnZTRtWS8eWSc1K1xkA@mail.gmail.com/
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Closes: https://lore.kernel.org/lkml/aH9n_QGMFx2ZbKlw@debian.local/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250722115830.2587297-1-arnd@kernel.org
[ Add Closes: tags, fix minor typo in commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-07-22 14:03:27 +02:00
Marc Kleine-Budde
c1f3f9797c can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode
Andrei Lalaev reported a NULL pointer deref when a CAN device is
restarted from Bus Off and the driver does not implement the struct
can_priv::do_set_mode callback.

There are 2 code path that call struct can_priv::do_set_mode:
- directly by a manual restart from the user space, via
  can_changelink()
- delayed automatic restart after bus off (deactivated by default)

To prevent the NULL pointer deference, refuse a manual restart or
configure the automatic restart delay in can_changelink() and report
the error via extack to user space.

As an additional safety measure let can_restart() return an error if
can_priv::do_set_mode is not set instead of dereferencing it
unchecked.

Reported-by: Andrei Lalaev <andrey.lalaev@gmail.com>
Closes: https://lore.kernel.org/all/20250714175520.307467-1-andrey.lalaev@gmail.com
Fixes: 39549eef35 ("can: CAN Network device driver and Netlink interface")
Link: https://patch.msgid.link/20250718-fix-nullptr-deref-do_set_mode-v1-1-0b520097bb96@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-22 12:55:13 +02:00
Xiang Mei
cf074eca00 net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class
might_sleep could be trigger in the atomic context in qfq_delete_class.

qfq_destroy_class was moved into atomic context locked
by sch_tree_lock to avoid a race condition bug on
qfq_aggregate. However, might_sleep could be triggered by
qfq_destroy_class, which introduced sleeping in atomic context (path:
qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key
->might_sleep).

Considering the race is on the qfq_aggregate objects, keeping
qfq_rm_from_agg in the lock but moving the left part out can solve
this issue.

Fixes: 5e28d5a3f7 ("net/sched: sch_qfq: Fix race condition on qfq_aggregate")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22 11:48:34 +02:00
Erick Karanja
48915162b5 ALSA: usb-audio: qcom: Adjust mutex unlock order
The mutexes qdev_mutex and chip->mutex are acquired in that order
throughout the driver. To preserve proper lock hierarchy and avoid
potential deadlocks, they must be released in the reverse
order of acquisition.

This change reorders the unlock sequence to first release chip->mutex
followed by qdev_mutex, ensuring consistency with the locking pattern.

[ fixed the code indentations and Fixes tag by tiwai ]

Fixes: 326bbc3482 ("ALSA: usb-audio: qcom: Introduce QC USB SND offloading support")
Signed-off-by: Erick Karanja <karanja99erick@gmail.com>
Link: https://patch.msgid.link/20250721114554.1666104-1-karanja99erick@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-07-22 11:39:40 +02:00
Yonghong Song
95993dc303 bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...))
Intel linux test robot reported a warning that ERR_CAST can be used
for error pointer casting instead of more-complicated/rarely-used
ERR_PTR(PTR_ERR(...)) style.

There is no functionality change, but still let us replace two such
instances as it improves consistency and readability.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507201048.bceHy8zX-lkp@intel.com/
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250720164754.3999140-1-yonghong.song@linux.dev
2025-07-21 17:27:09 -07:00
Praveen Kaligineedi
b03f15c019 gve: Fix stuck TX queue for DQ queue format
gve_tx_timeout was calculating missed completions in a way that is only
relevant in the GQ queue format. Additionally, it was attempting to
disable device interrupts, which is not needed in either GQ or DQ queue
formats.

As a result, TX timeouts with the DQ queue format likely would have
triggered early resets without kicking the queue at all.

This patch drops the check for pending work altogether and always kicks
the queue after validating the queue has not seen a TX timeout too
recently.

Cc: stable@vger.kernel.org
Fixes: 87a7f321bb ("gve: Recover from queue stall due to missed IRQ")
Co-developed-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250717192024.1820931-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 17:14:12 -07:00
Kito Xu (veritas501)
6c4a92d07b net: appletalk: Fix use-after-free in AARP proxy probe
The AARP proxy‐probe routine (aarp_proxy_probe_network) sends a probe,
releases the aarp_lock, sleeps, then re-acquires the lock.  During that
window an expire timer thread (__aarp_expire_timer) can remove and
kfree() the same entry, leading to a use-after-free.

race condition:

         cpu 0                          |            cpu 1
    atalk_sendmsg()                     |   atif_proxy_probe_device()
    aarp_send_ddp()                     |   aarp_proxy_probe_network()
    mod_timer()                         |   lock(aarp_lock) // LOCK!!
    timeout around 200ms                |   alloc(aarp_entry)
    and then call                       |   proxies[hash] = aarp_entry
    aarp_expire_timeout()               |   aarp_send_probe()
                                        |   unlock(aarp_lock) // UNLOCK!!
    lock(aarp_lock) // LOCK!!           |   msleep(100);
    __aarp_expire_timer(&proxies[ct])   |
    free(aarp_entry)                    |
    unlock(aarp_lock) // UNLOCK!!       |
                                        |   lock(aarp_lock) // LOCK!!
                                        |   UAF aarp_entry !!

==================================================================
BUG: KASAN: slab-use-after-free in aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493
Read of size 4 at addr ffff8880123aa360 by task repro/13278

CPU: 3 UID: 0 PID: 13278 Comm: repro Not tainted 6.15.2 #3 PREEMPT(full)
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc1/0x630 mm/kasan/report.c:521
 kasan_report+0xca/0x100 mm/kasan/report.c:634
 aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493
 atif_proxy_probe_device net/appletalk/ddp.c:332 [inline]
 atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857
 atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818
 sock_do_ioctl+0xdc/0x260 net/socket.c:1190
 sock_ioctl+0x239/0x6a0 net/socket.c:1311
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl fs/ioctl.c:892 [inline]
 __x64_sys_ioctl+0x194/0x200 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcb/0x250 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

Allocated:
 aarp_alloc net/appletalk/aarp.c:382 [inline]
 aarp_proxy_probe_network+0xd8/0x630 net/appletalk/aarp.c:468
 atif_proxy_probe_device net/appletalk/ddp.c:332 [inline]
 atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857
 atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818

Freed:
 kfree+0x148/0x4d0 mm/slub.c:4841
 __aarp_expire net/appletalk/aarp.c:90 [inline]
 __aarp_expire_timer net/appletalk/aarp.c:261 [inline]
 aarp_expire_timeout+0x480/0x6e0 net/appletalk/aarp.c:317

The buggy address belongs to the object at ffff8880123aa300
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 96 bytes inside of
 freed 192-byte region [ffff8880123aa300, ffff8880123aa3c0)

Memory state around the buggy address:
 ffff8880123aa200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8880123aa280: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880123aa300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff8880123aa380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8880123aa400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
Link: https://patch.msgid.link/20250717012843.880423-1-hxzene@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 16:55:08 -07:00
Florian Fainelli
18ff09c1b9 net: bcmasp: Restore programming of TX map vector register
On ASP versions v2.x we need to program the TX map vector register to
properly exercise end-to-end flow control, otherwise the TX engine can
either lock-up, or cause the hardware calculated checksum to be
wrong/corrupted when multiple back to back packets are being submitted
for transmission. This register defaults to 0, which means no flow
control being applied.

Fixes: e9f31435ee ("net: bcmasp: Add support for asp-v3.0")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250718212242.3447751-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 16:41:36 -07:00
Jakub Kicinski
53b2fb6b05 Merge branch 'selftests-mptcp-connect-cover-alt-modes'
Matthieu Baerts says:

====================
selftests: mptcp: connect: cover alt modes

mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
make sure everything works as expected when using "mmap" and "sendfile"
modes instead of "poll", and with the MPTCP checksum support.

These modes should be validated, but they are not when the selftests are
executed via the kselftest helpers. It means that most CIs validating
these selftests, like NIPA for the net development trees and LKFT for
the stable ones, are not covering these modes.

To fix that, new test programs have been added, simply calling
mptcp_connect.sh with the right parameters.

The first patch can be backported up to v5.6, and the second one up to
v5.14.

v1: https://lore.kernel.org/20250714-net-mptcp-sft-connect-alt-v1-0-bf1c5abbe575@kernel.org
====================

Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-0-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 16:21:33 -07:00
Matthieu Baerts (NGI0)
fdf0f60a2b selftests: mptcp: connect: also cover checksum
The checksum mode has been added a while ago, but it is only validated
when manually launching mptcp_connect.sh with "-C".

The different CIs were then not validating these MPTCP Connect tests
with checksum enabled. To make sure they do, add a new test program
executing mptcp_connect.sh with the checksum mode.

Fixes: 94d66ba1d8 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-2-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 16:21:30 -07:00
Matthieu Baerts (NGI0)
37848a456f selftests: mptcp: connect: also cover alt modes
The "mmap" and "sendfile" alternate modes for mptcp_connect.sh/.c are
available from the beginning, but only tested when mptcp_connect.sh is
manually launched with "-m mmap" or "-m sendfile", not via the
kselftests helpers.

The MPTCP CI was manually running "mptcp_connect.sh -m mmap", but not
"-m sendfile". Plus other CIs, especially the ones validating the stable
releases, were not validating these alternate modes.

To make sure these modes are validated by these CIs, add two new test
programs executing mptcp_connect.sh with the alternate modes.

Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250715-net-mptcp-sft-connect-alt-v2-1-8230ddd82454@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21 16:21:30 -07:00
Jacek Kowalski
61114910a5 e1000e: ignore uninitialized checksum word on tgp
As described by Vitaly Lifshits:

> Starting from Tiger Lake, LAN NVM is locked for writes by SW, so the
> driver cannot perform checksum validation and correction. This means
> that all NVM images must leave the factory with correct checksum and
> checksum valid bit set.

Unfortunately some systems have left the factory with an uninitialized
value of 0xFFFF at register address 0x3F (checksum word location).
So on Tiger Lake platform we ignore the computed checksum when such
condition is encountered.

Signed-off-by: Jacek Kowalski <jacek@jacekk.info>
Tested-by: Vlad URSU <vlad@ursu.me>
Fixes: 4051f68318 ("e1000e: Do not take care about recovery NVM checksum")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21 10:31:09 -07:00
Jacek Kowalski
536fd741c7 e1000e: disregard NVM checksum on tgp when valid checksum bit is not set
As described by Vitaly Lifshits:

> Starting from Tiger Lake, LAN NVM is locked for writes by SW, so the
> driver cannot perform checksum validation and correction. This means
> that all NVM images must leave the factory with correct checksum and
> checksum valid bit set. Since Tiger Lake devices were the first to have
> this lock, some systems in the field did not meet this requirement.
> Therefore, for these transitional devices we skip checksum update and
> verification, if the valid bit is not set.

Signed-off-by: Jacek Kowalski <jacek@jacekk.info>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Fixes: 4051f68318 ("e1000e: Do not take care about recovery NVM checksum")
Cc: stable@vger.kernel.org
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21 10:31:09 -07:00
Haoxiang Li
4ff12d82da ice: Fix a null pointer dereference in ice_copy_and_init_pkg()
Add check for the return value of devm_kmemdup()
to prevent potential null pointer dereference.

Fixes: c764881096 ("ice: Implement Dynamic Device Personalization (DDP) download")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21 10:31:09 -07:00
Jamie Bainbridge
5a0df02999 i40e: When removing VF MAC filters, only check PF-set MAC
When the PF is processing an Admin Queue message to delete a VF's MACs
from the MAC filter, we currently check if the PF set the MAC and if
the VF is trusted.

This results in undesirable behaviour, where if a trusted VF with a
PF-set MAC sets itself down (which sends an AQ message to delete the
VF's MAC filters) then the VF MAC is erased from the interface.

This results in the VF losing its PF-set MAC which should not happen.

There is no need to check for trust at all, because an untrusted VF
cannot change its own MAC. The only check needed is whether the PF set
the MAC. If the PF set the MAC, then don't erase the MAC on link-down.

Resolve this by changing the deletion check only for PF-set MAC.

(the out-of-tree driver has also intentionally removed the check for VF
trust here with OOT driver version 2.26.8, this changes the Linux kernel
driver behaviour and comment to match the OOT driver behaviour)

Fixes: ea2a1cfc3b ("i40e: Fix VF MAC filter removal")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21 10:31:09 -07:00
Dennis Chen
50b2af4515 i40e: report VF tx_dropped with tx_errors instead of tx_discards
Currently the tx_dropped field in VF stats is not updated correctly
when reading stats from the PF. This is because it reads from
i40e_eth_stats.tx_discards which seems to be unused for per VSI stats,
as it is not updated by i40e_update_eth_stats() and the corresponding
register, GLV_TDPC, is not implemented[1].

Use i40e_eth_stats.tx_errors instead, which is actually updated by
i40e_update_eth_stats() by reading from GLV_TEPC.

To test, create a VF and try to send bad packets through it:

$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ cat test.py
from scapy.all import *

vlan_pkt = Ether(dst="ff:ff:ff:ff:ff:ff") / Dot1Q(vlan=999) / IP(dst="192.168.0.1") / ICMP()
ttl_pkt = IP(dst="8.8.8.8", ttl=0) / ICMP()

print("Send packet with bad VLAN tag")
sendp(vlan_pkt, iface="enp2s0f0v0")
print("Send packet with TTL=0")
sendp(ttl_pkt, iface="enp2s0f0v0")
$ ip -s link show dev enp2s0f0
16: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
             0       0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
             0       0      0       0       0       0
    vf 0     link/ether e2:c6:fd:c1:1e:92 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    RX: bytes  packets  mcast   bcast   dropped
             0        0       0       0        0
    TX: bytes  packets   dropped
             0        0        0
$ python test.py
Send packet with bad VLAN tag
.
Sent 1 packets.
Send packet with TTL=0
.
Sent 1 packets.
$ ip -s link show dev enp2s0f0
16: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
             0       0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
             0       0      0       0       0       0
    vf 0     link/ether e2:c6:fd:c1:1e:92 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    RX: bytes  packets  mcast   bcast   dropped
             0        0       0       0        0
    TX: bytes  packets   dropped
             0        0        0

A packet with non-existent VLAN tag and a packet with TTL = 0 are sent,
but tx_dropped is not incremented.

After patch:

$ ip -s link show dev enp2s0f0
19: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
             0       0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
             0       0      0       0       0       0
    vf 0     link/ether 4a:b7:3d:37:f7:56 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    RX: bytes  packets  mcast   bcast   dropped
             0        0       0       0        0
    TX: bytes  packets   dropped
             0        0        2

Fixes: dc645daef9 ("i40e: implement VF stats NDO")
Signed-off-by: Dennis Chen <dechen@redhat.com>
Link: https://www.intel.com/content/www/us/en/content-details/596333/intel-ethernet-controller-x710-tm4-at2-carlsville-datasheet.html
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21 10:31:09 -07:00
Jack Thomson
ab16122115 arm64: kvm, smccc: Fix vendor uuid
Commit 13423063c7 ("arm64: kvm, smccc: Introduce and use API for
getting hypervisor UUID") replaced the explicit register constants
with the UUID_INIT macro. However, there is an endian issue, meaning
the UUID generated and used in the handshake didn't match UUID prior to
the commit.

The change in UUID causes the SMCCC vendor handshake to fail with older
guest kernels, meaning devices such as PTP were not available in the
guest.

This patch updates the parameters to the macro to generate a UUID which
matches the previous value, and re-establish backwards compatibility
with older guest kernels.

Fixes: 13423063c7 ("arm64: kvm, smccc: Introduce and use API for getting hypervisor UUID")
Signed-off-by: Jack Thomson <jackabt@amazon.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20250721130558.50823-1-jackabt.amazon@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-21 18:18:51 +01:00
Stephen Boyd
e4b2a0c2b9 Allwinner clock fixes for 6.16
- Mark A523 MBUS clock as critical
 - Fix names of CSI related clocks on V3s
   This includes changes to the driver, DT bindings and DT files.
 - Fix parents of TCON clock on V3s
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAsFiEE2nN1m/hhnkhOWjtHOJpUIZwPJDAFAmh2hw4OHHdlbnNAY3Np
 ZS5vcmcACgkQOJpUIZwPJDCoNhAApMSWflxAXohu5tYLk82d/87BJAAIodRkJnVU
 B+1fUkqAsT7hKvnP2z3JoLufYXYm+uXrbVR5m3/00QsmXh2E3sKuXbnetxVvnpdQ
 gpGq0yZBV5qFnuLkG7UN9FNovkVzY1dgQ0J1gKG++itMkkW/yHGQhXOo6ZiRHy2R
 /8D/LhZa4gq2K+szQkuFH12o6+dDCFTMfwUiE5eNXI4XLuZailZ/a9tn5Qpym+kw
 k5/svQF8OHXLloyJ1QkIph3uDylFeTILzc3684AUD/b3cZNmppJ8CErKV8Gk8ELC
 0QscfeDc+Evbh+1qFZa5a4JK5Cz0F+GmG37G3XsJtA6IE6BMU75PSTTBYqaoDGoR
 eynxApXmBarPcmWzmQgmk7aE8A3gV05t4N3ZLjLCtjUjfg7yWveMu40MQlzs15Hm
 mWnOcQSETTxxbIbZBr6ZcYhXLsWjrzlQKzIodmNaVuf0TiZhH5DJ55CD6WlMT8T7
 +6fGa0Bez3XLTBLScnj+5ghXOvOHSr71koXFGTcoRhfWwUliTVtzJhfqgnGhSvwI
 Wfq1XVhaGOPpdKIXuBs63I4624J9zslJ5AuN75bP9Wyy0HvnxAx65PavgjOWinmg
 /5rO0Oi85eLpJmzZp9Cyz99lDZ3zqtrxJZf+gE/EOSM/UzNiUR8T0ZCK1UaurWuG
 GJnVxZU=
 =aH5v
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmh+dlMUHHN3Ym95ZEBj
 aHJvbWl1bS5vcmcACgkQrQKIl8bklSX5OA//e4VYLWjx93M+M40Lsc9+xxy/naoh
 hFWRyUSthVcnA/EYAGR9YuO5aBZSniXSMRKuY4bHt0yEOUWjKP0ypvm8ygGZYUPM
 eVlY+ITrggh3MC9isqyptGQ/9gY6BjS3goM6UIogz5oEZQeoo02a7inSKeW4kSd8
 z1j1GekeONLydnXBtSTUamnhDpXjY3eOM1VQeLh3lFdgScKlvQSEGpN8yFnjxLKx
 Df4O41zXvhGz0uEN/+cQ6sGGhLdhjnT7gIUpreFShnSY64Y19RzGtwhvERa2Ozeu
 pAuFIfj/QhSisEKCmRXgWBe0O+zhgs6iKUZZctwjNSeW1/WEEUOEsrP0rAoi/RGl
 IUu1W5m2Q5wzYGWYKnmguzKNDoB24Ej2qf0itVOSzQA9xtjTB40x/y1rB1hCTo/U
 XisbDoh4CEfmBuXsW/YUSNBqxi9KG03ZVbH95+HGjAkj+fzJ6l5Tw5lDSQt/JNEt
 Y2LnIknUn3BxSaq4vhINda5azs0zhDuQ7FCpMsdagz50BW1Ea1uHbIomIlprDNrQ
 khJI1tV+SZDy0q6GdhubgvEZWAtAsp/L4HxmF4dzrcle36RB49FB+Z/y4UxvqD+k
 BIoDKUc+VZCx2Z+gUcsEkWaemjVKIl1CsN2NwVTNMo+VeAFZ17/VHc9sybQ9qfXS
 Sgt8sVjguYl7p2k=
 =Uds7
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-clk-fixes-for-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes

Pull Allwinner clock fixes from Chen-Yu Tsai:

 - Mark A523 MBUS clock as critical
 - Fix names of CSI related clocks on V3s
   This includes changes to the driver, DT bindings and DT files.
 - Fix parents of TCON clock on V3s

* tag 'sunxi-clk-fixes-for-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  clk: sunxi-ng: v3s: Fix TCON clock parents
  clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
  clk: sunxi-ng: v3s: Fix CSI SCLK clock name
  clk: sunxi-ng: a523: Mark MBUS clock as critical
2025-07-21 10:17:51 -07:00
Linus Torvalds
964ebc07c5 platform-drivers-x86 for v6.16-4
Fixes and New HW Support
 
 - alienware-wmi-wmax:
 
   - Add AWCC support for Alienware Area-51m and m15 R5.
 
   - Fix `dmi_system_id` array termination
 
 - arm64: huawei-gaokun-ec: fix OF node leak
 
 - dell-ddv: Fix taking psy->extensions_sem twice
 
 - dell-lis3lv02d: Add Precision 3551 accelerometer support
 
 - firmware_attributes_class: Fix initialization order
 
 - ideapad-laptop: Retain FnLock and kbd backlight across boots
 
 - lenovo-wmi-hotkey: Avoid triggering error -5 due to missing mute LED
 
 - mellanox: mlxbf-pmc: Validate event names and bool input
 
 - power: supply: Add get/set property direct to allow avoiding taking
                  psy->extensions_sem twice from power supply extensions
 
 The following is an automated shortlog grouped by driver:
 
 alieneware-wmi-wmax:
  -  Add AWCC support to more laptops
 
 alienware-wmi-wmax:
  -  Fix `dmi_system_id` array
 
 arm64: huawei-gaokun-ec:
  -  fix OF node leak
 
 dell-ddv:
  -  Fix taking the psy->extensions_sem lock twice
 
 dell-lis3lv02d:
  -  Add Precision 3551
 
 Fix initialization order for firmware_attributes_class:
  - Fix initialization order for firmware_attributes_class
 
 ideapad-laptop:
  -  Fix FnLock not remembered among boots
  -  Fix kbd backlight not remembered among boots
 
 lenovo-wmi-hotkey:
  -  Avoid triggering error -5 due to missing mute LED
 
 MAINTAINERS:
  -  Update entries for IFS and SBL drivers
 
 mlxbf-pmc:
  -  Remove newline char from event name input
  -  Use kstrtobool() to check 0/1 input
  -  Validate event/enable input
 
 power: supply: core:
  -  Add power_supply_get/set_property_direct()
 
 power: supply: test-power:
  -  Test access to extended power supply
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaH40yAAKCRBZrE9hU+XO
 MbSEAP4tNEOahKx4ER4Awg7BvJKY6lfifKvu1yt5K3/aWde9VQD8CRv99ryV3kAy
 EX3Ut1otWsB7UY7+tfV8WeYmXutf/Qs=
 =wHun
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:

 - power supply code:
    - Add get/set property direct to allow avoiding taking
      psy->extensions_sem twice from power supply extensions

 - alienware-wmi-wmax:
    - Add AWCC support for Alienware Area-51m and m15 R5
    - Fix `dmi_system_id` array termination

 - arm64: huawei-gaokun-ec: fix OF node leak

 - dell-ddv: Fix taking psy->extensions_sem twice

 - dell-lis3lv02d: Add Precision 3551 accelerometer support

 - firmware_attributes_class: Fix initialization order

 - ideapad-laptop: Retain FnLock and kbd backlight across boots

 - lenovo-wmi-hotkey: Avoid triggering error -5 due to missing mute LED

 - mellanox: mlxbf-pmc: Validate event names and bool input

* tag 'platform-drivers-x86-v6.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  MAINTAINERS: Update entries for IFS and SBL drivers
  platform/x86: dell-lis3lv02d: Add Precision 3551
  platform/x86: alieneware-wmi-wmax: Add AWCC support to more laptops
  platform/x86: Fix initialization order for firmware_attributes_class
  platform: arm64: huawei-gaokun-ec: fix OF node leak
  lenovo-wmi-hotkey: Avoid triggering error -5 due to missing mute LED
  platform/x86: ideapad-laptop: Fix kbd backlight not remembered among boots
  platform/x86: ideapad-laptop: Fix FnLock not remembered among boots
  platform/mellanox: mlxbf-pmc: Use kstrtobool() to check 0/1 input
  platform/mellanox: mlxbf-pmc: Validate event/enable input
  platform/mellanox: mlxbf-pmc: Remove newline char from event name input
  platform/x86: dell-ddv: Fix taking the psy->extensions_sem lock twice
  power: supply: test-power: Test access to extended power supply
  power: supply: core: Add power_supply_get/set_property_direct()
  platform/x86: alienware-wmi-wmax: Fix `dmi_system_id` array
2025-07-21 08:32:36 -07:00
Shuming Fan
9e55f11926
ASoC: SDCA: correct the calculation of the maximum init table size
One initial setting is 5 bytes, so num_init_writes should divide by 5.

Signed-off-by: Shuming Fan <shumingf@realtek.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20250721112334.388506-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-21 16:04:47 +01:00
Ville Syrjälä
9e0c433d0c
drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
On g4x we currently use the 96MHz non-SSC refclk, which can't actually
generate an exact 2.7 Gbps link rate. In practice we end up with 2.688
Gbps which seems to be close enough to actually work, but link training
is currently failing due to miscalculating the DP_LINK_BW value (we
calcualte it directly from port_clock which reflects the actual PLL
outpout frequency).

Ideas how to fix this:
- nudge port_clock back up to 270000 during PLL computation/readout
- track port_clock and the nominal link rate separately so they might
  differ a bit
- switch to the 100MHz refclk, but that one should be SSC so perhaps
  not something we want

While we ponder about a better solution apply some band aid to the
immediate issue of miscalculated DP_LINK_BW value. With this
I can again use 2.7 Gbps link rate on g4x.

Cc: stable@vger.kernel.org
Fixes: 665a7b0409 ("drm/i915: Feed the DPLL output freq back into crtc_state")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710201718.25310-2-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit a8b874694db5cae7baaf522756f87acd956e6e66)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-21 09:44:53 -04:00
Derek Fang
d312962188
ASoC: rt5650: Eliminate the high frequency glitch
The glitch was detected in the high frequency of the HP playback.
This patch adjusts the DAC dither setting to avoid this situation
for almost all cases.

Signed-off-by: Derek Fang <derek.fang@realtek.com>
Link: https://patch.msgid.link/20250721034728.1396238-1-derek.fang@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-21 13:26:52 +01:00
Ranjani Sridharan
0503ac474a
ASoC: SOF: Intel: PTL: Add the sdw_process_wakeen op
Add the missing op in the device description to avoid issues with jack
detection.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20250721063039.2234279-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-21 13:26:51 +01:00
Jithu Joseph
e2967b50b7
MAINTAINERS: Update entries for IFS and SBL drivers
Update the MAINTAINERS file to reflect the following changes for two Intel
platform drivers:

- Tony has agreed to take over maintainership of the Intel In-Field Scan
  (IFS) driver, and is now listed as the new maintainer.
- Remove myself as the maintainer for the Slim BootLoader (SBL) firmware
  update driver and mark it as Orphan. To the best of my knowledge, there
  is no one familiar with SBL who can take over this role.

These changes are being made as I will soon be leaving Intel.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250714164643.3879784-1-jithu.joseph@intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-21 14:38:19 +03:00
Ben Skeggs
491254fff9 drm/nouveau/nvif: fix null ptr deref on pre-fermi boards
Check that gpfifo.post() exists before trying to call it.

Fixes: 862450a85b ("drm/nouveau/gf100-: track chan progress with non-WFI semaphore release")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Closes: https://lore.kernel.org/lkml/aElJIo9_Se6tAR1a@audible.transient.net/
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Closes: https://lore.kernel.org/all/CALjTZvZgH0N43rMTcZiDVSX93PFL680hsYPwtp8=Ja1OWPvZ1A@mail.gmail.com/
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Link: https://lore.kernel.org/r/20250714025923.29591-1-bskeggs@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-07-21 11:16:11 +02:00
Dawid Rezler
9744ede709 ALSA: hda/realtek - Add mute LED support for HP Pavilion 15-eg0xxx
The mute LED on the HP Pavilion Laptop 15-eg0xxx,
which uses the ALC287 codec, didn't work.
This patch fixes the issue by enabling the ALC287_FIXUP_HP_GPIO_LED quirk.

Tested on a physical device, the LED now works as intended.

Signed-off-by: Dawid Rezler <dawidrezler.patches@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20250720154907.80815-2-dawidrezler.patches@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-07-21 09:23:07 +02:00
Nilay Shroff
1966554b2e block: fix module reference leak in mq-deadline I/O scheduler
During probe, when the block layer registers a request queue, it
defaults to the mq-deadline I/O scheduler if the device is single-queue
and the mq-deadline module is available. To determine availability, the
elevator_set_default() invokes elevator_find_get(), which increments the
module's reference count. However, this reference is never released,
resulting in a module reference leak that prevents the mq-deadline module
from being unloaded.

This patch fixes the issue by ensuring the acquired module reference is
properly released.

Fixes: 1e44bedbc9 ("block: unifying elevator change")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250719132722.769536-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-20 13:18:13 -06:00
Kent Overstreet
eef9170700 bcachefs: btree_node_scan: don't re-read before initializing found_btree_node
If the btree node is encrypted, this caused us to initialize
found_btree_node from the encrypted header.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-20 13:26:08 -04:00
Marco Elver
6ade153349 kasan: use vmalloc_dump_obj() for vmalloc error reports
Since 6ee9b3d847 ("kasan: remove kasan_find_vm_area() to prevent
possible deadlock"), more detailed info about the vmalloc mapping and the
origin was dropped due to potential deadlocks.

While fixing the deadlock is necessary, that patch was too quick in
killing an otherwise useful feature, and did no due-diligence in
understanding if an alternative option is available.

Restore printing more helpful vmalloc allocation info in KASAN reports
with the help of vmalloc_dump_obj().  Example report:

| BUG: KASAN: vmalloc-out-of-bounds in vmalloc_oob+0x4c9/0x610
| Read of size 1 at addr ffffc900002fd7f3 by task kunit_try_catch/493
|
| CPU: [...]
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0xa8/0xf0
|  print_report+0x17e/0x810
|  kasan_report+0x155/0x190
|  vmalloc_oob+0x4c9/0x610
|  [...]
|
| The buggy address belongs to a 1-page vmalloc region starting at 0xffffc900002fd000 allocated at vmalloc_oob+0x36/0x610
| The buggy address belongs to the physical page:
| page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x126364
| flags: 0x200000000000000(node=0|zone=2)
| raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
| raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
| page dumped because: kasan: bad access detected
|
| [..]

Link: https://lkml.kernel.org/r/20250716152448.3877201-1-elver@google.com
Fixes: 6ee9b3d847 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock")
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Uladzislau Rezki <urezki@gmail.com>
Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Yunseong Kim <ysk@kzalloc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:17 -07:00
Nathan Chancellor
153ad56672 mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
After a recent change in clang to expose uninitialized warnings from const
variables [1], there is a false positive warning from the if statement in
advisor_mode_show().

  mm/ksm.c:3687:11: error: variable 'output' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
   3687 |         else if (ksm_advisor == KSM_ADVISOR_SCAN_TIME)
        |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/ksm.c:3690:33: note: uninitialized use occurs here
   3690 |         return sysfs_emit(buf, "%s\n", output);
        |                                        ^~~~~~

Rewrite the if statement to implicitly make KSM_ADVISOR_NONE the else
branch so that it is obvious to the compiler that ksm_advisor can only be
KSM_ADVISOR_NONE or KSM_ADVISOR_SCAN_TIME due to the assignments in
advisor_mode_store().

Link: https://lkml.kernel.org/r/20250715-ksm-fix-clang-21-uninit-warning-v1-1-f443feb4bfc4@kernel.org
Fixes: 66790e9a73 ("mm/ksm: add sysfs knobs for advisor")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2100
Link: 2464313eef [1]
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:17 -07:00
Jason Gunthorpe
d367a177e2 mm: update MAINTAINERS entry for HMM
Jérôme has moved on from RH and has not been looking at HMM patches for
some time. I've made the most changes to the core code in the recent
period and Leon is now working on the HMM side from the RDMA ODP.

Link: https://lkml.kernel.org/r/0-v1-a1df5219c7a3+1d981-hmm_maintainers_jgg@nvidia.com
Closes: https://lore.kernel.org/all/39d43309-9f34-48bc-a9ad-108c607ba175@samsung.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:16 -07:00
Ryusuke Konishi
4aead50caf nilfs2: reject invalid file types when reading inodes
To prevent inodes with invalid file types from tripping through the vfs
and causing malfunctions or assertion failures, add a missing sanity check
when reading an inode from a block device.  If the file type is not valid,
treat it as a filesystem error.

Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com
Fixes: 05fe58fdc1 ("nilfs2: inode operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:16 -07:00
Zi Yan
7563fcbfd4 selftests/mm: fix split_huge_page_test for folio_split() tests
PID_FMT does not have an offset field, so folio_split() tests are not
performed.  Add PID_FMT_OFFSET with an offset field and use it to perform
folio_split() tests.

Link: https://lkml.kernel.org/r/20250709012800.3225727-1-ziy@nvidia.com
Fixes: 80a5c494c8 ("selftests/mm: add tests for folio_split(), buddy allocator like split")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Tested-by : Donet Tom <donettom@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:16 -07:00
Sergey Senozhatsky
091a08b002 mailmap: add entry for Senozhatsky
Consolidate and map all addresses to a single one.

Link: https://lkml.kernel.org/r/20250707075243.858895-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:16 -07:00
Harry Yoo
694d6b9992 mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
Commit 48b4800a1c ("zsmalloc: page migration support") added support for
migrating zsmalloc pages using the movable_operations migration framework.
However, the commit did not take into account that zsmalloc supports
migration only when CONFIG_COMPACTION is enabled.  Tracing shows that
zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is
not supported.

This can result in unmovable pages being allocated from movable page
blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area.

Possible user visible effects:
- Some ZONE_MOVABLE memory can be not actually movable
- CMA allocation can fail because of this
- Increased memory fragmentation due to ignoring the page mobility
  grouping feature  
I'm not really sure who uses kernels without compaction support, though :(


To fix this, clear the __GFP_MOVABLE flag when
!IS_ENABLED(CONFIG_COMPACTION).

Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:15 -07:00
Jinjiang Tu
9f1e8cd0b7 mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
In shrink_folio_list(), the hwpoisoned folio may be large folio, which
can't be handled by unmap_poisoned_folio().  For THP, try_to_unmap_one()
must be passed with TTU_SPLIT_HUGE_PMD to split huge PMD first and then
retry.  Without TTU_SPLIT_HUGE_PMD, we will trigger null-ptr deref of
pvmw.pte.  Even we passed TTU_SPLIT_HUGE_PMD, we will trigger a
WARN_ON_ONCE due to the page isn't in swapcache.

Since UCE is rare in real world, and race with reclaimation is more rare,
just skipping the hwpoisoned large folio is enough.  memory_failure() will
handle it if the UCE is triggered again.

This happens when memory reclaim for large folio races with
memory_failure(), and will lead to kernel panic.  The race is as
follows:

cpu0      cpu1
 shrink_folio_list memory_failure
  TestSetPageHWPoison
  unmap_poisoned_folio
  --> trigger BUG_ON due to
  unmap_poisoned_folio couldn't
   handle large folio

[tujinjiang@huawei.com: add comment to unmap_poisoned_folio()]
  Link: https://lkml.kernel.org/r/69fd4e00-1b13-d5f7-1c82-705c7d977ea4@huawei.com
Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Fixes: 1b0449544c ("mm/vmscan: don't try to reclaim hwpoison folio")
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:15 -07:00
Jakub Kicinski
81e0db8e83 Merge branch 'mlx5-misc-fixes-2025-07-17'
Tariq Toukan says:

====================
mlx5 misc fixes 2025-07-17

This small patchset provides misc bug fixes from the team to the mlx5
driver.
====================

Link: https://patch.msgid.link/1752753970-261832-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:33:05 -07:00
Shahar Shitrit
5b4c56ad4d net/mlx5: E-Switch, Fix peer miss rules to use peer eswitch
In the original design, it is assumed local and peer eswitches have the
same number of vfs. However, in new firmware, local and peer eswitches
can have different number of vfs configured by mlxconfig.  In such
configuration, it is incorrect to derive the number of vfs from the
local device's eswitch.

Fix this by updating the peer miss rules add and delete functions to use
the peer device's eswitch and vf count instead of the local device's
information, ensuring correct behavior regardless of vf configuration
differences.

Fixes: ac004b8321 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752753970-261832-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:32:58 -07:00
Chiara Meiohas
3afa3ae3db net/mlx5: Fix memory leak in cmd_exec()
If cmd_exec() is called with callback and mlx5_cmd_invoke() returns an
error, resources allocated in cmd_exec() will not be freed.

Fix the code to release the resources if mlx5_cmd_invoke() returns an
error.

Fixes: f086470122 ("net/mlx5: cmdif, Return value improvements")
Reported-by: Alex Tereshkin <atereshkin@nvidia.com>
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752753970-261832-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:32:58 -07:00
Himanshu Mittal
6e86fb73de net: ti: icssg-prueth: Fix buffer allocation for ICSSG
Fixes overlapping buffer allocation for ICSSG peripheral
used for storing packets to be received/transmitted.
There are 3 buffers:
1. Buffer for Locally Injected Packets
2. Buffer for Forwarding Packets
3. Buffer for Host Egress Packets

In existing allocation buffers for 2. and 3. are overlapping causing
packet corruption.

Packet corruption observations:
During tcp iperf testing, due to overlapping buffers the received ack
packet overwrites the packet to be transmitted. So, we see packets on
wire with the ack packet content inside the content of next TCP packet
from sender device.

Details for AM64x switch mode:
-> Allocation by existing driver:
+---------+-------------------------------------------------------------+
|         |          SLICE 0             |          SLICE 1             |
|         +------+--------------+--------+------+--------------+--------+
|         | Slot | Base Address | Size   | Slot | Base Address | Size   |
|---------+------+--------------+--------+------+--------------+--------+
|         | 0    | 70000000     | 0x2000 | 0    | 70010000     | 0x2000 |
|         | 1    | 70002000     | 0x2000 | 1    | 70012000     | 0x2000 |
|         | 2    | 70004000     | 0x2000 | 2    | 70014000     | 0x2000 |
| FWD     | 3    | 70006000     | 0x2000 | 3    | 70016000     | 0x2000 |
| Buffers | 4    | 70008000     | 0x2000 | 4    | 70018000     | 0x2000 |
|         | 5    | 7000A000     | 0x2000 | 5    | 7001A000     | 0x2000 |
|         | 6    | 7000C000     | 0x2000 | 6    | 7001C000     | 0x2000 |
|         | 7    | 7000E000     | 0x2000 | 7    | 7001E000     | 0x2000 |
+---------+------+--------------+--------+------+--------------+--------+
|         | 8    | 70020000     | 0x1000 | 8    | 70028000     | 0x1000 |
|         | 9    | 70021000     | 0x1000 | 9    | 70029000     | 0x1000 |
|         | 10   | 70022000     | 0x1000 | 10   | 7002A000     | 0x1000 |
| Our     | 11   | 70023000     | 0x1000 | 11   | 7002B000     | 0x1000 |
| LI      | 12   | 00000000     | 0x0    | 12   | 00000000     | 0x0    |
| Buffers | 13   | 00000000     | 0x0    | 13   | 00000000     | 0x0    |
|         | 14   | 00000000     | 0x0    | 14   | 00000000     | 0x0    |
|         | 15   | 00000000     | 0x0    | 15   | 00000000     | 0x0    |
+---------+------+--------------+--------+------+--------------+--------+
|         | 16   | 70024000     | 0x1000 | 16   | 7002C000     | 0x1000 |
|         | 17   | 70025000     | 0x1000 | 17   | 7002D000     | 0x1000 |
|         | 18   | 70026000     | 0x1000 | 18   | 7002E000     | 0x1000 |
| Their   | 19   | 70027000     | 0x1000 | 19   | 7002F000     | 0x1000 |
| LI      | 20   | 00000000     | 0x0    | 20   | 00000000     | 0x0    |
| Buffers | 21   | 00000000     | 0x0    | 21   | 00000000     | 0x0    |
|         | 22   | 00000000     | 0x0    | 22   | 00000000     | 0x0    |
|         | 23   | 00000000     | 0x0    | 23   | 00000000     | 0x0    |
+---------+------+--------------+--------+------+--------------+--------+
--> here 16, 17, 18, 19 overlapping with below express buffer

+-----+-----------------------------------------------+
|     |       SLICE 0       |        SLICE 1          |
|     +------------+----------+------------+----------+
|     | Start addr | End addr | Start addr | End addr |
+-----+------------+----------+------------+----------+
| EXP | 70024000   | 70028000 | 7002C000   | 70030000 | <-- Overlapping
| PRE | 70030000   | 70033800 | 70034000   | 70037800 |
+-----+------------+----------+------------+----------+

+---------------------+----------+----------+
|                     | SLICE 0  |  SLICE 1 |
+---------------------+----------+----------+
| Default Drop Offset | 00000000 | 00000000 |     <-- Field not configured
+---------------------+----------+----------+

-> Allocation this patch brings:
+---------+-------------------------------------------------------------+
|         |          SLICE 0             |          SLICE 1             |
|         +------+--------------+--------+------+--------------+--------+
|         | Slot | Base Address | Size   | Slot | Base Address | Size   |
|---------+------+--------------+--------+------+--------------+--------+
|         | 0    | 70000000     | 0x2000 | 0    | 70040000     | 0x2000 |
|         | 1    | 70002000     | 0x2000 | 1    | 70042000     | 0x2000 |
|         | 2    | 70004000     | 0x2000 | 2    | 70044000     | 0x2000 |
| FWD     | 3    | 70006000     | 0x2000 | 3    | 70046000     | 0x2000 |
| Buffers | 4    | 70008000     | 0x2000 | 4    | 70048000     | 0x2000 |
|         | 5    | 7000A000     | 0x2000 | 5    | 7004A000     | 0x2000 |
|         | 6    | 7000C000     | 0x2000 | 6    | 7004C000     | 0x2000 |
|         | 7    | 7000E000     | 0x2000 | 7    | 7004E000     | 0x2000 |
+---------+------+--------------+--------+------+--------------+--------+
|         | 8    | 70010000     | 0x1000 | 8    | 70050000     | 0x1000 |
|         | 9    | 70011000     | 0x1000 | 9    | 70051000     | 0x1000 |
|         | 10   | 70012000     | 0x1000 | 10   | 70052000     | 0x1000 |
| Our     | 11   | 70013000     | 0x1000 | 11   | 70053000     | 0x1000 |
| LI      | 12   | 00000000     | 0x0    | 12   | 00000000     | 0x0    |
| Buffers | 13   | 00000000     | 0x0    | 13   | 00000000     | 0x0    |
|         | 14   | 00000000     | 0x0    | 14   | 00000000     | 0x0    |
|         | 15   | 00000000     | 0x0    | 15   | 00000000     | 0x0    |
+---------+------+--------------+--------+------+--------------+--------+
|         | 16   | 70014000     | 0x1000 | 16   | 70054000     | 0x1000 |
|         | 17   | 70015000     | 0x1000 | 17   | 70055000     | 0x1000 |
|         | 18   | 70016000     | 0x1000 | 18   | 70056000     | 0x1000 |
| Their   | 19   | 70017000     | 0x1000 | 19   | 70057000     | 0x1000 |
| LI      | 20   | 00000000     | 0x0    | 20   | 00000000     | 0x0    |
| Buffers | 21   | 00000000     | 0x0    | 21   | 00000000     | 0x0    |
|         | 22   | 00000000     | 0x0    | 22   | 00000000     | 0x0    |
|         | 23   | 00000000     | 0x0    | 23   | 00000000     | 0x0    |
+---------+------+--------------+--------+------+--------------+--------+

+-----+-----------------------------------------------+
|     |       SLICE 0       |        SLICE 1          |
|     +------------+----------+------------+----------+
|     | Start addr | End addr | Start addr | End addr |
+-----+------------+----------+------------+----------+
| EXP | 70018000   | 7001C000 | 70058000   | 7005C000 |
| PRE | 7001C000   | 7001F800 | 7005C000   | 7005F800 |
+-----+------------+----------+------------+----------+

+---------------------+----------+----------+
|                     | SLICE 0  |  SLICE 1 |
+---------------------+----------+----------+
| Default Drop Offset | 7001F800 | 7005F800 |
+---------------------+----------+----------+

Rootcause: missing buffer configuration for Express frames in
function: prueth_fw_offload_buffer_setup()

Details:
Driver implements two distinct buffer configuration functions that are
invoked based on the driver state and ICSSG firmware:-
- prueth_fw_offload_buffer_setup()
- prueth_emac_buffer_setup()

During initialization, driver creates standard network interfaces
(netdevs) and configures buffers via prueth_emac_buffer_setup().
This function properly allocates and configures all required memory
regions including:
- LI buffers
- Express packet buffers
- Preemptible packet buffers

However, when the driver transitions to an offload mode (switch/HSR/PRP),
buffer reconfiguration is handled by prueth_fw_offload_buffer_setup().
This function does not reconfigure the buffer regions required for
Express packets, leading to incorrect buffer allocation.

Fixes: abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Signed-off-by: Himanshu Mittal <h-mittal1@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717094220.546388-1-h-mittal1@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:20:45 -07:00
Ma Ke
96e056ffba dpaa2-switch: Fix device reference count leak in MAC endpoint handling
The fsl_mc_get_endpoint() function uses device_find_child() for
localization, which implicitly calls get_device() to increment the
device's reference count before returning the pointer. However, the
caller dpaa2_switch_port_connect_mac() fails to properly release this
reference in multiple scenarios. We should call put_device() to
decrement reference count properly.

As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.

Found by code review.

Cc: stable@vger.kernel.org
Fixes: 84cba72956 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717022309.3339976-3-make24@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:13:50 -07:00
Ma Ke
ee9f3a81ab dpaa2-eth: Fix device reference count leak in MAC endpoint handling
The fsl_mc_get_endpoint() function uses device_find_child() for
localization, which implicitly calls get_device() to increment the
device's reference count before returning the pointer. However, the
caller dpaa2_eth_connect_mac() fails to properly release this
reference in multiple scenarios. We should call put_device() to
decrement reference count properly.

As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.

Found by code review.

Cc: stable@vger.kernel.org
Fixes: 7194792308 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717022309.3339976-2-make24@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:13:50 -07:00
Ma Ke
bddbe13d36 bus: fsl-mc: Fix potential double device reference in fsl_mc_get_endpoint()
The fsl_mc_get_endpoint() function may call fsl_mc_device_lookup()
twice, which would increment the device's reference count twice if
both lookups find a device. This could lead to a reference count leak.

Found by code review.

Cc: stable@vger.kernel.org
Fixes: 1ac210d128 ("bus: fsl-mc: add the fsl_mc_get_endpoint function")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 8567494ceb ("bus: fsl-mc: rescan devices if endpoint not found")
Link: https://patch.msgid.link/20250717022309.3339976-1-make24@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:13:50 -07:00
Eduard Zingerman
42be23e8f2 libbpf: Verify that arena map exists when adding arena relocations
Fuzzer reported a memory access error in bpf_program__record_reloc()
that happens when:
- ".addr_space.1" section exists
- there is a relocation referencing this section
- there are no arena maps defined in BTF.

Sanity checks for maps existence are already present in
bpf_program__record_reloc(), hence this commit adds another one.

[1] https://github.com/libbpf/libbpf/actions/runs/16375110681/job/46272998064

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250718222059.281526-1-eddyz87@gmail.com
2025-07-18 17:12:50 -07:00
Alexei Starovoitov
beb1097ec8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18 12:15:59 -07:00
Matteo Croce
0ee30d937c libbpf: Fix warning in calloc() usage
When compiling libbpf with some compilers, this warning is triggered:

libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c:9209:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
 9209 |         gen = calloc(sizeof(*gen), 1);
      |                            ^
libbpf.c:9209:28: note: earlier argument should specify number of elements, later size of each element

Fix this by inverting the calloc() arguments.

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250717200337.49168-1-technoboy85@gmail.com
2025-07-18 08:29:50 -07:00
Edip Hazuri
21c8ed9047 ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa0xxx
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.

Tested on my Victus 15-fa0xxx Laptop. The LED behaviour works
as intended.

Cc: <stable@vger.kernel.org>
Signed-off-by: Edip Hazuri <edip@medip.dev>
Link: https://patch.msgid.link/20250717212625.366026-2-edip@medip.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-07-18 14:23:07 +02:00
Guoqing Jiang
6bea85979d
ASoC: mediatek: mt8365-dai-i2s: pass correct size to mt8365_dai_set_priv
Given mt8365_dai_set_priv allocate priv_size space to copy priv_data which
means we should pass mt8365_i2s_priv[i] or "struct mtk_afe_i2s_priv"
instead of afe_priv which has the size of "struct mt8365_afe_private".

Otherwise the KASAN complains about.

[   59.389765] BUG: KASAN: global-out-of-bounds in mt8365_dai_set_priv+0xc8/0x168 [snd_soc_mt8365_pcm]
...
[   59.394789] Call trace:
[   59.395167]  dump_backtrace+0xa0/0x128
[   59.395733]  show_stack+0x20/0x38
[   59.396238]  dump_stack_lvl+0xe8/0x148
[   59.396806]  print_report+0x37c/0x5e0
[   59.397358]  kasan_report+0xac/0xf8
[   59.397885]  kasan_check_range+0xe8/0x190
[   59.398485]  asan_memcpy+0x3c/0x98
[   59.399022]  mt8365_dai_set_priv+0xc8/0x168 [snd_soc_mt8365_pcm]
[   59.399928]  mt8365_dai_i2s_register+0x1e8/0x2b0 [snd_soc_mt8365_pcm]
[   59.400893]  mt8365_afe_pcm_dev_probe+0x4d0/0xdf0 [snd_soc_mt8365_pcm]
[   59.401873]  platform_probe+0xcc/0x228
[   59.402442]  really_probe+0x340/0x9e8
[   59.402992]  driver_probe_device+0x16c/0x3f8
[   59.403638]  driver_probe_device+0x64/0x1d8
[   59.404256]  driver_attach+0x1dc/0x4c8
[   59.404840]  bus_for_each_dev+0x100/0x190
[   59.405442]  driver_attach+0x44/0x68
[   59.405980]  bus_add_driver+0x23c/0x500
[   59.406550]  driver_register+0xf8/0x3d0
[   59.407122]  platform_driver_register+0x68/0x98
[   59.407810]  mt8365_afe_pcm_driver_init+0x2c/0xff8 [snd_soc_mt8365_pcm]

Fixes: 402bbb13a1 ("ASoC: mediatek: mt8365: Add I2S DAI support")
Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250710011806.134507-1-guoqing.jiang@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-18 12:28:21 +01:00
Arnd Bergmann
816309d500 Allwinner fixes for 6.16
Only one fix:
 
 Correct the name of the A523's EMAC0 to GMAC0, as seen in the SoC's
 datasheets. The matching DT binding change is in the net tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAsFiEE2nN1m/hhnkhOWjtHOJpUIZwPJDAFAmh2iT4OHHdlbnNAY3Np
 ZS5vcmcACgkQOJpUIZwPJDCMJBAAuauzK0XFSSwuzxlkd2b4fb9iY8V8U+h4T6E0
 T83snFr3IDZxRU3z9+6EKhIAyh/yHpu8wv27jMNyb6UAbafzhqYaCENXdIrI7lDX
 CzgDCJJWPTJ9L/Iq5EY+BXSBNaHQz1PLrryjdLJeNSzxXWP4k62oPCE09HioYIvK
 hMEUIyH9jdTK6h4jfp0J0YHYfDZPr4ix33Ua33ToPnTYeaMeowaWDZ2I/IXyhmG7
 q+XRzdNrpcMuKQH5hpSTDYI1TjonWa1jovpYH9uktuzb7CHHz2C9f3wxz9X2xpui
 nxa44UoqKF0XfP3Luur9cfXHS6zq5HqKRaXkCP9EiXoIpocjo5rCHqvIabyC+6QU
 ACRrbp/9rCJ/XZm5MRvXD3+JlcF+SFJ3vYhZgk1iLXFFEPGgfUezf0s6hAUv5xGc
 OnxmF2ztxxXVAKCkRFG7JVbyHzQmKojX9lZUVWQP1dRZCfaQ1dqAN6oCO1NuVvLc
 mPVgftt3N2XDaM/XCp054RFBu0Dmjm6oIUNJy2Jc59ED5FF+Erit+xTqYfv9iZGu
 PfyBk43lPhFr+6twsYsaivoh45px7aCRAuKnwY4AizIjr01ZeVjc1snauiUS7pzo
 jtdIESYuaRnmTE1tiY0keAB5y4OH/jxB61jJ4IwIUIEGzySrmG8HchizVnd0UYoA
 eJsh/Jw=
 =j8oj
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmh56IgACgkQmmx57+YA
 GNkLmBAAhe2cre+Xw4DJu7jTKIcHJM5T7Ug93Jfa8rJ17vg6ksk+/PDwTATS5EmS
 trcw4H0ch23ZI/KQ0Lkf2fVAUwZkhbk16xlHLPzhfx8I6hn5VmhQc0JV1lsGgxu6
 SJSMwMsmDJH2BKkQUQHkx3vYtv4MF9FduNWLj2UgJUFaZH6QbeTmnTr+ksVfZJ7M
 F0sLlHBGYJybHk4yJsV6zdvuLIy1lZ5EU7NAD6ZDCl9uzp79LgzLKEEoc2ip9B0g
 NxvBDyc71MqqqSnEoCVPBQi3N6HFXVWSwGnzyhkmKmG57oiKyudagMHga4rIN3Iw
 DBkXYBp1LduiA6v64lKtXjCcXmidFi/hQDlll9eyOzwRjUBdHrNZe0Lt3sdrHKdb
 l12SCgEmHG0BNDP1YOOQSnrA1DyLcoK0Y45YkzDHjUkSpZY1uxeC6RBUiKMCCrxS
 shSZLSfQlK0IcZ130eIHENkwqm75T/Kje0vnJN0PdqechYgQFLn9z4nT0P/oYUhk
 +TrLUjukL4H9CGR3sFpeQmpobIQcBU50G7xG9K9kgLk+BawIcrbPcABJ/WkgkWnF
 GXCaIDqdiYn2ihZ4xwQGUHt2ft6T59YnvcARKcJak3Vqic+ASnNGsjvpCKV3c6FK
 PF2nJJHyFAPBSBCzVjDd5m70Of+M6gJj0197a7KMYhBHaVahoTU=
 =5gpa
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-fixes-for-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes

Allwinner fixes for 6.16

Only one fix:

Correct the name of the A523's EMAC0 to GMAC0, as seen in the SoC's
datasheets. The matching DT binding change is in the net tree.

* tag 'sunxi-fixes-for-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: a523: Rename emac0 to gmac0

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-07-18 08:24:07 +02:00
Thomas Zimmermann
1918e79be9 Revert "drm/gem-dma: Use dma_buf from GEM object instance"
This reverts commit e8afa1557f.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

v3:
- cc stable

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: <stable@vger.kernel.org> # v6.15+
Link: https://lore.kernel.org/r/20250715155934.150656-8-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
6d496e9569 Revert "drm/gem-shmem: Use dma_buf from GEM object instance"
This reverts commit 1a148af060.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

v3:
- cc stable

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: <stable@vger.kernel.org> # v6.15+
Link: https://lore.kernel.org/r/20250715155934.150656-7-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
2712ca878b Revert "drm/gem-framebuffer: Use dma_buf from GEM object instance"
This reverts commit cce16fcd74.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

v3:
- cc stable

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: <stable@vger.kernel.org> # v6.15+
Link: https://lore.kernel.org/r/20250715155934.150656-6-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
fb4ef4a52b Revert "drm/prime: Use dma_buf from GEM object instance"
This reverts commit f83a9b8c7f.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

v3:
- cc stable

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: <stable@vger.kernel.org> # v6.15+
Link: https://lore.kernel.org/r/20250715155934.150656-5-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
bb7f4972a6 Revert "drm/etnaviv: Use dma_buf from GEM object instance"
This reverts commit e91eb3ae41.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250715155934.150656-4-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
1e9d2aed7c Revert "drm/vmwgfx: Use dma_buf from GEM object instance"
This reverts commit aec8a40228.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250715155934.150656-3-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Thomas Zimmermann
0ecfb8ddb9 Revert "drm/virtio: Use dma_buf from GEM object instance"
This reverts commit 415cb45895.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/r/20250715155934.150656-3-tzimmermann@suse.de
2025-07-17 12:26:19 +02:00
Lin.Cao
15f77764e9 drm/sched: Remove optimization that causes hang when killing dependent jobs
When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.

When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.

Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.

Fixes: 777dbd458c ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
2025-07-17 12:06:07 +02:00
Alexei Starovoitov
0768e980fe Merge branch 'a-tool-to-verify-the-bpf-memory-model'
Puranjay Mohan says:

====================
A tool to verify the BPF memory model

I am building a tool called blitmus[1] that converts memory model litmus
tests written in C into BPF programs that run in parallel to verify that the
JITs are enforcing the memory model correctly.

With this tool I was able to find a bug in the implementation of the smp_mb()
in the selftests.

Using the following litmus test:

C SB+fencembonceonces

(*
 * Result: Never
 *
 * This litmus test demonstrates that full memory barriers suffice to
 * order the store-buffering pattern, where each process writes to the
 * variable that the preceding process reads.  (Locking and RCU can also
 * suffice, but not much else.)
 *)

{}

P0(int *x, int *y)
{
        int r0;

        WRITE_ONCE(*x, 1);
        smp_mb();
        r0 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
        int r0;

        WRITE_ONCE(*y, 1);
        smp_mb();
        r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)

Running the generated program on an ARMv8 machine:

With the current implementation of smp_mb():

    [root@fedora blitmus]# ./sb_fencembonceonces
    Starting litmus test with configuration:
      Test: SB+fencembonceonces
      Iterations: 4100

    Test SB+fencembonceonces Allowed
    Histogram (4 states)
    4545     *>0:r0=0; 1:r0=0;
    20403742 :>0:r0=0; 1:r0=1;
    20591700 :>0:r0=1; 1:r0=0;
    13       :>0:r0=1; 1:r0=1;
    Ok

    Witnesses
    Positive: 4545, Negative: 40995455
    Condition exists (0:r0=0 /\ 1:r0=0) is validated
    Observation SB+fencembonceonces Sometimes 4545 40995455
    Time SB+fencembonceonces 8.33

    Thu Jul 10 16:56:41 UTC

Positive witnesses mean that smp_mb() is not working as
expected and not providing any ordering.

After applying the patch to fix smp_mb():

    [root@fedora blitmus]# ./sb_fencembonceonces
    Starting litmus test with configuration:
      Test: SB+fencembonceonces
      Iterations: 4100

    Test SB+fencembonceonces Allowed
    Histogram (3 states)
    19657569 :>0:r0=0; 1:r0=1;
    20227574 :>0:r0=1; 1:r0=0;
    1114857  :>0:r0=1; 1:r0=1;
    No

    Witnesses
    Positive: 0, Negative: 41000000
    Condition exists (0:r0=0 /\ 1:r0=0) is NOT validated
    Observation SB+fencembonceonces Never 0 41000000
    Time SB+fencembonceonces 9.58

    Thu Jul 10 16:56:10 UTC

0 positive witnesses mean that invalid behaviour is not seen and smp_mb()
is ordering the operations properly.

I hope to improve this tool more and use it to fuzz the JITs of ARMv8,
RISC-V, and Power and see what other bugs can be exposed.

[1] https://github.com/puranjaymohan/blitmus
====================

Link: https://patch.msgid.link/20250710175434.18829-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:53 -07:00
Puranjay Mohan
0769857a07 selftests/bpf: fix implementation of smp_mb()
As BPF doesn't include any barrier instructions, smp_mb() is implemented
by doing a dummy value returning atomic operation. Such an operation
acts a full barrier as enforced by LKMM and also by the work in progress
BPF memory model.

If the returned value is not used, clang[1] can optimize the value
returning atomic instruction in to a normal atomic instruction which
provides no ordering guarantees.

Mark the variable as volatile so the above optimization is never
performed and smp_mb() works as expected.

[1] https://godbolt.org/z/qzze7bG6z

Fixes: 88d706ba7c ("selftests/bpf: Introduce arena spin lock")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250710175434.18829-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:52 -07:00
Tao Chen
fd60aa0a45 bpf/selftests: Add selftests for token info
A previous change added bpf_token_info to get token info with
bpf_get_obj_info_by_fd, this patch adds a new test for token info.

 #461/12  token/bpf_token_info:OK

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250716134654.1162635-2-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:05 -07:00
Tao Chen
19d18fdfc7 bpf: Add struct bpf_token_info
The 'commit 35f96de041 ("bpf: Introduce BPF token object")' added
BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD
already used to get BPF object info, so we can also get token info with
this cmd.
One usage scenario, when program runs failed with token, because of
the permission failure, we can report what BPF token is allowing with
this API for debugging.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:05 -07:00
Andrii Nakryiko
8080500cba libbpf: start v1.7 dev cycle
With libbpf 1.6.0 released, adjust libbpf.map and libbpf_version.h to
start v1.7 development cycles.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250716175936.2343013-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:37:27 -07:00
Feng Yang
62ef449b8d bpf: Clean up individual BTF_ID code
Use BTF_ID_LIST_SINGLE(a, b, c) instead of
BTF_ID_LIST(a)
BTF_ID(b, c)

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Link: https://lore.kernel.org/r/20250710055419.70544-1-yangfeng59949@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:34:42 -07:00
Ilya Leoshkevich
1f489662fb bpf: Update iterators.lskel-big-endian.h
The last iterators update (commit 515ee52b22 ("bpf: make preloaded
map iterators to display map elements count")) missed the big-endian
skeleton. Update it by running "make big" with Debian clang version
21.0.0 (++20250706105601+01c97b4953e8-1~exp1~20250706225612.1558).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250710100907.45880-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:29:18 -07:00
Alexei Starovoitov
13630f9042 Merge branch 'bpf-arm64-relax-constraint-in-bpf-jit-compiler'
Alexis Lothoré (eBPF Foundation) says:

====================
this series follows up on the one introducing 9+ args for tracing
programs [1]. It has been observed with this series that there are cases
for which we can not identify accurately the location of the target
function arguments to prepare correctly the corresponding BPF
trampoline. This is the case for example if:
- the function consumes a struct variable _by value_
- it is passed on the stack (no more register available for it)
- it has some __packed__ or __aligned(X)__ attribute

As a consequence, a small restrictive check has been added to the ARM64
side, highlighting that other arch supporting 9+ args in BPF trampolines
are already suffering from the same issue.  After a bit of discussions
and attempts, the chosen solution is, rather than applying the same
constraint to all JIT compilers, to prevent such function from being
encoded at all in BTF info([2]). As the pahole side is closed to be
integrated, we can now remove the restrictive check from kernel side.

[1] https://lore.kernel.org/bpf/20250527-many_args_arm64-v3-0-3faf7bb8e4a2@bootlin.com/
[2] https://lore.kernel.org/bpf/20250707-btf_skip_structs_on_stack-v3-0-29569e086c12@bootlin.com/

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (2):
      bpf, arm64: remove structs on stack constraint
      selftests/bpf: enable tracing_struct tests for arm64

 arch/arm64/net/bpf_jit_comp.c                | 5 -----
 tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
 2 files changed, 6 deletions(-)
---
base-commit: 8da1e37fc84868b50ba6a7cdf082aa3b0d11e006
change-id: 20250708-arm64_relax_jit_comp-e8889647d8d2

Best regards,
====================

Link: https://patch.msgid.link/20250709-arm64_relax_jit_comp-v1-0-3850fe189092@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:28:43 -07:00
Alexis Lothoré (eBPF Foundation)
4a760d2d7a selftests/bpf: enable tracing_struct tests for arm64
Now that the constraint preventing attachment to functions consuming
struct on stack has been removed from the kernel (and moved to pahole,
with a slightly smarter detection, to prevent only those that are
packed), re-enable the tracing_struct tests for arm64.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20250709-arm64_relax_jit_comp-v1-2-3850fe189092@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:28:30 -07:00
Alexis Lothoré (eBPF Foundation)
dc704d0cfa bpf, arm64: remove structs on stack constraint
While introducing support for 9+ arguments for tracing programs on
ARM64, commit 9014cf56f1 ("bpf, arm64: Support up to 12 function
arguments") has also introduced a constraint preventing BPF trampolines
from being generated if the target function consumes a struct argument
passed on stack, because of uncertainties around the exact struct
location: if the struct has been marked as packed or with a custom
alignment, this info is not reflected in BTF data, and so generated
tracing trampolines could read the target function arguments at wrong
offsets.

This issue is not specific to ARM64: there has been an attempt (see [1])
to bring the same constraint to other architectures JIT compilers. But
discussions following this attempt led to the move of this constraint
out of the kernel (see [2]): instead of preventing the kernel from
generating trampolines for those functions consuming structs on stack,
it is simpler to just make sure that those functions with uncertain
struct arguments location are not encoded in BTF information, and so
that one can not even attempt to attach a tracing program to such
function. The task is then deferred to pahole (see [3]).

Now that the constraint is handled by pahole, remove it from the arm64
JIT compiler to keep it simple.

[1] https://lore.kernel.org/bpf/20250613-deny_trampoline_structs_on_stack-v1-0-5be9211768c3@bootlin.com/
[2] https://lore.kernel.org/bpf/CAADnVQ+sj9XhscN9PdmTzjVa7Eif21noAUH3y1K6x5bWcL-5pg@mail.gmail.com/
[3] https://lore.kernel.org/bpf/20250707-btf_skip_structs_on_stack-v3-0-29569e086c12@bootlin.com/

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20250709-arm64_relax_jit_comp-v1-1-3850fe189092@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:28:30 -07:00
Arunpravin Paneer Selvam
95a16160ca drm/amdgpu: Reset the clear flag in buddy during resume
- Added a handler in DRM buddy manager to reset the cleared
  flag for the blocks in the freelist.

- This is necessary because, upon resuming, the VRAM becomes
  cluttered with BIOS data, yet the VRAM backend manager
  believes that everything has been cleared.

v2:
  - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld)
  - Force merge the two dirty blocks.(Matthew Auld)
  - Add a new unit test case for this issue.(Matthew Auld)
  - Having this function being able to flip the state either way would be
    good. (Matthew Brost)

v3(Matthew Auld):
  - Do merge step first to avoid the use of extra reset flag.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com
2025-07-16 12:50:32 +02:00
Yonghong Song
e860a98c8a selftests/bpf: Fix build error due to certain uninitialized variables
With the latest llvm21 compiler, I hit several errors when building bpf
selftests. Some of errors look like below:

  test_maps.c:565:40: error: variable 'val' is uninitialized when passed as a
      const pointer argument here [-Werror,-Wuninitialized-const-pointer]
    565 |         assert(bpf_map_update_elem(fd, NULL, &val, 0) < 0 &&
        |                                               ^~~

  prog_tests/bpf_iter.c:400:25: error: variable 'c' is uninitialized when passed
      as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
  400 |         write(finish_pipe[1], &c, 1);
      |                                ^

Some other errors have similar the pattern as the above.

These errors are fixed by initializing those variables properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250715185910.3659447-1-yonghong.song@linux.dev
2025-07-15 14:38:58 -07:00
Steffen Klassert
28712d6ed3 Merge branch 'ipsec: fix splat due to ipcomp fallback tunnel'
Sabrina Dubroca says:

====================
IPcomp tunnel states have an associated fallback tunnel, a keep a
reference on the corresponding xfrm_state, to allow deleting that
extra state when it's not needed anymore. These states cause issues
during netns deletion.

Commit f75a2804da ("xfrm: destroy xfrm_state synchronously on net
exit path") tried to address these problems but doesn't fully solve
them, and slowed down netns deletion by adding one synchronize_rcu per
deleted state.

The first patch solves the problem by moving the fallback state
deletion earlier (when we delete the user state, rather than at
destruction), then we can revert the previous fix.
====================

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-14 08:59:48 +02:00
Andrii Nakryiko
ea2aecdf7a Merge branch 'move-attach_type-into-bpf_link'
Tao Chen says:

====================
Move attach_type into bpf_link

Andrii suggested moving the attach_type into bpf_link, the previous discussion
is as follows:
https://lore.kernel.org/bpf/CAEf4BzY7TZRjxpCJM-+LYgEqe23YFj5Uv3isb7gat2-HU4OSng@mail.gmail.com

patch1 add attach_type in bpf_link, and pass it to bpf_link_init, which
will init the attach_type field.

patch2-7 remove the attach_type in struct bpf_xx_link, update the info
with bpf_link attach_type.

There are some functions finally call bpf_link_init but do not have bpf_attr
from user or do not need to init attach_type from user like bpf_raw_tracepoint_open,
now use prog->expected_attach_type to init attach_type.

bpf_struct_ops_map_update_elem
bpf_raw_tracepoint_open
bpf_struct_ops_test_run

Feedback of any kind is welcome, thanks.
====================

Link: https://patch.msgid.link/20250710032038.888700-1-chen.dylane@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-11 11:01:09 -07:00
Tao Chen
601a3956fe netkit: Remove location field in netkit_link
Use attach_type in bpf_link to replace the location field, and
remove location field in netkit_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-8-chen.dylane@linux.dev
2025-07-11 11:01:09 -07:00
Tao Chen
0eeeebdcc5 bpf: Remove attach_type in bpf_tracing_link
Use attach_type in bpf_link, and remove it in bpf_tracing_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-7-chen.dylane@linux.dev
2025-07-11 11:01:08 -07:00
Tao Chen
2a76a80c7f bpf: Remove attach_type in bpf_netns_link
Use attach_type in bpf_link, and remove it in bpf_netns_link.
And move netns_type field to the end to fill the byte hole.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev
2025-07-11 11:01:04 -07:00
Tao Chen
6e816e1c05 bpf: Remove location field in tcx_link
Use attach_type in bpf_link to replace the location filed, and
remove location field in tcx_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250710032038.888700-5-chen.dylane@linux.dev
2025-07-11 11:00:57 -07:00
Tao Chen
33f69f7365 bpf: Remove attach_type in sockmap_link
Use attach_type in bpf_link, and remove it in sockmap_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-4-chen.dylane@linux.dev
2025-07-11 10:51:55 -07:00
Tao Chen
9b8d543dc2 bpf: Remove attach_type in bpf_cgroup_link
Use attach_type in bpf_link, and remove it in bpf_cgroup_link.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-3-chen.dylane@linux.dev
2025-07-11 10:51:55 -07:00
Tao Chen
b725441f02 bpf: Add attach_type field to bpf_link
Attach_type will be set when a link is created by user. It is better to
record attach_type in bpf_link generically and have it available
universally for all link types. So add the attach_type field in bpf_link
and move the sleepable field to avoid unnecessary gap padding.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev
2025-07-11 10:51:55 -07:00
Paul Chaignon
d81526a6eb selftests/bpf: Range analysis test case for JSET
This patch adds coverage for the warning detected by syzkaller and fixed
in the previous patch. Without the previous patch, this test fails with:

  verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds
  violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0]
  var_off=(0x0, 0x0)(1)

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/c7893be1170fdbcf64e0200c110cdbd360ce7086.1752171365.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:45:25 -07:00
Paul Chaignon
6279846b9b bpf: Forget ranges when refining tnum after JSET
Syzbot reported a kernel warning due to a range invariant violation on
the following BPF program.

  0: call bpf_get_netns_cookie
  1: if r0 == 0 goto <exit>
  2: if r0 & Oxffffffff goto <exit>

The issue is on the path where we fall through both jumps.

That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fallthrough insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. The verifier then
refines the register bounds using the second condition and we end
up with inconsistent bounds on this unreachable path:

  1: if r0 == 0 goto <exit>
    r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
  2: if r0 & 0xffffffff goto <exit>
    r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
    r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)

Improving the range refinement for JSET to cover all cases is tricky. We
also don't expect many users to rely on JSET given LLVM doesn't generate
those instructions. So instead of improving the range refinement for
JSETs, Eduard suggested we forget the ranges whenever we're narrowing
tnums after a JSET. This patch implements that approach.

Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:45:25 -07:00
Alexei Starovoitov
2b1fd82cba Merge branch 'bpf-arena-add-kfunc-for-reserving-arena-memory'
Emil Tsalapatis says:

====================
bpf/arena: Add kfunc for reserving arena memory

Add a new kfunc for BPF arenas that reserves a region of the mapping
to prevent it from being mapped. These regions serve as guards against
out-of-bounds accesses and are useful for debugging arena-related code.

>From v3 (20250709015712.97099-1-emil@etsalapatis.com)
------------------------------------------------------

- Added Acked-by tags by Yonghong.
- Replace hardcoded error numbers in selftests (Yonghong).
- Fixed selftest for partially freeing a reserved region (Yonghong).

>From v2 (20250702003351.197234-1-emil@etsalapatis.com)
------------------------------------------------------

- Removed -EALREADY and replaced with -EINVAL to bring error handling in
  line with the rest of the BPF code (Alexei).

>From v1 (20250620031118.245601-1-emil@etsalapatis.com)
------------------------------------------------------

- Removed the additional guard range tree. Adjusted tests accordingly.
  Reserved regions now behave like allocated regions, and can be
  unreserved using bpf_arena_free_pages(). They can also be allocated
  from userspace through minor faults. It is up to the user to prevent
  erroneous frees and/or use the BPF_F_SEGV_ON_FAULT flag to catch
  stray userspace accesses (Alexei).
- Changed terminology from guard pages to reserved pages (Alexei,
  Kartikeya).

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================

Link: https://patch.msgid.link/20250709191312.29840-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:44:02 -07:00
Emil Tsalapatis
9f9559f0ac selftests/bpf: add selftests for bpf_arena_reserve_pages
Add selftests for the new bpf_arena_reserve_pages kfunc.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250709191312.29840-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:43:54 -07:00
Emil Tsalapatis
8fc3d2d8b5 bpf/arena: add bpf_arena_reserve_pages kfunc
Add a new BPF arena kfunc for reserving a range of arena virtual
addresses without backing them with pages. This prevents the range from
being populated using bpf_arena_alloc_pages().

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:43:54 -07:00
Jan-Niklas Burfeind
aef9da3338
platform/x86: dell-lis3lv02d: Add Precision 3551
This marks 0x29 as accelerometer address on Dell Precision 3551.

I followed previous works of Paul Menzel and Hans de Goede to verify it:

$ cd /sys/bus/pci/drivers/i801_smbus/0000\:00\:1f.4

$ ls -d i2c-?
i2c-0

$ sudo modprobe i2c-dev

$ sudo i2cdetect 0
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- 29 -- -- -- -- -- --
30: 30 -- -- -- -- 35 UU UU -- -- -- -- -- -- -- --
40: -- -- -- -- 44 -- -- -- -- -- -- -- -- -- -- --
50: UU -- 52 -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

$ echo lis3lv02d 0x29 > sudo tee /sys/bus/i2c/devices/i2c-0/new_device
lis3lv02d 0x29

$ sudo dmesg
[    0.000000] Linux version 6.12.28 (nixbld@localhost) (gcc (GCC) 14.2.1 20250322, GNU ld (GNU Binutils) 2.44) #1-NixOS SMP PREEMPT_DYNAMIC Fri May  9 07:50:53 UTC 2025
[...]
[    0.000000] DMI: Dell Inc. Precision 3551/07YHW8, BIOS 1.18.0 10/03/2022
[...]
[ 3749.077624] lis3lv02d_i2c 0-0029: supply Vdd not found, using dummy regulator
[ 3749.077732] lis3lv02d_i2c 0-0029: supply Vdd_IO not found, using dummy regulator
[ 3749.098674] lis3lv02d: 8 bits 3DC sensor found
[ 3749.182480] input: ST LIS3LV02DL Accelerometer as /devices/platform/lis3lv02d/input/input28
[ 3749.182899] i2c i2c-0: new_device: Instantiated device lis3lv02d at 0x29

Signed-off-by: Jan-Niklas Burfeind <kernel@aiyionpri.me>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20250710190919.37842-1-kernel@aiyionpri.me
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-11 19:00:15 +03:00
Kurt Borja
dbfb567f4a
platform/x86: alieneware-wmi-wmax: Add AWCC support to more laptops
Add support to Alienware Area-51m and Alienware m15 R5.

Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250710-m15_r5-v1-1-2c6ad44e5987@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-11 18:59:55 +03:00
Torsten Hilbrich
2bfe3ae1aa
platform/x86: Fix initialization order for firmware_attributes_class
The think-lmi driver uses the firwmare_attributes_class. But this class
is registered after think-lmi, causing the "think-lmi" directory in
"/sys/class/firmware-attributes" to be missing when the driver is
compiled as builtin.

Fixes: 5592240380 ("platform/x86: think-lmi: Directly use firmware_attributes_class")
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Link: https://lore.kernel.org/r/7dce5f7f-c348-4350-ac53-d14a8e1e8034@secunet.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-11 18:52:54 +03:00
Johan Hovold
bc48d79a18
platform: arm64: huawei-gaokun-ec: fix OF node leak
Make sure to drop the OF node reference taken when creating the Gaokun
auxiliary devices when the devices are later released.

Fixes: 7636f090d0 ("platform: arm64: add Huawei Matebook E Go EC driver")
Cc: Pengyu Luo <mitltlatltl@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Pengyu Luo <mitltlatltl@gmail.com>
Link: https://lore.kernel.org/r/20250708085358.15657-1-johan@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-11 18:48:39 +03:00
Jackie Dong
c86f7bb92f
lenovo-wmi-hotkey: Avoid triggering error -5 due to missing mute LED
Not all of Lenovo non-ThinkPad devices support both mic mute LED (on F4)
and audio mute LED (on F1). Some of them only support one mute LED, some
of them don't have any mute LEDs. If any of the mute LEDs is missing,
the driver reports error -5.

Check if the device supports a mute LED or not. Do not trigger error -5
message from missing a mute LED if it is not supported on the device.

Signed-off-by: Jackie Dong <xy-jackie@139.com>
Suggested-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250709035716.36267-1-xy-jackie@139.com
[ij: major edits to the changelog.]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-09 11:49:20 +03:00
Eduard Zingerman
ad97cb2ed0 selftests/bpf: Remove enum64 case from __arg_untrusted test suite
The enum64 type used by verifier_global_ptr_args test case requires
CONFIG_SCHED_CLASS_EXT. At the moment selftets do not depend on this
option. There are just a few enum64 types in the kernel. Instead of
tying selftests to implementation details of unrelated sub-systems,
just remove enum64 test case. Simple enums are covered and that should
be sufficient.

Fixes: 68cca81fd5 ("selftests/bpf: tests for __arg_untrusted void * global func params")
Reported-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20250708220856.3059578-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-08 18:57:47 -07:00
Chen-Yu Tsai
a46b4822be arm64: dts: allwinner: a523: Rename emac0 to gmac0
The datasheets refer to the first Ethernet controller as GMAC0, not
EMAC0.

Fix the compatible string, node label and pinmux function name to match
what the datasheets use. A change to the device tree binding is sent
separately.

Fixes: 56766ca6c4 ("arm64: dts: allwinner: a523: Add EMAC0 ethernet MAC")
Fixes: acca163f3f ("arm64: dts: allwinner: a527: add EMAC0 to Radxa A5E board")
Fixes: c6800f1599 ("arm64: dts: allwinner: t527: add EMAC0 to Avaota-A1 board")
Link: https://patch.msgid.link/20250628054438.2864220-3-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-07-09 00:19:44 +08:00
Sabrina Dubroca
2a198bbec6 Revert "xfrm: destroy xfrm_state synchronously on net exit path"
This reverts commit f75a2804da.

With all states (whether user or kern) removed from the hashtables
during deletion, there's no need for synchronous destruction of
states. xfrm6_tunnel states still need to have been destroyed (which
will be the case when its last user is deleted (not destroyed)) so
that xfrm6_tunnel_free_spi removes it from the per-netns hashtable
before the netns is destroyed.

This has the benefit of skipping one synchronize_rcu per state (in
__xfrm_state_destroy(sync=true)) when we exit a netns.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-08 13:28:29 +02:00
Sabrina Dubroca
b441cf3f8c xfrm: delete x->tunnel as we delete x
The ipcomp fallback tunnels currently get deleted (from the various
lists and hashtables) as the last user state that needed that fallback
is destroyed (not deleted). If a reference to that user state still
exists, the fallback state will remain on the hashtables/lists,
triggering the WARN in xfrm_state_fini. Because of those remaining
references, the fix in commit f75a2804da ("xfrm: destroy xfrm_state
synchronously on net exit path") is not complete.

We recently fixed one such situation in TCP due to defered freeing of
skbs (commit 9b6412e697 ("tcp: drop secpath at the same time as we
currently drop dst")). This can also happen due to IP reassembly: skbs
with a secpath remain on the reassembly queue until netns
destruction. If we can't guarantee that the queues are flushed by the
time xfrm_state_fini runs, there may still be references to a (user)
xfrm_state, preventing the timely deletion of the corresponding
fallback state.

Instead of chasing each instance of skbs holding a secpath one by one,
this patch fixes the issue directly within xfrm, by deleting the
fallback state as soon as the last user state depending on it has been
deleted. Destruction will still happen when the final reference is
dropped.

A separate lockdep class for the fallback state is required since
we're going to lock x->tunnel while x is locked.

Fixes: 9d4139c769 ("netns xfrm: per-netns xfrm_state_all list")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-08 13:28:27 +02:00
Rong Zhang
e10981075a
platform/x86: ideapad-laptop: Fix kbd backlight not remembered among boots
On some models supported by ideapad-laptop, the HW/FW can remember the
state of keyboard backlight among boots. However, it is always turned
off while shutting down, as a side effect of the LED class device
unregistering sequence.

This is inconvenient for users who always prefer turning on the
keyboard backlight. Thus, set LED_RETAIN_AT_SHUTDOWN on the LED class
device so that the state of keyboard backlight gets remembered, which
also aligns with the behavior of manufacturer utilities on Windows.

Fixes: 503325f84b ("platform/x86: ideapad-laptop: add keyboard backlight control support")
Cc: stable@vger.kernel.org
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250707163808.155876-3-i@rong.moe
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-08 13:23:29 +03:00
Rong Zhang
9533b789df
platform/x86: ideapad-laptop: Fix FnLock not remembered among boots
On devices supported by ideapad-laptop, the HW/FW can remember the
FnLock state among boots. However, since the introduction of the FnLock
LED class device, it is turned off while shutting down, as a side effect
of the LED class device unregistering sequence.

Many users always turn on FnLock because they use function keys much
more frequently than multimedia keys. The behavior change is
inconvenient for them. Thus, set LED_RETAIN_AT_SHUTDOWN on the LED class
device so that the FnLock state gets remembered, which also aligns with
the behavior of manufacturer utilities on Windows.

Fixes: 07f48f668f ("platform/x86: ideapad-laptop: add FnLock LED class device")
Cc: stable@vger.kernel.org
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250707163808.155876-2-i@rong.moe
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-08 13:23:26 +03:00
Tao Chen
3413bc0cf1 bpf: Clean code with bpf_copy_to_user()
No logic change, use bpf_copy_to_user() to clean code.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250703163700.677628-1-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:53:59 -07:00
Paul Chaignon
192e3aa145 selftests/bpf: Negative test case for tail call map
This patch adds a negative test case for the following verifier error.

    expected prog array map for tail call

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/aGu0i1X_jII-3aFa@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:46:13 -07:00
Alexei Starovoitov
df4b1eebd8 Merge branch 'bpf-fix-and-test-aux-usage-after-do_check_insn'
Luis Gerhorst says:

====================
bpf: Fix and test aux usage after do_check_insn()

Fix cur_aux()->nospec_result test after do_check_insn() referring to the
to-be-analyzed (potentially unsafe) instruction, not the
already-analyzed (safe) instruction. This might allow a unsafe insn to
slip through on a speculative path. Create some tests from the
reproducer [1].

Commit d6f1c85f22 ("bpf: Fall back to nospec for Spectre v1") should
not be in any stable kernel yet, therefore bpf-next should suffice.

[1] https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/

Changes since v2:
- Use insn_aux variable instead of introducing prev_aux() as suggested
  by Eduard (and therefore also drop patch 1)
- v2: https://lore.kernel.org/bpf/20250628145016.784256-1-luis.gerhorst@fau.de/

Changes since v1:
- Fix compiler error due to missed rename of prev_insn_idx in first
  patch
- v1: https://lore.kernel.org/bpf/20250628125927.763088-1-luis.gerhorst@fau.de/

Changes since RFC:
- Introduce prev_aux() as suggested by Alexei. For this, we must move
  the env->prev_insn_idx assignment to happen directly after
  do_check_insn(), for which I have created a separate commit. This
  patch could be simplified by using a local prev_aux variable as
  sugested by Eduard, but I figured one might find the new
  assignment-strategy easier to understand (before, prev_insn_idx and
  env->prev_insn_idx were out-of-sync for the latter part of the loop).
  Also, like this we do not have an additional prev_* variable that must
  be kept in-sync and the local variable's usage (old prev_insn_idx, new
  tmp) is much more local. If you think it would be better to not take
  the risk and keep the fix simple by just introducing the prev_aux
  variable, let me know.
- Change WARN_ON_ONCE() to verifier_bug_if() as suggested by Alexei
- Change assertion to check instruction is BPF_JMP[32] as suggested by
  Eduard
- RFC: https://lore.kernel.org/bpf/8734bmoemx.fsf@fau.de/
====================

Link: https://patch.msgid.link/20250705190908.1756862-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:32:35 -07:00
Luis Gerhorst
92974cef83 selftests/bpf: Add Spectre v4 tests
Add the following tests:

1. A test with an (unimportant) ldimm64 (16 byte insn) and a
   Spectre-v4--induced nospec that clarifies and serves as a basic
   Spectre v4 test.

2. Make sure a Spectre v4 nospec_result does not prevent a Spectre v1
   nospec from being added before the dangerous instruction (tests that
   [1] is fixed).

3. Combine the two, which is the combination that triggers the warning
   in [2]. This is because the unanalyzed stack write has nospec_result
   set, but the ldimm64 (which was just analyzed) had incremented
   insn_idx by 2. That violates the assertion that nospec_result is only
   used after insns that increment insn_idx by 1 (i.e., stack writes).

[1] https://lore.kernel.org/bpf/4266fd5de04092aa4971cbef14f1b4b96961f432.camel@gmail.com/
[2] https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Link: https://lore.kernel.org/r/20250705190908.1756862-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:32:34 -07:00
Luis Gerhorst
dadb59104c bpf: Fix aux usage after do_check_insn()
We must terminate the speculative analysis if the just-analyzed insn had
nospec_result set. Using cur_aux() here is wrong because insn_idx might
have been incremented by do_check_insn(). Therefore, introduce and use
insn_aux variable.

Also change cur_aux(env)->nospec in case do_check_insn() ever manages to
increment insn_idx but still fail.

Change the warning to check the insn class (which prevents it from
triggering for ldimm64, for which nospec_result would not be
problematic) and use verifier_bug_if().

In line with Eduard's suggestion, do not introduce prev_aux() because
that requires one to understand that after do_check_insn() call what was
current became previous. This would at-least require a comment.

Fixes: d6f1c85f22 ("bpf: Fall back to nospec for Spectre v1")
Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: syzbot+dc27c5fb8388e38d2d37@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/
Link: https://lore.kernel.org/bpf/4266fd5de04092aa4971cbef14f1b4b96961f432.camel@gmail.com/
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250705190908.1756862-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:32:34 -07:00
Saket Kumar Bhaskar
0f626c98fd selftests/bpf: Set CONFIG_PACKET=y for selftests
BPF selftest fails to build with below error:

  CLNG-BPF [test_progs] lsm_cgroup.bpf.o
progs/lsm_cgroup.c:105:21: error: variable has incomplete type 'struct sockaddr_ll'
  105 |         struct sockaddr_ll sa = {};
      |                            ^
progs/lsm_cgroup.c:105:9: note: forward declaration of 'struct sockaddr_ll'
  105 |         struct sockaddr_ll sa = {};
      |                ^
1 error generated.

lsm_cgroup selftest requires sockaddr_ll structure which is not there
in vmlinux.h when the kernel is built with CONFIG_PACKET=m.

Enabling CONFIG_PACKET=y ensures that sockaddr_ll is available in vmlinux,
allowing it to be captured in the generated vmlinux.h for bpf selftests.

Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250707071735.705137-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:31:57 -07:00
Alexei Starovoitov
0074250c35 Merge branch 'bpf-streams-fixes'
Kumar Kartikeya Dwivedi says:

====================
BPF Streams - Fixes

This set contains some fixes for recently reported issues for BPF
streams. Please check individual patches for details.
====================

Link: https://patch.msgid.link/20250705053035.3020320-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:30:15 -07:00
Kumar Kartikeya Dwivedi
bfa2bb9abd bpf: Fix improper int-to-ptr cast in dump_stack_cb
On 32-bit platforms, we'll try to convert a u64 directly to a pointer
type which is 32-bit, which causes the compiler to complain about cast
from an integer of a different size to a pointer type. Cast to long
before casting to the pointer type to match the pointer width.

Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d7c431cafc ("bpf: Add dump_stack() analogue to print to BPF stderr")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250705053035.3020320-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:30:15 -07:00
Kumar Kartikeya Dwivedi
116c8f4747 bpf: Fix bounds for bpf_prog_get_file_line linfo loop
We may overrun the bounds because linfo and jited_linfo are already
advanced to prog->aux->linfo_idx, hence we must only iterate the
remaining elements until we reach prog->aux->nr_linfo. Adjust the
nr_linfo calculation to fix this. Reported in [0].

  [0]: https://lore.kernel.org/bpf/f3527af3b0620ce36e299e97e7532d2555018de2.camel@gmail.com

Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Fixes: 0e521efaf3 ("bpf: Add function to extract program source info")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250705053035.3020320-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:30:15 -07:00
Alexei Starovoitov
6e5cae9dda Merge branch 'bpf-additional-use-cases-for-untrusted-ptr_to_mem'
Eduard Zingerman says:

====================
bpf: additional use-cases for untrusted PTR_TO_MEM

This patch set introduces two usability enhancements leveraging
untrusted pointers to mem:
- When reading a pointer field from a PTR_TO_BTF_ID, the resulting
  value is now assumed to be PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED
  instead of SCALAR_VALUE, provided the pointer points to a primitive
  type.
- __arg_untrusted attribute for global function parameters,
  allowed for pointer arguments of both structural and primitive
  types:
  - For structural types, the attribute produces
    PTR_TO_BTF_ID|PTR_UNTRUSTED.
  - For primitive types, it yields
    PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED.

Here are examples enabled by the series:

  struct foo {
    int *arr;
  };
  ...
  p = bpf_core_cast(..., struct foo);
  bpf_for(i, 0, ...) {
    ... p->arr[i] ...       // load at any offset is allowed
  }

  int memcmp(void *a __arg_untrusted, void *b __arg_untrusted, size_t n) {
    bpf_for(i, 0, n)
      if (a[i] - b[i])      // load at any offset is allowed
        return ...;
    return 0;
  }

The patch-set was inspired by Anrii's series [1]. The goal of that
series was to define a generic global glob_match function, capable to
accept any pointer type:

  __weak int glob_match(const char *pat, const char *str);

  char filename_glob[32];

  void foo(...) {
    ...
    task = bpf_get_current_task_btf();
    filename = task->mm->exe_file->f_path.dentry->d_name.name;
    ... match_glob(filename_glob,  // pointer to map value
                   filename) ...   // scalar
  }

At the moment, there is no straightforward way to express such a
function. This patch-set makes it possible to define it as follows:

  __weak int glob_match(const char *pat __arg_untrusted,
                        const char *str __arg_untrusted);

[1] https://github.com/anakryiko/linux/tree/bpf-mem-cast

Changelog:
v1: https://lore.kernel.org/bpf/20250702224209.3300396-1-eddyz87@gmail.com/
v1 -> v2:
- Added safety check in btf_prepare_func_args() to ensure that only
  struct or primitive types could be combined with __arg_untrusted
  (Alexei).
- Removed unnecessary 'continue' btf_check_func_arg_match() (Alexei).
- Added test cases for __arg_untrusted pointers to enum and
  __arg_untrusted combined with non-kernel type (Kumar).
- Added acks from Kumar.
====================

Link: https://patch.msgid.link/20250704230354.1323244-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:08 -07:00
Eduard Zingerman
68cca81fd5 selftests/bpf: tests for __arg_untrusted void * global func params
Check usage of __arg_untrusted parameters of primitive type:
- passing of {trusted, untrusted, map value, scalar value, values with
  variable offset} to untrusted `void *`, `char *` or enum is ok;
- varifier represents such parameters as rdonly_untrusted_mem(sz=0).

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
c4aa454c64 bpf: support for void/primitive __arg_untrusted global func params
Allow specifying __arg_untrusted for void */char */int */long *
parameters. Treat such parameters as
PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED of size zero.
Intended usage is as follows:

  int memcmp(char *a __arg_untrusted, char *b __arg_untrusted, size_t n) {
    bpf_for(i, 0, n) {
      if (a[i] - b[i])      // load at any offset is allowed
        return a[i] - b[i];
    }
    return 0;
  }

Allocate register id for ARG_PTR_TO_MEM parameters only when
PTR_MAYBE_NULL is set. Register id for PTR_TO_MEM is used only to
propagate non-null status after conditionals.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
54ac2c9418 selftests/bpf: test cases for __arg_untrusted
Check usage of __arg_untrusted parameters with PTR_TO_BTF_ID:
- combining __arg_untrusted with other tags is forbidden;
- non-kernel (program local) types for __arg_untrusted are forbidden;
- passing of {trusted, untrusted, map value, scalar value, values with
  variable offset} to untrusted is ok;
- passing of PTR_TO_BTF_ID with a different type to untrusted is ok;
- passing of untrusted to trusted is forbidden.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
aaa0e57e69 libbpf: __arg_untrusted in bpf_helpers.h
Make btf_decl_tag("arg:untrusted") available for libbpf users via
macro. Makes the following usage possible:

  void foo(struct bar *p __arg_untrusted) { ... }
  void bar(struct foo *p __arg_trusted) {
    ...
    foo(p->buz->bar); // buz derefrence looses __trusted
    ...
  }

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
182f7df704 bpf: attribute __arg_untrusted for global function parameters
Add support for PTR_TO_BTF_ID | PTR_UNTRUSTED global function
parameters. Anything is allowed to pass to such parameters, as these
are read-only and probe read instructions would protect against
invalid memory access.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Eduard Zingerman
f1f5d6f25d selftests/bpf: ptr_to_btf_id struct walk ending with primitive pointer
Validate that reading a PTR_TO_BTF_ID field produces a value of type
PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED, if field is a pointer to a
primitive type.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Eduard Zingerman
2d5c91e1cc bpf: rdonly_untrusted_mem for btf id walk pointer leafs
When processing a load from a PTR_TO_BTF_ID, the verifier calculates
the type of the loaded structure field based on the load offset.
For example, given the following types:

  struct foo {
    struct foo *a;
    int *b;
  } *p;

The verifier would calculate the type of `p->a` as a pointer to
`struct foo`. However, the type of `p->b` is currently calculated as a
SCALAR_VALUE.

This commit updates the logic for processing PTR_TO_BTF_ID to instead
calculate the type of p->b as PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED.
This change allows further dereferencing of such pointers (using probe
memory instructions).

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Eduard Zingerman
b9d44bc9fd bpf: make makr_btf_ld_reg return error for unexpected reg types
Non-functional change:
mark_btf_ld_reg() expects 'reg_type' parameter to be either
SCALAR_VALUE or PTR_TO_BTF_ID. Next commit expands this set, so update
this function to fail if unexpected type is passed. Also update
callers to propagate the error.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Shravan Kumar Ramani
0e2cebd723
platform/mellanox: mlxbf-pmc: Use kstrtobool() to check 0/1 input
For setting the enable value, the input should be 0 or 1 only. Use
kstrtobool() in place of kstrtoint() in mlxbf_pmc_enable_store() to
accept only valid input.

Fixes: 423c336185 ("platform/mellanox: mlxbf-pmc: Add support for BlueField-3")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/2ee618c59976bcf1379d5ddce2fc60ab5014b3a9.1751380187.git.shravankr@nvidia.com
[ij: split kstrbool() change to own commit.]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:53:28 +03:00
Shravan Kumar Ramani
f8c1311769
platform/mellanox: mlxbf-pmc: Validate event/enable input
Before programming the event info, validate the event number received as input
by checking if it exists in the event_list. Also fix a typo in the comment for
mlxbf_pmc_get_event_name() to correctly mention that it returns the event name
when taking the event number as input, and not the other way round.

Fixes: 423c336185 ("platform/mellanox: mlxbf-pmc: Add support for BlueField-3")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/2ee618c59976bcf1379d5ddce2fc60ab5014b3a9.1751380187.git.shravankr@nvidia.com
[ij: split kstrbool() change to own commit.]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:53:27 +03:00
Shravan Kumar Ramani
44e6ca8fae
platform/mellanox: mlxbf-pmc: Remove newline char from event name input
Since the input string passed via the command line appends a newline char,
it needs to be removed before comparison with the event_list.

Fixes: 1a218d312e ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/4978c18e33313b48fa2ae7f3aa6dbcfce40877e4.1751380187.git.shravankr@nvidia.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:53:23 +03:00
Armin Wolf
d4e83784b2
platform/x86: dell-ddv: Fix taking the psy->extensions_sem lock twice
Calling power_supply_get_property() inside
dell_wmi_ddv_battery_translate() can cause a deadlock since this
function is also being called from the power supply extension code,
in which case psy->extensions_sem is already being held.

Fix this by using the new power_supply_get_property_direct() function
to ignore any power supply extensions when retrieving the battery
serial number.

Tested on a Dell Inspiron 3505.

Reported-by: Hans de Goede <hansg@kernel.org>
Fixes: 058de163a3 ("platform/x86: dell-ddv: Implement the battery matching algorithm")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250627205124.250433-3-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:19:51 +03:00
Armin Wolf
a5f3542321
power: supply: test-power: Test access to extended power supply
Test that power supply extensions can access properties of their
power supply using power_supply_get_property_direct(). This both
ensures that the functionality works and serves as an example for
future driver developers.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20250627205124.250433-2-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:19:49 +03:00
Armin Wolf
3ebed2fddf
power: supply: core: Add power_supply_get/set_property_direct()
Power supply extensions might want to interact with the underlying
power supply to retrieve data like serial numbers, charging status
and more. However doing so causes psy->extensions_sem to be locked
twice, possibly causing a deadlock.

Provide special variants of power_supply_get/set_property() that
ignore any power supply extensions and thus do not touch the
associated psy->extensions_sem lock.

Suggested-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250627205124.250433-1-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:19:46 +03:00
Kurt Borja
8346c6af27
platform/x86: alienware-wmi-wmax: Fix dmi_system_id array
Add missing empty member to `awcc_dmi_table`.

Cc: stable@vger.kernel.org
Fixes: 6d7f1b1a5d ("platform/x86: alienware-wmi: Split DMI table")
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250707-dmi-fix-v1-1-6730835d824d@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-07 15:11:43 +03:00
Eyal Birger
a90b2a1aaa xfrm: interface: fix use-after-free after changing collect_md xfrm interface
collect_md property on xfrm interfaces can only be set on device creation,
thus xfrmi_changelink() should fail when called on such interfaces.

The check to enforce this was done only in the case where the xi was
returned from xfrmi_locate() which doesn't look for the collect_md
interface, and thus the validation was never reached.

Calling changelink would thus errornously place the special interface xi
in the xfrmi_net->xfrmi hash, but since it also exists in the
xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when
the net namespace was taken down [1].

Change the check to use the xi from netdev_priv which is available earlier
in the function to prevent changes in xfrm collect_md interfaces.

[1] resulting oops:
[    8.516540] kernel BUG at net/core/dev.c:12029!
[    8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[    8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary)
[    8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    8.516569] Workqueue: netns cleanup_net
[    8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0
[    8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24
[    8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206
[    8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60
[    8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122
[    8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100
[    8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00
[    8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00
[    8.516615] FS:  0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000
[    8.516619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0
[    8.516625] PKRU: 55555554
[    8.516627] Call Trace:
[    8.516632]  <TASK>
[    8.516635]  ? rtnl_is_locked+0x15/0x20
[    8.516641]  ? unregister_netdevice_queue+0x29/0xf0
[    8.516650]  ops_undo_list+0x1f2/0x220
[    8.516659]  cleanup_net+0x1ad/0x2e0
[    8.516664]  process_one_work+0x160/0x380
[    8.516673]  worker_thread+0x2aa/0x3c0
[    8.516679]  ? __pfx_worker_thread+0x10/0x10
[    8.516686]  kthread+0xfb/0x200
[    8.516690]  ? __pfx_kthread+0x10/0x10
[    8.516693]  ? __pfx_kthread+0x10/0x10
[    8.516697]  ret_from_fork+0x82/0xf0
[    8.516705]  ? __pfx_kthread+0x10/0x10
[    8.516709]  ret_from_fork_asm+0x1a/0x30
[    8.516718]  </TASK>

Fixes: abc340b38b ("xfrm: interface: support collect metadata mode")
Reported-by: Lonial Con <kongln9170@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-04 09:25:25 +02:00
Alexei Starovoitov
03fe01ddd1 Merge branch 'bpf-reduce-verifier-stack-frame-size'
Yonghong Song says:

====================
bpf: Reduce verifier stack frame size

Arnd Bergmann reported an issue ([1]) where clang compiler (less than
llvm18) may trigger an error where the stack frame size exceeds the limit.
I can reproduce the error like below:
  kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check'
      [-Werror,-Wframe-larger-than]
  kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
      [-Werror,-Wframe-larger-than]

This patch series fixed the above two errors by reducing stack size.
See each individual patches for details.

  [1] https://lore.kernel.org/bpf/20250620113846.3950478-1-arnd@kernel.org/

Changelogs:
  v2 -> v3:
    - v2: https://lore.kernel.org/bpf/20250702171134.2370432-1-yonghong.song@linux.dev/
    - Rename env->callchain to env->callchain_buf so it is clear that
    - env->callchain_buf is used for a temp buf.

  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20250702053332.1991516-1-yonghong.song@linux.dev/
    - Simplify assignment to struct bpf_insn pointer in do_misc_fixups().
    - Restore original implementation in opt_hard_wire_dead_code_branches()
      as only one insn on the stack.
    - Avoid unnecessary insns for 64bit modulo (mod 0/-1) operations.
====================

Link: https://patch.msgid.link/20250703141101.1482025-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:31:30 -07:00
Yonghong Song
82bc4abf28 bpf: Avoid putting struct bpf_scc_callchain variables on the stack
Add a 'struct bpf_scc_callchain callchain_buf' field in bpf_verifier_env.
This way, the previous bpf_scc_callchain local variables can be
replaced by taking address of env->callchain_buf. This can reduce stack
usage and fix the following error:
    kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
        [-Werror,-Wframe-larger-than]

Reported-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250703141117.1485108-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:31:30 -07:00
Yonghong Song
45e9cd38aa bpf: Reduce stack frame size by using env->insn_buf for bpf insns
Arnd Bergmann reported an issue ([1]) where clang compiler (less than
llvm18) may trigger an error where the stack frame size exceeds the limit.
I can reproduce the error like below:
  kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check'
      [-Werror,-Wframe-larger-than]
  kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
      [-Werror,-Wframe-larger-than]

Use env->insn_buf for bpf insns instead of putting these insns on the
stack. This can resolve the above 'bpf_check' error. The 'do_check' error
will be resolved in the next patch.

  [1] https://lore.kernel.org/bpf/20250620113846.3950478-1-arnd@kernel.org/

Reported-by: Arnd Bergmann <arnd@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250703141111.1484521-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:31:29 -07:00
Yonghong Song
3b87251439 bpf: Simplify assignment to struct bpf_insn pointer in do_misc_fixups()
In verifier.c, the following code patterns (in two places)
  struct bpf_insn *patch = &insn_buf[0];
can be simplified to
  struct bpf_insn *patch = insn_buf;
which is easier to understand.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250703141106.1483216-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:31:29 -07:00
Paul Chaignon
032547272e bpf: Avoid warning on unexpected map for tail call
Before handling the tail call in record_func_key(), we check that the
map is of the expected type and log a verifier error if it isn't. Such
an error however doesn't indicate anything wrong with the verifier. The
check for map<>func compatibility is done after record_func_key(), by
check_map_func_compatibility().

Therefore, this patch logs the error as a typical reject instead of a
verifier error.

Fixes: d2e4c1e6c2 ("bpf: Constant map key tracking for prog array pokes")
Fixes: 0df1a55afa ("bpf: Warn on internal verifier errors")
Reported-by: syzbot+efb099d5833bca355e51@syzkaller.appspotmail.com
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/1f395b74e73022e47e04a31735f258babf305420.1751578055.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:54 -07:00
Alexei Starovoitov
71b4a9959e Merge branch 'bpf-standard-streams'
Kumar Kartikeya Dwivedi says:

====================
BPF Standard Streams

This set introduces a standard output interface with two streams, namely
stdout and stderr, for BPF programs. The idea is that these streams will
be written to by BPF programs and the kernel, and serve as standard
interfaces for informing user space of any BPF runtime violations. Users
can also utilize them for printing normal messages for debugging usage,
as is the case with bpf_printk() and trace pipe interface.

BPF programs and the kernel can use these streams to output messages.
User space can dump these messages using bpftool.

The stream interface itself is implemented using a lockless list, so
that we can queue messages from any context. Every printk statement into
the stream leads to memory allocation. Allocation itself relies on
try_alloc_pages() to construct a bespoke bump allocator to carve out
elements. If this fails, we finally give up and drop the message.

See commit logs for more details.

Two scenarios are covered:
 - Deadlocks and timeouts in rqspinlock.
 - Timeouts for may_goto.

In each we provide the stack trace and source information for the
offending BPF programs. Both the C source line and the file and line
numbers are printed. The output format is as follows:

ERROR: AA or ABBA deadlock detected for bpf_res_spin_lock
Attempted lock   = 0xff11000108f3a5e0
Total held locks = 1
Held lock[ 0] = 0xff11000108f3a5e0
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_rqspinlock_violation+0x10b/0x130
bpf_res_spin_lock+0x8c/0xa0
bpf_prog_3699ea119d1f6ed8_foo+0xe5/0x140
  if (!bpf_res_spin_lock(&v2->lock)) @ stream_bpftool.c:62
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

ERROR: Timeout detected for may_goto instruction
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_may_goto_violation+0x6a/0x90
bpf_check_timed_may_goto+0x4d/0xa0
arch_bpf_timed_may_goto+0x21/0x40
bpf_prog_3699ea119d1f6ed8_foo+0x12f/0x140
  while (can_loop) @ stream_bpftool.c:71
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

Changelog:
----------
v4 -> v5
v4: https://lore.kernel.org/bpf/20250702031737.407548-1-memxor@gmail.com

 * Add acks from Emil.
 * Address various nits.
 * Add extra failure tests.
 * Make deadlock test a little more robust to catch problems.

v3 -> v4
v3: https://lore.kernel.org/bpf/20250624031252.2966759-1-memxor@gmail.com

 * Switch to alloc_pages_nolock(), avoid incorrect memcg accounting. (Alexei)
   * We will figure out proper accounting later.
 * Drop error limit logic, restrict stream capacity to 100,000 bytes. (Alexei)
 * Remove extra invocation of is_bpf_text_address(). (Jiri)
 * Avoid emitting NULL byte into the stream text, adjust regex in selftests. (Alexei)
 * Add comment around rcu_read_lock() for bpf_prog_ksym_find. (Alexei)
 * Tighten stream capacity check selftest.
 * Add acks from Andrii.

v2 -> v3
v2: https://lore.kernel.org/bpf/20250524011849.681425-1-memxor@gmail.com

 * Fix bug when handling single element stream stage. (Eduard)
 * Move to mutex for protection of stream read and copy_to_user(). (Alexei)
 * Split bprintf refactor into its own patch. (Alexei)
 * Move kfunc definition to common_btf_ids to avoid initcall proliferation. (Alexei)
 * Return line number by reference in bpf_prog_get_file_line. (Alexei)
 * Remove NULL checks for BTF name pointer. (Alexei)
 * Add WARN_ON_ONCE(!rcu_read_lock_held()) in bpf_prog_ksym_find. (Eduard)
 * Remove hardcoded stream stage from macros. (Alexei, Eduard)
 * Move refactoring hunks to their own patch. (Alexei)
 * Add empty opts parameter for future extensibility to libbpf API. (Andrii, Eduard)
 * Add BPF_STREAM_{STDOUT,STDERR} to UAPI. (Andrii)
 * Add code to match on backtrace output. (Eduard)
 * Fix misc nits.
 * Add acks.

v1 -> v2
v1: https://lore.kernel.org/bpf/20250507171720.1958296-1-memxor@gmail.com

 * Drop arena page fault prints, will be done as follow up. (Alexei)
 * Defer Andrii's request to reuse code and Alan's suggestion of error
   counts to follow up.
 * Drop bpf_dynptr_from_mem_slice patch.
 * Drop some acks due to heavy reworking.
 * Fix KASAN splat in bpf_prog_get_file_line. (Eduard)
 * Collapse bpf_prog_ksym_find and is_bpf_text_address into single
   call. (Eduard)
 * Add missing RCU read lock in bpf_prog_ksym_find.
 * Fix incorrect error handling in dump_stack_cb.
 * Simplify libbpf macro. (Eduard, Andrii)
 * Introduce bpf_prog_stream_read() libbpf API. (Eduard, Alexei, Andrii)
 * Drop BPF prog from the bpftool, use libbpf API.
 * Rework selftests.

RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/20250414161443.1146103-1-memxor@gmail.com

 * Rebase on bpf-next/master.
 * Change output in dump_stack to also print source line. (Alexei)
 * Simplify API to single pop() operation. (Eduard, Alexei)
 * Add kdoc for bpf_dynptr_from_mem_slice.
 * Fix -EINVAL returned from prog_dump_stream. (Eduard)
 * Split dump_stack() patch into multiple commits.
 * Add macro wrapping stream staging API.
 * Change bpftool command from dump to tracelog. (Quentin)
 * Add bpftool documentation and bash completion. (Quentin)
 * Change license of bpftool to Dual BSD/GPL.
 * Simplify memory allocator. (Alexei)
   * No overflow into second page.
   * Remove bpf_mem_alloc() fallback.
 * Symlink bpftool BPF program and exercise as selftest. (Eduard)
 * Verify output after dumping from ringbuf. (Eduard)
 * More failure cases to check API invariants.
 * Remove patches for dynptr lifetime fixes (split into separate set).
 * Limit maximum error messages, and add stream capacity. (Eduard)
====================

Link: https://patch.msgid.link/20250703204818.925464-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:08 -07:00
Kumar Kartikeya Dwivedi
5697683e13 selftests/bpf: Add tests for prog streams
Add selftests to stress test the various facets of the stream API,
memory allocation pattern, and ensuring dumping support is tested and
functional.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
876f5ebd58 bpftool: Add support for dumping streams
Add support for printing the BPF stream contents of a program in
bpftool. The new bpftool prog tracelog command is extended to take
stdout and stderr arguments, and then the prog specification.

The bpf_prog_stream_read() API added in previous patch is simply reused
to grab data and then it is dumped to the respective file. The stdout
data is sent to stdout, and stderr is printed to stderr.

Cc: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
3bbc1ba9cc libbpf: Introduce bpf_prog_stream_read() API
Introduce a libbpf API so that users can read data from a given BPF
stream for a BPF prog fd. For now, only the low-level syscall wrapper
is provided, we can add a bpf_program__* accessor as a follow up if
needed.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
21a3afc76a libbpf: Add bpf_stream_printk() macro
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and
BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to
print to the respective streams.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
ecec5b5743 bpf: Report rqspinlock deadlocks/timeout to BPF stderr
Begin reporting rqspinlock deadlocks and timeout to BPF program's
stderr.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
e8d0133022 bpf: Report may_goto timeout to BPF stderr
Begin reporting may_goto timeouts to BPF program's stderr stream.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
d7c431cafc bpf: Add dump_stack() analogue to print to BPF stderr
Introduce a kernel function which is the analogue of dump_stack()
printing some useful information and the stack trace. This is not
exposed to BPF programs yet, but can be made available in the future.

When we have a program counter for a BPF program in the stack trace,
also additionally output the filename and line number to make the trace
helpful. The rest of the trace can be passed into ./decode_stacktrace.sh
to obtain the line numbers for kernel symbols.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi
f0c53fd4a7 bpf: Add function to find program from stack trace
In preparation of figuring out the closest program that led to the
current point in the kernel, implement a function that scans through the
stack trace and finds out the closest BPF program when walking down the
stack trace.

Special care needs to be taken to skip over kernel and BPF subprog
frames. We basically scan until we find a BPF main prog frame. The
assumption is that if a program calls into us transitively, we'll
hit it along the way. If not, we end up returning NULL.

Contextually the function will be used in places where we know the
program may have called into us.

Due to reliance on arch_bpf_stack_walk(), this function only works on
x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
arch_bpf_stack_walk as well since we call it outside bpf_throw()
context.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi
d090326860 bpf: Ensure RCU lock is held around bpf_prog_ksym_find
Add a warning to ensure RCU lock is held around tree lookup, and then
fix one of the invocations in bpf_stack_walker. The program has an
active stack frame and won't disappear. Use the opportunity to remove
unneeded invocation of is_bpf_text_address.

Fixes: f18b03faba ("bpf: Implement BPF exceptions")
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi
0e521efaf3 bpf: Add function to extract program source info
Prepare a function for use in future patches that can extract the file
info, line info, and the source line number for a given BPF program
provided it's program counter.

Only the basename of the file path is provided, given it can be
excessively long in some cases.

This will be used in later patches to print source info to the BPF
stream.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi
5ab154f146 bpf: Introduce BPF standard streams
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
thus it's similar in spirit to existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Everytime a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from per-cpu region backed by a page
obtained using alloc_pages_nolock(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].

This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.

Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to working otherwise
similar to bpf_trace_vprintk.

The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.

For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.

For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.

The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().

From the kernel side, the writing into the stream will be a bit more
involved than the typical printk. First, the kernel typically may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can lift the advantage of using a
lockless list for pushing in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.

This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.

While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe any context printing of
messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.

  [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:06 -07:00
Kumar Kartikeya Dwivedi
0426729f46 bpf: Refactor bprintf buffer support
Refactor code to be able to get and put bprintf buffers and use
bpf_printf_prepare independently. This will be used in the next patch to
implement BPF streams support, particularly as a staging buffer for
strings that need to be formatted and then allocated and pushed into a
stream.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:06 -07:00
Tao Chen
da7e9c0a7f bpf: Add show_fdinfo for kprobe_multi
Show kprobe_multi link info with fdinfo, the info as follows:

link_type:	kprobe_multi
link_id:	1
prog_tag:	a69740b9746f7da8
prog_id:	21
kprobe_cnt:	8
missed:	0
cookie	 func
1	 bpf_fentry_test1+0x0/0x20
7	 bpf_fentry_test2+0x0/0x20
2	 bpf_fentry_test3+0x0/0x20
3	 bpf_fentry_test4+0x0/0x20
4	 bpf_fentry_test5+0x0/0x20
5	 bpf_fentry_test6+0x0/0x20
6	 bpf_fentry_test7+0x0/0x20
8	 bpf_fentry_test8+0x0/0x10

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250702153958.639852-3-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:29:42 -07:00
Tao Chen
b4dfe26fbf bpf: Add show_fdinfo for uprobe_multi
Show uprobe_multi link info with fdinfo, the info as follows:

link_type:	uprobe_multi
link_id:	9
prog_tag:	e729f789e34a8eca
prog_id:	39
uprobe_cnt:	3
pid:	0
path:	/home/dylane/bpf/tools/testing/selftests/bpf/test_progs
cookie	 offset	 ref_ctr_offset
3	 0xa69f13	 0x0
1	 0xa69f1e	 0x0
2	 0xa69f29	 0x0

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250702153958.639852-2-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:29:42 -07:00
Tao Chen
803f0700a3 bpf: Show precise link_type for {uprobe,kprobe}_multi fdinfo
Alexei suggested, 'link_type' can be more precise and differentiate
for human in fdinfo. In fact BPF_LINK_TYPE_KPROBE_MULTI includes
kretprobe_multi type, the same as BPF_LINK_TYPE_UPROBE_MULTI, so we
can show it more concretely.

link_type:	kprobe_multi
link_id:	1
prog_tag:	d2b307e915f0dd37
...
link_type:	kretprobe_multi
link_id:	2
prog_tag:	ab9ea0545870781d
...
link_type:	uprobe_multi
link_id:	9
prog_tag:	e729f789e34a8eca
...
link_type:	uretprobe_multi
link_id:	10
prog_tag:	7db356c03e61a4d4

Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250702153958.639852-1-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:29:42 -07:00
Andrii Nakryiko
1f24c0d819 Merge branch 'bpf-add-bpf_dynptr_memset-kfunc'
Ihor Solodrai says:

====================
bpf: add bpf_dynptr_memset() kfunc

Implement bpf_dynptr_memset() kfunc and add tests for it.

v3->v4:
  * do error checks after slice, nits
v2->v3:
  * nits and slow-path loop rewrite (Andrii)
  * simplify xdp chunks test (Mykyta)
v1->v2:
  * handle non-linear buffers with bpf_dynptr_write()
  * change function signature to include offset arg
  * add more test cases

v3: https://lore.kernel.org/bpf/20250630212113.573097-1-isolodrai@meta.com/
v2: https://lore.kernel.org/bpf/20250624205240.1311453-1-isolodrai@meta.com/
v1: https://lore.kernel.org/bpf/20250618223310.3684760-1-isolodrai@meta.com/
====================

Link: https://patch.msgid.link/20250702210309.3115903-1-isolodrai@meta.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-03 15:21:21 -07:00
Ihor Solodrai
7b29689263 selftests/bpf: Add test cases for bpf_dynptr_memset()
Add tests to verify the behavior of bpf_dynptr_memset():
  * normal memset 0
  * normal memset non-0
  * memset with an offset
  * memset in dynptr that was adjusted
  * error: size overflow
  * error: offset+size overflow
  * error: readonly dynptr
  * memset into non-linear xdp dynptr

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/bpf/20250702210309.3115903-3-isolodrai@meta.com
2025-07-03 15:21:20 -07:00
Ihor Solodrai
5fc5d8fded bpf: Add bpf_dynptr_memset() kfunc
Currently there is no straightforward way to fill dynptr memory with a
value (most commonly zero). One can do it with bpf_dynptr_write(), but
a temporary buffer is necessary for that.

Implement bpf_dynptr_memset() - an analogue of memset() from libc.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250702210309.3115903-2-isolodrai@meta.com
2025-07-03 15:21:20 -07:00
Paul Kocialkowski
01fdcbc7e5 clk: sunxi-ng: v3s: Fix TCON clock parents
The TCON clock can be parented to both the video PLL and the periph0 PLL.
Add the latter, which was missing from the list.

Fixes: d0f11d14b0 ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Link: https://patch.msgid.link/20250701201124.812882-5-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-07-03 23:31:05 +08:00
Paul Kocialkowski
2b73328629 clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
The CSI1 MCLK clock is reported as "csi-mclk" while it is specific to
CSI1 as the name of the definition indicates. Fix it in the driver.

Fixes: d0f11d14b0 ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-By: Icenowy Zheng <uwu@icenowy.me>
Link: https://patch.msgid.link/20250701201124.812882-4-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-07-03 23:31:05 +08:00
Paul Kocialkowski
f45b2949b1 clk: sunxi-ng: v3s: Fix CSI SCLK clock name
The CSI SCLK clock is incorrectly called CSI1 SCLK while it is used for
both the CSI0 and CSI1 interfaces and is called CSI SCLK all around the
documentation.

Fix the name in the driver, header and device-tree.

Fixes: d0f11d14b0 ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-By: Icenowy Zheng <uwu@icenowy.me>
Link: https://patch.msgid.link/20250701201124.812882-3-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-07-03 23:31:04 +08:00
Mykyta Yatsenko
38d95beb4b selftests/bpf: Allow veristat compile standalone
Veristat is synced into the standalone repo, where it compiles without
kernel private dependencies. This patch fixes compilation errors in
standalone veristat.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250702175622.358405-1-mykyta.yatsenko5@gmail.com
2025-07-02 11:45:14 -07:00
Paul Chaignon
65fdafd676 bpf: Avoid warning on multiple referenced args in call
The description of full helper calls in syzkaller [1] and the addition of
kernel warnings in commit 0df1a55afa ("bpf: Warn on internal verifier
errors") allowed syzbot to reach a verifier state that was thought to
indicate a verifier bug [2]:

    12: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
    verifier bug: more than one arg with ref_obj_id R2 2 2

This error can be reproduced with the program from the previous commit:

    0: (b7) r2 = 20
    1: (b7) r3 = 0
    2: (18) r1 = 0xffff92cee3cbc600
    4: (85) call bpf_ringbuf_reserve#131
    5: (55) if r0 == 0x0 goto pc+3
    6: (bf) r1 = r0
    7: (bf) r2 = r0
    8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
    9: (95) exit

bpf_tcp_raw_gen_syncookie_ipv4 expects R1 and R2 to be
ARG_PTR_TO_FIXED_SIZE_MEM (with a size of at least sizeof(struct iphdr)
for R1). R0 is a ring buffer payload of 20B and therefore matches this
requirement.

The verifier reaches the check on ref_obj_id while verifying R2 and
rejects the program because the helper isn't supposed to take two
referenced arguments.

This case is a legitimate rejection and doesn't indicate a kernel bug,
so we shouldn't log it as such and shouldn't emit a kernel warning.

Link: https://github.com/google/syzkaller/pull/4313 [1]
Link: https://lore.kernel.org/all/686491d6.a70a0220.3b7e22.20ea.GAE@google.com/T/ [2]
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Fixes: 0df1a55afa ("bpf: Warn on internal verifier errors")
Reported-by: syzbot+69014a227f8edad4d8c6@syzkaller.appspotmail.com
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/cd09afbfd7bef10bbc432d72693f78ffdc1e8ee5.1751463262.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:38 -07:00
Paul Chaignon
7ec899ac90 selftests/bpf: Negative test case for ref_obj_id in args
This patch adds a test case, as shown below, for the verifier error
"more than one arg with ref_obj_id".

    0: (b7) r2 = 20
    1: (b7) r3 = 0
    2: (18) r1 = 0xffff92cee3cbc600
    4: (85) call bpf_ringbuf_reserve#131
    5: (55) if r0 == 0x0 goto pc+3
    6: (bf) r1 = r0
    7: (bf) r2 = r0
    8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
    9: (95) exit

This error is currently incorrectly reported as a verifier bug, with a
warning. The next patch in this series will address that.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/3ba78e6cda47ccafd6ea70dadbc718d020154664.1751463262.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:34 -07:00
Eduard Zingerman
a90f5f7370 selftests/bpf: null checks for rdonly_untrusted_mem should be preserved
Test case checking that verifier does not assume rdonly_untrusted_mem
values as not null.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250702073620.897517-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:30 -07:00
Eduard Zingerman
c3b9faac9b bpf: avoid jump misprediction for PTR_TO_MEM | PTR_UNTRUSTED
Commit f2362a57ae ("bpf: allow void* cast using bpf_rdonly_cast()")
added a notion of PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED type.
This simultaneously introduced a bug in jump prediction logic for
situations like below:

  p = bpf_rdonly_cast(..., 0);
  if (p) a(); else b();

Here verifier would wrongly predict that else branch cannot be taken.
This happens because:
- Function reg_not_null() assumes that PTR_TO_MEM w/o PTR_MAYBE_NULL
  flag cannot be zero.
- The call to bpf_rdonly_cast() returns a rdonly_untrusted_mem value
  w/o PTR_MAYBE_NULL flag.

Tracking of PTR_MAYBE_NULL flag for untrusted PTR_TO_MEM does not make
sense, as the main purpose of the flag is to catch null pointer access
errors. Such errors are not possible on load of PTR_UNTRUSTED values
and verifier makes sure that PTR_UNTRUSTED can't be passed to helpers
or kfuncs.

Hence, modify reg_not_null() to assume that nullness of untrusted
PTR_TO_MEM is not known.

Fixes: f2362a57ae ("bpf: allow void* cast using bpf_rdonly_cast()")
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250702073620.897517-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:10 -07:00
Matteo Croce
07ee18a0bc selftests/bpf: Don't call fsopen() as privileged user
In the BPF token example, the fsopen() syscall is called as privileged
user. This is unneeded because fsopen() can be called also as
unprivileged user from the user namespace.
As the `fs_fd` file descriptor which was sent back and forth is still the
same, keep it open instead of cloning and closing it twice via SCM_RIGHTS.

cfr. https://github.com/systemd/systemd/pull/36134

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20250701183123.31781-1-technoboy85@gmail.com
2025-07-02 10:42:54 -07:00
Fernando Fernandez Mancera
2ca58d87eb xfrm: ipcomp: adjust transport header after decompressing
The skb transport header pointer needs to be adjusted by network header
pointer plus the size of the ipcomp header.

This shows up when running traffic over ipcomp using transport mode.
After being reinjected, packets are dropped because the header isn't
adjusted properly and some checks can be triggered. E.g the skb is
mistakenly considered as IP fragmented packet and later dropped.

kworker/30:1-mm     443 [030]   102.055250:     skb:kfree_skb:skbaddr=0xffff8f104aa3ce00 rx_sk=(
        ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms])
        ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms])
        ffffffff84281420 ip_defrag+0x4b0 ([kernel.kallsyms])
        ffffffff8428006e ip_local_deliver+0x4e ([kernel.kallsyms])
        ffffffff8432afb1 xfrm_trans_reinject+0xe1 ([kernel.kallsyms])
        ffffffff83758230 process_one_work+0x190 ([kernel.kallsyms])
        ffffffff83758f37 worker_thread+0x2d7 ([kernel.kallsyms])
        ffffffff83761cc9 kthread+0xf9 ([kernel.kallsyms])
        ffffffff836c3437 ret_from_fork+0x197 ([kernel.kallsyms])
        ffffffff836718da ret_from_fork_asm+0x1a ([kernel.kallsyms])

Fixes: eb2953d269 ("xfrm: ipcomp: Use crypto_acomp interface")
Link: https://bugzilla.suse.com/1244532
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-02 09:34:21 +02:00
Tobias Brunner
3ac9e29211 xfrm: Set transport header to fix UDP GRO handling
The referenced commit replaced a call to __xfrm4|6_udp_encap_rcv() with
a custom check for non-ESP markers.  But what the called function also
did was setting the transport header to the ESP header.  The function
that follows, esp4|6_gro_receive(), relies on that being set when it calls
xfrm_parse_spi().  We have to set the full offset as the skb's head was
not moved yet so adding just the UDP header length won't work.

Fixes: e3fd057776 ("xfrm: Fix UDP GRO handling for some corner cases")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-02 09:19:56 +02:00
Colin Ian King
1230be8209 selftests/bpf: Fix spelling mistake "subtration" -> "subtraction"
There are spelling mistakes in description text. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250630125528.563077-1-colin.i.king@gmail.com
2025-07-01 13:28:56 -07:00
Mykyta Yatsenko
cce3fee729 selftests/bpf: Enable dynptr/test_probe_read_user_str_dynptr
Enable previously disabled dynptr/test_probe_read_user_str_dynptr test,
after the fix it depended on was merged into bpf-next.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250630133515.1108325-1-mykyta.yatsenko5@gmail.com
2025-07-01 12:42:23 -07:00
Paul Chaignon
0df1a55afa bpf: Warn on internal verifier errors
This patch is a follow up to commit 1cb0f56d96 ("bpf: WARN_ONCE on
verifier bugs"). It generalizes the use of verifier_error throughout
the verifier, in particular for logs previously marked "verifier
internal error". As a consequence, all of those verifier bugs will now
come with a kernel warning (under CONFIG_DBEUG_KERNEL) detectable by
fuzzers.

While at it, some error messages that were too generic (ex., "bpf
verifier is misconfigured") have been reworded.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/aGQqnzMyeagzgkCK@Tunnel
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01 12:38:30 -07:00
Alexei Starovoitov
26d0e53246 Merge branch 's390-bpf-describe-the-frame-using-a-struct-instead-of-constants'
Ilya Leoshkevich says:

====================
s390/bpf: Describe the frame using a struct instead of constants

Hi,

This series contains two small refactorings without functional changes.

The first one removes the code duplication around calculating the
distance from %r15 to the stack frame.

The second one simplifies how offsets to various values stored inside
the frame are calculated.

Best regards,
Ilya
====================

Link: https://patch.msgid.link/20250624121501.50536-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01 12:36:52 -07:00
Ilya Leoshkevich
e26d523edf s390/bpf: Describe the frame using a struct instead of constants
Currently the caller-allocated portion of the stack frame is described
using constants, hardcoded values, and an ASCII drawing, making it
harder than necessary to ensure that everything is in sync.

Declare a struct and use offsetof() and offsetofend() macros to refer
to various values stored within the frame.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250624121501.50536-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01 12:36:51 -07:00
Ilya Leoshkevich
b2268d550d s390/bpf: Centralize frame offset calculations
The calculation of the distance from %r15 to the caller-allocated
portion of the stack frame is copy-pasted into multiple places in the
JIT code.

Move it to bpf_jit_prog() and save the result into bpf_jit::frame_off,
so that the other parts of the JIT can use it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250624121501.50536-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01 12:36:51 -07:00
Eduard Zingerman
c4b1be928e selftests/bpf: bpf_rdonly_cast u{8,16,32,64} access tests
Tests with aligned and misaligned memory access of different sizes via
pointer returned by bpf_rdonly_cast().

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627015539.1439656-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 18:55:28 -07:00
Mykyta Yatsenko
ffaff1804e selftests/bpf: improve error messages in veristat
Return error if preset parsing fails. Avoid proceeding with veristat run
if preset does not parse.
Before:
```
./veristat set_global_vars.bpf.o -G "arr[999999999999999999999] = 1"
Failed to parse value '999999999999999999999'
Processing 'set_global_vars.bpf.o'...
File                   Program           Verdict  Duration (us)  Insns  States  Program size  Jited size
---------------------  ----------------  -------  -------------  -----  ------  ------------  ----------
set_global_vars.bpf.o  test_set_globals  success             27     64       0            82           0
---------------------  ----------------  -------  -------------  -----  ------  ------------  ----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.
```
After:
```
./veristat set_global_vars.bpf.o -G "arr[999999999999999999999] = 1"
Failed to parse value '999999999999999999999'
Failed to parse global variable presets: arr[999999999999999999999] = 1
```

Improve error messages:
 * If preset struct member can't be found.
 * Array index out of bounds

Extract rtrim function.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627144342.686896-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 18:54:46 -07:00
Song Liu
bacdf5a0e6 selftests/bpf: Fix cgroup_xattr/read_cgroupfs_xattr
cgroup_xattr/read_cgroupfs_xattr has two issues:

1. cgroup_xattr/read_cgroupfs_xattr messes up lo without creating a netns
   first. This causes issue with other tests.

   Fix this by using a different hook (lsm.s/file_open) and not messing
   with lo.

2. cgroup_xattr/read_cgroupfs_xattr sets up cgroups without proper
   mount namespaces.

   Fix this by using the existing cgroup helpers. A new helper
   set_cgroup_xattr() is added to set xattr on cgroup files.

Fixes: f4fba2d6d2 ("selftests/bpf: Add tests for bpf_cgroup_read_xattr")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Closes: https://lore.kernel.org/bpf/CAADnVQ+iqMi2HEj_iH7hsx+XJAsqaMWqSDe4tzcGAnehFWA9Sw@mail.gmail.com/
Signed-off-by: Song Liu <song@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20250627191221.765921-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 14:34:08 -07:00
Eduard Zingerman
a5a7b25d75 bpf: guard BTF_ID_FLAGS(bpf_cgroup_read_xattr) with CONFIG_BPF_LSM
Function bpf_cgroup_read_xattr is defined in fs/bpf_fs_kfuncs.c,
which is compiled only when CONFIG_BPF_LSM is set. Add CONFIG_BPF_LSM
check to bpf_cgroup_read_xattr spec in common_btf_ids in
kernel/bpf/helpers.c to avoid build failures for configs w/o
CONFIG_BPF_LSM.

Build failure example:

    BTF     .tmp_vmlinux1.btf.o
  btf_encoder__tag_kfunc: failed to find kfunc 'bpf_cgroup_read_xattr' in BTF
  ...
  WARN: resolve_btfids: unresolved symbol bpf_cgroup_read_xattr
  make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 255

Fixes: 535b070f4a ("bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node")
Reported-by: Jake Hillion <jakehillion@meta.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627175309.2710973-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 12:18:42 -07:00
Viktor Malik
5272b51367 bpf: Fix string kfuncs names in doc comments
Documentation comments for bpf_strnlen and bpf_strcspn contained
incorrect function names.

Fixes: e91370550f ("bpf: Add kfuncs for read-only string operations")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/bpf/20250627174759.3a435f86@canb.auug.org.au/T/#u
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20250627082001.237606-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 08:48:06 -07:00
Alexei Starovoitov
48d998af99 Merge branch 'vfs-6.17.bpf' of https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Merge branch 'vfs-6.17.bpf' from vfs tree into bpf-next/master
and resolve conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 19:01:04 -07:00
Andrii Nakryiko
6def0822d2 Merge branch 'support-array-presets-in-veristat'
Mykyta Yatsenko says:

====================
Support array presets in veristat

From: Mykyta Yatsenko <yatsenko@meta.com>

This patch series implements support for array variable presets in
veristat. Currently users can set values to global variables before
loading BPF program, but not for arrays. With this change array
elements are supported as well, for example:
```
sudo ./veristat set_global_vars.bpf.o -G "arr[0] = 1"
```

v4 -> v5
 * Simplify array elements offset calculation using  btf__resolve_size
 * Support white spaces in array indexes, e.g. arr [ 0 ]
 * Rename btf_find_member into adjust_var_secinfo_member

v3 -> v4
 * Rework implementation to support multi dimensional arrays
 * Rework data structures for storing parsed presets

v2 -> v3
 * Added more negative tests
 * Fix mem leak
 * Other small fixes

v1 -> v2
 * Support enums as indexes
 * Separating parsing logic from preset processing
 * Add more tests
====================

Link: https://patch.msgid.link/20250625165904.87820-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-26 10:28:52 -07:00
Mykyta Yatsenko
583588594b selftests/bpf: Test array presets in veristat
Modify existing veristat tests to verify that array presets are applied
as expected.
Introduce few negative tests as well to check that common error modes
are handled.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-4-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:51 -07:00
Mykyta Yatsenko
edc99d0b02 selftests/bpf: Support array presets in veristat
Implement support for presetting values for array elements in veristat.
For example:
```
sudo ./veristat set_global_vars.bpf.o -G "arr[3] = 1"
```
Arrays of structures and structure of arrays work, but each individual
scalar value has to be set separately: `foo[1].bar[2] = value`.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-3-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:50 -07:00
Mykyta Yatsenko
be898cb5cb selftests/bpf: Separate var preset parsing in veristat
Refactor var preset parsing in veristat to simplify implementation.
Prepare parsed variable beforehand so that parsing logic is separated
from functionality of calculating offsets and searching fields.
Introduce rvalue struct, storing either int or enum (string value),
will be reused in the next patch, extract parsing rvalue into a
separate function.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-2-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:50 -07:00
Alexei Starovoitov
886178a33a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge BPF, perf and other fixes after downstream PRs.
It restores BPF CI to green after critical fix
commit bc4394e5e7 ("perf: Fix the throttle error of some clock events")

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:49:39 -07:00
Alexei Starovoitov
5046acc113 Merge branch 'bpf-add-kfuncs-for-read-only-string-operations'
Viktor Malik says:

====================
bpf: Add kfuncs for read-only string operations

String operations are commonly used in programming and BPF programs are
no exception. Since it is cumbersome to reimplement them over and over,
this series introduce kfuncs which provide the most common operations.
For now, we only limit ourselves to functions which do not copy memory
since these usually introduce undefined behaviour in case the
source/destination buffers overlap which would have to be prevented by
the verifier.

The kernel already contains implementations for all of these, however,
it is not possible to use them from BPF context. The main reason is that
the verifier is not able to check that it is safe to access the entire
string and that the string is null-terminated and the function won't
loop forever. Therefore, the operations are open-coded using
__get_kernel_nofault instead of plain dereference and bounded to at most
XATTR_SIZE_MAX characters to make them safe. That allows to skip all the
verfier checks for the passed-in strings as safety is ensured
dynamically.

All of the proposed functions return integers, even those that normally
(in the kernel or libc) return pointers into the strings. The reason is
that since the strings are generally treated as unsafe, the pointers
couldn't be dereferenced anyways. So, instead, we return an index to the
string and let user decide what to do with it. The integer APIs also
allow to return various error codes when unexpected situations happen
while processing the strings.

The series include both positive and negative tests using the kfuncs.

Changelog
---------

Changes in v8:
- Return -ENOENT (instead of -1) when "item not found" for relevant
  functions (Alexei).
- Small adjustments of the string algorithms (Andrii).
- Adapt comments to kernel style (Alexei).

Changes in v7:
- Disable negative tests passing NULL and 0x1 to kfuncs on s390 as they
  aren't relevant (see comment in string_kfuncs_failure1.c for details).

Changes in v6:
- Improve the third patch which allows to use macros in __retval in
  selftests. The previous solution broke several tests.

Changes in v5:
- Make all kfuncs return integers (Andrii).
- Return -ERANGE when passing non-kernel pointers on arches with
  non-overlapping address spaces (Alexei).
- Implement "unbounded" variants using the bounded ones (Andrii).
- Add more negative test cases.

Changes in v4 (all suggested by Andrii):
- Open-code all the kfuncs, not just the unbounded variants.
- Introduce `pagefault` lock guard to simplify the implementation
- Return appropriate error codes (-E2BIG and -EFAULT) on failures
- Const-ify all arguments and return values
- Add negative test-cases

Changes in v3:
- Open-code unbounded variants with __get_kernel_nofault instead of
  dereference (suggested by Alexei).
- Use the __sz suffix for size parameters in bounded variants (suggested
  by Eduard and Alexei).
- Make tests more compact (suggested by Eduard).
- Add benchmark.
====================

Link: https://patch.msgid.link/cover.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:46 -07:00
Viktor Malik
e8763fb66a selftests/bpf: Add tests for string kfuncs
Add both positive and negative tests cases using string kfuncs added in
the previous patches.

Positive tests check that the functions work as expected.

Negative tests pass various incorrect strings to the kfuncs and check
for the expected error codes:
  -E2BIG  when passing too long strings
  -EFAULT when trying to read inaccessible kernel memory
  -ERANGE when passing userspace pointers on arches with non-overlapping
          address spaces

A majority of the tests use the RUN_TESTS helper which executes BPF
programs with BPF_PROG_TEST_RUN and check for the expected return value.
An exception to this are tests for long strings as we need to memset the
long string from userspace (at least I haven't found an ergonomic way to
memset it from a BPF program), which cannot be done using the RUN_TESTS
infrastructure.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/090451a2e60c9ae1dceb4d1bfafa3479db5c7481.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:46 -07:00
Viktor Malik
a55b7d3932 selftests/bpf: Allow macros in __retval
Allow macro expansion for values passed to the `__retval` and
`__retval_unpriv` attributes. This is especially useful for testing
programs which return various error codes.

With this change, the code for parsing special literals can be made
simpler, as the literals are defined via macros. The only exception is
INT_MIN which expands to (-INT_MAX -1), which is not single number and
cannot be parsed by strtol. So, we instead use a prefixed literal
_INT_MIN in __retval and handle it separately (assign the expected
return to INT_MIN). Also, strtol cannot handle the "ll" suffix so change
the value of POINTER_VALUE from 0xcafe4all to 0xbadcafe.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/a6c6b551ae0575351faa7b7a1df52f9341a5cbe8.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:46 -07:00
Viktor Malik
e91370550f bpf: Add kfuncs for read-only string operations
String operations are commonly used so this exposes the most common ones
to BPF programs. For now, we limit ourselves to operations which do not
copy memory around.

Unfortunately, most in-kernel implementations assume that strings are
%NUL-terminated, which is not necessarily true, and therefore we cannot
use them directly in the BPF context. Instead, we open-code them using
__get_kernel_nofault instead of plain dereference to make them safe and
limit the strings length to XATTR_SIZE_MAX to make sure the functions
terminate. When __get_kernel_nofault fails, functions return -EFAULT.
Similarly, when the size bound is reached, the functions return -E2BIG.
In addition, we return -ERANGE when the passed strings are outside of
the kernel address space.

Note that thanks to these dynamic safety checks, no other constraints
are put on the kfunc args (they are marked with the "__ign" suffix to
skip any verifier checks for them).

All of the functions return integers, including functions which normally
(in kernel or libc) return pointers to the strings. The reason is that
since the strings are generally treated as unsafe, the pointers couldn't
be dereferenced anyways. So, instead, we return an index to the string
and let user decide what to do with it. This also nicely fits with
returning various error codes when necessary (see above).

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/4b008a6212852c1b056a413f86e3efddac73551c.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:45 -07:00
Viktor Malik
3a95a561f2 uaccess: Define pagefault lock guard
Define a pagefault lock guard which allows to simplify functions that
need to disable page faults.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/8a01beb0b671923976f08297d81242bb2129881d.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:45 -07:00
Anton Protopopov
d83caf7c8d bpf: add btf_type_is_i{32,64} helpers
There are places in BPF code which check if a BTF type is an integer
of particular size. This code can be made simpler by using helpers.
Add new btf_type_is_i{32,64} helpers, and simplify code in a few
files. (Suggested by Eduard for a patch which copy-pasted such a
check [1].)

  v1 -> v2:
    * export less generic helpers (Eduard)
    * make subject less generic than in [v1] (Eduard)

[1] https://lore.kernel.org/bpf/7edb47e73baa46705119a23c6bf4af26517a640f.camel@gmail.com/
[v1] https://lore.kernel.org/bpf/20250624193655.733050-1-a.s.protopopov@gmail.com/

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625151621.1000584-1-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:15:49 -07:00
Alexei Starovoitov
0ed5f79987 Merge branch 'bpf-allow-void-cast-using-bpf_rdonly_cast'
Eduard Zingerman says:

====================
bpf: allow void* cast using bpf_rdonly_cast()

Currently, pointers returned by `bpf_rdonly_cast()` have a type of
"pointer to btf id", and only casts to structure types are allowed.
Access to memory pointed to by these pointers is done through
`BPF_PROBE_{MEM,MEMSX}` instructions and does not produce errors on
invalid memory access.

This patch set extends `bpf_rdonly_cast()` to allow casts to an
equivalent of 'void *', effectively replacing
`bpf_probe_read_kernel()` calls in situations where access to
individual bytes or integers is necessary.

The mechanism was suggested and explored by Andrii Nakryiko in [1].

To help with detecting support for this feature, an
`enum bpf_features` is added with intended usage as follows:

  if (bpf_core_enum_value_exists(enum bpf_features,
                                 BPF_FEAT_RDONLY_CAST_TO_VOID))
    ...

[1] https://github.com/anakryiko/linux/tree/bpf-mem-cast

Changelog:

v2: https://lore.kernel.org/bpf/20250625000520.2700423-1-eddyz87@gmail.com/
v2 -> v3:
- dropped direct numbering for __MAX_BPF_FEAT.

v1: https://lore.kernel.org/bpf/20250624191009.902874-1-eddyz87@gmail.com/
v1 -> v2:
- renamed BPF_FEAT_TOTAL to __MAX_BPF_FEAT and moved patch introducing
  bpf_features enum to the start of the series (Alexei);
- dropped patch #3 allowing optout from CAP_SYS_ADMIN drop in
  prog_tests/verifier.c, use a separate runner in prog_tests/*
  instead.
====================

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://patch.msgid.link/20250625182414.30659-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:13:27 -07:00
Eduard Zingerman
12ed81f823 selftests/bpf: check operations on untrusted ro pointers to mem
The following cases are tested:
- it is ok to load memory at any offset from rdonly_untrusted_mem;
- rdonly_untrusted_mem offset/bounds are not tracked;
- writes into rdonly_untrusted_mem are forbidden;
- atomic operations on rdonly_untrusted_mem are forbidden;
- rdonly_untrusted_mem can't be passed as a memory argument of a
  helper of kfunc;
- it is ok to use PTR_TO_MEM and PTR_TO_BTF_ID in a same load
  instruction.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:13:16 -07:00
Eduard Zingerman
f2362a57ae bpf: allow void* cast using bpf_rdonly_cast()
Introduce support for `bpf_rdonly_cast(v, 0)`, which casts the value
`v` to an untyped, untrusted pointer, logically similar to a `void *`.
The memory pointed to by such a pointer is treated as read-only.
As with other untrusted pointers, memory access violations on loads
return zero instead of causing a fault.

Technically:
- The resulting pointer is represented as a register of type
  `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` with size zero.
- Offsets within such pointers are not tracked.
- Same load instructions are allowed to have both
  `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` and `PTR_TO_BTF_ID`
  as the base pointer types.
  In such cases, `bpf_insn_aux_data->ptr_type` is considered the
  weaker of the two: `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED`.

The following constraints apply to the new pointer type:
- can be used as a base for LDX instructions;
- can't be used as a base for ST/STX or atomic instructions;
- can't be used as parameter for kfuncs or helpers.

These constraints are enforced by existing handling of `MEM_RDONLY`
flag and `PTR_TO_MEM` of size zero.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:13:16 -07:00
Eduard Zingerman
b23e97ffc2 bpf: add bpf_features enum
This commit adds a kernel side enum for use in conjucntion with BTF
CO-RE bpf_core_enum_value_exists. The goal of the enum is to assist
with available BPF features detection. Intended usage looks as
follows:

  if (bpf_core_enum_value_exists(enum bpf_features, BPF_FEAT_<f>))
     ... use feature f ...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:13:15 -07:00
Alexei Starovoitov
0967f5399b Merge branch 'range-tracking-for-bpf_neg'
Song Liu says:

====================
Add range tracking for BPF_NEG. Please see commit log of 1/2 for more
details.
---

Changes v3 => v4:
1. Fix selftest verifier_value_ptr_arith.c. (Eduard)

v3: https://lore.kernel.org/bpf/20250624233328.313573-1-song@kernel.org/

Changes v2 => v3:
1. Minor changes in the selftests. (Eduard)

v2: https://lore.kernel.org/bpf/20250624220038.656646-1-song@kernel.org/

Changes v1 => v2:
1. Split new selftests to a separate patch. (Eduard)
2. Reset reg id on BPF_NEG. (Eduard)
3. Use env->fake_reg instead of a bpf_reg_state on the stack. (Eduard)
4. Add __msg for passing selftests.

v1: https://lore.kernel.org/bpf/20250624172320.2923031-1-song@kernel.org/
====================

Link: https://patch.msgid.link/20250625164025.3310203-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:12:30 -07:00
Song Liu
2945434e24 selftests/bpf: Add tests for BPF_NEG range tracking logic
BPF_REG now has range tracking logic. Add selftests for BPF_NEG.
Specifically, return value of LSM hook lsm.s/socket_connect is used to
show that the verifer tracks BPF_NEG(1) falls in the [-4095, 0] range;
while BPF_NEG(100000) does not fall in that range.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625164025.3310203-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:12:18 -07:00
Song Liu
aced132599 bpf: Add range tracking for BPF_NEG
Add range tracking for instruction BPF_NEG. Without this logic, a trivial
program like the following will fail

    volatile bool found_value_b;
    SEC("lsm.s/socket_connect")
    int BPF_PROG(test_socket_connect)
    {
        if (!found_value_b)
                return -1;
        return 0;
    }

with verifier log:

"At program exit the register R0 has smin=0 smax=4294967295 should have
been in [-4095, 0]".

This is because range information is lost in BPF_NEG:

0: R1=ctx() R10=fp0
; if (!found_value_b) @ xxxx.c:24
0: (18) r1 = 0xffa00000011e7048       ; R1_w=map_value(...)
2: (71) r0 = *(u8 *)(r1 +0)           ; R0_w=scalar(smin32=0,smax=255)
3: (a4) w0 ^= 1                       ; R0_w=scalar(smin32=0,smax=255)
4: (84) w0 = -w0                      ; R0_w=scalar(range info lost)

Note that, the log above is manually modified to highlight relevant bits.

Fix this by maintaining proper range information with BPF_NEG, so that
the verifier will know:

4: (84) w0 = -w0                      ; R0_w=scalar(smin32=-255,smax=0)

Also updated selftests based on the expected behavior.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625164025.3310203-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:12:17 -07:00
Yonghong Song
d69bafe6ee selftests/bpf: Fix usdt multispec failure with arm64/clang20 selftest build
When building the selftest with arm64/clang20, the following test failed:
  ...
  ubtest_multispec_usdt:PASS:usdt_100_called 0 nsec
  subtest_multispec_usdt:PASS:usdt_100_sum 0 nsec
  subtest_multispec_usdt:FAIL:usdt_300_bad_attach unexpected pointer: 0xaaaad82a2a80
  #471/2   usdt/multispec:FAIL
  #471     usdt:FAIL

But arm64/gcc11 built kernel selftests succeeded. Further debug found arm64/clang
generated code has much less argument pattern after dedup, but gcc generated
code has a lot more.

Check usdt probes with usdt.test.o on arm64 platform:

with gcc11 build binary:
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000054f8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp]
  stapsdt              0x00000031       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000005510, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 4]
  ...
  stapsdt              0x00000032       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000005660, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 60]
  ...
  stapsdt              0x00000034       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000070e8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 1192]
  stapsdt              0x00000034       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000007100, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 1196]
  ...
  stapsdt              0x00000032       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000009ec4, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 60]

with clang20 build binary:
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000009a0, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000009b8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  ...
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000002590, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000025a8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x8]
  ...
  stapsdt              0x0000002f       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000007fdc, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x10]

There are total 300 locations for usdt_300. For gcc11 built binary, there are
300 spec's. But for clang20 built binary, there are 3 spec's. The default
BPF_USDT_MAX_SPEC_CNT is 256, so bpf_program__attach_usdt() will fail for gcc
but it will succeed with clang.

To fix the problem, do not do bpf_program__attach_usdt() for usdt_300
with arm64/clang setup.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250624211802.2198821-1-yonghong.song@linux.dev
2025-06-25 12:52:54 -07:00
Alexei Starovoitov
3713b584da Merge branch 'bpf-verifier-improve-precision-of-bpf_add-and-bpf_sub'
Harishankar Vishwanathan says:

====================
bpf, verifier: Improve precision of BPF_ADD and BPF_SUB

This patchset improves the precision of BPF_ADD and BPF_SUB range
tracking. It also adds selftests that exercise the cases where precision
improvement occurs, and selftests for the cases where precise bounds
cannot be computed and the output register state values are set to
unbounded.

Changelog:

v3:
* Improve readability in selftests and commit message by using
  more readable constants (suggested by Eduard Zingerman).
* Add four new selftests for the cases where precise output register
  state bounds cannot be computed in scalar(32)_min_max_add/sub, so the
  output register state must be set to unbounded, i.e., [0, U64_MAX]
  or [0, U32_MAX].
* Add suggested-by Eduard tag to commit message for changes to
  verifier_bounds.c

v2:
* Add clearer example of precision improvement in the commit message for
  verifier.c changes.
* Add selftests that exercise the precision improvement to
  verifier_bounds.c (suggested by Eduard Zingerman).

v1:
  https://lore.kernel.org/bpf/20250610221356.2663491-1-harishankar.vishwanathan@gmail.com/
====================

Link: https://patch.msgid.link/20250623040359.343235-1-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24 18:48:39 -07:00
Harishankar Vishwanathan
e1d794541b selftests/bpf: Add testcases for BPF_ADD and BPF_SUB
The previous commit improves the precision in scalar(32)_min_max_add,
and scalar(32)_min_max_sub. The improvement in precision occurs in cases
when all outcomes overflow or underflow, respectively.

This commit adds selftests that exercise those cases.

This commit also adds selftests for cases where the output register
state bounds for u(32)_min/u(32)_max are conservatively set to unbounded
(when there is partial overflow or underflow).

Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250623040359.343235-3-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24 18:48:34 -07:00
Harishankar Vishwanathan
7a998a7316 bpf, verifier: Improve precision for BPF_ADD and BPF_SUB
This patch improves the precison of the scalar(32)_min_max_add and
scalar(32)_min_max_sub functions, which update the u(32)min/u(32)_max
ranges for the BPF_ADD and BPF_SUB instructions. We discovered this more
precise operator using a technique we are developing for automatically
synthesizing functions for updating tnums and ranges.

According to the BPF ISA [1], "Underflow and overflow are allowed during
arithmetic operations, meaning the 64-bit or 32-bit value will wrap".
Our patch leverages the wrap-around semantics of unsigned overflow and
underflow to improve precision.

Below is an example of our patch for scalar_min_max_add; the idea is
analogous for all four functions.

There are three cases to consider when adding two u64 ranges [dst_umin,
dst_umax] and [src_umin, src_umax]. Consider a value x in the range
[dst_umin, dst_umax] and another value y in the range [src_umin,
src_umax].

(a) No overflow: No addition x + y overflows. This occurs when even the
largest possible sum, i.e., dst_umax + src_umax <= U64_MAX.

(b) Partial overflow: Some additions x + y overflow. This occurs when
the largest possible sum overflows (dst_umax + src_umax > U64_MAX), but
the smallest possible sum does not overflow (dst_umin + src_umin <=
U64_MAX).

(c) Full overflow: All additions x + y overflow. This occurs when both
the smallest possible sum and the largest possible sum overflow, i.e.,
both (dst_umin + src_umin) and (dst_umax + src_umax) are > U64_MAX.

The current implementation conservatively sets the output bounds to
unbounded, i.e, [umin=0, umax=U64_MAX], whenever there is *any*
possibility of overflow, i.e, in cases (b) and (c). Otherwise it
computes tight bounds as [dst_umin + src_umin, dst_umax + src_umax]:

if (check_add_overflow(*dst_umin, src_reg->umin_value, dst_umin) ||
    check_add_overflow(*dst_umax, src_reg->umax_value, dst_umax)) {
	*dst_umin = 0;
	*dst_umax = U64_MAX;
}

Our synthesis-based technique discovered a more precise operator.
Particularly, in case (c), all possible additions x + y overflow and
wrap around according to eBPF semantics, and the computation of the
output range as [dst_umin + src_umin, dst_umax + src_umax] continues to
work. Only in case (b), do we need to set the output bounds to
unbounded, i.e., [0, U64_MAX].

Case (b) can be checked by seeing if the minimum possible sum does *not*
overflow and the maximum possible sum *does* overflow, and when that
happens, we set the output to unbounded:

min_overflow = check_add_overflow(*dst_umin, src_reg->umin_value, dst_umin);
max_overflow = check_add_overflow(*dst_umax, src_reg->umax_value, dst_umax);

if (!min_overflow && max_overflow) {
	*dst_umin = 0;
	*dst_umax = U64_MAX;
}

Below is an example eBPF program and the corresponding log from the
verifier.

The current implementation of scalar_min_max_add() sets r3's bounds to
[0, U64_MAX] at instruction 5: (0f) r3 += r3, due to conservative
overflow handling.

0: R1=ctx() R10=fp0
0: (b7) r4 = 0                        ; R4_w=0
1: (87) r4 = -r4                      ; R4_w=scalar()
2: (18) r3 = 0xa000000000000000       ; R3_w=0xa000000000000000
4: (4f) r3 |= r4                      ; R3_w=scalar(smin=0xa000000000000000,smax=-1,umin=0xa000000000000000,var_off=(0xa000000000000000; 0x5fffffffffffffff)) R4_w=scalar()
5: (0f) r3 += r3                      ; R3_w=scalar()
6: (b7) r0 = 1                        ; R0_w=1
7: (95) exit

With our patch, r3's bounds after instruction 5 are set to a much more
precise [0x4000000000000000,0xfffffffffffffffe].

...
5: (0f) r3 += r3                      ; R3_w=scalar(umin=0x4000000000000000,umax=0xfffffffffffffffe)
6: (b7) r0 = 1                        ; R0_w=1
7: (95) exit

The logic for scalar32_min_max_add is analogous. For the
scalar(32)_min_max_sub functions, the reasoning is similar but applied
to detecting underflow instead of overflow.

We verified the correctness of the new implementations using Agni [3,4].

We since also discovered that a similar technique has been used to
calculate output ranges for unsigned interval addition and subtraction
in Hacker's Delight [2].

[1] https://docs.kernel.org/bpf/standardization/instruction-set.html
[2] Hacker's Delight Ch.4-2, Propagating Bounds through Add’s and Subtract’s
[3] https://github.com/bpfverif/agni
[4] https://people.cs.rutgers.edu/~sn349/papers/sas24-preprint.pdf

Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250623040359.343235-2-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24 18:37:22 -07:00
Luis Gerhorst
3ce7cdde66 selftests/bpf: Support ppc64el in vmtest
With a rootfs built using libbpf's BPF CI [1], we can run specific tests
as follows:

$ ../libbpf-ci/rootfs/mkrootfs_debian.sh --arch ppc64el --distro noble
$ PLATFORM=ppc64el CROSS_COMPILE=powerpc64le-linux-gnu- \
    tools/testing/selftests/bpf/vmtest.sh \
    -l libbpf-vmtest-rootfs-*-noble-ppc64el.tar.zst \
    -- ./test_progs -t verifier_array_access

Does not include a DENYLIST or support for KVM for now.

[1] https://github.com/libbpf/ci

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Link: https://lore.kernel.org/r/20250619140854.2135283-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23 09:34:58 -07:00
Menglong Dong
c11f34e300 bpf: Make update_prog_stats() always_inline
The function update_prog_stats() will be called in the bpf trampoline.
In most cases, it will be optimized by the compiler by making it inline.
However, we can't rely on the compiler all the time, and just make it
__always_inline to reduce the possible overhead.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250621045501.101187-1-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23 09:21:07 -07:00
Christian Brauner
13b0cce9e2
Merge patch series "Introduce bpf_cgroup_read_xattr"
Song Liu <song@kernel.org> says:

Introduce a new kfunc bpf_cgroup_read_xattr, which can read xattr from
cgroupfs nodes. The primary users are LSMs, cgroup programs, and sched_ext.

* patches from https://lore.kernel.org/20250623063854.1896364-1-song@kernel.org:
  selftests/bpf: Add tests for bpf_cgroup_read_xattr
  bpf: Mark cgroup_subsys_state->cgroup RCU safe
  bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
  kernfs: remove iattr_mutex

Link: https://lore.kernel.org/20250623063854.1896364-1-song@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Song Liu
f4fba2d6d2
selftests/bpf: Add tests for bpf_cgroup_read_xattr
Add tests for different scenarios with bpf_cgroup_read_xattr:
1. Read cgroup xattr from bpf_cgroup_from_id;
2. Read cgroup xattr from bpf_cgroup_ancestor;
3. Read cgroup xattr from css_iter;
4. Use bpf_cgroup_read_xattr in LSM hook security_socket_connect.
5. Use bpf_cgroup_read_xattr in cgroup program.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-5-song@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Song Liu
1504d8c7c7
bpf: Mark cgroup_subsys_state->cgroup RCU safe
Mark struct cgroup_subsys_state->cgroup as safe under RCU read lock. This
will enable accessing css->cgroup from a bpf css iterator.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-4-song@kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Song Liu
535b070f4a
bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
BPF programs, such as LSM and sched_ext, would benefit from tags on
cgroups. One common practice to apply such tags is to set xattrs on
cgroupfs folders.

Introduce kfunc bpf_cgroup_read_xattr, which allows reading cgroup's
xattr.

Note that, we already have bpf_get_[file|dentry]_xattr. However, these
two APIs are not ideal for reading cgroupfs xattrs, because:

  1) These two APIs only works in sleepable contexts;
  2) There is no kfunc that matches current cgroup to cgroupfs dentry.

bpf_cgroup_read_xattr is generic and can be useful for many program
types. It is also safe, because it requires trusted or rcu protected
argument (KF_RCU). Therefore, we make it available to all program types.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-3-song@kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Christian Brauner
d1f4e90260
kernfs: remove iattr_mutex
All allocations of struct kernfs_iattrs are serialized through a global
mutex. Simply do a racy allocation and let the first one win. I bet most
callers are under inode->i_rwsem anyway and it wouldn't be needed but
let's not require that.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-2-song@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Yuan Chen
99fe8af069 bpftool: Fix memory leak in dump_xx_nlmsg on realloc failure
In function dump_xx_nlmsg(), when realloc() fails to allocate memory,
the original pointer to the buffer is overwritten with NULL. This causes
a memory leak because the previously allocated buffer becomes unreachable
without being freed.

Fixes: 7900efc192 ("tools/bpf: bpftool: improve output format for bpftool net")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250620012133.14819-1-chenyuan_fl@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-20 11:32:36 -07:00
Slava Imameev
f8b19aeca1 selftests/bpf: Add test for bpftool access to read-only protected maps
Add selftest cases that validate bpftool's expected behavior when
accessing maps protected from modification via security_bpf_map.

The test includes a BPF program attached to security_bpf_map with two maps:
- A protected map that only allows read-only access
- An unprotected map that allows full access

The test script attaches the BPF program to security_bpf_map and
verifies that for the bpftool map command:
- Read access works on both maps
- Write access fails on the protected map
- Write access succeeds on the unprotected map
- These behaviors remain consistent when the maps are pinned

Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250620151812.13952-2-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-20 11:13:03 -07:00
Slava Imameev
d32179e8c2 bpftool: Use appropriate permissions for map access
Modify several functions in tools/bpf/bpftool/common.c to allow
specification of requested access for file descriptors, such as
read-only access.

Update bpftool to request only read access for maps when write
access is not required. This fixes errors when reading from maps
that are protected from modification via security_bpf_map.

Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250620151812.13952-1-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-20 11:13:03 -07:00
Luis Gerhorst
e30329b8a6 powerpc/bpf: Fix warning for unused ori31_emitted
Without this, the compiler (clang21) might emit a warning under W=1
because the variable ori31_emitted is set but never used if
CONFIG_PPC_BOOK3S_64=n.

Without this patch:

$ make -j $(nproc) W=1 ARCH=powerpc SHELL=/bin/bash arch/powerpc/net
  [...]
  CC      arch/powerpc/net/bpf_jit_comp.o
  CC      arch/powerpc/net/bpf_jit_comp64.o
../arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body':
../arch/powerpc/net/bpf_jit_comp64.c:417:28: warning: variable 'ori31_emitted' set but not used [-Wunused-but-set-variable]
  417 |         bool sync_emitted, ori31_emitted;
      |                            ^~~~~~~~~~~~~
  AR      arch/powerpc/net/built-in.a

With this patch:

  [...]
  CC      arch/powerpc/net/bpf_jit_comp.o
  CC      arch/powerpc/net/bpf_jit_comp64.o
  AR      arch/powerpc/net/built-in.a

Fixes: dff883d9e9 ("bpf, arm64, powerpc: Change nospec to include v1 barrier")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506180402.uUXwVoSH-lkp@intel.com/
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20250619142647.2157017-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-19 09:51:15 -07:00
Eduard Zingerman
cd7312a78f selftests/bpf: include limits.h needed for PATH_MAX directly
Constant PATH_MAX is used in function unpriv_helpers.c:open_config().
This constant is provided by include file <limits.h>.
The dependency was added by commit [1], which does not include
<limits.h> directly, relying instead on <limits.h> being included from
zlib.h -> zconf.h.
As it turns out, this is not the case for all systems, e.g. on
Fedora 41 zlib 1.3.1 is used, and there <limits.h> is not included
from zconf.h. Hence, there is a compilation error on Fedora 41.

[1] commit fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")

Fixes: fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20250618093134.3078870-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 06:52:07 -07:00
James Bottomley
bd07bd12f2 bpf: Fix key serial argument of bpf_lookup_user_key()
The underlying lookup_user_key() function uses a signed 32 bit integer
for key serial numbers because legitimate serial numbers are positive
(and > 3) and keyrings are negative.  Using a u32 for the keyring in
the bpf function doesn't currently cause any conversion problems but
will start to trip the signed to unsigned conversion warnings when the
kernel enables them, so convert the argument to signed (and update the
tests accordingly) before it acquires more users.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/84cdb0775254d297d75e21f577089f64abdfbd28.camel@HansenPartnership.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 18:15:27 -07:00
Al Viro
f5527f0171 bpf: Get rid of redundant 3rd argument of prepare_seq_file()
Remove 3rd argument in prepare_seq_file() to clean up the code a bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250615004719.GE3011112@ZenIV
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 17:19:41 -07:00
Yuan Chen
85cd83fed8 bpftool: Fix JSON writer resource leak in version command
When using `bpftool --version -j/-p`, the JSON writer object
created in do_version() was not properly destroyed after use.
This caused a memory leak each time the version command was
executed with JSON output.

Fix: 004b45c0e5 (tools: bpftool: provide JSON output for all possible commands)

Suggested-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250617132442.9998-1-chenyuan_fl@163.com
2025-06-17 13:28:26 -07:00
Mykyta Yatsenko
66ab68c9de selftests/bpf: Fix unintentional switch case fall through
Break from switch expression after parsing -n CLI argument in veristat,
instead of falling through and enabling comparison mode.

Fixes: a5c57f81eb ("veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250617121536.1320074-1-mykyta.yatsenko5@gmail.com
2025-06-17 13:24:31 -07:00
Eduard Zingerman
fc2915bb8b selftests/bpf: More precise cpu_mitigations state detection
test_progs and test_verifier binaries execute unpriv tests under the
following conditions:
- unpriv BPF is enabled;
- CPU mitigations are enabled (see [1] for details).

The detection of the "mitigations enabled" state is performed by
unpriv_helpers.c:get_mitigations_off() via inspecting kernel boot
command line, looking for a parameter "mitigations=off".

Such detection scheme won't work for certain configurations,
e.g. when CONFIG_CPU_MITIGATIONS is disabled and boot parameter is
not supplied.

Miss-detection leads to test_progs executing tests meant to be run
only with mitigations enabled, e.g.
verifier_and.c:known_subreg_with_unknown_reg(), and reporting false
failures.

Internally, verifier sets bpf_verifier_env->bypass_spec_{v1,v4}
basing on the value returned by kernel/cpu.c:cpu_mitigations_off().
This function is backed by a variable kernel/cpu.c:cpu_mitigations.

This state is not fully introspect-able via sysfs. The closest proxy
is /sys/devices/system/cpu/vulnerabilities/spectre_v1, but it reports
"vulnerable" state only if mitigations are disabled *and* current cpu
is vulnerable, while verifier does not check cpu state.

There are only two ways the kernel/cpu.c:cpu_mitigations can be set:
- via boot parameter;
- via CONFIG_CPU_MITIGATIONS option.

This commit updates unpriv_helpers.c:get_mitigations_off() to scan
/boot/config-$(uname -r) and /proc/config.gz for
CONFIG_CPU_MITIGATIONS value in addition to boot command line check.

Tested using the following configurations:
- mitigations enabled (unpriv tests are enabled)
- mitigations disabled via boot cmdline (unpriv tests skipped)
- mitigations disabled via CONFIG_CPU_MITIGATIONS
  (unpriv tests skipped)

[1] https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com/

Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250617005710.1066165-2-eddyz87@gmail.com
2025-06-17 13:23:49 -07:00
Yonghong Song
a633dab4b4 selftests/bpf: Fix RELEASE build failure with gcc14
With gcc14, when building with RELEASE=1, I hit four below compilation
failure:

Error 1:
  In file included from test_loader.c:6:
  test_loader.c: In function ‘run_subtest’: test_progs.h:194:17:
      error: ‘retval’ may be used uninitialized in this function
   [-Werror=maybe-uninitialized]
    194 |                 fprintf(stdout, ##format);           \
        |                 ^~~~~~~
  test_loader.c:958:13: note: ‘retval’ was declared here
    958 |         int retval, err, i;
        |             ^~~~~~

  The uninitialized var 'retval' actually could cause incorrect result.

Error 2:
  In function ‘test_fd_array_cnt’:
  prog_tests/fd_array.c:71:14: error: ‘btf_id’ may be used uninitialized in this
      function [-Werror=maybe-uninitialized]
     71 |         fd = bpf_btf_get_fd_by_id(id);
        |              ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/fd_array.c:302:15: note: ‘btf_id’ was declared here
    302 |         __u32 btf_id;
        |               ^~~~~~

  Changing ASSERT_GE to ASSERT_EQ can fix the compilation error. Otherwise,
  there is no functionality change.

Error 3:
  prog_tests/tailcalls.c: In function ‘test_tailcall_hierarchy_count’:
  prog_tests/tailcalls.c:1402:23: error: ‘fentry_data_fd’ may be used uninitialized
      in this function [-Werror=maybe-uninitialized]
     1402 |                 err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
          |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  The code is correct. The change intends to silence gcc errors.

Error 4: (this error only happens on arm64)
  In file included from prog_tests/log_buf.c:4:
  prog_tests/log_buf.c: In function ‘bpf_prog_load_log_buf’:
  ./test_progs.h:390:22: error: ‘log_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
    390 |         int ___err = libbpf_get_error(___res);             \
        |                      ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/log_buf.c:158:14: note: in expansion of macro ‘ASSERT_OK_PTR’
    158 |         if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc"))
        |              ^~~~~~~~~~~~~
  In file included from selftests/bpf/tools/include/bpf/bpf.h:32,
                 from ./test_progs.h:36:
  selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17:
    note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
    113 | LIBBPF_API long libbpf_get_error(const void *ptr);
        |                 ^~~~~~~~~~~~~~~~

  Adding a pragma to disable maybe-uninitialized fixed the issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250617044956.2686668-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 10:18:30 -07:00
Chen-Yu Tsai
713d48878e clk: sunxi-ng: a523: Mark MBUS clock as critical
The MBUS serves as the main data bus for various DMA masters in the
system. If its clock is not enabled, the DMA operations will stall,
leading to the peripherals stalling or timing out. This has been
observed as USB or MMC hosts timing out waiting for transactions
when the clock is automatically disabled by the CCF due to it not
being used.

Mark the clock as critical so that it never gets disabled.

Fixes: 74b0443a0d ("clk: sunxi-ng: a523: add system mod clocks")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250607135029.2085140-1-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-06-14 23:59:45 +08:00
Luis Gerhorst
f66b4aaff2 bpf: Remove redundant free_verifier_state()/pop_stack()
This patch removes duplicated code.

Eduard points out [1]:

    Same cleanup cycles are done in push_stack() and push_async_cb(),
    both functions are only reachable from do_check_common() via
    do_check() -> do_check_insn().

    Hence, I think that cur state should not be freed in push_*()
    functions and pop_stack() loop there is not needed.

This would also fix the 'symptom' for [2], but the issue also has a
simpler fix which was sent separately. This fix also makes sure the
push_*() callers always return an error for which
error_recoverable_with_nospec(err) is false. This is required because
otherwise we try to recover and access the stale `state`.

Moving free_verifier_state() and pop_stack(..., pop_log=false) to happen
after the bpf_vlog_reset() call in do_check_common() is fine because the
pop_stack() call that is moved does not call bpf_vlog_reset() with the
pop_log=false parameter.

[1] https://lore.kernel.org/all/b6931bd0dd72327c55287862f821ca6c4c3eb69a.camel@gmail.com/
[2] https://lore.kernel.org/all/68497853.050a0220.33aa0e.036a.GAE@google.com/

Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/all/b6931bd0dd72327c55287862f821ca6c4c3eb69a.camel@gmail.com/
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Link: https://lore.kernel.org/r/20250613090157.568349-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-13 14:59:30 -07:00
Eduard Zingerman
4a4b84ba9e selftests/bpf: verify jset handling in CFG computation
A test case to check if both branches of jset are explored when
computing program CFG.

At 'if r1 & 0x7 ...':
- register 'r2' is computed alive only if jump branch of jset
  instruction is followed;
- register 'r0' is computed alive only if fallthrough branch of jset
  instruction is followed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250613175331.3238739-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-13 11:51:19 -07:00
Eduard Zingerman
3157f7e299 bpf: handle jset (if a & b ...) as a jump in CFG computation
BPF_JSET is a conditional jump and currently verifier.c:can_jump()
does not know about that. This can lead to incorrect live registers
and SCC computation.

E.g. in the following example:

   1: r0 = 1;
   2: r2 = 2;
   3: if r1 & 0x7 goto +1;
   4: exit;
   5: r0 = r2;
   6: exit;

W/o this fix insn_successors(3) will return only (4), a jump to (5)
would be missed and r2 won't be marked as alive at (3).

Fixes: 14c8552db6 ("bpf: simple DFA-based live registers analysis")
Reported-by: syzbot+a36aac327960ff474804@syzkaller.appspotmail.com
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250613175331.3238739-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-13 11:51:19 -07:00
Andrii Nakryiko
e4c8f96ade Merge branch 'veristat-memory-accounting-for-bpf-programs'
Eduard Zingerman says:

====================
veristat: memory accounting for bpf programs

When working on the verifier, it is sometimes interesting to know how a
particular change affects memory consumption. This patch-set modifies
veristat to provide such information. As a collateral, kernel needs an
update to make allocations reachable from BPF program load accountable
in memcg statistics.

Here is a sample output:

  Program          Peak states  Peak memory (MiB)
  ---------------  -----------  -----------------
  lavd_select_cpu         2153                 43
  lavd_enqueue            1982                 41
  lavd_dispatch           3480                 28

Technically, this is implemented by creating and entering a new cgroup
at the start of veristat execution. The difference between values from
cgroup "memory.peak" file before and after bpf_object__load() is used
as a metric.

To minimize measurements jitter data is collected in megabytes.

Changelog:
v2: https://lore.kernel.org/bpf/20250612130835.2478649-1-eddyz87@gmail.com/
v2 -> v3:
- bpf_verifier_state->jmp_history and
  bpf_verifier_env->explored_states allocations are switched from
  GFP_USER to GFP_KERNEL_ACCOUNT (Andrii, Alexei);
- veristat.c:STR macro removed, PATH_MAX-1 == 4095 is hard-coded in
  scanf format strings (Andrii);
- env->{orig,stat}_cgroup size changed to PATH_MAX (Andrii);
- snprintf_trunc() is removed, flag -Wno-format-truncation
  is added to CFLAGS for veristat.o when compiled with gcc;

v1: https://lore.kernel.org/bpf/20250605230609.1444980-1-eddyz87@gmail.com/
v1 -> v2:
- a single cgroup, created at the beginning of execution, is now used
  for measurements (Andrii, Mykyta);
- cgroup namespace is not created, as it turned out to be useless
  (Andrii);
- veristat no longer mounts cgroup fs or changes subtree_control,
  instead it looks for an existing mount point and reports an error if
  memory.peak file can't be opened (Andrii, Alexei);
- if 'mem_peak' statistics is not enabled, veristat skips cgroup
  setup;
- code sharing with cgroup_helpers.c was considered but was decided
  against to simplify veristat github sync.
====================

Link: https://patch.msgid.link/20250613072147.3938139-1-eddyz87@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-13 10:44:17 -07:00
Eduard Zingerman
67cdcc405b veristat: Memory accounting for bpf programs
This commit adds a new field mem_peak / "Peak memory (MiB)" field to a
set of gathered statistics. The field is intended as an estimate for
peak verifier memory consumption for processing of a given program.

Mechanically stat is collected as follows:
- At the beginning of handle_verif_mode() a new cgroup is created
  and veristat process is moved into this cgroup.
- At each program load:
  - bpf_object__load() is split into bpf_object__prepare() and
    bpf_object__load() to avoid accounting for memory allocated for
    maps;
  - before bpf_object__load():
    - a write to "memory.peak" file of the new cgroup is used to reset
      cgroup statistics;
    - updated value is read from "memory.peak" file and stashed;
  - after bpf_object__load() "memory.peak" is read again and
    difference between new and stashed values is used as a metric.

If any of the above steps fails veristat proceeds w/o collecting
mem_peak information for a program, reporting mem_peak as -1.

While memcg provides data in bytes (converted from pages), veristat
converts it to megabytes to avoid jitter when comparing results of
different executions.

The change has no measurable impact on veristat running time.

A correlation between "Peak states" and "Peak memory" fields provides
a sanity check for gathered statistics, e.g. a sample of data for
sched_ext programs:

Program                   Peak states  Peak memory (MiB)
------------------------  -----------  -----------------
lavd_select_cpu                  2153                 44
lavd_enqueue                     1982                 41
lavd_dispatch                    3480                 28
layered_dispatch                 1417                 17
layered_enqueue                   760                 11
lavd_cpu_offline                  349                  6
lavd_cpu_online                   349                  6
lavd_init                         394                  6
rusty_init                        350                  5
layered_select_cpu                391                  4
...
rusty_stopping                    134                  1
arena_topology_node_init          170                  0

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613072147.3938139-3-eddyz87@gmail.com
2025-06-13 10:44:12 -07:00
Eduard Zingerman
43736ec3e0 bpf: Include verifier memory allocations in memcg statistics
This commit adds __GFP_ACCOUNT flag to verifier induced memory
allocations. The intent is to account for all allocations reachable
from BPF_PROG_LOAD command, which is needed to track verifier memory
consumption in veristat. This includes allocations done in verifier.c,
and some allocations in btf.c, functions in log.c do not allocate.

There is also a utility function bpf_memcg_flags() which selectively
adds GFP_ACCOUNT flag depending on the `cgroup.memory=nobpf` option.
As far as I understand [1], the idea is to remove bpf_prog instances
and maps from memcg accounting as these objects do not strictly belong
to cgroup, hence it should not apply here.

(btf_parse_fields() is reachable from both program load and map
 creation, but allocated record is not persistent as is freed as soon
 as map_check_btf() exits).

[1] https://lore.kernel.org/all/20230210154734.4416-1-laoar.shao@gmail.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613072147.3938139-2-eddyz87@gmail.com
2025-06-13 10:29:45 -07:00
Song Liu
ccefa19335 bpf/veristat: Fix veristat for map type BPF_MAP_TYPE_CGRP_STORAGE
BPF_MAP_TYPE_CGRP_STORAGE doesn't allow non-zero max_entries. So veristat
should not set it to 1.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613050001.1058733-1-song@kernel.org
2025-06-13 10:08:22 -07:00
Ruslan Semchenko
af91af33c1 tools/bpf_jit_disasm: Fix potential negative tpath index in get_exec_path()
If readlink() fails, len will be -1, which can cause negative indexing
and undefined behavior. This patch ensures that len is set to 0 on
readlink failure, preventing such issues.

Signed-off-by: Ruslan Semchenko <uncleruc2075@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250612131816.1870-1-uncleruc2075@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:11:13 -07:00
Alexei Starovoitov
0e93df45c7 Merge branch 'bpf-fix-a-few-test-failures-with-64k-page-size'
Yonghong Song says:

====================
bpf: Fix a few test failures with 64K page size

Changelog:
  v2 -> v3:
    - v2: https://lore.kernel.org/bpf/20250611171519.2033193-1-yonghong.song@linux.dev/
    - Add additional comments for xdp_adjust_tail test.
    - Use actual kernel page size to set new_len for Patch 2 tests.
  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20250608165534.1019914-1-yonghong.song@linux.dev/
    - For xdp_adjust_tail, let kernel test_run can handle various page sizes for xdp progs.
    - For two change_tail tests, make code easier to understand.
    - Resolved a new test failure (xdp_do_redirect).
====================

Link: https://patch.msgid.link/20250612035027.2207299-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
44df9e0d4e selftests/bpf: Fix xdp_do_redirect failure with 64KB page size
On arm64 with 64KB page size, the selftest xdp_do_redirect failed like
below:
  ...
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec
  test_max_pkt_size:PASS:prog_run_max_size 0 nsec
  test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22

With 64KB page size, the xdp frame size will be much bigger so
the existing test will fail.

Adjust various parameters so the test can also work on 64K page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
96fcf7e7a7 selftests/bpf: Fix two net related test failures with 64K page size
When running BPF selftests on arm64 with a 64K page size, I encountered
the following two test failures:
  sockmap_basic/sockmap skb_verdict change tail:FAIL
  tc_change_tail:FAIL

With further debugging, I identified the root cause in the following
kernel code within __bpf_skb_change_tail():

    u32 max_len = BPF_SKB_MAX_LEN;
    u32 min_len = __bpf_skb_min_len(skb);
    int ret;

    if (unlikely(flags || new_len > max_len || new_len < min_len))
        return -EINVAL;

With a 4K page size, new_len = 65535 and max_len = 16064, the function
returns -EINVAL. However, With a 64K page size, max_len increases to
261824, allowing execution to proceed further in the function. This is
because BPF_SKB_MAX_LEN scales with the page size and larger page sizes
result in higher max_len values.

Updating the new_len parameter in both tests based on actual kernel
page size resolved both failures.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
4fc012daf9 bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
arm64 with 64KB page:
   xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL

In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
when constructing frags, with 64K page size, the frag data_len could
be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().

To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel
can test different page size properly. With the kernel change, the user
space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default
value is 17, so for 4K page, the maximum packet size will be less than 68K.
To test 64K page, a bigger maximum packet size than 68K is desired. So two
different functions are implemented for subtest xdp_adjust_frags_tail_grow.
Depending on different page size, different data input/output sizes are used
to adapt with different page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Song Liu
fa6932577c bpf: Initialize used but uninit variable in propagate_liveness()
With input changed == NULL, a local variable is used for "changed".
Initialize tmp properly, so that it can be used in the following:
   *changed |= err > 0;

Otherwise, UBSAN will complain:

UBSAN: invalid-load in kernel/bpf/verifier.c:18924:4
load of value <some random value> is not a valid value for type '_Bool'

Fixes: dfb2d4c64b ("bpf: set 'changed' status if propagate_liveness() did any updates")
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250612221100.2153401-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:53:40 -07:00
Yonghong Song
50034d9362 docs/bpf: Default cpu version changed from v1 to v3 in llvm 20
The default cpu version is changed from v1 to v3 in llvm version 20.
See [1] for more detailed reasoning. Update bpf_devel_QA.rst so
developers can find such information easily.

  [1] https://github.com/llvm/llvm-project/pull/107008

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612043049.2411989-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:44 -07:00
Fushuai Wang
6a4bd31f68 selftests/bpf: fix signedness bug in redir_partial()
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly
treats it as success due to unsigned promotion. Explicitly check for -1
first.

Fixes: a4b7193d8e ("selftests/bpf: Add sockmap test for redirecting partial skb data")
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Luis Gerhorst
3d71b8b9ab bpf: Fix state use-after-free on push_stack() err
Without this, `state->speculative` is used after the cleanup cycles in
push_stack() or push_async_cb() freed `env->cur_state` (i.e., `state`).
Avoid this by relying on the short-circuit logic to only access `state`
if the error is recoverable (and make sure it never is after push_*()
failed).

push_*() callers must always return an error for which
error_recoverable_with_nospec(err) is false if push_*() returns NULL,
otherwise we try to recover and access the stale `state`. This is only
violated by sanitize_ptr_alu(), thus also fix this case to return
-ENOMEM.

state->speculative does not make sense if the error path of push_*()
ran. In that case, `state->speculative &&
error_recoverable_with_nospec(err)` as a whole should already never
evaluate to true (because all cases where push_stack() fails must return
-ENOMEM/-EFAULT). As mentioned, this is only violated by the
push_stack() call in sanitize_speculative_path() which returns -EACCES
without [1] (through REASON_STACK in sanitize_err() after
sanitize_ptr_alu()). To fix this, return -ENOMEM for REASON_STACK (which
is also the behavior we will have after [1]).

Checked that it fixes the syzbot reproducer as expected.

[1] https://lore.kernel.org/all/20250603213232.339242-1-luis.gerhorst@fau.de/

Fixes: d6f1c85f22 ("bpf: Fall back to nospec for Spectre v1")
Reported-by: syzbot+b5eb72a560b8149a1885@syzkaller.appspotmail.com
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/all/38862a832b91382cddb083dddd92643bed0723b8.camel@gmail.com/
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611210728.266563-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Alexei Starovoitov
e3f6660b78 Merge branch 'bpf-propagate-read-precision-marks-over-state-graph-backedges'
Eduard Zingerman says:

====================
bpf: propagate read/precision marks over state graph backedges

Current loop_entry-based states comparison logic does not handle the
following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks. This issue was first reported in [1].

This patch-set
--------------

Here is the gist of the algorithm implemented by this patch-set:
- Compute strongly connected components (SCCs) in the program CFG.
- When a verifier state enters an SCC, that state is recorded as the
  SCC's entry point.
- When a verifier state is found to be equivalent to another
  (e.g., B to A in the example above), it is recorded as a
  states-graph backedge.
- Backedges are accumulated per SCC (*).
- When an SCC entry state reaches `branches == 0`, propagate read and
  precision marks through the backedges until a fixed point is reached
  (e.g., from A to B, from C to A, and then again from A to B).

(*) This is an oversimplification, see patch #8 for details.

Unfortunately, this means that commit [2] needs to be reverted,
as precision propagation requires access to jump history,
and backedges represent history not belonging to `env->cur_state`.

Details are provided in patch #8; a comment in `is_state_visited()`
explains most of the mechanics.

Patch #2 adds a `compute_scc()` function, which computes SCCs in the
program CFG. This function was tested using property-based testing in
[3], but it is not included in selftests.

Previous attempt
----------------

A previous attempt to fix this is described in [4]:
1. Within the states loop, `states_equal(... RANGE_WITHIN)` ignores
   read and precision marks.
2. For states outside the loop, all registers for states within the
   loop are marked as read and precise.

This approach led to an 86x regression on the `cond_break1` selftest.
In that test, one loop was followed by another, and a certain variable
was incremented in the second loop. This variable was marked as
precise due to rule (2), which hindered convergence in the first loop.

After some off-list discussion, it was decided that this might be a
typical case and such regressions are undesirable.

This patch-set avoids such eager precision markings.

Alternatives
------------

Another option is to associate a mask of read/written/precise stack
slots with each instruction. This mask can be populated during
verifier states exploration. Upon reaching an `EXIT` instruction or an
equivalent state, the accumulated masks can be used to propagate
read/written/precise bits across the program's control flow graph
using an analysis similar to use-def.

Unfortunately, a naive implementation of this approach [5] results in
a 10x regression in `veristat` for some `sched_ext` programs due to
the inability to express the must-write property. This issue requires
further investigation.

Changes in verification performance
-----------------------------------

There are some veristat regressions when comparing with master using
selftests and sched_ext BPF binaries. The comparison is done using
master from [6] and this patch-set from [7] where memory accounting
logic is added to veristat.

========= selftests: master vs patch-set =========

File                  Program                              Insns                           Peak memory (KiB)
--------------------- -----------------------------------  -----  -----  ----------------  ----  -----  ----------------
bpf_qdisc_fq.bpf.o    bpf_fq_dequeue                        1187   1645    +458 (+38.58%)   768   1240    +472 (+61.46%)
dynptr_success.bpf.o  test_copy_from_user_str_dynptr         208    279     +71 (+34.13%)   512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_copy_from_user_task_str_dynptr    205    263     +58 (+28.29%)   512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_probe_read_kernel_str_dynptr      686    857    +171 (+24.93%)   992   1724    +732 (+73.79%)
dynptr_success.bpf.o  test_probe_read_user_str_dynptr        689    860    +171 (+24.82%)  1016   1744    +728 (+71.65%)
iters.bpf.o           checkpoint_states_deletion            1211   1216       +5 (+0.41%)   512   1280   +768 (+150.00%)
pyperf600_iter.bpf.o  on_event                              2591   5929  +3338 (+128.83%)  4744  11176  +6432 (+135.58%)
verifier_gotol.bpf.o  gotol_large_imm                      40004  40004       +0 (+0.00%)  1024   1536    +512 (+50.00%)

Total progs: 3725
Old success: 2157
New success: 2157
total_insns diff min:    0.00%
total_insns diff max:  128.83%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 837,487
total_insns abs max new: 837,487
   0 .. 5    %: 3710
   5 .. 15   %: 6
  20 .. 30   %: 6
  30 .. 40   %: 2
 125 .. 130  %: 1

mem_peak diff min:  -27.78%
mem_peak diff max:  198.44%
mem_peak abs max old: 269,312 KiB
mem_peak abs max new: 269,312 KiB
 -30 .. -20  %: 1
  -5 .. 0    %: 18
   0 .. 5    %: 3568
   5 .. 15   %: 4
  15 .. 25   %: 3
  45 .. 55   %: 4
  60 .. 70   %: 1
  70 .. 80   %: 2
 100 .. 110  %: 3
 135 .. 145  %: 1
 150 .. 160  %: 1
 195 .. 200  %: 1

========= scx: master vs patch-set =========

Program                   Insns                          Peak memory (KiB)
------------------------  -----  -----  ---------------  -----  -----  -----------------
arena_topology_node_init   2133   2395   +262 (+12.28%)    768    768        +0 (+0.00%)
chaos_dispatch             2835   2868     +33 (+1.16%)   1972   1720     -252 (-12.78%)
chaos_init                 4324   5210   +886 (+20.49%)   2528   3028     +500 (+19.78%)
lavd_cpu_offline           5107   5726   +619 (+12.12%)   4188   6304    +2116 (+50.53%)
lavd_cpu_online            5107   5726   +619 (+12.12%)   4188   6304    +2116 (+50.53%)
lavd_dispatch             41775  47601  +5826 (+13.95%)   6196  29192  +22996 (+371.14%)
lavd_enqueue              20238  24188  +3950 (+19.52%)  22084  42156   +20072 (+90.89%)
lavd_init                  6974   7685   +711 (+10.20%)   5428   6928    +1500 (+27.63%)
lavd_select_cpu           22138  26088  +3950 (+17.84%)  24448  43688   +19240 (+78.70%)
layered_dispatch          17847  26581  +8734 (+48.94%)  11728  28740  +17012 (+145.05%)
layered_dump               1891   2098   +207 (+10.95%)   2036   3048    +1012 (+49.71%)
layered_runnable           2606   2634     +28 (+1.07%)    748   1240     +492 (+65.78%)
p2dq_init                  3691   4554   +863 (+23.38%)   2016   2528     +512 (+25.40%)
rusty_enqueue             28853  28853      +0 (+0.00%)   2072   1824     -248 (-11.97%)
rusty_init_task           31128  31128      +0 (+0.00%)   2176   2560     +384 (+17.65%)

Total progs: 148
Old success: 135
New success: 135
total_insns diff min:    0.00%
total_insns diff max:   48.94%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 41,775
total_insns abs max new: 47,601
   0 .. 5    %: 133
   5 .. 15   %: 7
  15 .. 25   %: 4
  35 .. 45   %: 3
  45 .. 50   %: 1

mem_peak diff min:  -12.78%
mem_peak diff max:  371.14%
mem_peak abs max old: 24,448 KiB
mem_peak abs max new: 43,688 KiB
 -15 .. -5   %: 2
  -5 .. 0    %: 2
   0 .. 5    %: 129
   5 .. 15   %: 1
  15 .. 25   %: 2
  25 .. 35   %: 2
  45 .. 55   %: 3
  65 .. 75   %: 1
  75 .. 85   %: 1
  90 .. 100  %: 1
 145 .. 155  %: 1
 195 .. 205  %: 1
 370 .. 375  %: 1

Changelog
---------

v1: https://lore.kernel.org/bpf/20250524191932.389444-1-eddyz87@gmail.com/
v1 -> v2:
- Rebase
- added mem_peak statistics (Alexei)
- selftests: fixed comments and removed useless r7 assignments (Yonghong)
v2: https://lore.kernel.org/bpf/20250606210352.1692944-1-eddyz87@gmail.com/
v2 -> v3:
- Rebase

Links
-----

[1] https://lore.kernel.org/bpf/20250312031344.3735498-1-eddyz87@gmail.com/
[2] commit 96a30e469c ("bpf: use common instruction history across all states")
[3] https://github.com/eddyz87/scc-test
[4] https://lore.kernel.org/bpf/20250426104634.744077-1-eddyz87@gmail.com/
[5] https://github.com/eddyz87/bpf/tree/propagate-read-and-precision-in-cfg
[6] https://github.com/eddyz87/bpf/tree/veristat-memory-accounting
[7] https://github.com/eddyz87/bpf/tree/scc-accumulate-backedges
====================

Link: https://patch.msgid.link/20250611200546.4120963-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
5159482fdb selftests/bpf: tests with a loop state missing read/precision mark
The test case absent_mark_in_the_middle_state is equivalent of the
following C program:

   1: r8 = bpf_get_prandom_u32();
   2: r6 = -32;
   3: bpf_iter_num_new(&fp[-8], 0, 10);
   4: if (unlikely(bpf_get_prandom_u32()))
   5:   r6 = -31;
   6: for (;;) {
   7:   if (!bpf_iter_num_next(&fp[-8]))
   8:     break;
   9:   if (unlikely(bpf_get_prandom_u32()))
  10:     *(u64 *)(fp + r6) = 7;
  11: }
  12: bpf_iter_num_destroy(&fp[-8]);
  13: return 0;

W/o a fix that instructs verifier to ignore branches count for loop
entries verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
     push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision propagated for r6 to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
      these marks are propagated to checkpoints A and B (but not C, as
      it is not the parent of current state;
- 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not
     marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
  stack access at offset -31.

The test case absent_mark_in_the_middle_state2 is similar except the
following change:

       r8 = bpf_get_prandom_u32();
       r6 = -32;
       bpf_iter_num_new(&fp[-8], 0, 10);
       if (unlikely(bpf_get_prandom_u32())) {
         r6 = -31;
 + jump_into_loop:
 +       goto +0;
 +       goto loop;
 +     }
 +     if (unlikely(bpf_get_prandom_u32()))
 +       goto jump_into_loop;
 + loop:
       for (;;) {
         if (!bpf_iter_num_next(&fp[-8]))
           break;
         if (unlikely(bpf_get_prandom_u32()))
           *(u64 *)(fp + r6) = 7;
       }
       bpf_iter_num_destroy(&fp[-8])
       return 0

The goal is to check that read/precision marks are propagated to
checkpoint created at 'goto +0' that resides outside of the loop.

The test case absent_mark_in_the_middle_state3 is a bit different and
is equivalent to the C program below:

    int absent_mark_in_the_middle_state3(void)
    {
      bpf_iter_num_new(&fp[-8], 0, 10)
      loop1(-32, &fp[-8])
      loop1_wrapper(&fp[-8])
      bpf_iter_num_destroy(&fp[-8])
    }

    int loop1(num, iter)
    {
      while (bpf_iter_num_next(iter)) {
        if (unlikely(bpf_get_prandom_u32()))
          *(fp + num) = 7;
      }
      return 0
    }

    int loop1_wrapper(iter)
    {
      r6 = -32;
      if (unlikely(bpf_get_prandom_u32()))
        r6 = -31;
      loop1(r6, iter);
      return 0;
    }

The unsafe state is reached in a similar manner, but the loop is
located inside a subprogram that is called from two locations in the
main subprogram. This detail is important for exercising
bpf_scc_visit->backedges memory management.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
0f54ff5470 bpf: include backedges in peak_states stat
Count states accumulated in bpf_scc_visit->backedges in
env->peak_states.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
0e0da5f901 bpf: remove {update,get}_loop_entry functions
The previous patch switched read and precision tracking for
iterator-based loops from state-graph-based loop tracking to
control-flow-graph-based loop tracking.

This patch removes the now-unused `update_loop_entry()` and
`get_loop_entry()` functions, which were part of the state-graph-based
logic.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
c9e31900b5 bpf: propagate read/precision marks over state graph backedges
Current loop_entry-based exact states comparison logic does not handle
the following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks.

This commit replaces the existing logic with the following:
- Strongly connected components (SCCs) are computed over the program's
  control flow graph (intraprocedurally).
- When a verifier state enters an SCC, that state is recorded as the
  SCC entry point.
- When a verifier state is found equivalent to another (e.g., B to A
  in the example), it is recorded as a states graph backedge.
  Backedges are accumulated per SCC.
- When an SCC entry state reaches `branches == 0`, read and precision
  marks are propagated through the backedges (e.g., from A to B, from
  C to A, and then again from A to B).

To support nested subprogram calls, the entry state and backedge list
are associated not with the SCC itself but with an object called
`bpf_scc_callchain`. A callchain is a tuple `(callsite*, scc_id)`,
where `callsite` is the index of a call instruction for each frame
except the last.

See the comments added in `is_state_visited()` and
`compute_scc_callchain()` for more details.

Fixes: 2a0992829e ("bpf: correct loop detection for iterators convergence")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
b5c677d8d9 bpf: move REG_LIVE_DONE check to clean_live_states()
The next patch would add some relatively heavy-weight operation to
clean_live_states(), this operation can be skipped if REG_LIVE_DONE
is set. Move the check from clean_verifier_state() to
clean_verifier_state() as a small refactoring commit.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
dfb2d4c64b bpf: set 'changed' status if propagate_liveness() did any updates
Add an out parameter to `propagate_liveness()` to record whether any
new liveness bits were set during its execution.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
23b37d6165 bpf: set 'changed' status if propagate_precision() did any updates
Add an out parameter to `propagate_precision()` to record whether any
new precision bits were set during its execution.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
9a2a0d7924 bpf: starting_state parameter for __mark_chain_precision()
Allow `mark_chain_precision()` to run from an arbitrary starting state
by replacing direct references to `env->cur_state` with a parameter.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:42 -07:00
Eduard Zingerman
13f843c017 bpf: frame_insn_idx() utility function
A function to return IP for a given frame in a call stack of a state.
Will be used by a next patch.

The `state->insn_idx = env->insn_idx;` assignment in the do_check()
allows to use frame_insn_idx with env->cur_state.
At the moment bpf_verifier_state->insn_idx is set when new cached
state is added in is_state_visited() and accessed only in the contexts
when the state is already in the cache. Hence this assignment does not
change verifier behaviour.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:42 -07:00
Eduard Zingerman
96c6aa4c63 bpf: compute SCCs in program control flow graph
Compute strongly connected components in the program CFG.
Assign an SCC number to each instruction, recorded in
env->insn_aux[*].scc. Use Tarjan's algorithm for SCC computation
adapted to run non-recursively.

For debug purposes print out computed SCCs as a part of full program
dump in compute_live_registers() at log level 2, e.g.:

  func#0 @0
  Live regs before insn:
        0: .......... (b4) w6 = 10
    2   1: ......6... (18) r1 = 0xffff88810bbb5565
    2   3: .1....6... (b4) w2 = 2
    2   4: .12...6... (85) call bpf_trace_printk#6
    2   5: ......6... (04) w6 += -1
    2   6: ......6... (56) if w6 != 0x0 goto pc-6
        7: .......... (b4) w6 = 5
    1   8: ......6... (18) r1 = 0xffff88810bbb5567
    1  10: .1....6... (b4) w2 = 2
    1  11: .12...6... (85) call bpf_trace_printk#6
    1  12: ......6... (04) w6 += -1
    1  13: ......6... (56) if w6 != 0x0 goto pc-6
       14: .......... (b4) w0 = 0
       15: 0......... (95) exit
   ^^^
  SCC number for the instruction

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:42 -07:00
Eduard Zingerman
baaebe0928 Revert "bpf: use common instruction history across all states"
This reverts commit 96a30e469c.
Next patches in the series modify propagate_precision() to allow
arbitrary starting state. Precision propagation requires access to
jump history, and arbitrary states represent history not belonging to
`env->cur_state`.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:42 -07:00
Leon Romanovsky
c0f21029f1 xfrm: always initialize offload path
Offload path is used for GRO with SW IPsec, and not just for HW
offload. So initialize it anyway.

Fixes: 585b64f5a6 ("xfrm: delay initialization of offload path till its actually requested")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Closes: https://lore.kernel.org/all/aEGW_5HfPqU1rFjl@krikkit
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-06-12 07:07:14 +02:00
Yonghong Song
517b088a84 selftests/bpf: Fix cgroup_mprog_ordering failure due to uninitialized variable
On arm64, the cgroup_mprog_ordering selftest failed with test_progs run
when building with clang compiler. The reason is due to socklen_t optlen
not initialized.

In kernel function do_ip_getsockopt(), we have

        if (copy_from_sockptr(&len, optlen, sizeof(int)))
                return -EFAULT;
        if (len < 0)
                return -EINVAL;

The above 'len' variable is a negative value and hence the test failed.

But the test is okay on x86_64. I checked the x86_64 asm code and I didn't
see explicit initialization of 'optlen' but its value is 0 so kernel
didn't return error. This should be a pure luck.

Fix the bug by initializing 'oplen' var properly.

Fixes: e422d5f118 ("selftests/bpf: Add two selftests for mprog API based cgroup progs")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250611162103.1623692-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-11 12:32:50 -07:00
Eslam Khafagy
c9b03a1100 bpf, doc: Improve wording of docs
The phrase "dividing -1" is one I find confusing.  E.g.,
"INT_MIN dividing -1" sounds like "-1 / INT_MIN" rather than the inverse.
"divided by" instead of "dividing" assuming the inverse is meant.

Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607222434.227890-1-eslam.medhat1993@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-11 12:30:41 -07:00
Tobias Klauser
2d72dd14d7 bpf: adjust path to trace_output sample eBPF program
The sample file was renamed from trace_output_kern.c to
trace_output.bpf.c in commit d4fffba4d0 ("samples/bpf: Change _kern
suffix to .bpf with syscall tracing program"). Adjust the path in the
documentation comment for bpf_perf_event_output.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20250610140756.16332-1-tklauser@distanz.ch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-10 08:10:21 -07:00
Steffen Klassert
766f6a784b Merge branch 'xfrm: fixes for xfrm_state_find under preemption'
Sabrina Dubroca says:

====================
While looking at the pcpu_id changes, I found two issues that can
happen if we get preempted and the cpu_id changes. The second patch
takes care of both problems. The first patch also makes sure we don't
use state_ptrs uninitialized, which could currently happen. syzbot
seems to have hit this issue [1].

[1] https://syzkaller.appspot.com/bug?extid=7ed9d47e15e88581dc5b
====================

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-06-10 09:51:54 +02:00
Alexei Starovoitov
5fcf896efe Merge branch 'bpf-mitigate-spectre-v1-using-barriers'
Luis Gerhorst says:

====================
This improves the expressiveness of unprivileged BPF by inserting
speculation barriers instead of rejecting the programs.

The approach was previously presented at LPC'24 [1] and RAID'24 [2].

To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects
potentially-dangerous unprivileged BPF programs as of
commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted
branches"). In [2], we have analyzed 364 object files from open source
projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf
Examples, Parca, and Prevail) and found that this affects 31% to 54% of
programs.

To resolve this in the majority of cases this patchset adds a fall-back
for mitigating Spectre v1 using speculation barriers. The kernel still
optimistically attempts to verify all speculative paths but uses
speculation barriers against v1 when unsafe behavior is detected. This
allows for more programs to be accepted without disabling the BPF
Spectre mitigations (e.g., by setting cpu_mitigations_off()).

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct (not only loads). See patch 7 ("bpf: Fall back to nospec for
Spectre v1") for a detailed description and references to the relevant
vendor documentation (AMD and Intel x86-64, ARM64, and PowerPC).

In [1] we have measured the overhead of this approach relative to having
mitigations off and including the upstream Spectre v4 mitigations. For
event tracing and stack-sampling profilers, we found that mitigations
increase BPF program execution time by 0% to 62%. For the Loxilb network
load balancer, we have measured a 14% slowdown in SCTP performance but
no significant slowdown for TCP. This overhead only applies to programs
that were previously rejected.

I reran the expressiveness-evaluation with v6.14 and made sure the main
results still match those from [1] and [2] (which used v6.5).

Main design decisions are:

* Do not use separate bytecode insns for v1 and v4 barriers (inspired by
  Daniel Borkmann's question at LPC). This simplifies the verifier
  significantly and has the only downside that performance on PowerPC is
  not as high as it could be.

* Allow archs to still disable v1/v4 mitigations separately by setting
  bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can
  benefit from improved BPF expressiveness / performance if they are not
  vulnerable (e.g., ARM64 for v4 in the kernel).

* Do not remove the empty BPF_NOSPEC implementation for backends for
  which it is unknown whether they are vulnerable to Spectre v1.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")

Changes:

* v3 -> v4:
  - Remove insn parameter from do_check_insn() and extract
    process_bpf_exit_full as a function as requested by Eduard
  - Investigate apparent sanitize_check_bounds() bug reported by
    Kartikeya (does appear to not be a bug but only confusing code),
    sent separate patch to document it and add an assert
  - Remove already-merged commit 1 ("selftests/bpf: Fix caps for
    __xlated/jited_unpriv")
  - Drop former commit 10 ("bpf: Allow nospec-protected var-offset stack
    access") as it did not include a test and there are other places
    where var-off is rejected. Also, none of the tested real-world
    programs used var-off in the paper. Therefore keep the old behavior
    for now and potentially prepare a patch that converts all cases
    later if required.
  - Add link to AMD lfence and PowerPC speculation barrier (ori 31,31,0)
    documentation
  - Move detailed barrier documentation to commit 7 ("bpf: Fall back to
    nospec for Spectre v1")
  - Link to v3: https://lore.kernel.org/all/20250501073603.1402960-1-luis.gerhorst@fau.de/

* v2 -> v3:
  - Fix
    https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/
    and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out
    of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to
    filter.h (where similar bpf_jit_*() prototypes live) as they would
    still have to be duplicated in bpf.h to be usable to
    bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an
    option).
  - Fix
    https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/
    by moving the variable declarations out of the switch-case.
  - Build touched C files with W=2 and bpf config on x86 to check that
    there are no other warnings introduced.
  - Found 3 more checkpatch warnings that can be fixed without degrading
    readability.
  - Rebase to bpf-next 2025-05-01
  - Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/

* v1 -> v2:
  - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11
    ("bpf: Fall back to nospec for spec path verification") as suggested
    by Alexei. This series therefore no longer changes push_stack() to
    return PTR_ERR.
  - Add detailed explanation of how lfence works internally and how it
    affects the algorithm.
  - Add tests checking that nospec instructions are inserted in expected
    locations using __xlated_unpriv as suggested by Eduard (also,
    include a fix for __xlated_unpriv)
  - Add a test for the mitigations from the description of
    commit 9183671af6db ("bpf: Fix leakage under speculation on
    mispredicted branches")
  - Remove unused variables from do_check[_insn]() as suggested by
    Eduard.
  - Remove INSN_IDX_MODIFIED to improve readability as suggested by
    Eduard. This also causes the nospec_result-check to run (and fail)
    for jumping-ops. Add a warning to assert that this check must never
    succeed in that case.
  - Add details on the safety of patch 10 ("bpf: Allow nospec-protected
    var-offset stack access") based on the feedback on v1.
  - Rebase to bpf-next-250420
  - Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/

* RFC -> v1:
  - rebase to bpf-next-250313
  - tests: mark expected successes/new errors
  - add bpt_jit_bypass_spec_v1/v4() to avoid #ifdef in
    bpf_bypass_spec_v1/v4()
  - ensure that nospec with v1-support is implemented for archs for
    which GCC supports speculation barriers, except for MIPS
  - arm64: emit speculation barrier
  - powerpc: change nospec to include v1 barrier
  - discuss potential security (archs that do not impl. BPF nospec) and
    performance (only PowerPC) regressions
  - Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/
====================

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://patch.msgid.link/20250603205800.334980-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 22:17:39 -07:00
Luis Gerhorst
4a8765d9a5 selftests/bpf: Add test for Spectre v1 mitigation
This is based on the gadget from the description of commit 9183671af6db
("bpf: Fix leakage under speculation on mispredicted branches").

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250603212814.338867-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Luis Gerhorst
d6f1c85f22 bpf: Fall back to nospec for Spectre v1
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].

If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.

A minimal example program would look as follows:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:

	A = true
	B = true
	if A goto e
	nospec
	f()
	if B goto e
	unsafe()
e:	exit

Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.

In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct:

* On Intel x86_64, lfence acts as full speculation barrier, not only as
  a load fence [3]:

    An LFENCE instruction or a serializing instruction will ensure that
    no later instructions execute, even speculatively, until all prior
    instructions complete locally. [...] Inserting an LFENCE instruction
    after a bounds check prevents later operations from executing before
    the bound check completes.

  This was experimentally confirmed in [4].

* On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR
  C001_1029[1] to be set if the MSR is supported, this happens in
  init_amd()). AMD further specifies "A dispatch serializing instruction
  forces the processor to retire the serializing instruction and all
  previous instructions before the next instruction is executed" [8]. As
  dispatch is not specific to memory loads or branches, lfence therefore
  also affects all instructions there. Also, if retiring a branch means
  it's PC change becomes architectural (should be), this means any
  "wrong" speculation is aborted as required for this series.

* ARM's SB speculation barrier instruction also affects "any instruction
  that appears later in the program order than the barrier" [6].

* PowerPC's barrier also affects all subsequent instructions [7]:

    [...] executing an ori R31,R31,0 instruction ensures that all
    instructions preceding the ori R31,R31,0 instruction have completed
    before the ori R31,R31,0 instruction completes, and that no
    subsequent instructions are initiated, even out-of-order, until
    after the ori R31,R31,0 instruction completes. The ori R31,R31,0
    instruction may complete before storage accesses associated with
    instructions preceding the ori R31,R31,0 instruction have been
    performed

Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.

This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's
security to a limited extent:

* The regression is limited to systems vulnerable to Spectre v1, have
  unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The
  latter is not the case for x86 64- and 32-bit, arm64, and powerpc
  64-bit and they are therefore not affected by the regression.
  According to commit a6f6a95f25 ("LoongArch, bpf: Fix jit to skip
  speculation barrier opcode"), LoongArch is not vulnerable to Spectre
  v1 and therefore also not affected by the regression.

* To the best of my knowledge this regression may therefore only affect
  MIPS. This is deemed acceptable because unpriv BPF is still disabled
  there by default. As stated in a previous commit, BPF_NOSPEC could be
  implemented for MIPS based on GCC's speculation_barrier
  implementation.

* It is unclear which other architectures (besides x86 64- and 32-bit,
  ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
  are vulnerable to Spectre v1. Also, it is not clear if barriers are
  available on these architectures. Implementing BPF_NOSPEC on these
  architectures therefore is non-trivial. Searching GCC and the kernel
  for speculation barrier implementations for these architectures
  yielded no result.

* If any of those regressed systems is also vulnerable to Spectre v4,
  the system was already vulnerable to Spectre v4 attacks based on
  unpriv BPF before this patch and the impact is therefore further
  limited.

As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.

In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).

Define SPEC_V1 to avoid duplicating this ifdef whenever we check for
nospec insns using __xlated_unpriv, define it here once. This also
improves readability. PowerPC can probably also be added here. However,
omit it for now because the BPF CI currently does not include a test.

Limit it to EPERM, EACCES, and EINVAL (and not everything except for
EFAULT and ENOMEM) as it already has the desired effect for most
real-world programs. Briefly went through all the occurrences of EPERM,
EINVAL, and EACCESS in verifier.c to validate that catching them like
this makes sense.

Thanks to Dustin for their help in checking the vendor documentation.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
    tool to analyze speculative execution attacks and mitigations" -
    Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf
    ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD
    PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier-
    ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set
    Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf
    ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book
    III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
    ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08
    - April 2024 - 7.6.4 Serializing Instructions")

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Luis Gerhorst
9124a45080 bpf: Rename sanitize_stack_spill to nospec_result
This is made to clarify that this flag will cause a nospec to be added
after this insn and can therefore be relied upon to reduce speculative
path analysis.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212024.338154-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Luis Gerhorst
dff883d9e9 bpf, arm64, powerpc: Change nospec to include v1 barrier
This changes the semantics of BPF_NOSPEC (previously a v4-only barrier)
to always emit a speculation barrier that works against both Spectre v1
AND v4. If mitigation is not needed on an architecture, the backend
should set bpf_jit_bypass_spec_v4/v1().

As of now, this commit only has the user-visible implication that unpriv
BPF's performance on PowerPC is reduced. This is the case because we
have to emit additional v1 barrier instructions for BPF_NOSPEC now.

This commit is required for a future commit to allow us to rely on
BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature
that nospec acts as a v1 barrier is unused.

Commit f5e81d1117 ("bpf: Introduce BPF nospec instruction for
mitigating Spectre v4") noted that mitigation instructions for v1 and v4
might be different on some archs. While this would potentially offer
improved performance on PowerPC, it was dismissed after the following
considerations:

* Only having one barrier simplifies the verifier and allows us to
  easily rely on v4-induced barriers for reducing the complexity of
  v1-induced speculative path verification.

* For the architectures that implemented BPF_NOSPEC, only PowerPC has
  distinct instructions for v1 and v4. Even there, some insns may be
  shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and
  'sync'). If this is still found to impact performance in an
  unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and
  BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4
  insns from being emitted for PowerPC with this setup if
  bypass_spec_v1/v4 is set.

Vulnerability-status for BPF_NOSPEC-based Spectre mitigations (v4 as of
this commit, v1 in the future) is therefore:

* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This
  patch implements BPF_NOSPEC for these architectures. The previous
  v4-only version was supported since commit f5e81d1117 ("bpf:
  Introduce BPF nospec instruction for mitigating Spectre v4") and
  commit b7540d6250 ("powerpc/bpf: Emit stf barrier instruction
  sequences for BPF_NOSPEC").

* LoongArch: Not Vulnerable - Commit a6f6a95f25 ("LoongArch, bpf: Fix
  jit to skip speculation barrier opcode") is the only other past commit
  related to BPF_NOSPEC and indicates that the insn is not required
  there.

* MIPS: Vulnerable (if unprivileged BPF is enabled) -
  Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
  barrier opcode") indicates that it is not vulnerable, but this
  contradicts the kernel and Debian documentation. Therefore, I assume
  that there exist vulnerable MIPS CPUs (but maybe not from Loongson?).
  In the future, BPF_NOSPEC could be implemented for MIPS based on the
  GCC speculation_barrier [1]. For now, we rely on unprivileged BPF
  being disabled by default.

* Other: Unknown - To the best of my knowledge there is no definitive
  information available that indicates that any other arch is
  vulnerable. They are therefore left untouched (BPF_NOSPEC is not
  implemented, but bypass_spec_v1/v4 is also not set).

I did the following testing to ensure the insn encoding is correct:

* ARM64:
  * 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
  * 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is
    executed for example with './test_progs -t verifier_array_access')

* PowerPC: The following configs were tested locally with ppc64le QEMU
  v8.2 '-machine pseries -cpu POWER9':
  * STF_BARRIER_EIEIO + CONFIG_PPC_BOOK32_64
  * STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK32_64
  * STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK32_64
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)
  Most of those cobinations should not occur in practice, but I was not
  able to get an PPC e6500 rootfs (for testing PPC_E500 without forcing
  it on). In any case, this should ensure that there are no unexpected
  conflicts between the insns when combined like this. Individual v1/v4
  barriers were already emitted elsewhere.

Hari's ack is for the PowerPC changes only.

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf
    ("MIPS: Add speculation_barrier support")
[2] https://github.com/kernel-patches/bpf/pull/8576

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211703.337860-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:09 -07:00
Luis Gerhorst
03c68a0f8c bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability. For v4, this
will reduce the number of barriers the verifier inserts. For v1, it
allows more programs to be accepted.

The primary motivation for this is to not regress unpriv BPF's
performance on ARM64 in a future commit where BPF_NOSPEC is also used
against Spectre v1.

This has the user-visible change that v1-induced rejections on
non-vulnerable PowerPC CPUs are avoided.

For now, this does not change the semantics of BPF_NOSPEC. It is still a
v4-only barrier and must not be implemented if bypass_spec_v4 is always
true for the arch. Changing it to a v1 AND v4-barrier is done in a
future commit.

As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1
AND NOSPEC_V4 instructions and allow backends to skip their lowering as
suggested by commit f5e81d1117 ("bpf: Introduce BPF nospec instruction
for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was
found to be preferable for the following reason:

* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the
  same analysis (not taking into account whether the current CPU is
  vulnerable), needlessly restricts users of CPUs that are not
  vulnerable. The only use case for this would be portability-testing,
  but this can later be added easily when needed by allowing users to
  force bypass_spec_v1/v4 to false.

* Portability is still acceptable: Directly disabling the analysis
  instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow
  programs on non-vulnerable CPUs to be accepted while the program will
  be rejected on vulnerable CPUs. With the fallback to speculation
  barriers for Spectre v1 implemented in a future commit, this will only
  affect programs that do variable stack-accesses or are very complex.

For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based
on the check that was previously located in the BPF_NOSPEC case.

For LoongArch, it would likely be safe to set both
bpf_jit_bypass_spec_v1() and _v4() according to
commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode"). This is omitted here as I am unable to do any testing
for LoongArch.

Hari's ack concerns the PowerPC part only.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211318.337474-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:09 -07:00
Luis Gerhorst
6b84d7895d bpf: Return -EFAULT on internal errors
This prevents us from trying to recover from these on speculative paths
in the future.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-4-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:09 -07:00
Luis Gerhorst
fd508bde5d bpf: Return -EFAULT on misconfigurations
Mark these cases as non-recoverable to later prevent them from being
caught when they occur during speculative path verification.

Eduard writes [1]:

  The only pace I'm aware of that might act upon specific error code
  from verifier syscall is libbpf. Looking through libbpf code, it seems
  that this change does not interfere with libbpf.

[1] https://lore.kernel.org/all/785b4531ce3b44a84059a4feb4ba458c68fce719.camel@gmail.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:09 -07:00
Luis Gerhorst
8b7df50fd4 bpf: Move insn if/else into do_check_insn()
This is required to catch the errors later and fall back to a nospec if
on a speculative path.

Eliminate the regs variable as it is only used once and insn_idx is not
modified in-between the definition and usage.

Do not pass insn but compute it in the function itself. As Eduard points
out [1], insn is assumed to correspond to env->insn_idx in many places
(e.g, __check_reg_arg()).

Move code into do_check_insn(), replace
* "continue" with "return 0" after modifying insn_idx
* "goto process_bpf_exit" with "return PROCESS_BPF_EXIT"
* "goto process_bpf_exit_full" with "return process_bpf_exit_full()"
* "do_print_state = " with "*do_print_state = "

[1] https://lore.kernel.org/all/293dbe3950a782b8eb3b87b71d7a967e120191fd.camel@gmail.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:09 -07:00
Tao Chen
2bc0575fec bpf: Add cookie in fdinfo for raw_tp
Add cookie in fdinfo for raw_tp, the info as follows:

link_type:	raw_tracepoint
link_id:	31
prog_tag:	9dfdf8ef453843bf
prog_id:	32
tp_name:	sys_enter
cookie:	23925373020405760

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-5-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Tao Chen
380cb6dfa2 bpf: Add cookie in fdinfo for tracing
Add cookie in fdinfo for tracing, the info as follows:

link_type:	tracing
link_id:	6
prog_tag:	9dfdf8ef453843bf
prog_id:	35
attach_type:	25
target_obj_id:	1
target_btf_id:	60355
cookie:	9007199254740992

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-4-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Tao Chen
ad954cbe08 bpftool: Display cookie for tracing link probe
Display cookie for tracing link probe, in plain mode:

 #bpftool link
5: tracing  prog 34
	prog_type tracing  attach_type trace_fentry
	target_obj_id 1  target_btf_id 60355
	cookie 4503599627370496
	pids test_progs(176)

And in json mode:

 #bpftool link -j | jq
{
    "id": 5,
    "type": "tracing",
    "prog_id": 34,
    "prog_type": "tracing",
    "attach_type": "trace_fentry",
    "target_obj_id": 1,
    "target_btf_id": 60355,
    "cookie": 4503599627370496,
    "pids": [
      {
        "pid": 176,
        "comm": "test_progs"
      }
    ]
 }

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-3-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Tao Chen
d77efc0ef5 selftests/bpf: Add cookies check for tracing fill_link_info test
Adding tests for getting cookie with fill_link_info for tracing.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Tao Chen
c7beb48344 bpf: Add cookie to tracing bpf_link_info
bpf_tramp_link includes cookie info, we can add it in bpf_link_info.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Andrii Nakryiko
f3effef2e8 Merge branch 'bpf-make-reg_not_null-true-for-const_ptr_to_map'
Ihor Solodrai says:

====================
bpf: make reg_not_null() true for CONST_PTR_TO_MAP

Handle CONST_PTR_TO_MAP null checks in the BPF verifier. Add
appropriate test cases.

v3->v4: more test cases
v2->v3: change constant in unpriv test
v1->v2: add a test case with ringbufs

v3: https://lore.kernel.org/bpf/20250604222729.3351946-1-isolodrai@meta.com/
v2: https://lore.kernel.org/bpf/20250604003759.1020745-1-isolodrai@meta.com/
v1: https://lore.kernel.org/bpf/20250523232503.1086319-1-isolodrai@meta.com/
====================

Link: https://patch.msgid.link/20250609183024.359974-1-isolodrai@meta.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-09 16:42:05 -07:00
Ihor Solodrai
260b862918 selftests/bpf: Add test cases with CONST_PTR_TO_MAP null checks
A test requires the following to happen:
  * CONST_PTR_TO_MAP value is checked for null
  * the code in the null branch fails verification

Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms
  map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]

[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
eb6c992784 selftests/bpf: Add cmp_map_pointer_with_const test
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A
BPF program with this code must not pass verification in unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
5534e58f2e bpf: Make reg_not_null() true for CONST_PTR_TO_MAP
When reg->type is CONST_PTR_TO_MAP, it can not be null. However the
verifier explores the branches under rX == 0 in check_cond_jmp_op()
even if reg->type is CONST_PTR_TO_MAP, because it was not checked for
in reg_not_null().

Fix this by adding CONST_PTR_TO_MAP to the set of types that are
considered non nullable in reg_not_null().

An old "unpriv: cmp map pointer with zero" selftest fails with this
change, because now early out correctly triggers in
check_cond_jmp_op(), making the verification to pass.

In practice verifier may allow pointer to null comparison in unpriv,
since in many cases the relevant branch and comparison op are removed
as dead code. So change the expected test result to __success_unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Tao Chen
97ebac5886 bpf: Add show_fdinfo for perf_event
After commit 1b715e1b0e ("bpf: Support ->fill_link_info for perf_event") add
perf_event info, we can also show the info with the method of cat /proc/[fd]/fdinfo.

kprobe fdinfo:
link_type:	perf
link_id:	10
prog_tag:	bcf7977d3b93787c
prog_id:	20
name:	bpf_fentry_test1
offset:	0x0
missed:	0
addr:	0xffffffffa28a2904
event_type:	kprobe
cookie:	3735928559

uprobe fdinfo:
link_type:	perf
link_id:	13
prog_tag:	bcf7977d3b93787c
prog_id:	21
name:	/proc/self/exe
offset:	0x63dce4
ref_ctr_offset:	0x33eee2a
event_type:	uprobe
cookie:	3735928559

tracepoint fdinfo:
link_type:	perf
link_id:	11
prog_tag:	bcf7977d3b93787c
prog_id:	22
tp_name:	sched_switch
event_type:	tracepoint
cookie:	3735928559

perf_event fdinfo:
link_type:	perf
link_id:	12
prog_tag:	bcf7977d3b93787c
prog_id:	23
type:	1
config:	2
event_type:	event
cookie:	3735928559

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606150258.3385166-1-chen.dylane@linux.dev
2025-06-09 16:40:13 -07:00
Andrii Nakryiko
4d2815a1cc Merge branch 'bpf-implement-mprog-api-on-top-of-existing-cgroup-progs'
Yonghong Song says:

====================
bpf: Implement mprog API on top of existing cgroup progs

Current cgroup prog ordering is appending at attachment time. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. For example, in Meta, we have a case where three different
applications all have cgroup/setsockopt progs and they require specific
ordering. Current approach is to use a bpfchainer where one bpf prog
contains multiple global functions and each global function can be
freplaced by a prog for a specific application. The ordering of global
functions decides the ordering of those application specific bpf progs.
Using bpfchainer is a centralized approach and is not desirable as
one of applications acts as a daemon. The decentralized attachment
approach is more favorable for those applications.

To address this, the existing mprog API ([2]) seems an ideal solution with
supporting BPF_F_BEFORE and BPF_F_AFTER flags on top of existing cgroup
bpf implementation. More specifically, the support is added for prog/link
attachment with BPF_F_BEFORE and BPF_F_AFTER. The kernel mprog
interface ([2]) is not used and the implementation is directly done in
cgroup bpf code base. The mprog 'revision' is also implemented in
attach/detach/replace, so users can query revision number to check the
change of cgroup prog list.

The patch set contains 5 patches. Patch 1 adds revision support for
cgroup bpf progs. Patch 2 implements mprog API implementation for
prog/link attach and revision update. Patch 3 adds a new libbpf
API to do cgroup link attach with flags like BPF_F_BEFORE/BPF_F_AFTER.
Patches 4 and 5 add two tests to validate the implementation.

  [1] https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
  [2] https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net

Changelogs:
  v4 -> v5:
    - v4: https://lore.kernel.org/bpf/20250530173812.1823479-1-yonghong.song@linux.dev/
    - Remove early prog/link checking based flags and id_or_fd as later code
      will do checking as well.
    - Do proper cgroup flag checking for bpf_prog_attach().
  v3 -> v4:
    - v3: https://lore.kernel.org/bpf/20250517162720.4077882-1-yonghong.song@linux.dev/
    - Refactor some to make BPF_F_BEFORE/BPF_F_AFTER handling easier to understand.
    - Perviously, I degraded 'link' to 'prog' for later mprog handling. This is
      not correct. Similar to mprog.c, we should be check 'link' instead link->prog
      since it is possible two different links may have the same underlying prog and
      we do not want to miss supporting such use case.
  v2 -> v3:
    - v2: https://lore.kernel.org/bpf/20250508223524.487875-1-yonghong.song@linux.dev/
    - Big change to replace get_anchor_prog() to get_prog_list() so the
      'struct bpf_prog_list *' is returned directly.
    - Support 'BPF_F_BEFORE | BPF_F_AFTER' attachment if the prog list is empty
      and flags do not have 'BPF_F_LINK | BPF_F_ID' and id_or_fd is 0.
    - Add BPF_F_LINK support.
    - Patch 4 is added to reuse id_from_prog_fd() and id_from_link_fd().
  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20250411011523.1838771-1-yonghong.song@linux.dev/
    - Change cgroup_bpf.revisions from atomic64_t to u64.
    - Added missing bpf_prog_put in various places.
    - Rename get_cmp_prog() to get_anchor_prog(). The implementation tries to
      find the anchor prog regardless of whether id_or_fd is non-NULL or not.
    - Rename bpf_cgroup_prog_attached() to is_cgroup_prog_type() and handle
      BPF_PROG_TYPE_LSM properly (with BPF_LSM_CGROUP attach type).
    - I kept 'id || id_or_fd' condition as the condition 'id' is also used
      in mprog.c so I assume it is okay in cgroup.c as well.
====================

Link: https://patch.msgid.link/20250606163131.2428225-1-yonghong.song@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-09 16:28:31 -07:00
Yonghong Song
e422d5f118 selftests/bpf: Add two selftests for mprog API based cgroup progs
Two tests are added:
  - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
    attach are tested. Some negative tests are also included.
  - cgroup_mprog_ordering, which actually runs the program with some mprog
    API flags.

  [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
2025-06-09 16:28:31 -07:00
Yonghong Song
c1bb68656b selftests/bpf: Move some tc_helpers.h functions to test_progs.h
Move static inline functions id_from_prog_fd() and id_from_link_fd()
from prog_tests/tc_helpers.h to test_progs.h so these two functions
can be reused for later cgroup mprog selftests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
2025-06-09 16:28:30 -07:00
Yonghong Song
1d6711667c libbpf: Support link-based cgroup attach with options
Currently libbpf supports bpf_program__attach_cgroup() with signature:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);

To support mprog style attachment, additionsl fields like flags,
relative_{fd,id} and expected_revision are needed.

Add a new API:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
                                  const struct bpf_cgroup_opts *opts);
where bpf_cgroup_opts contains all above needed fields.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev
2025-06-09 16:28:30 -07:00
Yonghong Song
1209339844 bpf: Implement mprog API on top of existing cgroup progs
Current cgroup prog ordering is appending at attachment time. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. To address this, the existing mprog API seems an ideal solution with
supporting BPF_F_BEFORE and BPF_F_AFTER flags.

But there are a few obstacles to directly use kernel mprog interface.
Currently cgroup bpf progs already support prog attach/detach/replace
and link-based attach/detach/replace. For example, in struct
bpf_prog_array_item, the cgroup_storage field needs to be together
with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
as the member, which makes it difficult to use kernel mprog interface.

In another case, the current cgroup prog detach tries to use the
same flag as in attach. This is different from mprog kernel interface
which uses flags passed from user space.

So to avoid modifying existing behavior, I made the following changes to
support mprog API for cgroup progs:
 - The support is for prog list at cgroup level. Cross-level prog list
   (a.k.a. effective prog list) is not supported.
 - Previously, BPF_F_PREORDER is supported only for prog attach, now
   BPF_F_PREORDER is also supported by link-based attach.
 - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
   similar to kernel mprog but with different implementation.
 - For detach and replace, use the existing implementation.
 - For attach, detach and replace, the revision for a particular prog
   list, associated with a particular attach type, will be updated
   by increasing count by 1.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev
2025-06-09 16:28:28 -07:00
Yonghong Song
9b8367b604 cgroup: Add bpf prog revisions to struct cgroup_bpf
One of key items in mprog API is revision for prog list. The revision
number will be increased if the prog list changed, e.g., attach, detach
or replace.

Add 'revisions' field to struct cgroup_bpf, representing revisions for
all cgroup related attachment types. The initial revision value is
set to 1, the same as kernel mprog implementations.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163136.2428732-1-yonghong.song@linux.dev
2025-06-09 16:17:11 -07:00
Eslam Khafagy
e41079f53e Documentation: Fix spelling mistake.
Fix typo "desination => destination"
in file
Documentation/bpf/standardization/instruction-set.rst

Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20250606100511.368450-1-eslam.medhat1993@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:22:36 -07:00
Alexei Starovoitov
d365993c2d Merge branch 'selftests-bpf-fix-a-few-test-failures-with-arm64-64kb-page'
Yonghong Song says:

====================
selftests/bpf: Fix a few test failures with arm64 64KB page

My local arm64 host has 64KB page size and the VM to run test_progs
also has 64KB page size. There are a few self tests assuming 4KB page
and failed in my environment.

Patch 1 reduced long assert logs so if the test fails, developers
can check logs easily. Patches 2-4 fixed three selftest failures.

Changelogs:
  v3 -> v4:
    - v3: https://lore.kernel.org/bpf/20250606213048.340421-1-yonghong.song@linux.dev/
    - In v3, I tried to use __kconfig with CONFIG_ARM64_64K_PAGES to decide to have
      4K or 64K aligned. But CI seems unhappy about this. Most likely the reason
      is due to lskel. So in v4, simply adjust/increase numbers to 64K aligned for
      test_ringbuf_write test.
  v2 -> v3:
    - v2: https://lore.kernel.org/bpf/20250606174139.3036576-1-yonghong.song@linux.dev/
    - Fix veristat failure with bpf object file test_ringbuf_write.bpf.o.
  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20250606032309.444401-1-yonghong.song@linux.dev/
    - Fix a problem with selftest release build, basically from
      BUILD_BUG_ON to ASSERT_LT.
====================

Link: https://patch.msgid.link/20250607013605.1550284-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:44 -07:00
Yonghong Song
bbc7bd658d selftests/bpf: Fix a user_ringbuf failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013626.1553001-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
8c8c5e3c85 selftests/bpf: Fix ringbuf/ringbuf_write test failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
and other related metrics properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013621.1552332-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
377d371590 selftests/bpf: Fix bpf_mod_race test failure with arm64 64KB page size
Currently, uffd_register.range.len is set to 4096 for command
'ioctl(uffd, UFFDIO_REGISTER, &uffd_register)'. For arm64 64KB page size,
the len must be 64KB size aligned as page size alignment is required.
See fs/userfaultfd.c:validate_unaligned_range().

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013615.1551783-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
ae8824037a selftests/bpf: Reduce test_xdp_adjust_frags_tail_grow logs
For selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow, if tested failure,
I see a long list of log output like

    ...
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    ...

There are total 7374 lines of the above which is too much. Let us
only issue such logs when it is an assert failure.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013610.1551399-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Rong Tao
64a064ce33 selftests/bpf: rbtree: Fix incorrect global variable usage
Within __add_three() function, should use function parameters instead of
global variables. So that the variables groot_nested.inner.root and
groot_nested.inner.glock in rbtree_add_nodes_nested() are tested
correctly.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_3DD7405C0839EBE2724AC5FA357B5402B105@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-05 13:55:26 -07:00
Blake Jones
a570f386f3 Tests for the ".emit_strings" functionality in the BTF dumper.
When this mode is turned on, "emit_zeroes" and "compact" have no effect,
and embedded NUL characters always terminate printing of an array.

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-2-blakejones@google.com
2025-06-05 13:45:16 -07:00
Blake Jones
87c9c79a02 libbpf: Add support for printing BTF character arrays as strings
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.

This change adds a special case to the btf_dump functionality to allow
0-terminated arrays of single-byte integer values to be printed as
character strings. Characters for which isprint() returns false are
printed as hex-escaped values. This is enabled when the new ".emit_strings"
is set to 1 in the btf_dump_type_data_opts structure.

As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):

- .emit_strings = 0, .skip_names = 0:  (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1:  ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0:  (char[6])"hello"
- .emit_strings = 1, .skip_names = 1:  "hello"

Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:

- .emit_strings = 0:  ['h',-1,]
- .emit_strings = 1:  "h\xff"

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-1-blakejones@google.com
2025-06-05 13:45:16 -07:00
Luis Gerhorst
97744b4971 bpf: Clarify sanitize_check_bounds()
As is, it appears as if pointer arithmetic is allowed for everything
except PTR_TO_{STACK,MAP_VALUE} if one only looks at
sanitize_check_bounds(). However, this is misleading as the function
only works together with retrieve_ptr_limit() and the two must be kept
in sync. This patch documents the interdependency and adds a check to
ensure they stay in sync.

adjust_ptr_min_max_vals(): Because the preceding switch returns -EACCES
for every opcode except for ADD/SUB, the sanitize_needed() following the
sanitize_check_bounds() call is always true if reached. This means,
unless sanitize_check_bounds() detected that the pointer goes OOB
because of the ADD/SUB and returns -EACCES, sanitize_ptr_alu() always
executes after sanitize_check_bounds().

The following shows that this also implies that retrieve_ptr_limit()
runs in all relevant cases.

Note that there are two calls to sanitize_ptr_alu(), these are simply
needed to easily calculate the correct alu_limit as explained in
commit 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic
mask"). The truncation-simulation is already performed on the first
call.

In the second sanitize_ptr_alu(commit_window = true), we always run
retrieve_ptr_limit(), unless:

* can_skip_alu_sanititation() is true, notably `BPF_SRC(insn->code) ==
  BPF_K`. BPF_K is fine because it means that there is no scalar
  register (which could be subject to speculative scalar confusion due
  to Spectre v4) that goes into the ALU operation. The pointer register
  can not be subject to v4-based value confusion due to the nospec
  added. Thus, in this case it would have been fine to also skip
  sanitize_check_bounds().

* If we are on a speculative path (`vstate->speculative`) and in the
  second "commit" phase, sanitize_ptr_alu() always just returns 0. This
  makes sense because there are no ALU sanitization limits to be learned
  from speculative paths. Furthermore, because the sanitization will
  ensure that pointer arithmetic stays in (architectural) bounds, the
  sanitize_check_bounds() on the speculative path could also be skipped.

The second case needs more attention: Assume we have some ALU operation
that is used with scalars architecturally, but with a
non-PTR_TO_{STACK,MAP_VALUE} pointer (e.g., PTR_TO_PACKET)
speculatively. It might appear as if this would allow an unsanitized
pointer ALU operations, but this can not happen because one of the
following two always holds:

* The type mismatch stems from Spectre v4, then it is prevented by a
  nospec after the possibly-bypassed store involving the pointer. There
  is no speculative path simulated for this case thus it never happens.

* The type mismatch stems from a Spectre v1 gadget like the following:

    r1 = slow(0)
    r4 = fast(0)
    r3 = SCALAR // Spectre v4 scalar confusion
    if (r1) {
      r2 = PTR_TO_PACKET
    } else {
      r2 = 42
    }
    if (r4) {
      r2 += r3
      *r2
    }

  If `r2 = PTR_TO_PACKET` is indeed dead code, it will be sanitized to
  `goto -1` (as is the case for the r4-if block). If it is not (e.g., if
  `r1 = r4 = 1` is possible), it will also be explored on an
  architectural path and retrieve_ptr_limit() will reject it.

To summarize, the exception for `vstate->speculative` is safe.

Back to retrieve_ptr_limit(): It only allows the ALU operation if the
involved pointer register (can be either source or destination for ADD)
is PTR_TO_STACK or PTR_TO_MAP_VALUE. Otherwise, it returns -EOPNOTSUPP.

Therefore, sanitize_check_bounds() returning 0 for
non-PTR_TO_{STACK,MAP_VALUE} is fine because retrieve_ptr_limit() also
runs for all relevant cases and prevents unsafe operations.

To summarize, we allow unsanitized pointer arithmetic with 64-bit
ADD/SUB for the following instructions if the requirements from
retrieve_ptr_limit() AND sanitize_check_bounds() hold:

* ptr -=/+= imm32 (i.e. `BPF_SRC(insn->code) == BPF_K`)

* PTR_TO_{STACK,MAP_VALUE} -= scalar

* PTR_TO_{STACK,MAP_VALUE} += scalar

* scalar += PTR_TO_{STACK,MAP_VALUE}

To document the interdependency between sanitize_check_bounds() and
retrieve_ptr_limit(), add a verifier_bug_if() to make sure they stay in
sync.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/CAP01T76HZ+s5h+_REqRFkRjjoKwnZZn9YswpSVinGicah1pGJw@mail.gmail.com/
Link: https://lore.kernel.org/bpf/CAP01T75oU0zfZCiymEcH3r-GQ5A6GOc6GmYzJEnMa3=53XuUQQ@mail.gmail.com/
Link: https://lore.kernel.org/r/20250603204557.332447-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-05 13:20:07 -07:00
Jiawei Zhao
919319b4ed libbpf: Correct some typos and syntax issues in usdt doc
Fix some incorrect words, such as "and" -> "an", "it's" -> "its".  Fix
some grammar issues, such as removing redundant "will", "would
complicated" -> "would complicate".

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250531095111.57824-1-Phoenix500526@163.com
2025-06-05 11:45:48 -07:00
Tao Chen
9c8827d773 bpftool: Display cookie for raw_tp link probe
Display cookie for raw_tp link probe, in plain mode:

 #bpftool link

22: raw_tracepoint  prog 14
        tp 'sys_enter'  cookie 23925373020405760
        pids test_progs(176)

And in json mode:

 #bpftool link -j | jq

[
  {
    "id": 47,
    "type": "raw_tracepoint",
    "prog_id": 79,
    "tp_name": "sys_enter",
    "cookie": 23925373020405760,
    "pids": [
      {
        "pid": 274,
        "comm": "test_progs"
      }
    ]
  }
]

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-3-chen.dylane@linux.dev
2025-06-05 11:44:52 -07:00
Tao Chen
25a0d04d38 selftests/bpf: Add cookies check for raw_tp fill_link_info test
Adding tests for getting cookie with fill_link_info for raw_tp.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-2-chen.dylane@linux.dev
2025-06-05 11:44:52 -07:00
Tao Chen
2fe1c59347 bpf: Add cookie to raw_tp bpf_link_info
After commit 68ca5d4eeb ("bpf: support BPF cookie in raw tracepoint
(raw_tp, tp_btf) programs"), we can show the cookie in bpf_link_info
like kprobe etc.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-1-chen.dylane@linux.dev
2025-06-05 11:44:52 -07:00
Sabrina Dubroca
7eb11c0ab7 xfrm: state: use a consistent pcpu_id in xfrm_state_find
If we get preempted during xfrm_state_find, we could run
xfrm_state_look_at using a different pcpu_id than the one
xfrm_state_find saw. This could lead to ignoring states that should
have matched, and triggering acquires on a CPU that already has a pcpu
state.

    xfrm_state_find starts on CPU1
    pcpu_id = 1
    lookup starts
    <preemption, we're now on CPU2>
    xfrm_state_look_at pcpu_id = 2
       finds a state
found:
    best->pcpu_num != pcpu_id (2 != 1)
    if (!x && !error && !acquire_in_progress) {
        ...
        xfrm_state_alloc
        xfrm_init_tempstate
        ...

This can be avoided by passing the original pcpu_id down to all
xfrm_state_look_at() calls.

Also switch to raw_smp_processor_id, disabling preempting just to
re-enable it immediately doesn't really make sense.

Fixes: 1ddf9916ac ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-06-03 15:39:57 +02:00
Sabrina Dubroca
94d077c331 xfrm: state: initialize state_ptrs earlier in xfrm_state_find
In case of preemption, xfrm_state_look_at will find a different
pcpu_id and look up states for that other CPU. If we matched a state
for CPU2 in the state_cache while the lookup started on CPU1, we will
jump to "found", but the "best" state that we got will be ignored and
we will enter the "acquire" block. This block uses state_ptrs, which
isn't initialized at this point.

Let's initialize state_ptrs just after taking rcu_read_lock. This will
also prevent a possible misuse in the future, if someone adjusts this
function.

Reported-by: syzbot+7ed9d47e15e88581dc5b@syzkaller.appspotmail.com
Fixes: e952837f3d ("xfrm: state: fix out-of-bounds read during lookup")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-06-03 15:39:48 +02:00
324 changed files with 11917 additions and 2798 deletions

View File

@ -694,6 +694,10 @@ Sedat Dilek <sedat.dilek@gmail.com> <sedat.dilek@credativ.de>
Senthilkumar N L <quic_snlakshm@quicinc.com> <snlakshm@codeaurora.org>
Serge Hallyn <sergeh@kernel.org> <serge.hallyn@canonical.com>
Serge Hallyn <sergeh@kernel.org> <serue@us.ibm.com>
Sergey Senozhatsky <senozhatsky@chromium.org> <sergey.senozhatsky.work@gmail.com>
Sergey Senozhatsky <senozhatsky@chromium.org> <sergey.senozhatsky@gmail.com>
Sergey Senozhatsky <senozhatsky@chromium.org> <sergey.senozhatsky@mail.by>
Sergey Senozhatsky <senozhatsky@chromium.org> <senozhatsky@google.com>
Seth Forshee <sforshee@kernel.org> <seth.forshee@canonical.com>
Shakeel Butt <shakeel.butt@linux.dev> <shakeelb@google.com>
Shannon Nelson <sln@onemain.com> <shannon.nelson@amd.com>

View File

@ -1397,6 +1397,10 @@ N: Thomas Gleixner
E: tglx@linutronix.de
D: NAND flash hardware support, JFFS2 on NAND flash
N: Jérôme Glisse
E: jglisse@redhat.com
D: HMM - Heterogeneous Memory Management
N: Richard E. Gooch
E: rgooch@atnf.csiro.au
D: parent process death signal to children

View File

@ -611,9 +611,10 @@ Q: I have added a new BPF instruction to the kernel, how can I integrate
it into LLVM?
A: LLVM has a ``-mcpu`` selector for the BPF back end in order to allow
the selection of BPF instruction set extensions. By default the
``generic`` processor target is used, which is the base instruction set
(v1) of BPF.
the selection of BPF instruction set extensions. Before llvm version 20,
the ``generic`` processor target is used, which is the base instruction
set (v1) of BPF. Since llvm 20, the default processor target has changed
to instruction set v3.
LLVM has an option to select ``-mcpu=probe`` where it will probe the host
kernel for supported BPF instruction set extensions and selects the

View File

@ -350,9 +350,9 @@ Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If BPF program execution would
result in division by zero, the destination register is instead set to zero.
Otherwise, for ``ALU64``, if execution would result in ``LLONG_MIN``
dividing -1, the desination register is instead set to ``LLONG_MIN``. For
``ALU``, if execution would result in ``INT_MIN`` dividing -1, the
desination register is instead set to ``INT_MIN``.
divided by -1, the destination register is instead set to ``LLONG_MIN``. For
``ALU``, if execution would result in ``INT_MIN`` divided by -1, the
destination register is instead set to ``INT_MIN``.
If execution would result in modulo by zero, for ``ALU64`` the value of
the destination register is unchanged whereas for ``ALU`` the upper

View File

@ -110,7 +110,7 @@ examples:
reg = <0x01cb4000 0x1000>;
interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_BUS_CSI>,
<&ccu CLK_CSI1_SCLK>,
<&ccu CLK_CSI_SCLK>,
<&ccu CLK_DRAM_CSI>;
clock-names = "bus",
"mod",

View File

@ -79,7 +79,7 @@ examples:
reg = <0x01cb8000 0x1000>;
interrupts = <GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_BUS_CSI>,
<&ccu CLK_CSI1_SCLK>,
<&ccu CLK_CSI_SCLK>,
<&ccu CLK_DRAM_CSI>;
clock-names = "bus", "mod", "ram";
resets = <&ccu RST_BUS_CSI>;

View File

@ -103,7 +103,7 @@ examples:
reg = <0x01cb1000 0x1000>;
interrupts = <GIC_SPI 90 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_BUS_CSI>,
<&ccu CLK_CSI1_SCLK>;
<&ccu CLK_CSI_SCLK>;
clock-names = "bus", "mod";
resets = <&ccu RST_BUS_CSI>;

View File

@ -11009,7 +11009,8 @@ F: Documentation/ABI/testing/debugfs-hisi-zip
F: drivers/crypto/hisilicon/zip/
HMM - Heterogeneous Memory Management
M: Jérôme Glisse <jglisse@redhat.com>
M: Jason Gunthorpe <jgg@nvidia.com>
M: Leon Romanovsky <leonro@nvidia.com>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/mm/hmm.rst
@ -12188,9 +12189,8 @@ F: drivers/dma/idxd/*
F: include/uapi/linux/idxd.h
INTEL IN FIELD SCAN (IFS) DEVICE
M: Jithu Joseph <jithu.joseph@intel.com>
M: Tony Luck <tony.luck@intel.com>
R: Ashok Raj <ashok.raj.linux@gmail.com>
R: Tony Luck <tony.luck@intel.com>
S: Maintained
F: drivers/platform/x86/intel/ifs
F: include/trace/events/intel_ifs.h
@ -12530,8 +12530,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi.git
F: drivers/net/wireless/intel/iwlwifi/
INTEL WMI SLIM BOOTLOADER (SBL) FIRMWARE UPDATE DRIVER
M: Jithu Joseph <jithu.joseph@intel.com>
S: Maintained
S: Orphan
W: https://slimbootloader.github.io/security/firmware-update.html
F: drivers/platform/x86/intel/wmi/sbl-fw-update.c
@ -17385,6 +17384,7 @@ F: include/linux/ethtool.h
F: include/linux/framer/framer-provider.h
F: include/linux/framer/framer.h
F: include/linux/in.h
F: include/linux/in6.h
F: include/linux/indirect_call_wrapper.h
F: include/linux/inet.h
F: include/linux/inet_diag.h

View File

@ -2,7 +2,7 @@
VERSION = 6
PATCHLEVEL = 16
SUBLEVEL = 0
EXTRAVERSION = -rc7
EXTRAVERSION =
NAME = Baby Opossum Posse
# *DOCUMENTATION*

View File

@ -121,7 +121,7 @@ config ARM
select HAVE_KERNEL_XZ
select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
select HAVE_KRETPROBES if HAVE_KPROBES
select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if (LD_VERSION >= 23600 || LD_CAN_USE_KEEP_IN_OVERLAY)
select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if (LD_VERSION >= 23600 || LD_IS_LLD) && LD_CAN_USE_KEEP_IN_OVERLAY
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI
select HAVE_OPTPROBES if !THUMB2_KERNEL

View File

@ -149,7 +149,7 @@ endif
# Need -Uarm for gcc < 3.x
KBUILD_CPPFLAGS +=$(cpp-y)
KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include asm/unified.h -msoft-float
KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) -Wa,$(arch-y) $(tune-y) -include $(srctree)/arch/arm/include/asm/unified.h -msoft-float
KBUILD_RUSTFLAGS += --target=arm-unknown-linux-gnueabi
CHECKFLAGS += -D__arm__

View File

@ -652,7 +652,7 @@ csi1: camera@1cb4000 {
reg = <0x01cb4000 0x3000>;
interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_BUS_CSI>,
<&ccu CLK_CSI1_SCLK>,
<&ccu CLK_CSI_SCLK>,
<&ccu CLK_DRAM_CSI>;
clock-names = "bus", "mod", "ram";
resets = <&ccu RST_BUS_CSI>;

View File

@ -131,7 +131,7 @@ rgmii0_pins: rgmii0-pins {
"PH5", "PH6", "PH7", "PH9", "PH10",
"PH14", "PH15", "PH16", "PH17", "PH18";
allwinner,pinmux = <5>;
function = "emac0";
function = "gmac0";
drive-strength = <40>;
bias-disable;
};
@ -540,8 +540,8 @@ ohci1: usb@4200400 {
status = "disabled";
};
emac0: ethernet@4500000 {
compatible = "allwinner,sun55i-a523-emac0",
gmac0: ethernet@4500000 {
compatible = "allwinner,sun55i-a523-gmac0",
"allwinner,sun50i-a64-emac";
reg = <0x04500000 0x10000>;
clocks = <&ccu CLK_BUS_EMAC0>;

View File

@ -12,7 +12,7 @@ / {
compatible = "radxa,cubie-a5e", "allwinner,sun55i-a527";
aliases {
ethernet0 = &emac0;
ethernet0 = &gmac0;
serial0 = &uart0;
};
@ -55,7 +55,7 @@ &ehci1 {
status = "okay";
};
&emac0 {
&gmac0 {
phy-mode = "rgmii-id";
phy-handle = <&ext_rgmii_phy>;
phy-supply = <&reg_cldo3>;

View File

@ -12,7 +12,7 @@ / {
compatible = "yuzukihd,avaota-a1", "allwinner,sun55i-t527";
aliases {
ethernet0 = &emac0;
ethernet0 = &gmac0;
serial0 = &uart0;
};
@ -65,7 +65,7 @@ &ehci1 {
status = "okay";
};
&emac0 {
&gmac0 {
phy-mode = "rgmii-id";
phy-handle = <&ext_rgmii_phy>;
phy-supply = <&reg_dcdc4>;

View File

@ -29,7 +29,6 @@ led-lan1 {
function-enumerator = <1>;
gpios = <&gpio3 RK_PD6 GPIO_ACTIVE_HIGH>;
label = "LAN-1";
linux,default-trigger = "netdev";
};
led-lan2 {
@ -39,7 +38,6 @@ led-lan2 {
function-enumerator = <2>;
gpios = <&gpio3 RK_PD7 GPIO_ACTIVE_HIGH>;
label = "LAN-2";
linux,default-trigger = "netdev";
};
power_led: led-sys {
@ -56,7 +54,6 @@ led-wan {
function = LED_FUNCTION_WAN;
gpios = <&gpio2 RK_PC1 GPIO_ACTIVE_HIGH>;
label = "WAN";
linux,default-trigger = "netdev";
};
};
};

View File

@ -41,6 +41,11 @@
/*
* Save/restore interrupts.
*/
.macro save_and_disable_daif, flags
mrs \flags, daif
msr daifset, #0xf
.endm
.macro save_and_disable_irq, flags
mrs \flags, daif
msr daifset, #3

View File

@ -825,6 +825,7 @@ SYM_CODE_END(__bp_harden_el1_vectors)
*
*/
SYM_FUNC_START(cpu_switch_to)
save_and_disable_daif x11
mov x10, #THREAD_CPU_CONTEXT
add x8, x0, x10
mov x9, sp
@ -848,6 +849,7 @@ SYM_FUNC_START(cpu_switch_to)
ptrauth_keys_install_kernel x1, x8, x9, x10
scs_save x0
scs_load_current
restore_irq x11
ret
SYM_FUNC_END(cpu_switch_to)
NOKPROBE(cpu_switch_to)
@ -874,6 +876,7 @@ NOKPROBE(ret_from_fork)
* Calls func(regs) using this CPU's irq stack and shadow irq stack.
*/
SYM_FUNC_START(call_on_irq_stack)
save_and_disable_daif x9
#ifdef CONFIG_SHADOW_CALL_STACK
get_current_task x16
scs_save x16
@ -888,8 +891,10 @@ SYM_FUNC_START(call_on_irq_stack)
/* Move to the new stack and call the function there */
add sp, x16, #IRQ_STACK_SIZE
restore_irq x9
blr x1
save_and_disable_daif x9
/*
* Restore the SP from the FP, and restore the FP and LR from the frame
* record.
@ -897,6 +902,7 @@ SYM_FUNC_START(call_on_irq_stack)
mov sp, x29
ldp x29, x30, [sp], #16
scs_load_current
restore_irq x9
ret
SYM_FUNC_END(call_on_irq_stack)
NOKPROBE(call_on_irq_stack)

View File

@ -325,4 +325,9 @@
#define A64_MRS_SP_EL0(Rt) \
aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_SP_EL0)
/* Barriers */
#define A64_SB aarch64_insn_get_sb_value()
#define A64_DSB_NSH (aarch64_insn_get_dsb_base_value() | 0x7 << 8)
#define A64_ISB aarch64_insn_get_isb_value()
#endif /* _BPF_JIT_H */

View File

@ -30,6 +30,7 @@
#define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
#define TCCNT_PTR (MAX_BPF_JIT_REG + 2)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 3)
#define PRIVATE_SP (MAX_BPF_JIT_REG + 4)
#define ARENA_VM_START (MAX_BPF_JIT_REG + 5)
#define check_imm(bits, imm) do { \
@ -68,6 +69,8 @@ static const int bpf2a64[] = {
[TCCNT_PTR] = A64_R(26),
/* temporary register for blinding constants */
[BPF_REG_AX] = A64_R(9),
/* callee saved register for private stack pointer */
[PRIVATE_SP] = A64_R(27),
/* callee saved register for kern_vm_start address */
[ARENA_VM_START] = A64_R(28),
};
@ -86,6 +89,7 @@ struct jit_ctx {
u64 user_vm_start;
u64 arena_vm_start;
bool fp_used;
bool priv_sp_used;
bool write;
};
@ -98,6 +102,10 @@ struct bpf_plt {
#define PLT_TARGET_SIZE sizeof_field(struct bpf_plt, target)
#define PLT_TARGET_OFFSET offsetof(struct bpf_plt, target)
/* Memory size/value to protect private stack overflow/underflow */
#define PRIV_STACK_GUARD_SZ 16
#define PRIV_STACK_GUARD_VAL 0xEB9F12345678eb9fULL
static inline void emit(const u32 insn, struct jit_ctx *ctx)
{
if (ctx->image != NULL && ctx->write)
@ -387,8 +395,11 @@ static void find_used_callee_regs(struct jit_ctx *ctx)
if (reg_used & 8)
ctx->used_callee_reg[i++] = bpf2a64[BPF_REG_9];
if (reg_used & 16)
if (reg_used & 16) {
ctx->used_callee_reg[i++] = bpf2a64[BPF_REG_FP];
if (ctx->priv_sp_used)
ctx->used_callee_reg[i++] = bpf2a64[PRIVATE_SP];
}
if (ctx->arena_vm_start)
ctx->used_callee_reg[i++] = bpf2a64[ARENA_VM_START];
@ -412,6 +423,7 @@ static void push_callee_regs(struct jit_ctx *ctx)
emit(A64_PUSH(A64_R(23), A64_R(24), A64_SP), ctx);
emit(A64_PUSH(A64_R(25), A64_R(26), A64_SP), ctx);
emit(A64_PUSH(A64_R(27), A64_R(28), A64_SP), ctx);
ctx->fp_used = true;
} else {
find_used_callee_regs(ctx);
for (i = 0; i + 1 < ctx->nr_used_callee_reg; i += 2) {
@ -461,6 +473,19 @@ static void pop_callee_regs(struct jit_ctx *ctx)
}
}
static void emit_percpu_ptr(const u8 dst_reg, void __percpu *ptr,
struct jit_ctx *ctx)
{
const u8 tmp = bpf2a64[TMP_REG_1];
emit_a64_mov_i64(dst_reg, (__force const u64)ptr, ctx);
if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
emit(A64_MRS_TPIDR_EL2(tmp), ctx);
else
emit(A64_MRS_TPIDR_EL1(tmp), ctx);
emit(A64_ADD(1, dst_reg, dst_reg, tmp), ctx);
}
#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
@ -476,6 +501,8 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
const bool is_main_prog = !bpf_is_subprog(prog);
const u8 fp = bpf2a64[BPF_REG_FP];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const u8 priv_sp = bpf2a64[PRIVATE_SP];
void __percpu *priv_stack_ptr;
const int idx0 = ctx->idx;
int cur_offset;
@ -551,15 +578,23 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
emit(A64_SUB_I(1, A64_SP, A64_FP, 96), ctx);
}
if (ctx->fp_used)
/* Set up BPF prog stack base register */
emit(A64_MOV(1, fp, A64_SP), ctx);
/* Stack must be multiples of 16B */
ctx->stack_size = round_up(prog->aux->stack_depth, 16);
if (ctx->fp_used) {
if (ctx->priv_sp_used) {
/* Set up private stack pointer */
priv_stack_ptr = prog->aux->priv_stack_ptr + PRIV_STACK_GUARD_SZ;
emit_percpu_ptr(priv_sp, priv_stack_ptr, ctx);
emit(A64_ADD_I(1, fp, priv_sp, ctx->stack_size), ctx);
} else {
/* Set up BPF prog stack base register */
emit(A64_MOV(1, fp, A64_SP), ctx);
}
}
/* Set up function call stack */
if (ctx->stack_size)
if (ctx->stack_size && !ctx->priv_sp_used)
emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
if (ctx->arena_vm_start)
@ -623,7 +658,7 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
emit(A64_STR64I(tcc, ptr, 0), ctx);
/* restore SP */
if (ctx->stack_size)
if (ctx->stack_size && !ctx->priv_sp_used)
emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
pop_callee_regs(ctx);
@ -991,7 +1026,7 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
const u8 ptr = bpf2a64[TCCNT_PTR];
/* We're done with BPF stack */
if (ctx->stack_size)
if (ctx->stack_size && !ctx->priv_sp_used)
emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
pop_callee_regs(ctx);
@ -1120,6 +1155,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
const u8 tmp2 = bpf2a64[TMP_REG_2];
const u8 fp = bpf2a64[BPF_REG_FP];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const u8 priv_sp = bpf2a64[PRIVATE_SP];
const s16 off = insn->off;
const s32 imm = insn->imm;
const int i = insn - ctx->prog->insnsi;
@ -1564,7 +1600,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
src = tmp2;
}
if (src == fp) {
src_adj = A64_SP;
src_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
off_adj = off + ctx->stack_size;
} else {
src_adj = src;
@ -1630,17 +1666,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
return ret;
break;
/* speculation barrier */
/* speculation barrier against v1 and v4 */
case BPF_ST | BPF_NOSPEC:
/*
* Nothing required here.
*
* In case of arm64, we rely on the firmware mitigation of
* Speculative Store Bypass as controlled via the ssbd kernel
* parameter. Whenever the mitigation is enabled, it works
* for all of the kernel code with no need to provide any
* additional instructions.
*/
if (alternative_has_cap_likely(ARM64_HAS_SB)) {
emit(A64_SB, ctx);
} else {
emit(A64_DSB_NSH, ctx);
emit(A64_ISB, ctx);
}
break;
/* ST: *(size *)(dst + off) = imm */
@ -1657,7 +1690,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
dst = tmp2;
}
if (dst == fp) {
dst_adj = A64_SP;
dst_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
off_adj = off + ctx->stack_size;
} else {
dst_adj = dst;
@ -1719,7 +1752,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
dst = tmp2;
}
if (dst == fp) {
dst_adj = A64_SP;
dst_adj = ctx->priv_sp_used ? priv_sp : A64_SP;
off_adj = off + ctx->stack_size;
} else {
dst_adj = dst;
@ -1862,6 +1895,39 @@ static inline void bpf_flush_icache(void *start, void *end)
flush_icache_range((unsigned long)start, (unsigned long)end);
}
static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
{
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
u64 *stack_ptr;
for_each_possible_cpu(cpu) {
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
stack_ptr[0] = PRIV_STACK_GUARD_VAL;
stack_ptr[1] = PRIV_STACK_GUARD_VAL;
stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
stack_ptr[underflow_idx + 1] = PRIV_STACK_GUARD_VAL;
}
}
static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
struct bpf_prog *prog)
{
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
u64 *stack_ptr;
for_each_possible_cpu(cpu) {
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
stack_ptr[1] != PRIV_STACK_GUARD_VAL ||
stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL ||
stack_ptr[underflow_idx + 1] != PRIV_STACK_GUARD_VAL) {
pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
bpf_jit_get_prog_name(prog));
break;
}
}
}
struct arm64_jit_data {
struct bpf_binary_header *header;
u8 *ro_image;
@ -1874,9 +1940,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
int image_size, prog_size, extable_size, extable_align, extable_offset;
struct bpf_prog *tmp, *orig_prog = prog;
struct bpf_binary_header *header;
struct bpf_binary_header *ro_header;
struct bpf_binary_header *ro_header = NULL;
struct arm64_jit_data *jit_data;
void __percpu *priv_stack_ptr = NULL;
bool was_classic = bpf_prog_was_classic(prog);
int priv_stack_alloc_sz;
bool tmp_blinded = false;
bool extra_pass = false;
struct jit_ctx ctx;
@ -1908,6 +1976,23 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
}
prog->aux->jit_data = jit_data;
}
priv_stack_ptr = prog->aux->priv_stack_ptr;
if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
/* Allocate actual private stack size with verifier-calculated
* stack size plus two memory guards to protect overflow and
* underflow.
*/
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
2 * PRIV_STACK_GUARD_SZ;
priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 16, GFP_KERNEL);
if (!priv_stack_ptr) {
prog = orig_prog;
goto out_priv_stack;
}
priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
prog->aux->priv_stack_ptr = priv_stack_ptr;
}
if (jit_data->ctx.offset) {
ctx = jit_data->ctx;
ro_image_ptr = jit_data->ro_image;
@ -1931,6 +2016,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
if (priv_stack_ptr)
ctx.priv_sp_used = true;
/* Pass 1: Estimate the maximum image size.
*
* BPF line info needs ctx->offset[i] to be the offset of
@ -2070,7 +2158,12 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
ctx.offset[i] *= AARCH64_INSN_SIZE;
bpf_prog_fill_jited_linfo(prog, ctx.offset + 1);
out_off:
if (!ro_header && priv_stack_ptr) {
free_percpu(priv_stack_ptr);
prog->aux->priv_stack_ptr = NULL;
}
kvfree(ctx.offset);
out_priv_stack:
kfree(jit_data);
prog->aux->jit_data = NULL;
}
@ -2089,6 +2182,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
goto out_off;
}
bool bpf_jit_supports_private_stack(void)
{
return true;
}
bool bpf_jit_supports_kfunc_call(void)
{
return true;
@ -2243,11 +2341,6 @@ static int calc_arg_aux(const struct btf_func_model *m,
/* the rest arguments are passed through stack */
for (; i < m->nr_args; i++) {
/* We can not know for sure about exact alignment needs for
* struct passed on stack, so deny those
*/
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
return -ENOTSUPP;
stack_slots = (m->arg_size[i] + 7) / 8;
a->bstack_for_args += stack_slots * 8;
a->ostack_for_args = a->ostack_for_args + stack_slots * 8;
@ -2911,6 +3004,17 @@ bool bpf_jit_supports_percpu_insn(void)
return true;
}
bool bpf_jit_bypass_spec_v4(void)
{
/* In case of arm64, we rely on the firmware mitigation of Speculative
* Store Bypass as controlled via the ssbd kernel parameter. Whenever
* the mitigation is enabled, it works for all of the kernel code with
* no need to provide any additional instructions. Therefore, skip
* inserting nospec insns against Spectre v4.
*/
return true;
}
bool bpf_jit_inlines_helper_call(s32 imm)
{
switch (imm) {
@ -2928,6 +3032,8 @@ void bpf_jit_free(struct bpf_prog *prog)
if (prog->jited) {
struct arm64_jit_data *jit_data = prog->aux->jit_data;
struct bpf_binary_header *hdr;
void __percpu *priv_stack_ptr;
int priv_stack_alloc_sz;
/*
* If we fail the final pass of JIT (from jit_subprogs),
@ -2941,6 +3047,13 @@ void bpf_jit_free(struct bpf_prog *prog)
}
hdr = bpf_jit_binary_pack_hdr(prog);
bpf_jit_binary_pack_free(hdr, NULL);
priv_stack_ptr = prog->aux->priv_stack_ptr;
if (priv_stack_ptr) {
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
2 * PRIV_STACK_GUARD_SZ;
priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
free_percpu(prog->aux->priv_stack_ptr);
}
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
}

View File

@ -41,6 +41,15 @@ linux,cma {
};
};
&apbdma3 {
status = "okay";
};
&mmc0 {
status = "okay";
bus-width = <4>;
};
&gmac0 {
status = "okay";

View File

@ -104,7 +104,7 @@ dma-controller@1fe10c10 {
status = "disabled";
};
dma-controller@1fe10c20 {
apbdma2: dma-controller@1fe10c20 {
compatible = "loongson,ls2k0500-apbdma", "loongson,ls2k1000-apbdma";
reg = <0 0x1fe10c20 0 0x8>;
interrupt-parent = <&eiointc>;
@ -114,7 +114,7 @@ dma-controller@1fe10c20 {
status = "disabled";
};
dma-controller@1fe10c30 {
apbdma3: dma-controller@1fe10c30 {
compatible = "loongson,ls2k0500-apbdma", "loongson,ls2k1000-apbdma";
reg = <0 0x1fe10c30 0 0x8>;
interrupt-parent = <&eiointc>;
@ -437,6 +437,30 @@ i2c@1ff4a800 {
status = "disabled";
};
mmc0: mmc@1ff64000 {
compatible = "loongson,ls2k0500-mmc";
reg = <0 0x1ff64000 0 0x2000>,
<0 0x1fe10100 0 0x4>;
interrupt-parent = <&eiointc>;
interrupts = <57>;
dmas = <&apbdma3 0>;
dma-names = "rx-tx";
clocks = <&clk LOONGSON2_APB_CLK>;
status = "disabled";
};
mmc@1ff66000 {
compatible = "loongson,ls2k0500-mmc";
reg = <0 0x1ff66000 0 0x2000>,
<0 0x1fe10100 0 0x4>;
interrupt-parent = <&eiointc>;
interrupts = <58>;
dmas = <&apbdma2 0>;
dma-names = "rx-tx";
clocks = <&clk LOONGSON2_APB_CLK>;
status = "disabled";
};
pmc: power-management@1ff6c000 {
compatible = "loongson,ls2k0500-pmc", "syscon";
reg = <0x0 0x1ff6c000 0x0 0x58>;

View File

@ -48,6 +48,19 @@ fan0: pwm-fan {
};
};
&apbdma1 {
status = "okay";
};
&mmc {
status = "okay";
pinctrl-0 = <&sdio_pins_default>;
pinctrl-names = "default";
bus-width = <4>;
cd-gpios = <&gpio0 22 GPIO_ACTIVE_LOW>;
};
&gmac0 {
status = "okay";

View File

@ -187,14 +187,14 @@ gpio0: gpio@1fe00500 {
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<0 IRQ_TYPE_NONE>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_NONE>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
<26 IRQ_TYPE_LEVEL_HIGH>,
@ -209,13 +209,13 @@ gpio0: gpio@1fe00500 {
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<>,
<0 IRQ_TYPE_NONE>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<>,
<>,
<0 IRQ_TYPE_NONE>,
<0 IRQ_TYPE_NONE>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
<27 IRQ_TYPE_LEVEL_HIGH>,
@ -256,7 +256,7 @@ dma-controller@1fe00c00 {
status = "disabled";
};
dma-controller@1fe00c10 {
apbdma1: dma-controller@1fe00c10 {
compatible = "loongson,ls2k1000-apbdma";
reg = <0x0 0x1fe00c10 0x0 0x8>;
interrupt-parent = <&liointc1>;
@ -405,6 +405,18 @@ i2s: i2s@1fe2d000 {
status = "disabled";
};
mmc: mmc@1fe2c000 {
compatible = "loongson,ls2k1000-mmc";
reg = <0 0x1fe2c000 0 0x68>,
<0 0x1fe00438 0 0x8>;
interrupt-parent = <&liointc0>;
interrupts = <31 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clk LOONGSON2_APB_CLK>;
dmas = <&apbdma1 0>;
dma-names = "rx-tx";
status = "disabled";
};
spi0: spi@1fff0220 {
compatible = "loongson,ls2k1000-spi";
reg = <0x0 0x1fff0220 0x0 0x10>;

View File

@ -39,6 +39,16 @@ linux,cma {
};
};
&emmc {
status = "okay";
bus-width = <8>;
cap-mmc-highspeed;
mmc-hs200-1_8v;
no-sd;
no-sdio;
};
&sata {
status = "okay";
};

View File

@ -259,6 +259,24 @@ uart0: serial@1fe001e0 {
status = "disabled";
};
emmc: mmc@79990000 {
compatible = "loongson,ls2k2000-mmc";
reg = <0x0 0x79990000 0x0 0x1000>;
interrupt-parent = <&pic>;
interrupts = <51 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clk LOONGSON2_EMMC_CLK>;
status = "disabled";
};
mmc@79991000 {
compatible = "loongson,ls2k2000-mmc";
reg = <0x0 0x79991000 0x0 0x1000>;
interrupt-parent = <&pic>;
interrupts = <50 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clk LOONGSON2_EMMC_CLK>;
status = "disabled";
};
pcie@1a000000 {
compatible = "loongson,ls2k-pci";
reg = <0x0 0x1a000000 0x0 0x02000000>,

View File

@ -497,6 +497,7 @@ void arch_simulate_insn(union loongarch_instruction insn, struct pt_regs *regs);
int larch_insn_read(void *addr, u32 *insnp);
int larch_insn_write(void *addr, u32 insn);
int larch_insn_patch_text(void *addr, u32 insn);
int larch_insn_text_copy(void *dst, void *src, size_t len);
u32 larch_insn_gen_nop(void);
u32 larch_insn_gen_b(unsigned long pc, unsigned long dest);
@ -510,6 +511,8 @@ u32 larch_insn_gen_move(enum loongarch_gpr rd, enum loongarch_gpr rj);
u32 larch_insn_gen_lu12iw(enum loongarch_gpr rd, int imm);
u32 larch_insn_gen_lu32id(enum loongarch_gpr rd, int imm);
u32 larch_insn_gen_lu52id(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm);
u32 larch_insn_gen_beq(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm);
u32 larch_insn_gen_bne(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm);
u32 larch_insn_gen_jirl(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm);
static inline bool signed_imm_check(long val, unsigned int bit)

View File

@ -451,6 +451,13 @@
#define LOONGARCH_CSR_KS6 0x36
#define LOONGARCH_CSR_KS7 0x37
#define LOONGARCH_CSR_KS8 0x38
#define LOONGARCH_CSR_KS9 0x39
#define LOONGARCH_CSR_KS10 0x3a
#define LOONGARCH_CSR_KS11 0x3b
#define LOONGARCH_CSR_KS12 0x3c
#define LOONGARCH_CSR_KS13 0x3d
#define LOONGARCH_CSR_KS14 0x3e
#define LOONGARCH_CSR_KS15 0x3f
/* Exception allocated KS0, KS1 and KS2 statically */
#define EXCEPTION_KS0 LOONGARCH_CSR_KS0

View File

@ -39,16 +39,19 @@ void __init init_environ(void)
static int __init init_cpu_fullname(void)
{
struct device_node *root;
int cpu, ret;
char *model;
char *cpuname;
const char *model;
struct device_node *root;
/* Parsing cpuname from DTS model property */
root = of_find_node_by_path("/");
ret = of_property_read_string(root, "model", (const char **)&model);
ret = of_property_read_string(root, "model", &model);
if (ret == 0) {
cpuname = kstrdup(model, GFP_KERNEL);
loongson_sysconf.cpuname = strsep(&cpuname, " ");
}
of_node_put(root);
if (ret == 0)
loongson_sysconf.cpuname = strsep(&model, " ");
if (loongson_sysconf.cpuname && !strncmp(loongson_sysconf.cpuname, "Loongson", 8)) {
for (cpu = 0; cpu < NR_CPUS; cpu++)

View File

@ -4,6 +4,7 @@
*/
#include <linux/sizes.h>
#include <linux/uaccess.h>
#include <linux/set_memory.h>
#include <asm/cacheflush.h>
#include <asm/inst.h>
@ -218,6 +219,32 @@ int larch_insn_patch_text(void *addr, u32 insn)
return ret;
}
int larch_insn_text_copy(void *dst, void *src, size_t len)
{
int ret;
unsigned long flags;
unsigned long dst_start, dst_end, dst_len;
dst_start = round_down((unsigned long)dst, PAGE_SIZE);
dst_end = round_up((unsigned long)dst + len, PAGE_SIZE);
dst_len = dst_end - dst_start; /* page-aligned */
set_memory_rw(dst_start, dst_len / PAGE_SIZE);
raw_spin_lock_irqsave(&patch_lock, flags);
ret = copy_to_kernel_nofault(dst, src, len);
if (ret)
pr_err("%s: operation failed\n", __func__);
raw_spin_unlock_irqrestore(&patch_lock, flags);
set_memory_rox(dst_start, dst_len / PAGE_SIZE);
if (!ret)
flush_icache_range((unsigned long)dst, (unsigned long)dst + len);
return ret;
}
u32 larch_insn_gen_nop(void)
{
return INSN_NOP;
@ -323,6 +350,34 @@ u32 larch_insn_gen_lu52id(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm)
return insn.word;
}
u32 larch_insn_gen_beq(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm)
{
union loongarch_instruction insn;
if ((imm & 3) || imm < -SZ_128K || imm >= SZ_128K) {
pr_warn("The generated beq instruction is out of range.\n");
return INSN_BREAK;
}
emit_beq(&insn, rj, rd, imm >> 2);
return insn.word;
}
u32 larch_insn_gen_bne(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm)
{
union loongarch_instruction insn;
if ((imm & 3) || imm < -SZ_128K || imm >= SZ_128K) {
pr_warn("The generated bne instruction is out of range.\n");
return INSN_BREAK;
}
emit_bne(&insn, rj, rd, imm >> 2);
return insn.word;
}
u32 larch_insn_gen_jirl(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm)
{
union loongarch_instruction insn;

View File

@ -109,4 +109,4 @@ SYM_CODE_END(kexec_smp_wait)
relocate_new_kernel_end:
.section ".data"
SYM_DATA(relocate_new_kernel_size, .long relocate_new_kernel_end - relocate_new_kernel)
SYM_DATA(relocate_new_kernel_size, .quad relocate_new_kernel_end - relocate_new_kernel)

View File

@ -191,6 +191,16 @@ static int __init early_parse_mem(char *p)
return -EINVAL;
}
start = 0;
size = memparse(p, &p);
if (*p == '@') /* Every mem=... should contain '@' */
start = memparse(p + 1, &p);
else { /* Only one mem=... is allowed if no '@' */
usermem = 1;
memblock_enforce_memory_limit(size);
return 0;
}
/*
* If a user specifies memory size, we
* blow away any automatically generated
@ -201,14 +211,6 @@ static int __init early_parse_mem(char *p)
memblock_remove(memblock_start_of_DRAM(),
memblock_end_of_DRAM() - memblock_start_of_DRAM());
}
start = 0;
size = memparse(p, &p);
if (*p == '@')
start = memparse(p + 1, &p);
else {
pr_err("Invalid format!\n");
return -EINVAL;
}
if (!IS_ENABLED(CONFIG_NUMA))
memblock_add(start, size);

View File

@ -508,7 +508,7 @@ bool unwind_next_frame(struct unwind_state *state)
state->pc = bt_address(pc);
if (!state->pc) {
pr_err("cannot find unwind pc at %pK\n", (void *)pc);
pr_err("cannot find unwind pc at %p\n", (void *)pc);
goto err;
}

View File

@ -4,13 +4,20 @@
*
* Copyright (C) 2022 Loongson Technology Corporation Limited
*/
#include <linux/memory.h>
#include "bpf_jit.h"
#define REG_TCC LOONGARCH_GPR_A6
#define TCC_SAVED LOONGARCH_GPR_S5
#define LOONGARCH_MAX_REG_ARGS 8
#define SAVE_RA BIT(0)
#define SAVE_TCC BIT(1)
#define LOONGARCH_LONG_JUMP_NINSNS 5
#define LOONGARCH_LONG_JUMP_NBYTES (LOONGARCH_LONG_JUMP_NINSNS * 4)
#define LOONGARCH_FENTRY_NINSNS 2
#define LOONGARCH_FENTRY_NBYTES (LOONGARCH_FENTRY_NINSNS * 4)
#define LOONGARCH_BPF_FENTRY_NBYTES (LOONGARCH_LONG_JUMP_NINSNS * 4)
#define REG_TCC LOONGARCH_GPR_A6
#define BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack) (round_up(stack, 16) - 80)
static const int regmap[] = {
/* return value from in-kernel function, and exit value for eBPF program */
@ -32,32 +39,57 @@ static const int regmap[] = {
[BPF_REG_AX] = LOONGARCH_GPR_T0,
};
static void mark_call(struct jit_ctx *ctx)
static void prepare_bpf_tail_call_cnt(struct jit_ctx *ctx, int *store_offset)
{
ctx->flags |= SAVE_RA;
}
const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = !bpf_is_subprog(prog);
static void mark_tail_call(struct jit_ctx *ctx)
{
ctx->flags |= SAVE_TCC;
}
if (is_main_prog) {
/*
* LOONGARCH_GPR_T3 = MAX_TAIL_CALL_CNT
* if (REG_TCC > T3 )
* std REG_TCC -> LOONGARCH_GPR_SP + store_offset
* else
* std REG_TCC -> LOONGARCH_GPR_SP + store_offset
* REG_TCC = LOONGARCH_GPR_SP + store_offset
*
* std REG_TCC -> LOONGARCH_GPR_SP + store_offset
*
* The purpose of this code is to first push the TCC into stack,
* and then push the address of TCC into stack.
* In cases where bpf2bpf and tailcall are used in combination,
* the value in REG_TCC may be a count or an address,
* these two cases need to be judged and handled separately.
*/
emit_insn(ctx, addid, LOONGARCH_GPR_T3, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
*store_offset -= sizeof(long);
static bool seen_call(struct jit_ctx *ctx)
{
return (ctx->flags & SAVE_RA);
}
emit_cond_jmp(ctx, BPF_JGT, REG_TCC, LOONGARCH_GPR_T3, 4);
static bool seen_tail_call(struct jit_ctx *ctx)
{
return (ctx->flags & SAVE_TCC);
}
/*
* If REG_TCC < MAX_TAIL_CALL_CNT, the value in REG_TCC is a count,
* push tcc into stack
*/
emit_insn(ctx, std, REG_TCC, LOONGARCH_GPR_SP, *store_offset);
static u8 tail_call_reg(struct jit_ctx *ctx)
{
if (seen_call(ctx))
return TCC_SAVED;
/* Push the address of TCC into the REG_TCC */
emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_SP, *store_offset);
return REG_TCC;
emit_uncond_jmp(ctx, 2);
/*
* If REG_TCC > MAX_TAIL_CALL_CNT, the value in REG_TCC is an address,
* push tcc_ptr into stack
*/
emit_insn(ctx, std, REG_TCC, LOONGARCH_GPR_SP, *store_offset);
} else {
*store_offset -= sizeof(long);
emit_insn(ctx, std, REG_TCC, LOONGARCH_GPR_SP, *store_offset);
}
/* Push tcc_ptr into stack */
*store_offset -= sizeof(long);
emit_insn(ctx, std, REG_TCC, LOONGARCH_GPR_SP, *store_offset);
}
/*
@ -80,6 +112,10 @@ static u8 tail_call_reg(struct jit_ctx *ctx)
* | $s4 |
* +-------------------------+
* | $s5 |
* +-------------------------+
* | tcc |
* +-------------------------+
* | tcc_ptr |
* +-------------------------+ <--BPF_REG_FP
* | prog->aux->stack_depth |
* | (optional) |
@ -88,22 +124,32 @@ static u8 tail_call_reg(struct jit_ctx *ctx)
*/
static void build_prologue(struct jit_ctx *ctx)
{
int stack_adjust = 0, store_offset, bpf_stack_adjust;
int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = !bpf_is_subprog(prog);
bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
/* To store ra, fp, s0, s1, s2, s3, s4 and s5. */
/* To store ra, fp, s0, s1, s2, s3, s4, s5 */
stack_adjust += sizeof(long) * 8;
/* To store tcc and tcc_ptr */
stack_adjust += sizeof(long) * 2;
stack_adjust = round_up(stack_adjust, 16);
stack_adjust += bpf_stack_adjust;
/* Reserve space for the move_imm + jirl instruction */
for (i = 0; i < LOONGARCH_LONG_JUMP_NINSNS; i++)
emit_insn(ctx, nop);
/*
* First instruction initializes the tail call count (TCC).
* On tail call we skip this instruction, and the TCC is
* passed in REG_TCC from the caller.
* First instruction initializes the tail call count (TCC)
* register to zero. On tail call we skip this instruction,
* and the TCC is passed in REG_TCC from the caller.
*/
emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
if (is_main_prog)
emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, 0);
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
@ -131,20 +177,13 @@ static void build_prologue(struct jit_ctx *ctx)
store_offset -= sizeof(long);
emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
prepare_bpf_tail_call_cnt(ctx, &store_offset);
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
if (bpf_stack_adjust)
emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
/*
* Program contains calls and tail calls, so REG_TCC need
* to be saved across calls.
*/
if (seen_tail_call(ctx) && seen_call(ctx))
move_reg(ctx, TCC_SAVED, REG_TCC);
else
emit_insn(ctx, nop);
ctx->stack_size = stack_adjust;
}
@ -177,6 +216,16 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
load_offset -= sizeof(long);
emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
/*
* When push into the stack, follow the order of tcc then tcc_ptr.
* When pop from the stack, first pop tcc_ptr then followed by tcc.
*/
load_offset -= 2 * sizeof(long);
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_SP, load_offset);
load_offset += sizeof(long);
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_SP, load_offset);
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_adjust);
if (!is_tail_call) {
@ -189,7 +238,7 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
* Call the next bpf prog and skip the first instruction
* of TCC initialization.
*/
emit_insn(ctx, jirl, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_T3, 1);
emit_insn(ctx, jirl, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_T3, 6);
}
}
@ -208,12 +257,10 @@ bool bpf_jit_supports_far_kfunc_call(void)
return true;
}
/* initialized on the first pass of build_body() */
static int out_offset = -1;
static int emit_bpf_tail_call(struct jit_ctx *ctx)
static int emit_bpf_tail_call(struct jit_ctx *ctx, int insn)
{
int off;
u8 tcc = tail_call_reg(ctx);
int off, tc_ninsn = 0;
int tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size);
u8 a1 = LOONGARCH_GPR_A1;
u8 a2 = LOONGARCH_GPR_A2;
u8 t1 = LOONGARCH_GPR_T1;
@ -222,7 +269,7 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
const int idx0 = ctx->idx;
#define cur_offset (ctx->idx - idx0)
#define jmp_offset (out_offset - (cur_offset))
#define jmp_offset (tc_ninsn - (cur_offset))
/*
* a0: &ctx
@ -232,6 +279,7 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
* if (index >= array->map.max_entries)
* goto out;
*/
tc_ninsn = insn ? ctx->offset[insn+1] - ctx->offset[insn] : ctx->offset[0];
off = offsetof(struct bpf_array, map.max_entries);
emit_insn(ctx, ldwu, t1, a1, off);
/* bgeu $a2, $t1, jmp_offset */
@ -239,11 +287,15 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
goto toofar;
/*
* if (--TCC < 0)
* goto out;
* if ((*tcc_ptr)++ >= MAX_TAIL_CALL_CNT)
* goto out;
*/
emit_insn(ctx, addid, REG_TCC, tcc, -1);
if (emit_tailcall_jmp(ctx, BPF_JSLT, REG_TCC, LOONGARCH_GPR_ZERO, jmp_offset) < 0)
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_SP, tcc_ptr_off);
emit_insn(ctx, ldd, t3, REG_TCC, 0);
emit_insn(ctx, addid, t3, t3, 1);
emit_insn(ctx, std, t3, REG_TCC, 0);
emit_insn(ctx, addid, t2, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
if (emit_tailcall_jmp(ctx, BPF_JSGT, t3, t2, jmp_offset) < 0)
goto toofar;
/*
@ -263,15 +315,6 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
emit_insn(ctx, ldd, t3, t2, off);
__build_epilogue(ctx, true);
/* out: */
if (out_offset == -1)
out_offset = cur_offset;
if (cur_offset != out_offset) {
pr_err_once("tail_call out_offset = %d, expected %d!\n",
cur_offset, out_offset);
return -1;
}
return 0;
toofar:
@ -463,7 +506,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
u64 func_addr;
bool func_addr_fixed, sign_extend;
int i = insn - ctx->prog->insnsi;
int ret, jmp_offset;
int ret, jmp_offset, tcc_ptr_off;
const u8 code = insn->code;
const u8 cond = BPF_OP(code);
const u8 t1 = LOONGARCH_GPR_T1;
@ -899,12 +942,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
/* function call */
case BPF_JMP | BPF_CALL:
mark_call(ctx);
ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
&func_addr, &func_addr_fixed);
if (ret < 0)
return ret;
if (insn->src_reg == BPF_PSEUDO_CALL) {
tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size);
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_SP, tcc_ptr_off);
}
move_addr(ctx, t1, func_addr);
emit_insn(ctx, jirl, LOONGARCH_GPR_RA, t1, 0);
@ -915,8 +962,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
/* tail call */
case BPF_JMP | BPF_TAIL_CALL:
mark_tail_call(ctx);
if (emit_bpf_tail_call(ctx) < 0)
if (emit_bpf_tail_call(ctx, i) < 0)
return -EINVAL;
break;
@ -1180,12 +1226,526 @@ static int validate_code(struct jit_ctx *ctx)
return -1;
}
return 0;
}
static int validate_ctx(struct jit_ctx *ctx)
{
if (validate_code(ctx))
return -1;
if (WARN_ON_ONCE(ctx->num_exentries != ctx->prog->aux->num_exentries))
return -1;
return 0;
}
static int emit_jump_and_link(struct jit_ctx *ctx, u8 rd, u64 target)
{
if (!target) {
pr_err("bpf_jit: jump target address is error\n");
return -EFAULT;
}
move_imm(ctx, LOONGARCH_GPR_T1, target, false);
emit_insn(ctx, jirl, rd, LOONGARCH_GPR_T1, 0);
return 0;
}
static int emit_jump_or_nops(void *target, void *ip, u32 *insns, bool is_call)
{
struct jit_ctx ctx;
ctx.idx = 0;
ctx.image = (union loongarch_instruction *)insns;
if (!target) {
emit_insn((&ctx), nop);
emit_insn((&ctx), nop);
return 0;
}
return emit_jump_and_link(&ctx, is_call ? LOONGARCH_GPR_T0 : LOONGARCH_GPR_ZERO, (u64)target);
}
static int emit_call(struct jit_ctx *ctx, u64 addr)
{
return emit_jump_and_link(ctx, LOONGARCH_GPR_RA, addr);
}
void *bpf_arch_text_copy(void *dst, void *src, size_t len)
{
int ret;
mutex_lock(&text_mutex);
ret = larch_insn_text_copy(dst, src, len);
mutex_unlock(&text_mutex);
return ret ? ERR_PTR(-EINVAL) : dst;
}
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
void *old_addr, void *new_addr)
{
int ret;
bool is_call = (poke_type == BPF_MOD_CALL);
u32 old_insns[LOONGARCH_LONG_JUMP_NINSNS] = {[0 ... 4] = INSN_NOP};
u32 new_insns[LOONGARCH_LONG_JUMP_NINSNS] = {[0 ... 4] = INSN_NOP};
if (!is_kernel_text((unsigned long)ip) &&
!is_bpf_text_address((unsigned long)ip))
return -ENOTSUPP;
ret = emit_jump_or_nops(old_addr, ip, old_insns, is_call);
if (ret)
return ret;
if (memcmp(ip, old_insns, LOONGARCH_LONG_JUMP_NBYTES))
return -EFAULT;
ret = emit_jump_or_nops(new_addr, ip, new_insns, is_call);
if (ret)
return ret;
mutex_lock(&text_mutex);
if (memcmp(ip, new_insns, LOONGARCH_LONG_JUMP_NBYTES))
ret = larch_insn_text_copy(ip, new_insns, LOONGARCH_LONG_JUMP_NBYTES);
mutex_unlock(&text_mutex);
return ret;
}
int bpf_arch_text_invalidate(void *dst, size_t len)
{
int i;
int ret = 0;
u32 *inst;
inst = kvmalloc(len, GFP_KERNEL);
if (!inst)
return -ENOMEM;
for (i = 0; i < (len / sizeof(u32)); i++)
inst[i] = INSN_BREAK;
mutex_lock(&text_mutex);
if (larch_insn_text_copy(dst, inst, len))
ret = -EINVAL;
mutex_unlock(&text_mutex);
kvfree(inst);
return ret;
}
static void store_args(struct jit_ctx *ctx, int nargs, int args_off)
{
int i;
for (i = 0; i < nargs; i++) {
emit_insn(ctx, std, LOONGARCH_GPR_A0 + i, LOONGARCH_GPR_FP, -args_off);
args_off -= 8;
}
}
static void restore_args(struct jit_ctx *ctx, int nargs, int args_off)
{
int i;
for (i = 0; i < nargs; i++) {
emit_insn(ctx, ldd, LOONGARCH_GPR_A0 + i, LOONGARCH_GPR_FP, -args_off);
args_off -= 8;
}
}
static int invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
int args_off, int retval_off, int run_ctx_off, bool save_ret)
{
int ret;
u32 *branch;
struct bpf_prog *p = l->link.prog;
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
if (l->cookie) {
move_imm(ctx, LOONGARCH_GPR_T1, l->cookie, false);
emit_insn(ctx, std, LOONGARCH_GPR_T1, LOONGARCH_GPR_FP, -run_ctx_off + cookie_off);
} else {
emit_insn(ctx, std, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_FP, -run_ctx_off + cookie_off);
}
/* arg1: prog */
move_imm(ctx, LOONGARCH_GPR_A0, (const s64)p, false);
/* arg2: &run_ctx */
emit_insn(ctx, addid, LOONGARCH_GPR_A1, LOONGARCH_GPR_FP, -run_ctx_off);
ret = emit_call(ctx, (const u64)bpf_trampoline_enter(p));
if (ret)
return ret;
/* store prog start time */
move_reg(ctx, LOONGARCH_GPR_S1, LOONGARCH_GPR_A0);
/* if (__bpf_prog_enter(prog) == 0)
* goto skip_exec_of_prog;
*/
branch = (u32 *)ctx->image + ctx->idx;
/* nop reserved for conditional jump */
emit_insn(ctx, nop);
/* arg1: &args_off */
emit_insn(ctx, addid, LOONGARCH_GPR_A0, LOONGARCH_GPR_FP, -args_off);
if (!p->jited)
move_imm(ctx, LOONGARCH_GPR_A1, (const s64)p->insnsi, false);
ret = emit_call(ctx, (const u64)p->bpf_func);
if (ret)
return ret;
if (save_ret) {
emit_insn(ctx, std, LOONGARCH_GPR_A0, LOONGARCH_GPR_FP, -retval_off);
emit_insn(ctx, std, regmap[BPF_REG_0], LOONGARCH_GPR_FP, -(retval_off - 8));
}
/* update branch with beqz */
if (ctx->image) {
int offset = (void *)(&ctx->image[ctx->idx]) - (void *)branch;
*branch = larch_insn_gen_beq(LOONGARCH_GPR_A0, LOONGARCH_GPR_ZERO, offset);
}
/* arg1: prog */
move_imm(ctx, LOONGARCH_GPR_A0, (const s64)p, false);
/* arg2: prog start time */
move_reg(ctx, LOONGARCH_GPR_A1, LOONGARCH_GPR_S1);
/* arg3: &run_ctx */
emit_insn(ctx, addid, LOONGARCH_GPR_A2, LOONGARCH_GPR_FP, -run_ctx_off);
ret = emit_call(ctx, (const u64)bpf_trampoline_exit(p));
return ret;
}
static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
int args_off, int retval_off, int run_ctx_off, u32 **branches)
{
int i;
emit_insn(ctx, std, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_FP, -retval_off);
for (i = 0; i < tl->nr_links; i++) {
invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off, run_ctx_off, true);
emit_insn(ctx, ldd, LOONGARCH_GPR_T1, LOONGARCH_GPR_FP, -retval_off);
branches[i] = (u32 *)ctx->image + ctx->idx;
emit_insn(ctx, nop);
}
}
void *arch_alloc_bpf_trampoline(unsigned int size)
{
return bpf_prog_pack_alloc(size, jit_fill_hole);
}
void arch_free_bpf_trampoline(void *image, unsigned int size)
{
bpf_prog_pack_free(image, size);
}
static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
const struct btf_func_model *m, struct bpf_tramp_links *tlinks,
void *func_addr, u32 flags)
{
int i, ret, save_ret;
int stack_size = 0, nargs = 0;
int retval_off, args_off, nargs_off, ip_off, run_ctx_off, sreg_off, tcc_ptr_off;
bool is_struct_ops = flags & BPF_TRAMP_F_INDIRECT;
void *orig_call = func_addr;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
u32 **branches = NULL;
if (flags & (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY))
return -ENOTSUPP;
/*
* FP + 8 [ RA to parent func ] return address to parent
* function
* FP + 0 [ FP of parent func ] frame pointer of parent
* function
* FP - 8 [ T0 to traced func ] return address of traced
* function
* FP - 16 [ FP of traced func ] frame pointer of traced
* function
*
* FP - retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
* BPF_TRAMP_F_RET_FENTRY_RET
* [ argN ]
* [ ... ]
* FP - args_off [ arg1 ]
*
* FP - nargs_off [ regs count ]
*
* FP - ip_off [ traced func ] BPF_TRAMP_F_IP_ARG
*
* FP - run_ctx_off [ bpf_tramp_run_ctx ]
*
* FP - sreg_off [ callee saved reg ]
*
* FP - tcc_ptr_off [ tail_call_cnt_ptr ]
*/
if (m->nr_args > LOONGARCH_MAX_REG_ARGS)
return -ENOTSUPP;
if (flags & (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY))
return -ENOTSUPP;
stack_size = 0;
/* Room of trampoline frame to store return address and frame pointer */
stack_size += 16;
save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
if (save_ret) {
/* Save BPF R0 and A0 */
stack_size += 16;
retval_off = stack_size;
}
/* Room of trampoline frame to store args */
nargs = m->nr_args;
stack_size += nargs * 8;
args_off = stack_size;
/* Room of trampoline frame to store args number */
stack_size += 8;
nargs_off = stack_size;
/* Room of trampoline frame to store ip address */
if (flags & BPF_TRAMP_F_IP_ARG) {
stack_size += 8;
ip_off = stack_size;
}
/* Room of trampoline frame to store struct bpf_tramp_run_ctx */
stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
run_ctx_off = stack_size;
stack_size += 8;
sreg_off = stack_size;
/* Room of trampoline frame to store tail_call_cnt_ptr */
if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
stack_size += 8;
tcc_ptr_off = stack_size;
}
stack_size = round_up(stack_size, 16);
if (is_struct_ops) {
/*
* For the trampoline called directly, just handle
* the frame of trampoline.
*/
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_size);
emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, stack_size - 8);
emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size - 16);
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size);
} else {
/*
* For the trampoline called from function entry,
* the frame of traced function and the frame of
* trampoline need to be considered.
*/
/* RA and FP for parent function */
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -16);
emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, 8);
emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, 0);
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, 16);
/* RA and FP for traced function */
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_size);
emit_insn(ctx, std, LOONGARCH_GPR_T0, LOONGARCH_GPR_SP, stack_size - 8);
emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size - 16);
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size);
}
if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
emit_insn(ctx, std, REG_TCC, LOONGARCH_GPR_FP, -tcc_ptr_off);
/* callee saved register S1 to pass start time */
emit_insn(ctx, std, LOONGARCH_GPR_S1, LOONGARCH_GPR_FP, -sreg_off);
/* store ip address of the traced function */
if (flags & BPF_TRAMP_F_IP_ARG) {
move_imm(ctx, LOONGARCH_GPR_T1, (const s64)func_addr, false);
emit_insn(ctx, std, LOONGARCH_GPR_T1, LOONGARCH_GPR_FP, -ip_off);
}
/* store nargs number */
move_imm(ctx, LOONGARCH_GPR_T1, nargs, false);
emit_insn(ctx, std, LOONGARCH_GPR_T1, LOONGARCH_GPR_FP, -nargs_off);
store_args(ctx, nargs, args_off);
/* To traced function */
/* Ftrace jump skips 2 NOP instructions */
if (is_kernel_text((unsigned long)orig_call))
orig_call += LOONGARCH_FENTRY_NBYTES;
/* Direct jump skips 5 NOP instructions */
else if (is_bpf_text_address((unsigned long)orig_call))
orig_call += LOONGARCH_BPF_FENTRY_NBYTES;
if (flags & BPF_TRAMP_F_CALL_ORIG) {
move_imm(ctx, LOONGARCH_GPR_A0, (const s64)im, false);
ret = emit_call(ctx, (const u64)__bpf_tramp_enter);
if (ret)
return ret;
}
for (i = 0; i < fentry->nr_links; i++) {
ret = invoke_bpf_prog(ctx, fentry->links[i], args_off, retval_off,
run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET);
if (ret)
return ret;
}
if (fmod_ret->nr_links) {
branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *), GFP_KERNEL);
if (!branches)
return -ENOMEM;
invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off, run_ctx_off, branches);
}
if (flags & BPF_TRAMP_F_CALL_ORIG) {
restore_args(ctx, m->nr_args, args_off);
if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_FP, -tcc_ptr_off);
ret = emit_call(ctx, (const u64)orig_call);
if (ret)
goto out;
emit_insn(ctx, std, LOONGARCH_GPR_A0, LOONGARCH_GPR_FP, -retval_off);
emit_insn(ctx, std, regmap[BPF_REG_0], LOONGARCH_GPR_FP, -(retval_off - 8));
im->ip_after_call = ctx->ro_image + ctx->idx;
/* Reserve space for the move_imm + jirl instruction */
for (i = 0; i < LOONGARCH_LONG_JUMP_NINSNS; i++)
emit_insn(ctx, nop);
}
for (i = 0; ctx->image && i < fmod_ret->nr_links; i++) {
int offset = (void *)(&ctx->image[ctx->idx]) - (void *)branches[i];
*branches[i] = larch_insn_gen_bne(LOONGARCH_GPR_T1, LOONGARCH_GPR_ZERO, offset);
}
for (i = 0; i < fexit->nr_links; i++) {
ret = invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off, run_ctx_off, false);
if (ret)
goto out;
}
if (flags & BPF_TRAMP_F_CALL_ORIG) {
im->ip_epilogue = ctx->ro_image + ctx->idx;
move_imm(ctx, LOONGARCH_GPR_A0, (const s64)im, false);
ret = emit_call(ctx, (const u64)__bpf_tramp_exit);
if (ret)
goto out;
}
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_args(ctx, m->nr_args, args_off);
if (save_ret) {
emit_insn(ctx, ldd, LOONGARCH_GPR_A0, LOONGARCH_GPR_FP, -retval_off);
emit_insn(ctx, ldd, regmap[BPF_REG_0], LOONGARCH_GPR_FP, -(retval_off - 8));
}
emit_insn(ctx, ldd, LOONGARCH_GPR_S1, LOONGARCH_GPR_FP, -sreg_off);
if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_FP, -tcc_ptr_off);
if (is_struct_ops) {
/* trampoline called directly */
emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, stack_size - 8);
emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size - 16);
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_size);
emit_insn(ctx, jirl, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_RA, 0);
} else {
/* trampoline called from function entry */
emit_insn(ctx, ldd, LOONGARCH_GPR_T0, LOONGARCH_GPR_SP, stack_size - 8);
emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_size - 16);
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_size);
emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, 8);
emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, 0);
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, 16);
if (flags & BPF_TRAMP_F_SKIP_FRAME)
/* return to parent function */
emit_insn(ctx, jirl, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_RA, 0);
else
/* return to traced function */
emit_insn(ctx, jirl, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_T0, 0);
}
ret = ctx->idx;
out:
kfree(branches);
return ret;
}
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *ro_image_end, const struct btf_func_model *m,
u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
{
int ret, size;
void *image, *tmp;
struct jit_ctx ctx;
size = ro_image_end - ro_image;
image = kvmalloc(size, GFP_KERNEL);
if (!image)
return -ENOMEM;
ctx.image = (union loongarch_instruction *)image;
ctx.ro_image = (union loongarch_instruction *)ro_image;
ctx.idx = 0;
jit_fill_hole(image, (unsigned int)(ro_image_end - ro_image));
ret = __arch_prepare_bpf_trampoline(&ctx, im, m, tlinks, func_addr, flags);
if (ret > 0 && validate_code(&ctx) < 0) {
ret = -EINVAL;
goto out;
}
tmp = bpf_arch_text_copy(ro_image, image, size);
if (IS_ERR(tmp)) {
ret = PTR_ERR(tmp);
goto out;
}
bpf_flush_icache(ro_image, ro_image_end);
out:
kvfree(image);
return ret < 0 ? ret : size;
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
struct bpf_tramp_links *tlinks, void *func_addr)
{
int ret;
struct jit_ctx ctx;
struct bpf_tramp_image im;
ctx.image = NULL;
ctx.idx = 0;
ret = __arch_prepare_bpf_trampoline(&ctx, &im, m, tlinks, func_addr, flags);
/* Page align */
return ret < 0 ? ret : round_up(ret * LOONGARCH_INSN_SIZE, PAGE_SIZE);
}
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{
bool tmp_blinded = false, extra_pass = false;
@ -1288,7 +1848,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
build_epilogue(&ctx);
/* 3. Extra pass to validate JITed code */
if (validate_code(&ctx)) {
if (validate_ctx(&ctx)) {
bpf_jit_binary_free(header);
prog = orig_prog;
goto out_offset;
@ -1342,7 +1902,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
if (tmp_blinded)
bpf_jit_prog_release_other(prog, prog == orig_prog ? tmp : orig_prog);
out_offset = -1;
return prog;
@ -1354,6 +1913,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
goto out_offset;
}
bool bpf_jit_bypass_spec_v1(void)
{
return true;
}
bool bpf_jit_bypass_spec_v4(void)
{
return true;
}
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
bool bpf_jit_supports_subprog_tailcalls(void)
{

View File

@ -18,6 +18,7 @@ struct jit_ctx {
u32 *offset;
int num_exentries;
union loongarch_instruction *image;
union loongarch_instruction *ro_image;
u32 stack_size;
};
@ -308,3 +309,8 @@ static inline int emit_tailcall_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch
return -EINVAL;
}
static inline void bpf_flush_icache(void *start, void *end)
{
flush_icache_range((unsigned long)start, (unsigned long)end);
}

View File

@ -36,7 +36,7 @@ endif
# VDSO linker flags.
ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
$(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared --build-id -T
$(filter -E%,$(KBUILD_CFLAGS)) -shared --build-id -T
#
# Shared build commands.

View File

@ -370,6 +370,23 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
return 0;
}
bool bpf_jit_bypass_spec_v1(void)
{
#if defined(CONFIG_PPC_E500) || defined(CONFIG_PPC_BOOK3S_64)
return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR));
#else
return true;
#endif
}
bool bpf_jit_bypass_spec_v4(void)
{
return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
security_ftr_enabled(SEC_FTR_STF_BARRIER) &&
stf_barrier_type_get() != STF_BARRIER_NONE);
}
/*
* We spill into the redzone always, even if the bpf program has its own stackframe.
* Offsets hardcoded based on BPF_PPC_STACK_SAVE -- see bpf_jit_stack_local()
@ -397,6 +414,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
u32 *addrs, int pass, bool extra_pass)
{
enum stf_barrier_type stf_barrier = stf_barrier_type_get();
bool sync_emitted, ori31_emitted;
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
int i, ret;
@ -789,30 +807,51 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
/*
* BPF_ST NOSPEC (speculation barrier)
*
* The following must act as a barrier against both Spectre v1
* and v4 if we requested both mitigations. Therefore, also emit
* 'isync; sync' on E500 or 'ori31' on BOOK3S_64 in addition to
* the insns needed for a Spectre v4 barrier.
*
* If we requested only !bypass_spec_v1 OR only !bypass_spec_v4,
* we can skip the respective other barrier type as an
* optimization.
*/
case BPF_ST | BPF_NOSPEC:
if (!security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) ||
!security_ftr_enabled(SEC_FTR_STF_BARRIER))
break;
switch (stf_barrier) {
case STF_BARRIER_EIEIO:
EMIT(PPC_RAW_EIEIO() | 0x02000000);
break;
case STF_BARRIER_SYNC_ORI:
sync_emitted = false;
ori31_emitted = false;
if (IS_ENABLED(CONFIG_PPC_E500) &&
!bpf_jit_bypass_spec_v1()) {
EMIT(PPC_RAW_ISYNC());
EMIT(PPC_RAW_SYNC());
EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0));
EMIT(PPC_RAW_ORI(_R31, _R31, 0));
break;
case STF_BARRIER_FALLBACK:
ctx->seen |= SEEN_FUNC;
PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier));
EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_BCTRL());
break;
case STF_BARRIER_NONE:
break;
sync_emitted = true;
}
if (!bpf_jit_bypass_spec_v4()) {
switch (stf_barrier) {
case STF_BARRIER_EIEIO:
EMIT(PPC_RAW_EIEIO() | 0x02000000);
break;
case STF_BARRIER_SYNC_ORI:
if (!sync_emitted)
EMIT(PPC_RAW_SYNC());
EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0));
EMIT(PPC_RAW_ORI(_R31, _R31, 0));
ori31_emitted = true;
break;
case STF_BARRIER_FALLBACK:
ctx->seen |= SEEN_FUNC;
PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier));
EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_BCTRL());
break;
case STF_BARRIER_NONE:
break;
}
}
if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
!bpf_jit_bypass_spec_v1() &&
!ori31_emitted)
EMIT(PPC_RAW_ORI(_R31, _R31, 0));
break;
/*

View File

@ -1,55 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* BPF Jit compiler defines
*
* Copyright IBM Corp. 2012,2015
*
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
* Michael Holzheu <holzheu@linux.vnet.ibm.com>
*/
#ifndef __ARCH_S390_NET_BPF_JIT_H
#define __ARCH_S390_NET_BPF_JIT_H
#ifndef __ASSEMBLY__
#include <linux/filter.h>
#include <linux/types.h>
#endif /* __ASSEMBLY__ */
/*
* Stackframe layout (packed stack):
*
* ^ high
* +---------------+ |
* | old backchain | |
* +---------------+ |
* | r15 - r6 | |
* +---------------+ |
* | 4 byte align | |
* | tail_call_cnt | |
* BFP -> +===============+ |
* | | |
* | BPF stack | |
* | | |
* R15+160 -> +---------------+ |
* | new backchain | |
* R15+152 -> +---------------+ |
* | + 152 byte SA | |
* R15 -> +---------------+ + low
*
* We get 160 bytes stack space from calling function, but only use
* 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt.
*
* The stack size used by the BPF program ("BPF stack" above) is passed
* via "aux->stack_depth".
*/
#define STK_SPACE_ADD (160)
#define STK_160_UNUSED (160 - 12 * 8)
#define STK_OFF (STK_SPACE_ADD - STK_160_UNUSED)
#define STK_OFF_R6 (160 - 11 * 8) /* Offset of r6 on stack */
#define STK_OFF_TCCNT (160 - 12 * 8) /* Offset of tail_call_cnt on stack */
#endif /* __ARCH_S390_NET_BPF_JIT_H */

View File

@ -32,7 +32,6 @@
#include <asm/set_memory.h>
#include <asm/text-patching.h>
#include <asm/unwind.h>
#include "bpf_jit.h"
struct bpf_jit {
u32 seen; /* Flags to remember seen eBPF instructions */
@ -54,6 +53,7 @@ struct bpf_jit {
int prologue_plt; /* Start of prologue hotpatch PLT */
int kern_arena; /* Pool offset of kernel arena address */
u64 user_arena; /* User arena address */
u32 frame_off; /* Offset of struct bpf_prog from %r15 */
};
#define SEEN_MEM BIT(0) /* use mem[] for temporary storage */
@ -425,12 +425,26 @@ static void jit_fill_hole(void *area, unsigned int size)
memset(area, 0, size);
}
/*
* Caller-allocated part of the frame.
* Thanks to packed stack, its otherwise unused initial part can be used for
* the BPF stack and for the next frame.
*/
struct prog_frame {
u64 unused[8];
/* BPF stack starts here and grows towards 0 */
u32 tail_call_cnt;
u32 pad;
u64 r6[10]; /* r6 - r15 */
u64 backchain;
} __packed;
/*
* Save registers from "rs" (register start) to "re" (register end) on stack
*/
static void save_regs(struct bpf_jit *jit, u32 rs, u32 re)
{
u32 off = STK_OFF_R6 + (rs - 6) * 8;
u32 off = offsetof(struct prog_frame, r6) + (rs - 6) * 8;
if (rs == re)
/* stg %rs,off(%r15) */
@ -443,12 +457,9 @@ static void save_regs(struct bpf_jit *jit, u32 rs, u32 re)
/*
* Restore registers from "rs" (register start) to "re" (register end) on stack
*/
static void restore_regs(struct bpf_jit *jit, u32 rs, u32 re, u32 stack_depth)
static void restore_regs(struct bpf_jit *jit, u32 rs, u32 re)
{
u32 off = STK_OFF_R6 + (rs - 6) * 8;
if (jit->seen & SEEN_STACK)
off += STK_OFF + stack_depth;
u32 off = jit->frame_off + offsetof(struct prog_frame, r6) + (rs - 6) * 8;
if (rs == re)
/* lg %rs,off(%r15) */
@ -492,8 +503,7 @@ static int get_end(u16 seen_regs, int start)
* Save and restore clobbered registers (6-15) on stack.
* We save/restore registers in chunks with gap >= 2 registers.
*/
static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth,
u16 extra_regs)
static void save_restore_regs(struct bpf_jit *jit, int op, u16 extra_regs)
{
u16 seen_regs = jit->seen_regs | extra_regs;
const int last = 15, save_restore_size = 6;
@ -516,7 +526,7 @@ static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth,
if (op == REGS_SAVE)
save_regs(jit, rs, re);
else
restore_regs(jit, rs, re, stack_depth);
restore_regs(jit, rs, re);
re++;
} while (re <= last);
}
@ -581,11 +591,12 @@ static void bpf_jit_plt(struct bpf_plt *plt, void *ret, void *target)
* Emit function prologue
*
* Save registers and create stack frame if necessary.
* See stack frame layout description in "bpf_jit.h"!
* Stack frame layout is described by struct prog_frame.
*/
static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp,
u32 stack_depth)
static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp)
{
BUILD_BUG_ON(sizeof(struct prog_frame) != STACK_FRAME_OVERHEAD);
/* No-op for hotpatching */
/* brcl 0,prologue_plt */
EMIT6_PCREL_RILC(0xc0040000, 0, jit->prologue_plt);
@ -593,8 +604,9 @@ static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp,
if (!bpf_is_subprog(fp)) {
/* Initialize the tail call counter in the main program. */
/* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */
_EMIT6(0xd703f000 | STK_OFF_TCCNT, 0xf000 | STK_OFF_TCCNT);
/* xc tail_call_cnt(4,%r15),tail_call_cnt(%r15) */
_EMIT6(0xd703f000 | offsetof(struct prog_frame, tail_call_cnt),
0xf000 | offsetof(struct prog_frame, tail_call_cnt));
} else {
/*
* Skip the tail call counter initialization in subprograms.
@ -617,7 +629,7 @@ static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp,
jit->seen_regs |= NVREGS;
} else {
/* Save registers */
save_restore_regs(jit, REGS_SAVE, stack_depth,
save_restore_regs(jit, REGS_SAVE,
fp->aux->exception_boundary ? NVREGS : 0);
}
/* Setup literal pool */
@ -637,13 +649,15 @@ static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp,
if (is_first_pass(jit) || (jit->seen & SEEN_STACK)) {
/* lgr %w1,%r15 (backchain) */
EMIT4(0xb9040000, REG_W1, REG_15);
/* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15, STK_160_UNUSED);
/* aghi %r15,-STK_OFF */
EMIT4_IMM(0xa70b0000, REG_15, -(STK_OFF + stack_depth));
/* stg %w1,152(%r15) (backchain) */
/* la %bfp,unused_end(%r15) (BPF frame pointer) */
EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15,
offsetofend(struct prog_frame, unused));
/* aghi %r15,-frame_off */
EMIT4_IMM(0xa70b0000, REG_15, -jit->frame_off);
/* stg %w1,backchain(%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
REG_15, 152);
REG_15,
offsetof(struct prog_frame, backchain));
}
}
@ -677,13 +691,13 @@ static void call_r1(struct bpf_jit *jit)
/*
* Function epilogue
*/
static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth)
static void bpf_jit_epilogue(struct bpf_jit *jit)
{
jit->exit_ip = jit->prg;
/* Load exit code: lgr %r2,%b0 */
EMIT4(0xb9040000, REG_2, BPF_REG_0);
/* Restore registers */
save_restore_regs(jit, REGS_RESTORE, stack_depth, 0);
save_restore_regs(jit, REGS_RESTORE, 0);
EMIT_JUMP_REG(14);
jit->prg = ALIGN(jit->prg, 8);
@ -865,7 +879,7 @@ static int sign_extend(struct bpf_jit *jit, int r, u8 size, u8 flags)
* stack space for the large switch statement.
*/
static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
int i, bool extra_pass, u32 stack_depth)
int i, bool extra_pass)
{
struct bpf_insn *insn = &fp->insnsi[i];
s32 branch_oc_off = insn->off;
@ -1786,9 +1800,10 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
* Note 2: We assume that the verifier does not let us call the
* main program, which clears the tail call counter on entry.
*/
/* mvc STK_OFF_TCCNT(4,%r15),N(%r15) */
_EMIT6(0xd203f000 | STK_OFF_TCCNT,
0xf000 | (STK_OFF_TCCNT + STK_OFF + stack_depth));
/* mvc tail_call_cnt(4,%r15),frame_off+tail_call_cnt(%r15) */
_EMIT6(0xd203f000 | offsetof(struct prog_frame, tail_call_cnt),
0xf000 | (jit->frame_off +
offsetof(struct prog_frame, tail_call_cnt)));
/* Sign-extend the kfunc arguments. */
if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
@ -1839,10 +1854,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
* goto out;
*/
if (jit->seen & SEEN_STACK)
off = STK_OFF_TCCNT + STK_OFF + stack_depth;
else
off = STK_OFF_TCCNT;
off = jit->frame_off +
offsetof(struct prog_frame, tail_call_cnt);
/* lhi %w0,1 */
EMIT4_IMM(0xa7080000, REG_W0, 1);
/* laal %w1,%w0,off(%r15) */
@ -1872,7 +1885,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
/*
* Restore registers before calling function
*/
save_restore_regs(jit, REGS_RESTORE, stack_depth, 0);
save_restore_regs(jit, REGS_RESTORE, 0);
/*
* goto *(prog->bpf_func + tail_call_start);
@ -2165,7 +2178,7 @@ static int bpf_set_addr(struct bpf_jit *jit, int i)
* Compile eBPF program into s390x code
*/
static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp,
bool extra_pass, u32 stack_depth)
bool extra_pass)
{
int i, insn_count, lit32_size, lit64_size;
u64 kern_arena;
@ -2174,24 +2187,30 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp,
jit->lit64 = jit->lit64_start;
jit->prg = 0;
jit->excnt = 0;
if (is_first_pass(jit) || (jit->seen & SEEN_STACK))
jit->frame_off = sizeof(struct prog_frame) -
offsetofend(struct prog_frame, unused) +
round_up(fp->aux->stack_depth, 8);
else
jit->frame_off = 0;
kern_arena = bpf_arena_get_kern_vm_start(fp->aux->arena);
if (kern_arena)
jit->kern_arena = _EMIT_CONST_U64(kern_arena);
jit->user_arena = bpf_arena_get_user_vm_start(fp->aux->arena);
bpf_jit_prologue(jit, fp, stack_depth);
bpf_jit_prologue(jit, fp);
if (bpf_set_addr(jit, 0) < 0)
return -1;
for (i = 0; i < fp->len; i += insn_count) {
insn_count = bpf_jit_insn(jit, fp, i, extra_pass, stack_depth);
insn_count = bpf_jit_insn(jit, fp, i, extra_pass);
if (insn_count < 0)
return -1;
/* Next instruction address */
if (bpf_set_addr(jit, i + insn_count) < 0)
return -1;
}
bpf_jit_epilogue(jit, stack_depth);
bpf_jit_epilogue(jit);
lit32_size = jit->lit32 - jit->lit32_start;
lit64_size = jit->lit64 - jit->lit64_start;
@ -2267,7 +2286,6 @@ static struct bpf_binary_header *bpf_jit_alloc(struct bpf_jit *jit,
*/
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
{
u32 stack_depth = round_up(fp->aux->stack_depth, 8);
struct bpf_prog *tmp, *orig_fp = fp;
struct bpf_binary_header *header;
struct s390_jit_data *jit_data;
@ -2320,7 +2338,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
* - 3: Calculate program size and addrs array
*/
for (pass = 1; pass <= 3; pass++) {
if (bpf_jit_prog(&jit, fp, extra_pass, stack_depth)) {
if (bpf_jit_prog(&jit, fp, extra_pass)) {
fp = orig_fp;
goto free_addrs;
}
@ -2334,7 +2352,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
goto free_addrs;
}
skip_init_ctx:
if (bpf_jit_prog(&jit, fp, extra_pass, stack_depth)) {
if (bpf_jit_prog(&jit, fp, extra_pass)) {
bpf_jit_binary_free(header);
fp = orig_fp;
goto free_addrs;
@ -2654,9 +2672,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
/* stg %r1,backchain_off(%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_1, REG_0, REG_15,
tjit->backchain_off);
/* mvc tccnt_off(4,%r15),stack_size+STK_OFF_TCCNT(%r15) */
/* mvc tccnt_off(4,%r15),stack_size+tail_call_cnt(%r15) */
_EMIT6(0xd203f000 | tjit->tccnt_off,
0xf000 | (tjit->stack_size + STK_OFF_TCCNT));
0xf000 | (tjit->stack_size +
offsetof(struct prog_frame, tail_call_cnt)));
/* stmg %r2,%rN,fwd_reg_args_off(%r15) */
if (nr_reg_args)
EMIT6_DISP_LH(0xeb000000, 0x0024, REG_2,
@ -2793,8 +2812,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
(nr_stack_args * sizeof(u64) - 1) << 16 |
tjit->stack_args_off,
0xf000 | tjit->orig_stack_args_off);
/* mvc STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */
_EMIT6(0xd203f000 | STK_OFF_TCCNT, 0xf000 | tjit->tccnt_off);
/* mvc tail_call_cnt(4,%r15),tccnt_off(%r15) */
_EMIT6(0xd203f000 | offsetof(struct prog_frame, tail_call_cnt),
0xf000 | tjit->tccnt_off);
/* lgr %r1,%r8 */
EMIT4(0xb9040000, REG_1, REG_8);
/* %r1() */
@ -2851,8 +2871,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
if (flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET))
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_2, REG_0, REG_15,
tjit->retval_off);
/* mvc stack_size+STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */
_EMIT6(0xd203f000 | (tjit->stack_size + STK_OFF_TCCNT),
/* mvc stack_size+tail_call_cnt(4,%r15),tccnt_off(%r15) */
_EMIT6(0xd203f000 | (tjit->stack_size +
offsetof(struct prog_frame, tail_call_cnt)),
0xf000 | tjit->tccnt_off);
/* aghi %r15,stack_size */
EMIT4_IMM(0xa70b0000, REG_15, tjit->stack_size);

View File

@ -1526,7 +1526,7 @@ static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, bool longmode,
if (kvm_read_guest_virt(vcpu, (gva_t)sched_poll.ports, ports,
sched_poll.nr_ports * sizeof(*ports), &e)) {
*r = -EFAULT;
return true;
goto out;
}
for (i = 0; i < sched_poll.nr_ports; i++) {

View File

@ -3501,13 +3501,6 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func
return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf);
}
static const char *bpf_get_prog_name(struct bpf_prog *prog)
{
if (prog->aux->ksym.prog)
return prog->aux->ksym.name;
return prog->aux->name;
}
static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
{
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
@ -3531,7 +3524,7 @@ static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size
if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL) {
pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
bpf_get_prog_name(prog));
bpf_jit_get_prog_name(prog));
break;
}
}
@ -3845,7 +3838,6 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
}
return;
#endif
WARN(1, "verification of programs using bpf_throw should have failed\n");
}
void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,

View File

@ -719,7 +719,8 @@ void elevator_set_default(struct request_queue *q)
.name = "mq-deadline",
.no_uevent = true,
};
int err = 0;
int err;
struct elevator_type *e;
/* now we allow to switch elevator */
blk_queue_flag_clear(QUEUE_FLAG_NO_ELV_SWITCH, q);
@ -732,12 +733,18 @@ void elevator_set_default(struct request_queue *q)
* have multiple queues or mq-deadline is not available, default
* to "none".
*/
if (elevator_find_get(ctx.name) && (q->nr_hw_queues == 1 ||
blk_mq_is_shared_tags(q->tag_set->flags)))
e = elevator_find_get(ctx.name);
if (!e)
return;
if ((q->nr_hw_queues == 1 ||
blk_mq_is_shared_tags(q->tag_set->flags))) {
err = elevator_change(q, &ctx);
if (err < 0)
pr_warn("\"%s\" elevator initialization, failed %d, "
"falling back to \"none\"\n", ctx.name, err);
if (err < 0)
pr_warn("\"%s\" elevator initialization, failed %d, falling back to \"none\"\n",
ctx.name, err);
}
elevator_put(e);
}
void elevator_set_none(struct request_queue *q)

View File

@ -943,6 +943,7 @@ struct fsl_mc_device *fsl_mc_get_endpoint(struct fsl_mc_device *mc_dev,
struct fsl_mc_obj_desc endpoint_desc = {{ 0 }};
struct dprc_endpoint endpoint1 = {{ 0 }};
struct dprc_endpoint endpoint2 = {{ 0 }};
struct fsl_mc_bus *mc_bus;
int state, err;
mc_bus_dev = to_fsl_mc_device(mc_dev->dev.parent);
@ -966,6 +967,8 @@ struct fsl_mc_device *fsl_mc_get_endpoint(struct fsl_mc_device *mc_dev,
strcpy(endpoint_desc.type, endpoint2.type);
endpoint_desc.id = endpoint2.id;
endpoint = fsl_mc_device_lookup(&endpoint_desc, mc_bus_dev);
if (endpoint)
return endpoint;
/*
* We know that the device has an endpoint because we verified by
@ -973,17 +976,13 @@ struct fsl_mc_device *fsl_mc_get_endpoint(struct fsl_mc_device *mc_dev,
* yet discovered by the fsl-mc bus, thus the lookup returned NULL.
* Force a rescan of the devices in this container and retry the lookup.
*/
if (!endpoint) {
struct fsl_mc_bus *mc_bus = to_fsl_mc_bus(mc_bus_dev);
if (mutex_trylock(&mc_bus->scan_mutex)) {
err = dprc_scan_objects(mc_bus_dev, true);
mutex_unlock(&mc_bus->scan_mutex);
}
if (err < 0)
return ERR_PTR(err);
mc_bus = to_fsl_mc_bus(mc_bus_dev);
if (mutex_trylock(&mc_bus->scan_mutex)) {
err = dprc_scan_objects(mc_bus_dev, true);
mutex_unlock(&mc_bus->scan_mutex);
}
if (err < 0)
return ERR_PTR(err);
endpoint = fsl_mc_device_lookup(&endpoint_desc, mc_bus_dev);
/*

View File

@ -385,7 +385,8 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(mbus_clk, "mbus", mbus_parents,
0, 0, /* no P */
24, 3, /* mux */
BIT(31), /* gate */
0, CCU_FEATURE_UPDATE_BIT);
CLK_IS_CRITICAL,
CCU_FEATURE_UPDATE_BIT);
static const struct clk_hw *mbus_hws[] = { &mbus_clk.common.hw };

View File

@ -350,7 +350,7 @@ static SUNXI_CCU_M_WITH_MUX_GATE(de_clk, "de", de_parents,
0x104, 0, 4, 24, 2, BIT(31),
CLK_SET_RATE_PARENT);
static const char * const tcon_parents[] = { "pll-video" };
static const char * const tcon_parents[] = { "pll-video", "pll-periph0" };
static SUNXI_CCU_M_WITH_MUX_GATE(tcon_clk, "tcon", tcon_parents,
0x118, 0, 4, 24, 3, BIT(31), 0);
@ -362,11 +362,11 @@ static const char * const csi_mclk_parents[] = { "osc24M", "pll-video",
static SUNXI_CCU_M_WITH_MUX_GATE(csi0_mclk_clk, "csi0-mclk", csi_mclk_parents,
0x130, 0, 5, 8, 3, BIT(15), 0);
static const char * const csi1_sclk_parents[] = { "pll-video", "pll-isp" };
static SUNXI_CCU_M_WITH_MUX_GATE(csi1_sclk_clk, "csi-sclk", csi1_sclk_parents,
static const char * const csi_sclk_parents[] = { "pll-video", "pll-isp" };
static SUNXI_CCU_M_WITH_MUX_GATE(csi_sclk_clk, "csi-sclk", csi_sclk_parents,
0x134, 16, 4, 24, 3, BIT(31), 0);
static SUNXI_CCU_M_WITH_MUX_GATE(csi1_mclk_clk, "csi-mclk", csi_mclk_parents,
static SUNXI_CCU_M_WITH_MUX_GATE(csi1_mclk_clk, "csi1-mclk", csi_mclk_parents,
0x134, 0, 5, 8, 3, BIT(15), 0);
static SUNXI_CCU_M_WITH_GATE(ve_clk, "ve", "pll-ve",
@ -452,7 +452,7 @@ static struct ccu_common *sun8i_v3s_ccu_clks[] = {
&tcon_clk.common,
&csi_misc_clk.common,
&csi0_mclk_clk.common,
&csi1_sclk_clk.common,
&csi_sclk_clk.common,
&csi1_mclk_clk.common,
&ve_clk.common,
&ac_dig_clk.common,
@ -551,7 +551,7 @@ static struct clk_hw_onecell_data sun8i_v3s_hw_clks = {
[CLK_TCON0] = &tcon_clk.common.hw,
[CLK_CSI_MISC] = &csi_misc_clk.common.hw,
[CLK_CSI0_MCLK] = &csi0_mclk_clk.common.hw,
[CLK_CSI1_SCLK] = &csi1_sclk_clk.common.hw,
[CLK_CSI_SCLK] = &csi_sclk_clk.common.hw,
[CLK_CSI1_MCLK] = &csi1_mclk_clk.common.hw,
[CLK_VE] = &ve_clk.common.hw,
[CLK_AC_DIG] = &ac_dig_clk.common.hw,
@ -633,7 +633,7 @@ static struct clk_hw_onecell_data sun8i_v3_hw_clks = {
[CLK_TCON0] = &tcon_clk.common.hw,
[CLK_CSI_MISC] = &csi_misc_clk.common.hw,
[CLK_CSI0_MCLK] = &csi0_mclk_clk.common.hw,
[CLK_CSI1_SCLK] = &csi1_sclk_clk.common.hw,
[CLK_CSI_SCLK] = &csi_sclk_clk.common.hw,
[CLK_CSI1_MCLK] = &csi1_mclk_clk.common.hw,
[CLK_VE] = &ve_clk.common.hw,
[CLK_AC_DIG] = &ac_dig_clk.common.hw,

View File

@ -5193,6 +5193,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
dev->dev->power.disable_depth--;
#endif
}
amdgpu_vram_mgr_clear_reset_blocks(adev);
adev->in_suspend = false;
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D0))

View File

@ -154,6 +154,7 @@ int amdgpu_vram_mgr_reserve_range(struct amdgpu_vram_mgr *mgr,
uint64_t start, uint64_t size);
int amdgpu_vram_mgr_query_page_status(struct amdgpu_vram_mgr *mgr,
uint64_t start);
void amdgpu_vram_mgr_clear_reset_blocks(struct amdgpu_device *adev);
bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
struct ttm_resource *res);

View File

@ -782,6 +782,23 @@ uint64_t amdgpu_vram_mgr_vis_usage(struct amdgpu_vram_mgr *mgr)
return atomic64_read(&mgr->vis_usage);
}
/**
* amdgpu_vram_mgr_clear_reset_blocks - reset clear blocks
*
* @adev: amdgpu device pointer
*
* Reset the cleared drm buddy blocks.
*/
void amdgpu_vram_mgr_clear_reset_blocks(struct amdgpu_device *adev)
{
struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
struct drm_buddy *mm = &mgr->mm;
mutex_lock(&mgr->lock);
drm_buddy_reset_clear(mm, false);
mutex_unlock(&mgr->lock);
}
/**
* amdgpu_vram_mgr_intersects - test each drm buddy block for intersection
*

View File

@ -1373,7 +1373,7 @@ static int ti_sn_bridge_probe(struct auxiliary_device *adev,
regmap_update_bits(pdata->regmap, SN_HPD_DISABLE_REG,
HPD_DISABLE, 0);
mutex_unlock(&pdata->comms_mutex);
};
}
drm_bridge_add(&pdata->bridge);

View File

@ -404,6 +404,49 @@ drm_get_buddy(struct drm_buddy_block *block)
}
EXPORT_SYMBOL(drm_get_buddy);
/**
* drm_buddy_reset_clear - reset blocks clear state
*
* @mm: DRM buddy manager
* @is_clear: blocks clear state
*
* Reset the clear state based on @is_clear value for each block
* in the freelist.
*/
void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear)
{
u64 root_size, size, start;
unsigned int order;
int i;
size = mm->size;
for (i = 0; i < mm->n_roots; ++i) {
order = ilog2(size) - ilog2(mm->chunk_size);
start = drm_buddy_block_offset(mm->roots[i]);
__force_merge(mm, start, start + size, order);
root_size = mm->chunk_size << order;
size -= root_size;
}
for (i = 0; i <= mm->max_order; ++i) {
struct drm_buddy_block *block;
list_for_each_entry_reverse(block, &mm->free_list[i], link) {
if (is_clear != drm_buddy_block_is_clear(block)) {
if (is_clear) {
mark_cleared(block);
mm->clear_avail += drm_buddy_block_size(mm, block);
} else {
clear_reset(block);
mm->clear_avail -= drm_buddy_block_size(mm, block);
}
}
}
}
}
EXPORT_SYMBOL(drm_buddy_reset_clear);
/**
* drm_buddy_free_block - free a block
*

View File

@ -230,7 +230,7 @@ void drm_gem_dma_free(struct drm_gem_dma_object *dma_obj)
if (drm_gem_is_imported(gem_obj)) {
if (dma_obj->vaddr)
dma_buf_vunmap_unlocked(gem_obj->dma_buf, &map);
dma_buf_vunmap_unlocked(gem_obj->import_attach->dmabuf, &map);
drm_prime_gem_destroy(gem_obj, dma_obj->sgt);
} else if (dma_obj->vaddr) {
if (dma_obj->map_noncoherent)

View File

@ -419,6 +419,7 @@ EXPORT_SYMBOL(drm_gem_fb_vunmap);
static void __drm_gem_fb_end_cpu_access(struct drm_framebuffer *fb, enum dma_data_direction dir,
unsigned int num_planes)
{
struct dma_buf_attachment *import_attach;
struct drm_gem_object *obj;
int ret;
@ -427,9 +428,10 @@ static void __drm_gem_fb_end_cpu_access(struct drm_framebuffer *fb, enum dma_dat
obj = drm_gem_fb_get_obj(fb, num_planes);
if (!obj)
continue;
import_attach = obj->import_attach;
if (!drm_gem_is_imported(obj))
continue;
ret = dma_buf_end_cpu_access(obj->dma_buf, dir);
ret = dma_buf_end_cpu_access(import_attach->dmabuf, dir);
if (ret)
drm_err(fb->dev, "dma_buf_end_cpu_access(%u, %d) failed: %d\n",
ret, num_planes, dir);
@ -452,6 +454,7 @@ static void __drm_gem_fb_end_cpu_access(struct drm_framebuffer *fb, enum dma_dat
*/
int drm_gem_fb_begin_cpu_access(struct drm_framebuffer *fb, enum dma_data_direction dir)
{
struct dma_buf_attachment *import_attach;
struct drm_gem_object *obj;
unsigned int i;
int ret;
@ -462,9 +465,10 @@ int drm_gem_fb_begin_cpu_access(struct drm_framebuffer *fb, enum dma_data_direct
ret = -EINVAL;
goto err___drm_gem_fb_end_cpu_access;
}
import_attach = obj->import_attach;
if (!drm_gem_is_imported(obj))
continue;
ret = dma_buf_begin_cpu_access(obj->dma_buf, dir);
ret = dma_buf_begin_cpu_access(import_attach->dmabuf, dir);
if (ret)
goto err___drm_gem_fb_end_cpu_access;
}

View File

@ -349,7 +349,7 @@ int drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem,
int ret = 0;
if (drm_gem_is_imported(obj)) {
ret = dma_buf_vmap(obj->dma_buf, map);
ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
} else {
pgprot_t prot = PAGE_KERNEL;
@ -409,7 +409,7 @@ void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
struct drm_gem_object *obj = &shmem->base;
if (drm_gem_is_imported(obj)) {
dma_buf_vunmap(obj->dma_buf, map);
dma_buf_vunmap(obj->import_attach->dmabuf, map);
} else {
dma_resv_assert_held(shmem->base.resv);

View File

@ -453,7 +453,13 @@ struct dma_buf *drm_gem_prime_handle_to_dmabuf(struct drm_device *dev,
}
mutex_lock(&dev->object_name_lock);
/* re-export the original imported/exported object */
/* re-export the original imported object */
if (obj->import_attach) {
dmabuf = obj->import_attach->dmabuf;
get_dma_buf(dmabuf);
goto out_have_obj;
}
if (obj->dma_buf) {
get_dma_buf(obj->dma_buf);
dmabuf = obj->dma_buf;

View File

@ -65,7 +65,7 @@ static void etnaviv_gem_prime_release(struct etnaviv_gem_object *etnaviv_obj)
struct iosys_map map = IOSYS_MAP_INIT_VADDR(etnaviv_obj->vaddr);
if (etnaviv_obj->vaddr)
dma_buf_vunmap_unlocked(etnaviv_obj->base.dma_buf, &map);
dma_buf_vunmap_unlocked(etnaviv_obj->base.import_attach->dmabuf, &map);
/* Don't drop the pages for imported dmabuf, as they are not
* ours, just free the array we allocated:
@ -82,7 +82,7 @@ static void *etnaviv_gem_prime_vmap_impl(struct etnaviv_gem_object *etnaviv_obj)
lockdep_assert_held(&etnaviv_obj->lock);
ret = dma_buf_vmap(etnaviv_obj->base.dma_buf, &map);
ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
if (ret)
return NULL;
return map.vaddr;

View File

@ -7061,7 +7061,8 @@ static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_stat
struct drm_i915_private *i915 = to_i915(intel_state->base.dev);
struct drm_plane *plane;
struct drm_plane_state *new_plane_state;
int ret, i;
long ret;
int i;
for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, i) {
if (new_plane_state->fence) {

View File

@ -1604,6 +1604,12 @@ int intel_dp_rate_select(struct intel_dp *intel_dp, int rate)
void intel_dp_compute_rate(struct intel_dp *intel_dp, int port_clock,
u8 *link_bw, u8 *rate_select)
{
struct intel_display *display = to_intel_display(intel_dp);
/* FIXME g4x can't generate an exact 2.7GHz with the 96MHz non-SSC refclk */
if (display->platform.g4x && port_clock == 268800)
port_clock = 270000;
/* eDP 1.4 rate select method. */
if (intel_dp->use_rate_select) {
*link_bw = 0;

View File

@ -1284,9 +1284,6 @@ nouveau_ioctls[] = {
DRM_IOCTL_DEF_DRV(NOUVEAU_EXEC, nouveau_exec_ioctl_exec, DRM_RENDER_ALLOW),
};
#define DRM_IOCTL_NOUVEAU_NVIF _IOC(_IOC_READ | _IOC_WRITE, DRM_IOCTL_BASE, \
DRM_COMMAND_BASE + DRM_NOUVEAU_NVIF, 0)
long
nouveau_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
@ -1300,10 +1297,14 @@ nouveau_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return ret;
}
if ((cmd & ~IOCSIZE_MASK) == DRM_IOCTL_NOUVEAU_NVIF)
switch (_IOC_NR(cmd) - DRM_COMMAND_BASE) {
case DRM_NOUVEAU_NVIF:
ret = nouveau_abi16_ioctl(filp, (void __user *)arg, _IOC_SIZE(cmd));
else
break;
default:
ret = drm_ioctl(file, cmd, arg);
break;
}
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);

View File

@ -39,6 +39,9 @@ nvif_chan_gpfifo_post(struct nvif_chan *chan)
const u32 pbptr = (chan->push.cur - map) + chan->func->gpfifo.post_size;
const u32 gpptr = (chan->gpfifo.cur + 1) & chan->gpfifo.max;
if (!chan->func->gpfifo.post)
return 0;
return chan->func->gpfifo.post(chan, gpptr, pbptr);
}

View File

@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_destroy);
/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
static void drm_sched_entity_clear_dep(struct dma_fence *f,
struct dma_fence_cb *cb)
{
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
entity->dependency = NULL;
dma_fence_put(f);
}
/*
* drm_sched_entity_wakeup - callback to clear the entity's dependency and
* wake up the scheduler
@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
drm_sched_entity_clear_dep(f, cb);
entity->dependency = NULL;
dma_fence_put(f);
drm_sched_wakeup(entity->rq->sched);
}
@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
entity->dependency = fence;
if (!dma_fence_add_callback(fence, &entity->cb,
drm_sched_entity_clear_dep))
return true;
/* Ignore it when it is already scheduled */
dma_fence_put(fence);
return false;
}
if (!dma_fence_add_callback(entity->dependency, &entity->cb,

View File

@ -204,15 +204,16 @@ static void virtgpu_dma_buf_free_obj(struct drm_gem_object *obj)
{
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
struct virtio_gpu_device *vgdev = obj->dev->dev_private;
struct dma_buf_attachment *attach = obj->import_attach;
if (drm_gem_is_imported(obj)) {
struct dma_buf *dmabuf = obj->dma_buf;
struct dma_buf *dmabuf = attach->dmabuf;
dma_resv_lock(dmabuf->resv, NULL);
virtgpu_dma_buf_unmap(bo);
dma_resv_unlock(dmabuf->resv);
dma_buf_detach(dmabuf, obj->import_attach);
dma_buf_detach(dmabuf, attach);
dma_buf_put(dmabuf);
}

View File

@ -85,10 +85,10 @@ static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
int ret;
if (drm_gem_is_imported(obj)) {
ret = dma_buf_vmap(obj->dma_buf, map);
ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
if (!ret) {
if (drm_WARN_ON(obj->dev, map->is_iomem)) {
dma_buf_vunmap(obj->dma_buf, map);
dma_buf_vunmap(obj->import_attach->dmabuf, map);
return -EIO;
}
}
@ -102,7 +102,7 @@ static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
{
if (drm_gem_is_imported(obj))
dma_buf_vunmap(obj->dma_buf, map);
dma_buf_vunmap(obj->import_attach->dmabuf, map);
else
drm_gem_ttm_vunmap(obj, map);
}

View File

@ -24,7 +24,7 @@
extern struct fault_attr gt_reset_failure;
static inline bool xe_fault_inject_gt_reset(void)
{
return should_fail(&gt_reset_failure, 1);
return IS_ENABLED(CONFIG_DEBUG_FS) && should_fail(&gt_reset_failure, 1);
}
struct xe_gt *xe_gt_alloc(struct xe_tile *tile);

View File

@ -452,8 +452,10 @@ static int qup_i2c_bus_active(struct qup_i2c_dev *qup, int len)
if (!(status & I2C_STATUS_BUS_ACTIVE))
break;
if (time_after(jiffies, timeout))
if (time_after(jiffies, timeout)) {
ret = -ETIMEDOUT;
break;
}
usleep_range(len, len * 2);
}

View File

@ -607,7 +607,6 @@ static int tegra_i2c_wait_for_config_load(struct tegra_i2c_dev *i2c_dev)
static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
{
u32 val, clk_divisor, clk_multiplier, tsu_thd, tlow, thigh, non_hs_mode;
acpi_handle handle = ACPI_HANDLE(i2c_dev->dev);
struct i2c_timings *t = &i2c_dev->timings;
int err;
@ -619,11 +618,7 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
* emit a noisy warning on error, which won't stay unnoticed and
* won't hose machine entirely.
*/
if (handle)
err = acpi_evaluate_object(handle, "_RST", NULL, NULL);
else
err = reset_control_reset(i2c_dev->rst);
err = device_reset(i2c_dev->dev);
WARN_ON_ONCE(err);
if (IS_DVC(i2c_dev))
@ -1666,19 +1661,6 @@ static void tegra_i2c_parse_dt(struct tegra_i2c_dev *i2c_dev)
i2c_dev->is_vi = true;
}
static int tegra_i2c_init_reset(struct tegra_i2c_dev *i2c_dev)
{
if (ACPI_HANDLE(i2c_dev->dev))
return 0;
i2c_dev->rst = devm_reset_control_get_exclusive(i2c_dev->dev, "i2c");
if (IS_ERR(i2c_dev->rst))
return dev_err_probe(i2c_dev->dev, PTR_ERR(i2c_dev->rst),
"failed to get reset control\n");
return 0;
}
static int tegra_i2c_init_clocks(struct tegra_i2c_dev *i2c_dev)
{
int err;
@ -1788,10 +1770,6 @@ static int tegra_i2c_probe(struct platform_device *pdev)
tegra_i2c_parse_dt(i2c_dev);
err = tegra_i2c_init_reset(i2c_dev);
if (err)
return err;
err = tegra_i2c_init_clocks(i2c_dev);
if (err)
return err;

View File

@ -116,15 +116,16 @@ static int virtio_i2c_complete_reqs(struct virtqueue *vq,
for (i = 0; i < num; i++) {
struct virtio_i2c_req *req = &reqs[i];
wait_for_completion(&req->completion);
if (!failed && req->in_hdr.status != VIRTIO_I2C_MSG_OK)
failed = true;
if (!failed) {
if (wait_for_completion_interruptible(&req->completion))
failed = true;
else if (req->in_hdr.status != VIRTIO_I2C_MSG_OK)
failed = true;
else
j++;
}
i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed);
if (!failed)
j++;
}
return j;

View File

@ -145,13 +145,16 @@ void can_change_state(struct net_device *dev, struct can_frame *cf,
EXPORT_SYMBOL_GPL(can_change_state);
/* CAN device restart for bus-off recovery */
static void can_restart(struct net_device *dev)
static int can_restart(struct net_device *dev)
{
struct can_priv *priv = netdev_priv(dev);
struct sk_buff *skb;
struct can_frame *cf;
int err;
if (!priv->do_set_mode)
return -EOPNOTSUPP;
if (netif_carrier_ok(dev))
netdev_err(dev, "Attempt to restart for bus-off recovery, but carrier is OK?\n");
@ -173,10 +176,14 @@ static void can_restart(struct net_device *dev)
if (err) {
netdev_err(dev, "Restart failed, error %pe\n", ERR_PTR(err));
netif_carrier_off(dev);
return err;
} else {
netdev_dbg(dev, "Restarted\n");
priv->can_stats.restarts++;
}
return 0;
}
static void can_restart_work(struct work_struct *work)
@ -201,9 +208,8 @@ int can_restart_now(struct net_device *dev)
return -EBUSY;
cancel_delayed_work_sync(&priv->restart_work);
can_restart(dev);
return 0;
return can_restart(dev);
}
/* CAN bus-off

View File

@ -285,6 +285,12 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
}
if (data[IFLA_CAN_RESTART_MS]) {
if (!priv->do_set_mode) {
NL_SET_ERR_MSG(extack,
"Device doesn't support restart from Bus Off");
return -EOPNOTSUPP;
}
/* Do not allow changing restart delay while running */
if (dev->flags & IFF_UP)
return -EBUSY;
@ -292,6 +298,12 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
}
if (data[IFLA_CAN_RESTART]) {
if (!priv->do_set_mode) {
NL_SET_ERR_MSG(extack,
"Device doesn't support restart from Bus Off");
return -EOPNOTSUPP;
}
/* Do not allow a restart while not running */
if (!(dev->flags & IFF_UP))
return -EINVAL;

View File

@ -818,6 +818,9 @@ static void bcmasp_init_tx(struct bcmasp_intf *intf)
/* Tx SPB */
tx_spb_ctrl_wl(intf, ((intf->channel + 8) << TX_SPB_CTRL_XF_BID_SHIFT),
TX_SPB_CTRL_XF_CTRL2);
if (intf->parent->tx_chan_offset)
tx_pause_ctrl_wl(intf, (1 << (intf->channel + 8)), TX_PAUSE_MAP_VECTOR);
tx_spb_top_wl(intf, 0x1e, TX_SPB_TOP_BLKOUT);
tx_spb_dma_wq(intf, intf->tx_spb_dma_addr, TX_SPB_DMA_READ);

View File

@ -4666,12 +4666,19 @@ static int dpaa2_eth_connect_mac(struct dpaa2_eth_priv *priv)
return PTR_ERR(dpmac_dev);
}
if (IS_ERR(dpmac_dev) || dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type)
if (IS_ERR(dpmac_dev))
return 0;
if (dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type) {
err = 0;
goto out_put_device;
}
mac = kzalloc(sizeof(struct dpaa2_mac), GFP_KERNEL);
if (!mac)
return -ENOMEM;
if (!mac) {
err = -ENOMEM;
goto out_put_device;
}
mac->mc_dev = dpmac_dev;
mac->mc_io = priv->mc_io;
@ -4705,6 +4712,8 @@ static int dpaa2_eth_connect_mac(struct dpaa2_eth_priv *priv)
dpaa2_mac_close(mac);
err_free_mac:
kfree(mac);
out_put_device:
put_device(&dpmac_dev->dev);
return err;
}

View File

@ -1448,12 +1448,19 @@ static int dpaa2_switch_port_connect_mac(struct ethsw_port_priv *port_priv)
if (PTR_ERR(dpmac_dev) == -EPROBE_DEFER)
return PTR_ERR(dpmac_dev);
if (IS_ERR(dpmac_dev) || dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type)
if (IS_ERR(dpmac_dev))
return 0;
if (dpmac_dev->dev.type != &fsl_mc_bus_dpmac_type) {
err = 0;
goto out_put_device;
}
mac = kzalloc(sizeof(*mac), GFP_KERNEL);
if (!mac)
return -ENOMEM;
if (!mac) {
err = -ENOMEM;
goto out_put_device;
}
mac->mc_dev = dpmac_dev;
mac->mc_io = port_priv->ethsw_data->mc_io;
@ -1483,6 +1490,8 @@ static int dpaa2_switch_port_connect_mac(struct ethsw_port_priv *port_priv)
dpaa2_mac_close(mac);
err_free_mac:
kfree(mac);
out_put_device:
put_device(&dpmac_dev->dev);
return err;
}

View File

@ -1917,49 +1917,56 @@ static void gve_turnup_and_check_status(struct gve_priv *priv)
gve_handle_link_status(priv, GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
}
static void gve_tx_timeout(struct net_device *dev, unsigned int txqueue)
static struct gve_notify_block *gve_get_tx_notify_block(struct gve_priv *priv,
unsigned int txqueue)
{
struct gve_notify_block *block;
struct gve_tx_ring *tx = NULL;
struct gve_priv *priv;
u32 last_nic_done;
u32 current_time;
u32 ntfy_idx;
netdev_info(dev, "Timeout on tx queue, %d", txqueue);
priv = netdev_priv(dev);
if (txqueue > priv->tx_cfg.num_queues)
goto reset;
return NULL;
ntfy_idx = gve_tx_idx_to_ntfy(priv, txqueue);
if (ntfy_idx >= priv->num_ntfy_blks)
goto reset;
return NULL;
block = &priv->ntfy_blocks[ntfy_idx];
tx = block->tx;
return &priv->ntfy_blocks[ntfy_idx];
}
static bool gve_tx_timeout_try_q_kick(struct gve_priv *priv,
unsigned int txqueue)
{
struct gve_notify_block *block;
u32 current_time;
block = gve_get_tx_notify_block(priv, txqueue);
if (!block)
return false;
current_time = jiffies_to_msecs(jiffies);
if (tx->last_kick_msec + MIN_TX_TIMEOUT_GAP > current_time)
goto reset;
if (block->tx->last_kick_msec + MIN_TX_TIMEOUT_GAP > current_time)
return false;
/* Check to see if there are missed completions, which will allow us to
* kick the queue.
*/
last_nic_done = gve_tx_load_event_counter(priv, tx);
if (last_nic_done - tx->done) {
netdev_info(dev, "Kicking queue %d", txqueue);
iowrite32be(GVE_IRQ_MASK, gve_irq_doorbell(priv, block));
napi_schedule(&block->napi);
tx->last_kick_msec = current_time;
goto out;
} // Else reset.
netdev_info(priv->dev, "Kicking queue %d", txqueue);
napi_schedule(&block->napi);
block->tx->last_kick_msec = current_time;
return true;
}
reset:
gve_schedule_reset(priv);
static void gve_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct gve_notify_block *block;
struct gve_priv *priv;
out:
if (tx)
tx->queue_timeout++;
netdev_info(dev, "Timeout on tx queue, %d", txqueue);
priv = netdev_priv(dev);
if (!gve_tx_timeout_try_q_kick(priv, txqueue))
gve_schedule_reset(priv);
block = gve_get_tx_notify_block(priv, txqueue);
if (block)
block->tx->queue_timeout++;
priv->tx_timeo_cnt++;
}

View File

@ -11,6 +11,7 @@
#include <linux/irq.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/iommu.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/skbuff.h>
@ -1039,6 +1040,8 @@ static bool hns3_can_use_tx_sgl(struct hns3_enet_ring *ring,
static void hns3_init_tx_spare_buffer(struct hns3_enet_ring *ring)
{
u32 alloc_size = ring->tqp->handle->kinfo.tx_spare_buf_size;
struct net_device *netdev = ring_to_netdev(ring);
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hns3_tx_spare *tx_spare;
struct page *page;
dma_addr_t dma;
@ -1080,6 +1083,7 @@ static void hns3_init_tx_spare_buffer(struct hns3_enet_ring *ring)
tx_spare->buf = page_address(page);
tx_spare->len = PAGE_SIZE << order;
ring->tx_spare = tx_spare;
ring->tx_copybreak = priv->tx_copybreak;
return;
dma_mapping_error:
@ -4874,6 +4878,30 @@ static void hns3_nic_dealloc_vector_data(struct hns3_nic_priv *priv)
devm_kfree(&pdev->dev, priv->tqp_vector);
}
static void hns3_update_tx_spare_buf_config(struct hns3_nic_priv *priv)
{
#define HNS3_MIN_SPARE_BUF_SIZE (2 * 1024 * 1024)
#define HNS3_MAX_PACKET_SIZE (64 * 1024)
struct iommu_domain *domain = iommu_get_domain_for_dev(priv->dev);
struct hnae3_ae_dev *ae_dev = hns3_get_ae_dev(priv->ae_handle);
struct hnae3_handle *handle = priv->ae_handle;
if (ae_dev->dev_version < HNAE3_DEVICE_VERSION_V3)
return;
if (!(domain && iommu_is_dma_domain(domain)))
return;
priv->min_tx_copybreak = HNS3_MAX_PACKET_SIZE;
priv->min_tx_spare_buf_size = HNS3_MIN_SPARE_BUF_SIZE;
if (priv->tx_copybreak < priv->min_tx_copybreak)
priv->tx_copybreak = priv->min_tx_copybreak;
if (handle->kinfo.tx_spare_buf_size < priv->min_tx_spare_buf_size)
handle->kinfo.tx_spare_buf_size = priv->min_tx_spare_buf_size;
}
static void hns3_ring_get_cfg(struct hnae3_queue *q, struct hns3_nic_priv *priv,
unsigned int ring_type)
{
@ -5107,6 +5135,7 @@ int hns3_init_all_ring(struct hns3_nic_priv *priv)
int i, j;
int ret;
hns3_update_tx_spare_buf_config(priv);
for (i = 0; i < ring_num; i++) {
ret = hns3_alloc_ring_memory(&priv->ring[i]);
if (ret) {
@ -5311,6 +5340,8 @@ static int hns3_client_init(struct hnae3_handle *handle)
priv->ae_handle = handle;
priv->tx_timeout_count = 0;
priv->max_non_tso_bd_num = ae_dev->dev_specs.max_non_tso_bd_num;
priv->min_tx_copybreak = 0;
priv->min_tx_spare_buf_size = 0;
set_bit(HNS3_NIC_STATE_DOWN, &priv->state);
handle->msg_enable = netif_msg_init(debug, DEFAULT_MSG_LEVEL);

View File

@ -596,6 +596,8 @@ struct hns3_nic_priv {
struct hns3_enet_coalesce rx_coal;
u32 tx_copybreak;
u32 rx_copybreak;
u32 min_tx_copybreak;
u32 min_tx_spare_buf_size;
};
union l3_hdr_info {

View File

@ -9576,33 +9576,36 @@ static bool hclge_need_enable_vport_vlan_filter(struct hclge_vport *vport)
return false;
}
int hclge_enable_vport_vlan_filter(struct hclge_vport *vport, bool request_en)
static int __hclge_enable_vport_vlan_filter(struct hclge_vport *vport,
bool request_en)
{
struct hclge_dev *hdev = vport->back;
bool need_en;
int ret;
mutex_lock(&hdev->vport_lock);
vport->req_vlan_fltr_en = request_en;
need_en = hclge_need_enable_vport_vlan_filter(vport);
if (need_en == vport->cur_vlan_fltr_en) {
mutex_unlock(&hdev->vport_lock);
if (need_en == vport->cur_vlan_fltr_en)
return 0;
}
ret = hclge_set_vport_vlan_filter(vport, need_en);
if (ret) {
mutex_unlock(&hdev->vport_lock);
if (ret)
return ret;
}
vport->cur_vlan_fltr_en = need_en;
return 0;
}
int hclge_enable_vport_vlan_filter(struct hclge_vport *vport, bool request_en)
{
struct hclge_dev *hdev = vport->back;
int ret;
mutex_lock(&hdev->vport_lock);
vport->req_vlan_fltr_en = request_en;
ret = __hclge_enable_vport_vlan_filter(vport, request_en);
mutex_unlock(&hdev->vport_lock);
return 0;
return ret;
}
static int hclge_enable_vlan_filter(struct hnae3_handle *handle, bool enable)
@ -10623,16 +10626,19 @@ static void hclge_sync_vlan_fltr_state(struct hclge_dev *hdev)
&vport->state))
continue;
ret = hclge_enable_vport_vlan_filter(vport,
vport->req_vlan_fltr_en);
mutex_lock(&hdev->vport_lock);
ret = __hclge_enable_vport_vlan_filter(vport,
vport->req_vlan_fltr_en);
if (ret) {
dev_err(&hdev->pdev->dev,
"failed to sync vlan filter state for vport%u, ret = %d\n",
vport->vport_id, ret);
set_bit(HCLGE_VPORT_STATE_VLAN_FLTR_CHANGE,
&vport->state);
mutex_unlock(&hdev->vport_lock);
return;
}
mutex_unlock(&hdev->vport_lock);
}
}

View File

@ -497,14 +497,14 @@ int hclge_ptp_init(struct hclge_dev *hdev)
if (ret) {
dev_err(&hdev->pdev->dev,
"failed to init freq, ret = %d\n", ret);
goto out;
goto out_clear_int;
}
ret = hclge_ptp_set_ts_mode(hdev, &hdev->ptp->ts_cfg);
if (ret) {
dev_err(&hdev->pdev->dev,
"failed to init ts mode, ret = %d\n", ret);
goto out;
goto out_clear_int;
}
ktime_get_real_ts64(&ts);
@ -512,7 +512,7 @@ int hclge_ptp_init(struct hclge_dev *hdev)
if (ret) {
dev_err(&hdev->pdev->dev,
"failed to init ts time, ret = %d\n", ret);
goto out;
goto out_clear_int;
}
set_bit(HCLGE_STATE_PTP_EN, &hdev->state);
@ -520,6 +520,9 @@ int hclge_ptp_init(struct hclge_dev *hdev)
return 0;
out_clear_int:
clear_bit(HCLGE_PTP_FLAG_EN, &hdev->ptp->flags);
hclge_ptp_int_en(hdev, false);
out:
hclge_ptp_destroy_clock(hdev);

View File

@ -3094,11 +3094,7 @@ static void hclgevf_uninit_ae_dev(struct hnae3_ae_dev *ae_dev)
static u32 hclgevf_get_max_channels(struct hclgevf_dev *hdev)
{
struct hnae3_handle *nic = &hdev->nic;
struct hnae3_knic_private_info *kinfo = &nic->kinfo;
return min_t(u32, hdev->rss_size_max,
hdev->num_tqps / kinfo->tc_info.num_tc);
return min(hdev->rss_size_max, hdev->num_tqps);
}
/**

View File

@ -638,6 +638,9 @@
/* For checksumming, the sum of all words in the NVM should equal 0xBABA. */
#define NVM_SUM 0xBABA
/* Uninitialized ("empty") checksum word value */
#define NVM_CHECKSUM_UNINITIALIZED 0xFFFF
/* PBA (printed board assembly) number words */
#define NVM_PBA_OFFSET_0 8
#define NVM_PBA_OFFSET_1 9

View File

@ -4274,6 +4274,8 @@ static s32 e1000_validate_nvm_checksum_ich8lan(struct e1000_hw *hw)
ret_val = e1000e_update_nvm_checksum(hw);
if (ret_val)
return ret_val;
} else if (hw->mac.type == e1000_pch_tgp) {
return 0;
}
}

View File

@ -558,6 +558,12 @@ s32 e1000e_validate_nvm_checksum_generic(struct e1000_hw *hw)
checksum += nvm_data;
}
if (hw->mac.type == e1000_pch_tgp &&
nvm_data == NVM_CHECKSUM_UNINITIALIZED) {
e_dbg("Uninitialized NVM Checksum on TGP platform - ignoring\n");
return 0;
}
if (checksum != (u16)NVM_SUM) {
e_dbg("NVM Checksum Invalid\n");
return -E1000_ERR_NVM;

View File

@ -3137,10 +3137,10 @@ static int i40e_vc_del_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
const u8 *addr = al->list[i].addr;
/* Allow to delete VF primary MAC only if it was not set
* administratively by PF or if VF is trusted.
* administratively by PF.
*/
if (ether_addr_equal(addr, vf->default_lan_addr.addr)) {
if (i40e_can_vf_change_mac(vf))
if (!vf->pf_set_mac)
was_unimac_deleted = true;
else
continue;
@ -5006,7 +5006,7 @@ int i40e_get_vf_stats(struct net_device *netdev, int vf_id,
vf_stats->broadcast = stats->rx_broadcast;
vf_stats->multicast = stats->rx_multicast;
vf_stats->rx_dropped = stats->rx_discards + stats->rx_discards_other;
vf_stats->tx_dropped = stats->tx_discards;
vf_stats->tx_dropped = stats->tx_errors;
return 0;
}

View File

@ -2301,6 +2301,8 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
return ICE_DDP_PKG_ERR;
buf_copy = devm_kmemdup(ice_hw_to_dev(hw), buf, len, GFP_KERNEL);
if (!buf_copy)
return ICE_DDP_PKG_ERR;
state = ice_init_pkg(hw, buf_copy, len);
if (!ice_is_init_pkg_successful(state)) {

View File

@ -1947,8 +1947,8 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
err = mlx5_cmd_invoke(dev, inb, outb, out, out_size, callback, context,
pages_queue, token, force_polling);
if (callback)
return err;
if (callback && !err)
return 0;
if (err > 0) /* Failed in FW, command didn't execute */
err = deliv_status_to_err(err);

View File

@ -1182,19 +1182,19 @@ static void esw_set_peer_miss_rule_source_port(struct mlx5_eswitch *esw,
static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
struct mlx5_core_dev *peer_dev)
{
struct mlx5_eswitch *peer_esw = peer_dev->priv.eswitch;
struct mlx5_flow_destination dest = {};
struct mlx5_flow_act flow_act = {0};
struct mlx5_flow_handle **flows;
/* total vports is the same for both e-switches */
int nvports = esw->total_vports;
struct mlx5_flow_handle *flow;
struct mlx5_vport *peer_vport;
struct mlx5_flow_spec *spec;
struct mlx5_vport *vport;
int err, pfindex;
unsigned long i;
void *misc;
if (!MLX5_VPORT_MANAGER(esw->dev) && !mlx5_core_is_ecpf_esw_manager(esw->dev))
if (!MLX5_VPORT_MANAGER(peer_dev) &&
!mlx5_core_is_ecpf_esw_manager(peer_dev))
return 0;
spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
@ -1203,7 +1203,7 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
peer_miss_rules_setup(esw, peer_dev, spec, &dest);
flows = kvcalloc(nvports, sizeof(*flows), GFP_KERNEL);
flows = kvcalloc(peer_esw->total_vports, sizeof(*flows), GFP_KERNEL);
if (!flows) {
err = -ENOMEM;
goto alloc_flows_err;
@ -1213,10 +1213,10 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
misc_parameters);
if (mlx5_core_is_ecpf_esw_manager(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
esw_set_peer_miss_rule_source_port(esw, peer_dev->priv.eswitch,
spec, MLX5_VPORT_PF);
if (mlx5_core_is_ecpf_esw_manager(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF);
esw_set_peer_miss_rule_source_port(esw, peer_esw, spec,
MLX5_VPORT_PF);
flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw),
spec, &flow_act, &dest, 1);
@ -1224,11 +1224,11 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
err = PTR_ERR(flow);
goto add_pf_flow_err;
}
flows[vport->index] = flow;
flows[peer_vport->index] = flow;
}
if (mlx5_ecpf_vport_exists(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF);
if (mlx5_ecpf_vport_exists(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF);
MLX5_SET(fte_match_set_misc, misc, source_port, MLX5_VPORT_ECPF);
flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw),
spec, &flow_act, &dest, 1);
@ -1236,13 +1236,14 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
err = PTR_ERR(flow);
goto add_ecpf_flow_err;
}
flows[vport->index] = flow;
flows[peer_vport->index] = flow;
}
mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) {
mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_vfs(peer_dev)) {
esw_set_peer_miss_rule_source_port(esw,
peer_dev->priv.eswitch,
spec, vport->vport);
peer_esw,
spec, peer_vport->vport);
flow = mlx5_add_flow_rules(mlx5_eswitch_get_slow_fdb(esw),
spec, &flow_act, &dest, 1);
@ -1250,22 +1251,22 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
err = PTR_ERR(flow);
goto add_vf_flow_err;
}
flows[vport->index] = flow;
flows[peer_vport->index] = flow;
}
if (mlx5_core_ec_sriov_enabled(esw->dev)) {
mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
if (i >= mlx5_core_max_ec_vfs(peer_dev))
break;
esw_set_peer_miss_rule_source_port(esw, peer_dev->priv.eswitch,
spec, vport->vport);
if (mlx5_core_ec_sriov_enabled(peer_dev)) {
mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_ec_vfs(peer_dev)) {
esw_set_peer_miss_rule_source_port(esw, peer_esw,
spec,
peer_vport->vport);
flow = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
spec, &flow_act, &dest, 1);
if (IS_ERR(flow)) {
err = PTR_ERR(flow);
goto add_ec_vf_flow_err;
}
flows[vport->index] = flow;
flows[peer_vport->index] = flow;
}
}
@ -1282,25 +1283,27 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
return 0;
add_ec_vf_flow_err:
mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
if (!flows[vport->index])
mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_ec_vfs(peer_dev)) {
if (!flows[peer_vport->index])
continue;
mlx5_del_flow_rules(flows[vport->index]);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
add_vf_flow_err:
mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev)) {
if (!flows[vport->index])
mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_vfs(peer_dev)) {
if (!flows[peer_vport->index])
continue;
mlx5_del_flow_rules(flows[vport->index]);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
if (mlx5_ecpf_vport_exists(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF);
mlx5_del_flow_rules(flows[vport->index]);
if (mlx5_ecpf_vport_exists(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
add_ecpf_flow_err:
if (mlx5_core_is_ecpf_esw_manager(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
mlx5_del_flow_rules(flows[vport->index]);
if (mlx5_core_is_ecpf_esw_manager(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
add_pf_flow_err:
esw_warn(esw->dev, "FDB: Failed to add peer miss flow rule err %d\n", err);
@ -1313,37 +1316,34 @@ static int esw_add_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
static void esw_del_fdb_peer_miss_rules(struct mlx5_eswitch *esw,
struct mlx5_core_dev *peer_dev)
{
struct mlx5_eswitch *peer_esw = peer_dev->priv.eswitch;
u16 peer_index = mlx5_get_dev_index(peer_dev);
struct mlx5_flow_handle **flows;
struct mlx5_vport *vport;
struct mlx5_vport *peer_vport;
unsigned long i;
flows = esw->fdb_table.offloads.peer_miss_rules[peer_index];
if (!flows)
return;
if (mlx5_core_ec_sriov_enabled(esw->dev)) {
mlx5_esw_for_each_ec_vf_vport(esw, i, vport, mlx5_core_max_ec_vfs(esw->dev)) {
/* The flow for a particular vport could be NULL if the other ECPF
* has fewer or no VFs enabled
*/
if (!flows[vport->index])
continue;
mlx5_del_flow_rules(flows[vport->index]);
}
if (mlx5_core_ec_sriov_enabled(peer_dev)) {
mlx5_esw_for_each_ec_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_ec_vfs(peer_dev))
mlx5_del_flow_rules(flows[peer_vport->index]);
}
mlx5_esw_for_each_vf_vport(esw, i, vport, mlx5_core_max_vfs(esw->dev))
mlx5_del_flow_rules(flows[vport->index]);
mlx5_esw_for_each_vf_vport(peer_esw, i, peer_vport,
mlx5_core_max_vfs(peer_dev))
mlx5_del_flow_rules(flows[peer_vport->index]);
if (mlx5_ecpf_vport_exists(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF);
mlx5_del_flow_rules(flows[vport->index]);
if (mlx5_ecpf_vport_exists(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_ECPF);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
if (mlx5_core_is_ecpf_esw_manager(esw->dev)) {
vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
mlx5_del_flow_rules(flows[vport->index]);
if (mlx5_core_is_ecpf_esw_manager(peer_dev)) {
peer_vport = mlx5_eswitch_get_vport(peer_esw, MLX5_VPORT_PF);
mlx5_del_flow_rules(flows[peer_vport->index]);
}
kvfree(flows);

View File

@ -288,8 +288,12 @@ static int prueth_fw_offload_buffer_setup(struct prueth_emac *emac)
int i;
addr = lower_32_bits(prueth->msmcram.pa);
if (slice)
addr += PRUETH_NUM_BUF_POOLS * PRUETH_EMAC_BUF_POOL_SIZE;
if (slice) {
if (prueth->pdata.banked_ms_ram)
addr += MSMC_RAM_BANK_SIZE;
else
addr += PRUETH_SW_TOTAL_BUF_SIZE_PER_SLICE;
}
if (addr % SZ_64K) {
dev_warn(prueth->dev, "buffer pool needs to be 64KB aligned\n");
@ -297,43 +301,66 @@ static int prueth_fw_offload_buffer_setup(struct prueth_emac *emac)
}
bpool_cfg = emac->dram.va + BUFFER_POOL_0_ADDR_OFFSET;
/* workaround for f/w bug. bpool 0 needs to be initialized */
for (i = 0; i < PRUETH_NUM_BUF_POOLS; i++) {
/* Configure buffer pools for forwarding buffers
* - used by firmware to store packets to be forwarded to other port
* - 8 total pools per slice
*/
for (i = 0; i < PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE; i++) {
writel(addr, &bpool_cfg[i].addr);
writel(PRUETH_EMAC_BUF_POOL_SIZE, &bpool_cfg[i].len);
addr += PRUETH_EMAC_BUF_POOL_SIZE;
writel(PRUETH_SW_FWD_BUF_POOL_SIZE, &bpool_cfg[i].len);
addr += PRUETH_SW_FWD_BUF_POOL_SIZE;
}
if (!slice)
addr += PRUETH_NUM_BUF_POOLS * PRUETH_EMAC_BUF_POOL_SIZE;
else
addr += PRUETH_SW_NUM_BUF_POOLS_HOST * PRUETH_SW_BUF_POOL_SIZE_HOST;
/* Configure buffer pools for Local Injection buffers
* - used by firmware to store packets received from host core
* - 16 total pools per slice
*/
for (i = 0; i < PRUETH_NUM_LI_BUF_POOLS_PER_SLICE; i++) {
int cfg_idx = i + PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE;
for (i = PRUETH_NUM_BUF_POOLS;
i < 2 * PRUETH_SW_NUM_BUF_POOLS_HOST + PRUETH_NUM_BUF_POOLS;
i++) {
/* The driver only uses first 4 queues per PRU so only initialize them */
if (i % PRUETH_SW_NUM_BUF_POOLS_HOST < PRUETH_SW_NUM_BUF_POOLS_PER_PRU) {
writel(addr, &bpool_cfg[i].addr);
writel(PRUETH_SW_BUF_POOL_SIZE_HOST, &bpool_cfg[i].len);
addr += PRUETH_SW_BUF_POOL_SIZE_HOST;
/* The driver only uses first 4 queues per PRU,
* so only initialize buffer for them
*/
if ((i % PRUETH_NUM_LI_BUF_POOLS_PER_PORT_PER_SLICE)
< PRUETH_SW_USED_LI_BUF_POOLS_PER_PORT_PER_SLICE) {
writel(addr, &bpool_cfg[cfg_idx].addr);
writel(PRUETH_SW_LI_BUF_POOL_SIZE,
&bpool_cfg[cfg_idx].len);
addr += PRUETH_SW_LI_BUF_POOL_SIZE;
} else {
writel(0, &bpool_cfg[i].addr);
writel(0, &bpool_cfg[i].len);
writel(0, &bpool_cfg[cfg_idx].addr);
writel(0, &bpool_cfg[cfg_idx].len);
}
}
if (!slice)
addr += PRUETH_SW_NUM_BUF_POOLS_HOST * PRUETH_SW_BUF_POOL_SIZE_HOST;
else
addr += PRUETH_EMAC_RX_CTX_BUF_SIZE;
/* Express RX buffer queue
* - used by firmware to store express packets to be transmitted
* to the host core
*/
rxq_ctx = emac->dram.va + HOST_RX_Q_EXP_CONTEXT_OFFSET;
for (i = 0; i < 3; i++)
writel(addr, &rxq_ctx->start[i]);
addr += PRUETH_SW_HOST_EXP_BUF_POOL_SIZE;
writel(addr, &rxq_ctx->end);
/* Pre-emptible RX buffer queue
* - used by firmware to store preemptible packets to be transmitted
* to the host core
*/
rxq_ctx = emac->dram.va + HOST_RX_Q_PRE_CONTEXT_OFFSET;
for (i = 0; i < 3; i++)
writel(addr, &rxq_ctx->start[i]);
addr += PRUETH_EMAC_RX_CTX_BUF_SIZE;
writel(addr - SZ_2K, &rxq_ctx->end);
addr += PRUETH_SW_HOST_PRE_BUF_POOL_SIZE;
writel(addr, &rxq_ctx->end);
/* Set pointer for default dropped packet write
* - used by firmware to temporarily store packet to be dropped
*/
rxq_ctx = emac->dram.va + DEFAULT_MSMC_Q_OFFSET;
writel(addr, &rxq_ctx->start[0]);
return 0;
}
@ -347,13 +374,13 @@ static int prueth_emac_buffer_setup(struct prueth_emac *emac)
u32 addr;
int i;
/* Layout to have 64KB aligned buffer pool
* |BPOOL0|BPOOL1|RX_CTX0|RX_CTX1|
*/
addr = lower_32_bits(prueth->msmcram.pa);
if (slice)
addr += PRUETH_NUM_BUF_POOLS * PRUETH_EMAC_BUF_POOL_SIZE;
if (slice) {
if (prueth->pdata.banked_ms_ram)
addr += MSMC_RAM_BANK_SIZE;
else
addr += PRUETH_EMAC_TOTAL_BUF_SIZE_PER_SLICE;
}
if (addr % SZ_64K) {
dev_warn(prueth->dev, "buffer pool needs to be 64KB aligned\n");
@ -361,39 +388,66 @@ static int prueth_emac_buffer_setup(struct prueth_emac *emac)
}
bpool_cfg = emac->dram.va + BUFFER_POOL_0_ADDR_OFFSET;
/* workaround for f/w bug. bpool 0 needs to be initilalized */
writel(addr, &bpool_cfg[0].addr);
writel(0, &bpool_cfg[0].len);
for (i = PRUETH_EMAC_BUF_POOL_START;
i < PRUETH_EMAC_BUF_POOL_START + PRUETH_NUM_BUF_POOLS;
i++) {
writel(addr, &bpool_cfg[i].addr);
writel(PRUETH_EMAC_BUF_POOL_SIZE, &bpool_cfg[i].len);
addr += PRUETH_EMAC_BUF_POOL_SIZE;
/* Configure buffer pools for forwarding buffers
* - in mac mode - no forwarding so initialize all pools to 0
* - 8 total pools per slice
*/
for (i = 0; i < PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE; i++) {
writel(0, &bpool_cfg[i].addr);
writel(0, &bpool_cfg[i].len);
}
if (!slice)
addr += PRUETH_NUM_BUF_POOLS * PRUETH_EMAC_BUF_POOL_SIZE;
else
addr += PRUETH_EMAC_RX_CTX_BUF_SIZE * 2;
/* Configure buffer pools for Local Injection buffers
* - used by firmware to store packets received from host core
* - 16 total pools per slice
*/
bpool_cfg = emac->dram.va + BUFFER_POOL_0_ADDR_OFFSET;
for (i = 0; i < PRUETH_NUM_LI_BUF_POOLS_PER_SLICE; i++) {
int cfg_idx = i + PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE;
/* Pre-emptible RX buffer queue */
rxq_ctx = emac->dram.va + HOST_RX_Q_PRE_CONTEXT_OFFSET;
for (i = 0; i < 3; i++)
writel(addr, &rxq_ctx->start[i]);
/* In EMAC mode, only first 4 buffers are used,
* as 1 slice needs to handle only 1 port
*/
if (i < PRUETH_EMAC_USED_LI_BUF_POOLS_PER_PORT_PER_SLICE) {
writel(addr, &bpool_cfg[cfg_idx].addr);
writel(PRUETH_EMAC_LI_BUF_POOL_SIZE,
&bpool_cfg[cfg_idx].len);
addr += PRUETH_EMAC_LI_BUF_POOL_SIZE;
} else {
writel(0, &bpool_cfg[cfg_idx].addr);
writel(0, &bpool_cfg[cfg_idx].len);
}
}
addr += PRUETH_EMAC_RX_CTX_BUF_SIZE;
writel(addr, &rxq_ctx->end);
/* Express RX buffer queue */
/* Express RX buffer queue
* - used by firmware to store express packets to be transmitted
* to host core
*/
rxq_ctx = emac->dram.va + HOST_RX_Q_EXP_CONTEXT_OFFSET;
for (i = 0; i < 3; i++)
writel(addr, &rxq_ctx->start[i]);
addr += PRUETH_EMAC_RX_CTX_BUF_SIZE;
addr += PRUETH_EMAC_HOST_EXP_BUF_POOL_SIZE;
writel(addr, &rxq_ctx->end);
/* Pre-emptible RX buffer queue
* - used by firmware to store preemptible packets to be transmitted
* to host core
*/
rxq_ctx = emac->dram.va + HOST_RX_Q_PRE_CONTEXT_OFFSET;
for (i = 0; i < 3; i++)
writel(addr, &rxq_ctx->start[i]);
addr += PRUETH_EMAC_HOST_PRE_BUF_POOL_SIZE;
writel(addr, &rxq_ctx->end);
/* Set pointer for default dropped packet write
* - used by firmware to temporarily store packet to be dropped
*/
rxq_ctx = emac->dram.va + DEFAULT_MSMC_Q_OFFSET;
writel(addr, &rxq_ctx->start[0]);
return 0;
}

View File

@ -26,21 +26,71 @@ struct icssg_flow_cfg {
#define PRUETH_MAX_RX_FLOWS 1 /* excluding default flow */
#define PRUETH_RX_FLOW_DATA 0
#define PRUETH_EMAC_BUF_POOL_SIZE SZ_8K
#define PRUETH_EMAC_POOLS_PER_SLICE 24
#define PRUETH_EMAC_BUF_POOL_START 8
#define PRUETH_NUM_BUF_POOLS 8
#define PRUETH_EMAC_RX_CTX_BUF_SIZE SZ_16K /* per slice */
#define MSMC_RAM_SIZE \
(2 * (PRUETH_EMAC_BUF_POOL_SIZE * PRUETH_NUM_BUF_POOLS + \
PRUETH_EMAC_RX_CTX_BUF_SIZE * 2))
/* Defines for forwarding path buffer pools:
* - used by firmware to store packets to be forwarded to other port
* - 8 total pools per slice
* - only used in switch mode (as no forwarding in mac mode)
*/
#define PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE 8
#define PRUETH_SW_FWD_BUF_POOL_SIZE (SZ_8K)
#define PRUETH_SW_BUF_POOL_SIZE_HOST SZ_4K
#define PRUETH_SW_NUM_BUF_POOLS_HOST 8
#define PRUETH_SW_NUM_BUF_POOLS_PER_PRU 4
#define MSMC_RAM_SIZE_SWITCH_MODE \
(MSMC_RAM_SIZE + \
(2 * PRUETH_SW_BUF_POOL_SIZE_HOST * PRUETH_SW_NUM_BUF_POOLS_HOST))
/* Defines for local injection path buffer pools:
* - used by firmware to store packets received from host core
* - 16 total pools per slice
* - 8 pools per port per slice and each slice handles both ports
* - only 4 out of 8 pools used per port (as only 4 real QoS levels in ICSSG)
* - switch mode: 8 total pools used
* - mac mode: 4 total pools used
*/
#define PRUETH_NUM_LI_BUF_POOLS_PER_SLICE 16
#define PRUETH_NUM_LI_BUF_POOLS_PER_PORT_PER_SLICE 8
#define PRUETH_SW_LI_BUF_POOL_SIZE SZ_4K
#define PRUETH_SW_USED_LI_BUF_POOLS_PER_SLICE 8
#define PRUETH_SW_USED_LI_BUF_POOLS_PER_PORT_PER_SLICE 4
#define PRUETH_EMAC_LI_BUF_POOL_SIZE SZ_8K
#define PRUETH_EMAC_USED_LI_BUF_POOLS_PER_SLICE 4
#define PRUETH_EMAC_USED_LI_BUF_POOLS_PER_PORT_PER_SLICE 4
/* Defines for host egress path - express and preemptible buffers
* - used by firmware to store express and preemptible packets
* to be transmitted to host core
* - used by both mac/switch modes
*/
#define PRUETH_SW_HOST_EXP_BUF_POOL_SIZE SZ_16K
#define PRUETH_SW_HOST_PRE_BUF_POOL_SIZE (SZ_16K - SZ_2K)
#define PRUETH_EMAC_HOST_EXP_BUF_POOL_SIZE PRUETH_SW_HOST_EXP_BUF_POOL_SIZE
#define PRUETH_EMAC_HOST_PRE_BUF_POOL_SIZE PRUETH_SW_HOST_PRE_BUF_POOL_SIZE
/* Buffer used by firmware to temporarily store packet to be dropped */
#define PRUETH_SW_DROP_PKT_BUF_SIZE SZ_2K
#define PRUETH_EMAC_DROP_PKT_BUF_SIZE PRUETH_SW_DROP_PKT_BUF_SIZE
/* Total switch mode memory usage for buffers per slice */
#define PRUETH_SW_TOTAL_BUF_SIZE_PER_SLICE \
(PRUETH_SW_FWD_BUF_POOL_SIZE * PRUETH_NUM_FWD_BUF_POOLS_PER_SLICE + \
PRUETH_SW_LI_BUF_POOL_SIZE * PRUETH_SW_USED_LI_BUF_POOLS_PER_SLICE + \
PRUETH_SW_HOST_EXP_BUF_POOL_SIZE + \
PRUETH_SW_HOST_PRE_BUF_POOL_SIZE + \
PRUETH_SW_DROP_PKT_BUF_SIZE)
/* Total switch mode memory usage for all buffers */
#define PRUETH_SW_TOTAL_BUF_SIZE \
(2 * PRUETH_SW_TOTAL_BUF_SIZE_PER_SLICE)
/* Total mac mode memory usage for buffers per slice */
#define PRUETH_EMAC_TOTAL_BUF_SIZE_PER_SLICE \
(PRUETH_EMAC_LI_BUF_POOL_SIZE * \
PRUETH_EMAC_USED_LI_BUF_POOLS_PER_SLICE + \
PRUETH_EMAC_HOST_EXP_BUF_POOL_SIZE + \
PRUETH_EMAC_HOST_PRE_BUF_POOL_SIZE + \
PRUETH_EMAC_DROP_PKT_BUF_SIZE)
/* Total mac mode memory usage for all buffers */
#define PRUETH_EMAC_TOTAL_BUF_SIZE \
(2 * PRUETH_EMAC_TOTAL_BUF_SIZE_PER_SLICE)
/* Size of 1 bank of MSMC/OC_SRAM memory */
#define MSMC_RAM_BANK_SIZE SZ_256K
#define PRUETH_SWITCH_FDB_MASK ((SIZE_OF_FDB / NUMBER_OF_FDB_BUCKET_ENTRIES) - 1)

View File

@ -1764,10 +1764,15 @@ static int prueth_probe(struct platform_device *pdev)
goto put_mem;
}
msmc_ram_size = MSMC_RAM_SIZE;
prueth->is_switchmode_supported = prueth->pdata.switch_mode;
if (prueth->is_switchmode_supported)
msmc_ram_size = MSMC_RAM_SIZE_SWITCH_MODE;
if (prueth->pdata.banked_ms_ram) {
/* Reserve 2 MSMC RAM banks for buffers to avoid arbitration */
msmc_ram_size = (2 * MSMC_RAM_BANK_SIZE);
} else {
msmc_ram_size = PRUETH_EMAC_TOTAL_BUF_SIZE;
if (prueth->is_switchmode_supported)
msmc_ram_size = PRUETH_SW_TOTAL_BUF_SIZE;
}
/* NOTE: FW bug needs buffer base to be 64KB aligned */
prueth->msmcram.va =
@ -1924,7 +1929,8 @@ static int prueth_probe(struct platform_device *pdev)
free_pool:
gen_pool_free(prueth->sram_pool,
(unsigned long)prueth->msmcram.va, msmc_ram_size);
(unsigned long)prueth->msmcram.va,
prueth->msmcram.size);
put_mem:
pruss_release_mem_region(prueth->pruss, &prueth->shram);
@ -1976,8 +1982,8 @@ static void prueth_remove(struct platform_device *pdev)
icss_iep_put(prueth->iep0);
gen_pool_free(prueth->sram_pool,
(unsigned long)prueth->msmcram.va,
MSMC_RAM_SIZE);
(unsigned long)prueth->msmcram.va,
prueth->msmcram.size);
pruss_release_mem_region(prueth->pruss, &prueth->shram);
@ -1994,12 +2000,14 @@ static const struct prueth_pdata am654_icssg_pdata = {
.fdqring_mode = K3_RINGACC_RING_MODE_MESSAGE,
.quirk_10m_link_issue = 1,
.switch_mode = 1,
.banked_ms_ram = 0,
};
static const struct prueth_pdata am64x_icssg_pdata = {
.fdqring_mode = K3_RINGACC_RING_MODE_RING,
.quirk_10m_link_issue = 1,
.switch_mode = 1,
.banked_ms_ram = 1,
};
static const struct of_device_id prueth_dt_match[] = {

View File

@ -251,11 +251,13 @@ struct prueth_emac {
* @fdqring_mode: Free desc queue mode
* @quirk_10m_link_issue: 10M link detect errata
* @switch_mode: switch firmware support
* @banked_ms_ram: banked memory support
*/
struct prueth_pdata {
enum k3_ring_mode fdqring_mode;
u32 quirk_10m_link_issue:1;
u32 switch_mode:1;
u32 banked_ms_ram:1;
};
struct icssg_firmwares {

View File

@ -180,6 +180,9 @@
/* Used to notify the FW of the current link speed */
#define PORT_LINK_SPEED_OFFSET 0x00A8
/* 2k memory pointer reserved for default writes by PRU0*/
#define DEFAULT_MSMC_Q_OFFSET 0x00AC
/* TAS gate mask for windows list0 */
#define TAS_GATE_MASK_LIST0 0x0100

View File

@ -32,7 +32,6 @@ struct netkit {
struct netkit_link {
struct bpf_link link;
struct net_device *dev;
u32 location;
};
static __always_inline int
@ -733,8 +732,8 @@ static void netkit_link_fdinfo(const struct bpf_link *link, struct seq_file *seq
seq_printf(seq, "ifindex:\t%u\n", ifindex);
seq_printf(seq, "attach_type:\t%u (%s)\n",
nkl->location,
nkl->location == BPF_NETKIT_PRIMARY ? "primary" : "peer");
link->attach_type,
link->attach_type == BPF_NETKIT_PRIMARY ? "primary" : "peer");
}
static int netkit_link_fill_info(const struct bpf_link *link,
@ -749,7 +748,7 @@ static int netkit_link_fill_info(const struct bpf_link *link,
rtnl_unlock();
info->netkit.ifindex = ifindex;
info->netkit.attach_type = nkl->location;
info->netkit.attach_type = link->attach_type;
return 0;
}
@ -775,8 +774,7 @@ static int netkit_link_init(struct netkit_link *nkl,
struct bpf_prog *prog)
{
bpf_link_init(&nkl->link, BPF_LINK_TYPE_NETKIT,
&netkit_link_lops, prog);
nkl->location = attr->link_create.attach_type;
&netkit_link_lops, prog, attr->link_create.attach_type);
nkl->dev = dev;
return bpf_link_prime(&nkl->link, link_primer);
}

View File

@ -2508,6 +2508,7 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
}
EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
#if IS_ENABLED(CONFIG_PCI_PWRCTRL)
static struct platform_device *pci_pwrctrl_create_device(struct pci_bus *bus, int devfn)
{
struct pci_host_bridge *host = pci_find_host_bridge(bus);
@ -2537,6 +2538,12 @@ static struct platform_device *pci_pwrctrl_create_device(struct pci_bus *bus, in
return pdev;
}
#else
static struct platform_device *pci_pwrctrl_create_device(struct pci_bus *bus, int devfn)
{
return NULL;
}
#endif
/*
* Read the config data for a PCI device, sanity-check it,

View File

@ -662,6 +662,7 @@ static void gaokun_aux_release(struct device *dev)
{
struct auxiliary_device *adev = to_auxiliary_dev(dev);
of_node_put(dev->of_node);
kfree(adev);
}
@ -693,6 +694,7 @@ static int gaokun_aux_init(struct device *parent, const char *name,
ret = auxiliary_device_init(adev);
if (ret) {
of_node_put(adev->dev.of_node);
kfree(adev);
return ret;
}

View File

@ -15,6 +15,7 @@
#include <linux/hwmon.h>
#include <linux/platform_device.h>
#include <linux/string.h>
#include <linux/string_helpers.h>
#include <uapi/linux/psci.h>
#define MLXBF_PMC_WRITE_REG_32 0x82000009
@ -1222,7 +1223,7 @@ static int mlxbf_pmc_get_event_num(const char *blk, const char *evt)
return -ENODEV;
}
/* Get the event number given the name */
/* Get the event name given the number */
static char *mlxbf_pmc_get_event_name(const char *blk, u32 evt)
{
const struct mlxbf_pmc_events *events;
@ -1784,6 +1785,7 @@ static ssize_t mlxbf_pmc_event_store(struct device *dev,
attr, struct mlxbf_pmc_attribute, dev_attr);
unsigned int blk_num, cnt_num;
bool is_l3 = false;
char *evt_name;
int evt_num;
int err;
@ -1791,14 +1793,23 @@ static ssize_t mlxbf_pmc_event_store(struct device *dev,
cnt_num = attr_event->index;
if (isalpha(buf[0])) {
/* Remove the trailing newline character if present */
evt_name = kstrdup_and_replace(buf, '\n', '\0', GFP_KERNEL);
if (!evt_name)
return -ENOMEM;
evt_num = mlxbf_pmc_get_event_num(pmc->block_name[blk_num],
buf);
evt_name);
kfree(evt_name);
if (evt_num < 0)
return -EINVAL;
} else {
err = kstrtouint(buf, 0, &evt_num);
if (err < 0)
return err;
if (!mlxbf_pmc_get_event_name(pmc->block_name[blk_num], evt_num))
return -EINVAL;
}
if (strstr(pmc->block_name[blk_num], "l3cache"))
@ -1879,13 +1890,14 @@ static ssize_t mlxbf_pmc_enable_store(struct device *dev,
{
struct mlxbf_pmc_attribute *attr_enable = container_of(
attr, struct mlxbf_pmc_attribute, dev_attr);
unsigned int en, blk_num;
unsigned int blk_num;
u32 word;
int err;
bool en;
blk_num = attr_enable->nr;
err = kstrtouint(buf, 0, &en);
err = kstrtobool(buf, &en);
if (err < 0)
return err;
@ -1905,14 +1917,11 @@ static ssize_t mlxbf_pmc_enable_store(struct device *dev,
MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
MLXBF_PMC_WRITE_REG_32, word);
} else {
if (en && en != 1)
return -EINVAL;
err = mlxbf_pmc_config_l3_counters(blk_num, false, !!en);
if (err)
return err;
if (en == 1) {
if (en) {
err = mlxbf_pmc_config_l3_counters(blk_num, true, false);
if (err)
return err;

View File

@ -58,6 +58,8 @@ obj-$(CONFIG_X86_PLATFORM_DRIVERS_HP) += hp/
# Hewlett Packard Enterprise
obj-$(CONFIG_UV_SYSFS) += uv_sysfs.o
obj-$(CONFIG_FW_ATTR_CLASS) += firmware_attributes_class.o
# IBM Thinkpad and Lenovo
obj-$(CONFIG_IBM_RTL) += ibm_rtl.o
obj-$(CONFIG_IDEAPAD_LAPTOP) += ideapad-laptop.o
@ -128,7 +130,6 @@ obj-$(CONFIG_SYSTEM76_ACPI) += system76_acpi.o
obj-$(CONFIG_TOPSTAR_LAPTOP) += topstar-laptop.o
# Platform drivers
obj-$(CONFIG_FW_ATTR_CLASS) += firmware_attributes_class.o
obj-$(CONFIG_SERIAL_MULTI_INSTANTIATE) += serial-multi-instantiate.o
obj-$(CONFIG_TOUCHSCREEN_DMI) += touchscreen_dmi.o
obj-$(CONFIG_WIRELESS_HOTKEY) += wireless-hotkey.o

View File

@ -89,6 +89,14 @@ static struct awcc_quirks generic_quirks = {
static struct awcc_quirks empty_quirks;
static const struct dmi_system_id awcc_dmi_table[] __initconst = {
{
.ident = "Alienware Area-51m",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Alienware"),
DMI_MATCH(DMI_PRODUCT_NAME, "Alienware Area-51m"),
},
.driver_data = &generic_quirks,
},
{
.ident = "Alienware Area-51m R2",
.matches = {
@ -97,6 +105,14 @@ static const struct dmi_system_id awcc_dmi_table[] __initconst = {
},
.driver_data = &generic_quirks,
},
{
.ident = "Alienware m15 R5",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Alienware"),
DMI_MATCH(DMI_PRODUCT_NAME, "Alienware m15 R5"),
},
.driver_data = &generic_quirks,
},
{
.ident = "Alienware m15 R7",
.matches = {
@ -233,6 +249,7 @@ static const struct dmi_system_id awcc_dmi_table[] __initconst = {
},
.driver_data = &g_series_quirks,
},
{}
};
enum AWCC_GET_FAN_SENSORS_OPERATIONS {

View File

@ -49,6 +49,7 @@ static const struct dmi_system_id lis3lv02d_devices[] __initconst = {
DELL_LIS3LV02D_DMI_ENTRY("Latitude E6330", 0x29),
DELL_LIS3LV02D_DMI_ENTRY("Latitude E6430", 0x29),
DELL_LIS3LV02D_DMI_ENTRY("Precision 3540", 0x29),
DELL_LIS3LV02D_DMI_ENTRY("Precision 3551", 0x29),
DELL_LIS3LV02D_DMI_ENTRY("Precision M6800", 0x29),
DELL_LIS3LV02D_DMI_ENTRY("Vostro V131", 0x1d),
DELL_LIS3LV02D_DMI_ENTRY("Vostro 5568", 0x29),

View File

@ -689,9 +689,13 @@ static int dell_wmi_ddv_battery_translate(struct dell_wmi_ddv_data *data,
dev_dbg(&data->wdev->dev, "Translation cache miss\n");
/* Perform a translation between a ACPI battery and a battery index */
ret = power_supply_get_property(battery, POWER_SUPPLY_PROP_SERIAL_NUMBER, &val);
/*
* Perform a translation between a ACPI battery and a battery index.
* We have to use power_supply_get_property_direct() here because this
* function will also get called from the callbacks of the power supply
* extension.
*/
ret = power_supply_get_property_direct(battery, POWER_SUPPLY_PROP_SERIAL_NUMBER, &val);
if (ret < 0)
return ret;

View File

@ -1669,7 +1669,7 @@ static int ideapad_kbd_bl_init(struct ideapad_private *priv)
priv->kbd_bl.led.name = "platform::" LED_FUNCTION_KBD_BACKLIGHT;
priv->kbd_bl.led.brightness_get = ideapad_kbd_bl_led_cdev_brightness_get;
priv->kbd_bl.led.brightness_set_blocking = ideapad_kbd_bl_led_cdev_brightness_set;
priv->kbd_bl.led.flags = LED_BRIGHT_HW_CHANGED;
priv->kbd_bl.led.flags = LED_BRIGHT_HW_CHANGED | LED_RETAIN_AT_SHUTDOWN;
err = led_classdev_register(&priv->platform_device->dev, &priv->kbd_bl.led);
if (err)
@ -1728,7 +1728,7 @@ static int ideapad_fn_lock_led_init(struct ideapad_private *priv)
priv->fn_lock.led.name = "platform::" LED_FUNCTION_FNLOCK;
priv->fn_lock.led.brightness_get = ideapad_fn_lock_led_cdev_get;
priv->fn_lock.led.brightness_set_blocking = ideapad_fn_lock_led_cdev_set;
priv->fn_lock.led.flags = LED_BRIGHT_HW_CHANGED;
priv->fn_lock.led.flags = LED_BRIGHT_HW_CHANGED | LED_RETAIN_AT_SHUTDOWN;
err = led_classdev_register(&priv->platform_device->dev, &priv->fn_lock.led);
if (err)

View File

@ -122,26 +122,35 @@ static int lenovo_super_hotkey_wmi_led_init(enum mute_led_type led_type, struct
return -EIO;
union acpi_object *obj __free(kfree) = output.pointer;
if (obj && obj->type == ACPI_TYPE_INTEGER)
led_version = obj->integer.value;
else
if (!obj || obj->type != ACPI_TYPE_INTEGER)
return -EIO;
wpriv->cdev[led_type].max_brightness = LED_ON;
wpriv->cdev[led_type].flags = LED_CORE_SUSPENDRESUME;
led_version = obj->integer.value;
/*
* Output parameters define: 0 means mute LED is not supported, Non-zero means
* mute LED can be supported.
*/
if (led_version == 0)
return 0;
switch (led_type) {
case MIC_MUTE:
if (led_version != WMI_LUD_SUPPORT_MICMUTE_LED_VER)
return -EIO;
if (led_version != WMI_LUD_SUPPORT_MICMUTE_LED_VER) {
pr_warn("The MIC_MUTE LED of this device isn't supported.\n");
return 0;
}
wpriv->cdev[led_type].name = "platform::micmute";
wpriv->cdev[led_type].brightness_set_blocking = &lsh_wmi_micmute_led_set;
wpriv->cdev[led_type].default_trigger = "audio-micmute";
break;
case AUDIO_MUTE:
if (led_version != WMI_LUD_SUPPORT_AUDIOMUTE_LED_VER)
return -EIO;
if (led_version != WMI_LUD_SUPPORT_AUDIOMUTE_LED_VER) {
pr_warn("The AUDIO_MUTE LED of this device isn't supported.\n");
return 0;
}
wpriv->cdev[led_type].name = "platform::mute";
wpriv->cdev[led_type].brightness_set_blocking = &lsh_wmi_audiomute_led_set;
@ -152,6 +161,9 @@ static int lenovo_super_hotkey_wmi_led_init(enum mute_led_type led_type, struct
return -EINVAL;
}
wpriv->cdev[led_type].max_brightness = LED_ON;
wpriv->cdev[led_type].flags = LED_CORE_SUSPENDRESUME;
err = devm_led_classdev_register(dev, &wpriv->cdev[led_type]);
if (err < 0) {
dev_err(dev, "Could not register mute LED %d : %d\n", led_type, err);

View File

@ -1235,9 +1235,8 @@ bool power_supply_has_property(struct power_supply *psy,
return false;
}
int power_supply_get_property(struct power_supply *psy,
enum power_supply_property psp,
union power_supply_propval *val)
static int __power_supply_get_property(struct power_supply *psy, enum power_supply_property psp,
union power_supply_propval *val, bool use_extensions)
{
struct power_supply_ext_registration *reg;
@ -1247,10 +1246,14 @@ int power_supply_get_property(struct power_supply *psy,
return -ENODEV;
}
scoped_guard(rwsem_read, &psy->extensions_sem) {
power_supply_for_each_extension(reg, psy) {
if (power_supply_ext_has_property(reg->ext, psp))
if (use_extensions) {
scoped_guard(rwsem_read, &psy->extensions_sem) {
power_supply_for_each_extension(reg, psy) {
if (!power_supply_ext_has_property(reg->ext, psp))
continue;
return reg->ext->get_property(psy, reg->ext, reg->data, psp, val);
}
}
}
@ -1261,20 +1264,49 @@ int power_supply_get_property(struct power_supply *psy,
else
return -EINVAL;
}
int power_supply_get_property(struct power_supply *psy, enum power_supply_property psp,
union power_supply_propval *val)
{
return __power_supply_get_property(psy, psp, val, true);
}
EXPORT_SYMBOL_GPL(power_supply_get_property);
int power_supply_set_property(struct power_supply *psy,
enum power_supply_property psp,
const union power_supply_propval *val)
/**
* power_supply_get_property_direct - Read a power supply property without checking for extensions
* @psy: The power supply
* @psp: The power supply property to read
* @val: The resulting value of the power supply property
*
* Read a power supply property without taking into account any power supply extensions registered
* on the given power supply. This is mostly useful for power supply extensions that want to access
* their own power supply as using power_supply_get_property() directly will result in a potential
* deadlock.
*
* Return: 0 on success or negative error code on failure.
*/
int power_supply_get_property_direct(struct power_supply *psy, enum power_supply_property psp,
union power_supply_propval *val)
{
return __power_supply_get_property(psy, psp, val, false);
}
EXPORT_SYMBOL_GPL(power_supply_get_property_direct);
static int __power_supply_set_property(struct power_supply *psy, enum power_supply_property psp,
const union power_supply_propval *val, bool use_extensions)
{
struct power_supply_ext_registration *reg;
if (atomic_read(&psy->use_cnt) <= 0)
return -ENODEV;
scoped_guard(rwsem_read, &psy->extensions_sem) {
power_supply_for_each_extension(reg, psy) {
if (power_supply_ext_has_property(reg->ext, psp)) {
if (use_extensions) {
scoped_guard(rwsem_read, &psy->extensions_sem) {
power_supply_for_each_extension(reg, psy) {
if (!power_supply_ext_has_property(reg->ext, psp))
continue;
if (reg->ext->set_property)
return reg->ext->set_property(psy, reg->ext, reg->data,
psp, val);
@ -1289,8 +1321,34 @@ int power_supply_set_property(struct power_supply *psy,
return psy->desc->set_property(psy, psp, val);
}
int power_supply_set_property(struct power_supply *psy, enum power_supply_property psp,
const union power_supply_propval *val)
{
return __power_supply_set_property(psy, psp, val, true);
}
EXPORT_SYMBOL_GPL(power_supply_set_property);
/**
* power_supply_set_property_direct - Write a power supply property without checking for extensions
* @psy: The power supply
* @psp: The power supply property to write
* @val: The value to write to the power supply property
*
* Write a power supply property without taking into account any power supply extensions registered
* on the given power supply. This is mostly useful for power supply extensions that want to access
* their own power supply as using power_supply_set_property() directly will result in a potential
* deadlock.
*
* Return: 0 on success or negative error code on failure.
*/
int power_supply_set_property_direct(struct power_supply *psy, enum power_supply_property psp,
const union power_supply_propval *val)
{
return __power_supply_set_property(psy, psp, val, false);
}
EXPORT_SYMBOL_GPL(power_supply_set_property_direct);
int power_supply_property_is_writeable(struct power_supply *psy,
enum power_supply_property psp)
{

Some files were not shown because too many files have changed in this diff Show More