linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-27 06:50:37 +00:00

Author	SHA1	Message	Date
Ilya Leoshkevich	9474e27a24	libbpf: Add the ability to suppress perf event enablement Automatically enabling a perf event after attaching a BPF prog to it is not always desirable. Add a new "dont_enable" field to struct bpf_perf_event_opts. While introducing "enable" instead would be nicer in that it would avoid a double negation in the implementation, it would make DECLARE_LIBBPF_OPTS() less efficient. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-08-07 09:01:41 -07:00
Linus Torvalds	a6923c06a3	bpf-fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiNNksACgkQ6rmadz2v bTrKRhAAnju4bbFRHU88Y68p6Meq/jxgjxHZAkTqZA0Nvbu2cItPRL7XHAAhTWE7 OBEIm3UKCH4gs4fY8rDHiIgnnaQavXUmvXZblOIOjxnqRKJpU3px+wwJvGFq5Enq WP6UZV8tj+O2tNfNNYS+mgQvvIpUISHGpKimvx7ede3e1U3cJBkppbT3gooMHYuc 5s1QtYHWaPY/1DpkHgqJ2UPGcbT9/HSPGMHRNaHKjQTcNcLcrj7RRjchgXqcc7Vs hVijvVrLiuK0MyU42ritmaqvjjgD6hKPZguRQe2/hAtrOo0Alf+4mXkMgam7simN iHfGc7nhw1xAFTPj4WXahja89G00FdDN5NR37Rgurm/i2fY7BuXAkMjiMiwGB3C3 jk2wG3RSifYeC2rxhkYJdqcx8Cz6m+pjgyJ2o9Jy5dn426VXg/kzkUXpl6u5jaPZ SmKoo9Xu1r7xqTaUc9kk8pJI5Xt9vD5oQjF2KQuPZXxNidiwW6k2OGbW+wF26nEi Q6pfDu3pvHAd/UE6cD5yFe97o3Cc2XfGwI/Sv2k99UVPvNcvfAvVo9fsItHBhCPn zHkihW2S0zmbBlhcrB+PrLclNgLleP9JukFN+5scc0a9lbQxIm6v2TNKGlBfDQtO I+Kn266oqT4BEgnQGlCQquINnQAdmS8VMnnunGOu6+rwPUtkI7E= =XLHS -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay Mohan, Mark Rutland, Maxwell Bland) - Disallow tail calls between BPF programs that use different cgroup local storage maps to prevent out-of-bounds access (Daniel Borkmann) - Fix unaligned access in flow_dissector and netfilter BPF programs (Paul Chaignon) - Avoid possible use of uninitialized mod_len in libbpf (Achill Gilgenast) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test for unaligned flow_dissector ctx access bpf: Improve ctx access verifier error message bpf: Check netfilter ctx accesses are aligned bpf: Check flow_dissector ctx accesses are aligned arm64/cfi,bpf: Support kCFI + BPF on arm64 cfi: Move BPF CFI types and helpers to generic code cfi: add C CFI type macro libbpf: Avoid possible use of uninitialized mod_len bpf: Fix oob access in cgroup local storage bpf: Move cgroup iterator helpers to bpf.h bpf: Move bpf map owner out of common struct bpf: Add cookie object to bpf maps	2025-08-01 17:13:26 -07:00
Linus Torvalds	f4f346c346	[GIT PULL] perf tools changes for v6.17 Build-ID processing goodies --------------------------- Build-IDs are content based hashes to link regions of memory to ELF files in post processing. They have been available in distros for quite a while: $ file /bin/bash /bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=707a1c670cd72f8e55ffedfbe94ea98901b7ce3a, for GNU/Linux 3.2.0, stripped It is possible to ask the kernel to get it from mmap executable backing storage at time they are being put in place and send it as metadata at that moment to have in perf.data. Prefer that across the board to speed up 'record' time - it post processes the samples to find binaries touched by any samples and to save them with build-ID. It can skip reading build-ID in userspace if it comes from the kernel. perf record ----------- * Make --buildid-mmap default. The kernel can generate MMAP2 events with a build-ID from ELF header. Use that by default instead of using inode and device ID to identify binaries. It also can be disabled with --no-buildid-mmap. * Use BPF for -u/--uid option to sample processes belong to a user. BPF can track user processes more accurately and the existing logic often fails to get the list of processes due to race with reading the /proc filesystem. * Generate PERF_RECORD_BPF_METADATA when it profiles BPF programs and they have variables starting with "bpf_metadata_". This will help to identify BPF objects used in the profile. This has been supported in bpftool for some time and allows the recording of metadata such as commit hashes, versions, etc, that now gets recorded in perf.data as well. * Collect list of DSOs touched in the sample callchains as well as in the sample itself. This would increase the processing time at the end of record, but can improve the data quality. perf stat --------- * Add a new 'drm' pseudo-PMU support like in 'hwmon'. It can collect DRM usage stats using fdinfo in /proc. On my Intel laptop, it shows like below: $ perf list drm ... drm: drm-active-stolen-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-active-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-engine-capacity-video [Engine capacity. Unit: drm_i915] drm-engine-copy [Utilization in ns. Unit: drm_i915] drm-engine-render [Utilization in ns. Unit: drm_i915] drm-engine-video [Utilization in ns. Unit: drm_i915] ... $ sudo perf stat -a -e drm-engine-render,drm-engine-video,drm-engine-capacity-video sleep 1 Performance counter stats for 'system wide': 48,137,316,988,873 ns drm-engine-render 34,452,696,746 ns drm-engine-video 20 capacity drm-engine-capacity-video 1.002086194 seconds time elapsed perf list --------- * Add description for software events. The description is in JSON format and the event parser now can handle the software events like others (for example, it's case-insensitive and subject to wildcard matching). $ perf list software List of pre-defined events (to be used in -e or -M): software: alignment-faults [Number of kernel handled memory alignment faults. Unit: software] bpf-output [An event used by BPF programs to write to the perf ring buffer. Unit: software] cgroup-switches [Number of context switches to a task in a different cgroup. Unit: software] context-switches [Number of context switches [This event is an alias of cs]. Unit: software] cpu-clock [Per-CPU high-resolution timer based event. Unit: software] cpu-migrations [Number of times a process has migrated to a new CPU [This event is an alias of migrations]. Unit: software] cs [Number of context switches [This event is an alias of context-switches]. Unit: software] dummy [A placeholder event that doesn't count anything. Unit: software] emulation-faults [Number of kernel handled unimplemented instruction faults handled through emulation. Unit: software] faults [Number of page faults [This event is an alias of page-faults]. Unit: software] major-faults [Number of major page faults. Major faults require I/O to handle. Unit: software] migrations [Number of times a process has migrated to a new CPU [This event is an alias of cpu-migrations]. Unit: software] minor-faults [Number of minor page faults. Minor faults don't require I/O to handle. Unit: software] page-faults [Number of page faults [This event is an alias of faults]. Unit: software] task-clock [Per-task high-resolution timer based event. Unit: software] perf ftrace ----------- * Add -e/--events option to perf ftrace latency to measure latency between the two events instead of a function. $ sudo perf ftrace latency -ab -e i915_request_wait_begin,i915_request_wait_end --hide-empty -- sleep 1 # DURATION \| COUNT \| GRAPH \| 256 - 512 us \| 4 \| ###### \| 2 - 4 ms \| 2 \| ### \| 4 - 8 ms \| 12 \| ################### \| 8 - 16 ms \| 10 \| ################ \| # statistics (in usec) total time: 194915 avg time: 6961 max time: 12855 min time: 373 count: 28 * Add new function graph tracer options (--graph-opts) to display more info like arguments and return value. They will be passed to the kernel ftrace directly. $ sudo perf ftrace -G vfs_write --graph-opts retval,retaddr # tracer: function_graph # # CPU DURATION FUNCTION CALLS # \| \| \| \| \| \| \| ... 5) \| mutex_unlock() { /* <-rb_simple_write+0xda/0x150 / 5) 0.188 us \| local_clock(); / <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e / 5) \| rt_mutex_slowunlock() { / <-rb_simple_write+0xda/0x150 / 5) \| _raw_spin_lock_irqsave() { / <-rt_mutex_slowunlock+0x4f/0x200 / 5) 0.123 us \| preempt_count_add(); / <-_raw_spin_lock_irqsave+0x23/0x90 ret=0x0 / 5) 0.128 us \| local_clock(); / <-__lock_acquire.isra.0+0x17a/0x740 ret=0x3bf2a3cfc8b / 5) 0.086 us \| do_raw_spin_trylock(); / <-_raw_spin_lock_irqsave+0x4a/0x90 ret=0x1 / 5) 0.845 us \| } / _raw_spin_lock_irqsave ret=0x292 / ... misc ---- Add perf archive --exclude-buildids <FILE> option to skip some binaries. The format of the FILE should be same as an output of perf buildid-list. * Get rid of dependency of libcrypto. It was just to get SHA-1 hash so implement it directly like in the kernel. A side effect is that it needs -fno-strict-aliasing compiler option (again, like in the kernel). * Convert all shell script tests to use bash. Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCaIz8IAAKCRCMstVUGiXM g4huAP9WDTZIT9E0gx4yLJ0slyBV/5ROaUWX8OUVO3JJ/1sEUgEAp3wsSmDYc1/o XTvqNNjxo1LG+bEmZk8yNAJ2FYghPgw= =6enW -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.17-2025-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Build-ID processing goodies: Build-IDs are content based hashes to link regions of memory to ELF files in post processing. They have been available in distros for quite a while: $ file /bin/bash /bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=707a1c670cd72f8e55ffedfbe94ea98901b7ce3a, for GNU/Linux 3.2.0, stripped It is possible to ask the kernel to get it from mmap executable backing storage at time they are being put in place and send it as metadata at that moment to have in perf.data. Prefer that across the board to speed up 'record' time - it post processes the samples to find binaries touched by any samples and to save them with build-ID. It can skip reading build-ID in userspace if it comes from the kernel. perf record: * Make --buildid-mmap default. The kernel can generate MMAP2 events with a build-ID from ELF header. Use that by default instead of using inode and device ID to identify binaries. It also can be disabled with --no-buildid-mmap. * Use BPF for -u/--uid option to sample processes belong to a user. BPF can track user processes more accurately and the existing logic often fails to get the list of processes due to race with reading the /proc filesystem. * Generate PERF_RECORD_BPF_METADATA when it profiles BPF programs and they have variables starting with "bpf_metadata_". This will help to identify BPF objects used in the profile. This has been supported in bpftool for some time and allows the recording of metadata such as commit hashes, versions, etc, that now gets recorded in perf.data as well. * Collect list of DSOs touched in the sample callchains as well as in the sample itself. This would increase the processing time at the end of record, but can improve the data quality. perf stat: * Add a new 'drm' pseudo-PMU support like in 'hwmon'. It can collect DRM usage stats using fdinfo in /proc. On my Intel laptop, it shows like below: $ perf list drm ... drm: drm-active-stolen-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-active-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-engine-capacity-video [Engine capacity. Unit: drm_i915] drm-engine-copy [Utilization in ns. Unit: drm_i915] drm-engine-render [Utilization in ns. Unit: drm_i915] drm-engine-video [Utilization in ns. Unit: drm_i915] ... $ sudo perf stat -a -e drm-engine-render,drm-engine-video,drm-engine-capacity-video sleep 1 Performance counter stats for 'system wide': 48,137,316,988,873 ns drm-engine-render 34,452,696,746 ns drm-engine-video 20 capacity drm-engine-capacity-video 1.002086194 seconds time elapsed perf list * Add description for software events. The description is in JSON format and the event parser now can handle the software events like others (for example, it's case-insensitive and subject to wildcard matching). $ perf list software List of pre-defined events (to be used in -e or -M): software: alignment-faults [Number of kernel handled memory alignment faults. Unit: software] bpf-output [An event used by BPF programs to write to the perf ring buffer. Unit: software] cgroup-switches [Number of context switches to a task in a different cgroup. Unit: software] context-switches [Number of context switches [This event is an alias of cs]. Unit: software] cpu-clock [Per-CPU high-resolution timer based event. Unit: software] cpu-migrations [Number of times a process has migrated to a new CPU [This event is an alias of migrations]. Unit: software] cs [Number of context switches [This event is an alias of context-switches]. Unit: software] dummy [A placeholder event that doesn't count anything. Unit: software] emulation-faults [Number of kernel handled unimplemented instruction faults handled through emulation. Unit: software] faults [Number of page faults [This event is an alias of page-faults]. Unit: software] major-faults [Number of major page faults. Major faults require I/O to handle. Unit: software] migrations [Number of times a process has migrated to a new CPU [This event is an alias of cpu-migrations]. Unit: software] minor-faults [Number of minor page faults. Minor faults don't require I/O to handle. Unit: software] page-faults [Number of page faults [This event is an alias of faults]. Unit: software] task-clock [Per-task high-resolution timer based event. Unit: software] perf ftrace: * Add -e/--events option to perf ftrace latency to measure latency between the two events instead of a function. $ sudo perf ftrace latency -ab -e i915_request_wait_begin,i915_request_wait_end --hide-empty -- sleep 1 # DURATION \| COUNT \| GRAPH \| 256 - 512 us \| 4 \| ###### \| 2 - 4 ms \| 2 \| ### \| 4 - 8 ms \| 12 \| ################### \| 8 - 16 ms \| 10 \| ################ \| # statistics (in usec) total time: 194915 avg time: 6961 max time: 12855 min time: 373 count: 28 * Add new function graph tracer options (--graph-opts) to display more info like arguments and return value. They will be passed to the kernel ftrace directly. $ sudo perf ftrace -G vfs_write --graph-opts retval,retaddr # tracer: function_graph # # CPU DURATION FUNCTION CALLS # \| \| \| \| \| \| \| ... 5) \| mutex_unlock() { /* <-rb_simple_write+0xda/0x150 / 5) 0.188 us \| local_clock(); / <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e / 5) \| rt_mutex_slowunlock() { / <-rb_simple_write+0xda/0x150 / 5) \| _raw_spin_lock_irqsave() { / <-rt_mutex_slowunlock+0x4f/0x200 / 5) 0.123 us \| preempt_count_add(); / <-_raw_spin_lock_irqsave+0x23/0x90 ret=0x0 / 5) 0.128 us \| local_clock(); / <-__lock_acquire.isra.0+0x17a/0x740 ret=0x3bf2a3cfc8b / 5) 0.086 us \| do_raw_spin_trylock(); / <-_raw_spin_lock_irqsave+0x4a/0x90 ret=0x1 / 5) 0.845 us \| } / _raw_spin_lock_irqsave ret=0x292 / ... Misc: Add perf archive --exclude-buildids <FILE> option to skip some binaries. The format of the FILE should be same as an output of perf buildid-list. * Get rid of dependency of libcrypto. It was just to get SHA-1 hash so implement it directly like in the kernel. A side effect is that it needs -fno-strict-aliasing compiler option (again, like in the kernel). * Convert all shell script tests to use bash" * tag 'perf-tools-for-v6.17-2025-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits) perf record: Cache build-ID of hit DSOs only perf test: Ensure lock contention using pipe mode perf python: Stop using deprecated PyUnicode_AsString() perf list: Skip ABI PMUs when printing pmu values perf list: Remove tracepoint printing code perf tp_pmu: Add event APIs perf tp_pmu: Factor existing tracepoint logic to new file perf parse-events: Remove non-json software events perf jevents: Add common software event json perf tools: Remove libtraceevent in .gitignore perf test: Fix comment ordering perf sort: Use perf_env to set arch sort keys and header perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test perf sample: Remove arch notion of sample parsing perf env: Remove global perf_env perf trace: Avoid global perf_env with evsel__env perf auxtrace: Pass perf_env from session through to mmap read perf machine: Explicitly pass in host perf_env perf bench synthesize: Avoid use of global perf_env perf top: Make perf_env locally scoped ...	2025-08-01 16:55:47 -07:00
Achill Gilgenast	13cb75730b	libbpf: Avoid possible use of uninitialized mod_len Though mod_len is only read when mod_name != NULL and both are initialized together, gcc15 produces a warning with -Werror=maybe-uninitialized: libbpf.c: In function 'find_kernel_btf_id.constprop': libbpf.c:10100:33: error: 'mod_len' may be used uninitialized [-Werror=maybe-uninitialized] 10100 \| if (mod_name && strncmp(mod->name, mod_name, mod_len) != 0) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ libbpf.c:10070:21: note: 'mod_len' was declared here 10070 \| int ret, i, mod_len; \| ^~~~~~~ Silence the false positive. Signed-off-by: Achill Gilgenast <fossdd@pwned.life> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250729094611.2065713-1-fossdd@pwned.life Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-31 11:39:46 -07:00
Ian Rogers	811082e4b6	perf parse-events: Support user CPUs mixed with threads/processes Counting events system-wide with a specified CPU prior to this change worked: ``` $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1 Performance counter stats for 'system wide': 59,393,419,099 msr/tsc/ 33,927,965,927 msr/tsc,cpu=cpu_core/ 25,465,608,044 msr/tsc,cpu=cpu_atom/ ``` However, when counting with process the counts became system wide: ``` $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10 10.1: Basic parsing test : Ok 10.2: Parsing without PMU name : Ok 10.3: Parsing with PMU name : Ok Performance counter stats for 'perf test -F 10': 59,233,549 msr/tsc/ 59,227,556 msr/tsc,cpu=cpu_core/ 59,224,053 msr/tsc,cpu=cpu_atom/ ``` Make the handling of CPU maps with event parsing clearer. When an event is parsed creating an evsel the cpus should be either the PMU's cpumask or user specified CPUs. Update perf_evlist__propagate_maps so that it doesn't clobber the user specified CPUs. Try to make the behavior clearer, firstly fix up missing cpumasks. Next, perform sanity checks and adjustments from the global evlist CPU requests and for the PMU including simplifying to the "any CPU"(-1) value. Finally remove the event if the cpumask is empty. So that events are opened with a CPU and a thread change stat's create_perf_stat_counter to give both. With the change things are fixed: ``` $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10 10.1: Basic parsing test : Ok 10.2: Parsing without PMU name : Ok 10.3: Parsing with PMU name : Ok Performance counter stats for 'perf test -F 10': 63,704,975 msr/tsc/ 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%) 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%) ``` However, note the "--no-scale" option is used. This is necessary as the running time for the event on the counter isn't the same as the enabled time because the thread doesn't necessarily run on the CPUs specified for the counter. All counter values are scaled with: scaled_value = value * time_enabled / time_running and so without --no-scale the scaled_value becomes very large. This problem already exists on hybrid systems for the same reason. Here are 2 runs of the same code with an instructions event that counts the same on both types of core, there is no real multiplexing happening on the event: ``` $ perf stat -e instructions perf test -F 10 ... Performance counter stats for 'perf test -F 10': 87,896,447 cpu_atom/instructions/ (14.37%) 98,171,964 cpu_core/instructions/ (85.63%) ... $ perf stat --no-scale -e instructions perf test -F 10 ... Performance counter stats for 'perf test -F 10': 13,069,890 cpu_atom/instructions/ (19.32%) 83,460,274 cpu_core/instructions/ (80.68%) ... ``` The scaling has inflated per-PMU instruction counts and the overall count by 2x. To fix this the kernel needs changing when a task+CPU event (or just task event on hybrid) is scheduled out. A fix could be that the state isn't inactive but off for such events, so that time_enabled counts don't accumulate on them. Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250719030517.1990983-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-24 13:41:35 -07:00
Ian Rogers	9a711ef3bd	libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete This allows the perf_evsel__exit to be called when the struct perf_evsel is embedded inside another struct, such as struct evsel in perf. Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250719030517.1990983-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-24 13:41:35 -07:00
Ian Rogers	6d765f5f7e	libperf evsel: Rename own_cpus to pmu_cpus own_cpus is generally the cpumask from the PMU. Rename to pmu_cpus to try to make this clearer. Variable rename with no other changes. Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250719030517.1990983-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-24 13:41:35 -07:00
Ian Rogers	478272d1cd	tools subcmd: Tighten the filename size in check_if_command_finished FILENAME_MAX is often PATH_MAX (4kb), far more than needed for the /proc path. Make the buffer size sufficient for the maximum integer plus "/proc/" and "/status" with a '\0' terminator. Fixes: `5ce42b5de4` ("tools subcmd: Add non-waitpid check_if_command_finished()") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250717150855.1032526-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-22 18:17:53 -07:00
Eduard Zingerman	42be23e8f2	libbpf: Verify that arena map exists when adding arena relocations Fuzzer reported a memory access error in bpf_program__record_reloc() that happens when: - ".addr_space.1" section exists - there is a relocation referencing this section - there are no arena maps defined in BTF. Sanity checks for maps existence are already present in bpf_program__record_reloc(), hence this commit adds another one. [1] https://github.com/libbpf/libbpf/actions/runs/16375110681/job/46272998064 Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250718222059.281526-1-eddyz87@gmail.com	2025-07-18 17:12:50 -07:00
Alexei Starovoitov	beb1097ec8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-18 12:15:59 -07:00
Matteo Croce	0ee30d937c	libbpf: Fix warning in calloc() usage When compiling libbpf with some compilers, this warning is triggered: libbpf.c: In function ‘bpf_object__gen_loader’: libbpf.c:9209:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args] 9209 \| gen = calloc(sizeof(*gen), 1); \| ^ libbpf.c:9209:28: note: earlier argument should specify number of elements, later size of each element Fix this by inverting the calloc() arguments. Signed-off-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250717200337.49168-1-technoboy85@gmail.com	2025-07-18 08:29:50 -07:00
Andrii Nakryiko	0238c45fbb	libbpf: Fix handling of BPF arena relocations Initial __arena global variable support implementation in libbpf contains a bug: it remembers struct bpf_map pointer for arena, which is used later on to process relocations. Recording this pointer is problematic because map pointers are not stable during ELF relocation collection phase, as an array of struct bpf_map's can be reallocated, invalidating all the pointers. Libbpf is dealing with similar issues by using a stable internal map index, though for BPF arena map specifically this approach wasn't used due to an oversight. The resulting behavior is non-deterministic issue which depends on exact layout of ELF object file, number of actual maps, etc. We didn't hit this until very recently, when this bug started triggering crash in BPF CI when validating one of sched-ext BPF programs. The fix is rather straightforward: we just follow an established pattern of remembering map index (just like obj->kconfig_map_idx, for example) instead of `struct bpf_map *`, and resolving index to a pointer at the point where map information is necessary. While at it also add debug-level message for arena-related relocation resolution information, which we already have for all other kinds of maps. Fixes: `2e7ba4f8fd` ("libbpf: Recognize __arena global variables.") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250718001009.610955-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-17 19:17:46 -07:00
Andrii Nakryiko	8080500cba	libbpf: start v1.7 dev cycle With libbpf 1.6.0 released, adjust libbpf.map and libbpf_version.h to start v1.7 development cycles. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250716175936.2343013-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:37:27 -07:00
Eduard Zingerman	aaa0e57e69	libbpf: __arg_untrusted in bpf_helpers.h Make btf_decl_tag("arg:untrusted") available for libbpf users via macro. Makes the following usage possible: void foo(struct bar p __arg_untrusted) { ... } void bar(struct foo p __arg_trusted) { ... foo(p->buz->bar); // buz derefrence looses __trusted ... } Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-07 08:25:07 -07:00
Kumar Kartikeya Dwivedi	3bbc1ba9cc	libbpf: Introduce bpf_prog_stream_read() API Introduce a libbpf API so that users can read data from a given BPF stream for a BPF prog fd. For now, only the low-level syscall wrapper is provided, we can add a bpf_program__* accessor as a follow up if needed. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-11-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:07 -07:00
Kumar Kartikeya Dwivedi	21a3afc76a	libbpf: Add bpf_stream_printk() macro Add a convenience macro to print data to the BPF streams. BPF_STDOUT and BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to print to the respective streams. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:07 -07:00
Namhyung Kim	1fdf938168	perf tools: Fix use-after-free in help_unknown_cmd() Currently perf aborts when it finds an invalid command. I guess it depends on the environment as I have some custom commands in the path. $ perf bad-command perf: 'bad-command' is not a perf-command. See 'perf --help'. Aborted (core dumped) It's because the exclude_cmds() in libsubcmd has a use-after-free when it removes some entries. After copying one to another entry, it keeps the pointer in the both position. And the next copy operation will free the later one but it's the same entry in the previous one. For example, let's say cmds = { A, B, C, D, E } and excludes = { B, E }. ci cj ei cmds-name excludes -----------+-------------------- 0 0 0 \| A B : cmp < 0, ci == cj 1 1 0 \| B B : cmp == 0 2 1 1 \| C E : cmp < 0, ci != cj At this point, it frees cmds->names[1] and cmds->names[1] is assigned to cmds->names[2]. 3 2 1 \| D E : cmp < 0, ci != cj Now it frees cmds->names[2] but it's the same as cmds->names[1]. So accessing cmds->names[1] will be invalid. This makes the subcmd tests succeed. $ perf test subcmd 69: libsubcmd help tests : 69.1: Load subcmd names : Ok 69.2: Uniquify subcmd names : Ok 69.3: Exclude duplicate subcmd names : Ok Fixes: `4b96679170` ("libsubcmd: Avoid SEGV/use-after-free when commands aren't excluded") Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250701201027.1171561-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-07-02 18:58:50 -07:00
Alexei Starovoitov	886178a33a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3 Cross-merge BPF, perf and other fixes after downstream PRs. It restores BPF CI to green after critical fix commit `bc4394e5e7` ("perf: Fix the throttle error of some clock events") No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-26 09:49:39 -07:00
Adin Scannell	fa6f092cc0	libbpf: Fix possible use-after-free for externs The `name` field in `obj->externs` points into the BTF data at initial open time. However, some functions may invalidate this after opening and before loading (e.g. `bpf_map__set_value_size`), which results in pointers into freed memory and undefined behavior. The simplest solution is to simply `strdup` these strings, similar to the `essent_name`, and free them at the same time. In order to test this path, the `global_map_resize` BPF selftest is modified slightly to ensure the presence of an extern, which causes this test to fail prior to the fix. Given there isn't an obvious API or error to test against, I opted to add this to the existing test as an aspect of the resizing feature rather than duplicate the test. Fixes: `9d0a23313b` ("libbpf: Add capability for resizing datasec maps") Signed-off-by: Adin Scannell <amscanne@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com	2025-06-25 12:28:58 -07:00
Ian Rogers	be59dba332	libperf evsel: Add missed puts and asserts A missed evsel__close before evsel__delete was the source of leaking perf events due to a hybrid test. Add asserts in debug builds so that this shouldn't happen in the future. Add puts missing on the cpu map and thread maps. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250617223356.2752099-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-24 10:27:51 -07:00
Yuan Chen	aa485e8789	libbpf: Fix null pointer dereference in btf_dump__free on allocation failure When btf_dump__new() fails to allocate memory for the internal hashmap (btf_dump->type_names), it returns an error code. However, the cleanup function btf_dump__free() does not check if btf_dump->type_names is NULL before attempting to free it. This leads to a null pointer dereference when btf_dump__free() is called on a btf_dump object. Fixes: `351131b51c` ("libbpf: add btf_dump API for BTF-to-C conversion") Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250618011933.11423-1-chenyuan_fl@163.com	2025-06-23 11:13:40 -07:00
Namhyung Kim	c833e8cc4d	Linux 6.16-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmhYZ9AeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGPdcIAIR01BnZbU7knvkI QSMlJR4zl8IDIu+9P35v5jmoklJFqYtIGQUnNTxrWK0i/NGj6+FAmA8xgAzeJZ7d Kv0Qs0fkgwQWxFbbM9Xg9Vob8gxSZ6Qyo5ETz8UDupCNWyUPpMaRDLv8IRtF/19I YAgXJqCXC4LuJJNROSkk3wiLqg+CerzJ/m7GJtBQdCmbv6h87HETaeQpKVgEwTkR uEn99CDnOryovIYWoDaBbqLRF5AQr2JA6lqmQAr57wyjzuj1Ul2BNgLC/dlnuSQl G8kv0+Kf+l1X3UJy1w8V4lkSLyjpTko7QIgv3AVDLDvOobCtZglWD4vPJ1OGvw4m cHDrPg4= =thw2 -----END PGP SIGNATURE----- Merge tag 'v6.16-rc3' into perf-tools-next To get the fixes in libbpf and perf tools. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-22 21:54:03 -07:00
Blake Jones	ab38e84ba9	perf record: collect BPF metadata from existing BPF programs Look for .rodata maps, find ones with 'bpf_metadata_' variables, extract their values as strings, and create a new PERF_RECORD_BPF_METADATA synthetic event using that data. The code gets invoked from the existing routine perf_event__synthesize_one_bpf_prog(). For example, a BPF program with the following variables: const char bpf_metadata_version[] SEC(".rodata") = "3.14159"; int bpf_metadata_value[] SEC(".rodata") = 42; would generate a PERF_RECORD_BPF_METADATA record with: .prog_name = <BPF program name, e.g. "bpf_prog_a1b2c3_foo"> .nr_entries = 2 .entries[0].key = "version" .entries[0].value = "3.14159" .entries[1].key = "value" .entries[1].value = "42" Each of the BPF programs and subprograms that share those variables would get a distinct PERF_RECORD_BPF_METADATA record, with the ".prog_name" showing the name of each program or subprogram. The prog_name is deliberately the same as the ".name" field in the corresponding PERF_RECORD_KSYMBOL record. This code only gets invoked if support for displaying BTF char arrays as strings is detected. Signed-off-by: Blake Jones <blakejones@google.com> Link: https://lore.kernel.org/r/20250612194939.162730-3-blakejones@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-06-20 14:48:35 -07:00
Yonghong Song	1d6711667c	libbpf: Support link-based cgroup attach with options Currently libbpf supports bpf_program__attach_cgroup() with signature: LIBBPF_API struct bpf_link * bpf_program__attach_cgroup(const struct bpf_program prog, int cgroup_fd); To support mprog style attachment, additionsl fields like flags, relative_{fd,id} and expected_revision are needed. Add a new API: LIBBPF_API struct bpf_link bpf_program__attach_cgroup_opts(const struct bpf_program prog, int cgroup_fd, const struct bpf_cgroup_opts opts); where bpf_cgroup_opts contains all above needed fields. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev	2025-06-09 16:28:30 -07:00
Andrii Nakryiko	02670deede	libbpf: Handle unsupported mmap-based /sys/kernel/btf/vmlinux correctly libbpf_err_ptr() helpers are meant to return NULL and set errno, if there is an error. But btf_parse_raw_mmap() is meant to be used internally and is expected to return ERR_PTR() values. Because of this mismatch, when libbpf tries to mmap /sys/kernel/btf/vmlinux, we don't detect the error correctly with IS_ERR() check, and never fallback to old non-mmap-based way of loading vmlinux BTF. Fix this by using proper ERR_PTR() returns internally. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Fixes: `3c0421c93c` ("libbpf: Use mmap to parse vmlinux BTF from sysfs") Cc: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250606202134.2738910-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-06 14:07:07 -07:00
Blake Jones	87c9c79a02	libbpf: Add support for printing BTF character arrays as strings The BTF dumper code currently displays arrays of characters as just that - arrays, with each character formatted individually. Sometimes this is what makes sense, but it's nice to be able to treat that array as a string. This change adds a special case to the btf_dump functionality to allow 0-terminated arrays of single-byte integer values to be printed as character strings. Characters for which isprint() returns false are printed as hex-escaped values. This is enabled when the new ".emit_strings" is set to 1 in the btf_dump_type_data_opts structure. As an example, here's what it looks like to dump the string "hello" using a few different field values for btf_dump_type_data_opts (.compact = 1): - .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',] - .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',] - .emit_strings = 1, .skip_names = 0: (char[6])"hello" - .emit_strings = 1, .skip_names = 1: "hello" Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1: - .emit_strings = 0: ['h',-1,] - .emit_strings = 1: "h\xff" Signed-off-by: Blake Jones <blakejones@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250603203701.520541-1-blakejones@google.com	2025-06-05 13:45:16 -07:00
Jiawei Zhao	919319b4ed	libbpf: Correct some typos and syntax issues in usdt doc Fix some incorrect words, such as "and" -> "an", "it's" -> "its". Fix some grammar issues, such as removing redundant "will", "would complicated" -> "would complicate". Signed-off-by: Jiawei Zhao <Phoenix500526@163.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250531095111.57824-1-Phoenix500526@163.com	2025-06-05 11:45:48 -07:00
Linus Torvalds	0939bd2fcf	perf tools improvements and fixes for Linux v6.16: perf report/top/annotate TUI: - Accept the left arrow key as a Zoom out if done on the first column. - Show if source code toggle status in title, to help spotting bugs with the various disassemblers (capstone, llvm, objdump). - Provide feedback on unhandled hotkeys. Build: - Better inform when certain features are not available with warnings in the build process and in 'perf version --build-options' or 'perf -vv'. perf record: - Improve the --off-cpu code by synthesizing events for switch-out -> switch-in intervals using a BPF program. This can be fine tuned using a --off-cpu-thresh knob. perf report: - Add 'tgid' sort key. perf mem/c2c: - Add 'op', 'cache', 'snoop', 'dtlb' output fields. - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling). perf ftrace: - Use process/session specific trace settings instead of messing with the global ftrace knobs. perf trace: - Implement syscall summary in BPF. - Support --summary-mode=cgroup. - Always print return value for syscalls returning a pid. - The rseq and set_robust_list don't return a pid, just -errno. perf lock contention: - Symbolize zone->lock using BTF. - Add -J/--inject-delay option to estimate impact on application performance by optimization of kernel locking behavior. perf stat: - Improve hybrid support for the NMI watchdog warning. Symbol resolution: - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust symbols. - Improve Rust demangler. Hardware tracing: Intel PT: - Fix PEBS-via-PT data_src. - Do not default to recording all switch events. - Fix pattern matching with python3 on the SQL viewer script. arm64: - Fixups for the hip08 hha PMU. Vendor events: - Update Intel events/metrics files for alderlake, alderlaken, arrowlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, clearwaterforest, elkhartlake, emeraldrapids, grandridge, graniterapids, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, lunarlake, meteorlake, nehalemep, nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereep-sx. python support: - Add support for event counts in the python binding, add a counting.py example. perf list: - Display the PMU name associated with a perf metric in JSON. perf test: - Hybrid improvements for metric value validation test. - Fix LBR test by ignoring idle task. - Add AMD IBS sw filter ana d'ldlat' tests. - Add 'perf trace --summary-mode=cgroup' test. - Add tests for the various language symbol demanglers. Miscellaneous. - Allow specifying the cpu an event will be tied using '-e event/cpu=N/'. - Sync various headers with the kernel sources. - Add annotations to use clang's -Wthread-safety and fix some problems it detected. - Make dump_stack() use perf's symbol resolution to provide better backtraces. - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS (Precision Event-Based Sampling), that adds timing info, the retirement latency of instructions. - Various memory allocation (some detected by ASAN) and reference counting fixes. - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace PERF_RECORD_COMPRESSED. - Skip unsupported event types in perf.data files, don't stop when finding one. - Improve lookups using hashmaps and binary searches. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCaD9ViwAKCRCyPKLppCJ+ JzOfAQDXlukhPQyuJ4j1ie0x1QO4jalloMbG1Bkp3hn6yjxafAD9Ha5wr+dwnAj4 FfxOVqua29r8Htn4aGahXZ0nnlVp9Ac= =bwgD -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf report/top/annotate TUI: - Accept the left arrow key as a Zoom out if done on the first column - Show if source code toggle status in title, to help spotting bugs with the various disassemblers (capstone, llvm, objdump) - Provide feedback on unhandled hotkeys Build: - Better inform when certain features are not available with warnings in the build process and in 'perf version --build-options' or 'perf -vv' perf record: - Improve the --off-cpu code by synthesizing events for switch-out -> switch-in intervals using a BPF program. This can be fine tuned using a --off-cpu-thresh knob perf report: - Add 'tgid' sort key perf mem/c2c: - Add 'op', 'cache', 'snoop', 'dtlb' output fields - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling) perf ftrace: - Use process/session specific trace settings instead of messing with the global ftrace knobs perf trace: - Implement syscall summary in BPF - Support --summary-mode=cgroup - Always print return value for syscalls returning a pid - The rseq and set_robust_list don't return a pid, just -errno perf lock contention: - Symbolize zone->lock using BTF - Add -J/--inject-delay option to estimate impact on application performance by optimization of kernel locking behavior perf stat: - Improve hybrid support for the NMI watchdog warning Symbol resolution: - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust symbols - Improve Rust demangler Hardware tracing: Intel PT: - Fix PEBS-via-PT data_src - Do not default to recording all switch events - Fix pattern matching with python3 on the SQL viewer script arm64: - Fixups for the hip08 hha PMU Vendor events: - Update Intel events/metrics files for alderlake, alderlaken, arrowlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, clearwaterforest, elkhartlake, emeraldrapids, grandridge, graniterapids, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, lunarlake, meteorlake, nehalemep, nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereep-sx python support: - Add support for event counts in the python binding, add a counting.py example perf list: - Display the PMU name associated with a perf metric in JSON perf test: - Hybrid improvements for metric value validation test - Fix LBR test by ignoring idle task - Add AMD IBS sw filter ana d'ldlat' tests - Add 'perf trace --summary-mode=cgroup' test - Add tests for the various language symbol demanglers Miscellaneous: - Allow specifying the cpu an event will be tied using '-e event/cpu=N/' - Sync various headers with the kernel sources - Add annotations to use clang's -Wthread-safety and fix some problems it detected - Make dump_stack() use perf's symbol resolution to provide better backtraces - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS (Precision Event-Based Sampling), that adds timing info, the retirement latency of instructions - Various memory allocation (some detected by ASAN) and reference counting fixes - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace PERF_RECORD_COMPRESSED - Skip unsupported event types in perf.data files, don't stop when finding one - Improve lookups using hashmaps and binary searches" * tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (206 commits) perf callchain: Always populate the addr_location map when adding IP perf lock contention: Reject more than 10ms delays for safety perf trace: Set errpid to false for rseq and set_robust_list perf symbol: Move demangling code out of symbol-elf.c perf trace: Always print return value for syscalls returning a pid perf script: Print PERF_AUX_FLAG_COLLISION flag perf mem: Show absolute percent in mem_stat output perf mem: Display sort order only if it's available perf mem: Describe overhead calculation in brief perf record: Fix incorrect --user-regs comments Revert "perf thread: Ensure comm_lock held for comm_list" perf test trace_summary: Skip --bpf-summary tests if no libbpf perf test intel-pt: Skip jitdump test if no libelf perf intel-tpebs: Avoid race when evlist is being deleted perf test demangle-java: Don't segv if demangling fails perf symbol: Fix use-after-free in filename__read_build_id perf pmu: Avoid segv for missing name/alias_name in wildcarding perf machine: Factor creating a "live" machine out of dwarf-unwind perf test: Add AMD IBS sw filter test perf mem: Count L2 HITM for c2c statistic ...	2025-06-03 15:11:44 -07:00
Linus Torvalds	90b83efa67	bpf-next-6.16 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmg3NqgACgkQ6rmadz2v bTpNUQ/8DPeYtn3nskpsP2OwFy6O3hhfCe6gjOAmUVSk000xbG+AcI/h1DnGZWgk xlVcEs93ekzUzHd7k1+RJ2c5yDLXieLJAtb66rbFU1enkxs2cWlcWSKE6K/gaoh3 G1BCARVlKwtrJhrVrsXtYP/eGZxKRSUZFK7xhtCk7lp7sRI3xkTLE+FJBcDkTJ6W HwF14i3zO+BkqNGdFwwlASCCqRItSNBBiM3KjW1DbETOTfAKlvCTrcgdUiODqxhF PNnULW+xmICABDFlKfDMlUAGNlSHKjiI3+g31LdblA5eyEhIqiCRgBGFYoCnsluk qUauRSie61KqC7fxN3qVpC3bXJfD1td7uIvoqSkDLtTv8a5+HAoiohzi1qBzCayl LAGkBYewAfDtdDDjNY38JLH2RCdyY6zG9DhqghPHdPlM7zj7L5zZgj34igEwesMM mfj9TuFFF99yfX5UUeSxKpDGR1eO4Ew0p7tg8CRs8Fqh6AIQSmboREZrsncVRCTS 4SDHSI4KcO4LO2pEKzy+X4dewganN7aESnQG34iG0liyvDDwJOgUnDWLRwPLas7k 3b/zIfBLxOJpA5R+0hhAMtjMA4NgyKJf4yFZwEieuasQjvzwTApi24YhZ/b3HSEB 2Dp8kHEEbwezv0OFFz/fJ88dNQnrDmtJ+QByN/liA8kj4Yuh2+Q= =j3t8 -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Fix and improve BTF deduplication of identical BTF types (Alan Maguire and Andrii Nakryiko) - Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and Alexis Lothoré) - Support load-acquire and store-release instructions in BPF JIT on riscv64 (Andrea Parri) - Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton Protopopov) - Streamline allowed helpers across program types (Feng Yang) - Support atomic update for hashtab of BPF maps (Hou Tao) - Implement json output for BPF helpers (Ihor Solodrai) - Several s390 JIT fixes (Ilya Leoshkevich) - Various sockmap fixes (Jiayuan Chen) - Support mmap of vmlinux BTF data (Lorenz Bauer) - Support BPF rbtree traversal and list peeking (Martin KaFai Lau) - Tests for sockmap/sockhash redirection (Michal Luczaj) - Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko) - Add support for dma-buf iterators in BPF (T.J. Mercier) - The verifier support for __bpf_trap() (Yonghong Song) * tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits) bpf, arm64: Remove unused-but-set function and variable. selftests/bpf: Add tests with stack ptr register in conditional jmp bpf: Do not include stack ptr register in precision backtracking bookkeeping selftests/bpf: enable many-args tests for arm64 bpf, arm64: Support up to 12 function arguments bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem() bpf: Avoid __bpf_prog_ret0_warn when jit fails bpftool: Add support for custom BTF path in prog load/loadall selftests/bpf: Add unit tests with __bpf_trap() kfunc bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable bpf: Remove special_kfunc_set from verifier selftests/bpf: Add test for open coded dmabuf_iter selftests/bpf: Add test for dmabuf_iter bpf: Add open coded dmabuf iterator bpf: Add dmabuf iterator dma-buf: Rename debugfs symbols bpf: Fix error return value in bpf_copy_from_user_dynptr libbpf: Use mmap to parse vmlinux BTF from sysfs selftests: bpf: Add a test for mmapable vmlinux BTF btf: Allow mmap of vmlinux btf ...	2025-05-28 15:52:42 -07:00
Lorenz Bauer	3c0421c93c	libbpf: Use mmap to parse vmlinux BTF from sysfs Teach libbpf to use mmap when parsing vmlinux BTF from /sys. We don't apply this to fall-back paths on the regular file system because there is no way to ensure that modifications underlying the MAP_PRIVATE mapping are not visible to the process. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250520-vmlinux-mmap-v5-3-e8c941acc414@isovalent.com	2025-05-23 10:06:28 -07:00
Ian Rogers	3ee2255c4f	libperf threadmap: Add perf_thread_map__idx() Allow computation of thread map index from a PID. Note, with a 'struct perf_cpu_map' the sorted nature allows for a binary search to compute the index which isn't currently possible with a 'struct perf_thread_map' as they aren't guaranteed sorted. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250519195148.1708988-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-05-21 15:07:13 -03:00
Ian Rogers	eead8a0114	libperf threadmap: Don't segv for index 0 for the NULL 'struct perf_thread_map' pointer perf_thread_map__nr() returns length 1 if the perf_thread_map is NULL, meaning index 0 is valid. When perf_thread_map__pid() of index 0 is read then return the expected "any" -1 value. Assert this is only done for index 0. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250519195148.1708988-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-05-21 15:07:13 -03:00
Alan Maguire	4e29128a9a	libbpf/btf: Fix string handling to support multi-split BTF libbpf handling of split BTF has been written largely with the assumption that multiple splits are possible, i.e. split BTF on top of split BTF on top of base BTF. One area where this does not quite work is string handling in split BTF; the start string offset should be the base BTF string section length + the base BTF string offset. This worked in the past because for a single split BTF with base the start string offset was always 0. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250519165935.261614-2-alan.maguire@oracle.com	2025-05-20 16:22:30 -07:00
Chun-Tse Shao	208c0e1683	perf record: Add 8-byte aligned event type PERF_RECORD_COMPRESSED2 The original PERF_RECORD_COMPRESS is not 8-byte aligned, which can cause asan runtime error: # Build with asan $ make -C tools/perf O=/tmp/perf DEBUG=1 EXTRA_CFLAGS="-O0 -g -fno-omit-frame-pointer -fsanitize=undefined" # Test success with many asan runtime errors: $ /tmp/perf/perf test "Zstd perf.data compression/decompression" -vv 83: Zstd perf.data compression/decompression: ... util/session.c:1959:13: runtime error: member access within misaligned address 0x7f69e3f99653 for type 'union perf_event', which requires 13 byte alignment 0x7f69e3f99653: note: pointer points here d0 3a 50 69 44 00 00 00 00 00 08 00 bb 07 00 00 00 00 00 00 44 00 00 00 00 00 00 00 ff 07 00 00 ^ util/session.c:2163:22: runtime error: member access within misaligned address 0x7f69e3f99653 for type 'union perf_event', which requires 8 byte alignment 0x7f69e3f99653: note: pointer points here d0 3a 50 69 44 00 00 00 00 00 08 00 bb 07 00 00 00 00 00 00 44 00 00 00 00 00 00 00 ff 07 00 00 ^ ... Since there is no way to align compressed data in zstd compression, this patch add a new event type `PERF_RECORD_COMPRESSED2`, which adds a field `data_size` to specify the actual compressed data size. The `header.size` contains the total record size, including the padding at the end to make it 8-byte aligned. Tested with `Zstd perf.data compression/decompression` Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250303183646.327510-1-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-05-16 17:31:40 -03:00
Mykyta Yatsenko	d0445d7dd3	libbpf: Check bpf_map_skeleton link for NULL Avoid dereferencing bpf_map_skeleton's link field if it's NULL. If BPF map skeleton is created with the size, that indicates containing link field, but the field was not actually initialized with valid bpf_link pointer, libbpf crashes. This may happen when using libbpf-rs skeleton. Skeleton loading may still progress, but user needs to attach struct_ops map separately. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250514113220.219095-1-mykyta.yatsenko5@gmail.com	2025-05-14 09:30:06 -07:00
Anton Protopopov	fd5fd538a1	libbpf: Use proper errno value in nlattr Return value of the validate_nla() function can be propagated all the way up to users of libbpf API. In case of error this libbpf version of validate_nla returns -1 which will be seen as -EPERM from user's point of view. Instead, return a more reasonable -EINVAL. Fixes: `bbf48c18ee` ("libbpf: add error reporting in XDP") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250510182011.2246631-1-a.s.protopopov@gmail.com	2025-05-12 15:22:54 -07:00
Ian Rogers	2e7a2f7f3c	libperf cpumap: Add ability to create CPU from a single CPU number Add perf_cpu_map__new_int() so that a CPU map can be created from a single integer. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250403194337.40202-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-05-12 14:18:16 -03:00
Jakub Kicinski	6b02fd7799	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.15-rc6). No conflicts. Adjacent changes: net/core/dev.c: `08e9f2d584` ("net: Lock netdevices during dev_shutdown") `a82dc19db1` ("net: avoid potential race between netdev_get_by_index_lock() and netns switch") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-08 08:59:02 -07:00
Andrii Nakryiko	62e23f1838	libbpf: Improve BTF dedup handling of "identical" BTF types BTF dedup has a strong assumption that compiler with deduplicate identical types within any given compilation unit (i.e., .c file). This property is used when establishing equilvalence of two subgraphs of types. Unfortunately, this property doesn't always holds in practice. We've seen cases of having truly identical structs, unions, array definitions, and, most recently, even pointers to the same type being duplicated within CU. Previously, we mitigated this on a case-by-case basis, adding a few simple heuristics for validating that two BTF types (having two different type IDs) are structurally the same. But this approach scales poorly, and we can have more weird cases come up in the future. So let's take a half-step back, and implement a bit more generic structural equivalence check, recursively. We still limit it to reasonable depth to avoid long reference loops. Depth-wise limiting of potentially cyclical graph isn't great, but as I mentioned below doesn't seem to be detrimental performance-wise. We can always improve this in the future with per-type visited markers, if necessary. Performance-wise this doesn't seem too affect vmlinux BTF dedup, which makes sense because this logic kicks in not so frequently and only if we already established a canonical candidate type match, but suddenly find a different (but probably identical) type. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20250501235231.1339822-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-05-05 14:51:47 -07:00
Anton Protopopov	41d4ce6df3	bpf: Fix uninitialized values in BPF_{CORE,PROBE}_READ With the latest LLVM bpf selftests build will fail with the following error message: progs/profiler.inc.h:710:31: error: default initialization of an object of type 'typeof ((parent_task)->real_cred->uid.val)' (aka 'const unsigned int') leaves the object uninitialized and is incompatible with C++ [-Werror,-Wdefault-const-init-unsafe] 710 \| proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val); \| ^ tools/testing/selftests/bpf/tools/include/bpf/bpf_core_read.h:520:35: note: expanded from macro 'BPF_CORE_READ' 520 \| ___type((src), a, ##__VA_ARGS__) __r; \ \| ^ This happens because BPF_CORE_READ (and other macro) declare the variable __r using the ___type macro which can inherit const modifier from intermediate types. Fix this by using __typeof_unqual__, when supported. (And when it is not supported, the problem shouldn't appear, as older compilers haven't complained.) Fixes: `792001f4f7` ("libbpf: Add user-space variants of BPF_CORE_READ() family of macros") Fixes: `a4b09a9ef9` ("libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250502193031.3522715-1-a.s.protopopov@gmail.com	2025-05-05 14:20:28 -07:00
Anton Protopopov	358b1c0f56	libbpf: Use proper errno value in linker Return values of the linker_append_sec_data() and the linker_append_elf_relos() functions are propagated all the way up to users of libbpf API. In some error cases these functions return -1 which will be seen as -EPERM from user's point of view. Instead, return a more reasonable -EINVAL. Fixes: `faf6ed321c` ("libbpf: Add BPF static linker APIs") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250430120820.2262053-1-a.s.protopopov@gmail.com	2025-04-30 09:04:20 -07:00
James Clark	8988c4b919	perf tools: Fix in-source libperf build When libperf is built alone in-source, $(OUTPUT) isn't set. This causes the generated uapi path to resolve to '/../arch' which results in a permissions error: mkdir: cannot create directory '/../arch': Permission denied Fix it by removing the preceding '/..' which means that it gets generated either in the tools/lib/perf part of the tree or the OUTPUT folder. Some other rules that rely on OUTPUT further refine this conditionally depending on whether it's an in-source or out-of-source build, but I don't think we need the extra complexity here. And this rule is slightly different to others because the header is needed by both libperf and Perf. This is further complicated by the fact that Perf always passes O=... to libperf even for in source builds, meaning that OUTPUT isn't set consistently between projects. Because we're no longer going one level up to try to generate the file in the tools/ folder, Perf's include rule needs to descend into libperf. Also fix the clean rule while we're here. Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Closes: https://lore.kernel.org/linux-perf-users/7703f88e-ccb7-4c98-9da4-8aad224e780f@leemhuis.info/ Fixes: `bfb713ea53` ("perf tools: Fix arm64 build by generating unistd_64.h") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/r/20250429-james-perf-fix-libperf-in-source-build-v1-1-a1a827ac15e5@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-04-29 12:32:31 -07:00
Alan Maguire	8e64c387c9	libbpf: Add identical pointer detection to btf_dedup_is_equiv() Recently as a side-effect of commit `ac053946f5` ("compiler.h: introduce TYPEOF_UNQUAL() macro") issues were observed in deduplication between modules and kernel BTF such that a large number of kernel types were not deduplicated so were found in module BTF (task_struct, bpf_prog etc). The root cause appeared to be a failure to dedup struct types, specifically those with members that were pointers with __percpu annotations. The issue in dedup is at the point that we are deduplicating structures, we have not yet deduplicated reference types like pointers. If multiple copies of a pointer point at the same (deduplicated) integer as in this case, we do not see them as identical. Special handling already exists to deal with structures and arrays, so add pointer handling here too. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250429161042.2069678-1-alan.maguire@oracle.com	2025-04-29 10:16:23 -07:00
Jonathan Wiepert	91dbac4076	Use thread-safe function pointer in libbpf_print This patch fixes a thread safety bug where libbpf_print uses the global variable storing the print function pointer rather than the local variable that had the print function set via __atomic_load_n. Fixes: `f1cb927cdb` ("libbpf: Ensure print callback usage is thread-safe") Signed-off-by: Jonathan Wiepert <jonathan.wiepert@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Link: https://lore.kernel.org/bpf/20250424221457.793068-1-jonathan.wiepert@gmail.com	2025-04-25 09:27:17 -07:00
Tao Chen	64821d25f0	libbpf: Remove sample_period init in perf_buffer It seems that sample_period is not used in perf buffer. Actually, only wakeup_events are meaningful to enable events aggregation for wakeup notification. Remove sample_period setting code to avoid confusion. Fixes: `fb84b82246` ("libbpf: add perf buffer API") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/bpf/20250423163901.2983689-1-chen.dylane@linux.dev	2025-04-25 09:24:47 -07:00
James Clark	bfb713ea53	perf tools: Fix arm64 build by generating unistd_64.h Since pulling in the kernel changes in commit `22f72088ff` ("tools headers: Update the syscall table with the kernel sources"), arm64 is no longer using a generic syscall header and generates one from the syscall table. Therefore we must also generate the syscall header for arm64 before building Perf. Add it as a dependency to libperf which uses one syscall number. Perf uses more, but as libperf is a dependency of Perf it will be generated for both. Future platforms that need this will have to add their own syscall-y targets in libperf manually. Unfortunately the arch specific files that do this (e.g. arch/arm64/include/asm/Kbuild) can't easily be imported into the Perf build. But Perf only needs a subset of the generated files anyway, so redefining them is probably the correct thing to do. Fixes: `22f72088ff` ("tools headers: Update the syscall table with the kernel sources") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20250417-james-perf-fix-gen-syscall-v1-1-1d268c923901@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-04-23 08:57:12 -07:00
Feng Yang	4dde20b1aa	libbpf: Fix event name too long error When the binary path is excessively long, the generated probe_name in libbpf exceeds the kernel's MAX_EVENT_NAME_LEN limit (64 bytes). This causes legacy uprobe event attachment to fail with error code -22. The fix reorders the fields to place the unique ID before the name. This ensures that even if truncation occurs via snprintf, the unique ID remains intact, preserving event name uniqueness. Additionally, explicit checks with MAX_EVENT_NAME_LEN are added to enforce length constraints. Before Fix: ./test_progs -t attach_probe/kprobe-long_name ...... libbpf: failed to add legacy kprobe event for 'bpf_testmod_looooooooooooooooooooooooooooooong_name+0x0': -EINVAL libbpf: prog 'handle_kprobe': failed to create kprobe 'bpf_testmod_looooooooooooooooooooooooooooooong_name+0x0' perf event: -EINVAL test_attach_kprobe_long_event_name:FAIL:attach_kprobe_long_event_name unexpected error: -22 test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec #13/11 attach_probe/kprobe-long_name:FAIL #13 attach_probe:FAIL ./test_progs -t attach_probe/uprobe-long_name ...... libbpf: failed to add legacy uprobe event for /root/linux-bpf/bpf-next/tools/testing/selftests/bpf/test_progs:0x13efd9: -EINVAL libbpf: prog 'handle_uprobe': failed to create uprobe '/root/linux-bpf/bpf-next/tools/testing/selftests/bpf/test_progs:0x13efd9' perf event: -EINVAL test_attach_uprobe_long_event_name:FAIL:attach_uprobe_long_event_name unexpected error: -22 #13/10 attach_probe/uprobe-long_name:FAIL #13 attach_probe:FAIL After Fix: ./test_progs -t attach_probe/uprobe-long_name #13/10 attach_probe/uprobe-long_name:OK #13 attach_probe:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED ./test_progs -t attach_probe/kprobe-long_name #13/11 attach_probe/kprobe-long_name:OK #13 attach_probe:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Fixes: `46ed5fc33d` ("libbpf: Refactor and simplify legacy kprobe code") Fixes: `cc10623c68` ("libbpf: Add legacy uprobe attaching support") Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250417014848.59321-2-yangfeng59949@163.com	2025-04-22 17:13:37 -07:00
Amery Hung	4b15121da7	libbpf: Support creating and destroying qdisc Extend struct bpf_tc_hook with handle, qdisc name and a new attach type, BPF_TC_QDISC, to allow users to add or remove any qdisc specified in addition to clsact. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409214606.2000194-8-ameryhung@gmail.com	2025-04-17 10:54:41 -07:00
Ihor Solodrai	8582d9ab3e	libbpf: Verify section type in btf_find_elf_sections A valid ELF file may contain a SHT_NOBITS .BTF section. This case is not handled correctly in btf_parse_elf, which leads to a segfault. Before attempting to load BTF section data, check that the section type is SHT_PROGBITS, which is the expected type for BTF data. Fail with an error if the type is different. Bug report: https://github.com/libbpf/libbpf/issues/894 v1: https://lore.kernel.org/bpf/20250408184104.3962949-1-ihor.solodrai@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250410182823.1591681-1-ihor.solodrai@linux.dev	2025-04-15 15:18:55 -07:00
Viktor Malik	ee684de5c1	libbpf: Fix buffer overflow in bpf_object__init_prog As shown in [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory. Consider the situation below where: - prog_start = sec_start + symbol_offset <-- size_t overflow here - prog_end = prog_start + prog_size prog_start sec_start prog_end sec_end \| \| \| \| v v v v .....................\|################################\|............ The report in [1] also provides a corrupted BPF ELF which can be used as a reproducer: $ readelf -S crash Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align ... [ 2] uretprobe.mu[...] PROGBITS 0000000000000000 00000040 0000000000000068 0000000000000000 AX 0 0 8 $ readelf -s crash Symbol table '.symtab' contains 8 entries: Num: Value Size Type Bind Vis Ndx Name ... 6: ffffffffffffffb8 104 FUNC GLOBAL DEFAULT 2 handle_tp Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated. This is also reported by AddressSanitizer: ================================================================= ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490 READ of size 104 at 0x7c7302fe0000 thread T0 #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76) #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856 #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928 #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930 #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067 #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090 #6 0x000000400c16 in main /poc/poc.c:8 #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4) #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667) #9 0x000000400b34 in _start (/poc/poc+0x400b34) 0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8) allocated by thread T0 here: #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b) #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600) #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018) #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740 The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit `6245947c1b` ("libbpf: Allow gaps in BPF program sections to support overriden weak functions"). Add a check for detecting the overflow of `sec_off + prog_sz` to bpf_object__init_prog to fix this issue. [1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Fixes: `6245947c1b` ("libbpf: Allow gaps in BPF program sections to support overriden weak functions") Reported-by: lmarch2 <2524158037@qq.com> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com	2025-04-15 15:17:01 -07:00

1 2 3 4 5 ...

2911 Commits