linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-02 16:44:59 +00:00

Author	SHA1	Message	Date
Eric Lin	6dad43bb11	perf vendor events riscv: Add SiFive P650 events The SiFive Performance P650 core (including the vector-enabled P670 and area-optimized P450/P470 variants) updates the P550 microarchitecture. It brings in the debug, trace, and counter events from newer Bullet cores, and adds new events for iTLB and dTLB multi-hits. All other PMU events are unchanged from the P550 core. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-8-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Eric Lin	2e3a13d6b7	perf vendor events riscv: Add SiFive P550 events The SiFive Performance P550 core features an out-of-order microarchitecture which exposes the same PMU events as Bullet, plus events for UTLB hits and PTE cache misses/hits. Add support for specifying these events using symbolic names. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-7-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Eric Lin	8866a33815	perf vendor events riscv: Add SiFive Bullet version 0x0d events SiFive Bullet microarchitecture cores with mimpid values starting with 0x0d or greater add new PMU events to count TLB miss stall cycles. All other PMU events are unchanged from earlier Bullet cores. Signed-off-by: Eric Lin <eric.lin@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-6-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Eric Lin	acaefd6049	perf vendor events riscv: Add SiFive Bullet version 0x07 events SiFive Bullet microarchitecture cores with mimpid values starting with 0x07 or greater add new PMU events to support debug, trace, and counter sampling and filtering (Sscofpmf). All other PMU events are unchanged from earlier Bullet cores. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-5-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Eric Lin	4f762cb409	perf vendor events riscv: Update SiFive Bullet events Regenerate the event lists from the original hardware description. This makes them consistent with the event lists for newer versions of the hardware, allowing most files to be reused across hardware versions. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-4-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Samuel Holland	0d042fa514	perf vendor events riscv: Remove leading zeroes The EventCode field (as stored in the mhpmeventN CSRs) is actually 56 bits wide, but there is no need to keep leading zeroes in the JSON files. Remove them to simplify review of the following change, which regenerates the files in a way that does not include leading zeroes. This change was performed automatically with `sed -i "s/0x0*/0x/"`. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-3-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:38 -07:00
Samuel Holland	d35ad7e881	perf vendor events riscv: Rename U74 to Bullet This set of PMU event descriptions applies not only to the SiFive U74 core configuration, but also to other SiFive cores that implement the Bullet microarchitecture (such as U64, P270, and X280). Rename the directory to be more generic. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-2-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 14:15:37 -07:00
Dr. David Alan Gilbert	c1a37db3cf	perf util: Remove unused perf_config__refresh perf_config__refresh() was added in 2016 by commit `8a0a9c7e91` ("perf config: Introduce new init() and exit()") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-7-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 11:31:24 -07:00
Dr. David Alan Gilbert	e032e7a775	perf util: Remove unused perf_pmus__default_pmu_name perf_pmus__default_pmu_name() last use was removed by 2023's commit `e3edd6cf63` ("perf pmu-events: Reduce processed events by passing PMU") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-6-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 11:31:24 -07:00
Dr. David Alan Gilbert	f986468641	perf util: Remove unused perf_data__update_dir perf_data__update_dir() was added in 2019's commit `e8be135751` ("perf data: Add perf_data__update_dir() function") but has never been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-5-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 11:31:24 -07:00
Dr. David Alan Gilbert	cf99ec1525	perf util: Remove unused pstack__pop The last use of pstack__pop() was removed in 2015 by commit `6422184b08` ("perf hists browser: Simplify zooming code using pstack_peek()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-4-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 11:31:24 -07:00
Dr. David Alan Gilbert	a9b496f420	perf util: Remove unused perf_color_default_config perf_color_default_config() was added in 2009 by commit `8fc0321f1a` ("perf_counter tools: Add color terminal output support") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-3-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-10 11:31:24 -07:00
Ian Rogers	36e7748d33	perf tests: Fix data symbol test with LTO builds With LTO builds (and possibly regular builds too, as all the code is in one file), the compiler can realize that the buf1.reserved data in the datasym workload is never accessed. The compiler moves the variable to bss and only keeps the data1 and data2 parts as separate variables. This causes the symbol check to fail in the test. Make the variable volatile to disable the more aggressive optimization. Rename the variable to make clear which buf1 in perf is being referred to. Before: $ perf test -vv "data symbol" 126: Test data symbol: --- start --- test child forked, pid 299808 perf does not have symbol 'buf1' perf is missing symbols - skipping test ---- end(-2) ---- 126: Test data symbol : Skip $ nm perf|grep buf1 0000000000a5fa40 b buf1.0 0000000000a5fa48 b buf1.1 After: $ nm perf|grep buf1 0000000000a53a00 d buf1 $ perf test -vv "data symbol" 126: Test data symbol: --- start --- test child forked, pid 302166 a53a00-a53a39 l buf1 perf does have symbol 'buf1' Recording workload... Waiting for "perf record has started" message OK Cleaning up files... ---- end(0) ---- 126: Test data symbol : Ok Fixes: `3dfc01fe9d` ("perf test: Add 'datasym' test workload") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250226230109.314580-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-07 14:07:07 -08:00
Namhyung Kim	e1f5bb18a7	perf report: Fix memory leaks in the hierarchy mode Ian told me that there are many memory leaks in the hierarchy mode. I can easily reproduce it with the following command. $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak $ perf record --latency -g -- ./perf test -w thloop $ perf report -H --stdio ... Indirect leak of 168 byte(s) in 21 object(s) allocated from: #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75 #1 0x55ed3602346e in map__get util/map.h:189 #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476 #3 0x55ed36025208 in hist_entry__new util/hist.c:588 #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587 #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638 #6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685 #7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776 #8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735 #9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119 #10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867 #11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351 #12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404 #13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448 #14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556 #15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 ... $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak' 93 I found that hist_entry__delete() failed to release child entries in the hierarchy tree (hroot_{in,out}). It needs to iterate the child entries and call hist_entry__delete() recursively.
After this change: $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak' 0 Reported-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307061250.320849-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-07 14:07:07 -08:00
Namhyung Kim	e242df05ee	perf report: Use map_symbol__copy() when copying callchains It seems some places fail to update the refcount of maps. Let's use the map_symbol__copy() helper to properly copy them with refcounts updated. Link: https://lore.kernel.org/r/20250307061250.320849-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-07 14:06:56 -08:00
Athira Rajeev	4c3f09e35c	perf annotate: Return errors from disasm_line__parse_powerpc() In disasm_line__parse_powerpc(), the return code from disasm_line__parse() is ignored. This gives bad results if disasm_line__parse() fails to disassemble the line. Use the return code to fix this. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-2-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-06 16:52:24 -08:00
Athira Rajeev	dab8c32ece	perf annotate: Add annotation_options.disassembler_used When running "perf annotate", the perf tool provides an option to use a specific disassembler such as llvm, objdump or capstone. The order picked is to use llvm first and, if that fails, fall back to objdump, i.e. to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE and PERF_DISASM_OBJDUMP. On powerpc, when using "data type" sort keys, the preferred approach is to read the raw instruction from the DSO. If objdump is specified with the "--objdump" option, the symbol is disassembled using objdump. Currently the disasm_line__parse_powerpc() function uses the length of the "line" to determine whether objdump is used. But there are a few cases where objdump doesn't recognise the instruction and the disassembled string is empty. Example: 134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0> 134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc> 134ce4: 0f 00 10 04 pld r9,1028308 ====>134ce8: d4 b0 20 e5 134cec: 16 00 40 39 li r10,22 134cf0: 48 01 21 ea ld r17,328(r1) So depending on the length of the line gives bad results. Add a new field to the annotation options structure, "struct annotation_options", to save the disassembler used. Use this info to determine whether disassembly is done while parsing the disasm line. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-06 16:51:22 -08:00
Namhyung Kim	b0920abe0d	perf report: Do not process non-JIT BPF ksymbol events The length of PERF_RECORD_KSYMBOL for BPF is the size of the JITed code, so it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the code when it gets samples in the region, but non-JITed code cannot get samples. Thus it'd be ok to ignore them. Actually it caused a performance issue in the perf tools on old ARM kernels, which can refuse to JIT some BPF code. It ended up splitting the existing kernel map (kallsyms). And a later lookup for a kernel symbol would create a new kernel map from kallsyms and then split it again and again. :( Probably there's a bug in the kernel map/symbol handling in perf tools. But I think we need to fix this anyway. Reported-by: Kevin Nomura <nomurak@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-06 16:00:25 -08:00
Ian Rogers	2c744f38da	perf test: Fix leak in "Synthesize attr update" test The own_cpus map variable may be non-NULL and hold a reference, in particular on hybrid machines. Do a put before overwriting the variable to avoid a memory leak. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250305191931.604764-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-06 15:59:41 -08:00
Namhyung Kim	41453107bf	perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel maps This was detected at the end of a 'perf record' session when build-id collection was enabled. The BPF programs put in place while the session was running, some even put in place by perf itself, were processed and inserted, and some overlaps related to BPF trampolines and programs took place. Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes" the problem, in the sense that overlaps will be dealt with and then the consistency will be kept, but it would be interesting to fully understand why such overlaps take place and how to deal with them when doing symbol resolution. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:04:07 -08:00
Arnaldo Carvalho de Melo	e0e4e0b8b7	perf maps: Add missing map__set_kmap_maps() when replacing a kernel map Since in this case __maps__insert_sorted() is not called and thus doesn't have the opportunity to do the needed map__set_kmap_maps() calls on the new map. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z7-May5w9VQd5QD0@x1 Link: https://lore.kernel.org/r/20250228211734.33781-6-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:03:43 -08:00
Namhyung Kim	0d11fab327	perf maps: Fixup maps_by_name when modifying maps_by_address We can't just replace the map in maps_by_address without touching maps_by_name; that would leave the refcount at 1 and thus trip another consistency check, this one: perf: util/maps.c:110: check_invariants: Assertion `refcount_read(map__refcnt(map)) > 1' failed. 106 /* 107 * Maps by name maps should be in maps_by_address, so 108 * the reference count should be higher. 109 */ 110 assert(refcount_read(map__refcnt(map)) > 1); Committer notice: Initialize the newly added 'ni' variable; although it really can't be accessed uninitialized, it trips some gcc versions, like: 12 20.00 archlinux:base : FAIL gcc version 13.2.1 20230801 (GCC) util/maps.c: In function ‘__maps__fixup_overlap_and_insert’: util/maps.c:896:54: error: ‘ni’ may be used uninitialized [-Werror=maybe-uninitialized] 896 | map__put(maps_by_name[ni]); | ^ util/maps.c:816:25: note: ‘ni’ was declared here 816 | unsigned int i, ni; | ^~ cc1: all warnings being treated as errors make[3]: *** [/git/perf-6.14.0-rc1/tools/build/Makefile.build:138: util] Error 2 Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z79std66tPq-nqsD@google.com Link: https://lore.kernel.org/r/20250228211734.33781-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:03:33 -08:00
Namhyung Kim	f7a46e028c	perf machine: Fixup kernel maps ends after adding extra maps I just noticed it would add extra kernel maps after modules. I think it should fixup end address of the kernel maps after adding all maps first. Fixes: `876e80cf83` ("perf tools: Fixup end address of modules") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1 Link: https://lore.kernel.org/lkml/Z712hzvv22Ni63f1@google.com Link: https://lore.kernel.org/r/20250228211734.33781-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:03:15 -08:00
Arnaldo Carvalho de Melo	25d9c0301d	perf maps: Set the kmaps for newly created/added kernel maps When using __maps__insert_sorted() the map kmaps field needs to be initialized, as we need kernel maps to work with map__kmap(). Fix it by using the newly introduced map__set_kmap() method. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1 Link: https://lore.kernel.org/r/20250228211734.33781-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:03:11 -08:00
Arnaldo Carvalho de Melo	99deaf5578	perf maps: Introduce map__set_kmap_maps() for kernel maps We need to set it in other places than __maps__insert(), so that we can have access to the 'struct maps' from a kernel 'struct map'. When building perf with 'DEBUG=1' we can notice it failing a consistency check done in the check_invariants() function: root@number:~# perf record -- perf test -w offcpu [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.040 MB perf.data (23 samples) ] perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed. Aborted (core dumped) root@number:~# The investigation on that was happening bisected to `876e80cf83` ("perf tools: Fixup end address of modules"), and the following patches will plug the problems found, this patch is just legwork on that direction. Use the map__set_kmap_maps() name as per a review comment from Ian Rogers, later there are further suggestions from him on getting rid of the kmaps variable, see the thread referenced in the Link below. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1 Link: https://lore.kernel.org/r/20250228211734.33781-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 23:03:04 -08:00
Thomas Falcon	74fb903b21	perf script: Fix output type for dynamically allocated core PMU's This patch was originally posted here: https://lore.kernel.org/all/20241213215421.661139-1-thomas.falcon@intel.com/ I have rebased on top of Arnaldo's patch here: https://lore.kernel.org/all/Z2XCi3PgstSrV0SE@x1/ The original commit message: " perf script output may show different fields on different core PMU's that exist on heterogeneous platforms. For example, perf record -e "{cpu_core/mem-loads-aux/,cpu_core/event=0xcd,\ umask=0x01,ldlat=3,name=MEM_UOPS_RETIRED.LOAD_LATENCY/}:upp"\ -c10000 -W -d -a -- sleep 1 perf script: chromium-browse 46572 [002] 544966.882384: 10000 cpu_core/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7ffdf1391b0c 10268100142 \ \|OP LOAD\|LVL L1 hit\|SNP None\|TLB L1 or L2 hit\|LCK No\|BLK N/A 5 7 0 7fad7c47425d [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) perf record -e cpu_atom/event=0xd0,umask=0x05,ldlat=3,\ name=MEM_UOPS_RETIRED.LOAD_LATENCY/upp -c10000 -W -d -a -- sleep 1 perf script: gnome-control-c 534224 [023] 544951.816227: 10000 cpu_atom/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7f0aaaa0aae0 [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) Some fields, such as data_src, are not included by default. The cause is that while one PMU may be assigned a type such as PERF_TYPE_RAW, other core PMU's are dynamically allocated at boot time. If this value does not match an existing PERF_TYPE_X value, output_type(perf_event_attr.type) will return OUTPUT_TYPE_OTHER. Instead search for a core PMU with a matching perf_event_attr type and, if one is found, return PERF_TYPE_RAW to match output of other core PMU's. " Suggested-by: Kan Liang <kan.liang@intel.com> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305163935.1605312-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 11:46:00 -08:00
Thomas Richter	957d194163	perf bench: Fix perf bench syscall loop count Command 'perf bench syscall fork -l 100000' offers option -l to run for a specified number of iterations. However this option is not always observed. The number is silently limited to 10000 iterations as can be seen: Output before: # perf bench syscall fork -l 100000 # Running 'syscall/fork' benchmark: # Executed 10,000 fork() calls Total time: 23.388 [sec] 2338.809800 usecs/op 427 ops/sec # When explicitly specified with option -l or --loops, also honor a higher number of iterations: Output after: # perf bench syscall fork -l 100000 # Running 'syscall/fork' benchmark: # Executed 100,000 fork() calls Total time: 716.982 [sec] 7169.829510 usecs/op 139 ops/sec # This patch fixes the issue for basic, execve, fork and getpgid. Fixes: `ece7f7c050` ("perf bench syscall: Add fork syscall benchmark") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:19:23 -08:00
Namhyung Kim	b627b443cc	perf test: Simplify data symbol test Now the workload will end after 1 second. Just run it with perf instead of waiting for the background process. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304022837.1877845-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Namhyung Kim	f04c7ef352	perf test: Add timeout to datasym workload Unlike the others, it has an infinite loop that makes it annoying to call. Make it finish after 1 second and handle a command-line argument to change the setting. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304022837.1877845-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Namhyung Kim	15bcfb96d0	perf test: Add trace record and replay test It just checks that trace record and replay display correct output. It uses a 'sleep' process and checks that there's a clock_nanosleep syscall. $ sudo perf test -vv replay 108: perf trace record and replay: --- start --- test child forked, pid 1563219 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.077 MB /tmp/temporary_file.w1ApA (242 samples) ] 0.686 (1000.068 ms): sleep/1563226 clock_nanosleep(rqtp: 0x7ffc20ffee10, rmtp: 0x7ffc20ffee50) = 0 ---- end(0) ---- 108: perf trace record and replay : Ok Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250304022837.1877845-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Namhyung Kim	38672c5033	perf test: Skip perf trace tests when running as non-root perf trace requires root because it needs to use tracepoints and BPF. Skip those tests when not run as root. Before: $ perf test trace 15: Parse sched tracepoints fields : Skip (permissions) 80: perf ftrace tests : Skip 105: perf trace enum augmentation tests : FAILED! 106: perf trace BTF general tests : FAILED! 107: perf trace exit race : FAILED! 118: probe libc's inet_pton & backtrace it with ping : Skip 125: Check Arm CoreSight trace data recording and synthesized samples: Skip 127: Check Arm SPE trace data recording and synthesized samples : Skip 132: Check open filename arg using perf trace + vfs_getname : FAILED! After: $ perf test trace 15: Parse sched tracepoints fields : Skip (permissions) 80: perf ftrace tests : Skip 105: perf trace enum augmentation tests : Skip 106: perf trace BTF general tests : Skip 107: perf trace exit race : Skip 118: probe libc's inet_pton & backtrace it with ping : Skip 125: Check Arm CoreSight trace data recording and synthesized samples: Skip 127: Check Arm SPE trace data recording and synthesized samples : Skip 132: Check open filename arg using perf trace + vfs_getname : Skip Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250304022837.1877845-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Namhyung Kim	3fb29a7514	perf test: Skip perf probe tests when running as non-root perf probe requires root because it needs to use [ku]probes. Skip those tests when not run as root. Before: $ perf test probe 47: Probe SDT events : Ok 104: test perf probe of function from different CU : FAILED! 115: perftool-testsuite_probe : FAILED! 117: Add vfs_getname probe to get syscall args filenames : FAILED! 118: probe libc's inet_pton & backtrace it with ping : FAILED! 119: Use vfs_getname probe to get syscall args filenames : FAILED! After: $ perf test probe 47: Probe SDT events : Ok 104: test perf probe of function from different CU : Skip 115: perftool-testsuite_probe : Skip 117: Add vfs_getname probe to get syscall args filenames : Skip 118: probe libc's inet_pton & backtrace it with ping : Skip 119: Use vfs_getname probe to get syscall args filenames : Skip Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250304022837.1877845-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Namhyung Kim	45a86d017a	perf test: Add --metric-only to perf stat output tests Add a test case for --metric-only for std, csv, json output mode using shadow IPC metric from instructions and cycles events. It should produce 'insn per cycle' metric. But currently JSON output has (none) 'GHz' as well. It looks like a bug but I don't have enough time to debug it for now so I made it pass. :( $ perf stat --metric-only -e instructions,cycles true Performance counter stats for 'true': 0.56 0.002127319 seconds time elapsed 0.002077000 seconds user 0.000000000 seconds sys $ perf stat -x, --metric-only -e instructions,cycles true 0.55,, $ perf stat -j --metric-only -e instructions,cycles true {"insn per cycle" : "0.53", "GHz" : "none"} $ perf test output -v 5: Test data source output : Ok 31: Sort output of hist entries : Ok 88: perf stat CSV output linter : Ok 90: perf stat JSON output linter : Ok 92: perf stat STD output linter : Ok Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250304022837.1877845-2-namhyung@kernel.org Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:17:01 -08:00
Leo Yan	2cc2f258a9	perf arm-spe: Support previous branch target (PBT) address When FEAT_SPE_PBT is implemented, the previous branch target address (named PBT) before the sampled operation will be recorded. This commit first introduces a 'prev_br_tgt' field in the record for saving the PBT address in the decoder. If the current operation is a branch instruction, combining it with PBT can create a chain of two consecutive branches. The branch stack stores branches in descending order, meaning a newer branch is stored in a lower entry in the stack. Arm SPE stores the latest branch in the first entry of the branch stack, and the previous branch coming from PBT is stored in the second entry. Otherwise, if the current operation is not a branch, the last branch will be saved for PBT only. PBT lacks associated information such as branch source address, branch type, and events. The branch entry fills the corresponding fields with zeros and only sets its target address. After: perf script -f --itrace=bl -F flags,addr,brstack jcc ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/- 0x0/0xffff8000801878f8/-/-/-/0//- jcc ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/- 0x0/0xffff8000802d12ec/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jmp ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//- 0x0/0xffff800081419104/-/-/-/0//- return ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/- 0x0/0xffff80008141ba60/-/-/-/0//- jcc ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/- 0x0/0xffff8000803d54e0/-/-/-/0//- jmp ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//- 0x0/0xffff8000803d46c8/-/-/-/0//- jmp ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//- 0x0/0xffff80008040f6e8/-/-/-/0//-
jcc ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:20 -08:00
Leo Yan	73cb57f56f	perf arm-spe: Add branch stack Although Arm SPE cannot generate continuous branch records, this commit creates a branch stack with only one branch entry. A single branch info can be used for performance optimization. A branch stack structure is dynamically allocated in the decode queue. The branch stack and stack flags are synthesized based on branch types and associated events. After: # perf script --itrace=bl1 -F flags,addr,brstack jcc ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/- jcc/miss,not_taken/ ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/- jmp ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//- jcc/not_taken/ ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/- jcc/not_taken/ ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/- jcc/not_taken/ ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/- jcc ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/- jcc ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/- jcc/not_taken/ ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/- jcc ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/- jcc ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/- jcc/not_taken/ ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/- jcc ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/- jcc ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/- return/miss/ ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/- jcc/not_taken/ ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/- jmp ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//- jcc/not_taken/ ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/- return ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark 
<james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:20 -08:00
Leo Yan	4a53a67e0e	perf arm-spe: Set sample flags with supplement info Based on the supplement information in the record, this commit sets the sample flags for conditional branch, function call, return. It also sets events in flags, such as mispredict, not taken, and in transaction. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:20 -08:00
Leo Yan	5c1b158396	perf arm-spe: Fill branch operations and events to record The newly added branch operations and events are filled into the record; the information will be consumed when synthesizing samples. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-10-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:20 -08:00
Leo Yan	faf2260542	perf arm-spe: Decode transactional event The bit[16] in an event payload indicates an operation is in transactional state. Decode the bit. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-9-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:20 -08:00
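The bit test described in the transactional-event commit above can be sketched as follows. This is a minimal illustration in Python rather than the decoder's C code; only the bit position (16) comes from the commit message, and the names `EV_TRANSACTIONAL_BIT` and `is_transactional` are hypothetical.

```python
# Hypothetical names; only the bit position (16) is taken from the commit message.
EV_TRANSACTIONAL_BIT = 16


def is_transactional(event_payload: int) -> bool:
    """Return True when bit[16] of an event payload is set."""
    return bool((event_payload >> EV_TRANSACTIONAL_BIT) & 1)


if __name__ == "__main__":
    print(is_transactional(1 << 16))  # payload with only bit 16 set
    print(is_transactional(0xFFFF))   # payload with bits 0..15 set, bit 16 clear
```

Shifting and masking a single bit keeps the check independent of whatever other event bits are set in the same payload.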
Leo Yan	64d86c03e1	perf arm-spe: Extend branch operations In the Arm ARM (ARM DDI 0487, L.a), section "D18.2.7 Operation Type packet", the branch subclass is extended for Call Return (CR) and Guarded control stack data access (GCS). This commit adds support for CR and GCS operations. The IND (indirect) operation is defined only in bit [1], so its macro is updated accordingly. Move the COND (Conditional) macro into the same group as the other operations for better maintenance. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-8-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
Leo Yan	e1d47850bb	perf arm-spe: Fix load-store operation checking The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation types; they overlap with other second-level operation types belonging to SVE and branch operations. As a result, a non load-store operation can be parsed for data source and memory sample. To fix the issue, this commit introduces an is_ldst_op() macro for checking LDST operations, and applies the check when synthesizing data source and memory samples. Fixes: `a89dbc9b98` ("perf arm-spe: Set sample's data source field") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
Leo Yan	1e66dcff7b	perf script: Add not taken event for branch stack The branch stack has an existing field for printing mispredict; extend the field for printing events and add support for the not-taken event. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
Leo Yan	4caa971050	perf script: Add not taken event for branches Some hardware (e.g., Arm SPE) can trace the not taken event for branches. Add a flag for this event and support printing it. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
Leo Yan	88b1473135	perf script: Separate events from branch types Branch types and events are two different things. A branch type can be a conditional branch, an indirect branch, a procedure call, a return, or an exception taken, etc. The extra event information is provided for what happens during a branch, e.g. if a branch is mispredicted or not taken (specific to conditional branches). To deliver information about branches, this commit separates events from branch types. It parses branch types first, then appends event strings enclosed by the '/' character. If multiple events occur, the events are separated with a comma (,). Also add a minor improvement by adding char 'm' in char array for branch mispredict event. Below are extracted sample flags. Before: branch: br miss instructions: br miss After: branch: jmp/miss/ instructions: jmp/miss/ Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
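The flag layout described in the commit above (branch type first, then events between '/' characters, comma-separated) can be sketched as below. This is an illustrative Python helper, not the perf script C implementation; the function name `format_sample_flags` is hypothetical, and the event names are only those quoted in the surrounding commit messages.

```python
from typing import Sequence


def format_sample_flags(branch_type: str, events: Sequence[str]) -> str:
    """Build a flag string: branch type, then '/ev1,ev2/' if any events occurred."""
    if not events:
        # No event information: print the branch type on its own.
        return branch_type
    return "{}/{}/".format(branch_type, ",".join(events))


if __name__ == "__main__":
    print(format_sample_flags("jmp", ["miss"]))
    print(format_sample_flags("jcc", ["miss", "not_taken"]))
    print(format_sample_flags("return", []))
```

The first call reproduces the `jmp/miss/` form quoted in the commit message; keeping the type and the events in separate arguments mirrors the separation the commit introduces.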
Leo Yan	4d59818897	perf script: Refactor sample_flags_to_name() function When generating a string for sample flags, the sample_flags_to_name() function lacks the ability to parse the trace start bit or trace end bit. Therefore, the function is invoked multiple times after clearing its unsupported bits. This commit improves the sample_flags_to_name() function to parse sample flags in one go for three kinds of information: - The prefix info for trace start, trace end, etc. - Branch types. - Extra info for transaction and interrupt related info. As a result, the code is simplified to call the sample_flags_to_name() only once. No expectation for any changes in the perf script output. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
Leo Yan	2b747a86d8	perf script: Make printing flags reliable Add a check for the generated string of flags. Print out the raw number if the string generation fails. Use the SAMPLE_FLAGS_STR_ALIGNED_SIZE macro to replace the value '21'. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20250304111240.3378214-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-05 09:13:19 -08:00
James Clark	be9f3e95a9	perf stat: Fix non-uniquified hybrid legacy events Legacy hybrid events have attr.type == PERF_TYPE_HARDWARE, so they look like plain legacy events if we only look at attr.type. But legacy events should still be uniquified if they were opened on a non-legacy PMU. Fix it by checking if the evsel is hybrid and forcing needs_uniquify before looking at the attr.type. This restores PMU names on hybrid systems and also changes "perf stat metrics (shadow stat) test" from a FAIL back to a SKIP (on hybrid). The test was gated on "cycles" appearing alone, which doesn't happen here. Before: $ perf stat -- true ... <not counted> instructions:u (0.00%) 162,536 instructions:u # 0.58 insn per cycle ... After: $ perf stat -- true ... <not counted> cpu_atom/instructions/u (0.00%) 162,541 cpu_core/instructions/u # 0.62 insn per cycle ... Fixes: `357b965deb` ("perf stat: Changes to event name uniquification") Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250226145526.632380-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-03 12:48:17 -08:00
Namhyung Kim	7788ad59d1	perf tools: Skip BPF sideband event for userspace profiling The BPF sideband information is tracked using a separate thread and evlist. But it's only useful for profiling kernel and we can skip it when users profile their application only. It seems it already fails to open the sideband event in that case. Let's remove the noise in the verbose output anyway. Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250226203039.1099131-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-03-02 09:47:24 -08:00
Colin Ian King	7e55bc0110	perf test: Fix spelling mistake "sythesizing" -> "synthesizing" There are spelling mistakes in TEST_ASSERT_VAL messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20250228090941.680226-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 16:17:53 -08:00
Luca Ceresoli	75100d848e	perf build: Fix in-tree build due to symbolic link Building perf in-tree is broken after commit `890a1961c8` ("perf tools: Create source symlink in perf object dir") which added a 'source' symlink in the output dir pointing to the source dir. With in-tree builds, the added 'SOURCE = ...' line is executed multiple times (I observed 2 during the build plus 2 during installation). This is a minor inefficiency, in theory not harmful because symlink creation is assumed to be idempotent. But it is not. Considering with in-tree builds: srctree=/absolute/path/to/linux OUTPUT=/absolute/path/to/linux/tools/perf here's what happens: 1. ln -sf $(srctree)/tools/perf $(OUTPUT)/source -> creates /absolute/path/to/linux/tools/perf/source link to /absolute/path/to/linux/tools/perf => OK, that's what was intended 2. ln -sf $(srctree)/tools/perf $(OUTPUT)/source # same command as 1 -> creates /absolute/path/to/linux/tools/perf/perf link to /absolute/path/to/linux/tools/perf => Not what was intended, not idempotent 3. Now the build _should_ create the 'perf' executable, but it fails The reason is the tricky 'ln' command line. At the first invocation 'ln' uses the 1st form: ln [OPTION]... [-T] TARGET LINK_NAME and creates a link to TARGET called LINK_NAME. At the second invocation $(OUTPUT)/source exists, so 'ln' uses the 3rd form: ln [OPTION]... TARGET... DIRECTORY and creates a link to TARGET called TARGET inside DIRECTORY. Fix by adding -n/--no-dereference to "treat LINK_NAME as a normal file if it is a symbolic link to a directory", as the manpage says. 
Closes: https://lore.kernel.org/all/20241125182506.38af9907@booty/ Fixes: `890a1961c8` ("perf tools: Create source symlink in perf object dir") Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20250124-perf-fix-intree-build-v1-1-485dd7a855e4@bootlin.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 16:17:41 -08:00
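The non-idempotent `ln -sf` behaviour described in the commit above can be reproduced in a scratch directory. The sketch below is a hedged illustration: it assumes an `ln` binary that supports `-n` (as GNU coreutils does), uses a temporary directory to stand in for `$(srctree)/tools/perf` and `$(OUTPUT)`, and the helper name `link_twice` is hypothetical.

```python
import os
import subprocess
import tempfile


def link_twice(extra_flags):
    """Run the same 'ln -sf' command twice; return the entries left in the dir."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for $(srctree)/tools/perf; in an in-tree build OUTPUT is the same dir.
        src = os.path.join(tmp, "tools", "perf")
        os.makedirs(src)
        link = os.path.join(src, "source")  # stand-in for $(OUTPUT)/source
        for _ in range(2):
            subprocess.run(["ln", "-sf", *extra_flags, src, link], check=True)
        return sorted(os.listdir(src))


if __name__ == "__main__":
    print("without -n:", link_twice([]))
    print("with -n:   ", link_twice(["-n"]))
```

Without `-n`, the second invocation follows the existing `source` link into the directory and is expected to leave a stray `perf` link there, which is the name the build later needs for the executable. With `-n`, only `source` should remain.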
Leo Yan	e50b291fbb	perf arm-spe: Report error if set frequency When users set the '-F' parameter to specify a frequency for Arm SPE, the tool reports an error: perf record -F 1000 -e arm_spe_0// -- sleep 1 Error: Invalid event (arm_spe_0//) in per-thread mode, enable system wide with '-a'. The error message is confusing and does not point users to the actual problem. Arm SPE does not support frequency setting, given that it adopts a statistics-based approach. Instead, Arm SPE supports setting a period. This commit adds a check for the frequency setting. It reports an error and reminds users to set a period instead. After: perf record -F 1000 -e arm_spe_0// -- sleep 1 Arm SPE: Frequency is not supported. Set period with -c option or PMU parameter (-e arm_spe_0/period=NUM/). Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250227085544.2154136-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 10:09:03 -08:00
Chun-Tse Shao	3c97e7b991	perf lock: Report owner stack in usermode This patch parses `owner_lock_stat` into a RB tree, enabling ordered reporting of owner lock statistics with stack traces. It also updates the documentation for the `-o` option in contention mode, decouples `-o` from `-t`, and issues a warning to inform users about the new behavior of `-ov`. Example output: $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe ... contended total wait max wait avg wait type caller 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 0xffffffffac6318e7 pipe_read+0x57 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 0xffffffffac631ee0 pipe_write+0x50 0xffffffffac6241db vfs_write+0x3bb 0xffffffffac6244ab ksys_write+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac69249b do_epoll_pwait.part.0+0xb 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 === owner stack trace === 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 0xffffffffac631bd8 pipe_read+0x348 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 ... Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 10:09:02 -08:00
Chun-Tse Shao	a40ccb7d98	perf lock: Make rb_tree helper functions generic The rb_tree helper functions can be reused for parsing `owner_lock_stat` into rb tree for sorting. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-4-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 00:29:37 -08:00
Chun-Tse Shao	425bc88352	perf lock: Retrieve owner callstack in bpf program This implements per-callstack aggregation of lock owners in addition to per-thread. The owner callstack is captured using `bpf_get_task_stack()` at `contention_begin()` and it also adds a custom stackid function for the owner stacks to be compared easily. The owner info is kept in a hash map using lock addr as a key to handle multiple waiters for the same lock. At `contention_end()`, it updates the owner lock stat based on the info that was saved at `contention_begin()`. If there are more waiters, it'd update the owner pid to itself as `contention_end()` means it gets the lock now. But it also needs to check the return value of the lock function in case task was killed by a signal or something. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 00:29:37 -08:00
Chun-Tse Shao	17ae7f9049	perf lock: Add bpf maps for owner stack tracing Add a struct and a few bpf maps in order to trace the owner stack. `struct owner_tracing_data`: Contains the owner's pid, stack id, timestamp for when the owner acquires the lock, and the count of lock waiters. `stack_buf`: Percpu buffer for retrieving the owner stacktrace. `owner_stacks`: For tracing the owner stacktrace to a customized owner stack id. `owner_data`: For tracing lock_address to `struct owner_tracing_data` in the bpf program. `owner_stat`: For reporting the owner stacktrace in usermode. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-28 00:29:36 -08:00
Ian Rogers	c760174401	perf cpumap: Reduce cpu size from int to int16_t Fewer than 32k logical CPUs are currently supported by perf. A cpumap is indexed by an integer (see perf_cpu_map__cpu) yielding a perf_cpu that wraps a 4-byte int for the logical CPU - the wrapping is done deliberately to avoid confusing a logical CPU with an index into a cpumap. Using a 4-byte int within the perf_cpu is larger than required so this patch reduces it to the 2-byte int16_t. For a cpumap containing 16 entries this will reduce the array size from 64 to 32 bytes. For very large servers with lots of logical CPUs the size savings will be greater. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250210191231.156294-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-27 08:47:25 -08:00
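The size arithmetic in the cpumap commit above (16 entries shrinking from 64 to 32 bytes) can be checked with a small sketch. It models the wrapper struct with Python's `ctypes` instead of the actual C definition; the class names `PerfCpuInt32` and `PerfCpuInt16` and the helper `cpumap_array_bytes` are hypothetical, and a 4-byte `int` is assumed, as stated in the commit message.

```python
import ctypes


class PerfCpuInt32(ctypes.Structure):
    """Model of a wrapper holding a 4-byte logical CPU number (old layout)."""
    _fields_ = [("cpu", ctypes.c_int32)]


class PerfCpuInt16(ctypes.Structure):
    """Model of a wrapper holding a 2-byte logical CPU number (new layout)."""
    _fields_ = [("cpu", ctypes.c_int16)]


def cpumap_array_bytes(cpu_type, nr_entries):
    """Size in bytes of an array of nr_entries wrapped CPU numbers."""
    return ctypes.sizeof(cpu_type * nr_entries)


if __name__ == "__main__":
    print(cpumap_array_bytes(PerfCpuInt32, 16))
    print(cpumap_array_bytes(PerfCpuInt16, 16))
    # Largest value a signed 16-bit field can hold, hence "fewer than 32k" CPUs.
    print(2 ** 15 - 1)
```

The saving scales linearly with the number of entries, which is why the commit expects larger gains on servers with many logical CPUs.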
Athira Rajeev	2337b7251d	perf trace: Add missing perf_tool__init() Perf trace on perf.data fails as below: ./perf trace record -- sleep 1 ./perf trace -i perf.data perf: Segmentation fault Segmentation fault (core dumped) Backtrace pointed to : ?? () perf_session.process_user_event () reader.read_event () perf_session.process_events () cmd_trace () run_builtin () handle_internal_command () main () Further debug pointed that, segmentation fault happens when trying to access id_index. Code snippet: case PERF_RECORD_ID_INDEX: err = tool->id_index(session, event); Since 'commit `15d4a6f41d` ("perf tool: Remove perf_tool__fill_defaults()")', perf_tool__fill_defaults is removed. All tools are initialized using perf_tool__init() prior to use. But in builtin-trace, perf_tool__init is not used and hence the defaults are not initialized. Use perf_tool__init() in perf trace to handle the initialization. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250225113157.28836-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-27 08:46:45 -08:00
James Clark	5c496f1d67	perf list: Document -v option deduplication feature -v disables deduplication of similarly suffixed PMUs so add it to the help and doc strings. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 16:23:47 -08:00
James Clark	c9d699e10f	perf pmu: Don't double count common sysfs and json events After pmu_add_cpu_aliases() is called, perf_pmu__num_events() returns an incorrect value that double counts common events and doesn't match the actual count of events in the alias list. This is because after 'cpu_aliases_added == true', the number of events returned is 'sysfs_aliases + cpu_json_aliases'. But when adding 'case EVENT_SRC_SYSFS' events, 'sysfs_aliases' and 'cpu_json_aliases' are both incremented together, failing to account that these ones overlap and only add a single item to the list. Fix it by adding another counter for overlapping events which doesn't influence 'cpu_json_aliases'. There doesn't seem to be a current issue because it's used in perf list before pmu_add_cpu_aliases() so the correct value is returned. Other uses in tests may also miss it for other reasons like only looking at uncore events. However it's marked as a fixes commit in case any new fix with new uses of perf_pmu__num_events() is backported. Fixes: `d9c5f5f94c` ("perf pmu: Count sys and cpuid JSON events separately") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 16:23:47 -08:00
James Clark	72c6f57a41	perf pmu: Dynamically allocate tool PMU perf_pmus__destroy() treats all PMUs as allocated and free's them so we can't have any static PMUs that are added to the PMU lists. Fix it by allocating the tool PMU in the same way as the others. Current users of the tool PMU already use find_pmu() and not perf_pmus__tool_pmu(), so rename the function to add 'new' to avoid it being misused in the future. perf_pmus__fake_pmu() can remain as static as it's not added to the PMU lists. Fixes the following error: $ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times munmap_chunk(): invalid pointer Aborted (core dumped) Fixes: `240505b2d0` ("perf tool_pmu: Factor tool events into their own PMU") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 16:23:47 -08:00
Athira Rajeev	556b58c191	perf probe: Pick the correct dwarf die while adding probe points Perf probe on vfs_fstatat fails as below on a powerpc system $ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params' Segmentation fault (core dumped) This is observed while running perftool-testsuite_probe testcase. While running with verbose, its observed that segfault happens at: synthesize_probe_trace_arg () synthesize_probe_trace_command () probe_file.add_event () apply_perf_probe_events () __cmd_probe () cmd_probe () run_builtin () handle_internal_command () main () Code in synthesize_probe_trace_arg() access a null value and results in segfault. Data structure which is null: struct probe_trace_arg arg->value We are hitting a case where arg->value is null in probe point: "vfs_fstatat $params". This is happening since 'commit `e896474fe4` ("getname_maybe_null() - the third variant of pathname copy-in")' Before the commit, probe point for vfs_fstatat was getting added only for one location: Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32 With this change, vfs_fstatat code is inlined for other locations in the code: Probe point found: __do_sys_lstat64+48 Probe point found: __do_sys_stat64+48 Probe point found: __do_sys_newlstat+48 Probe point found: __do_sys_newstat+48 Probe point found: vfs_fstatat+0 When trying to find matching dwarf information entry (DIE) from the debuginfo, the code incorrectly picks DIE which is not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux debuginfo file. 
The main abstract die is: <1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram) <4214885> DW_AT_external : 1 <4214885> DW_AT_name : (indirect string, offset: 0x17b9f3): vfs_fstatat With formal parameters: <2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter) <4214897> DW_AT_name : dfd <2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148a4> DW_AT_name : (indirect string, offset: 0x8fda9): filename <2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148b1> DW_AT_name : (indirect string, offset: 0x16bd9c): stat <2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148be> DW_AT_name : (indirect string, offset: 0x39832b): flags While collecting variables/parameters for a probe point, the function copy_variables_cb() also looks at dwarf debug entries based on the instruction address. Snippet if (dwarf_haspc(die_mem, vf->pf->addr)) return DIE_FIND_CB_CONTINUE; else return DIE_FIND_CB_SIBLING; But incase of inlined function instance for vfs_fstatat, there are two entries which has the instruction address entry point as same. Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to 0x4214883 (reference above for main abstract die) <3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine) <42131fb> DW_AT_abstract_origin: <0x4214883> <42131ff> DW_AT_entry_pc : 0xc00000000062b1e0 Instance 2: which is not for vfs_fstatat but for getname <5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine) <4213271> DW_AT_abstract_origin: <0x4215b6b> <4213275> DW_AT_entry_pc : 0xc00000000062b1e0 But the copy_variables_cb() continues to add parameters from second instance also based on the dwarf_haspc() check. This results in formal parameters for getname also appended to params. But while filling in the args->value for these parameters, since these args are not part of dwarf with offset "42131fa". Hence value will be null. This incorrect args results in segfault when value field is accessed. 
Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of "struct probe_finder". In copy_variables_cb(), include check to make sure the DW_AT_abstract_origin points to the correct entry if the dwarf_haspc() matches the instruction address. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 14:25:14 -08:00
Gabriele Monaco	833d025239	perf ftrace latency: allow to hide empty buckets Especially while using several buckets, it isn't uncommon to have some of them empty and reading the histogram may be a bit more complex: # perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200 # DURATION \| COUNT \| GRAPH \| 0 - 5 us \| 14816 \| ###################################### \| 5 - 10 us \| 1228 \| ### \| 10 - 15 us \| 438 \| # \| 15 - 20 us \| 106 \| \| 20 - 25 us \| 21 \| \| 25 - 30 us \| 11 \| \| 30 - 35 us \| 1 \| \| 35 - 40 us \| 2 \| \| 40 - 45 us \| 4 \| \| 45 - 50 us \| 0 \| \| 50 - 55 us \| 1 \| \| 55 - 60 us \| 0 \| \| 60 - 65 us \| 1 \| \| 65 - 70 us \| 1 \| \| 70 - 75 us \| 1 \| \| 75 - 80 us \| 2 \| \| 80 - 85 us \| 0 \| \| 85 - 90 us \| 1 \| \| 90 - 95 us \| 0 \| \| 95 - 100 us \| 1 \| \| 100 - 105 us \| 0 \| \| 105 - 110 us \| 0 \| \| 110 - 115 us \| 0 \| \| 115 - 120 us \| 0 \| \| 120 - 125 us \| 1 \| \| 125 - 130 us \| 0 \| \| 130 - 135 us \| 0 \| \| 135 - 140 us \| 1 \| \| 140 - 145 us \| 0 \| \| 145 - 150 us \| 0 \| \| 150 - 155 us \| 0 \| \| 155 - 160 us \| 0 \| \| 160 - 165 us \| 0 \| \| 165 - 170 us \| 0 \| \| 170 - 175 us \| 0 \| \| 175 - 180 us \| 0 \| \| 180 - 185 us \| 0 \| \| 185 - 190 us \| 0 \| \| 190 - 195 us \| 0 \| \| 195 - 200 us \| 0 \| \| 200 - ... us \| 2 \| \| Allow the optional flag --hide-empty to remove buckets with no element and produce a more compact graph. This feature could be misleading since there is no clear indication for missing buckets, for this reason it's disabled by default. 
# perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200 --hide-empty # DURATION \| COUNT \| GRAPH \| 0 - 5 us \| 14816 \| ###################################### \| 5 - 10 us \| 1228 \| ### \| 10 - 15 us \| 438 \| # \| 15 - 20 us \| 106 \| \| 20 - 25 us \| 21 \| \| 25 - 30 us \| 11 \| \| 30 - 35 us \| 1 \| \| 35 - 40 us \| 2 \| \| 40 - 45 us \| 4 \| \| 50 - 55 us \| 1 \| \| 60 - 65 us \| 1 \| \| 65 - 70 us \| 1 \| \| 70 - 75 us \| 1 \| \| 75 - 80 us \| 2 \| \| 85 - 90 us \| 1 \| \| 95 - 100 us \| 1 \| \| 120 - 125 us \| 1 \| \| 135 - 140 us \| 1 \| \| 200 - ... us \| 2 \| \| Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250207080446.77630-2-gmonaco@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 13:48:02 -08:00
Gabriele Monaco	4a75e8c3b2	perf ftrace latency: variable histogram buckets The max-latency value can make the histogram smaller, but not larger: we have a maximum of 22 buckets, and specifying a max-latency that would require more buckets has no effect. Dynamically allocate the buckets and compute the bucket number from the max latency as: (max-min) / range + 2. If the maximum is not specified, we still set the bucket number to 22 and compute the maximum accordingly. Fail if the maximum is smaller than min+range; this way we make sure we always have 3 buckets: those below min, those above max and one in the middle. Since max-latency is not available in log2 mode, always use 22 buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 13:48:02 -08:00
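The bucket arithmetic from the commit above can be worked through with a short sketch. It is a Python illustration, not the perf C code: the function name `bucket_layout` is hypothetical, integer division is assumed, and the default maximum is assumed to be obtained by inverting the formula for 22 buckets, which is one reading of "compute the maximum accordingly".

```python
DEFAULT_BUCKETS = 22  # bucket count used when no maximum is given, per the commit


def bucket_layout(min_latency, bucket_range, max_latency=None):
    """Return (number_of_buckets, max_latency) for a linear histogram."""
    if max_latency is None:
        # Assumption: invert (max - min) / range + 2 == 22 to derive the maximum.
        max_latency = (DEFAULT_BUCKETS - 2) * bucket_range + min_latency
        return DEFAULT_BUCKETS, max_latency
    if max_latency < min_latency + bucket_range:
        # Guarantees at least 3 buckets: below min, one in the middle, above max.
        raise ValueError("max-latency must be at least min + range")
    buckets = (max_latency - min_latency) // bucket_range + 2
    return buckets, max_latency


if __name__ == "__main__":
    print(bucket_layout(0, 5, 200))  # range 5, max 200
    print(bucket_layout(0, 5))       # no maximum given
    print(bucket_layout(0, 5, 5))    # smallest accepted maximum
```

With a range of 5 and a maximum of 200 the formula yields 42 buckets, and the smallest accepted maximum (min + range) yields exactly the 3 buckets the commit message guarantees.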
Namhyung Kim	f4dc5a3355	perf annotate-data: Handle direct use of stack pointer without fbreg Sometimes the compiler generates code that uses the stack pointer register without a frame pointer. As we know RSP is the stack register on x86, let's treat it the same as fbreg. But the offset would be in the opposite direction, so update the debug message accordingly. Reported-by: Blake Jones <blakejones@google.com> Link: https://lore.kernel.org/r/20250126210242.1181225-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-26 13:42:49 -08:00
Linus Torvalds	9f5270d758	perf tools fixes for v6.14: 2nd batch - Fix tools/ quiet build Makefile infrastructure that was broken when working on tools/perf/ without testing on other tools/ living utilities. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Merge tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix tools/ quiet build Makefile infrastructure that was broken when working on tools/perf/ without testing on other tools/ living utilities. * tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: tools: Remove redundant quiet setup tools: Unify top-level quiet infrastructure	2025-02-25 13:32:32 -08:00
Thomas Falcon	c40aa8d98d	perf report: Fix sample number stats for branch entry mode Currently, stats->nr_samples is incremented per entry in the branch stack instead of per sample taken. As a result, statistics of samples taken during perf record in --branch-filter or --branch-any mode do not seem correct. Call hists__inc_nr_samples() for each sample taken instead of for each entry in the branch stack. Before: $ ./perf record -e cycles:u -b -c 10000000000 ./tchain_edit [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.005 MB perf.data (2 samples) ] $ perf report -D | tail -n 16 Aggregated stats: TOTAL events: 16 COMM events: 2 (12.5%) EXIT events: 1 ( 6.2%) SAMPLE events: 2 (12.5%) MMAP2 events: 2 (12.5%) KSYMBOL events: 1 ( 6.2%) FINISHED_ROUND events: 1 ( 6.2%) ID_INDEX events: 1 ( 6.2%) THREAD_MAP events: 1 ( 6.2%) CPU_MAP events: 1 ( 6.2%) EVENT_UPDATE events: 2 (12.5%) TIME_CONV events: 1 ( 6.2%) FINISHED_INIT events: 1 ( 6.2%) cpu_core/cycles/u stats: SAMPLE events: 64 After: $ ./perf report -D | tail -n 16 Aggregated stats: TOTAL events: 16 COMM events: 2 (12.5%) EXIT events: 1 ( 6.2%) SAMPLE events: 2 (12.5%) MMAP2 events: 2 (12.5%) KSYMBOL events: 1 ( 6.2%) FINISHED_ROUND events: 1 ( 6.2%) ID_INDEX events: 1 ( 6.2%) THREAD_MAP events: 1 ( 6.2%) CPU_MAP events: 1 ( 6.2%) EVENT_UPDATE events: 2 (12.5%) TIME_CONV events: 1 ( 6.2%) FINISHED_INIT events: 1 ( 6.2%) cpu_core/cycles/u stats: SAMPLE events: 2 Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250220045942.114965-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 16:02:28 -08:00
Ian Rogers	e7af194681	perf machine: Reuse module path buffer Rather than copying the path and appending the directory entry in a fresh path buffer, append to the path at the end of where it is for the recursion level. This saves a PATH_MAX buffer per recursion level and some unnecessary copying. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	d996c726a5	perf hwmon_pmu: Switch event discovery to io_dir__readdir Avoid DIR allocations when scanning sysfs by using io_dir for the readdir implementation, that allocates about 1kb on the stack. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	bb327140f5	perf parse-events: Switch tracepoints to io_dir__readdir Avoid DIR allocations when scanning sysfs by using io_dir for the readdir implementation, that allocates about 1kb on the stack. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	56406bd557	perf events: Remove scandir in thread synthesis This avoids scandir reading the directory into memory that's allocated and instead allocates on the stack. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250222061015.303622-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	d6cd7c9f02	perf header: Switch mem topology to io_dir__readdir Switch memory_node__read and build_mem_topology from opendir/readdir to io_dir__readdir, with smaller stack allocations. Reduces peak memory consumption of perf record by 10kb. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	6a81a3fd9e	perf pmu: Switch to io_dir__readdir Avoid DIR allocations when scanning sysfs by using io_dir for the readdir implementation, that allocates about 1kb on the stack. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	f7cada5f7e	perf maps: Switch modules tree walk to io_dir__readdir Compared to glibc's opendir/readdir this lowers the max RSS of perf record by 1.8MB on a Debian machine. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250222061015.303622-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-24 15:46:33 -08:00
Ian Rogers	7e05269ba8	perf parse-events: Tidy name token matching Prior to commit `70c90e4a6b` ("perf parse-events: Avoid scanning PMUs before parsing") names (generally event names) excluded hyphen (minus) symbols as the formation of legacy names with hyphens was handled in the yacc code. That commit allowed hyphens, supposedly making name_minus unnecessary. However, changing name_minus to name has issues in the term config tokens as then name ends up having priority over numbers and name allows matching numbers since commit `5ceb57990b` ("perf parse: Allow tracepoint names to start with digits "). It is also permissible for a name to match with a colon (':') in it when it's in a config term list. To address this, rename name_minus to term_name, make the pattern match names except for the colon, and add number matching into the config term region with a higher priority than name matching. This addresses an inconsistency and allows greater matching for names inside of term lists, for example, they may start with a number. Rename name_tag to quoted_name and update comments and helper functions to avoid str detecting quoted strings which was already done by the lexer. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250109175401.161340-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-20 22:35:10 -08:00
Krzysztof Łopatowski	4bac7fb586	perf tools: Improve startup time by reducing unnecessary stat() calls When testing perf trace on NixOS, I noticed significant startup delays: - `ls`: ~2ms - `strace ls`: ~10ms - `perf trace ls`: ~550ms Profiling showed that 51% of the time is spent reading files, 26% in loading BPF programs, and 11% in `newfstatat`. This patch optimizes module path exploration by avoiding `stat()` calls unless necessary. For filesystems that do not implement `d_type` (DT_UNKNOWN), it falls back to the old behavior. See `readdir(3)` for details. This reduces `perf trace ls` time to ~500ms. A more thorough startup optimization based on command parameters would be ideal, but that is a larger effort. Signed-off-by: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com> Acked-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:55:59 -08:00
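The idea of skipping `stat()` when the directory entry already reports its type can be sketched in Python, whose `os.scandir` exposes the same `d_type` information from `readdir`. This is only an illustration of the technique, not the perf code: `find_modules` and the `.ko` suffix default are hypothetical names chosen for the example.

```python
import os


def find_modules(root, suffix=".ko"):
    """Return sorted paths below root whose names end with suffix."""
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            # DirEntry.is_dir()/is_file() answer from the type returned by
            # readdir when the filesystem provides it, and only fall back
            # to a stat call when the type is unknown (DT_UNKNOWN).
            if entry.is_dir(follow_symlinks=False):
                found.extend(find_modules(entry.path, suffix))
            elif entry.is_file(follow_symlinks=False) and entry.name.endswith(suffix):
                found.append(entry.path)
    return sorted(found)
```

The fallback path matters on filesystems that do not fill in `d_type`, which is the case the commit message calls out.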
Dmitry Vyukov	6353255e7c	perf report: Fix input reload/switch with symbol sort key Currently the code checks that there is no "ipc" in the sort order and adds an ipc string. This will always error out on the second pass after input reload/switch, since the sort order already contains "ipc". Do the ipc check/fixup only on the first pass. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20250108063628.215577-1-dvyukov@google.com Fixes: `ec6ae74fe8` ("perf report: Display average IPC and IPC coverage per symbol") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:27:59 -08:00
Namhyung Kim	acda4c2001	perf report: Support switching data w/ and w/o callchains The symbol_conf.use_callchain should be reset when switching to new data file, otherwise report__setup_sample_type() will show an error message that it enabled callchains but no callchain data. The function also will turn on the callchains if the data has PERF_SAMPLE_CALLCHAIN so I think it's ok to reset symbol_conf.use_callchain here. Link: https://lore.kernel.org/r/20250211060745.294289-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:23:58 -08:00
Namhyung Kim	43c2b6139b	perf report: Switch data file correctly in TUI The 's' key is to switch to a new data file and load the data in the same window. The switch_data_file() will show a popup menu to select which data file the user wants and update the 'input_name' global variable. But in the cmd_report(), it didn't update the data.path using the new 'input_name' and kept using the old file. This is a fairly old bug and I assume people don't use this feature much. :) Link: https://lore.kernel.org/r/20250211060745.294289-1-namhyung@kernel.org Closes: https://lore.kernel.org/linux-perf-users/89e678bc-f0af-4929-a8a6-a2666f1294a4@linaro.org Fixes: `f5fc14124c` ("perf tools: Add data object to handle perf data file") Reported-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:23:58 -08:00
Greg Kroah-Hartman	0cced76a02	perf tools: Fix up some comments and code to properly use the event_source bus In sysfs, the perf events are all located in /sys/bus/event_source/devices/ but some places ended up hard-coding the location to be at the root of /sys/devices/ which could be very risky as you do not exactly know what type of device you are accessing in sysfs at that location. So fix this all up by properly pointing everything at the bus device list instead of the root of the sysfs devices/ tree. Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/2025021955-implant-excavator-179d@gregkh Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:23:43 -08:00
James Clark	687b8c3938	perf list: Also append PMU name in verbose mode When listing in verbose mode, the long description is used but the PMU name isn't appended. There doesn't seem to be a reason to exclude it when asking for more information, so use the same print block for both long and short descriptions. Before: $ perf list -v ... inst_retired [Instruction architecturally executed] After: $ perf list -v ... inst_retired [Instruction architecturally executed. Unit: armv8_cortex_a57] Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250219151622.1097289-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:23:43 -08:00
Yangyu Chen	2ed0e3ea8a	perf vendor events arm64: Fix incorrect CPU_CYCLE in metrics expr Some existing metrics for Neoverse N3 and V3 expressions use CPU_CYCLE to represent the number of cycles, but this is incorrect. The correct event to use is CPU_CYCLES. I encountered this issue while working on a patch to add pmu events for Cortex A720 and A520 by reusing the existing patch for Neoverse N3 and V3 by James Clark [1] and my check script [2] reported this issue. [1] https://lore.kernel.org/lkml/20250122163504.2061472-1-james.clark@linaro.org/ [2] https://github.com/cyyself/arm-pmu-check Signed-off-by: Yangyu Chen <cyy@cyyself.name> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/tencent_D4ED18476ADCE818E31084C60E3E72C14907@qq.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-19 13:23:43 -08:00
Namhyung Kim	29bab85418	perf script: Fix hangup in offline flamegraph report A recent change in the flamegraph script fixed an issue with live mode but it created another for offline mode. It needs to pass "-" to -i option to read from stdin in the live mode. Actually there's a logic to pass the option in the perf script code, but the script was written with "-- $@" which prevented the option to go to the perf script. So the previous commit added the hard-coded "-i -" to the report command. But it's a problem for the offline mode which expects input from a file and now it's stuck on reading from stdin. Let's remove the "-i - --" part and let it pass the options properly to perf script. Closes: https://lore.kernel.org/linux-perf-users/c41e4b04-e1fd-45ab-80b0-ec2ac6e94310@linux.ibm.com Fixes: `23e0a63c6d` ("perf script: force stdin for flamegraph in live mode") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Anubhav Shelat <ashelat@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 16:12:19 -08:00
Dmitry Vyukov	5e838165d0	perf hist: Shrink struct hist_entry size Reorder the struct fields by size to reduce paddings and reduce struct simd_flags size from 8 to 1 byte. This reduces struct hist_entry size by 8 bytes (592->584), and leaves a single more usable 6 byte padding hole. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/7c1cb1c8f9901e945162701ba7269d0f9c70be89.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
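The effect of ordering struct fields by size can be demonstrated with `ctypes`, which lays out structures using the platform's C alignment rules. The two structures below are hypothetical and unrelated to the real `struct hist_entry`; they only show how a small field placed before a wide one forces padding.

```python
import ctypes


class Unordered(ctypes.Structure):
    # A 1-byte field before an 8-byte field forces alignment padding,
    # and the trailing 1-byte field forces tail padding.
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("counter", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
    ]


class Ordered(ctypes.Structure):
    # Same fields, widest first: the two small fields share one padded tail.
    _fields_ = [
        ("counter", ctypes.c_uint64),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


def padding_saved():
    """Bytes saved by the reordered layout on the current platform."""
    return ctypes.sizeof(Unordered) - ctypes.sizeof(Ordered)
```

On a typical 64-bit ABI the sizes are 24 and 16 bytes; exact numbers depend on the platform's alignment of 64-bit integers.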
Dmitry Vyukov	257facfaf5	perf test: Add tests for latency and parallelism profiling Ensure basic operation of latency/parallelism profiling and that main latency/parallelism record/report invocations don't fail/crash. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/c129c8f02f328f68e1e9ef2cdc582f8a9786a97d.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
Dmitry Vyukov	32ecca8d7a	perf report: Add latency and parallelism profiling documentation Describe latency and parallelism profiling, related flags, and differences with the currently only supported CPU-consumption-centric profiling. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/a13f270ed33cedb03ce9ebf9ddbd064854ca0f19.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
Dmitry Vyukov	2570c02c3a	perf report: Add --latency flag Add record/report --latency flag that allows to capture and show latency-centric profiles rather than the default CPU-consumption-centric profiles. For latency profiles record captures context switch events, and report shows Latency as the first column. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
Dmitry Vyukov	ee1cffbe24	perf report: Add latency output field Latency output field is similar to overhead, but represents overhead for latency rather than CPU consumption. It's re-scaled from overhead by dividing weight by the current parallelism level at the time of the sample. It effectively models profiling with 1 sample taken per unit of wall-clock time rather than unit of CPU time. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/b6269518758c2166e6ffdc2f0e24cfdecc8ef9c1.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
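The rescaling described in the entry above (sample weight divided by the parallelism level at the time of the sample) can be illustrated with a small calculation. The function name and the `(symbol, weight, parallelism)` tuple layout are assumptions made for this sketch, and parallelism is assumed to be at least 1.

```python
from collections import defaultdict


def overhead_and_latency(samples):
    """Map each symbol to (overhead_fraction, latency_fraction).

    samples: iterable of (symbol, weight, parallelism) tuples, where
    parallelism is the number of threads running when the sample was taken.
    """
    cpu = defaultdict(float)   # CPU-consumption view: plain weight
    wall = defaultdict(float)  # latency view: weight / parallelism
    for symbol, weight, parallelism in samples:
        cpu[symbol] += weight
        wall[symbol] += weight / parallelism
    cpu_total = sum(cpu.values())
    wall_total = sum(wall.values())
    return {s: (cpu[s] / cpu_total, wall[s] / wall_total) for s in cpu}
```

With a serial phase of weight 100 at parallelism 1 and a parallel phase of weight 400 at parallelism 4, the serial phase accounts for 20% of overhead but 50% of latency, which is the kind of difference the latency field is meant to expose.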
Dmitry Vyukov	61b6b31c2f	perf report: Add parallelism filter Add parallelism filter that can be used to look at specific parallelism levels only. The format is the same as cpu lists. For example: Only single-threaded samples: --parallelism=1 Low parallelism only: --parallelism=1-4 High parallelism only: --parallelism=64-128 Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/e61348985ff0a6a14b07c39e880edbd60a8f8635.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:32 -08:00
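A filter in cpu-list format, such as `1`, `1-4` or `64-128`, can be interpreted as a set of accepted parallelism levels. The sketch below assumes comma-separated items and inclusive ranges; `parse_level_list` and `filter_samples` are hypothetical helpers, not perf functions.

```python
def parse_level_list(spec):
    """Parse a cpu-list style string such as "1,4-6" into a set of ints."""
    levels = set()
    for part in spec.split(","):
        low, sep, high = part.strip().partition("-")
        first = int(low)
        last = int(high) if sep else first
        if last < first:
            raise ValueError("descending range: " + part)
        levels.update(range(first, last + 1))  # ranges are inclusive
    return levels


def filter_samples(samples, spec):
    """Keep (name, parallelism) samples whose parallelism matches spec."""
    wanted = parse_level_list(spec)
    return [sample for sample in samples if sample[1] in wanted]
```

For instance, `filter_samples(samples, "1")` keeps only single-threaded samples, matching the first example in the commit message.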
Dmitry Vyukov	216f8a970c	perf report: Switch filtered from u8 to u16 We already have all u8 bits taken, adding one more filter leads to unpleasant failure mode, where code compiles w/o warnings, but the last filters silently don't work. Add a typedef and switch to u16. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/32b4ce1731126c88a2d9e191dc87e39ae4651cb7.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-18 14:04:31 -08:00
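The failure mode described above, where a ninth filter bit silently disappears from an 8-bit field, can be reproduced with `ctypes` fixed-width integers, which truncate like their C counterparts. The bit position and helper name below are illustrative assumptions, not the actual perf filter definitions.

```python
import ctypes

NEW_FILTER_BIT = 1 << 8  # hypothetical ninth filter, one past the u8 range


def set_filter(storage_type, current, flag):
    """Store current | flag in the given fixed-width type and read it back."""
    return storage_type(current | flag).value


# With 8-bit storage the new bit is truncated away without any error.
lost = set_filter(ctypes.c_uint8, 0, NEW_FILTER_BIT)
# With 16-bit storage the bit survives.
kept = set_filter(ctypes.c_uint16, 0, NEW_FILTER_BIT)
```

Here `lost` is 0 while `kept` still has the bit set, which mirrors why widening the type was needed before adding another filter.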
Charlie Jenkins	293f324ce9	tools: Unify top-level quiet infrastructure Commit `f2868b1a66` ("perf tools: Expose quiet/verbose variables in Makefile.perf") moved the quiet infrastructure out of tools/build/Makefile.build and into the top-level Makefile.perf file so that the quiet infrastructure could be used throughout perf and not just in Makefile.build. Extract out the quiet infrastructure into Makefile.include so that it can be leveraged outside of perf. Fixes: `f2868b1a66` ("perf tools: Expose quiet/verbose variables in Makefile.perf") Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-1-07de4482a581@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-02-18 15:31:45 -03:00
Dmitry Vyukov	7ae1972e74	perf report: Add parallelism sort key Show parallelism level in profiles if requested by user. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/7f7bb87cbaa51bf1fb008a0d68b687423ce4bad4.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-17 22:00:50 -08:00
Dmitry Vyukov	f13bc61b2e	perf report: Add machine parallelism Add calculation of the current parallelism level (number of threads actively running on CPUs). The parallelism level can be shown in reports on its own, and to calculate latency overheads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-17 22:00:50 -08:00
Namhyung Kim	20600b8aab	perf tools: Fix compile error on sample->user_regs It was recently changed to be allocated dynamically, but some arch-dependent code was not updated to use perf_sample__user_regs(). Fixes: `dc6d2bc2d8` ("perf sample: Make user_regs and intr_regs optional") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250214191641.756664-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-14 12:33:41 -08:00
Leo Yan	d18c882f85	perf tools: Fix compilation error on arm64 Since the commit `dc6d2bc2d8` ("perf sample: Make user_regs and intr_regs optional"), the building for Arm64 reports error: arch/arm64/util/unwind-libdw.c: In function ‘libdw__arch_set_initial_registers’: arch/arm64/util/unwind-libdw.c:11:32: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types] 11 | struct regs_dump *user_regs = &ui->sample->user_regs; | ^ cc1: all warnings being treated as errors make[6]: *** [/home/niayan01/linux/tools/build/Makefile.build:85: arch/arm64/util/unwind-libdw.o] Error 1 make[5]: *** [/home/niayan01/linux/tools/build/Makefile.build:138: util] Error 2 arch/arm64/tests/dwarf-unwind.c: In function ‘test__arch_unwind_sample’: arch/arm64/tests/dwarf-unwind.c:48:27: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types] 48 | struct regs_dump *regs = &sample->user_regs; | ^ To fix the issue, use the helper perf_sample__user_regs() to retrieve the user_regs. Fixes: `dc6d2bc2d8` ("perf sample: Make user_regs and intr_regs optional") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250214111025.14478-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-14 11:12:12 -08:00
Ian Rogers	dc6d2bc2d8	perf sample: Make user_regs and intr_regs optional The struct regs_dump contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <tavianator@tavianator.com> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/ Adrian Hunter <adrian.hunter@intel.com> replied that the zero initialization was necessary and couldn't simply be removed. This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by making user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation, accessors are created for user_regs and intr_regs. To support correct cleanup, perf_sample__init and perf_sample__exit functions are created and added throughout the code base. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 20:06:11 -08:00
Ian Rogers	08d9e88348	perf test stat_all_metrics: Ensure missing events fail test Issue reported by Thomas Falcon and diagnosed by Kan Liang here: https://lore.kernel.org/lkml/d44036481022c27d83ce0faf8c7f77042baedb34.camel@intel.com/ Metrics with missing events can be erroneously skipped if they contain FP, AMX or PMM events. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-25-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:40 -08:00
Ian Rogers	8a6dcb26af	perf vendor events: Update Tigerlake events/metrics Update events from v1.16 to v1.17. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.17: `e1d5ac3412` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-24-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	f2f3a4afdd	perf vendor events: Update SkylakeX events/metrics Update events from v1.35 to v1.36. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.36: `f6801e5c14` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-23-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	228c556a63	perf vendor events: Update Skylake metrics Update TMA metrics from 4.8 to 5.02. The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-22-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	86f5536004	perf vendor events: Update Sierraforest events/metrics Update events from v1.04 to v1.07. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.08: `7ae9c45ccf` `903b3d0a0a` `825c436147` `bafe6a7b5c` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	830ee133a5	perf vendor events: Update Sapphirerapids events/metrics Update events from v1.23 to v1.25. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.25: `78d6273c54` `f069ed9d0b` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	870b92024e	perf vendor events: Update Rocketlake events/metrics Update events from v1.03 to v1.04. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.04: `015d5a5eab` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	b4152015a9	perf vendor events: Update Meteorlake events/metrics Update events from v1.10 to v1.12. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.12: `d8fe70c91b` `b9dabd05ff` This updates the mapfile.csv for the 0xB5 CPUID variant of meteorlake. `c3094bc9bb` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	23878069de	perf vendor events: Update/add Lunarlake events/metrics Update events from v1.01 to v1.10. Add TMA metrics 5.02. Bring in the event updates v1.11: `af329039e8` `4a1cff8ceb` `cbc3b0dc19` `28f4b24f91` `172900e962` `dab0308f7a` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	c49b050915	perf vendor events: Update IcelakeX events/metrics Update events from v1.26 to v1.27. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.27: `6ee80d0532` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	094b233575	perf vendor events: Update Icelake events/metrics Update events from v1.22 to v1.24. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.24: `d4f10746cf` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	be67d89f79	perf vendor events: Update HaswellX events/metrics Update events from v28 to v29. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v29: `71dbf03aba` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	55bf5d0792	perf vendor events: Update Haswell events/metrics Update events from v35 to v36. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v36: `616ec6fc03` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Remove duplicate event UNC_CLOCK.SOCKET that was erroneously left in uncore-other.json. Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	aaa73d778b	perf vendor events: Update/add Graniterapids events/metrics Update events from v1.02 to v1.06. Add TMA metrics 5.02. Bring in the event updates v1.06: `de5502e51a` `79b9e512ea` `bc74a895e4` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	b52c4123a5	perf vendor events: Update GrandRidge events/metrics Update events from v1.03 to v1.05. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.05: `3b2e3528fb` `9bc1815536` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	5ee60fbf73	perf vendor events: Update EmeraldRapids events/metrics Update events from v1.09 to v1.11. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.11: `bffcec00a1` `a63da6de48` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	e415c1493f	perf vendor events: Add Clearwaterforest events Add events v1.00. Bring in the events from: https://github.com/intel/perfmon/tree/main/CWF/events Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	7487e4fce9	perf vendor events: Update CascadelakeX events/metrics Update events from v1.22 to v1.23. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.23: `8f3665f6be` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	a75d905d64	perf vendor events: Update BroadwellX events/metrics Update events from v22 to v23. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v23: `679982113f` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	11e644eb46	perf vendor events: Update BroadwellDE events/metrics Update events from v11 to v12. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v12: `e0b83388d5` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	240411b048	perf vendor events: Update Broadwell events/metrics Update events from v29 to v30. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v30: `9a1827b2ac` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	ba56a91063	perf vendor events: Add Arrowlake events/metrics Add events v1.07. Add TMA metrics based on v5.02. Bring in the events from: https://github.com/intel/perfmon/tree/main/ARL/events TMA 5.02 is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	b04fe42f6e	perf vendor events: Update AlderlakeN events/metrics Update events from v1.27 to v1.28. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.28: `801f43f22e` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	54169b4663	perf vendor events: Update Alderlake events/metrics Update events from v1.27 to v1.28. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.28: `801f43f22e` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Namhyung Kim	70f127c716	perf tools: Use symfs when opening debuginfo by path I found that it failed to load a binary using the --symfs option. Say I have a binary in /home/user/prog/xxx and a perf data file with it. If I move them to a different machine and use --symfs, it tries to find the binary in some locations under symfs using dso__read_binary_type_filename(), but not the last one. ${symfs}/usr/lib/debug/home/user/prog/xxx.debug ${symfs}/usr/lib/debug/home/user/prog/xxx ${symfs}/home/user/prog/.debug/xxx /home/user/prog/xxx It should check ${symfs}/home/user/prog/xxx. Let's fix it. Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250212221445.437481-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:16 -08:00
Namhyung Kim	fc00897c8a	perf trace: Add --summary-mode option The --summary-mode option will select how to show the syscall summary at the end. By default, it'll show the summary for each thread and it's the same as if --summary-mode=thread is passed. The other option is to show total summary, which is --summary-mode=total. I'd like to have this instead of a separate option like --total-summary because we may want to add a new summary mode (by cgroup) later. $ sudo ./perf trace -as --summary-mode=total sleep 1 Summary of events: total, 21580 events syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87% futex 1256 89 13331.197 0.000 10.614 733.722 15.49% poll 669 0 6806.618 0.000 10.174 459.316 11.77% ppoll 220 0 3968.797 0.000 18.040 516.775 25.35% clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00% epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29% nanosleep 16 0 60.515 0.000 3.782 10.123 33.33% ioctl 510 0 4.284 0.001 0.008 0.182 8.84% recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37% write 1393 0 2.854 0.001 0.002 0.017 1.79% read 1063 100 2.236 0.000 0.002 0.083 5.11% ... Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:16 -08:00
Namhyung Kim	bd50a26c9a	perf tools: Get rid of now-unused rb_resort.h It was only used in perf trace and it switched to use hashmap instead. Let's delete the code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:15 -08:00
Namhyung Kim	ef2da619b1	perf trace: Convert syscall_stats to hashmap It was using an rbtree-based int-list as a hash, with custom resort logic for that. As we have hashmap, let's convert to it and add a custom sort function for the hashmap entries using an array. It should be faster and more lightweight. This also prepares for supporting system-wide syscall stats. No functional changes intended. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:15 -08:00
Namhyung Kim	c7f821b876	perf trace: Allocate syscall stats only if summary is on The syscall stats are used only when summary is requested. Let's avoid unnecessary operations. While at it, let's pass 'trace' pointer directly instead of passing 'output' file pointer and 'summary' option in the 'trace' separately. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:10 -08:00
James Clark	615ec00b06	perf tests: Fix Tool PMU test segfault tool_pmu__event_to_str() now handles skipped events by returning NULL, so it's wrong to re-check for a skip on the resulting string. Calling tool_pmu__skip_event() with a NULL string results in a segfault so remove the unnecessary skip to fix it: $ perf test -vv "parsing with PMU name" 12.2: Parsing with PMU name: ... ---- unexpected signal (11) ---- 12.2: Parsing with PMU name : FAILED! Fixes: `ee8aef2d23` ("perf tools: Add skip check in tool_pmu__event_to_str()") Signed-off-by: James Clark <james.clark@linaro.org> Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250212163859.1489916-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:34:56 -08:00
Kan Liang	ee8aef2d23	perf tools: Add skip check in tool_pmu__event_to_str() Some topdown related metrics may fail on hybrid machines. $ perf stat -M tma_frontend_bound Cannot resolve IDs for tma_frontend_bound: cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@) In find_tool_events(), tool_pmu__event_to_str() is used to compare the tool_events. It only checks the event name, not the PMU or arch. So tool_events[TOOL_PMU__EVENT_SLOTS] is set to true, because the p-core Topdown metrics have a "slots" event. The tool_events array is shared, so when parsing the e-core metrics, "slots" is automatically added. The "slots" event as a tool event should only be available on arm64. It has a different meaning on x86. tool_pmu__skip_event() is intended to handle this case. Apply it in tool_pmu__event_to_str() as well. expr__get_id() lacks a sanity check. Add the check. Closes: https://lore.kernel.org/lkml/608077bc-4139-4a97-8dc4-7997177d95c4@linux.intel.com/ Fixes: `069057239a` ("perf tool_pmu: Move expr literals to tool_pmu") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: thomas.falcon@intel.com Link: https://lore.kernel.org/r/20250207152844.302167-1-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-10 11:46:30 -08:00
Dr. David Alan Gilbert	1df4b33f62	perf tools: Deadcode removal The last use of machine__fprintf_vmlinux_path() was removed in 2011 by commit `ab81f3fd35` ("perf top: Reuse the 'report' hist_entry/hists classes") mmap_cpu_mask__duplicate() was added in 2021 by commit `6bd006c6eb` ("perf mmap: Introduce mmap_cpu_mask__duplicate()") but hasn't been used since. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250204220545.456435-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-10 11:46:02 -08:00
Namhyung Kim	9e676a024f	Linux 6.14-rc1 Merge tag 'v6.14-rc1' into perf-tools-next To get the various fixes in the current master. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-05 14:57:18 -08:00
Ian Rogers	357b965deb	perf stat: Changes to event name uniquification The existing logic would disable uniquification on an evlist or enable it per evsel. This is unfortunate as uniquification is most needed when events have the same name, and so the whole evlist must be considered. Change the initial disable uniquify on an evlist processing to also set a needs_uniquify flag, for cases like the matching event names. This must be done as an initial pass as uniquification of an event name will change the behavior of the check. Keep the per counter uniquification but now only uniquify event names when the needs_uniquify flag is set. Before this change a hwmon event like temp1 wouldn't be uniquified and afterwards it will (i.e. the PMU is added to the temp1 event's name). Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:29:13 -08:00
Ian Rogers	2d9961c690	perf stat: Don't merge counters purely on name Counter merging was added in commit `942c559339` ("perf stat: Add perf_stat_merge_counters()"), however, it merges events with the same name on different PMUs regardless of whether the different PMUs are actually of the same type (i.e. they differ only in the suffix on the PMU). For hwmon events there may be a temp1 event on every PMU, but the PMU names are all unique and don't differ just by a suffix. The merging is overeager and will merge all the hwmon counters together, meaning an aggregated and very large temp1 value is shown. The same would be true for, say, cache events and memory controller events where the PMUs differ but the event names are the same. Fix the problem by correctly saying two PMUs alias when they differ only by suffix. Note, there is an overlap with evsel's merged_stat with aggregation and the evsel's metric_leader where aggregation happens for metrics. Fixes: `942c559339` ("perf stat: Add perf_stat_merge_counters()") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:29:05 -08:00
Ian Rogers	63e287131c	perf pmu: Rename name matching for no suffix or wildcard variants Wildcard PMU naming will match a name like pmu_1 to a PMU name like pmu_10 but not to a PMU name like pmu_2 as the suffix forms part of the match. No suffix matching will match pmu_10 to either pmu_1 or pmu_2. Add or rename matching functions on PMU to make it clearer what kind of matching is being performed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:46 -08:00
Ian Rogers	57e13264dc	perf pmus: Restructure pmu_read_sysfs to scan fewer PMUs Rather than scanning core or all PMUs, allow pmu_read_sysfs to read some combination of core, other, hwmon and tool PMUs. The PMUs that should be read and are already read are held as bitmaps. A hwmon PMU's name is known to need a "hwmon_" prefix, and similarly "tool" for the tool PMU, so only scan those PMUs when the PMU name or the PMU's type number makes it sensible to do so. The number of openat system calls reduces from 276 to 98 for a hwmon event. The number of openats for regular perf events isn't changed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:37 -08:00
Ian Rogers	340c345e58	perf evsel: Reduce scanning core PMUs in is_hybrid evsel__is_hybrid returns true if there are multiple core PMUs and the evsel is for a core PMU. Determining the number of core PMUs can require loading/scanning PMUs. There's no point doing the scanning if evsel for the is_hybrid test isn't core so reorder the tests to reduce PMU scanning. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:25 -08:00
Thomas Richter	888751e4d0	perf test: Fix Hwmon PMU test endianess issue perf test 11 hwmon fails on s390 with this error: # ./perf test -Fv 11 --- start --- ---- end ---- 11.1: Basic parsing test : Ok --- start --- Testing 'temp_test_hwmon_event1' Using CPUID IBM,3931,704,A01,3.7,002f temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/ FAILED tests/hwmon_pmu.c:189 Unexpected config for 'temp_test_hwmon_event1', 292470092988416 != 655361 ---- end ---- 11.2: Parsing without PMU name : FAILED! --- start --- Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/' FAILED tests/hwmon_pmu.c:189 Unexpected config for 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/', 292470092988416 != 655361 ---- end ---- 11.3: Parsing with PMU name : FAILED! # The root cause is in member test_event::config which is initialized to 0xA0001 or 655361. During event parsing a long list of event parsing functions is called, ending up with this gdb call stack: #0 hwmon_pmu__config_term (hwm=0x168dfd0, attr=0x3ffffff5ee8, term=0x168db60, err=0x3ffffff81c8) at util/hwmon_pmu.c:623 #1 hwmon_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, err=0x3ffffff81c8) at util/hwmon_pmu.c:662 #2 0x00000000012f870c in perf_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, zero=false, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1519 #3 0x00000000012f88a4 in perf_pmu__config (pmu=0x168dfd0, attr=0x3ffffff5ee8, head_terms=0x3ffffff5ea8, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1545 #4 0x00000000012680c4 in parse_events_add_pmu (parse_state=0x3ffffff7fb8, list=0x168dc00, pmu=0x168dfd0, const_parsed_terms=0x3ffffff6090, auto_merge_stats=true, alternate_hw_config=10) at util/parse-events.c:1508 #5 0x00000000012684c6 in parse_events_multi_pmu_add (parse_state=0x3ffffff7fb8, event_name=0x168ec10 "temp_test_hwmon_event1", hw_config=10, const_parsed_terms=0x0, listp=0x3ffffff6230, loc_=0x3ffffff70e0) at util/parse-events.c:1592 #6 0x00000000012f0e4e in parse_events_parse (_parse_state=0x3ffffff7fb8, scanner=0x16878c0) at util/parse-events.y:293 #7 0x00000000012695a0 in parse_events__scanner (str=0x3ffffff81d8 "temp_test_hwmon_event1", input=0x0, parse_state=0x3ffffff7fb8) at util/parse-events.c:1867 #8 0x000000000126a1e8 in __parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", pmu_filter=0x0, err=0x3ffffff81c8, fake_pmu=false, warn_if_reordered=true, fake_tp=false) at util/parse-events.c:2136 #9 0x00000000011e36aa in parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", err=0x3ffffff81c8) at /root/linux/tools/perf/util/parse-events.h:41 #10 0x00000000011e3e64 in do_test (i=0, with_pmu=false, with_alias=false) at tests/hwmon_pmu.c:164 #11 0x00000000011e422c in test__hwmon_pmu (with_pmu=false) at tests/hwmon_pmu.c:219 #12 0x00000000011e431c in test__hwmon_pmu_without_pmu (test=0x1610368 <suite.hwmon_pmu>, subtest=1) at tests/hwmon_pmu.c:23 where attr::config is set to the value 292470092988416 or 0x10a0000000000 in line 625 of file ./util/hwmon_pmu.c: attr->config = key.type_and_num; However, member key::type_and_num is defined in a union together with a bit field: union hwmon_pmu_event_key { long type_and_num; struct { int num :16; enum hwmon_type type :8; }; }; s390 is a big-endian architecture and Intel is a little-endian one. The events for the hwmon dummy pmu have num = 1 or num = 2 and type is set to HWMON_TYPE_TEMP (which is 10). On s390 this assigns member key::type_and_num the value 0x10a0000000000 (which is 292470092988416), as shown in the trace output above. Fix this and export the structure/union hwmon_pmu_event_key so the test shares the same implementation as the event parsing functions for union and bit fields. This should avoid endianness issues on all platforms. Output after: # ./perf test -F 11 11.1: Basic parsing test : Ok 11.2: Parsing without PMU name : Ok 11.3: Parsing with PMU name : Ok # Fixes: `531ee0fd48` ("perf test: Add hwmon "PMU" test") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250131112400.568975-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 17:22:40 -08:00
Thomas Richter	90d97674d4	perf test: Use cycles event in perf record test for leader_sampling On s390 the event instructions can not be used for recording. This event is only supported by perf stat. Change the event from instructions to cycles in subtest test_leader_sampling. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: James Clark <james.clark@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250131102756.4185235-3-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 11:36:14 -08:00
Thomas Richter	859199431d	perf test: Fix perf record test for precise_max On s390 the event instructions can not be used for recording. This event is only supported by perf stat. Test that each event cycles and instructions supports sampling. If the event can not be sampled, skip it. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: James Clark <james.clark@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250131102756.4185235-2-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 11:34:25 -08:00
Anubhav Shelat	23e0a63c6d	perf script: force stdin for flamegraph in live mode Currently, running "perf script flamegraph -a -F 99 sleep 1" should produce flamegraph.html containing the flamegraph. However, it gives a segmentation fault. This happens because the flamegraph.py script is supposed to take as input the output of "perf record", which should be in stdin. This would require passing "-i -" to flamegraph.py. However, the "flamegraph-report" script causes the "perf script" command to take the "-i -" option instead of flamegraph.py, which causes no problem for "perf script", but causes a segmentation fault since flamegraph.py has no input file. To fix this I added the "-i -" option directly to the flamegraph-report script to ensure flamegraph.py gets input from stdin. Signed-off-by: Anubhav Shelat <ashelat@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20250131145704.3164542-2-ashelat@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-03 19:49:10 -08:00
Ian Rogers	bb4b8f9697	perf test: Extra verbosity and hypervisor skip for tpebs test When not running as root and with higher perf event paranoia values the perf record forked by TPEBS can fail to attach to the process. Skip the test in these scenarios. Intel TPEBS test skips on non-Intel CPUs. On Intel CPUs under a hypervisor the cache-misses event may not be present or precise. Skip the test under this condition. Refactor the output code to be placed in a file so that on a signal the file can be dumped. This was necessary to catch the issue above as the failing perf record command would fail without output. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250130170135.5817-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-03 19:45:50 -08:00
James Clark	4c4c0724d6	perf: Always feature test reallocarray This is also used in util/comm.c now, so instead of selectively doing the feature test, always do it. If it's ever used anywhere else it's less likely to cause another build failure. This doesn't remove the need to manually include libc_compat.h, and missing that will still cause an error for glibc < 2.26. There isn't a way to fix that without poisoning reallocarray like libbpf did, but that has other downsides like making memory debugging tools less useful. So for Perf keep it like this and we'll have to fix up any missed includes. Fixes the following build error: util/comm.c:152:31: error: implicit declaration of function 'reallocarray' [-Wimplicit-function-declaration] 152 | tmp = reallocarray(comm_strs->strs, | ^~~~~~~~~~~~ Fixes: `13ca628716` ("perf comm: Add reference count checking to 'struct comm_str'") Reported-by: Ali Utku Selen <ali.utku.selen@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250129154405.777533-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-31 14:45:19 -08:00
Linus Torvalds	c06310fd6b	perf-tools fixes for 6.14 An early round of random fixes in perf tools for this cycle. perf trace ---------- * Fix loading of BPF program on certain clang versions * Fix out-of-bound access in syscalls with 6 arguments * Skip syscall enum test if landlock syscall is not available perf annotate ------------- * Fix segfaults due to invalid access in disasm arrays perf stat --------- * Fix error handling in topology parsing Signed-off-by: Namhyung Kim <namhyung@kernel.org> Merge tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "An early round of random fixes in perf tools for this cycle. perf trace: - Fix loading of BPF program on certain clang versions - Fix out-of-bound access in syscalls with 6 arguments - Skip syscall enum test if landlock syscall is not available perf annotate: - Fix segfaults due to invalid access in disasm arrays perf stat: - Fix error handling in topology parsing" * tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cpumap: Fix die and cluster IDs perf test: Skip syscall enum test if no landlock syscall perf trace: Fix runtime error of index out of bounds perf annotate: Use an array for the disassembler preference perf trace: Fix BPF loading failure (-E2BIG)	2025-01-30 17:38:20 -08:00
Ian Rogers	8ce0d2da14	perf stat: Fix find_stat for mixed legacy/non-legacy events Legacy events typically don't have a PMU when added leading to mismatched legacy/non-legacy cases in find_stat. Use evsel__find_pmu to make sure the evsel PMU is looked up. Update the evsel__find_pmu code to look for the PMU using the extended config type or, for legacy hardware/hw_cache events on non-hybrid systems, just use the core PMU. Before: ``` $ perf stat -e cycles,cpu/instructions/ -a sleep 1 Performance counter stats for 'system wide': 215,309,764 cycles 44,326,491 cpu/instructions/ 1.002555314 seconds time elapsed ``` After: ``` $ perf stat -e cycles,cpu/instructions/ -a sleep 1 Performance counter stats for 'system wide': 990,676,332 cycles 1,235,762,487 cpu/instructions/ # 1.25 insn per cycle 1.002667198 seconds time elapsed ``` Fixes: `3612ca8e29` ("perf stat: Fix the hard-coded metrics calculation on the hybrid") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250109222109.567031-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-29 14:06:25 -08:00
Ian Rogers	6ab89b7fc2	perf evsel: Add pmu_name helper Add helper to get the name of the evsel's PMU. This handles the case where there's no sysfs PMU via parse_events event_type helper. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250109222109.567031-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-29 14:05:57 -08:00
James Clark	9fae5884bb	perf cpumap: Fix die and cluster IDs Now that filename__read_int() returns -errno instead of -1 these statements need to be updated otherwise error values will be used as die IDs. This appears as a -2 die ID when the platform doesn't export one: $ perf stat --per-core -a -- true S36-D-2-C0 1 9.45 msec cpu-clock And the session topology test fails: $ perf test -vvv topology CPU 0, core 0, socket 36 CPU 1, core 1, socket 36 CPU 2, core 2, socket 36 CPU 3, core 3, socket 36 FAILED tests/topology.c:137 Cpu map - Die ID doesn't match ---- end(-1) ---- 38: Session topology : FAILED! Fixes: `05be17eed7` ("tool api fs: Correctly encode errno for read/write open failures") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241218115552.912517-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 10:03:26 -08:00
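The error-handling change described in the commit above can be sketched as follows. This is a minimal illustration rather than the actual patch: `read_int_from_file()` and `topology_id_or_unknown()` are hypothetical names, and the only assumption carried over from the commit message is that the read helper now reports failure as a negative errno value instead of -1.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a "read an int from a file" helper.
 * Returns 0 on success, or a negative errno value on failure.
 */
int read_int_from_file(const char *path, int *value)
{
	FILE *fp = fopen(path, "r");
	int ret = 0;

	if (!fp)
		return -errno;
	if (fscanf(fp, "%d", value) != 1)
		ret = -EINVAL;
	fclose(fp);
	return ret;
}

/*
 * Hypothetical caller: returns the ID read from the file, or -1 when the
 * platform does not export it. Every negative result is mapped to -1, so
 * an errno such as -ENOENT (-2) can never leak out as an ID.
 */
int topology_id_or_unknown(const char *path)
{
	int value;
	int ret = read_int_from_file(path, &value);

	return ret < 0 ? -1 : value;
}
```

Mapping all failures to a single sentinel keeps callers that compare against -1 working regardless of which errno the read helper returns.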
Namhyung Kim	72d81e1062	perf test: Skip syscall enum test if no landlock syscall The perf trace enum augmentation test specifically targets the landlock_add_rule syscall, but IIUC it's optional and can be opted out of via a kernel config. Currently trace_landlock() runs `perf test -w landlock` before the actual testing to check the availability, but that's not enough since the workload always returns 0. Instead, it could check whether the perf trace output contains the 'landlock' string. Fixes: `d66763fed3` ("perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'") Reviewed-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250128170629.1251574-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 09:29:39 -08:00
Howard Chu	c7b87ce0dd	perf trace: Fix runtime error of index out of bounds libtraceevent parses and returns an array of argument fields, sometimes larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr", idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6 elements max, creating an out-of-bounds access. This runtime error is found by UBsan. The error message: $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1 builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]' #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966 #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110 #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436 #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897 #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335 #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502 #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351 #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404 #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448 #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556 #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360 #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6) 0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1) = 1 Fixes: `5e58fcfaf4` ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint") Signed-off-by: Howard Chu 
<howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250122025519.361873-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 09:27:27 -08:00
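The class of bug described in the commit above, walking a linked list of fields into a fixed-size array, can be guarded against with an explicit bound check. The sketch below is only an illustration under assumed names (`struct field`, `struct arg_fmt`, `fill_arg_fmts()` and `MAX_ARGS` are all hypothetical); it is not the change that was merged.

```c
#include <stddef.h>

#define MAX_ARGS 6 /* hypothetical; mirrors the 6-element limit in the message */

/* Hypothetical linked list of parsed fields. */
struct field {
	const char *name;
	struct field *next;
};

/* Hypothetical per-argument format slot. */
struct arg_fmt {
	const char *name;
};

/*
 * Copy at most MAX_ARGS field names into fmts and return how many were
 * stored. The index is checked before every write, so a list with a
 * seventh element can never write past the end of the array.
 */
size_t fill_arg_fmts(struct arg_fmt fmts[MAX_ARGS], const struct field *fields)
{
	size_t idx = 0;

	for (const struct field *f = fields; f != NULL; f = f->next) {
		if (idx >= MAX_ARGS)
			break;
		fmts[idx++].name = f->name;
	}
	return idx;
}
```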
Ian Rogers	bde4ccfd5a	perf annotate: Use an array for the disassembler preference Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initializing an array of perf_disassembler enum values. If a value already exists then adding it a second time is ignored to avoid array out of bounds problems present in the previous code, it also allows a statically sized array and removes memory allocation needs. Errors in the disassembler string are reported when the config is parsed during perf annotate or perf top start up. If the array is uninitialized after processing the config file the default llvm, capstone then objdump values are added but without a need to parse a string. Fixes: `a6e8a58de6` ("perf disasm: Allow configuring what disassemblers to use") Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250124043856.1177264-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-27 15:58:01 -08:00
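The approach described in the commit above, a statically sized array of enum values where duplicate additions are ignored, can be sketched roughly as below. All identifiers here (`enum disassembler`, `struct disasm_prefs`, `prefs_add()`, `prefs_set_defaults_if_empty()`) are hypothetical and chosen for illustration; only the default order of llvm, capstone, then objdump comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum; the real perf names may differ. */
enum disassembler {
	DISASM_UNKNOWN = 0,
	DISASM_LLVM,
	DISASM_CAPSTONE,
	DISASM_OBJDUMP,
	DISASM_MAX,
};

/* One slot per valid value, so no allocation is needed. */
struct disasm_prefs {
	enum disassembler order[DISASM_MAX - 1];
	size_t count;
};

/*
 * Append d unless it is invalid or already present. Because duplicates
 * are rejected, count can never exceed the number of valid values, which
 * is exactly the size of the array.
 */
bool prefs_add(struct disasm_prefs *p, enum disassembler d)
{
	if (d <= DISASM_UNKNOWN || d >= DISASM_MAX)
		return false;
	for (size_t i = 0; i < p->count; i++) {
		if (p->order[i] == d)
			return false;
	}
	p->order[p->count++] = d;
	return true;
}

/* Fall back to the default order if the config added nothing. */
void prefs_set_defaults_if_empty(struct disasm_prefs *p)
{
	if (p->count != 0)
		return;
	prefs_add(p, DISASM_LLVM);
	prefs_add(p, DISASM_CAPSTONE);
	prefs_add(p, DISASM_OBJDUMP);
}
```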
James Clark	66e99fd5a1	perf vendor events arm64: Add V3 events/metrics Using the scripts at: https://gitlab.arm.com/telemetry-solution/telemetry-solution/ Generate perf json for neoverse-v3 using the following command: ``` $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-v3.json ``` Signed-off-by: Ian Rogers <irogers@google.com> [Re-generate after updating script] Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250122163504.2061472-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 15:14:15 -08:00
James Clark	994256a798	perf vendor events arm64: Add N3 events/metrics Using the scripts at: https://gitlab.arm.com/telemetry-solution/telemetry-solution/ Generate perf json for neoverse-n3 using the following command: ``` $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-n3.json ``` Signed-off-by: Ian Rogers <irogers@google.com> [Re-generate after updating script] Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250122163504.2061472-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 15:14:06 -08:00
Benjamin Peterson	0aefb3df8b	perf trace: Fix return value of trace__fprintf_tp_fields This function formerly returned twice the number of bytes printed. Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250123-void-fprintf_tp_fields-v2-1-6038f8224987@engflow.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 13:21:49 -08:00
Linus Torvalds	7685b334d1	perf-tools changes for v6.14 There are a lot of changes in the perf tools in this cycle. build ----- * Use generic syscall table to generate syscall numbers on supported archs. * This also enables to get rid of libaudit which was used for syscall numbers. * Remove python2 support as it's deprecated for years. * Fix issues on static build with libzstd. perf record ----------- * Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing. # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires the kernel support (which was added in v6.13). perf lock --------- * 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks. # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires the kernel/BPF support (which was added in v6.13). perf ftrace ----------- * 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets. 
# perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 186 \| ### \| 200 - 300 ns \| 256 \| ##### \| 300 - 400 ns \| 364 \| ####### \| 400 - 500 ns \| 223 \| #### \| 500 - 600 ns \| 111 \| ## \| 600 - 700 ns \| 41 \| \| 700 - 800 ns \| 141 \| ## \| 800 - ... ns \| 169 \| ### \| # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 * As you can see in the above example, it nows shows the statistics at the end so that users can see the avg/max/min latencies easily. * 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold. perf script ----------- * Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file. # perf script mem-phys-addr -- find / ... Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others ------ * 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky. * Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) * Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat'. 
* Update ARM Neoverse N2/V2 JSON events and metrics Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ5AgiQAKCRCMstVUGiXM g0WhAP43Dpfatrm1jicTyAogk5D/JrIMOgjGtrJJi5RXG/r0gwD8DSWFzLppS9xy KGtjLHrN6v6BqR4DCubdlZmRfh9Qjgg= =M0Kz -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf-tools updates from Namhyung Kim: "There are a lot of changes in the perf tools in this cycle. build: - Use generic syscall table to generate syscall numbers on supported archs - This also enables to get rid of libaudit which was used for syscall numbers - Remove python2 support as it's deprecated for years - Fix issues on static build with libzstd perf record: - Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing: # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires kernel support (which was added in v6.13) perf lock: - 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. 
Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires kernel/BPF support (which was added in v6.13) perf ftrace: - 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets: # perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 186 \| ### \| 200 - 300 ns \| 256 \| ##### \| 300 - 400 ns \| 364 \| ####### \| 400 - 500 ns \| 223 \| #### \| 500 - 600 ns \| 111 \| ## \| 600 - 700 ns \| 41 \| \| 700 - 800 ns \| 141 \| ## \| 800 - ... ns \| 169 \| ### \| # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 - As you can see in the above example, it nows shows the statistics at the end so that users can see the avg/max/min latencies easily - 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold perf script: - Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file # perf script mem-phys-addr -- find / ... 
Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others: - 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky - Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) - Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat' - Update ARM Neoverse N2/V2 JSON events and metrics" * tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits) perf test: Update event_groups test to use instructions perf bench: Fix undefined behavior in cmpworker() perf annotate: Prefer passing evsel to evsel->core.idx perf lock: Rename fields in lock_type_table perf lock: Add percpu-rwsem for type filter perf lock: Fix parse_lock_type which only retrieve one lock flag perf lock: Fix return code for functions in __cmd_contention perf hist: Fix width calculation in hpp__fmt() perf hist: Fix bogus profiles when filters are enabled perf hist: Deduplicate cmp/sort/collapse code perf test: Improve verbose documentation perf test: Add a runs-per-test flag perf test: Fix parallel/sequential option documentation perf test: Send list output to stdout rather than stderr perf test: Rename functions and variables for better clarity perf tools: Expose quiet/verbose variables in Makefile.perf perf config: Add a function to set one variable in .perfconfig perf test perftool_testsuite: Return correct value for skipping perf test 
perftool_testsuite: Add missing description perf test record+probe_libc_inet_pton: Make test resilient ...	2025-01-24 05:45:40 -08:00
Howard Chu	013eb043f3	perf trace: Fix BPF loading failure (-E2BIG) As reported by Namhyung Kim and acknowledged by Qiao Zhao (link: https://lore.kernel.org/linux-perf-users/20241206001436.1947528-1-namhyung@kernel.org/), on certain machines, perf trace failed to load the BPF program into the kernel. The verifier runs perf trace's BPF program for up to 1 million instructions, returning an E2BIG error, whereas the perf trace BPF program should be much less complex than that. This patch aims to fix the issue described above. The E2BIG problem from clang-15 to clang-16 is caused by this line: } else if (size < 0 && size >= -6) { /* buffer */ Specifically this check: size < 0. It seems like clang generates a cool optimization for this sign check that breaks things. Making 'size' s64 and using } else if ((int)size < 0 && size >= -6) { /* buffer */ solves the problem. This is some Hogwarts magic. And the unbounded access of clang-12 and clang-14 (clang-13 works this time) is fixed by making variable 'aug_size' s64. As for this: -if (aug_size > TRACE_AUG_MAX_BUF) - aug_size = TRACE_AUG_MAX_BUF; +aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index]; This makes the BPF skel generated by clang-18 work. Yes, new clangs introduce problems too. Sorry, I only know that it works, but I don't know how it works. I'm not an expert in the BPF verifier. I really hope this is not a kernel version issue, as that would make the test case (kernel_nr) * (clang_nr), a true horror story. I will test it on more kernel versions in the future. Fixes: `395d38419f`: ("perf trace augmented_raw_syscalls: Add more checks to pass the verifier") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241213023047.541218-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-23 15:55:52 -08:00
Linus Torvalds	9ad09c4f28	arm64 updates for 6.14 Confidential Computing: * Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules. CPU Features: * Update a bunch of system register definitions to pick up new field encodings from the architectural documentation. * Add hwcaps and selftests for the new (2024) dpISA extensions. Documentation: * Update EL3 (firmware) requirements for booting Linux on modern arm64 designs. * Remove stale information about the kernel virtual memory map. Miscellaneous: * Minor cleanups and typo fixes. Memory management: * Fix vmemmap_check_pmd() to look at the PMD type bits * LPA2 (52-bit physical addressing) cleanups and minor fixes. * Adjust physical address space depending upon whether or not LPA2 is enabled. Perf and PMUs: * Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU * Extend AXI filtering support for the DDR PMU on NXP IMX SoCs * Fix Designware PCIe PMU event numbering. * Add generic branch events for the Apple M1 CPU PMU. * Add support for Marvell Odyssey DDR and LLC-TAD PMUs. * Cleanups to the Hisilicon DDRC and Uncore PMU code. * Advertise discard mode for the SPE PMU. * Add the perf users mailing list to our MAINTAINERS entry. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeKZLcQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNEQzB/0X2U89ZiqxIkTPQvfFrjN/uUGybkq59rEL DfeoGukTgJIwc3GHWXXtQ//wuuYKdTeCXaIz5NFK3+7/wmKSLvjkexmue8pta6EY 5rx9bAPr/D8lAUvhKIN2l3pF/ygoRwDz+nT2yVQ1xlZxYJWX7ZIsMj7W7ceb5kdx HRrTSQuhEEPREAWWO4oCMWl5SQZSrIflSE3Be/PsP0OhW6k//ZmWbcJTgUcHbKam o2WtNjITyGzxMpRCcrGEZKoe9YcwSxiut/PoD7JuoB4C/rbsf1cdJ6uLmtvGJcZj qsdRHhVfBzP1+ahONrDbiT3C2+s1UZySKdCDIxiYy6lB39wpP0dd =E7Mf -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "We've got a little less than normal thanks to the holidays in December, but there's the usual summary below. 
The highlight is probably the 52-bit physical addressing (LPA2) clean-up from Ard. Confidential Computing: - Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules CPU Features: - Update a bunch of system register definitions to pick up new field encodings from the architectural documentation - Add hwcaps and selftests for the new (2024) dpISA extensions Documentation: - Update EL3 (firmware) requirements for booting Linux on modern arm64 designs - Remove stale information about the kernel virtual memory map Miscellaneous: - Minor cleanups and typo fixes Memory management: - Fix vmemmap_check_pmd() to look at the PMD type bits - LPA2 (52-bit physical addressing) cleanups and minor fixes - Adjust physical address space depending upon whether or not LPA2 is enabled Perf and PMUs: - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs - Fix Designware PCIe PMU event numbering - Add generic branch events for the Apple M1 CPU PMU - Add support for Marvell Odyssey DDR and LLC-TAD PMUs - Cleanups to the Hisilicon DDRC and Uncore PMU code - Advertise discard mode for the SPE PMU - Add the perf users mailing list to our MAINTAINERS entry" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) Documentation: arm64: Remove stale and redundant virtual memory diagrams perf docs: arm_spe: Document new discard mode perf: arm_spe: Add format option for discard mode MAINTAINERS: Add perf list for drivers/perf/ arm64: Remove duplicate included header drivers/perf: apple_m1: Map generic branch events arm64: rsi: Add automatic arm-cca-guest module loading kselftest/arm64: Add 2024 dpISA extensions to hwcap test KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1 arm64/hwcap: Describe 2024 dpISA extensions to userspace arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12 arm64: Filter out SVE hwcaps 
when FEAT_SVE isn't implemented drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu() arm64: mm: Test for pmd_sect() in vmemmap_check_pmd() arm64/mm: Replace open encodings with PXD_TABLE_BIT arm64/mm: Rename pte_mkpresent() as pte_mkvalid() arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09 ...	2025-01-20 21:21:49 -08:00
Athira Rajeev	91b7747dc7	perf test: Update event_groups test to use instructions On some powerpc platforms, the event group testcase fails as below: # perf test -v 'Event groups' 69: Event groups : --- start --- test child forked, pid 9765 Using CPUID 0x00820200 Using hv_24x7 for uncore pmu event 0x0 0x0, 0x0 0x0, 0x0 0x0: Fail 0x0 0x0, 0x0 0x0, 0x1 0x3: Pass The testcase creates various combinations of hw, sw and uncore PMU events and verifies that group creation succeeds or fails as expected. This tests one of the limitations in perf, where it doesn't allow creating a group of events from different hw PMUs. The testcase starts a leader event and opens two sibling events. The combination that fails is three hardware events in a group. "0x0 0x0, 0x0 0x0, 0x0 0x0: Fail" Type zero and config zero translate to PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLES. There is an event constraint in powerpc that events using the same counter cannot be programmed in a group. Here there is one alternative event for cycles, hence one leader and only one sibling event can go in as a group. If all three events (leader and two sibling events) are hardware events, use instructions as one of the sibling events. Since PERF_COUNT_HW_INSTRUCTIONS is a generic hardware event and present in all architectures, use this as the third event. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250110094620.94976-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:32:57 -08:00
Kuan-Wei Chiu	62892e77b8	perf bench: Fix undefined behavior in cmpworker() The comparison function cmpworker() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity: Symmetry: If x < y, then y > x. Transitivity: If x < y and y < z, then x < z. In its current implementation, cmpworker() incorrectly returns 0 when w1->tid < w2->tid, which breaks both symmetry and transitivity. This violation causes undefined behavior, potentially leading to issues such as memory corruption in glibc [1]. Fix the issue by returning -1 when w1->tid < w2->tid, ensuring compliance with the C standard and preventing undefined behavior. Link: https://www.qualys.com/2024/01/30/qsort.txt [1] Fixes: `121dd9ea01` ("perf bench: Add epoll parallel epoll_wait benchmark") Cc: stable@vger.kernel.org Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:14:36 -08:00
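A three-way comparator of the kind described in the commit above might look like the following. This is a minimal sketch: `struct worker` here is a reduced, hypothetical version holding only the `tid` field named in the message, and `cmp_worker_by_tid()` is an illustrative name rather than the function in the tree.

```c
#include <stdlib.h>

/* Hypothetical, reduced worker record; only the sort key is kept. */
struct worker {
	int tid;
};

/*
 * qsort() comparator ordering workers by ascending tid. It returns a
 * negative, zero, or positive value for less-than, equal, and
 * greater-than respectively, so cmp(a, b) and cmp(b, a) always have
 * opposite signs and the ordering is transitive.
 */
int cmp_worker_by_tid(const void *p1, const void *p2)
{
	const struct worker *w1 = p1;
	const struct worker *w2 = p2;

	if (w1->tid < w2->tid)
		return -1;
	if (w1->tid > w2->tid)
		return 1;
	return 0;
}
```

Explicit comparisons are used instead of `w1->tid - w2->tid` because the subtraction can overflow for values of opposite sign.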
Ian Rogers	035f0c279b	perf annotate: Prefer passing evsel to evsel->core.idx An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in annotate code. Internally the code just reads evsel->core.idx so behavior is unchanged. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:02:10 -08:00
Chun-Tse Shao	ac22d75377	perf lock: Rename fields in lock_type_table `lock_type_table` contains `name` and `str` which can be confusing. Rename them to `flags_name` and `lock_name` and add descriptions to enhance understanding. Tested by building perf for x86. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:12:41 -08:00
Chun-Tse Shao	e9188ae3cd	perf lock: Add percpu-rwsem for type filter percpu-rwsem was missing from the man page. Also, for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing the lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: `4f701063bf` ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:12:40 -08:00
Chun-Tse Shao	1be9264158	perf lock: Fix parse_lock_type which only retrieve one lock flag `parse_lock_type` can only add the first lock flag in `lock_type_table` given input `str`. For example, for `Y rwlock`, it only adds `rwlock:R` into this perf session. Another example is for `-Y mutex`, it only adds the mutex without `LCB_F_SPIN` flag. The patch fixes this issue, makes sure both `rwlock:R` and `rwlock:W` will be added with `-Y rwlock`, and so on. Testing: $ ./perf lock con -ab -Y mutex,rwlock -- perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 9.313 [sec] 9.313976 usecs/op 107365 ops/sec contended total wait max wait avg wait type caller 176 1.65 ms 19.43 us 9.38 us mutex pipe_read+0x57 34 180.14 us 10.93 us 5.30 us mutex pipe_write+0x50 7 77.48 us 16.09 us 11.07 us mutex do_epoll_wait+0x24d 7 74.70 us 13.50 us 10.67 us mutex do_epoll_wait+0x24d 3 35.97 us 14.44 us 11.99 us rwlock:W ep_done_scan+0x2d 3 35.00 us 12.23 us 11.66 us rwlock:W do_epoll_wait+0x255 2 15.88 us 11.96 us 7.94 us rwlock:W do_epoll_wait+0x47c 1 15.23 us 15.23 us 15.23 us rwlock:W do_epoll_wait+0x4d0 1 14.26 us 14.26 us 14.26 us rwlock:W ep_done_scan+0x2d 2 14.00 us 7.99 us 7.00 us mutex pipe_read+0x282 1 12.29 us 12.29 us 12.29 us rwlock:R ep_poll_callback+0x35 1 12.02 us 12.02 us 12.02 us rwlock:W do_epoll_ctl+0xb65 1 10.25 us 10.25 us 10.25 us rwlock:R ep_poll_callback+0x35 1 7.86 us 7.86 us 7.86 us mutex do_epoll_ctl+0x6c1 1 5.04 us 5.04 us 5.04 us mutex do_epoll_ctl+0x3d4 [namhyung: Add a comment and rename to 'mutex:spin' for consistency Fixes: `d783ea8f62` ("perf lock contention: Simplify parse_lock_type()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-1-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:11:57 -08:00
Athira Rajeev	83196dd349	perf lock: Fix return code for functions in __cmd_contention perf lock contention returns zero exit value even if the lock contention BPF setup failed. # ./perf lock con -b true libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: Error loading vmlinux BTF: -ESRCH libbpf: failed to load object 'lock_contention_bpf' libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH Failed to load lock-contention BPF skeleton lock contention BPF setup failed # echo $? 0 Fix this by saving the return code for lock_contention_prepare so that command exits with proper return code. Similarly set the return code properly for two other functions in builtin-lock, namely setup_output_field() and select_key(). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110093730.93610-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:09:54 -08:00
Dmitry Vyukov	036e2faa99	perf hist: Fix width calculation in hpp__fmt() hpp__width_fn() rounds up the width to the length of the field name; hpp__fmt() should do it too. Otherwise, the numbers may end up unaligned if the field name is long. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108065949.235718-1-dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 09:51:18 -08:00
Dmitry Vyukov	8b4799e4f0	perf hist: Fix bogus profiles when filters are enabled When a filtered column is not present in the sort order, profiles become arbitrarily broken. Filtered and non-filtered entries are collapsed together, and the filtered-by field ends up with a random value (either from a filtered or non-filtered entry). If we end up with a filtered entry/value, then the whole collapsed entry will be filtered out and will be missing from the profile. If we end up with a non-filtered entry/value, then the overhead value will be wrongly larger (it includes some subset of filtered-out samples). This leads to very confusing profiles. The problem is hard to notice, and if noticed, hard to understand. If the filter is for a single value, then it can be fixed by adding the corresponding field to the sort order (provided the user understood the problem). But if the filter is for multiple values, it's impossible to fix because there is no concept of binary sorting based on a filter predicate (we want to group all non-filtered values in one bucket, and all filtered values in another). Examples of affected commands: perf report --tid=123 perf report --sort overhead,symbol --comm=foo,bar Fix this by considering filtered status as the highest priority sort/collapse predicate. As a side effect, this effectively adds a new feature of showing a profile where several lines are combined based on an arbitrary filtering predicate. For example, showing symbols from binaries foo and bar combined together, but not from other binaries; or showing the combined overhead of several particular threads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/359dc444ce94d20e59d3a9e360c36fbeac833a04.1736927981.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 13:43:28 -08:00
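The idea in the commit above, treating filtered status as the highest-priority sort/collapse key, can be illustrated with a small comparator. This is a sketch under simplifying assumptions: `struct entry`, its single integer `key`, and `entry_cmp()` are hypothetical stand-ins for the real hist entry and its chain of sort callbacks.

```c
#include <stdbool.h>

/* Hypothetical, reduced hist entry: one sort key plus the filter flag. */
struct entry {
	bool filtered;
	int key;
};

/*
 * Compare on filtered status first and on the ordinary sort key second.
 * Two entries with equal keys but different filtered status compare as
 * unequal, so a collapse step that merges only entries comparing equal
 * never mixes filtered and non-filtered samples.
 */
int entry_cmp(const struct entry *a, const struct entry *b)
{
	if (a->filtered != b->filtered)
		return a->filtered ? 1 : -1;
	return (a->key > b->key) - (a->key < b->key);
}
```

With this ordering all non-filtered entries sort ahead of filtered ones, which gives the two buckets the message asks for.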
Dmitry Vyukov	cd57c04c38	perf hist: Deduplicate cmp/sort/collapse code Application of cmp/sort/collapse fmt callbacks is duplicated 6 times. Factor it into a common helper function. NFC. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/84c4b55614e24a344f86ae0db62e8fa8f251f874.1736927981.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 13:43:28 -08:00
Ian Rogers	4e38f2814f	perf test: Improve verbose documentation Add a little more detail on the output expectations for each verbose level. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	1c0d9816e9	perf test: Add a runs-per-test flag To detect flakes it is useful to run tests more than once. Add a runs-per-test flag that will run each test multiple times. Example output: ``` $ perf test -r 3 lbr -v 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok ``` Update the documentation for the runs-per-test option. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	4dd8bc4bf5	perf test: Fix parallel/sequential option documentation The parallel option was removed in commit `94d1a913bd` ("perf test: Make parallel testing the default"). Update the sequential documentation to reflect it isn't the default except for "exclusive" tests. Fixes: `94d1a913bd` ("perf test: Make parallel testing the default") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	2b7b78efc8	perf test: Send list output to stdout rather than stderr Follow the workload listing in using stdout rather than stderr. Correct the numbering of sub-tests to be 1.1 rather than 1:1. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	2e47c503de	perf test: Rename functions and variables for better clarity The relationship between subtests and test cases is somewhat confusing, so let's do away with the notion of sub-tests and switch to just working with some number of test cases. Add a test_suite__for_each_test_case as in many cases, except the special one test case situation, the iteration can just be on all test cases. Switch variable names to be more intention revealing of what their value is. This work was motivated by discussion with Kan where it was noted the code is becoming overly indented: https://lore.kernel.org/lkml/20241109160219.49976-1-irogers@google.com/ Unifying more of the sub-test/no-sub-tests avoids one level of indentation in a number of places. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Charlie Jenkins	f2868b1a66	perf tools: Expose quiet/verbose variables in Makefile.perf The variables to make builds silent/verbose live inside tools/build/Makefile.build. Move those variables to the top-level Makefile.perf to be generally available. Committer testing: See the SYSCALL lines, now they are consistent with the other operations in other lines: SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_32.h SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_64.h GEN /tmp/build/perf-tools-next/common-cmds.h GEN /tmp/build/perf-tools-next/arch/arm64/include/generated/asm/sysreg-defs.h PERF_VERSION = 6.13.rc2.g3d94bb6ed1d0 GEN perf-archive MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ GEN perf-iostat CC /tmp/build/perf-tools-next/jvmti/libjvmti.o Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20250114-perf_make_test-v1-1-decc1c517b11@rivosinc.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 10:59:20 -08:00
Arnaldo Carvalho de Melo	e9cbc854d8	perf config: Add a function to set one variable in .perfconfig To allow for setting a variable from some other tool, as the "wallclock" patchset needs in order to let the user opt in to having that key in the sort order for 'perf report'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/Z4akewi7UPXpagce@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 15:05:56 -03:00
Veronika Molnarova	1ab138febc	perf test perftool_testsuite: Return correct value for skipping In 'perf test', a return value 2 represents that the test case was skipped. Fix this value for perftool_testsuite test cases to differentiate between skip and pass values. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-3-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Veronika Molnarova	5afd6d38cf	perf test perftool_testsuite: Add missing description Properly name the test cases of perftool_testsuite instead of the license being taken as the name for 'perf test'. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-2-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Leo Yan	9a7b618ef6	perf test record+probe_libc_inet_pton: Make test resilient The test failed back and forth due to the call chain being heavily impacted by the libc, which varies across different architectures and distros. The libc contains the symbols for "gaih_inet" and "getaddrinfo" in some cases, but not always. Moreover, these symbols can be either normal symbols or dynamic symbols, making it difficult to decide the call chain entries because the symbols are inconsistent. To fix the issue, this commit identifies three call chain entries that are always present. These entries are matched by iterating through the lines in the "perf script" result. The recording attribute max-stack is set to 4 for the possible maximum call chain depth. After: # perf test -vF pton --- start --- Pattern: ping[][0-9 \.:]+probe_libc:inet_pton: $[[:xdigit:]]+$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Pattern: .inet_pton\+0x[[:xdigit:]]+[[:space:]]$/usr/lib/aarch64-linux-gnu/libc-2.31.so\|inlined$$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Pattern: .(\+0x[[:xdigit:]]+\|\[unknown\])[[:space:]]$./bin/ping.$$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Matching: ffffa1488040 getaddrinfo+0xe8 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Matching: aaaab8672da4 [unknown] (/usr/bin/ping) ---- end ---- 82: probe libc's inet_pton & backtrace it with ping : Ok Closes: https://lore.kernel.org/linux-perf-users/1728978807-81116-1-git-send-email-renyu.zj@linux.alibaba.com/ Closes: https://lore.kernel.org/linux-perf-users/Z0X3AYUWkAgfPpWj@x1/T/#m57327e135b156047e37d214a0d453af6ae1e02be Reported-by: Guilherme Amadio <amadio@gentoo.org> Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241202111958.553403-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Ian Rogers	8e246a1b2a	perf inject: Fix use without initialization of local variables Local variables were missing initialization and command line processing didn't provide default values. Fixes: `64eed019f3` ("perf inject: Lazy build-id mmap2 event insertion") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241211060831.806539-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
James Clark	6804a7192a	perf probe: Rename err label Rename err to out to avoid confusion because buf is still supposed to be freed in non error cases. Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241211085525.519458-3-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Ian Rogers	f9c506fb69	perf test stat: Avoid hybrid assumption when virtualized The cycles event will fallback to task-clock in the hybrid test when running virtualized. Change the test to not fail for this. Fixes: `65d1182191` ("perf test: Add a test for default perf stat command") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241212173354.9860-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Athira Rajeev	2adbf5349a	perf record: Fix segfault with --off-cpu when debuginfo is not enabled When the kernel is built without debuginfo, running 'perf record' with --off-cpu results in a segfault as below: ./perf record --off-cpu -e dummy sleep 1 libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF Segmentation fault (core dumped) The backtrace pointed to: #0 0x00000000100fb17c in btf.type_cnt () #1 0x00000000100fc1a8 in btf_find_by_name_kind () #2 0x00000000100fc38c in btf.find_by_name_kind () #3 0x00000000102ee3ac in off_cpu_prepare () #4 0x000000001002f78c in cmd_record () #5 0x00000000100aee78 in run_builtin () #6 0x00000000100af3e4 in handle_internal_command () #7 0x000000001001004c in main () Code sequence is: static void check_sched_switch_args(void) { struct btf *btf = btf__load_vmlinux_btf(); const struct btf_type *t1, *t2, *t3; u32 type_id; type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch", BTF_KIND_TYPEDEF); btf__load_vmlinux_btf() fails when CONFIG_DEBUG_INFO_BTF is not enabled. Here btf__find_by_name_kind() calls btf__type_cnt() with a NULL btf value, which results in a segfault. To fix this, add a check that btf is not NULL before invoking btf__find_by_name_kind(). Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://lore.kernel.org/r/20241223135813.8175-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Athira Rajeev	8c1a106635	perf tests base_probe: Fix check for the count of existing probes in test_adding_kernel perftool-testsuite_probe fails in test_adding_kernel as below: Regexp not found: "probe:inode_permission_11" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force) (output regexp parsing) event syntax error: 'probe:inode_permission_11' \___ unknown tracepoint Error: File /sys/kernel/tracing//events/probe/inode_permission_11 not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. The test does the following: 1) Adds a probe point first using: $CMD_PERF probe --add $TEST_PROBE 2) Then tries to add the same probe again without --force and expects it to fail. Next it tries to add the same probe again with --force. In this case, perf probe succeeds and adds the probe with a suffix number. Example: ./perf probe --add inode_permission Added new event: probe:inode_permission (on inode_permission) ./perf probe --add inode_permission --force Added new event: probe:inode_permission_1 (on inode_permission) ./perf probe --add inode_permission --force Added new event: probe:inode_permission_2 (on inode_permission) Each time, a suffix is added to the existing probe name. To get the suffix number, the test case uses: NO_OF_PROBES=`$CMD_PERF probe -l \| wc -l` This will work if there is no other probe existing in the system. If there are any other probes other than kernel probes or inode_permission, ( example: any probe), "perf probe -l" will include the count for other probes too. For example, in the system where this failed, some probes had already been added by default. So the count became 10 ./perf probe -l \| wc -l 10 So to be specific for "inode_permission", restrict the probe count check to that probe point alone using: NO_OF_PROBES=`$CMD_PERF probe -l $TEST_PROBE\| wc -l` Similarly while removing the probe using "probe --del *", (removing all probes), the check uses: ../common/check_all_lines_matched.pl "Removed event: probe:$TEST_PROBE" But if there are other probes in the system, the log will contain references to the other existing probes too. Hence change usage of check_all_lines_matched.pl to check_all_patterns_found.pl. This makes sure the expected string appears in the result. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110094324.94604-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Michel Lind	8bf18c5cef	perf MANIFEST: Add license files The standalone tarballs should include the license files - both the COPYING declaration as well as the text of GPLv2. Signed-off-by: Michel Lind <michel@michel-slm.name> Link: https://lore.kernel.org/r/Z0Zcx0WRqtlUYpgw@hyperscale.parallels Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
James Clark	3178155d29	perf test brstack: Speed up running test by using tr -s instead of xargs The brstack test runs quite slowly in software models. Part of the reason is "xargs -n1" is quite inefficient in replacing spaces with newlines. While that's not noticeable on normal machines, it is on software models. Use "tr -s ' ' '\n'" instead which can do the same transformation, but is much faster. For comparison on an M1 Macbook Pro: $ time seq -s ' ' 10000 \| xargs -n1 > /dev/null real 0m2.729s user 0m2.009s sys 0m0.914s $ time seq -s ' ' 10000 \| tr -s ' ' '\n' \| grep '.' > /dev/null real 0m0.002s user 0m0.001s sys 0m0.001s The "grep '.'" is also needed to remove any remaining blank lines. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241213231312.2640687-2-robh@kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> [robh: Drop changing loop iterations on arm64. Squash blank line fix and redo commit msg] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Charlie Jenkins	b1bb6fc06b	perf tools mips: Fix mips syscall generation The mips syscall generation was still based on the old method. Delete the Makefile since it is no longer needed with the new method of generation. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Fixes: `619ffe6694` ("perf tools mips: Use generic syscall scripts") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250110-perf_fix_mips-v1-1-4e661c3b710a@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:46:41 -03:00
James Clark	05cd60e4d0	perf tests arm_spe: Add test for discard mode Add a test that checks that there were no AUX or AUXTRACE events recorded when discard mode is used. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-6-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:45:05 -03:00
James Clark	9c3164ea7e	perf tools arm-spe: Don't allocate buffer or tracking event in discard mode The buffer will never be written to so don't bother allocating it. The tracking event is also not required. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-5-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:45:03 -03:00
James Clark	23a65c5e8b	perf tools arm-spe: Pull out functions for aux buffer and tracking setup These won't be used in the next commit in discard mode, so put them in their own functions. No functional changes intended. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-4-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:43:15 -03:00
Jiachen Zhang	ac0ac75189	perf report: Fix misleading help message about --demangle The wrong help message may mislead users. This commit fixes it. Fixes: `328ccdace8` ("perf report: Add --no-demangle option") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiachen Zhang <me@jcix.top> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250109152220.1869581-1-me@jcix.top Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:46:09 -03:00
Namhyung Kim	510f0247cd	perf ftrace: Fix display for range of the first bucket When min_latency is not given, it prints 0 - 0. It should be 0 - 1. Before: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 321 \| ########### \| ... After: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 699 \| ############ \| ... Fixes: `08b875b6bf` ("perf ftrace latency: Introduce --min-latency to narrow down into a latency range") Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108210015.1188531-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:45:43 -03:00
Namhyung Kim	dd01b985c5	perf ftrace: Check min/max latency only with bucket range It's an optional feature and remains 0 when bucket range is not given. And it makes the histogram always go to the last entry because any latency (num) is greater than or equal to 0. Before: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 0 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - 2 ms \| 0 \| \| 2 - 4 ms \| 0 \| \| 4 - 8 ms \| 0 \| \| 8 - 16 ms \| 0 \| \| 16 - 32 ms \| 0 \| \| 32 - 64 ms \| 0 \| \| 64 - 128 ms \| 0 \| \| 128 - 256 ms \| 0 \| \| 256 - 512 ms \| 0 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 1353 \| ############################################## \| After: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 321 \| ########### \| 1 - 2 us \| 132 \| #### \| 2 - 4 us \| 202 \| ####### \| 4 - 8 us \| 188 \| ###### \| 8 - 16 us \| 16 \| \| 16 - 32 us \| 12 \| \| 32 - 64 us \| 30 \| # \| 64 - 128 us \| 98 \| ### \| 128 - 256 us \| 53 \| # \| 256 - 512 us \| 57 \| ## \| 512 - 1024 us \| 9 \| \| 1 - 2 ms \| 9 \| \| 2 - 4 ms \| 1 \| \| 4 - 8 ms \| 98 \| ### \| 8 - 16 ms \| 5 \| \| 16 - 32 ms \| 7 \| \| 32 - 64 ms \| 32 \| # \| 64 - 128 ms \| 10 \| \| 128 - 256 ms \| 10 \| \| 256 - 512 ms \| 2 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 0 \| \| Fixes: `690a052a6d` ("perf ftrace latency: Add --max-latency option") Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108210015.1188531-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:43:55 -03:00
James Clark	ba113ecad8	perf docs: arm_spe: Document new discard mode Document the flag along with PMU events to hint what it's used for and give an example with other useful options to get minimal output. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org Signed-off-by: Will Deacon <will@kernel.org>	2025-01-10 14:50:55 +00:00
Arnaldo Carvalho de Melo	74c033b6aa	perf MANIFEST: Add arch/*/include/uapi/asm/bpf_perf_event.h to the perf tarball Needed to build tools/lib/bpf/ on various arches other than x86_64, notably arm64 when using the perf tarballs generated by: $ make help \| grep perf- perf-tar-src-pkg - Build the perf source tarball with no compression perf-targz-src-pkg - Build the perf source tarball with gzip compression perf-tarbz2-src-pkg - Build the perf source tarball with bz2 compression perf-tarxz-src-pkg - Build the perf source tarball with xz compression perf-tarzst-src-pkg - Build the perf source tarball with zst compression $ Building with BPF support was opt-in in perf for a long time, and testing it via the tarball main kernel Makefile targets in an architecture other than x86_64 was an odd case. I had noticed this at some point earlier this year while cross building perf to some arches, including arm64, but it fell thru the cracks, see the Link tag below. Fix it now by adding those arch/*/include/uapi/asm/bpf_perf_event.h files to the MANIFEST file used in building the perf source tarball. Tested with: perfbuilder@number:~$ time dm debian:experimental-x-arm64 1 21.60 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 14.1.0-5) 14.1.0 flex 2.6.4 BUILD_TARBALL_HEAD=d31a974f6edc576f84c35be9526fec549a3b3520 $ $ git log --oneline -1 d31a974f6edc576f84c35be9526fec549a3b3520 d31a974f6edc576f (HEAD -> perf-tools-next) perf MANIFEST: Add arch/*/include/uapi/asm/bpf_perf_event.h to the perf tarball $ That was previously failing: perfbuilder@number:~$ grep debian:experimental-x-arm64 dm.log.old/summary 19 4.80 debian:experimental-x-arm64 : FAIL gcc version 14.1.0 (Debian 14.1.0-5) $ perfbuilder@number:~$ grep -B6 'Error 1' dm.log.old/debian:experimental-x-arm64 In file included from /git/perf-6.12.0-rc6/tools/include/uapi/linux/bpf_perf_event.h:11, from libbpf.c:36: /git/perf-6.12.0-rc6/tools/include/uapi/asm/bpf_perf_event.h:2:10: fatal error: ../../arch/arm64/include/uapi/asm/bpf_perf_event.h: No such file or directory 2 \| #include "../../arch/arm64/include/uapi/asm/bpf_perf_event.h" \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[4]: *** [/git/perf-6.12.0-rc6/tools/build/Makefile.build:105: /tmp/build/perf/libbpf/staticobjs/libbpf.o] Error 1 perfbuilder@number:~$ Closes: https://lore.kernel.org/all/Z0UNRCRYKunbDYxP@hyperscale.parallels Fixes: `9eea8fafe3` ("libbpf: fix __arg_ctx type enforcement for perf_event programs") Reported-by: Michel Lind <michel@michel-slm.name> Tested-by: Michel Lind <michel@michel-slm.name> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: 317c11923cf676437456e44a7f408d4ce589a9c0.camel@michel-slm.name Link: https://lore.kernel.org/bpf/ZfyEgoG3JFiOs2Fs@x1/ Link: https://lore.kernel.org/r/Z0Yy5u42Q1hWoEzz@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Yoshihiro Furudera	e5e34e9995	perf vendor events arm64: Add FUJITSU-MONAKA PMU event Add PMU events for FUJITSU-MONAKA. And, also updated common-and-microarch.json and recommended.json. FUJITSU-MONAKA Specification URL: https://github.com/fujitsu/FUJITSU-MONAKA Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Akio Kakuno <fj3333bs@aa.jp.fujitsu.com> Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241217065751.1448755-1-fj5100bi@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Namhyung Kim	876e80cf83	perf tools: Fixup end address of modules In machine__create_module(), it reads /proc/modules to get a list of modules in the system. The file shows the start address (of text) and the size of the module so it uses the info to reconstruct system memory maps for symbol resolution. But module memory consists of multiple segments and they can be scattered. Currently perf tools assume they are contiguous and see some overlaps. This can confuse the tool when it finds a map containing a given address. As we mostly care about the function symbols in the text segment, it can fixup the size or end address of modules when there's an overlap. We can use maps__fixup_end() which updates the end address using the start address of the next map. Ideally it should be able to track other segments (like data/rodata), but that would require some changes in /proc/modules IMHO. Reported-by: Blake Jones <blakejones@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20241218220453.203069-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Namhyung Kim	8c2eafbbfd	perf symbol: Prefer non-label symbols with same address When there are more than one symbols at the same address, it needs to choose which one is better. In choose_best_symbol() it didn't check the type of symbols. It's possible to have labels in other symbols and in that case, it would be better to pick the actual symbol over the labels. To minimize the possible impact on other symbols, I only check NOTYPE symbols specifically. $ readelf -sW vmlinux \| grep -e __do_softirq -e __softirqentry_text_start 105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq 111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start The commit `77b004f4c5` tried to do the same by not giving the size to the label symbols but it seems there's some label-only symbols in asm code. Let's restore the original code and choose the right symbol using type of the symbols. Fixes: `77b004f4c5` ("perf symbol: Do not fixup end address of labels") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/Z3b-DqBMnNb4ucEm@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Ian Rogers	368781025a	perf symbol-elf: Avoid a weak cxx_demangle_sym function cxx_demangle_sym is weak in case demangle-cxx.c replaces the definition in symbol-elf.c. When demangle-cxx.c is built HAVE_CXA_DEMANGLE_SUPPORT is defined, as such the define can be used to avoid a weak symbol. As weak symbols are outside of the C standard their use can lead to strange behaviors, in particular with LTO, as well as causing issues to be hidden at link time. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119031754.1021858-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Namhyung Kim	4f90ed0ae3	perf trace: Fix unaligned access for augmented args Some version of compilers reported unaligned accesses in perf trace when undefined-behavior sanitizer is on. I found that it uses raw data in the sample directly and assuming it's properly aligned. Unlike other sample fields, the raw data is not 8-byte aligned because there's a size field (u32) before the actual data. So I added a static buffer in syscall__augmented_args() and return it instead. This is not ideal but should work well as perf trace is single-threaded. A better approach would be aligning the raw data by adding a 4-byte data before the augmented args but I'm afraid it'd break the backward compatibility. Committer testing: To build with the undefined behaviour sanitizer: $ make CC=clang EXTRA_CFLAGS=-fsanitize=undefined -C tools/perf Checking if the resulting binary is instrumented: root@number:~# nm ~/bin/perf \| grep ubsan \| wc -l 113 root@number:~# nm ~/bin/perf \| grep ubsan \| tail -5 000000000043d5b0 t _ZN7__ubsanL19UBsanOnDeadlySignalEiPvS0_ 000000000043ce50 T _ZNK7__ubsan5Value12getSIntValueEv 000000000043cf40 T _ZNK7__ubsan5Value12getUIntValueEv 000000000043d140 T _ZNK7__ubsan5Value13getFloatValueEv 000000000043cfd0 T _ZNK7__ubsan5Value19getPositiveIntValueEv root@number:~# Now running something that will access timespec, as reported in the Closes URL: root@number:~# perf trace --max-events=1 -e nano sleep 1.1 trace/beauty/timespec.c:10:64: runtime error: member access within misaligned address 0x7fc583cfb2a4 for type 'struct augmented_arg', which requires 8 byte alignment 0x7fc583cfb2a4: note: pointer points here 99 99 11 00 10 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 e1 f5 05 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior trace/beauty/timespec.c:10:64 <SNIP> As Namhyung said we need to make the raw_data to be 64-bit aligned, probably we need to add a PERF_SAMPLE_ALIGNED_RAW with a 64-bit raw_size instead of the current u32 
done at kernel/events/core.c, perf_output_sample(), that perf_output_put(handle, raw->size) where raw->size is an u32 and then the raw_data is always 64-bit unaligned... After the patch: root@number:~# perf trace -e nano sleep 1.1 0.000 (1100.064 ms): sleep/1984224 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 100000001 }, rmtp: 0x7fff5b3fe970) = 0 root@number:~# Closes: https://lore.kernel.org/r/Z2STgyD1p456Qqhg@google.com Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250102201248.790841-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
James Clark	0ba2022410	perf test: Mark remaining probe tests as exclusive Probes are global and other probe tests are already exclusive. These two tests can throw warnings when run at the same time so mark them as exclusive too: $ perf test -vvv 81 79 79: perftool-testsuite_probe: --- start --- test child forked, pid 46419 ../common/init.sh: line 137: /sys/kernel/debug/tracing/uprobe_events: Device or resource busy Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20250107165933.292225-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	3cc550f5bb	perf tools: Remove dependency on libaudit All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is no longer needed. With the removal of the flag, the related GENERIC_SYSCALL_TABLE can also be removed. libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT was not defined, so libaudit is also no longer needed for any architecture. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	00d1bfae1b	perf tools s390: Use generic syscall table scripts Use the generic scripts to generate headers from the syscall table instead of the custom ones for s390. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-15-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	4c02c7e0a2	perf tools powerpc: Use generic syscall table scripts Use the generic scripts to generate headers from the syscall table instead of the custom ones for powerpc. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-14-7543b5293098@rivosinc.com Link: https://lore.kernel.org/lkml/20250110100505.78d81450@canb.auug.org.au [ Stephen Rothwell noticed on linux-next that the powerpc build for perf was broken and ...] Link: https://lore.kernel.org/lkml/20250109-perf_powerpc_spu-v1-1-c097fc43737e@rivosinc.com [ ... Charlie fixed it up and asked for it to be squashed to avoid breaking bisection. ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:57:01 -03:00
Charlie Jenkins	619ffe6694	perf tools mips: Use generic syscall scripts Use the generic scripts to generate headers from the syscall table for mips. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-13-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:56:20 -03:00
Charlie Jenkins	fa70857a27	perf tools loongarch: Use syscall table loongarch uses a syscall table, use that in perf instead of using unistd.h. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-12-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:56:00 -03:00
Charlie Jenkins	cb8197db8c	perf tools arm64: Use syscall table arm64 uses a syscall table, use that in perf instead of using unistd.h. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-11-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:55:36 -03:00
Charlie Jenkins	02f2d58f23	perf tools parisc: Support syscall header parisc uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-10-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:55:13 -03:00
Charlie Jenkins	bb4f842891	perf tools alpha: Support syscall header alpha uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-9-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:54:54 -03:00
Charlie Jenkins	a874d1f6f1	perf tools x86: Use generic syscall scripts Use the generic scripts to generate headers from the syscall table for both 32- and 64-bit x86. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-8-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:53:49 -03:00
Charlie Jenkins	24f122dc09	perf tools xtensa: Support syscall header xtensa uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-7-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:53:28 -03:00
Charlie Jenkins	1f44829e5e	perf tools sparc: Support syscall headers sparc uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-6-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:52:51 -03:00
Charlie Jenkins	430a6dfe41	perf tools sh: Support syscall headers sh uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-5-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:52:28 -03:00
Charlie Jenkins	9605665a64	perf tools arm: Support syscall headers arm uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-4-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:51:30 -03:00
Charlie Jenkins	c68825eed9	perf tools csky: Support generic syscall headers csky uses the generic syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-3-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:51:18 -03:00
Charlie Jenkins	26db672256	perf tools arc: Support generic syscall headers Arc uses the generic syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-2-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:50:56 -03:00
Charlie Jenkins	4a73aff8c5	perf tools: Create generic syscall table support Currently each architecture in perf independently generates syscall headers. Adapt the work that has gone into unifying syscall header implementations in the kernel to work with perf tools. Introduce this framework with riscv at first. riscv previously relied on libaudit, but with this change, perf tools for riscv no longer needs this external dependency. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-1-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:49:49 -03:00
Ian Rogers	6bfb4c571b	perf test cpumap: Avoid use-after-free following merge Previously cpu maps in the test weren't modified by calls to the cpu map API, however, perf_cpu_map__merge was modified so the left hand argument was updated. In the test this meant the maps copy of the "two" map was put/deleted in the merge meaning when accessed via maps, the pointer was stale and to the put/deleted memory. To fix this add an extra layer of indirection to the maps array, so the updated value of two is accessed. Fixes: `a9d2217556` ("libperf cpumap: Refactor perf_cpu_map__merge()") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108051511.1720369-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:40:05 -03:00
Dmitry Vyukov	9c64c7c658	perf llvm-add2line: Remove unused symbol_conf.h include Remove unused symbol_conf.h include. First, it's just unused. Second, it's problematic since this is a C++ file, and most perf headers don't compile as C++. So if any other includes are added to symbol_conf.h, it may break the build. Signed-off-by: Dmitriy Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250108070248.237943-1-dvyukov@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:38:32 -03:00
James Clark	58f4f294b3	perf test trace_btf_general: Fix shellcheck warning Shellcheck versions < v0.7.2 can't follow this path so add the helper to fix the following warning: tests/shell/trace_btf_general.sh line 8: . "$(dirname $0)"/lib/probe.sh ^--------------------------^ SC1090: Can't follow non-constant source. Use a directive to specify location. Fixes: `0255338d69` ("perf trace: Add tests for BTF general augmentation") Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250106164300.734202-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:36:04 -03:00
Arnaldo Carvalho de Melo	64a7617efd	perf namespaces: Fixup the nsinfo__in_pidns() return type, its bool When adding support for refcount checking a cut'n'paste made this function, that is just an accessor to a bool member of 'struct nsinfo', return a pid_t, when that member is a boolean, fix it. Fixes: `bcaf0a9785` ("perf namespaces: Add functions to access nsinfo") Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:31:06 -03:00
Arnaldo Carvalho de Melo	74833e37df	perf jitdump: Fixup in_pidns member when java agent and 'perf record' are not in the same pidns When running 'perf record' outside a container and the java agent inside a container the jit_repipe_code_load() and friends will emit PERF_RECORD_MMAP2 entries for the jitdump records and will check if we need to fixup the pid/tid: nspid = jr->load.pid; pid = jr_entry_pid(jd, jr); tid = jr_entry_tid(jd, jr); The jr_entry_pid() function looks if we're in the same pidns: static pid_t jr_entry_pid(struct jit_buf_desc *jd, union jr_entry *jr) { if (jd->nsi && nsinfo__in_pidns(jd->nsi)) return nsinfo__tgid(jd->nsi); return jr->load.pid; } But the thread, populated from perf.data records, tries to figure out if it is in the same pidns by actually trying, on the system where 'perf inject' is running, to open a procfs file (a bug that remains to be fixed). It assumes that if that is not possible it is because the thread terminated and thus we can't get its namespace info, and it tolerates nsinfo__init() failing, noting only that that namespace can't be entered, so don't even try. But we can kinda get at least that info (thread->nsinfo->in_pidns) from the data in the perf.data file, namely the pid and tid in the PERF_RECORD_MMAP2 for the jit-<PID>.dump file generated from the java agent: if the PERF_RECORD_MMAP2->pid is the same as what is in the jitdump file, then we're in the same namespace, otherwise we need to use the PERF_RECORD_MMAP2->pid. This all has to be revamped for this jitdump + running perf from outside, as the meaning of in_pidns is being abused, the initialization of nsinfo->pid with the value coming from the PERF_RECORD_MMAP2 data is wrong as it is the pid _outside_ the container since perf was running there. 
The hack in this patch at least produces the expected result in this scenario by following the assumptions in the current codebase for finding maps and for generating the PERF_RECORD_MMAP2 for the ELF files synthesized from the jitdump records in jit_repipe_code_load(), etc. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:30:50 -03:00
Arnaldo Carvalho de Melo	9c6a585d25	perf namespaces: Introduce nsinfo__set_in_pidns() When we're processing a perf.data file we will, for every thread in that file, do a machine__findnew_thread(machine, pid, tid) that, when that pid is seen for the first time, will create a 'struct thread' representing it. That in turn will call nsinfo__new() -> nsinfo__init() and there it will assume we're running live, which is wrong and will need to be addressed in a followup patch. The nsinfo__new() assumes that if we can't access that thread it has already finished and will ignore the -1 return from nsinfo__init(), just taking notes to avoid trying to enter in that namespace, since it isn't there anymore, a race. When doing this from 'perf inject', tho, we can fill in parts of that nsinfo from what we get from the PERF_RECORD_MMAP2 (pid, tid) and in the jitdump file name, that has the form of jit-<PID>.dump. So if the pid in the jitdump file name is not the one in the PERF_RECORD_MMAP2, we can assume that it's the pid of the process _inside_ the namespace, and that perf was running outside that namespace. This will be done in the following patch. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:30:24 -03:00
Arnaldo Carvalho de Melo	f523347ba6	perf jitdump: Accept jitdump mmaps emitted from inside containers When the java agent is running inside a container it will emit mmaps with the format: ⬢ [acme@toolbox a]$ perf report -D | grep PERF_RECORD_MMAP | grep \.dump 0 0x15c400 [0x90]: PERF_RECORD_MMAP2 3308868/3308868: [0x7fb8de6cb000(0x1000) @ 0 08:14 3222905945 0]: r-xp /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jit-1.dump ⬢ [acme@toolbox a]$ Since perf is running from outside the container it sees the pid 3308868 in PERF_RECORD_MMAP2, while the agent saw the pid of the profiled app inside the container, 1. The previous validation was: if (pid && pid2 != nsinfo__nstgid(nsi)) pid2 at this point is '1' (/jit-1.dump), so it considers this as a malformed jitdump mmap and refuses to process it. The test ends up as: if (3308868 && 1 != 3308868) which is true and the jitdump is not processed. Since 1 in the container namespace is really 3308868 in the namespace that perf is running, consider this a valid mmap. We need to make perf realize this and behave accordingly, for now checking instead: if (pid && pid2 && pid != nsinfo__nstgid(nsi)) Translating to: if (3308868 && 1 && 3308868 != 3308868) Will make the jitdump mmap to be considered valid and processed. The jitdump is described in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jitdump-specification.txt Now we end up with the expected flurry of MMAPs, one per jitted function transformed into a little ELF file that should then be processable by the other perf features, like code annotation: [acme@toolbox a]$ echo $JITDUMPDIR /tmp/.debug/jit [acme@toolbox a]$ First use 'perf inject': ⬢ [acme@toolbox a]$ time perf inject -i perf.data -o acme-perf-injected.data -j Then look at the PERF_RECORD_MMAP events in the result file, that went thru the JIT map file: ⬢ [acme@toolbox a]$ ls -la /tmp/.map -rw-r--r--. 
1 acme acme 2989559 Nov 27 16:11 /tmp/perf-3308868.map [acme@toolbox a]$ It is a symbol table: ⬢ [acme@toolbox a]$ head /tmp/.map 0x00007fb8bda5c1a0 0x00000000000000d0 java.lang.String java.lang.module.ModuleDescriptor.name() 0x00007fb8bda5c4a0 0x0000000000000178 int java.lang.StringLatin1.hashCode(byte[]) 0x00007fb8bda5c9a0 0x00000000000000d0 java.lang.String org.springframework.boot.context.config.ConfigDataLocation.getValue() 0x00007fb8bda5cca0 0x00000000000000d0 java.lang.module.ModuleDescriptor java.lang.module.ModuleReference.descriptor() 0x00007fb8bda5cfa0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getKey() 0x00007fb8bda5d2a0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getValue() 0x00007fb8bda5d5a0 0x0000000000000218 boolean jdk.internal.misc.Unsafe.compareAndSetReference(java.lang.Object, long, java.lang.Object, java.lang.Object) 0x00007fb8bda5d9a0 0x00000000000001f0 boolean jdk.internal.misc.Unsafe.compareAndSetLong(java.lang.Object, long, long, long) 0x00007fb8bda5dda0 0x00000000000001f8 void java.lang.System.arraycopy(java.lang.Object, int, java.lang.Object, int, int) 0x00007fb8bda5e1a0 0x00000000000001e8 int java.lang.Object.hashCode() ⬢ [acme@toolbox a]$ As specified in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt This was collected from inside the container, so came as /tmp/perf-1.map. To make perf, running outside the container, use it, we need to copy it to /tmp/perf-3308868.map. This is more logic that has to be added to perf to work in this scenario of running outside the container but processing things created by the Java agent running inside the container. 
With all this in place we get to: ⬢ [acme@toolbox a]$ perf report -D -i acme-perf-injected.data | \ grep PERF_RECORD_MMAP > /tmp/acme-perf-injected.data.mmaps ; \ wc -l /tmp/acme-perf-injected.data.mmaps 44182 /tmp/acme-perf-injected.data.mmaps ⬢ [acme@toolbox a]$ tail /tmp/acme-perf-injected.data.mmaps 1030266786574466 0x7bc9e0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0ceb1c0(0x8d0) @ 0x80 00:2c 238715 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so 1030266795288774 0x7bca78 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cecc00(0x7e8) @ 0x80 00:2c 238716 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so 1030266895967339 0x7bcb10 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cee500(0x3328) @ 0x80 00:2c 238717 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43991.so 1030266915748306 0x7bcba8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0aae0a0(0x138) @ 0x80 00:2c 238718 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43992.so 1030267185851220 0x7bcc40 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cf61e0(0x3b50) @ 0x80 00:2c 238719 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43993.so 1030267231364524 0x7bccd8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cfea80(0x14a0) @ 0x80 00:2c 238720 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43994.so 1030267425498831 0x7bcd70 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c054b4a0(0x338) @ 0x80 00:2c 238721 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43995.so 1030267506147888 0x7bce08 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0a995c0(0x1e8) @ 0x80 00:2c 238722 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43996.so 1030268112586116 0x7bcea0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02520(0x258) @ 0x80 00:2c 238723 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43997.so 1030269435398150 0x7bcf38 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02dc0(0x278) @ 0x80 00:2c 238724 1]: --xs 
/tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43998.so ⬢ [acme@toolbox a]$ And if we look at those tiny ELF files generated by the jitdump code used by 'perf inject' we see: ⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=790591db95a77d644657dfe5058658b200000000, with debug_info, not stripped ⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=762f932acbee53a22638bf4c2b86780200000000, with debug_info, not stripped ⬢ [acme@toolbox a]$ ⬢ [acme@toolbox a]$ ls -la /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so -rw-r--r--. 1 acme acme 9432 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so -rw-r--r--. 
1 acme acme 7504 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so ⬢ [acme@toolbox a]$ And: ⬢ [acme@toolbox a]$ objdump -dS /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so | head -20 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: file format elf64-x86-64 Disassembly of section .text: 0000000000000080 <Lredacted/REDACTED/REDACTED/logging/RedactedRedacted;Redacted(Lredacted/REDACTED/REDACTED/redactedRedacted/Redacted;)V>: 80: 44 8b 56 08 mov 0x8(%rsi),%r10d 84: 49 c1 e2 03 shl $0x3,%r10 88: 49 3b c2 cmp %r10,%rax 8b: 0f 85 6f 15 83 fc jne fffffffffc831600 <Lredacted/REDACTED/REDACTED/redacted/RedactedRedactedRedacted;Redacted(Lredacted/Redacted/Redacted/redactedRedacted/Redacted;)V+0xfffffffffc831580> 91: 66 66 90 data16 xchg %ax,%ax 94: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 9b: 00 9c: 66 66 66 90 data16 data16 xchg %ax,%ax a0: 89 84 24 00 c0 fe ff mov %eax,-0x14000(%rsp) a7: 55 push %rbp a8: 48 8b ec mov %rsp,%rbp ab: 48 83 ec 40 sub $0x40,%rsp af: 48 89 34 24 mov %rsi,(%rsp) ⬢ [acme@toolbox a]$ The thing now being investigated is why we can't annotate anything here, maybe that JITDUMPDIR is getting in the way: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$ In the tests I performed while merging this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d518ac7be6223811ab947897273b1bbef846180 It works, but then there was no JITDUMPDIR involved: /home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4191.so ⬢ [acme@toolbox perf-tools-next]$ perf report --call-graph none --no-child -i perf-injected.data | grep jitted- | head 1.36% java 
jitted-3912413-54.so [.] Interpreter 0.30% C1 CompilerThre jitted-3912413-1.so [.] flush_icache_stub 0.18% java jitted-3912413-4184.so [.] org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean) 0.18% java jitted-3912413-4177.so [.] org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int) 0.13% java jitted-3912413-3845.so [.] java.text.DecimalFormat.subformatNumber(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int) 0.11% java jitted-3912413-4191.so [.] org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode) 0.09% java jitted-3912413-2418.so [.] org.apache.fop.fo.XMLWhiteSpaceHandler.handleWhiteSpace() 0.08% Reference Handl jitted-3912413-54.so [.] Interpreter 0.08% java jitted-3912413-3326.so [.] org.apache.xmlgraphics.fonts.Glyphs.stringToGlyph(java.lang.String) 0.08% java jitted-3912413-3953.so [.] org.apache.fop.layoutmgr.BreakingAlgorithm.considerLegalBreak(org.apache.fop.layoutmgr.KnuthElement, int) ⬢ [acme@toolbox perf-tools-next]$ And then: ⬢ [acme@toolbox perf-tools-next]$ perf annotate --stdio2 -i perf-injected.data 'org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)' \| head -20 Samples: 8 of event 'cpu_atom/cycles/Pu', 4000 Hz, Event count (approx.): 8112794, [percent: local period] org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)() /home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4177.so Percent 0x80 <org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)>: nop movl 0x8(%rsi),%r10d cmpl 0x8(%rax),%r10d → jne 0 movl %eax,-0x14000(%rsp) pushq %rbp subq $0xb0,%rsp nop cmpl $0x3,0x20(%r15) ↓ jne 7037 2e: movl %ecx,0x28(%rsp) movq %rdx,%rbp movl 0x64(%rdx),%ebx cmpb $0x0,0x38(%r15) ↓ jne 3a44 movq 
%rsi,0x30(%rsp) 48: movq 0x30(%rsp),%r10 ⬢ [acme@toolbox perf-tools-next]$ No source code nor line numbers, that I saw in another build of perf for RHEL9, for the same workload described in the cset above (a publicly available java benchmark), so something to investigate on perf upstream running on fedora, maybe some quirk with the jdk used when building perf for RHEL 9 and for Fedora 40. A related patch that should have made this all work is: "perf inject jit: Add namespaces support" https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67dec926931448d688efb5fe34f7b5a22470fc0a But we still need to polish this some more, maybe there are differences in the agent used in NodeJS with --perf-prof and the jvmti one we're using. Hopefully describing all the steps while we investigate this case will help us improve perf support for profiling JITed environments running in containers while profiling from inside and outside it. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:29:51 -03:00
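The validation change described in commit f523347ba6 can be illustrated with a minimal C sketch. The function names are hypothetical and are not part of perf; the parameter meanings are inferred from the worked example in the commit message (pid from PERF_RECORD_MMAP2, pid2 from the jit-<PID>.dump file name, nstgid standing in for the value of nsinfo__nstgid(nsi)).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical stand-ins for the two conditions quoted in the commit message.
 * Each returns true when the jitdump mmap would be treated as malformed.
 *   pid:    pid taken from the PERF_RECORD_MMAP2 event
 *   pid2:   pid parsed from the jit-<PID>.dump file name
 *   nstgid: value assumed to come from nsinfo__nstgid(nsi)
 */
bool old_check_rejects(pid_t pid, pid_t pid2, pid_t nstgid)
{
	return pid && pid2 != nstgid;
}

bool new_check_rejects(pid_t pid, pid_t pid2, pid_t nstgid)
{
	return pid && pid2 && pid != nstgid;
}

/* Replays the numbers from the commit message: 3308868 outside, 1 inside. */
void jitdump_check_demo(void)
{
	assert(old_check_rejects(3308868, 1, 3308868));  /* mmap was refused */
	assert(!new_check_rejects(3308868, 1, 3308868)); /* mmap is accepted */
}
```

With the values from the example, the old condition evaluates to true and the jitdump is skipped, while the new one evaluates to false and the mmap is processed.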
Christophe Leroy	7a93786c30	perf machine: Don't ignore _etext when not a text symbol Depending on how vmlinux.lds is written, _etext might be the very first data symbol instead of the very last text symbol. Don't require it to be a text symbol, accept any symbol type. Committer notes: See the first Link for further discussion, but it all boils down to this: --- # grep -e _stext -e _etext -e _edata /proc/kallsyms c0000000 T _stext c08b8000 D _etext So there is no _edata and _etext is not text $ ppc-linux-objdump -x vmlinux | grep -e _stext -e _etext -e _edata c0000000 g .head.text 00000000 _stext c08b8000 g .rodata 00000000 _etext c1378000 g .sbss 00000000 _edata --- Fixes: `ed9adb2035` ("perf machine: Read also the end of the kernel") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/b3ee1994d95257cb7f2de037c5030ba7d1bed404.1736327613.git.christophe.leroy@csgroup.eu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
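As a rough illustration of the idea in commit 7a93786c30 (match _etext by name and accept whatever symbol type it has), here is a small C sketch that parses one /proc/kallsyms-style line. The helper name and buffer size are assumptions made for this example; this is not the code perf uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "address type name" line and, if the name
 * matches 'wanted', store the address and return 1. The type letter is read
 * but deliberately not checked, so 'T' and 'D' symbols are both accepted.
 */
int match_symbol_any_type(const char *line, const char *wanted,
			  unsigned long long *addr)
{
	unsigned long long a;
	char type;
	char name[128];

	if (sscanf(line, "%llx %c %127s", &a, &type, name) != 3)
		return 0;
	if (strcmp(name, wanted) != 0)
		return 0;
	*addr = a;
	return 1;
}

/* Uses the powerpc kallsyms lines quoted in the committer notes. */
void etext_demo(void)
{
	unsigned long long addr = 0;

	assert(match_symbol_any_type("c08b8000 D _etext", "_etext", &addr));
	assert(addr == 0xc08b8000ULL);
}
```

A check that also required the type letter to be 'T' would miss the `c08b8000 D _etext` line shown in the committer notes.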
Christophe Leroy	dae29277fd	perf maps: Fix display of kernel symbols Since commit `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses"), perf no longer displays kernel symbols on powerpc, although it still detects them as kernel addresses. # Overhead Command Shared Object Symbol # ........ .......... ............. ...................................... # 80.49% Coeur main [unknown] [k] 0xc005f0f8 3.91% Coeur main gau [.] engine_loop.constprop.0.isra.0 1.72% Coeur main [unknown] [k] 0xc005f11c 1.09% Coeur main [unknown] [k] 0xc01f82c8 0.44% Coeur main libc.so.6 [.] epoll_wait 0.38% Coeur main [unknown] [k] 0xc0011718 0.36% Coeur main [unknown] [k] 0xc01f45c0 This is because function maps__find_next_entry() now returns the current entry instead of the next entry, leading to the kernel map end address getting mis-configured with its own start address instead of the start address of the following map. Fix it by really taking the next entry, and also make sure that entry follows the current one by making sure entries are sorted. Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/2ea4501209d5363bac71a6757fe91c0747558a42.1736329923.git.christophe.leroy@csgroup.eu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
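The fix described in commit dae29277fd (sort the entries, then take the entry after the current one so the kernel map's end becomes the next map's start) can be sketched with a toy array. The struct and function names below are invented for illustration and do not mirror perf's 'struct maps' internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a map: only the address range matters here. */
struct toy_map {
	unsigned long long start;
	unsigned long long end;
};

static int toy_map_cmp(const void *a, const void *b)
{
	const struct toy_map *ma = a, *mb = b;

	return (ma->start > mb->start) - (ma->start < mb->start);
}

/*
 * Sort the array by start address, find the map that begins at 'start',
 * and set its end to the start of the entry that follows it.
 * Returns 1 on success, 0 if the map is missing or has no successor.
 */
int toy_set_end_from_next(struct toy_map *maps, size_t nr,
			  unsigned long long start)
{
	size_t i;

	qsort(maps, nr, sizeof(*maps), toy_map_cmp);
	for (i = 0; i < nr; i++) {
		if (maps[i].start != start)
			continue;
		if (i + 1 >= nr)
			return 0;
		/* The bug described above amounted to using maps[i] here. */
		maps[i].end = maps[i + 1].start;
		return 1;
	}
	return 0;
}

void toy_maps_demo(void)
{
	struct toy_map maps[] = {
		{ 0xc2000000ULL, 0 }, { 0xc0000000ULL, 0 }, { 0xc1000000ULL, 0 },
	};

	assert(toy_set_end_from_next(maps, 3, 0xc0000000ULL));
	assert(maps[0].end == 0xc1000000ULL);
}
```

If the current entry were used instead of the next one, the map would end at its own start address, which matches the unresolved `[unknown]` kernel samples shown in the commit message.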
Namhyung Kim	c738a34417	perf test: Update ftrace test to use --graph-opts I found it failed on machines with limited memory because 16M byte per-cpu buffer is too big. The reason it added the option is not to miss tracing data. Thus we can limit the data size by reducing the function call depth instead of increasing the buffer size to handle the whole data. As it used the same option in the test_ftrace_trace() and it was able to find the sleep function, it should work with the profile subcommand. Get rid of other grep commands which might be affected by the depth change. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20250107224352.1128669-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Namhyung Kim	e5f2024cb9	perf ftrace profile: Add --graph-opts option Like trace subcommand, it should be able to pass some options to control the tracing behavior for the function graph tracer. But some options are limited in order to maintain the internal behavior. For example, it can limit the function call depth like below: # perf ftrace profile --graph-opts depth=5 -- myprog Committer testing: root@number:~# perf ftrace profile --graph-opts thresh=1000 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001419.301 500709.650 1000032.000 2 x64_sys_call 1000032.000 1000032.000 1000032.000 1 __x64_sys_clock_nanosleep 1000032.000 1000032.000 1000032.000 1 common_nsleep 1000031.000 1000031.000 1000031.000 1 do_nanosleep 1000031.000 1000031.000 1000031.000 1 hrtimer_nanosleep 1000024.000 1000024.000 1000024.000 1 schedule 1387.208 1387.208 1387.208 1 __x64_sys_execve 1386.691 1386.691 1386.691 1 do_execveat_common.isra.0 1334.170 1334.170 1334.170 1 bprm_execve 1258.413 1258.413 1258.413 1 load_elf_binary 1123.068 1123.068 1123.068 1 begin_new_exec 1113.550 1113.550 1113.550 1 mmput 1109.237 1109.237 1109.237 1 exit_mmap root@number:~# perf ftrace profile --graph-opts thresh=1200 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001448.204 500724.102 1000018.000 2 x64_sys_call 1000017.000 1000017.000 1000017.000 1 __x64_sys_clock_nanosleep 1000017.000 1000017.000 1000017.000 1 common_nsleep 1000017.000 1000017.000 1000017.000 1 hrtimer_nanosleep 1000016.000 1000016.000 1000016.000 1 do_nanosleep 1000012.000 1000012.000 1000012.000 1 schedule 1430.112 1430.112 1430.112 1 __x64_sys_execve 1429.581 1429.581 1429.581 1 do_execveat_common.isra.0 1376.289 1376.289 1376.289 1 bprm_execve 1301.743 1301.743 1301.743 1 load_elf_binary root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers 
<irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Namhyung Kim	86a12b92a9	perf ftrace: Display latency statistics at the end Sometimes users also want to see average latency as well as histogram. Display latency statistics like avg, max, min at the end. $ sudo ./perf ftrace latency -ab -T synchronize_rcu -- ... # DURATION | COUNT | GRAPH | 0 - 1 us | 0 | | 1 - 2 us | 0 | | 2 - 4 us | 0 | | 4 - 8 us | 0 | | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 1 | ##### | 16 - 32 ms | 7 | ######################################## | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # statistics (in usec) total time: 171832 avg time: 21479 max time: 30906 min time: 15869 count: 8 Committer testing: root@number:~# perf ftrace latency -nab --bucket-range 100 --max-latency 512 -T switch_mm_irqs_off sleep 1 # DURATION | COUNT | GRAPH | 0 - 100 ns | 314 | ## | 100 - 200 ns | 1843 | ############# | 200 - 300 ns | 1390 | ########## | 300 - 400 ns | 844 | ###### | 400 - 500 ns | 480 | ### | 500 - 512 ns | 315 | ## | 512 - ... ns | 16 | | # statistics (in nsec) total time: 2448936 avg time: 387 max time: 3285 min time: 82 count: 6328 root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Ian Rogers	05efa0ab01	perf evsel: Improve the evsel__open_strerror() for EBUSY The existing EBUSY strerror message is: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg | grep -i perf" may provide additional information. The dmesg won't be useful. What is more useful is knowing what processes are potentially using the PMU, which some procfs scanning can reveal. When parallel testing tests/shell/stat_all_pmu.sh this yields: Testing intel_bts// Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 2585882 perf list 2585902 perf list -j -o /tmp/__perf_test.list_output.json.KF9MY 2585904 perf list 2585911 perf record -e task-clock --filter period > 1 -o /dev/null --quiet true 2585912 perf list 2585915 perf list 2586042 /tmp/perf/perf record -asdg -e cpu-clock -o /tmp/perftool-testsuite_report.dIF/perf_report/perf.data -- sleep 2 2589078 perf record -g -e task-clock:u -o - perf test -w noploop 2589148 /tmp/perf/perf record --control=fifo:control,ack -e cpu-clock -m 1 sleep 10 2589379 perf --buildid-dir /tmp/perf.debug.Umx record --buildid-all -o /tmp/perf.data.YBm /tmp/perf.ex.MD5.ZQW 2589568 perf record -o /tmp/__perf_test.program.mtcZH/perf.data --branch-filter any,save_type,u -- perf test -w brstack 2589649 perf record --per-thread -o /tmp/__perf_test.perf.data.5d3dc perf test -w thloop 2589898 perf record -o /tmp/perf-test-script.BX2b27Dcnj/pp-perf.data --sample-cpu uname Which gets a little closer to finding the issue. Committer testing: root@number:~# root@number:~# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:~# Before: root@number:~# perf stat -e intel_bts// & [1] 197954 root@number:~# perf test "perf all PMU test" 124: perf all PMU test : FAILED! 
root@number:~# perf test -v "perf all PMU test" |& tail Testing i915/vecs0-busy/ Testing i915/vecs0-sema/ Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg | grep -i perf" may provide additional information. ---- end(-1) ---- 124: perf all PMU test : FAILED! root@number:~# After: root@number:~# perf stat -e intel_bts// & [1] 200195 root@number:~# perf test "perf all PMU test" 123: perf all PMU test : FAILED! root@number:~# perf test -v "perf all PMU test" |& tail Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 200195 perf stat -e intel_bts// 2319766 /root/bin/perf top --stdio ---- end(-1) ---- 123: perf all PMU test : FAILED! root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ze Gao <zegao2021@gmail.com> Change-Id: Ie1ed8688286c44e8f44a35e98fed8be3e2a344df Link: https://lore.kernel.org/r/20241106003007.2112584-1-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Arnaldo Carvalho de Melo	d52af4b8c6	perf tests shell task_analyzer: Run this test exclusively When running in the now default parallel mode this test has been frequently failing, while when running exclusively, on a quiet system, it passes. Since its expectations were established when serial testing was the norm, mark it as exclusive to get this kind of result: root@x1:~# perf test 106 106: perf script task-analyzer tests : Ok root@x1:~# set -o vi root@x1:~# perf stat --null --repeat 10 perf test 106 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok Performance counter stats for 'perf test 106' (10 runs): 4.8872 +- 0.0179 seconds time elapsed ( +- 0.37% ) root@x1:~# Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Charlie Jenkins	0f9ad973b0	perf tests code-reading: Handle change in objdump output from binutils >= 2.41 on riscv After binutils commit e43d876 which was first included in binutils 2.41, riscv no longer supports dumping in the middle of instructions. Increase the objdump window by 2-bytes to ensure that any instruction that sits on the boundary of the specified stop-address is not cut in half. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241219-perf_fix_riscv_obj_reading-v3-1-a7d644dcfa50@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Arnaldo Carvalho de Melo	058b38ccd2	perf top: Don't complain about lack of vmlinux when not resolving some kernel samples Recently we got a case where a kernel sample wasn't being resolved due to a bug that was not setting the end address on kernel functions implemented in assembly (see Link: tag), and then those were not being found by machine__resolve() -> map__find_symbol(). So we ended up with: # perf top --stdio PerfTop: 0 irqs/s kernel: 0% exact: 0% lost: 0/0 drop: 0/0 [cycles/P] ----------------------------------------------------------------------- Warning: A vmlinux file was not found. Kernel samples will not be resolved. ^Z [1]+ Stopped perf top --stdio # But then resolving all other kernel symbols. So just fixup the logic to only print that warning when there are no symbols in the kernel map. Fixes: `d88205db9c` ("perf dso: Add dso__has_symbols() method") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/Z3buKhcCsZi3_aGb@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:18:31 -03:00
James Clark	ed60738a9b	perf stat: Document and clarify outstate members Not all of these are "state" so separate them into two sections. Rename and document to make all clearer. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-6-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:34:52 -03:00
James Clark	dd566687ef	perf stat: Document and simplify interval timestamps Rename 'prefix' to 'timestamp' because that's all it does, except in iostat mode where it's slightly overloaded, but still includes a timestamp. This reveals a problem with iostat and JSON mode so document this. Make it more explicit that these are printed in interval mode by changing 'if (prefix)' to 'if (interval)' which reveals an unnecessary 'else if (... && !interval)' which can be removed. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-5-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:34:21 -03:00
James Clark	d226f434fb	perf stat: Remove empty new_line_metric function Despite the name new_line_metric doesn't make a new line, it actually does nothing. Change it to NULL to avoid confusion. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-4-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:33:29 -03:00
James Clark	9f1df75509	perf stat: Also hide metric-units from JSON when event didn't run We decided to hide NULL metric-units rather than showing it as "(null)" when a dependent event for a metric doesn't exist. But on hybrid systems if the process doesn't hit a PMU you get an empty string metric unit instead. To make it consistent change all empty strings to NULL. Note that metric-threshold is already hidden in this case without this change. Where a process only runs on cpu_core and never hits cpu_atom: Before: $ perf stat -j -- true ... {"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00, "metric-value" : "0.000000", "metric-unit" : ""} {"counter-value" : "6326.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 293786, "pcnt-running" : 100.00, "metric-value" : "3.553394", "metric-unit" : "of all branches", "metric-threshold" : "good"} ... After: ... {"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00} {"counter-value" : "5778.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 282240, "pcnt-running" : 100.00, "metric-value" : "3.226797", "metric-unit" : "of all branches", "metric-threshold" : "good"} ... 
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-3-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:30:21 -03:00
James Clark	967364894e	perf stat: Fix trailing comma when there is no metric unit Now that printing metric-value and metric-unit is optional, print_running_json() shouldn't add the comma in case it becomes trailing. Replace all manual JSON comma stuff with a json_out() function that uses the existing os->first tracking and auto inserts a comma if it's needed. Update the test to handle that two of the fields can be missing. This fixes the following test failure on Cortex A57 where the branch misses metric is missing a required event: $ perf test -vvv "json output" 106: perf stat JSON output linter: --- start --- test child forked, pid 665682 Checking json output: no args Test failed for input: {"counter-value" : "3112.000000", "unit" : "", "event" : "armv8_pmuv3_1/branch-misses/", "event-runtime" : 20699340, "pcnt-running" : 100.00, } ... json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 12 column 144 (char 2109) ---- end(-1) ---- 106: perf stat JSON output linter : FAILED! Fixes: `e1cc918b6c` ("perf stat: Drop metric-unit if unit is NULL") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-2-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:20:43 -03:00
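The comma handling described in the commit above can be illustrated with a small Python sketch. It is not the C code from perf: `JsonLineWriter`, `out()` and `format_event()` are hypothetical names chosen for this example, and the `first` flag only mirrors the idea of the os->first tracking mentioned in the message. A separator is written before every field except the first, so dropping the optional metric fields can never leave a trailing comma.

```python
import json


class JsonLineWriter:
    """Builds one JSON object, inserting commas only between fields."""

    def __init__(self):
        self.first = True  # hypothetical stand-in for the os->first idea
        self.parts = []

    def out(self, key, value):
        # Emit a separator only if a field has already been written.
        if not self.first:
            self.parts.append(", ")
        self.first = False
        self.parts.append(f"{json.dumps(key)} : {json.dumps(value)}")

    def finish(self):
        return "{" + "".join(self.parts) + "}"


def format_event(event, value, metric_value=None, metric_unit=None):
    """Format one event line; the metric fields are optional."""
    w = JsonLineWriter()
    w.out("counter-value", value)
    w.out("event", event)
    if metric_value is not None:
        w.out("metric-value", metric_value)
    if metric_unit:  # an empty or missing unit is hidden
        w.out("metric-unit", metric_unit)
    return w.finish()


if __name__ == "__main__":
    print(format_event("armv8_pmuv3_1/branch-misses/", "3112.000000"))
```

Because the separator is decided at the point where each field is written, callers never need to know which field will turn out to be the last one.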
Howard Chu	00c640595e	perf docs: Add documentation for --force-btf option The --force-btf option is intended for debugging purposes and is currently undocumented. Add documentation for it. Committer notes: We need a follow up patch expanding on what can be done via BTF and what isn't possible and thus needs further work to convert kernel C source code into tables that can then be associated with syscall integer args and struct members, as discussed in: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-3-howardchu95@gmail.com Link: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:19:26 -03:00
Howard Chu	0255338d69	perf trace: Add tests for BTF general augmentation Currently, we only have 'perf trace' augmentation tests for enum arguments. This patch adds tests for more general syscall arguments, such as struct pointers, strings, and buffers. These tests utilize the 'perf config' system to configure the 'perf trace' output, as suggested by Arnaldo Carvalho de Melo <acme@kernel.org>. Committer testing: root@number:~# perf test "BTF general" 109: perf trace BTF general tests : Ok root@number:~# perf test -v "BTF general" 109: perf trace BTF general tests : Ok root@number:~# perf test -vv "BTF general" 109: perf trace BTF general tests: --- start --- test child forked, pid 1410451 Checking if vmlinux BTF exists Testing perf trace's string augmentation Testing perf trace's buffer augmentation Testing perf trace's struct augmentation ---- end(0) ---- 109: perf trace BTF general tests : Ok root@number:~# It still fails sometimes, for instance when tested with: root@number:~# perf stat --null -r 10 perf test "BTF general" 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : FAILED! 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : FAILED! 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok Performance counter stats for 'perf test BTF general' (10 runs): 2.148 +- 0.293 seconds time elapsed ( +- 13.63% ) root@number:~# But we can go on from here and fix things up with followup patches. 
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:18:11 -03:00
Dr. David Alan Gilbert	e5de3f9da5	perf path: Remove unused is_executable_file() is_executable_file() has been unused since 2022's commit `7391db6459` ("perf test: Refactor shell tests allowing subdirs") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241222215831.283248-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Ian Rogers	2f4847b5d6	perf values: Use evsel rather than evsel->idx An evsel idx may not be stable due to sorting, evlist removal, etc. Avoid use of the idx where the evsel itself can be used to avoid these problems. This removed 1 values array and duplicated evsel name strings. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241114230713.330701-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Ian Rogers	2f0539fa02	perf stream: Use evsel rather than evsel->idx An evsel idx may not be stable due to sorting, evlist removal, etc. Avoid use of the idx where the evsel itself can be used to avoid these problems. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241114230713.330701-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Ian Rogers	26f45ec8f0	perf jevents: Provide better path information for broken JSON If the JSON input to jevents.py is broken it can be problematic to work out which particular JSON file is broken. When processing files, catch exceptions that occur and re-raise them with path details added. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20241114172309.840241-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
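The general technique behind the jevents commit above, attaching the file path to a parse failure, might look roughly like the following. This is a minimal sketch and not the code in jevents.py: the function name `parse_json_text` and the path used in the demo are hypothetical, and the real script may wrap a different exception type.

```python
import json


def parse_json_text(text, path):
    """Parse JSON text, re-raising parse errors with the source path added.

    'path' is only used for the error message, so the caller can tell
    which input was broken.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Chain the original error so the line/column details are kept.
        raise ValueError(f"{path}: {err}") from err


if __name__ == "__main__":
    try:
        parse_json_text('{"EventName": }', "example/broken.json")
    except ValueError as err:
        print(err)
```

Using `raise ... from err` keeps the original decoder error in the traceback, so both the path and the position inside the file stay visible.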
Namhyung Kim	91a5bffa56	perf lock contention: Handle slab objects in -L/--lock-filter option This is to filter lock contention from specific slab objects only. Like in the lock symbol output, we can use '&' prefix to filter slab object names. root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 3 14.99 us 14.44 us 5.00 us ffffffff851c0940 pack_mutex (mutex) 2 2.75 us 2.56 us 1.38 us ffff98d7031fb498 &task_struct (mutex) 4 1.42 us 557 ns 355 ns ffff98d706311400 &kmalloc-cg-512 (mutex) 2 953 ns 714 ns 476 ns ffffffff851c3620 delayed_uprobe_lock (mutex) 1 929 ns 929 ns 929 ns ffff98d7031fb538 &task_struct (mutex) 3 561 ns 210 ns 187 ns ffffffff84a8b3a0 text_mutex (mutex) 1 479 ns 479 ns 479 ns ffffffff851b4cf8 tracepoint_srcu_srcu_usage (mutex) 2 320 ns 195 ns 160 ns ffffffff851cf840 pcpu_alloc_mutex (mutex) 1 212 ns 212 ns 212 ns ffff98d7031784d8 &signal_cache (mutex) 1 177 ns 177 ns 177 ns ffffffff851b4c28 tracepoint_srcu_srcu_usage (mutex) With the filter, it can show contentions from the task_struct only. 
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl -L '&task_struct' sleep 1 contended total wait max wait avg wait address symbol 2 1.97 us 1.71 us 987 ns ffff98d7032fd658 &task_struct (mutex) 1 1.20 us 1.20 us 1.20 us ffff98d7032fd6f8 &task_struct (mutex) It can work with other aggregation mode: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -ab -L '&task_struct' sleep 1 contended total wait max wait avg wait type caller 1 25.10 us 25.10 us 25.10 us mutex perf_event_exit_task+0x39 1 21.60 us 21.60 us 21.60 us mutex futex_exit_release+0x21 1 5.56 us 5.56 us 5.56 us mutex futex_exec_release+0x21 Committer testing: root@number:~# perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 1 20.80 us 20.80 us 20.80 us ffff9d417fbd65d0 (spinlock) 8 12.85 us 2.41 us 1.61 us ffff9d415eeb6a40 rq_lock (spinlock) 1 2.55 us 2.55 us 2.55 us ffff9d415f636a40 rq_lock (spinlock) 7 1.92 us 840 ns 274 ns ffff9d39c2cbc8c4 (spinlock) 1 1.23 us 1.23 us 1.23 us ffff9d415fb36a40 rq_lock (spinlock) 2 928 ns 738 ns 464 ns ffff9d39c1fa6660 &kmalloc-rnd-14-192 (rwlock) 4 788 ns 252 ns 197 ns ffffffffb8608a80 jiffies_lock (spinlock) 1 304 ns 304 ns 304 ns ffff9d39c2c979c4 (spinlock) 1 216 ns 216 ns 216 ns ffff9d3a0225c660 &kmalloc-rnd-14-192 (rwlock) 1 89 ns 89 ns 89 ns ffff9d3a0adbf3e0 &kmalloc-rnd-14-192 (rwlock) 1 61 ns 61 ns 61 ns ffff9d415f9b6a40 rq_lock (spinlock) root@number:~# uname -r 6.13.0-rc2 root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin 
<roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Namhyung Kim	0c631ef07c	perf lock contention: Resolve slab object name using BPF The bpf_get_kmem_cache() kfunc can return an address of the slab cache (kmem_cache). As it has the name of the slab cache from the iterator, we can use it to symbolize some dynamic kernel locks in a slab. Before: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 2 3.34 us 2.87 us 1.67 us ffff9d7800ad9600 (mutex) 2 2.16 us 1.93 us 1.08 us ffff9d7804b992d8 (mutex) 4 1.37 us 517 ns 343 ns ffff9d78036e6e00 (mutex) 1 1.27 us 1.27 us 1.27 us ffff9d7804b99378 (mutex) 2 845 ns 599 ns 422 ns ffffffff9e1c3620 delayed_uprobe_lock (mutex) 1 845 ns 845 ns 845 ns ffffffff9da0b280 jiffies_lock (spinlock) 2 377 ns 259 ns 188 ns ffffffff9e1cf840 pcpu_alloc_mutex (mutex) 1 305 ns 305 ns 305 ns ffffffff9e1b4cf8 tracepoint_srcu_srcu_usage (mutex) 1 295 ns 295 ns 295 ns ffffffff9e1c0940 pack_mutex (mutex) 1 232 ns 232 ns 232 ns ffff9d7804b7d8d8 (mutex) 1 180 ns 180 ns 180 ns ffffffff9e1b4c28 tracepoint_srcu_srcu_usage (mutex) 1 165 ns 165 ns 165 ns ffffffff9da8b3a0 text_mutex (mutex) After: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) 3 421 ns 164 ns 140 ns ffffffffa3a8b3a0 text_mutex (mutex) 1 409 ns 409 ns 409 ns ffffffffa41b4cf8 tracepoint_srcu_srcu_usage (mutex) 2 362 ns 239 ns 181 ns ffffffffa41cf840 pcpu_alloc_mutex (mutex) 1 220 ns 220 ns 220 ns ffff9d5e82b534d8 &signal_cache (mutex) 1 215 ns 215 ns 215 ns ffffffffa41b4c28 tracepoint_srcu_srcu_usage (mutex) Note that the name starts with '&' sign for slab 
objects to indicate that they are dynamic locks. It won't give accurate lock or type names, but it's still useful. We may later add type info to the slab cache to get the exact name of the lock within the type. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Namhyung Kim	e2c4dc54cd	perf lock contention: Run BPF slab cache iterator Recently the kernel got the kmem_cache iterator to traverse metadata of slab objects. This can be used to symbolize dynamic locks in a slab. The new slab_caches hash map will have the pointer of the kmem_cache as a key and save the name and an id. The id will be saved in the flags part of the lock. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-3-namhyung@kernel.org [ Added change from Namhyung addressing review from Alexei: ] Link: https://lore.kernel.org/r/Z2dVdH3o5iF-KrWj@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:52:03 -03:00
Namhyung Kim	d8cc6da406	perf lock contention: Add and use LCB_F_TYPE_MASK This is a preparation for the later change. It'll use more bits in the flags so let's rename the type part and use the mask to extract the type. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-20 17:36:06 -03:00
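The idea of reserving part of a flags word for the lock type and extracting it with a mask, so the remaining bits can carry something else such as an id, can be sketched as below. All bit widths, constant values and function names here are assumptions made up for illustration; the actual LCB_F_TYPE_MASK value and flag layout in perf are not given in the commit message.

```python
# Hypothetical layout: low 8 bits hold the lock type, upper bits hold an id.
TYPE_BITS = 8
TYPE_MASK = (1 << TYPE_BITS) - 1   # assumed value, for illustration only
ID_SHIFT = TYPE_BITS

LOCK_SPINLOCK = 0x1  # made-up type codes
LOCK_MUTEX = 0x2


def pack_flags(lock_type, obj_id):
    """Combine a type code and an id into a single flags word."""
    return (obj_id << ID_SHIFT) | (lock_type & TYPE_MASK)


def lock_type_of(flags):
    """Extract the type part with the mask, ignoring the id bits."""
    return flags & TYPE_MASK


def id_of(flags):
    """Extract the id stored above the type bits."""
    return flags >> ID_SHIFT


if __name__ == "__main__":
    flags = pack_flags(LOCK_MUTEX, 5)
    print(hex(flags), lock_type_of(flags), id_of(flags))
```

Comparing `flags` directly against a type constant stops working once other bits are in use, which is why the type is read through the mask instead.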
Arnaldo Carvalho de Melo	efff5add20	perf script: Cache the output type Right now every time we need to figure out the type of an evsel for output purposes we do a quick sequence of ifs, but there are new cases where there is a need to do more complex iterations over multiple data structures, so allow for caching this operation in a hole of 'struct evsel'. This should really be done in the evsel->priv area that 'perf script' sets up, but more work is needed to make sure that it is allocated when we need it; right now it is only used conditionally. Add some comments so that we move this to that 'perf script' specific area when the conditions are in place for that. Acked-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z2XCi3PgstSrV0SE@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-20 17:35:54 -03:00
Ian Rogers	233157785a	perf python: Correctly throw IndexError Correctly throw IndexError for out-of-bound accesses to evlist: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0, '/tmp/perf/python') >>> import perf >>> x=perf.parse_events('cycles') >>> print(x) evlist([cycles]) >>> x[2] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: Index out of range Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-23-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
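One reason raising IndexError matters for a sequence-like object is that Python's fallback iteration protocol calls `__getitem__` with 0, 1, 2, ... and stops only when IndexError is raised. The pure-Python stand-in below illustrates that behaviour; `EventList` is a hypothetical class for this sketch and is not the C extension type from the perf python module.

```python
class EventList:
    """Minimal sequence stand-in that reports out-of-range accesses."""

    def __init__(self, names):
        self._names = list(names)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        # Reject anything outside [0, len) instead of returning garbage.
        if index < 0 or index >= len(self._names):
            raise IndexError("Index out of range")
        return self._names[index]


if __name__ == "__main__":
    events = EventList(["cycles"])
    # Iteration works without __iter__ because IndexError ends the loop.
    print(list(events))
    try:
        events[2]
    except IndexError as err:
        print("IndexError:", err)
```

Without the IndexError, a `for` loop over such an object would have no way to know where the sequence ends.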
Ian Rogers	24fb6de241	perf python: Add __str__ and __repr__ functions to evsel This allows evsel to be shown in the REPL like: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0, '/tmp/perf/python') >>> import perf >>> x=perf.parse_events('cycles,data_read') >>> print(x) evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/]) >>> x[0] evsel(cycles) >>> x[1] evsel(uncore_imc_free_running_0/data_read/) >>> x[2] evsel(uncore_imc_free_running_1/data_read/) Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-22-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	3c0401a081	perf python: Add __str__ and __repr__ functions to evlist This allows the values in the evlist to be shown in the REPL like: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0,'/tmp/perf/python') >>> import perf >>> perf.parse_events('cycles,data_read') evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/]) Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
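The REPL output shown in the two commits above follows from how Python picks a textual form: the interactive prompt calls `repr()`, and `print()` calls `str()`, which falls back to `__repr__` when no `__str__` is defined. A pure-Python sketch of the same output shapes follows; `Evsel` and `Evlist` here are hypothetical stand-ins, not the types implemented in the perf python extension.

```python
class Evsel:
    """Stand-in for a single event selector."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"evsel({self.name})"


class Evlist:
    """Stand-in for a list of event selectors."""

    def __init__(self, names):
        self.evsels = [Evsel(n) for n in names]

    def __getitem__(self, index):
        return self.evsels[index]

    def __repr__(self):
        # Join the bare event names, matching the shape shown in the log.
        return "evlist([" + ",".join(e.name for e in self.evsels) + "])"


if __name__ == "__main__":
    x = Evlist(["cycles", "instructions"])
    print(x)      # str() falls back to __repr__
    print(x[0])
```

Defining only `__repr__` is enough for both the prompt and `print()` in this sketch; a separate `__str__` is only needed when the two forms should differ.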
Ian Rogers	f081defccd	perf python: Add parse_events function Add basic parse_events function that takes a string and returns an evlist. As the python evlist is embedded in a pyrf_evlist, and the evsels are embedded in pyrf_evsels, copy the parsed data into those structs and update evsel__clone to enable this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	5c10f3b446	perf build: Remove test library from python shared object With the attr.c code moved to a shell test, there is no need to link the test code into the python dso to avoid a missing reference to test_attr__open. Drop the test code from the python library. With the bench and test code removed from the python library on my x86 debian derived laptop the python library is reduced in size by 508,712 bytes or nearly 5%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-19-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	9cf133c25c	perf kwork: Make perf_kwork_add_work a callback perf_kwork_add_work is declared in builtin-kwork, whereas much kwork code is in util. To avoid needing to stub perf_kwork_add_work in python.c, add a callback to struct perf_kwork and initialize it in builtin-kwork to perf_kwork_add_work - this is the only struct perf_kwork. This removes the need for the stub in python.c. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-18-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	df487111bd	perf bench: Remove reference to cmd_inject Avoid `perf bench internals inject-build-id` referencing the cmd_inject sub-command that requires perf-bench to backward reference internals of builtins. Replace the reference to cmd_inject with a call to main. To avoid python.c needing to link with something providing main, drop the libperf-bench library from the python shared object. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	1a12ed09bc	perf lock: Move common lock contention code to new file Avoid references from util code to builtin-lock that require python stubs. Move the functions and related variables to util/lock-contention.c. Add max_stack_depth parameter to match_callstack_filter to avoid sharing a global variable. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	16ecb4316f	perf env: Move arch errno function to only use in env Move arch_syscalls__strerrno_function out of builtin-trace.c to env.c so that there isn't a util to builtin function call. This allows the python.c stub to be removed. Also, remove declaration/prototype from env.h and make static to reduce scope. The include is moved inside ifdefs to avoid, "defined but unused warnings". Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	254a867b98	perf intel-pt: Remove stale build comment Commit `00a263902a` ("perf intel-pt: Use shared x86 insn decoder") removed the use of diff, so remove stale busybox comment. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	e7bb49e3f6	perf x86: Define arch_fetch_insn in NO_AUXTRACE builds archinsn.c containing arch_fetch_insn was only enabled with CONFIG_AUXTRACE, but this meant that a NO_AUXTRACE build on x86 would use the empty weak version of arch_fetch_insn - weak symbols are a frequent source of errors like this and are outside of the C specification. Change it so that archinsn.c is always built on x86 and make the weak symbol empty version of arch_fetch_insn a strong one guarded by ifdefs. arch_fetch_insn on x86 depends on insn_decode which is a function included then built into intel-pt-insn-decoder.c. intel-pt-insn-decoder.c isn't built in a NO_AUXTRACE=1 build. Separate the insn_decode function from intel-pt-insn-decoder.c by just directly compiling the relevant file. Guard this compilation to be for either always on x86 (because of the use in arch_fetch_insn) or when auxtrace is enabled. Apply the CFLAGS overrides as necessary, reducing the amount of code where warnings are disabled. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	dc7be5e4c0	perf script: Move perf_sample__sprintf_flags to trace-event-scripting.c perf_sample__sprintf_flags is used in the python C code and so needs to be in the util library rather than a builtin. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241119011644.971342-12-irogers@google.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	1ff2ca39b3	perf script: Move script_fetch_insn to trace-event-scripting.c Add native_arch as a parameter to script_fetch_insn rather than relying on the builtin-script value that won't be initialized for the dlfilter and python Context use cases. Assume both of those cases are running natively. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	04051b4a93	perf script: Move script_spec code to trace-event-scripting.c The script_spec code is referenced in util/trace-event-scripting but the list was in builtin-script, accessed via a function that required a stub function in python.c. Move all the logic to trace-event-scripting, with lookup and foreach functions exposed for builtin-script's benefit. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	9557d1562a	perf stat: Move stat_config into config.c stat_config is accessed by config.c via helper functions, but declared in builtin-stat. Move to util/config.c so that stub functions aren't needed in python.c which doesn't link against the builtin files. To avoid name conflicts change builtin-script to use the same stat_config as builtin-stat. Rename local variables in tests to avoid shadow declaration warnings. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	d927e30ca0	perf script: Move find_scripts to browser/scripts.c The only use of find_scripts is in browser/scripts.c but the definition in builtin causes linking problems requiring a stub in python.c. Move the function to allow the stub to be removed. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	f76f94dc78	perf script: Use openat for directory iteration Rewrite the directory iteration to use openat so that large character arrays aren't needed. The arrays are warned about potential buffer overflows by GCC when the code exists in a single C file. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	3f1889422a	perf kvm: Move functions used in util out of builtin The util library code is used by the python module but doesn't have access to the builtin files. Make a util/kvm-stat.c to match the kvm-stat.h file that declares the functions and move the functions there. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	702c7a4aec	perf script: Move scripting_max_stack out of builtin scripting_max_stack is used in util code which is linked into the python module. Move the variable declaration to util/trace-event-scripting.c to avoid conditional compilation. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	c027e637bb	perf python: Remove unused #include Remove unused #include of bpf-filter.h. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	b8816289ab	perf python: Constify variables and parameters Opportunistically constify variables and parameters when possible. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	e7e9943c87	perf python: Remove python 2 scripting support Python2 was deprecated 4 years ago, remove support and workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	4c7f9ee2eb	perf intel-pt: Add a test for pause / resume Add a simple sub-test to the "Miscellaneous Intel PT testing" test to check pause / resume. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f8b301e0a4	perf intel-pt: Add documentation for pause / resume Document the use of aux-action config term and provide a simple example. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f38ec2274c	perf intel-pt: Improve man page format Improve format of config terms and section references. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	bf66b5fd6e	perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume Display "feature is not supported" error message if aux_start_paused, aux_pause or aux_resume result in a perf_event_open() error. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	8a0f49a7f1	perf tools: Parse aux-action Add parsing for aux-action to accept "pause", "resume" or "start-paused" values. "start-paused" is valid only for AUX area events. "pause" and "resume" are valid only for events grouped with an AUX area event as the group leader. However, like with aux-output, the events will be automatically grouped if they are not currently in a group, and the AUX area event precedes the other events. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	314bf84e03	perf tools: Add aux-action config term Add a new common config term "aux-action" to use for configuring AUX area trace pause / resume. The value is a string that will be parsed in a subsequent patch. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f3e7194756	perf tools: Add aux_start_paused, aux_pause and aux_resume Add 'struct perf_event_attr' members to support pause and resume of AUX area tracing. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Leo Yan	44b44ffd5d	perf build: Minor improvement for linking libzstd The zstd library will be automatically linked by detecting the feature libzstd. It is no need to explicitly link it for static builds, so remove the redundant linkage. It is contradictory to detect the feature libelf-zstd while the build configuration NO_LIBZSTD is set. Report an error for reminding users not to set NO_LIBZSTD. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241215221223.293205-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Athira Rajeev	ea3683fda6	perf tools tests shell base_probe: Enhance print_overall_results to print summary information Currently print_overall_results prints the number of fails in the summary, example from base_probe tests in testsuite_probe: ## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY :: 11 failures found test_invalid_options contains multiple tests and out of that 11 failed. Sometimes it could happen that it is due to missing dependency in the build or environment dependency. Example, perf probe -L requires DWARF enabled. otherwise it fails as below: ./perf probe -L Error: switch `L' is not available because NO_DWARF=1 "-L" is tested as one of the option in: for opt in '-a' '-d' '-L' '-V'; do <<perf probe test>> print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "missing argument for $opt" Here -a and -d doesn't require DWARF. Similarly there are few other tests requiring DWARF. To hint the user that missing DWARF could be one issue, update print_overall_results to print a comment string along with summary hinting the possible cause. Update test_invalid_options.sh and test_line_semantics.sh to pass the info about DWARF requirement since these tests failed when perf is built without DWARF. Use the check for presence of DWARF with "perf check feature" and append the hint message based on the result. With the change: ## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY :: 11 failures found :: Some of the tests need DWARF to run Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241206135254.35727-1-atrajeev@linux.vnet.ibm.com [ Minor edits changing "dwarf" to "DWARF" as its an acronym ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
Athira Rajeev	2aad2130c2	perf tools arch powerpc: Add register mask for power11 PVR in extended regs Perf tools side uses extended mask to display the platform supported register names (with -I? option) to the user and also send this mask to the kernel to capture the extended registers as part of each sample. This mask value is decided based on the processor version ( from PVR ). Add PVR value for power11 to enable capturing the extended regs as part of sample in power11. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241206135637.36166-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
Namhyung Kim	a5bbe6dd69	perf ftrace latency: Fix compiler error for clang 12 I noticed this error on CentOS 8. CLANG /build/util/bpf_skel/.tmp/func_latency.bpf.o Error at line 119: Unsupport signed division for DAG: 0x55829ee68a10: i64 = sdiv 0x55829ee68bb0, 0x55829ee69090, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ]Please convert to unsigned div/mod. fatal error: error in backend: Cannot select: 0x55829ee68a10: i64 = sdiv 0x55829ee68bb0, 0x55829ee69090, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68bb0: i64,ch = CopyFromReg 0x55829edc9a78, Register:i64 %5, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68e20: i64 = Register %5 0x55829ee69090: i64,ch = load<(volatile dereferenceable load 4 from @bucket_range, !tbaa !160), zext from i32> 0x55829edc9a78, 0x55829ee68fc0, undef:i64, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68fc0: i64 = BPFISD::Wrapper TargetGlobalAddress:i64<i32* @bucket_range> 0, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68808: i64 = TargetGlobalAddress<i32* @bucket_range> 0, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68530: i64 = undef In function: func_end PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. It complains about sdiv which is (s64)delta / (u32)bucket_range. Let's cast the delta to u64 for division. 
Committer testing: Tested on: $ head -2 /etc/os-release NAME="Fedora Linux" VERSION="40 (Toolbx Container Image)" $ clang --version \|& head -1 clang version 18.1.8 (Fedora 18.1.8-1.fc40) $ root@number:~# perf ftrace latency --use-nsec --bucket-range=200 --min-latency 250 --max-latency=5000 -T switch_mm_irqs_off -a sleep 10 # DURATION \| COUNT \| GRAPH \| 0 - 250 ns \| 28 \| ##### \| 250 - 450 ns \| 12 \| ## \| 450 - 650 ns \| 10 \| # \| 650 - 850 ns \| 9 \| # \| 850 - 1050 ns \| 20 \| ### \| 1.05 - 1.25 us \| 14 \| ## \| 1.25 - 1.45 us \| 16 \| ### \| 1.45 - 1.65 us \| 8 \| # \| 1.65 - 1.85 us \| 11 \| ## \| 1.85 - 2.05 us \| 7 \| # \| 2.05 - 2.25 us \| 11 \| ## \| 2.25 - 2.45 us \| 10 \| # \| 2.45 - 2.65 us \| 7 \| # \| 2.65 - 2.85 us \| 8 \| # \| 2.85 - 3.05 us \| 7 \| # \| 3.05 - 3.25 us \| 7 \| # \| 3.25 - 3.45 us \| 10 \| # \| 3.45 - 3.65 us \| 5 \| \| 3.65 - 3.85 us \| 9 \| # \| 3.85 - 4.05 us \| 2 \| \| 4.05 - 4.25 us \| 6 \| # \| 4.25 - ... us \| 23 \| #### \| root@number:~# Fixes: `e8536dd47a` ("perf ftrace latency: Introduce --bucket-range to ask for linear bucketing") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241214002938.1027546-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
Arnaldo Carvalho de Melo	055f0ce7d8	tools build: Test for presence of libtraceevent and libtracefs in test-all.c Since these are so far considered part of the basic set of libraries to be present when building perf, have them in tools/build/features/test-all.c. They were already in the FEATURE_TESTS_BASIC variable of tools/build/Makefile.feature, meaning if test-all.c builds, those features would be set as present, but then we were calling "again" (well, they were not in test-all.c, so were not really being tested) for it to be detected. Fix this all up by not calling feature_check for those features but instead have them in test-all.c to be tested together with the set of basic expected libraries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20241213195052.914914-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:28 -03:00
Arnaldo Carvalho de Melo	dea654e34a	perf tests switch-tracking: Set this test to run exclusively This test was failing when run with the default 'perf test' mode, which is to run multiple regression tests in parallel. Since it checks system_wide mode, set it to run in exclusive mode. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z1yPYqYYs_isO1PJ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 18:30:48 -03:00
Ravi Bangoria	4cd67bac9d	perf test: Introduce DEFINE_SUITE_EXCLUSIVE() A variant of DEFINE_SUITE() but sets ->exclusive bit for the test so the test will be executed sequentially. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20241210093449.1662-10-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 16:55:09 -03:00
Arnaldo Carvalho de Melo	aec95d7ce1	Merge remote-tracking branch 'torvalds/master' into perf-tools-next To get the fixes that went thru perf-tools for v6.13. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 11:53:27 -03:00
Levi Yun	1d18ebcfd3	perf expr: Initialize is_test value in expr__ctx_new() When expr_parse_ctx is allocated by expr__ctx_new(), expr_scanner_ctx->is_test isn't initialized, so it holds a garbage value. This can affect the return value of expr__parse() when it parses a non-existent event literal, depending on that garbage value. Use calloc instead of malloc in expr__ctx_new() to fix this. Fixes: `3340a08354` ("perf pmu-events: Fix testing with JEVENTS_ARCH=all") Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241108143424.819126-1-yeoreum.yun@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 16:12:37 -03:00
Jiapeng Chong	9ba3462c1c	perf tests: Fix an incorrect type in append_script() The return value from the call to readlink() is ssize_t. However, the return value is being assigned to a size_t variable 'len', so make 'len' an ssize_t. ./tools/perf/tests/tests-scripts.c:182:5-8: WARNING: Unsigned expression compared with zero: len < 0. Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11909 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241115091527.128923-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 16:08:36 -03:00
Ruffalo Lavoisier	8791a78fb7	perf test: Remove duplicate word - Remove duplicate word, 'the'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ruffalo Lavoisier <RuffaloLavoisier@gmail.com> Cc: linux-security-module@vger.kernel.org Link: https://lore.kernel.org/r/20241120043503.80530-1-RuffaloLavoisier@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:55:16 -03:00
Ian Rogers	61e0a94463	perf string: Avoid undefined NULL+1 While the value NULL+1 is never used it triggers a ubsan warning. Restructure and comment the loop to avoid this. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241120065224.286813-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:53:36 -03:00
James Clark	7269846617	perf vendor events arm64: Update N2/V2 events from source Update using the new data [1] for these changes: * Scale some metrics like dtlb_walk_ratio to percent so they display better with Perf's 2 dp precision * Description typos, grammar and clarifications * Unnecessary metric formula brackets seem to have been removed in the source but this is not a functional change * New sve_all_percentage metric The following command was used to generate this commit: $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-v2.json:neoverse-n2-v2 [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v2.json Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241120143739.243728-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:41:27 -03:00
Namhyung Kim	ad5d76aecd	perf tools: Avoid unaligned pointer operations The sample data is basically 64-bit aligned, but raw data starts with a 32-bit length field followed by the data. In perf_event__synthesize_sample it treats the sample data as a 64-bit array, and it needs some trick to update the raw data properly. But it seems some compilers are not happy with this and the program dies silently. I found the sample parsing test failed without any messages on affected systems. Let's update the code to use a 32-bit pointer directly and make sure the result is 64-bit aligned again. No functional changes intended. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241128010325.946897-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:36:46 -03:00
James Clark	434fffa926	perf probe: Fix uninitialized variable Since the linked fixes: commit, err is returned uninitialized due to the removal of "return 0". Initialize err to fix it. This fixes the following intermittent test failure on release builds: $ perf test "testsuite_probe" ... -- [ FAIL ] -- perf_probe :: test_invalid_options :: mutually exclusive options :: -L foo -V bar (output regexp parsing) Regexp not found: \"Error: switch .+ cannot be used with switch .+\" ... Fixes: `080e47b2a2` ("perf probe: Introduce quotation marks support") Tested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241211085525.519458-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-11 21:40:46 -08:00
Ian Rogers	a93a620c38	perf test expr: Fix system_tsc_freq for only x86 The refactoring of tool PMU events to have a PMU then adding the expr literals to the tool PMU made it so that the literal system_tsc_freq was only supported on x86. Update the test expectations to match - namely the parsing is x86 specific and only yields a non-zero value on Intel. Fixes: `609aa2667f` ("perf tool_pmu: Switch to standard pmu functions and json descriptions") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/20241022140156.98854-1-atrajeev@linux.vnet.ibm.com/ Co-developed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241205022305.158202-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-11 09:19:44 -08:00
Zhongqiu Han	03edb7020b	perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info() If perf_env__insert_bpf_prog_info() returns false due to a duplicate bpf prog info node insertion, the temporary info_node and info_linear memory will leak. Add a check to ensure the memory is freed if the function returns false. Fixes: `d56354dc49` ("perf tools: Save bpf_prog_info and BTF of new BPF programs") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-4-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
Zhongqiu Han	a7da6c7030	perf header: Fix one memory leakage in process_bpf_prog_info() Function __perf_env__insert_bpf_prog_info() will return without inserting bpf prog info node into perf env again due to a duplicate bpf prog info node insertion, causing the temporary info_linear and info_node memory to leak. Modify the return type of this function to bool and add a check to ensure the memory is freed if the function returns false. Fixes: `606f972b13` ("perf bpf: Save bpf_prog_info information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-3-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
Zhongqiu Han	875d22980a	perf header: Fix one memory leakage in process_bpf_btf() If __perf_env__insert_btf() returns false due to a duplicate btf node insertion, the temporary node will leak. Add a check to ensure the memory is freed if the function returns false. Fixes: `a70a112317` ("perf bpf: Save BTF information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-2-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
Ian Rogers	7504a1c20e	perf jevents: Fix build issue in '/' in event descriptions For big string offsets we output comments for what string the offset is for. If the string contains a '/' as seen in Intel Arrowlake event descriptions, then this causes C parsing issues for the generated pmu-events.c. Catch such '/' values and escape to avoid this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20241113165558.628856-1-irogers@google.com [ Used return s.replace('/', r'\*\/') based on failure followed by request by Ian ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:09 -03:00
Veronika Molnarova	625f4de23f	perf test: Parse 'perf stat' Topdown events for aarch64 The 'perf stat' output on aarch64 machines with topdown events wasn't accounted for in the 'perf stat STD output linter' test case. Add the topdown metric to the skip_metric list as it is done for topdown events on other systems. The Topdown events are also disabled on aarch64 KVM guests because the value of caps/slots is set to 0 due to the part of the system register being a stub. This prevents the metric for the topdown events from being computed, leaving the 'perf stat' topdown metric without any value at all. Add the "TopdownL1" to the skip_metric list as well to handle this possibility. Before aarch64: 100: perf stat STD output linter: --- start --- test child forked, pid 403305 Checking STD output: no args Unknown event name in TopdownL1 # 4.3 percent of slots slots_lost_misspeculation_fraction ---- end(-1) ---- 100: perf stat STD output linter : FAILED! Before aarch64 KVM: 100: perf stat STD output linter: --- start --- test child forked, pid 404671 Checking STD output: no args Unknown event name in TopdownL1 ---- end(-1) ---- 100: perf stat STD output linter : FAILED!
After: 100: perf stat STD output linter: --- start --- test child forked, pid 404777 Checking STD output: no args [Success] Checking STD output: system wide [Success] Checking STD output: interval [Success] Checking STD output: per thread [Success] Checking STD output: per node [Success] Checking STD output: system wide no aggregation [Success] Checking STD output: per core [Success] Checking STD output: per cache instance [Success] Checking STD output: per cluster [Success] Checking STD output: per die [Success] Checking STD output: per socket [Success] ---- end(0) ---- 100: perf stat STD output linter : Ok Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241029144347.25651-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:44:07 -03:00
Masami Hiramatsu (Google)	b223564fe1	perf probe: Replace unacceptable characters when generating event name Replace unacceptable characters with '_' when generating event name from the probing function name. This is not for a C program. For a C program, it will continue to remove suffixes. Note that this language checking depends on the debuginfo. So without the debuginfo, perf probe will always replace unacceptable characters with '_'. For example: $ ./perf probe -x cro3 -D \"cro3::cmd::servo::run_show\" p:probe_cro3/cro3_cmd_servo_run_show /work/cro3/target/x86_64-unknown-linux-gnu/debug/cro3:0x197530 $ ./perf probe -x /work/go/example/outyet/main -D 'main.(*Server).poll' p:probe_main/main_Server_poll /work/go/example/outyet/main:0x353040 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173145728160.2747044.18089011235495186810.stgit@mhiramat.roam.corp.google.com [ Removed some extra tabs in the new struct fields ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:41:10 -03:00
Gabriele Monaco	690a052a6d	perf ftrace latency: Add --max-latency option This patch adds a max-latency option as discussed, in case the number of buckets is more than 22, we don't observe the setting (for now, let's say). By default or if 0 is passed, the value is automatically determined based on the number of buckets, range and minimum, so that we fill all available buffers (equivalent to the behaviour before this patch). We now get something like this: # perf ftrace latency --bucket-range=20 \ --min-latency 10 \ --max-latency=100 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 10 us \| 1731 \| ################ \| 10 - 30 us \| 1 \| \| 30 - 50 us \| 0 \| \| 50 - 70 us \| 0 \| \| 70 - 90 us \| 0 \| \| 90 - 100 us \| 0 \| \| 100 - ... us \| 0 \| \| Note the maximum is observed also if it doesn't cover completely a full range (the second to last range is 10us long to let the last start at 100 sharp), this looks to me more sensible and eases the computations, since we don't need to account for the range while filling the buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:40 -03:00
Arnaldo Carvalho de Melo	08b875b6bf	perf ftrace latency: Introduce --min-latency to narrow down into a latency range Things below and over will be in the first and last, outlier, buckets. Without it: # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 0 \| \| 200 - 400 ns \| 44 \| \| 400 - 600 ns \| 291 \| # \| 600 - 800 ns \| 506 \| ## \| 800 - 1000 ns \| 148 \| \| 1.00 - 1.20 us \| 581 \| ## \| 1.20 - 1.40 us \| 2199 \| ########## \| 1.40 - 1.60 us \| 1048 \| #### \| 1.60 - 1.80 us \| 1448 \| ###### \| 1.80 - 2.00 us \| 1091 \| ##### \| 2.00 - 2.20 us \| 517 \| ## \| 2.20 - 2.40 us \| 318 \| # \| 2.40 - 2.60 us \| 370 \| # \| 2.60 - 2.80 us \| 271 \| # \| 2.80 - 3.00 us \| 150 \| \| 3.00 - 3.20 us \| 85 \| \| 3.20 - 3.40 us \| 48 \| \| 3.40 - 3.60 us \| 40 \| \| 3.60 - 3.80 us \| 22 \| \| 3.80 - 4.00 us \| 13 \| \| 4.00 - 4.20 us \| 14 \| \| 4.20 - ... us \| 626 \| ## \| # # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=20 --min-latency=1200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1200 ns \| 1243 \| ##### \| 1.20 - 1.22 us \| 141 \| \| 1.22 - 1.24 us \| 202 \| \| 1.24 - 1.26 us \| 209 \| \| 1.26 - 1.28 us \| 219 \| \| 1.28 - 1.30 us \| 208 \| \| 1.30 - 1.32 us \| 245 \| # \| 1.32 - 1.34 us \| 246 \| # \| 1.34 - 1.36 us \| 224 \| # \| 1.36 - 1.38 us \| 219 \| \| 1.38 - 1.40 us \| 206 \| \| 1.40 - 1.42 us \| 190 \| \| 1.42 - 1.44 us \| 190 \| \| 1.44 - 1.46 us \| 146 \| \| 1.46 - 1.48 us \| 140 \| \| 1.48 - 1.50 us \| 125 \| \| 1.50 - 1.52 us \| 115 \| \| 1.52 - 1.54 us \| 102 \| \| 1.54 - 1.56 us \| 87 \| \| 1.56 - 1.58 us \| 90 \| \| 1.58 - 1.60 us \| 85 \| \| 1.60 - ... 
us \| 5487 \| ######################## \| # Now we want to focus on the latencies starting at 1.2us, with a finer grained range of 20ns: This is all on a live system, so statistically interesting, but not narrowing down on the same numbers, so a 'perf ftrace latency record' seems interesting to then use all on the same snapshot of latencies. A --max-latency counterpart should come next, at first limiting the max-latency to 20 * bucket-size, as we have a fixed buckets array with 20 + 2 entries (+ for the outliers) and thus would need to make it larger for higher latencies. We also may need a way to ask for not considering the out of range values (first and last buckets) when drawing the buckets bars. Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:27 -03:00
Arnaldo Carvalho de Melo	e8536dd47a	perf ftrace latency: Introduce --bucket-range to ask for linear bucketing In addition to showing it exponentially, using log2() to figure out the histogram index, allow for showing it linearly: The preexisting mode, the default: # perf ftrace latency --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 0 \| \| 64 - 128 ns \| 238 \| # \| 128 - 256 ns \| 1704 \| ########## \| 256 - 512 ns \| 672 \| ### \| 512 - 1024 ns \| 4458 \| ########################## \| 1 - 2 us \| 677 \| #### \| 2 - 4 us \| 5 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| # The new histogram mode: # perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 151 ns \| 265 \| # \| 151 - 301 ns \| 1797 \| ########### \| 301 - 451 ns \| 258 \| # \| 451 - 601 ns \| 289 \| # \| 601 - 751 ns \| 2049 \| ############# \| 751 - 901 ns \| 967 \| ###### \| 901 - 1051 ns \| 513 \| ### \| 1.05 - 1.20 us \| 114 \| \| 1.20 - 1.35 us \| 559 \| ### \| 1.35 - 1.50 us \| 189 \| # \| 1.50 - 1.65 us \| 137 \| \| 1.65 - 1.80 us \| 32 \| \| 1.80 - 1.95 us \| 2 \| \| 1.95 - 2.10 us \| 0 \| \| 2.10 - 2.25 us \| 1 \| \| 2.25 - 2.40 us \| 1 \| \| 2.40 - 2.55 us \| 0 \| \| 2.55 - 2.70 us \| 0 \| \| 2.70 - 2.85 us \| 0 \| \| 2.85 - 3.00 us \| 1 \| \| 3.00 - ...
us \| 4 \| \| # Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-3-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:01 -03:00
Arnaldo Carvalho de Melo	12115c6037	perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args The ftrace->use_nsec arg is being passed to both make_histogram() and display_histogram(). Since another ftrace field will be passed to those functions in a followup patch, make them look like other functions in this codebase that receive the 'struct perf_ftrace' pointer. No change in logic. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:15:55 -03:00
Ian Rogers	d4e17a322a	perf test hwmon_pmu: Fix event file location The temp directory is made and a known fake hwmon PMU created within it. Prior to this fix the events were being incorrectly written to the temp directory rather than the fake PMU directory. This didn't impact the test as the directory fd matched the wrong location, but it doesn't mirror what a hwmon PMU would actually look like. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 15:00:26 -08:00
Ian Rogers	3f61a12b08	perf hwmon_pmu: Use openat rather than dup to refresh directory The hwmon PMU test will make a temp directory, open the directory with O_DIRECTORY, then fill it with contents. Because the open happens before the contents are filled in, the later fdopendir may reflect the initial empty state, meaning no events are seen. Change to re-open the directory, rather than dup the fd, so the latest contents are seen. Minor tweaks/additions to debug messages. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 15:00:03 -08:00
Kuan-Wei Chiu	246dfe3dc1	perf ftrace: Fix undefined behavior in cmp_profile_data() The comparison function cmp_profile_data() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity: * Symmetry: If x < y, then y > x. * Transitivity: If x < y and y < z, then x < z. When v1 and v2 are equal, the function incorrectly returns 1, breaking symmetry and transitivity. This causes undefined behavior, which can lead to memory corruption in certain versions of glibc [1]. Fix the issue by returning 0 when v1 and v2 are equal, ensuring compliance with the C standard and preventing undefined behavior. Link: https://www.qualys.com/2024/01/30/qsort.txt [1] Fixes: `0f223813ed` ("perf ftrace: Add 'profile' command") Fixes: `74ae366c37` ("perf ftrace profile: Add -s/--sort option") Cc: stable@vger.kernel.org Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: jserv@ccns.ncku.edu.tw Cc: chuang@cs.nycu.edu.tw Link: https://lore.kernel.org/r/20241209134226.1939163-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 13:54:08 -08:00
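The comparator requirement described in the entry above can be sketched in C as follows. This is a minimal illustration, not the perf code: `struct sample`, `cmp_desc()` and `sort_samples()` are hypothetical names, and the descending sort order is an assumption made for the example. The only point is that equal keys must compare as 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type; not the struct used by perf ftrace. */
struct sample {
	const char *name;
	unsigned long value;
};

/*
 * Order samples by descending value. Equal keys return 0, so that
 * cmp(a, b) and cmp(b, a) never both claim "a sorts after b".
 */
static int cmp_desc(const void *a, const void *b)
{
	const struct sample *s1 = a;
	const struct sample *s2 = b;

	if (s1->value > s2->value)
		return -1;
	if (s1->value < s2->value)
		return 1;
	return 0;
}

/* Sort an array of samples in place, largest value first. */
static void sort_samples(struct sample *v, size_t n)
{
	assert(v != NULL || n == 0);
	qsort(v, n, sizeof(*v), cmp_desc);
}
```

A comparator that returned 1 for equal keys would report, for two equal elements, that each one sorts after the other, which is the inconsistency the commit removes.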
Ian Rogers	c95584e07b	perf test hwmon_pmu: Fix event file location The temp directory is made and a known fake hwmon PMU created within it. Prior to this fix the events were being incorrectly written to the temp directory rather than the fake PMU directory. This didn't impact the test as the directory fd matched the wrong location, but it doesn't mirror what a hwmon PMU would actually look like. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 18:15:38 -03:00
Ian Rogers	9a4426120d	perf hwmon_pmu: Use openat rather than dup to refresh directory The hwmon PMU test will make a temp directory, open the directory with O_DIRECTORY, then fill it with contents. Because the open happens before the contents are filled in, the later fdopendir may reflect the initial empty state, meaning no events are seen. Change to re-open the directory, rather than dup the fd, so the latest contents are seen. Minor tweaks/additions to debug messages. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 18:15:30 -03:00
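A minimal C sketch of the re-open approach described in the entry above, assuming a directory fd that may have been opened before the directory was populated. `count_entries_fresh()` is a hypothetical helper, not a perf function; it only illustrates calling `openat()` on "." relative to the existing fd and passing the new fd to `fdopendir()`, instead of calling `dup()` on the old one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Count the entries (excluding "." and "..") of the directory that
 * dir_fd refers to. The directory is re-opened with openat() so the
 * stream is built on a fresh open file description rather than on a
 * dup() of dir_fd. Returns -1 on error.
 */
static int count_entries_fresh(int dir_fd)
{
	struct dirent *ent;
	DIR *dir;
	int n = 0;
	int fd = openat(dir_fd, ".", O_DIRECTORY | O_RDONLY);

	if (fd < 0)
		return -1;

	dir = fdopendir(fd);
	if (!dir) {
		close(fd);
		return -1;
	}

	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") && strcmp(ent->d_name, ".."))
			n++;
	}

	closedir(dir);	/* also closes fd; dir_fd stays open */
	return n;
}
```

The caller keeps ownership of `dir_fd`, so the helper can be called again after more files are created and will see them.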
Ian Rogers	5e530a8287	perf tests: Enable tests disabled due to tracepoint parsing Tracepoint parsing required libtraceevent but no longer does. Remove the Build logic and #ifdefs that caused the tests not to be run. Test code that directly uses libtraceevent is still guarded. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	6c8310e838	perf evsel: Allow evsel__newtp without libtraceevent Switch from reading the tracepoint format to reading the id directly for the evsel config. This avoids the need to initialize libtraceevent, plugins, etc. It is sufficient for many tracepoint commands to work like: $ perf stat -e sched:sched_switch true To populate evsel->tp_format, do lazy initialization using libtraceevent in the evsel__tp_format function (the sys and name are saved in evsel__newtp_idx for this purpose). Reading the id should be indicative of the format failing to load, but if not an error is reported in evsel__tp_format. This could happen for a tracepoint with a format that fails to parse. As tracepoints can be parsed without libtraceevent with this, remove the associated #ifdefs in parse-events.c. By only lazily parsing the tracepoint format information it is hoped this will help improve the performance of code using tracepoints but not the format information. It also cuts down on the build and ifdef logic. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. 
Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	c46d634a03	perf evsel: Add/use accessor for tp_format Add an accessor function for tp_format. Rather than search+replace uses try to use a variable and reuse it. Add additional NULL checks when accessing/using the value. Make sure the PTR_ERR is nulled out on error path in evsel__newtp_idx. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	800c93ffaf	perf trace-event: Always build trace-event-info.c trace-event-info.c has no libtraceevent dependencies, always build it and use it in builtin-record and perf_event_attr printing. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	f7264150b4	perf trace-event: Constify print arguments Capture that these functions don't mutate their input. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	925c25efca	perf env: Ensure failure broken topology file reads are always -1 encoded get_core_id returns 0 on success and a negative errno value on error. Currently the error can only be -1, but fixing this to be any errno value breaks perf: https://lore.kernel.org/lkml/Zzu4Sdebve-NXEMX@google.com/ To avoid this, make sure all error values are written as -1. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Arnaldo Carvalho de Melo	dcf900429d	perf btf: Make the sigtrap test helper to find a member by name widely available By introducing a tools/perf/util/btf.c to collect utilities not yet available via libbpf, the first being a way to find a member by name once we get the type_id for the struct. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ian Rogers	4b8a7c0327	perf pmu: Remove use of perf_cpu_map__read() Remove use of a FILE and switch to reading a string that is then passed to perf_cpu_map__new(). Being able to remove perf_cpu_map__read() avoids duplicated parsing logic. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ian Rogers	02b5ed8a6a	perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h which is internal. The preferred dependency should be the definition in tools/perf/perf.h. Add the includes of perf.h so that MAX_NR_CPUS can be hidden in libperf. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Kyle Meyer	9a1e106550	perf: Increase MAX_NR_CPUS to 4096 Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096. Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B, cpus_runtime will increase from 81960B to 163880B, and max_entries will increase from 8192B to 16384B. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
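The bitmap figures quoted in the entry above follow from simple arithmetic: 2048 bits occupy 256 bytes and 4096 bits occupy 512 bytes. A small sketch, where `bitmap_bytes()` is a hypothetical helper and the rounding up to whole `unsigned long` words is an assumption about how such bitmaps are usually declared:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

static_assert(CHAR_BIT == 8, "example assumes 8-bit bytes");

/* Bytes needed for a bitmap of nbits bits stored in unsigned longs. */
static size_t bitmap_bytes(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t nlongs = (nbits + bits_per_long - 1) / bits_per_long;

	return nlongs * sizeof(unsigned long);
}
```

Here `bitmap_bytes(2048)` evaluates to 256 and `bitmap_bytes(4096)` to 512, matching the sizes stated in the commit message.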
Ilkka Koskinen	9e7a00ec6a	perf arm-spe: Add support for SPE Data Source packet on AmpereOne Decode SPE Data Source packets on AmpereOne. The field is IMPDEF. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ilkka Koskinen	ccdc9e9c5e	perf arm-spe: Prepare for adding data source packet implementations for other cores Split Data Source Packet handling to prepare adding support for other implementations. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	9eef3ec920	perf cpumap: Add checking for reference counter For the CPU map merging test, add an extra check for the reference counter before releasing the last CPU map. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-4-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	fb953dfa66	perf cpumap: Add more tests for CPU map merging Add additional tests for CPU map merging to cover more cases. These tests include different types of arguments, such as when one CPU map is a subset of another, as well as cases with or without overlap between the two maps. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	a9d2217556	libperf cpumap: Refactor perf_cpu_map__merge() The perf_cpu_map__merge() function has two arguments, 'orig' and 'other'. The function definition might cause confusion as it could give the impression that the CPU maps in the two arguments are copied into a new allocated structure, which is then returned as the result. The purpose of the function is to merge the CPU map 'other' into the CPU map 'orig'. This commit changes the 'orig' argument to a pointer to pointer, so the new result will be updated into 'orig'. The return value is changed to an int type, as an error number or 0 for success. Update callers and tests for the new function definition. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-2-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
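The calling convention described in the entry above (result written back through a pointer to pointer, `int` return for errors) can be sketched as follows. This is not the libperf implementation: `struct cpu_set_sketch` and both functions are hypothetical, the set is assumed to be sorted and free of duplicates, and the old set is simply freed rather than reference counted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sorted, duplicate-free set of CPU numbers. */
struct cpu_set_sketch {
	int nr;
	int cpus[];
};

/* Build a set from an already sorted, duplicate-free array. */
static struct cpu_set_sketch *cpu_set_sketch__new(const int *cpus, int nr)
{
	struct cpu_set_sketch *s;

	assert(nr >= 0);
	s = malloc(sizeof(*s) + (size_t)nr * sizeof(int));
	if (!s)
		return NULL;
	s->nr = nr;
	if (nr)
		memcpy(s->cpus, cpus, (size_t)nr * sizeof(int));
	return s;
}

/*
 * Merge 'other' into '*orig'. On success '*orig' points to the merged
 * set and 0 is returned; on failure '*orig' is left untouched and a
 * negative errno value is returned.
 */
static int cpu_set_sketch__merge(struct cpu_set_sketch **orig,
				 const struct cpu_set_sketch *other)
{
	struct cpu_set_sketch *a = *orig, *m;
	int a_nr = a ? a->nr : 0;
	int b_nr = other ? other->nr : 0;
	int i = 0, j = 0, k = 0;

	if (b_nr == 0)
		return 0;	/* nothing to merge in */

	m = malloc(sizeof(*m) + (size_t)(a_nr + b_nr) * sizeof(int));
	if (!m)
		return -ENOMEM;

	/* Standard merge of two sorted arrays, dropping duplicates. */
	while (i < a_nr || j < b_nr) {
		if (j >= b_nr || (i < a_nr && a->cpus[i] < other->cpus[j])) {
			m->cpus[k++] = a->cpus[i++];
		} else if (i >= a_nr || other->cpus[j] < a->cpus[i]) {
			m->cpus[k++] = other->cpus[j++];
		} else {
			m->cpus[k++] = a->cpus[i++];
			j++;
		}
	}
	m->nr = k;

	free(a);
	*orig = m;
	return 0;
}
```

With this shape a caller writes `err = cpu_set_sketch__merge(&set, other);` and keeps using `set`, which makes it explicit that the first argument is the one being updated.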
Arnaldo Carvalho de Melo	161c3402fd	perf config: Fix trivial typo 'an' -> 'can' Just a trivial typo, should be 'can'. A spell check on the rest of the file turned up nothing else. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Ian Rogers	d78e20c081	perf script python: Improve physical mem type resolution Previously system RAM and persistent memory were hard code matched, change so that the label of the memory region is just read from /proc/iomem. This avoids frequent N/A samples. Change the /proc/iomem reading, event processing and output so that nested entries appear and their counts count toward their parent. As labels may be repeated, include the memory ranges in the output to make it clear why, for example, "System RAM" appears twice. Before: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- System RAM 9460 96.5% N/A 998 3.5% After: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 36741 96.5 841400000-8416599ff : Kernel data 89 0.2 840800000-8412a6fff : Kernel rodata 60 0.2 841ebe000-8423fffff : Kernel bss 34 0.1 0-fff : Reserved 1345 3.5 100000-89dd9fff : System RAM 2 0.0 Before: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ----------- ----------- System RAM 9460 90.5% N/A 998 9.5% After: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 9460 90.5 841400000-8416599ff : Kernel data 45 0.4 840800000-8412a6fff : Kernel rodata 19 0.2 841ebe000-8423fffff : Kernel bss 12 0.1 0-fff : Reserved 998 9.5 The code has been updated to python 3 with type hints and resolving issues reported by mypy and pylint. Tabs are swapped to spaces as preferred in PEP8, because most lines of code were modified (of this small file) and this makes pylint significantly less noisy. 
Committer testing: root@number:/tmp# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:/tmp# root@number:/tmp# perf script mem-phys-addr -a find / /bin /lib /lib64 /sbin Warning: 744 out of order events recorded. Event: cpu_core/mem_inst_retired.all_loads/P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-8bfbfffff : System RAM 364561 76.5 621400000-6223a6fff : Kernel rodata 10474 2.2 622400000-62283d4bf : Kernel data 4828 1.0 623304000-6237fffff : Kernel bss 1063 0.2 620000000-6213fffff : Kernel code 98 0.0 0-fff : Reserved 111480 23.4 100000-2b0ca017 : System RAM 337 0.1 2fbad000-30d92fff : System RAM 44 0.0 2c79d000-2fbabfff : System RAM 30 0.0 30d94000-316d5fff : System RAM 16 0.0 2b131a58-2c71dfff : System RAM 7 0.0 root@number:/tmp# Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119180130.19160-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Arnaldo Carvalho de Melo	b2b95a2d78	perf disasm: Return a proper error when not determining the file type Before: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$ After: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Couldn't determine the file /tmp/perf-3308868.map type. ⬢ [acme@toolbox a]$ Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/lkml/Z092D9-r_iOgwIWM@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Arnaldo Carvalho de Melo	176c9d1e6a	tools features: Don't check for libunwind devel files by default Since `13e17c9ff4` ("perf build: Make libunwind opt-in rather than opt-out"), we shouldn't by default be testing for its availability at build time in tools/build/features/test-all.c. That test was designed to test the features we expect to be the most common ones in most builds, so if we test build just that file, then we assume the features there are present and will not test one by one. Removing it from test-all.c gets rid of the first impediment for test-all.c to build successfully: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output In file included from test-all.c:62: test-libunwind.c:2:10: fatal error: libunwind.h: No such file or directory 2 | #include <libunwind.h> | ^~~~~~~~~~~~~ compilation terminated. $ We then get to: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output /usr/bin/ld: cannot find -lunwind-x86_64: No such file or directory /usr/bin/ld: cannot find -lunwind: No such file or directory collect2: error: ld returned 1 exit status $ So make all the logic related to setting CFLAGS, LDFLAGS, etc for libunwind to be conditional on NO_LIBUNWIND=1, which is now the default. Now we get a faster build: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output $ ldd /tmp/build/perf-tools-next/feature/test-all.bin linux-vdso.so.1 (0x00007fef04cde000) libdw.so.1 => /lib64/libdw.so.1 (0x00007fef04a49000) libpython3.12.so.1.0 => /lib64/libpython3.12.so.1.0 (0x00007fef04478000) libm.so.6 => /lib64/libm.so.6 (0x00007fef04394000) libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007fef0436c000) libtracefs.so.1 => /lib64/libtracefs.so.1 (0x00007fef04345000) libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007fef03e95000) libz.so.1 => /lib64/libz.so.1 (0x00007fef03e72000) libelf.so.1 => /lib64/libelf.so.1 (0x00007fef03e56000) libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fef03e48000) libslang.so.2 => /lib64/libslang.so.2 (0x00007fef03b65000) libperl.so.5.38 => /lib64/libperl.so.5.38 (0x00007fef037c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fef035d5000) liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fef035a0000) libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fef034e1000) libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fef034cd000) /lib64/ld-linux-x86-64.so.2 (0x00007fef04ce0000) libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007fef03495000) $ Fixes: `13e17c9ff4` ("perf build: Make libunwind opt-in rather than opt-out") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Z09zTztD8X8qIWCX@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Namhyung Kim	c33aea446b	perf tools: Fix precise_ip fallback logic Sometimes an error other than EOPNOTSUPP is returned for an invalid precise_ip, so the fallback cannot rely on the error code. Let's move the fallback after the missing feature checks so that it can handle EINVAL as well. This also aligns well with the existing behavior, which blindly turns off precise_ip, but the missing features are now checked correctly. Fixes: `af954f76ee` ("perf tools: Check fallback error and order") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Closes: https://lore.kernel.org/oe-lkp/202411301431.799e5531-lkp@intel.com Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/Z1DV0lN8qHSysX7f@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-05 15:15:29 -08:00
Namhyung Kim	968121f0a6	perf tools: Fix build error on generated/fs_at_flags_array.c The array should only contain generic flags, but the recent header sync brought new flags into fcntl.h and caused a build error. Let's update the shell script to exclude flags specific to name_to_handle_at(). CC trace/beauty/fs_at_flags.o In file included from trace/beauty/fs_at_flags.c:21: tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: error: initialized field overwritten [-Werror=override-init] 13 | [ilog2(0x002) + 1] = "HANDLE_CONNECTABLE", | ^~~~~~~~~~~~~~~~~~~~ tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: note: (near initialization for ‘fs_at_flags[2]’) Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241203035349.1901262-12-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	c994ac74cc	tools headers: Sync uapi/linux/prctl.h with the kernel sources To pick up the changes in this cset: `09d6775f50` riscv: Add support for userspace pointer masking `91e102e797` prctl: arch-agnostic prctl for shadow stack This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Mark Brown <broonie@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241203035349.1901262-11-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	02116fcfd8	tools headers: Sync uapi/linux/mount.h with the kernel sources To pick up the changes in this cset: `aefff51e1c` statmount: retrieve security mount options `2f4d4503e9` statmount: add flag to retrieve unescaped options `44010543fc` fs: add the ability for statmount() to report the sb_source `ed9d95f691` fs: add the ability for statmount() to report the fs_subtype This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	6d442c69cb	tools headers: Sync uapi/linux/fcntl.h with the kernel sources To pick up the changes in this cset: `c374196b2b` ("fs: name_to_handle_at() support for "explicit connectable" file handles") `95f567f81e` ("fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	81b483f722	tools headers: Sync xattrat syscall changes with the kernel sources To pick up the changes in this cset: `6140be90ec` ("fs/xattr: add at family syscalls") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_32.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl The arm64 changes are not included as it requires more changes in the tools. It'll be worked for the later cycle. Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-mips@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Arnaldo Carvalho de Melo	88a6e2f67c	perf machine: Initialize machine->env to address a segfault Its used from trace__run(), for the 'perf trace' live mode, i.e. its strace-like, non-perf.data file processing mode, the most common one. The trace__run() function will set trace->host using machine__new_host() that is supposed to give a machine instance representing the running machine, and since we'll use perf_env__arch_strerrno() to get the right errno -> string table, we need to use machine->env, so initialize it in machine__new_host(). Before the patch: (gdb) run trace --errno-summary -a sleep 1 <SNIP> Summary of events: gvfs-afc-volume (3187), 2 events, 0.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ pselect6 1 0 0.000 0.000 0.000 0.000 0.00% GUsbEventThread (3519), 2 events, 0.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 1 0 0.000 0.000 0.000 0.000 0.00% <SNIP> Program received signal SIGSEGV, Segmentation fault. 
0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 478 if (env->arch_strerrno == NULL) (gdb) bt #0 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 #1 0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673 #2 0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708 #3 0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747 #4 0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456 #5 0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487 #6 0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351 #7 0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404 #8 0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448 #9 0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560 (gdb) After: root@number:~# perf trace -a --errno-summary sleep 1 <SNIP> pw-data-loop (2685), 1410 events, 16.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 188 0 983.428 0.000 5.231 15.595 8.68% ioctl 94 0 0.811 0.004 0.009 0.016 2.82% read 188 0 0.322 0.001 0.002 0.006 5.15% write 141 0 0.280 0.001 0.002 0.018 8.39% timerfd_settime 94 0 0.138 0.001 0.001 0.007 6.47% gnome-control-c (179406), 1848 events, 20.9% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 222 0 959.577 0.000 4.322 21.414 11.40% recvmsg 150 0 0.539 0.001 0.004 0.013 5.12% 
write 300 0 0.442 0.001 0.001 0.007 3.29% read 150 0 0.183 0.001 0.001 0.009 5.53% getpid 102 0 0.101 0.000 0.001 0.008 7.82% root@number:~# Fixes: `54373b5d53` ("perf env: Introduce perf_env__arch_strerrno()") Reported-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-03 10:07:31 -08:00
James Clark	f54cd8f43f	perf test: Don't signal all processes on system when interrupting tests This signal handler loops over all tests on ctrl-C, but it's active while the test list is being constructed. process.pid is 0, then -1, then finally set to the child pid on fork. If the Ctrl-C is received during this point a kill(-1, SIGINT) can be sent which affects all processes. Make sure the child has forked first before forwarding the signal. This can be reproduced with ctrl-C immediately after launching perf test which terminates the ssh connection. Fixes: `553d5efeb3` ("perf test: Add a signal handler to kill forked child processes") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241129151948.3199732-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-02 12:36:35 -08:00
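The guard described above can be sketched as a check on the stored pid before any signal is forwarded. In POSIX, kill() with pid 0 targets the caller's process group and pid -1 targets every process the caller is allowed to signal, so only a strictly positive pid is safe to forward to. `child_is_signalable()` and `forward_signal()` are hypothetical names used for illustration, not the functions in the perf test harness.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* A pid of 0 or -1 means the child has not been forked yet. */
static bool child_is_signalable(pid_t pid)
{
	return pid > 0;
}

/* Forward sig to the child, or do nothing if no child exists yet. */
static int forward_signal(pid_t pid, int sig)
{
	assert(sig > 0);

	if (!child_is_signalable(pid))
		return 0;	/* skip: kill(0, ...) or kill(-1, ...) would hit other processes */

	return kill(pid, sig);
}
```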
Namhyung Kim	23c44f6c83	perf tools: Fix build-id event recording The build-id events written at the end of the record session are broken due to unexpected data. The write_buildid() writes the fixed length event first and then variable length filename. But a recent change made it write more data in the padding area accidentally. So readers of the event see zero-filled data for the next entry and treat it incorrectly. This resulted in wrong kernel symbols because the kernel DSO loaded a random vmlinux image in the path as it didn't have a valid build-id. Fixes: `ae39ba1655` ("perf inject: Fix build ID injection") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/Z0aRFFW9xMh3mqKB@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-02 12:36:20 -08:00
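The layout described above (a fixed-length event followed by a variable-length filename plus padding) only stays parseable if the writer emits exactly the padded length and nothing beyond it. A minimal sketch of that bookkeeping follows; the 64-byte alignment and the names `NAME_ALIGN_SKETCH`, `padded_name_len()` and `write_padded_name()` are assumptions made for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_ALIGN_SKETCH 64	/* assumed alignment, for illustration only */

/* Length of the name field once the NUL terminator and padding are included. */
static size_t padded_name_len(size_t name_len)
{
	size_t len = name_len + 1;

	return (len + NAME_ALIGN_SKETCH - 1) & ~(size_t)(NAME_ALIGN_SKETCH - 1);
}

/*
 * Copy the filename into out[] and zero-fill up to the padded length.
 * Returns the number of bytes written, or 0 if out[] is too small.
 * Bytes past the padded length are left untouched.
 */
static size_t write_padded_name(char *out, size_t out_size, const char *name)
{
	size_t name_len, total;

	assert(out != NULL && name != NULL);

	name_len = strlen(name);
	total = padded_name_len(name_len);
	if (total > out_size)
		return 0;

	memcpy(out, name, name_len);
	memset(out + name_len, 0, total - name_len);
	return total;
}
```

A reader that advances by the size recorded in the event header then lands on the next entry rather than on stray bytes.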
Linus Torvalds	b50ecc5aca	perf tools changes for v6.13 perf record ----------- * Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report ----------- * Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - \| ---xas_load xarray.h:171 \| \|--5.68%--xas_load xarray.c:245 (cycles:1) \| xas_load xarray.c:242 \| xas_load xarray.h:1260 (cycles:1) \| xas_descend xarray.c:146 \| xas_load xarray.c:244 (cycles:2) \| xas_load xarray.c:245 \| xas_descend xarray.c:218 (cycles:10) ... perf stat --------- * Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. $ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys * Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". 
# perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched ---------- * Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. 
$ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build ----- * Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw which has unwinder functionality. The libunwind support predates it and no need to have duplicate unwinders by default. * Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals --------- * Do not set exclude_guest bit in the perf_event_attr by default. This was causing a trouble in AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not clear supported bits unnecessarily. * Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events ------------------ * Add AMD Zen 5 events and metrics. * Add i.MX91 and i.MX95 DDR metrics * Fix HiSilicon HIP08 Topdown metric name. * Support compat events on PowerPC. 
Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ0Qi3gAKCRCMstVUGiXM g6NIAP49eoSmQF40u55sJN0J7RpYd+bTgXZkahv0IUCBX98TLwEA2NrK0oUcB84C xeanq28/3JxNM/oBpsEvvB8mb/0lGwI= =FAVF -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report: - Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - \| ---xas_load xarray.h:171 \| \|--5.68%--xas_load xarray.c:245 (cycles:1) \| xas_load xarray.c:242 \| xas_load xarray.h:1260 (cycles:1) \| xas_descend xarray.c:146 \| xas_load xarray.c:244 (cycles:2) \| xas_load xarray.c:245 \| xas_descend xarray.c:218 (cycles:10) ... perf stat: - Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. 
$ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys - Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". # perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched: - Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. 
$ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build: - Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw which has unwinder functionality. The libunwind support predates it and no need to have duplicate unwinders by default. - Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals: - Do not set exclude_guest bit in the perf_event_attr by default. This was causing a trouble in AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not clear supported bits unnecessarily. - Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events: - Add AMD Zen 5 events and metrics. - Add i.MX91 and i.MX95 DDR metrics - Fix HiSilicon HIP08 Topdown metric name. 
- Support compat events on PowerPC" * tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits) perf tests: Fix hwmon parsing with PMU name test perf hwmon_pmu: Ensure hwmon key union is zeroed before use perf tests hwmon_pmu: Remove double evlist__delete() perf/test: fix perf ftrace test on s390 perf bpf-filter: Return -ENOMEM directly when pfi allocation fails perf test: Correct hwmon test PMU detection perf: Remove unused del_perf_probe_events() perf pmu: Move pmu_metrics_table__find and remove ARM override perf jevents: Add map_for_cpu() perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str perf header: Avoid transitive PMU includes perf arm64 header: Use cpu argument in get_cpuid perf header: Refactor get_cpuid to take a CPU for ARM perf header: Move is_cpu_online to numa bench perf jevents: fix breakage when do perf stat on system metric perf test: Add missing __exit calls in tool/hwmon tests perf tests: Make leader sampling test work without branch event perf util: Remove kernel version deadcode perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved perf test shell trace_exit_race: Show what went wrong in verbose mode ...	2024-11-26 14:54:00 -08:00
Ian Rogers	6d78089da9	perf tests: Fix hwmon parsing with PMU name test Incorrectly the hwmon with PMU name test didn't pass "true". Fix and address issue with hwmon_pmu__config_terms needing to load events - a load bearing assert fired. Also fix missing list deletion when putting the hwmon test PMU and lower some debug warnings to make the hwmon PMU less spammy in verbose mode. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241121000955.536930-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:39 -08:00
Ian Rogers	62878b400f	perf hwmon_pmu: Ensure hwmon key union is zeroed before use Non-zero values led to mismatches in testing. This was reproducible with -fsanitize=undefined. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Closes: https://lore.kernel.org/lkml/Zzdtj0PEWEX3ATwL@x1/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241119230033.115369-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:39 -08:00
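The kind of mismatch described above can arise when a key is a union of an integer and a set of bit-fields: if the bit-fields cover only part of the integer, the remaining bits keep whatever was on the stack unless the whole union is cleared first. The sketch below uses a hypothetical `key_sketch` union and `make_key()` helper; the field widths are assumptions, and it relies on the common compiler behavior of leaving the untouched bytes alone after the clear, which the C standard does not strictly guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical key: the integer view is what gets compared or hashed. */
union key_sketch {
	long type_and_num;
	struct {
		unsigned int num : 16;
		unsigned int type : 8;
	} f;
};

/* Build a key with every bit of the integer view in a known state. */
static long make_key(unsigned int type, unsigned int num)
{
	union key_sketch key;

	assert(type <= 0xff && num <= 0xffff);

	memset(&key, 0, sizeof(key));	/* clear bits that no bit-field covers */
	key.f.type = type;
	key.f.num = num;
	return key.type_and_num;
}
```

Without the memset(), two keys built from the same type and number could compare unequal through `type_and_num`.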
Arnaldo Carvalho de Melo	870748fa1f	perf tests hwmon_pmu: Remove double evlist__delete() In the error path when failing to parse events the evlist is being deleted twice, keep the one after the out label. Fixes: `531ee0fd48` ("perf test: Add hwmon "PMU" test") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ZzzoJNNcJJVnPCCe@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:27 -08:00
Thomas Richter	5f2c8f4e10	perf/test: fix perf ftrace test on s390 On s390 the perf test case ftrace sometimes fails as follows: # ./perf test ftrace 79: perf ftrace tests : FAILED! # The failure depends on the kernel .config file. Some configurations always work fine, some do not. The ftrace profile test mostly fails because the ring buffer was not large enough, and some lines (especially the interesting ones with nanosleep in them) were dropped. To achieve success for all tested kernel configurations, enlarge the buffer to store the traces completely without wrapping. The default buffer size is not large enough for every kernel configuration. Set the buffer size for the ftrace profile test to 16 MB. Output after: # ./perf test ftrace 79: perf ftrace tests : Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20241119064856.641446-1-tmricht@linux.ibm.com Suggested-by: Sven Schnelle <svens@linux.ibm.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:36:44 -08:00
Hao Ge	bd077a53ad	perf bpf-filter: Return -ENOMEM directly when pfi allocation fails Directly return -ENOMEM when pfi allocation fails, instead of performing other operations on pfi. Fixes: `0fe2b18ddc` ("perf bpf-filter: Support multiple events properly") Signed-off-by: Hao Ge <gehao@kylinos.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: hao.ge@linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241113030537.26732-1-hao.ge@linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:36:00 -08:00
Ian Rogers	fc26637d70	perf test: Correct hwmon test PMU detection Use name to avoid potential other hwmon PMUs. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241118052638.754981-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:34:31 -08:00
Dr. David Alan Gilbert	85c60a01b8	perf: Remove unused del_perf_probe_events() del_perf_probe_events() last use was removed by commit `3d6dfae889` ("perf parse-events: Remove BPF event support") Remove it. It was the last user of probe_file__del_events(), so remove it as well. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241022002940.302946-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 17:07:31 -03:00
Ian Rogers	8f997865ee	perf pmu: Move pmu_metrics_table__find and remove ARM override Move pmu_metrics_table__find() to the jevents.py generated pmu-events.c and remove indirection override for ARM. The movement removes perf_pmu__find_metrics_table that exists to enable the ARM override. The ARM override isn't necessary as just the CPUID, not PMU, is used in the metric table lookup. On non-ARM the CPU argument is just ignored for the CPUID, for ARM -1 is passed so that the CPUID for the first logical CPU is read. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. 
David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:42:36 -03:00
Ian Rogers	0434410fa4	perf jevents: Add map_for_cpu() The PMU is no longer part of the map finding process and for metrics doesn't make sense as they lack a PMU. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:41:42 -03:00
Ian Rogers	494c403ff1	perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str On ARM the cpuid is dependent on the core type of the CPU in question. The PMU was passed for the sake of the CPU map but this means in places a temporary PMU is created just to pass a CPU value. Just pass the CPU and fix up the callers. As there are no longer PMU users in header.h, shuffle forward declarations earlier to work around build failures. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:40:30 -03:00
Ian Rogers	7463ee17a7	perf header: Avoid transitive PMU includes Currently satisfied via header.h. Note, pmu.h includes parse-events.h. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:39:59 -03:00
Ian Rogers	538737da96	perf arm64 header: Use cpu argument in get_cpuid Use the cpu to read the MIDR file requested. If the "any" value (-1) is passed then keep the behavior of returning the first MIDR file that can be read. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:39:04 -03:00
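The lookup described in the get_cpuid commit above (read the MIDR file of the requested CPU, or the first readable one when -1 is passed) might look roughly like the following. This is a minimal sketch and not the perf source: `sketch_read_midr()`, `sketch_get_cpuid()`, and the `cpu_root` and `nr_cpus` parameters are names assumed for illustration. On a real arm64 system `cpu_root` would be `/sys/devices/system/cpu`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not the perf implementation): read the
 * MIDR string for one logical CPU below cpu_root. Returns 0 on success.
 */
static int sketch_read_midr(const char *cpu_root, int cpu, char *buf, size_t sz)
{
	char path[512];
	FILE *fp;
	int ok;

	snprintf(path, sizeof(path), "%s/cpu%d/regs/identification/midr_el1",
		 cpu_root, cpu);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	ok = fgets(buf, (int)sz, fp) != NULL;
	fclose(fp);
	if (!ok)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}

/*
 * Hypothetical wrapper: cpu >= 0 selects exactly that CPU; cpu == -1 ("any")
 * keeps the older behaviour of returning the first MIDR that can be read.
 */
static int sketch_get_cpuid(const char *cpu_root, int cpu, int nr_cpus,
			    char *buf, size_t sz)
{
	int i;

	if (cpu >= 0)
		return sketch_read_midr(cpu_root, cpu, buf, sz);

	for (i = 0; i < nr_cpus; i++) {
		if (sketch_read_midr(cpu_root, i, buf, sz) == 0)
			return 0;
	}
	return -1;
}
```

Passing the root directory as a parameter is only a convenience here, so the lookup can be exercised against a scratch directory on a machine that has no MIDR files.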
Ian Rogers	cec0d6572a	perf header: Refactor get_cpuid to take a CPU for ARM ARM BIG.little has no notion of a constant CPUID for both core types. To reflect this reality, change the get_cpuid function to also pass in a possibly unused logical cpu. If the dummy value (-1) is passed in then ARM can, as currently happens, select the first logical CPU's "CPUID". The changes to ARM getcpuid happen in a follow up change. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:37:54 -03:00
Ian Rogers	c6fafe36ba	perf header: Move is_cpu_online to numa bench The helper function is only used in the NUMA benchmark as typically online CPUs are determined through perf_cpu_map__new_online_cpus(). Reduce the scope of the function for now. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:36:47 -03:00
Xu Yang	4a159e6049	perf jevents: fix breakage when do perf stat on system metric When running perf stat on a sys metric, the perf tool currently outputs nothing: $ perf stat -a -M imx95_ddr_read.all -I 1000 $ This command runs on an arm64 machine whose SoC has one DDR hardware PMU in addition to one armv8_cortex_a55 PMU. Their maps are as follows: const struct pmu_events_map pmu_events_map[] = { { .arch = "arm64", .cpuid = "0x00000000410fd050", .event_table = { .pmus = pmu_events__arm_cortex_a55, .num_pmus = ARRAY_SIZE(pmu_events__arm_cortex_a55) }, .metric_table = { .pmus = NULL, .num_pmus = 0 } }, static const struct pmu_sys_events pmu_sys_event_tables[] = { { .event_table = { .pmus = pmu_events__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_events__freescale_imx95_sys) }, .metric_table = { .pmus = pmu_metrics__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_metrics__freescale_imx95_sys) }, .name = "pmu_events__freescale_imx95_sys", }, Currently, pmu_metrics_table__find() returns NULL when perf stat is run only on a sys metric. parse_groups() is then never called to parse the sys metric_name, and the perf tool exits immediately. This is likely a common problem. To fix the issue, restore the logic from before commit `f20c15d13f` ("perf pmu-events: Remember the perf_events_map for a PMU") and return an empty metric table rather than a NULL pointer. This should be fine, since the removed part only checked whether the table matched the provided metric_name. Without that code, parse_groups() still checks the validity of metric_name. 
Fixes: `f20c15d13f` ("perf pmu-events: Remember the perf_events_map for a PMU") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241107162035.52206-2-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:34:15 -03:00
Ian Rogers	db26a8c9e3	perf test: Add missing __exit calls in tool/hwmon tests Address sanitizer flagged the missing parse_events_error__exit when testing on ARM. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241115201258.509477-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:40 -03:00
James Clark	180fd0c1ea	perf tests: Make leader sampling test work without branch event Arm a57 only has speculative branch events so this test fails there. The test doesn't depend on branch instructions so change it to instructions which is pretty much guaranteed to be everywhere. The test_branch_counter() test above already tests for the existence of the branches event and skips if it's not present. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241115161600.228994-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:39 -03:00
Dr. David Alan Gilbert	264708b8ac	perf util: Remove kernel version deadcode fetch_kernel_version() has been unused since Ian's 2023 commit `3d6dfae889` ("perf parse-events: Remove BPF event support"). Remove it, and its helpers. I noticed there are a bunch of kernel-version macros that are also unused nearby. Also remove them. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241116155850.113129-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:39 -03:00
Arnaldo Carvalho de Melo	0b687912c9	perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved The purpose of this test is to test for races in the exit of 'perf trace' missing the last events, it was failing when the COMM wasn't resolved either because we missed some PERF_RECORD_COMM or somehow raced on getting it from procfs. Add --no-comm to the 'perf trace' command line so that we get a consistent, pid only output, which allows the test to achieve its goal. This is the output from 'perf trace --no-comm -e syscalls:sys_enter_exit_group': 0.000 21953 syscalls:sys_enter_exit_group() 0.000 21955 syscalls:sys_enter_exit_group() 0.000 21957 syscalls:sys_enter_exit_group() 0.000 21959 syscalls:sys_enter_exit_group() 0.000 21961 syscalls:sys_enter_exit_group() 0.000 21963 syscalls:sys_enter_exit_group() 0.000 21965 syscalls:sys_enter_exit_group() 0.000 21967 syscalls:sys_enter_exit_group() 0.000 21969 syscalls:sys_enter_exit_group() 0.000 21971 syscalls:sys_enter_exit_group() Now it passes: root@number:~# perf test "trace exit race" 110: perf trace exit race : Ok root@number:~# root@number:~# perf test -v "trace exit race" 110: perf trace exit race : Ok root@number:~# If we artificially make it run just 9 times instead of the 10 it runs, i.e. by manually doing: trace_shutdown_race() { for _ in $(seq 9); do that 9 is $iter, 10 in the patch, we get: root@number:~# vim ~acme/libexec/perf-core/tests/shell/trace_exit_race.sh root@number:~# perf test -v "trace exit race" --- start --- test child forked, pid 24629 Missing output, expected 10 but only got 9 ---- end(-1) ---- 110: perf trace exit race : FAILED! root@number:~# I.e. 9 'perf trace' calls produced the expected output, the inverse grep didn't show anything, so the patch provided by Howard for the previous patch kicks in and shows a more informative message. 
Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Peterson <benjamin@engflow.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:32 -03:00
Arnaldo Carvalho de Melo	7ca41faa5f	perf test shell trace_exit_race: Show what went wrong in verbose mode If it fails we need to check what was the reason, what were the lines that didn't match the expected format, so: root@number:~# perf test -v "trace exit race" --- start --- test child forked, pid 2028724 Lines not matching the expected regexp: ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group$': 0.000 :2028750/2028750 syscalls:sys_enter_exit_group() ---- end(-1) ---- 110: perf trace exit race : FAILED! root@number:~# In this case we're not resolving the process COMM for some reason and fall back to printing just the pid/tid; this will be fixed in a followup patch. Howard Chu spotted a problem with the single quotes surrounding a regexp, which made the test always fail, but since there were some failures when I tested (COMM not being resolved in some of the results) the end inverse grep would show some lines and thus I didn't notice the single quote problem. He also provided a patch to test if fewer than the expected number of matches took place but all of them had the expected output, in which case the inverse grep wouldn't show anything, confusing the tester. Reviewed-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Peterson <benjamin@engflow.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-15 12:31:01 -03:00
Benjamin Peterson	f72bcb92e9	perf tests: Add test for trace output loss Add a test that checks that trace output is not lost to races. This is accomplished by tracing the exit_group syscall of "true" multiple times and checking for correct output. Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-3-benjamin@engflow.com [ Addressed two ShellCheck warnings ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:10:40 -03:00
Benjamin Peterson	1302e352b2	perf trace: Avoid garbage when not printing a syscall's arguments syscall__scnprintf_args may not place anything in the output buffer (e.g., because the arguments are all zero). If that happened in trace__fprintf_sys_enter, its fprintf would receive an uninitialized buffer leading to garbage output. Fix the problem by passing the (possibly zero) bounds of the argument buffer to the output fprintf. Fixes: `a98392bb1e` ("perf trace: Use beautifiers on syscalls:sys_enter_ handlers") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-2-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:06:52 -03:00
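Passing the bounds of the argument buffer to the formatted print, as the commit above describes, can be illustrated with the standard `%.*s` conversion, which reads at most the given number of bytes. The sketch below is an assumption-laden illustration, not the perf code: `sketch_format_enter()` and its parameters are hypothetical names, and it formats into a caller-supplied string instead of writing to a `FILE *`.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: "args" may be uninitialized when "printed" is 0,
 * so the precision argument limits how many bytes of it are ever read.
 */
static int sketch_format_enter(char *out, size_t outsz, const char *name,
			       const char *args, size_t printed)
{
	/* %.*s consumes an int precision followed by the string pointer. */
	return snprintf(out, outsz, "%s(%.*s)", name, (int)printed, args);
}
```

With `printed == 0` nothing is read from `args`, so whatever happens to be in the buffer cannot leak into the output.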
Benjamin Peterson	3fd7c36973	perf trace: Do not lose last events in a race If a perf trace event selector specifies a maximum number of events to output (i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler, disables the event selector after the maximum number of events are printed. Furthermore, trace__event_handler checked if the event selector was disabled before doing any work. This avoided exceeding the maximum number of events to print if more events were in the buffer before the selector was disabled. However, the event selector can be disabled for reasons other than exceeding the maximum number of events. In particular, when the traced subprocess exits, the main loop disables all event selectors. This meant the last events of a traced subprocess might be lost to the printing handler's short-circuiting logic. This nondeterministic problem could be seen by running the following many times: $ perf trace -e syscalls:sys_enter_exit_group true trace__event_handler should simply check for exceeding the maximum number of events to print rather than the state of the event selector. Fixes: `a9c5e6c1e9` ("perf trace: Introduce per-event maximum number of events property") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-1-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:05:48 -03:00
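The difference between the two checks in the commit above can be shown with a small model. This is a sketch under stated assumptions: `struct sketch_evsel`, its fields, and both handler functions are hypothetical stand-ins, not the perf data structures.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an event selector; field names are assumptions. */
struct sketch_evsel {
	unsigned long max_events;	/* limit taken from "/nr=N/" */
	unsigned long nr_printed;	/* events printed so far */
	bool disabled;			/* set at the limit OR when the workload exits */
};

/* Old behaviour as described: bail out whenever the selector is disabled. */
static bool sketch_handle_old(struct sketch_evsel *evsel)
{
	if (evsel->disabled)
		return false;	/* also drops events queued before workload exit */
	if (++evsel->nr_printed == evsel->max_events)
		evsel->disabled = true;
	return true;
}

/* Fixed behaviour as described: only the print count decides. */
static bool sketch_handle_new(struct sketch_evsel *evsel)
{
	if (evsel->nr_printed >= evsel->max_events)
		return false;
	if (++evsel->nr_printed == evsel->max_events)
		evsel->disabled = true;
	return true;
}
```

In this model, an event that is still queued when the main loop disables the selector at workload exit is dropped by the old check and printed by the new one, while the per-event limit is still honoured.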
Masami Hiramatsu (Google)	080e47b2a2	perf probe: Introduce quotation marks support In non-C languages, it is possible to have ':' in the function names. It is possible to escape it with backslashes, but if there are too many backslashes, it is annoying. This introduces quotation marks (`"` or `'`) support. For example, without quotes, we have to pass it as below: $ perf probe -x cro3 -L "cro3\:\:cmd\:\:servo\:\:run_show" <run_show@/work/cro3/src/cmd/servo.rs:0> 0 fn run_show(args: &ArgsShow) -> Result<()> { 1 let list = ServoList::discover()?; 2 let s = list.find_by_serial(&args.servo)?; 3 if args.json { 4 println!("{s}"); With quotes, we can more naturally write the function name as below: $ perf probe -x cro3 -L \"cro3::cmd::servo::run_show\" <run_show@/work/cro3/src/cmd/servo.rs:0> 0 fn run_show(args: &ArgsShow) -> Result<()> { 1 let list = ServoList::discover()?; 2 let s = list.find_by_serial(&args.servo)?; 3 if args.json { 4 println!("{s}"); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099116941.2431889.11609129616090100386.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	313026f3ce	perf string: Add strpbrk_esq() and strdup_esq() for escape and quote strpbrk_esq() and strdup_esq() are new variants of strpbrk() and strdup() which handle escaped characters and quoted strings. - strpbrk_esq() searches for the specified set of characters but ignores escaped characters and quoted strings. e.g. strpbrk_esq("'quote\d' \queue quiz", "qd") returns "quiz". - strdup_esq() duplicates the string but removes the backslashes and quotes that are used for quotation. It also keeps the string (including backslashes) in the quoted part. e.g. strdup_esq("'quote\d' \queue quiz") returns "quote\d queue quiz". The (single, double) quotes in the quoted part should be escaped by backslash. In this case, strdup_esq() removes that backslash. The same quotes must be paired. If you use double quotation, you need to use the double quotation to close the quoted part. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099116045.2431889.15772916605719019533.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
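The behaviour described for strpbrk_esq() and strdup_esq() in the commit above can be approximated as follows. This is an illustrative re-implementation written from the commit text, not the perf source: the `sketch_` names are hypothetical, and corner cases the message does not spell out (for example an unterminated quote or a trailing backslash) are handled by assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Find the first char from "accept" that is neither escaped nor quoted. */
static const char *sketch_strpbrk_esq(const char *s, const char *accept)
{
	char quote = '\0';	/* currently open quote character, if any */

	for (; *s; s++) {
		if (*s == '\\') {
			if (s[1] == '\0')
				break;
			s++;		/* skip the escaped character */
			continue;
		}
		if (quote) {
			if (*s == quote)
				quote = '\0';	/* matching close quote */
			continue;
		}
		if (*s == '\'' || *s == '"') {
			quote = *s;
			continue;
		}
		if (strchr(accept, *s))
			return s;
	}
	return NULL;
}

/* Duplicate "s", dropping quoting backslashes and the quotes themselves. */
static char *sketch_strdup_esq(const char *s)
{
	char *out = malloc(strlen(s) + 1);
	char *d = out;
	char quote = '\0';

	if (!out)
		return NULL;

	for (; *s; s++) {
		if (quote) {
			if (*s == '\\' && s[1] == quote)
				*d++ = *++s;	/* escaped quote: drop the backslash */
			else if (*s == quote)
				quote = '\0';	/* close quote is not copied */
			else
				*d++ = *s;	/* other backslashes are kept */
		} else if (*s == '\\' && s[1]) {
			*d++ = *++s;		/* unquoted escape: keep only the char */
		} else if (*s == '\'' || *s == '"') {
			quote = *s;		/* open quote is not copied */
		} else {
			*d++ = *s;
		}
	}
	*d = '\0';
	return out;
}
```

For the input from the commit message, written in C as `"'quote\\d' \\queue quiz"`, the first function returns a pointer to `quiz` and the second returns `quote\d queue quiz`; the caller frees the duplicate.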
Masami Hiramatsu (Google)	b9e577225c	perf probe: Accept FUNC@* to specify function name explicitly In Golang, the function name will have the '.', and 'perf probe' misinterprets it as a file name. To mitigate this situation, introduce `function@*` so that user can explicitly specify that it is a function name. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099115149.2431889.13682110856853358354.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	47fa0f99a9	perf probe: Fix to ignore escaped characters in --lines option Use strpbrk_esc() and strdup_esc() to ignore escaped characters in the --lines option. This was already done for the other options, but not for --lines. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099114272.2431889.4820591557298941207.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	e7c70ee7c9	perf probe: Fix error message for failing to find line range With the --lines option, if perf-probe fails to find the specified line, it warns "Debuginfo analysis failed." but this misleads the user into thinking the debuginfo is broken. Change this message to "Specified source line(LINESPEC) is not found." so that the user can understand the error correctly. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099113381.2431889.16263147678401426107.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Howard Chu	fe4f9b4124	perf trace: Fix tracing itself, creating feedback loops There exists a pids_filtered map in augmented_raw_syscalls.bpf.c that ceases to provide functionality after the BPF skeleton migration done in: `5e6da6be30` ("perf trace: Migrate BPF augmentation to use a skeleton") Before the migration, pid_filtered map works, courtesy of Arnaldo Carvalho de Melo <acme@kernel.org>: ⬢ [acme@toolbox perf-tools]$ git log --oneline -5 `6f769c3458` (HEAD) perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename `7777ac3dfe` perf test trace+probe_vfs_getname.sh: Remove stray \ before / `33d9c50621` perf script python: Add stub for PMU symbol to the python binding `e59fea47f8` perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols `878460e8d0` perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0 root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30 & [1] 180632 root@x1:/home/acme/git/perf-tools# 0.000 ( 0.051 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 0.115 ( 0.010 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 0.916 ( 0.068 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 246) = 246 1.699 ( 0.047 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 2.167 ( 0.041 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 2.739 ( 0.042 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.138 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.477 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.738 ( 0.023 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.946 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 4.195 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, 
count: 121) = 121 4.212 ( 0.026 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 4.285 ( 0.006 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 4.445 ( 0.018 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 260) = 260 4.508 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 124) = 124 4.592 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 116) = 116 4.666 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 130) = 130 4.715 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 95) = 95 4.765 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 102) = 102 4.815 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79 4.890 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 57) = 57 4.937 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89 5.009 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112 5.059 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112 5.116 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79 5.152 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 33) = 33 5.215 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 37) = 37 5.293 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 128) = 128 5.339 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89 5.384 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 100) = 100 [1]+ Done perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30 root@x1:/home/acme/git/perf-tools# No events for the 'perf trace' (pid 180632), i.e. no feedback loop. 
If we leave it running: root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e landlock_add_rule & [1] 181068 root@x1:/home/acme/git/perf-tools# And then look at what maps it sets up: root@x1:/home/acme/git/perf-tools# bpftool map | grep pids_filtered -A3 1190: hash name pids_filtered flags 0x0 key 4B value 1B max_entries 64 memlock 7264B btf_id 1613 pids perf(181068) root@x1:/home/acme/git/perf-tools# And ask for dumping its contents: We see that we are _also_ setting it to filter those: root@x1:/home/acme/git/perf-tools# bpftool map dump id 1190 [{ "key": 181068, "value": 1 },{ "key": 156801, "value": 1 } ] Now testing the migration commit: perf $ git log commit `5e6da6be30` (HEAD) Author: Ian Rogers <irogers@google.com> Date: Thu Aug 10 11:48:51 2023 -0700 perf trace: Migrate BPF augmentation to use a skeleton perf $ ./perf trace -e write --max-events=10 & echo #! [1] 1808653 perf $ 0.000 ( 0.010 ms): :1808671/1808671 write(fd: 1, buf: 0x6003f5b26fc0, count: 11) = 11 0.162 ( ): perf/1808653 write(fd: 2, buf: 0x7fffc2174e50, count: 11) ... 0.174 ( ): perf/1808653 write(fd: 2, buf: 0x74ce21804563, count: 1) ... 0.184 ( ): perf/1808653 write(fd: 2, buf: 0x57b936589052, count: 5) The feedback loop is there. Keep it running, look into the bpf map: perf $ bpftool map | grep pids_filtered 10675: hash name pids_filtered flags 0x0 perf $ bpftool map dump id 10675 [] The map is empty. 
Now, this commit: `64917f4df0` ("perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string") Temporarily fixed the feedback loop for perf trace -e write, that's because before using the heuristic, write is hooked to sys_enter_openat: perf $ git log commit `83a0943b18` (HEAD) Author: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Thu Aug 17 12:11:51 2023 -0300 perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing Reusing "openat" BPF sys_enter augmenter for "write" And after the heuristic fix, it's unaugmented: perf $ git log commit `64917f4df0` (HEAD) Author: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Thu Aug 17 15:14:21 2023 -0300 perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing perf $ After using the heuristic, write is hooked to syscall_unaugmented, which returns 1. SEC("tp/raw_syscalls/sys_enter") int syscall_unaugmented(struct syscall_enter_args *args) { return 1; } If the BPF program returns 1, the tracepoint filter will filter it (since the tracepoint filter for perf is correctly set), but before the heuristic, when it was hooked to a sys_enter_openat(), which is a BPF program that calls bpf_perf_event_output() and writes to the buffer, it didn't get filtered, thus creating a feedback loop. So switching write to unaugmented accidentally fixed the problem. But some syscalls are not so lucky, for example newfstatat: perf $ ./perf trace -e newfstatat --max-events=100 & echo #! [1] 2166948 457.718 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ... 457.749 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/2166950/ns/mnt", statbuf: 0x7fff0132aa80) ... 
457.962 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ... Currently, write is augmented by the new BTF general augmenter (which calls bpf_perf_event_output()). The problem, which luckily got fixed, resurfaced, and that’s how it was discovered. Fixes: `5e6da6be30` ("perf trace: Migrate BPF augmentation to use a skeleton") Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241030052431.2220130-1-howardchu95@gmail.com [ Check if trace->skel is non-NULL, as it is only initialized if trace->trace_syscalls is set ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:55:36 -03:00
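The role of the pids_filtered map in the commit above can be modelled in plain user-space C. This is only a sketch of the idea, not the BPF code perf uses: `struct pid_filter`, `pid_filter_add()`, `pid_filter_has()` and `should_emit()` are hypothetical names, and the 64-entry limit simply mirrors the `max_entries 64` that bpftool reports. With an empty filter the tracer's own syscalls are emitted, which is the feedback loop; once its pid is added they are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity, mirroring "max_entries 64" in the bpftool output. */
#define MAX_FILTERED 64

/* Plain C stand-in for the pids_filtered hash map (illustration only). */
struct pid_filter {
	int pids[MAX_FILTERED];
	size_t nr;
};

/* Record a pid whose syscalls must not be traced; false when full. */
static bool pid_filter_add(struct pid_filter *f, int pid)
{
	assert(f != NULL);
	if (f->nr == MAX_FILTERED)
		return false;
	f->pids[f->nr++] = pid;
	return true;
}

/* Linear lookup; a real map would hash, but the semantics are the same. */
static bool pid_filter_has(const struct pid_filter *f, int pid)
{
	for (size_t i = 0; i < f->nr; i++)
		if (f->pids[i] == pid)
			return true;
	return false;
}

/* True when an event from 'pid' should reach the output buffer. */
static bool should_emit(const struct pid_filter *f, int pid)
{
	return !pid_filter_has(f, pid);
}
```

If the filter is never populated, as with the empty map dumped above, `should_emit()` is true for the tracer's own pid, so every line the tracer writes produces another write event to report.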
Luo Yifan	b81bb70337	perf timechart: Remove redundant variable assignment This patch makes a minor change that removes a redundant variable assignment. The assignment before the for loop is duplicated by the initialization within the loop header. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241111095209.276332-1-luoyifan@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Jean-Philippe Romain	d99b312572	perf list: Fix topic and pmu_name argument order Fix function definitions to match header file declaration. Fix two callers to pass the arguments in the right order. On Intel Tigerlake, before: ``` $ perf list -j\|grep "\"Topic\""\|sort\|uniq "Topic": "cache", "Topic": "cpu", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j\|grep "\"Unit\""\|sort\|uniq "Unit": "cache", "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` After: ``` $ perf list -j\|grep "\"Topic\""\|sort\|uniq "Topic": "cache", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j\|grep "\"Unit\""\|sort\|uniq "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` Fixes: `e5c6109f48` ("perf list: Reorganize to use callbacks to allow honouring command line options") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jean-Philippe Romain 
<jean-philippe.romain@foss.st.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Junhao He <hejunhao3@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241109025801.560378-1-irogers@google.com [ I fixed the two callers and added it to Jean-Philippe's original change. ] Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Andrew Kreimer	463c203165	perf tools: Fix typos Muliplier -> Multiplier There are some typos in fprintf messages. Fix them via codespell. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108134728.25515-1-algonell@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Arnaldo Carvalho de Melo	a6e8a58de6	perf disasm: Allow configuring what disassemblers to use For a long time the perf tools annotation code relied on parsing the output of binutils's objdump (or its reimplementations, like llvm's), then augmenting it with samples, allowing navigation, etc. More recently, disassemblers from capstone and llvm (the libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced. So when all those methods are available, there is a static preference for a series of attempts at disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded. This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which disassemblers should be attempted and in what order. As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, eliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods. 
root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. 
[annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8 /* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp } /* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~# After adding a way to select the disassembler from the command line, a 'perf test' comparing the output of the various disassemblers should be introduced, to test these codebases. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
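One way to read a value such as `annotate.disassemblers=objdump,capstone` is as an ordered, comma-separated preference list. The following is a minimal sketch of that parsing step, assuming the three names shown in the commit message ('llvm', 'capstone', 'objdump'); the enum, the function names and the rule that unknown or duplicate entries are rejected are all hypothetical and are not taken from the perf sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for illustration; not the ones perf uses. */
enum disassembler { DIS_UNKNOWN = 0, DIS_LLVM, DIS_CAPSTONE, DIS_OBJDUMP };

#define MAX_DISASSEMBLERS 3

/* Map one name of the given length to an enum value. */
static enum disassembler disassembler_from_name(const char *name, size_t len)
{
	if (len == 4 && !strncmp(name, "llvm", len))
		return DIS_LLVM;
	if (len == 8 && !strncmp(name, "capstone", len))
		return DIS_CAPSTONE;
	if (len == 7 && !strncmp(name, "objdump", len))
		return DIS_OBJDUMP;
	return DIS_UNKNOWN;
}

/*
 * Parse a comma-separated list such as "objdump,capstone" into 'order',
 * preserving the user's ordering. Returns the number of entries stored,
 * or -1 on an unknown name, a duplicate, or too many entries.
 */
static int parse_disassemblers(const char *list,
			       enum disassembler order[MAX_DISASSEMBLERS])
{
	int n = 0;

	assert(list != NULL);

	while (*list) {
		size_t len = strcspn(list, ",");
		enum disassembler d = disassembler_from_name(list, len);

		if (d == DIS_UNKNOWN || n == MAX_DISASSEMBLERS)
			return -1;
		for (int i = 0; i < n; i++)
			if (order[i] == d)
				return -1;
		order[n++] = d;
		list += len;
		if (*list == ',')
			list++; /* step over the separator */
	}
	return n;
}
```

A caller would then try each entry of `order` in turn and stop at the first disassembler that succeeds, which matches the "series of attempts" described in the commit message.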
Arnaldo Carvalho de Melo	1f7393adf6	perf disasm: Define stubs for the LLVM and capstone disassemblers This reduces the number of ifdefs in the main symbol__disassemble() method and paves the way for allowing the user to configure the disassemblers of preference. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-3-acme@kernel.org [ Applied fixes from Masami Hiramatsu and Aditya Bodkhe for when capstone devel files are not available ] Link: https://lore.kernel.org/r/B78FB6DF-24E9-4A3C-91C9-535765EC0E2A@ibm.com Link: https://lore.kernel.org/r/173145729034.2747044.453926054000880254.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:20:32 -03:00
Arnaldo Carvalho de Melo	4c1d8f0547	perf disasm: Introduce symbol__disassemble_objdump() Wrap the first disassemble method in perf, the parsing of objdump output, in its own function, just like we have for llvm and capstone. This paves the way to allow the user to specify what disassemblers are preferred and also, at some point, to allow building without the objdump method. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-11 14:26:37 -03:00
Ian Rogers	ddbfb6f20c	perf build: Remove PERF_HAVE_DWARF_REGS PERF_HAVE_DWARF_REGS was true when an architecture had a dwarf-regs.c file. There are no more architecture dwarf-regs.c files; selection is done using constants from the ELF file rather than conditional compilation. Where PERF_HAVE_DWARF_REGS was the only variable in a Makefile, remove the Makefile. Add missing SPDX for RISC-V Makefile. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. 
David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	3ef6b89a12	perf dwarf-regs: Remove get_arch_regstr code get_arch_regstr no longer exists so remove declaration. Associated ifs and switches are made unconditional. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	a4747c0950	perf xtensa: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/xtensa/util clean up Build files. Tidy up the EM_NONE cases for xtensa in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	85567a2a8d	perf sparc: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/sparc/util clean up Build files. Tidy up the EM_NONE cases for sparc in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	04150f29e2	perf sh: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/sh/util clean up Build files. Tidy up the EM_NONE cases for sh in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	b232b704a7	perf s390: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for s390 in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	a90c451918	perf riscv: Remove dwarf-regs.c and add dwarf-regs-table.h The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case, and the register table is provided in a header file, the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for riscv in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. 
David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	285b523c2d	perf dwarf-regs: Move powerpc dwarf-regs out of arch Move arch/powerpc/util/dwarf-regs.c to util/dwarf-regs-powerpc.c and compile in unconditionally. get_arch_regstr is redundant when EM_NONE is treated as EM_HOST so remove and update dwarf-regs.c conditions. Make get_powerpc_regs unconditionally available when libdw is. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	8a768a2f65	perf mips: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for mips in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	1d37bd8366	perf loongarch: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for loongarch in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	d4a0c4f221	perf dwarf-regs: Move csky dwarf-regs out of arch Move arch/csky/util/dwarf-regs.c to util/dwarf-regs-csky.c and compile in unconditionally. To avoid get_arch_regstr being duplicated, rename to get_csky_regstr and add to get_dwarf_regstr switch. Update #ifdefs to allow ABI V1 and V2 tables at the same time. Determine the table from the ELF flags. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	0c0a20ecdf	perf arm: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for arm in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	6f8e8add5a	perf arm64: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for arm64 in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	bf4e799a0a	perf dwarf-regs: Move x86 dwarf-regs out of arch Move arch/x86/util/dwarf-regs.c to util/dwarf-regs-x86.c and compile in unconditionally. To avoid get_arch_regnum being duplicated, rename to get_x86_regnum and add to get_dwarf_regnum switch. For get_arch_regstr, this was unused on x86 unless the machine type was EM_NONE. Map that case to EM_HOST and remove get_arch_regstr from dwarf-regs-x86.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	a784847c2d	perf dwarf-regs: Pass ELF flags to get_dwarf_regstr Pass a flags value as architectures like csky need the flags to determine the ABI variant. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	9fc4489a16	perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum Rather than pass 0/EM_NONE, use the value computed in the disasm struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if it were passed to get_dwarf_regnum. Pass a flags value as architectures like csky need the flags to determine the ABI variant. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	cd6c9dca9d	perf disasm: Add e_machine/e_flags to struct arch Currently functions like get_dwarf_regnum only work with the host architecture. Carry the elf machine and flags in struct arch so that in disassembly these can be used to allow cross platform disassembly. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	ae894b7792	perf dwarf-regs: Add EM_HOST and EF_HOST defines Computed from the build architecture defines, EM_HOST and EF_HOST give values that can be used in dwarf register lookup. Place in dwarf-regs.h so the value can be shared. Move some dwarf-regs.c constants used for EM_HOST to dwarf-regs.h. Add CSky constants that may be missing. In disasm.c add an include of dwarf-regs.h as the included arch/*/annotate/instructions.c files make use of the constants and we want the elf.h/dwarf-regs.h dependency to be explicit. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	6ac75289b2	perf dwarf-regs: Remove PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET was used for BPF prologue support which was removed in Commit `3d6dfae889` ("perf parse-events: Remove BPF event support"). The code is no longer used so remove. Remove the offset from various dwarf-regs.c tables and the dependence on ptrace.h. Rename structs starting pt_ as the ptrace derived offset is now removed. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:12 -08:00
Ian Rogers	2bf7692ead	perf bpf-prologue: Remove unused file Commit `4a73fca226` ("perf bpf-prologue: Remove unused file") missed cleaning up the header file. The code was unnecessary as Commit `3d6dfae889` ("perf parse-events: Remove BPF event support") removed building bpf-prologue.c. Fixes: `4a73fca226` ("perf bpf-prologue: Remove unused file") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:12 -08:00
Ian Rogers	6d5d90a6ab	perf docs: Document tool and hwmon events Add a few paragraphs on tool and hwmon events. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	531ee0fd48	perf test: Add hwmon "PMU" test Based on a mix of the sysfs PMU test (for creating the reference files) and the tool PMU test, test that parsing given hwmon events with their aliases creates the expected config values. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	654986ed5d	perf pmu: Add calls enabling the hwmon_pmu Add the base PMU calls necessary for hwmon_pmu(s) to be created/deleted and events found, listed, opened and read. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	53cc0b351e	perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs Add a tool PMU for hwmon events but don't enable. The hwmon sysfs ABI is defined in Documentation/hwmon/sysfs-interface.rst. Create a PMU that reads the hwmon input and can be used in `perf stat` and metrics much as an uncore PMU can. For example, when enabled by a later patch, the following shows reading the CPU temperature and 2 fan speeds alongside the uncore frequency: ``` $ perf stat -e temp_cpu,fan1,hwmon_thinkpad/fan2/,tool/num_cpus_online/ -M UNCORE_FREQ -I 1000 1.001153138 52.00 'C temp_cpu 1.001153138 2,588 rpm fan1 1.001153138 2,482 rpm hwmon_thinkpad/fan2/ 1.001153138 8 tool/num_cpus_online/ 1.001153138 1,077,101,397 UNC_CLOCK.SOCKET # 1.08 UNCORE_FREQ 1.001153138 1,012,773,595 duration_time ... ``` The PMUs are named from /sys/class/hwmon/hwmon<num>/name and have an alias of hwmon<num>. Hwmon data is presented in multiple <type><number>_<item> files. The <type><number> is used to identify the event as is the <type> followed by the contents of the <type>_label file if it exists. The <type><number>_input file gives the data read by perf. When enabled by a later patch, in `perf list` the other hwmon <item> files are used to give a richer description, for example: ``` hwmon: temp1 [Temperature in unit acpitz named temp1. Unit: hwmon_acpitz] in0 [Voltage in unit bat0 named in0. Unit: hwmon_bat0] temp_core_0 OR temp2 [Temperature in unit coretemp named Core 0. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] temp_core_1 OR temp3 [Temperature in unit coretemp named Core 1. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] ... temp_package_id_0 OR temp1 [Temperature in unit coretemp named Package id 0. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] temp1 [Temperature in unit iwlwifi_1 named temp1. Unit: hwmon_iwlwifi_1] temp_composite OR temp1 [Temperature in unit nvme named Composite. alarm=0'C,crit=86.85'C,max=75.85'C, min=-273.15'C. Unit: hwmon_nvme] temp_sensor_1 OR temp2 [Temperature in unit nvme named Sensor 1. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme] temp_sensor_2 OR temp3 [Temperature in unit nvme named Sensor 2. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme] fan1 [Fan in unit thinkpad named fan1. Unit: hwmon_thinkpad] fan2 [Fan in unit thinkpad named fan2. Unit: hwmon_thinkpad] ... temp_cpu OR temp1 [Temperature in unit thinkpad named CPU. Unit: hwmon_thinkpad] temp_gpu OR temp2 [Temperature in unit thinkpad named GPU. Unit: hwmon_thinkpad] curr1 [Current in unit ucsi_source_psy_usbc000_0 named curr1. max=1.5A. Unit: hwmon_ucsi_source_psy_usbc000_0] in0 [Voltage in unit ucsi_source_psy_usbc000_0 named in0. max=5V,min=5V. Unit: hwmon_ucsi_source_psy_usbc000_0] ``` As there may be multiple hwmon devices a range of PMU types are reserved for their use and to identify the PMU as belonging to the hwmon types. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	8c329057de	perf test: Add hwmon filename parser test Filename parsing maps a hwmon filename to its constituent enum/int parts for the hwmon config value. Add a test case for the parsing. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> [namhyung: add #include <linux/string.h> for strlcpy()] Link: https://lore.kernel.org/r/20241109003759.473460-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:27:44 -08:00
Ian Rogers	4810b761f8	perf hwmon_pmu: Add hwmon filename parser hwmon filenames have a specific encoding that will be used to give a config value. The encoding is described in: Documentation/hwmon/sysfs-interface.rst Add a function to parse the filename into constituent enums/ints that will then be amenable to config encoding. Note, things are done this way to allow mapping names to config and back without the use of hash/dynamic lookup tables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> [namhyung: add #include <linux/string.h> for strlcpy()] Link: https://lore.kernel.org/r/20241109003759.473460-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:26:53 -08:00
Yicong Yang	35de42cdfb	perf build: Include libtraceevent headers directly indicated by pkg-config Currently libtraceevent is found by pkg-config, which gives the include path as: [root@localhost tmp]# pkg-config --cflags libtraceevent -I/usr/local/include/traceevent So we should include the libtraceevent headers directly without the "traceevent/" prefix. Update all the users. Fixes: `0f0e1f4456` ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Suggested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZyF5_Hf1iL01kldE@google.com/ Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: leo.yan@arm.com Cc: amadio@gentoo.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241105105649.45399-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:57 -08:00
Steve Clevenger	e8328bf3cd	perf script python: Adjust objdump start/end per map pgoff parameter Extract map_pgoff parameter from the dictionary, and adjust start/end range passed to objdump based on the value. A zero start_addr is filtered to prevent output of dso address range check failures. This script repeatedly sees a zero value passed in for start_addr = cpu_data[str(cpu) + 'addr'] These zero values are not a new problem. The start_addr/stop_addr warning clutters the instruction trace output, hence this change. Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: suzuki.poulose@arm.com Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: ilkka@os.amperecomputing.com Link: https://lore.kernel.org/r/21ccdd22e664bdeccb878672d4b2c0518873c1e5.1731027120.git.scclevenger@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:57 -08:00
Steve Clevenger	26ec3d7cc3	perf script cs_etm: Add map_pgoff to python dictionary Extract map_pgoff parameter from the dictionary, and adjust start/end range passed to objdump based on the value. A zero start_addr is filtered to prevent output of dso address range check failures. This script repeatedly sees a zero value passed in for start_addr = cpu_data[str(cpu) + 'addr'] These zero values are not a new problem. The start_addr/stop_addr warning clutters the instruction trace output, hence this change. Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: suzuki.poulose@arm.com Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: ilkka@os.amperecomputing.com Link: https://lore.kernel.org/r/8d9a1142dc58ffa34a000cb7b7a26055df0a37ec.1731027120.git.scclevenger@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:56 -08:00
Ian Rogers	62a6d092f1	perf stat: Expand metric+unit buffer size Long metric names combined with units may exceed the metric_bf and lead to truncation. Double metric_bf in size to avoid this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20241106004818.2174593-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 11:49:50 -08:00
Haiyue Wang	d8c0f8b4ee	perf tools: Add the empty-pmu-events build to .gitignore The commit `0fe881f10c` ("perf jevents: Autogenerate empty-pmu-events.c") build will generate two files, add them to .gitignore: tools/perf/pmu-events/empty-pmu-events.log tools/perf/pmu-events/test-empty-pmu-events.c Signed-off-by: Haiyue Wang <haiyuewa@163.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241106121254.2869-1-haiyuewa@163.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 10:51:56 -08:00
Dr. David Alan Gilbert	9ac98662db	perf: event: Remove deadcode event_format__print() last use was removed by 2017's commit `894f3f1732` ("perf script: Use event_format__fprintf()") evlist__find_tracepoint_by_id() last use was removed by 2012's commit `e60fc847ce` ("perf evlist: Remove some unused methods") evlist__set_tp_filter_pid() last use was removed by 2017's commit `dd1a50377c` ("perf trace: Introduce filter_loop_pids()") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241106144826.91728-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 10:51:56 -08:00
Benjamin Peterson	5fb8e56542	perf trace: avoid garbage when not printing a trace event's arguments trace__fprintf_tp_fields may not print any tracepoint arguments, e.g., if the argument values are all zero. Previously, this would result in a totally uninitialized buffer being passed to fprintf, which could lead to garbage on the console. Fix the problem by passing the number of initialized bytes to fprintf. Fixes: `f11b2803bb` ("perf trace: Allow choosing how to augment the tracepoint arguments") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241103204816.7834-1-benjamin@engflow.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-05 23:27:17 -08:00
Kuan-Wei Chiu	8f0d91f410	perf tools: update expected diff for lib/list_sort.c Since there are no longer any header include differences between lib/list_sort.c and tools/lib/list_sort.c, update the expected diff in check-header_ignore_hunks accordingly. Link: https://lkml.kernel.org/r/20241012042828.471614-4-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:33 -08:00
Namhyung Kim	29bf07bc9a	perf test: Fix ftrace test with regex patterns During the parallel testing, I've noticed some ftrace test failures. It seems the regex pattern checks 100 msec of nanosleep with the error range of 10 msec. But sometimes it's affected by other processes and resulted in more time in the syscall. The following output shows that it took more than 120 msec and failed. Let's update the regex pattern so that it can allow more drifts. perf ftrace profile test # Total (us) Avg (us) Max (us) Count Function 121279.500 121279.500 121279.500 1 __x64_sys_clock_nanosleep 121278.400 121278.400 121278.400 1 common_nsleep 121277.800 121277.800 121277.800 1 hrtimer_nanosleep 121277.100 121277.100 121277.100 1 do_nanosleep 341760.289 56960.048 121273.400 6 schedule 176.200 25.171 31.616 7 scheduler_tick 0.923 0.923 0.923 1 native_smp_send_reschedule 345522.360 69104.472 345320.600 5 __x64_sys_execve 345486.585 69097.317 345312.700 5 do_execveat_common.isra.0 340730.300 340730.300 340730.300 1 bprm_execve 1.758 0.879 0.883 2 sched_mm_cid_before_execve 1.112 1.112 1.112 1 sched_mm_cid_after_execve ---- end(-1) ---- 81: perf ftrace tests : FAILED! Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241102231702.2262258-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:28:25 -08:00
Arnaldo Carvalho de Melo	a52143aa21	perf test: Remove dangling CFLAGS for removed attr.o object Since the C test wrapper for attr.py was removed we don't have an attr.o object for that CFLAGS_attr.o to apply for, remove it. Fixes: `3a447031f5` ("perf test: Remove C test wrapper for attr.py") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/ZyjbksKYnV22zmz-@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:23:26 -08:00
Charlie Jenkins	6e0e0a1863	perf tools: Add all shellcheck_log to gitignore Instead of adding specific shellcheck_log files to the gitignore, add all of them to prevent these files from cluttering the git status. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241104-shellcheck_gitignore-v1-1-ffc179f57dc9@rivosinc.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:23:20 -08:00
Yicong Yang	d5a0a4ab4a	perf build: Add missing cflags when building with custom libtraceevent When building with custom libtraceevent, below errors occur: $ make -C tools/perf NO_LIBPYTHON=1 PKG_CONFIG_PATH=<custom libtraceevent> In file included from util/session.h:5, from builtin-buildid-list.c:17: util/trace-event.h:153:10: fatal error: traceevent/event-parse.h: No such file or directory 153 \| #include <traceevent/event-parse.h> \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ <snip similar errors of missing headers> This is because the include path is missed in the cflags. Add it. Fixes: `0f0e1f4456` ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Guilherme Amadio <amadio@gentoo.org> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241024133236.31016-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:11:32 -08:00
Michael Petlan	c741c7b5e9	perf test: Remove cpu-list BPF cgroup counter test The cpu-list part of this testcase has proven itself to be unreliable. Sometimes, we get "<not counted>" for system.slice when pinned to CPUs 0 and 1. In such case, the test fails. Since we cannot simply guarantee that any system.slice load will run on any arbitrary list of CPUs, except the whole set of all CPUs, let's rather remove the cpu-list subtest. Fixes: `a84260e314` ("perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: vmolnaro@redhat.com Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241101102812.576425-1-mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:10:48 -08:00
Ian Rogers	13e17c9ff4	perf build: Make libunwind opt-in rather than opt-out Having multiple unwinding libraries makes the perf code harder to understand and we have unused/untested code paths. Perf made BPF support an opt-out rather than opt-in feature. As libbpf has a libelf dependency, elfutils that provides libelf will also provide libdw. When libdw is present perf will use libdw unwinding rather than libunwind unwinding even if libunwind support is compiled in. Rather than have libunwind built into perf and never used, explicitly disable the support and make it opt-in. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241028193619.247727-1-irogers@google.com Closes: https://lore.kernel.org/linux-perf-users/CAP-5=fUXkp-d7gkzX4eF+nbjb2978dZsiHZ9abGHN=BN1qAcbg@mail.gmail.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 11:32:35 -08:00
Namhyung Kim	aa5c90601b	Merge 'origin/master' into perf-tools-next To get the fixes in the perf-tools branch. Resolved a conflict due to RISC-V's syscall table change. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-03 23:18:20 -08:00
Tengda Wu	d36e5b36a2	perf test: Use sqrtloop workload to test bperf event Replace `brstack` workload with `sqrtloop` workload, because `sqrtloop` workload contains fork(), which is suitable for testing the bperf event inheritance feature. Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Cc: song@kernel.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241021110201.325617-3-wutengda@huaweicloud.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-01 23:31:08 -07:00
Tengda Wu	07dc3a6de3	perf stat: Support inherit events during fork() for bperf bperf has a nice ability to share PMUs, but it still does not support inherit events during fork(), resulting in some deviations in its stat results compared with perf. perf stat result: $ ./perf stat -e cycles,instructions -- ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 2,316,038,116 cycles 2,859,350,725 instructions 1.009603637 seconds time elapsed 1.004196000 seconds user 0.003950000 seconds sys bperf stat result: $ ./perf stat --bpf-counters -e cycles,instructions -- \ ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 18,762,093 cycles 23,487,766 instructions 1.008913769 seconds time elapsed 1.003248000 seconds user 0.004069000 seconds sys In order to support event inheritance, two new bpf programs are added to monitor the fork and exit of tasks respectively. When a task is created, add it to the filter map to enable counting, and reuse the `accum_key` of its parent task to count together with the parent task. When a task exits, remove it from the filter map to disable counting. After support: $ ./perf stat --bpf-counters -e cycles,instructions -- \ ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 2,316,252,189 cycles 2,859,946,547 instructions 1.009422314 seconds time elapsed 1.003597000 seconds user 0.004270000 seconds sys Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Cc: song@kernel.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241021110201.325617-2-wutengda@huaweicloud.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-01 23:31:08 -07:00
James Clark	ba993e5ada	perf arm-spe: Use old behavior when opening old SPE files Since the linked commit, we stopped interpreting data source if the perf.data file doesn't have the new metadata version. This means that perf c2c will show no samples in this case. Keep the old behavior so old files can be opened, but also still show the new warning that updating might improve the decoding. Also re-write the warning to be more concise and specific to a user. Fixes: `ba5e7169e5` ("perf arm-spe: Use metadata to decide the data source feature") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: Julio.Suarez@arm.com Cc: Kiel.Friedt@arm.com Cc: Ryan.Roberts@arm.com Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241029143734.291638-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:50:47 -07:00
Arnaldo Carvalho de Melo	064d569e20	perf ftrace latency: Fix unit on histogram first entry when using --use-nsec The use_nsec arg wasn't being taken into account when printing the first histogram entry, fix it: root@number:~# perf ftrace latency --use-nsec -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 125 \| \| 64 - 128 ns \| 335 \| \| 128 - 256 ns \| 2155 \| #### \| 256 - 512 ns \| 9996 \| ################### \| 512 - 1024 ns \| 4958 \| ######### \| 1 - 2 us \| 4636 \| ######### \| 2 - 4 us \| 1053 \| ## \| 4 - 8 us \| 15 \| \| 8 - 16 us \| 1 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| root@number:~# After: root@number:~# perf ftrace latency --use-nsec -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 19 \| \| 64 - 128 ns \| 94 \| \| 128 - 256 ns \| 2191 \| #### \| 256 - 512 ns \| 9719 \| #################### \| 512 - 1024 ns \| 5330 \| ########### \| 1 - 2 us \| 4104 \| ######## \| 2 - 4 us \| 807 \| # \| 4 - 8 us \| 9 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| root@number:~# Fixes: `84005bb614` ("perf ftrace latency: Add -n/--use-nsec option") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/ZyE3frB-hMXHCnMO@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:46:43 -07:00
Björn Töpel	8c0d1202ba	perf, riscv: Wire up perf trace support for RISC-V RISC-V does not currently support perf trace, since the system call table is not generated. Perform the copy/paste exercise, wiring up RISC-V system call table generation. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Anup Patel <anup@brainfault.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Cc: Atish Patra <atishp@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20241024190353.46737-1-bjorn@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:39:34 -07:00
Arnaldo Carvalho de Melo	54afc56db2	perf probe: Fix retrieval of source files from a debuginfod server When perf is linked with libdebuginfod: root@number:~# ldd ~/bin/perf \| grep debuginfod libdebuginfod.so.1 => /lib64/libdebuginfod.so.1 (0x00007ff5c3930000) root@number:~# perf check feature debuginfod debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT root@number:~# And we don't have a debuginfo package installed for the binary we're trying to use, vmlinux in this case as we didn't specify any using 'perf probe -x', it will use the build for the running kernel: root@number:~# perf buildid-list -k 38e927fd7799d50dbc4d99ec5e3f781b6105a6a9 root@number:~# And communicate with a debuginfo server, be it configured in a ~/.perfconfig file, excerpt from the 'perf config' man page: buildid-cache.* buildid-cache.debuginfod=URLs Specify debuginfod URLs to be used when retrieving perf.data binaries, it follows the same syntax as the DEBUGINFOD_URLS variable, like: buildid-cache.debuginfod=http://192.168.122.174:8002 Or via the DEBUGINFOD_URLS env var, as distros like fedora do by default: root@number:~# echo $DEBUGINFOD_URLS https://debuginfod.fedoraproject.org/ root@number:~# To pick and cache just what is needed, instead of requiring the manual installation of the entire kernel-debuginfo package, which is really large. It will, in this example, use the following cache files, deleted before/after this patch just to test the whole process: root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/debuginfo Before this patch: root@number:~# perf probe -L icmp_rcv Failed to find source file path. Error: Failed to show lines. root@number:~# This is because 'perf probe' was using just the relative file name, in this case "net/ipv4/icmp.c", that is where the 'icmp_rcv' function is located, if we add it and comply with the debuginfo_find_source() function man page, it contacts the server, finds the necessary files, cache them locally and all works: root@number:~# perf probe -L icmp_rcv \| head <icmp_rcv@/root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c:0> 0 int icmp_rcv(struct sk_buff *skb) { 2 enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct rtable *rt = skb_rtable(skb); struct net *net = dev_net(rt->dst.dev); struct icmphdr *icmph; if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { 8 struct sec_path *sp = skb_sec_path(skb); root@number:~# Acked-by: Frank Ch. Eigler <fche@redhat.com> Cc: Aaron Merey <amerey@redhat.com> Cc: Francesco Nigro <fnigro@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ZyACsIFUETsr7-09@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:36:39 -07:00
Graham Woodward	35f5aa9ccc	perf arm-spe: Update --itrace help text The --itrace help now needs updating to reflect that the --itrace=b argument synthesises branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-5-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:17 -07:00
Graham Woodward	edff8dad3f	perf arm-spe: Correctly set sample flags Set flags on all synthesized instruction and branch samples. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-4-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:14 -07:00
Graham Woodward	c1b67c8510	perf arm-spe: Use ARM_SPE_OP_BRANCH_ERET when synthesizing branches Instead of checking the type for just branch misses, we can instead check for the OP_BRANCH_ERET and synthesise branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-3-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:10 -07:00
Graham Woodward	19966d792b	perf arm-spe: Set sample.addr to target address for instruction sample For an instruction sample, assign the target address to the field 'to_ip'. If it is a non-branch record, to_ip will be 0, presenting a non-valid target address. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-2-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:05 -07:00
Xu Yang	e3b2949e3f	perf vendor events arm64: Add i.MX91 DDR Performance Monitor metrics Add JSON metrics for i.MX91 DDR Performance Monitor. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: festevam@gmail.com Cc: conor+dt@kernel.org Cc: krzk+dt@kernel.org Cc: robh@kernel.org Cc: shawnguo@kernel.org Cc: will@kernel.org Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: leo.yan@linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: imx@lists.linux.dev Cc: Frank.li@nxp.com Cc: john.g.garry@oracle.com Cc: kernel@pengutronix.de Cc: s.hauer@pengutronix.de Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20240924061251.3387850-3-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:37:02 -07:00
Ian Rogers	7449a4d674	perf test: Sort tests placing exclusive tests last This allows a uniform test numbering even though two passes are used to execute them. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	553d5efeb3	perf test: Add a signal handler to kill forked child processes If the `perf test` process is killed the child tests continue running and may run indefinitely. Propagate SIGINT (ctrl-C) and SIGTERM (kill) signals to the running child processes so that they terminate when the parent is killed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	94d1a913bd	perf test: Make parallel testing the default Now C tests can have the "exclusive" flag to run without other tests, and shell tests can add "(exclusive)" to their description, run tests in parallel by default. Tests which flake when run in parallel can be marked exclusive to resolve the problem. Non-scientifically, the reduction on `perf test` execution time is from 8m35.890s to 3m55.115s on a Tigerlake laptop. So the tests complete in less than half the time. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	79e72f384d	perf test: Run parallel tests in two passes In pass 1 run all tests that succeed when run in parallel. In pass 2 sequentially run all remaining tests that are flagged as "exclusive". Sequential and dont_fork tests continue to run in pass 1. Read the exclusive flag from the shell test descriptions, but remove it from the display to avoid >100 characters. Add error handling to finish tests if starting a later test fails. Mark the task-exit test as exclusive due to issues reported by James Clark. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	a6fffc6094	perf test: Add a signal handler around running a test Add a signal handler around running a test. If a signal occurs during the test a siglongjmp unwinds the stack and output is flushed. The global run_test_jmp_buf is either unique per forked child or not shared during sequential execution. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	2532be3d21	perf test: Tag parallel failing shell tests with "(exclusive)" Some shell tests compete for resources and so can't run with other tests, tag such tests. The "(exclusive)" stems from shared/exclusive to describe how the tests run as if holding a lock. For ARM/coresight tests: Suggested-by: James Clark <james.clark@linaro.org> Additional failing tests: Suggested-by: Namhyung Kim <namhyung@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	2c66343927	perf test: Avoid list test blocking on writing to stdout Python's json.tool will output the input json to stdout. Redirect to /dev/null to avoid blocking on stdout writes. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	d50318fe00	perf test: Reduce scope of parallel variable The variable duplicates sequential but is only used for command line argument processing. Reduce scope to make the behavior clearer. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	0e036dcad4	perf test: Display number of active running tests Before polling or sleeping to wait for a test to complete, print out ": Running (<num> active)" where the number of active tests is determined by iterating over the tests and seeing which return false for check_if_command_finished. The line is erased and reprinted only if the number of running tests changes, to avoid the line flickering excessively. The message lets a user know that tests are still running and, in parallel mode, how many of the tests are waiting to complete. If color mode is disabled then avoid displaying the "Running" message as deleting the line isn't reliable. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	a5384c4267	perf cap: Add __NR_capget to arch/x86 unistd As there are duplicated kernel headers in tools/include libc can pick up the wrong definitions. This was causing the wrong system call for capget in perf. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: `e25ebda78e` ("perf cap: Tidy up and improve capability testing") Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-28 13:04:52 -03:00
Arnaldo Carvalho de Melo	55f1b540d8	tools headers: Update the linux/unaligned.h copy with the kernel sources To pick up the changes in: `7f053812da` ("random: vDSO: minimize and simplify header includes") That required adding a copy of include/vdso/unaligned.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/unaligned.h include/linux/unaligned.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-28 12:34:28 -03:00
Li Huafei	150dab31d5	perf disasm: Fix not cleaning up disasm_line in symbol__disassemble_raw() In symbol__disassemble_raw(), the created disasm_line should be discarded before returning an error. When creating disasm_line fails, break the loop and then release the created lines. Fixes: `0b971e6bf1` ("perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-3-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:36:14 -07:00
Li Huafei	908d50e50e	perf disasm: Use disasm_line__free() to properly free disasm_line symbol__disassemble_capstone_powerpc() jumped to the 'err' label when it failed in the loop that created disasm_line, and then used free() directly to free disasm_line. Since the structure disasm_line contains members that allocate memory dynamically, this can result in a memory leak. In fact, we can simply break the loop when it fails in the middle of the loop, and disasm_line__free() will then be called to properly free the created line. Other error paths do not need to consider freeing disasm_line. Fixes: `c5d60de181` ("perf annotate: Add support to use libcapstone in powerpc") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-2-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:36:06 -07:00
Li Huafei	b4e0e9a1e3	perf disasm: Use disasm_line__free() to properly free disasm_line The structure disasm_line contains members that require dynamically allocated memory and need to be freed correctly using disasm_line__free(). This patch fixes the incorrect release in symbol__disassemble_capstone(). Fixes: `6d17edc113` ("perf annotate: Use libcapstone to disassemble") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-1-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:35:38 -07:00
Arnaldo Carvalho de Melo	758f181589	perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT Noticed while building on a raspbian arm 32-bit system. There was also this other case, fixed by adding a missing util/stat.h with the prototypes: /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1396:6: error: no previous prototype for ‘perf_stat__set_no_csv_summary’ [-Werror=missing-prototypes] 1396 \| void perf_stat__set_no_csv_summary(int set __maybe_unused) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1400:6: error: no previous prototype for ‘perf_stat__set_big_num’ [-Werror=missing-prototypes] 1400 \| void perf_stat__set_big_num(int set __maybe_unused) \| ^~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors In other architectures this must be building due to some lucky indirect inclusion of that header. Fixes: `9dabf40034` ("perf python: Switch module to linking libraries from building source") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZxllAtpmEw5fg9oy@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 19:29:50 -03:00
Veronika Molnarova	06a130e42a	perf test: Handle perftool-testsuite_probe failure due to broken DWARF Test case test_adding_blacklisted ends in failure if the blacklisted probe is of an assembler function with no DWARF available. At the same time, probing the blacklisted function with ASM DWARF doesn't test the blacklist itself as the failure is a result of the broken DWARF. When the broken DWARF output is encountered, check if the probed function was compiled by the assembler. If so, the broken DWARF message is expected and does not report a perf issue, else report a failure. If the ASM DWARF affected the probe, try the next probe on the blacklist. If the first 5 probes are defective due to broken DWARF, skip the test case. Fixes: `def5480d63` ("perf testsuite probe: Add test for blacklisted kprobes handling") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241017161555.236769-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 17:23:09 -03:00
Jiri Slaby	5d35634ecc	perf trace: Fix non-listed archs in the syscalltbl routines This fixes a build breakage on 32-bit arm, where the syscalltbl__id_at_idx() function was missing. Committer notes: Generating a proper syscall table from a copy of arch/arm/tools/syscall.tbl ends up being too big a patch for this rc stage, I started doing it but while testing noticed some other problems with using BPF to collect pointer args on arm7 (32-bit) will maybe continue trying to make it work on the next cycle... Fixes: `7a2fb5619c` ("perf trace: Fix iteration of syscall ids in syscalltbl->entries") Suggested-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: <jslaby@suse.cz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/lkml/3a592835-a14f-40be-8961-c0cee7720a94@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Howard Chu	7fbff3c0e0	perf build: Change the clang check back to 12.0.1 This serves as a revert for this patch: https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/ Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Howard Chu	395d38419f	perf trace augmented_raw_syscalls: Add more checks to pass the verifier Add some more checks to pass the verifier in more kernels. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com [ Reduced the patch removing things that can be done later ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo	ecabac70ff	perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise kernels go, have backports from modern kernels, the verifier complains about lack of bounds check for the index into the array of syscall arguments, on a BPF bytecode generated by clang 17, with: ; } else if (size < 0 && size >= -6) { /* buffer */ 116: (b7) r1 = -6 117: (2d) if r1 > r6 goto pc-30 R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value ; index = -(size + 1); 118: (a7) r6 ^= -1 119: (67) r6 <<= 32 120: (77) r6 >>= 32 ; aug_size = args->args[index]; 121: (67) r6 <<= 3 122: (79) r1 = *(u64 *)(r10 -24) 123: (0f) r1 += r6 last_idx 123 first_idx 116 regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24) regs=40 stack=0 before 121: (67) r6 <<= 3 regs=40 stack=0 before 120: (77) r6 >>= 32 regs=40 stack=0 before 119: (67) r6 <<= 32 regs=40 stack=0 before 118: (a7) r6 ^= -1 regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30 regs=42 stack=0 before 116: (b7) r1 = -6 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value parent didn't have regs=40 stack=0 marks last_idx 110
first_idx 98 regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5 regs=42 stack=0 before 109: (b7) r1 = 1 regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7 regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value parent already had regs=40 stack=0 marks 124: (79) r6 = *(u64 *)(r1 +16) R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value R1 unbounded memory access, make sure to bounds check any such access processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3 If we add this line, as used in other BPF programs, to cap that index: index &= 7; The generated BPF program is considered safe by that version of the BPF verifier, allowing perf to collect the syscall args in one more kernel using the BPF based pointer contents collector.
With the above one-liner it works with that kernel: [root@dell-per740-01 ~]# uname -a Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux [root@dell-per740-01 ~]# ~acme/bin/perf trace -e sleep* sleep 1.234567890 0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }) = 0 [root@dell-per740-01 ~]# As well as with the one in Fedora 40: root@number:~# uname -a Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux root@number:~# perf trace -e sleep sleep 1.234567890 0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0 root@number:~# Song Liu reported that this one-liner was being optimized out by clang 18, so I suggested and he tested that adding a compiler barrier before it made clang v18 to keep it and the verifier in the kernel in Song's case (Meta's 5.12 based kernel) also was happy with the resulting bytecode. I'll investigate using virtme-ng[1] to have all the perf BPF based functionality thoroughly tested over multiple kernels and clang versions. [1] https://kernel-recipes.org/en/2024/virtme-ng/ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <andrea.righi@linux.dev> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Namhyung Kim	36fae9f93e	perf test: Add precise_max subtest to the perf record shell test It's a very simple test just to run with cycles:P and instructions:P events. Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	634d36f825	perf record: Just use "cycles:P" as the default event The fallback logic can add ":u" modifier if needed. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	af954f76ee	perf tools: Check fallback error and order The perf_event_open might fail due to various reasons, so blindly reducing precise_ip level might not be the best way to deal with it. It seems the kernel return -EOPNOTSUPP when PMU doesn't support the given precise level. Let's try again with the correct error code. This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but user events with exclude_kernel=1 cannot make progress. Let's add the evsel__handle_error_quirks() to this case specially. I plan to work on the kernel side to improve this situation but it'd still need some special handling for IBS. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	28398ce172	perf tools: Move x86__is_amd_cpu() to util/env.c It can be called from non-x86 platform so let's move it to the general util directory. Also add a new helper perf_env__is_x86_amd_cpu() so that it can be called with an existing perf_env as well. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:07 -07:00
Namhyung Kim	3b193a57ba	perf tools: Detect missing kernel features properly The evsel__detect_missing_features() is to check if the attributes of the evsel is supported or not. But it checks the attribute based on the given evsel, it might miss something if the attr doesn't have the bit or give incorrect results if the event is special. Also it maintains the order of the feature that was added to the kernel which means it can assume older features should be supported once it detects the current feature is working. To minimize the confusion and to accurately check the kernel features, I think it's better to use a software event and go through all the features at once. Also make the function static since it's only used in evsel.c. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	88bc63d00e	perf tools: Do not set exclude_guest for precise_ip It seems perf sets the exclude_guest bit because of Intel PEBS implementation which uses a virtual address. IIUC now kernel disables PEBS when it goes to the guest mode regardless of this bit so we don't need to set it explicitly. At least for the other archs/vendors. I found the commit `1342798cc1` set the exclude_guest for precise_ip in the tool and the commit `20b279ddb3` added kernel side enforcement which was reverted by commit `a706d965dc` later. Actually it doesn't set the exclude_guest for the default event (cycles:P) already. $ grep -m1 vendor /proc/cpuinfo vendor_id : GenuineIntel $ perf record -e cycles:P true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v \| tr ',' '\n' \| grep -e exclude -e precise precise_ip: 3 But having lower 'p' modifier set the bit for some reason. $ perf record -e cycles:pp true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v \| tr ',' '\n' \| grep -e exclude -e precise precise_ip: 2 exclude_guest: 1 Actually AMD IBS suffers from this because it doesn't support excludes and having this bit effectively disables new features in the current implementation (due to the missing feature check). $ grep -m1 vendor /proc/cpuinfo vendor_id : AuthenticAMD $ perf record -W -e cycles:p -vv true 2>&1 \| grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host By not setting exclude_guest, we can fix this inconsistency and the troubles. 
Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	d9e0970f77	perf tools: Simplify evsel__add_modifier() Since it doesn't set the exclude_guest, no need to special handle the bit and simply show only if one of host or guest bit is set. Now the default event name might not have :H prefix anymore so change the dlfilter test not to compare the ":" at the end. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	35c8d21371	perf tools: Don't set attr.exclude_guest by default The exclude_guest in the event attribute is to limit profiling in the host environment. But I'm not sure why we want to set it by default cause we don't care about it in most cases and I feel like it just makes new PMU implementation complicated. Of course it's useful for perf kvm command so I added the exclude_GH_default variable to preserve the old behavior for perf kvm and other commands like perf record and stat won't set the exclude bit. This is helpful for AMD IBS case since having exclude_guest bit will clear new feature bit due to the missing feature check logic. $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = 0 $ perf record -W -e ibs_op// -vv true 2>&1 \| grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host Interestingly, I found it sets the exclude_bit if "u" modifier is used. I don't know why but it's neither intuitive nor consistent. Let's remove the bit there too. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	bb6e7cb11d	perf tools: Add fallback for exclude_guest Commit `7b100989b4` ("perf evlist: Remove __evlist__add_default") changed to parse "cycles:P" event instead of creating a new cycles event for perf record. But it also changed the way how modifiers are handled so it doesn't set the exclude_guest bit by default. It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP if not. Let's add a fallback so that it can work with default events. Also update perf stat hybrid tests to handle possible u or H modifiers. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-2-namhyung@kernel.org Fixes: `7b100989b4` ("perf evlist: Remove __evlist__add_default") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:51:22 -07:00
Brian Geffon	3e2d4df574	perf tools: sched-pipe bench: add (-n) nonblocking benchmark The -n mode will benchmark pipes in a non-blocking mode using epoll_wait. This specific mode was added to demonstrate the broken sync nature of epoll: https://lore.kernel.org/lkml/20240426-zupfen-jahrzehnt-5be786bcdf04@brauner Signed-off-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20241016190009.866615-1-bgeffon@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:23:01 -07:00
Arnaldo Carvalho de Melo	915a377627	perf test: Document the -w/--workload option Wasn't documented so far, mention that it is mostly used in the shell regression tests. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-4-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:50 -07:00
Arnaldo Carvalho de Melo	13c138308d	perf test: Introduce --list-workloads to list the available workloads Using it: $ perf test -w noplop No workload found: noplop $ $ perf test -w Error: switch `w' requires a value Usage: perf test [<options>] [{list <test-name-fragment>\|[<test-name-fragments>\|<test-numbers>]}] -w, --workload <work> workload to run for testing, use '--list-workloads' to list the available ones. $ $ perf test --list-workloads noploop thloop leafloop sqrtloop brstack datasym landlock $ Would be good at some point to have a description in 'struct test_workload'. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:33 -07:00
Arnaldo Carvalho de Melo	18b63d63cd	perf test: Introduce workloads__for_each() And use it in run_workload(). Testing it: root@x1:~# perf trace -e landlock perf test -w landlock 0.000 ( 0.015 ms): :1274331/1274331 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd3fea55e0, flags: 45) = -1 EINVAL (Invalid argument) 0.018 ( 0.003 ms): :1274331/1274331 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd3fea55f0, flags: 45) = -1 EINVAL (Invalid argument) root@x1:~# perf test -w bla No workload found: bla root@x1:~# Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:06 -07:00
Sandipan Das	46610ba41e	perf vendor events amd: Update Zen 5 data cache fill events For events that count data cache fills, some combinations of the unit mask bits are useful for counting fills from local caches, DRAM or any far sources. However, named events currently exist for PMCx044 (Any Data Cache Fills) only. Add similar events for the following base events. * PMCx043 (Demand Data Cache Fills) * PMCx059 (Software Prefetch Data Cache Fills) * PMCx05A (Hardware Prefetch Data Cache Fills) While at it, remove "ls_any_fills_from_sys.all_dram_io" since it is a duplicate of "ls_any_fills_from_sys.dram_io_all". Event descriptions can be found in Section 2.1.16.5.2 "Load/Store (LS) Events" of the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/e036e3c9fb962c939fa06c855b68e532ee609e01.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Sandipan Das	17aedce6e0	perf vendor events amd: Add Zen 5 data fabric metrics Add data fabric metrics taken from Section 2.1.16.2 "Performance Measurement" in the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. The recommended metrics are sourced from Table 28 "Guidance for Common Performance Statistics with Complex Event Selects". They capture data bandwidth for various links and interfaces in the data fabric. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/e8757bb9f511907a52bc182de9395c5edec2fccf.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Sandipan Das	f101a8e345	perf vendor events amd: Add Zen 5 data fabric events Add data fabric events taken from Section 2.1.16.2 "Performance Measurement" in the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. This constitutes events which capture the flow of data beats at various links and interfaces in the data fabric. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/198049e27366f3980e4991b95cec5eaac6d31d75.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Thomas Richter	21677f653f	perf test: Fix perf test case 84 on s390 Perf test case 84 'perf pipe recording and injection test' sometimes fails on s390, especially on z/VM virtual machines. This is caused by a very short run time of workload # perf test -w noploop which runs for 1 second. Occasionally this is not long enough and the perf report has no samples for symbol noploop. Fix this and enlarge the runtime for the perf work load to 3 seconds. This ensures the symbol noploop is always present. Since only s390 is affected, make this loop architecture dependent. Output before: Inject -b build-ids test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.277 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.160 MB /tmp/perf.data.ELzRdq (4031 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] Inject -b build-ids test [Success] Inject --buildid-all build-ids test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB - ] Inject --buildid-all build-ids test [Failed - cannot find noploop function in pipe #2] Output after: Successful execution for over 10 times in a loop.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Link: https://lore.kernel.org/r/20241018081732.1391060-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:39:54 -07:00
Namhyung Kim	e2cb1db7da	perf test: Update all metrics test like metricgroups test Like in the metricgroup tests, it should check the permission first and then skip relevant failures accordingly. Also it needs to try again with the system wide flag properly. On the second round, check if the result has the metric name because other failure cases are checked in the first round already. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241018204306.741972-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:34:56 -07:00
Ian Rogers	5455d89bf3	perf build: Rename CONFIG_DWARF to CONFIG_LIBDW In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that CONFIG_DWARF is really just defined when libdw is present by renaming to CONFIG_LIBDW. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	8838abf626	perf build: Rename HAVE_DWARF_SUPPORT to HAVE_LIBDW_SUPPORT In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	5eb2242513	perf libdw: Remove unnecessary defines As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always match HAVE_DWARF_SUPPORT remove the macros and use HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF then remove all ifs. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	91e81e988f	perf probe: Move elfutils support check to libdw check The test _ELFUTILS_PREREQ(0, 142) is false for elfutils before 2009-06-13, but that is 15 years ago and very unlikely. Add a test to test-libdw.c and assume the libdw version is at least 0.142 to simplify the build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	26385fd237	perf build: Combine test-dwarf-getcfi into test-libdw dwarf_getcfi support in libdw is 15 years old. Make libdw imply dwarf_getcfi support and simplify build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	23580d7bb1	perf build: Combine test-dwarf-getlocations into test-libdw dwarf_getlocations support in libdw is more than 10 years old. Make libdw imply dwarf_getlocations support and simplify build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	3034b48a4b	perf build: Combine libdw-dwarf-unwind into libdw feature tests Support in libdw has been present for 10 years so let's simplify the build logic with a single feature test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	7c943261a1	perf build: Rename test-dwarf to test-libdw Be more intention revealing that the dwarf test is actually testing for libdw support. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	a6c55df973	perf build: Remove defined but never used variable Previously NO_DWARF_UNWIND was part of conditional compilation but it is now unused so remove. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	54a1368567	perf build: Rename NO_DWARF to NO_LIBDW NO_DWARF could mean more than NO_LIBDW support, in particular no libunwind support. Rename to be more intention revealing. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	a9823dae4c	perf build: Fix LIBDW_DIR Testing with a LIBDW_DIR showed that in Makefile.config the dwarf feature tests need the LIBDW_DIR setting in the CFLAGS/LDFLAGS. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	8296aa0f28	perf test: Move attr files into shell directory where they are used Now the attr tests are shell tests move the associated python and configuration files. Update the installation build rules for the new directories. Recycle the lib install rules for python files allowing the explicit attr.py install line to be dropped. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Ian Rogers	3a447031f5	perf test: Remove C test wrapper for attr.py Remove the C wrapper now a shell script wrapper exists. Move perf_event_attr dumping functions to evsel.c and reduce the scope of variables/defines. Use fprintf to avoid snprintf complexities in WRITE_ASS. Add __SANE_USERSPACE_TYPES__ to evsel.c to fix format flag issues on PowerPC triggered by moving attr.c functions to evsel.c. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Ian Rogers	8519e4f44c	perf test: Add a shell wrapper for "Setup struct perf_event_attr" The "Setup struct perf_event_attr" test in attr.c does a bunch of directory finding to set up running a python test that in general is more brittle than similar logic we have in shell tests. Add a shell test that invokes and runs the tests in the python attr.py script. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Leo Yan	314909f13c	perf probe: Correct demangled symbols in C++ program An issue can be observed when probing a C++ demangled symbol with the following steps: # nm test_cpp_mangle | grep print_data 0000000000000c94 t _GLOBAL__sub_I__Z10print_datai 0000000000000afc T _Z10print_datai 0000000000000b38 T _Z10print_dataR5Point # perf probe -x /home/niayan01/test_cpp_mangle -F --demangle ... print_data(Point&) print_data(int) ... # perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)" probe-definition(0): test=print_data(int) symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(int) address found : afc Matched function: print_data [2ccf] Probe point found: print_data+0 Found 1 probe_trace_events. Opening /sys/kernel/tracing//uprobe_events write=1 Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xb38 ... When probing the symbol "print_data(int)", the log shows: Symbol print_data(int) address found : afc The address found is 0xafc, which matches the nm output. However, when the event is written, the command uses offset 0xb38 (see the last log line), which is the wrong address. dwarf_diename() returns the common function name; in the above case it returns the string "print_data". As a result, the tool resolves the offset from the common name, which places the probe on the wrong symbol, "print_data(Point&)". To fix the issue, use the die_get_linkage_name() function to retrieve the distinct linkage name, which is the mangled name in the C++ case. Based on this unique name, the tool can get the correct offset for probing. According to the DWARF documentation, the linkage name may be missing from the DIE; in that case, fall back to dwarf_diename().
After: # perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)" probe-definition(0): test=print_data(int) symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(int) address found : afc Matched function: print_data [2d06] Probe point found: print_data+0 Found 1 probe_trace_events. Opening /sys/kernel/tracing//uprobe_events write=1 Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xafc Added new event: probe_test_cpp_mangle:test (on print_data(int) in /home/niayan01/test_cpp_mangle) You can now use it in all perf tools, such as: perf record -e probe_test_cpp_mangle:test -aR sleep 1 # perf --debug verbose=3 probe -x test_cpp_mangle --add "test2=print_data(Point&)" probe-definition(0): test2=print_data(Point&) symbol:print_data(Point&) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(Point&) address found : b38 Matched function: print_data [2ccf] Probe point found: print_data+0 Found 1 probe_trace_events. 
Opening /sys/kernel/tracing//uprobe_events write=1 Parsing probe_events: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0x0000000000000afc Group:probe_test_cpp_mangle Event:test probe:p Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test2 /home/niayan01/test_cpp_mangle:0xb38 Added new event: probe_test_cpp_mangle:test2 (on print_data(Point&) in /home/niayan01/test_cpp_mangle) You can now use it in all perf tools, such as: perf record -e probe_test_cpp_mangle:test2 -aR sleep 1 Fixes: `fb1587d869` ("perf probe: List probes with line number and file name") Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012141432.877894-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:55:48 -07:00
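The name-selection rule described in the commit above can be illustrated with a toy model. This is only a sketch in Python, not the perf implementation: the symbol table is copied from the nm output in the commit message, while `pick_name`, `resolve`, and the substring matching rule are hypothetical stand-ins for die_get_linkage_name(), dwarf_diename(), and the real symbol lookup.

```python
# Symbol table copied from the nm output in the commit message.
SYMBOLS = {
    "_Z10print_datai": 0xafc,        # print_data(int)
    "_Z10print_dataR5Point": 0xb38,  # print_data(Point&)
}


def pick_name(linkage_name, die_name):
    """Prefer the linkage (mangled) name; fall back when the DIE lacks one.

    Hypothetical helper mirroring the fallback described in the commit.
    """
    return linkage_name if linkage_name else die_name


def resolve(name):
    """Return the addresses of every symbol the given name could refer to."""
    if name in SYMBOLS:
        # A mangled name is unique, so exactly one address comes back.
        return [SYMBOLS[name]]
    # Toy matching rule (an assumption): a plain name such as "print_data"
    # matches every overload whose mangled name contains it.
    return sorted(addr for sym, addr in SYMBOLS.items() if name in sym)


if __name__ == "__main__":
    # With a linkage name the lookup is unambiguous.
    print([hex(a) for a in resolve(pick_name("_Z10print_datai", "print_data"))])
    # Without one, the common name matches both overloads.
    print([hex(a) for a in resolve(pick_name(None, "print_data"))])
```

The point of the sketch is only that the common name cannot distinguish overloads, whereas the mangled name can.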
Ian Rogers	17df33fe22	perf stat: Disable metric thresholds for CSV and JSON metric-only mode These modes don't use the threshold, so don't compute it saving time and potentially reducing events. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	f9825601aa	perf stat: Add metric-threshold to json output When the threshold isn't unknown add a value to the json like: "metric-threshold" : "good" A more complete example: ``` $ perf stat -a -j -I 1000 {"interval" : 1.001089747, "counter-value" : "16045.281449", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 16045355135, "pcnt-running" : 100.00, "metric-value" : "16.045281", "metric-unit" : "CPUs utilized"} {"interval" : 1.001089747, "counter-value" : "10003.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 16045314844, "pcnt-running" : 100.00, "metric-value" : "623.423156", "metric-unit" : "/sec"} {"interval" : 1.001089747, "counter-value" : "328.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 16045321403, "pcnt-running" : 100.00, "metric-value" : "20.442147", "metric-unit" : "/sec"} {"interval" : 1.001089747, "counter-value" : "20114.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 16045355927, "pcnt-running" : 100.00, "metric-value" : "1.253577", "metric-unit" : "K/sec"} {"interval" : 1.001089747, "counter-value" : "4066679471.000000", "unit" : "", "event" : "instructions", "event-runtime" : 16045369123, "pcnt-running" : 100.00, "metric-value" : "1.628330", "metric-unit" : "insn per cycle"} {"interval" : 1.001089747, "counter-value" : "2497454658.000000", "unit" : "", "event" : "cycles", "event-runtime" : 16045374810, "pcnt-running" : 100.00, "metric-value" : "0.155650", "metric-unit" : "GHz"} {"interval" : 1.001089747, "counter-value" : "914974294.000000", "unit" : "", "event" : "branches", "event-runtime" : 16045379877, "pcnt-running" : 100.00, "metric-value" : "57.024509", "metric-unit" : "M/sec"} {"interval" : 1.001089747, "counter-value" : "9237201.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 16045375017, "pcnt-running" : 100.00, "metric-value" : "1.009559", "metric-unit" : "of all branches", "metric-threshold" : "good"} {"interval" : 1.001089747, 
"event-runtime" : 16045397172, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"} {"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"interval" : 1.001089747, "metric-value" : "36.729687", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"} ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
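A script can consume the "metric-threshold" field described in the commit above. The following is a minimal Python sketch, assuming each output line is a standalone JSON object as in the sample; the three sample lines are copied from the commit message, and the helper name `bad_metrics` is hypothetical.

```python
import json

# Three lines copied from the sample output in the commit message above.
SAMPLE = """\
{"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"}
"""


def bad_metrics(text):
    """Return (metric-unit, value) pairs whose threshold is "bad".

    Hypothetical helper; it assumes one JSON object per line.
    """
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # The field is omitted when the threshold is unknown, so use .get().
        if record.get("metric-threshold") == "bad":
            found.append((record["metric-unit"], float(record["metric-value"])))
    return found


if __name__ == "__main__":
    for unit, value in bad_metrics(SAMPLE):
        print(unit, value)
```

Using `.get()` keeps the script working for records such as tma_retiring, where no threshold is reported.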
Ian Rogers	37b77ae954	perf stat: Change color to threshold in print_metric Colors don't mean things in CSV and JSON output, switch to a threshold enum value that the standard output can convert to a color. Updating the CSV and JSON output will be later changes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	e1cc918b6c	perf stat: Drop metric-unit if unit is NULL Avoid cases like: ``` $ perf stat -a -M topdownl1 -j -I 1000 ... {"interval" : 11.127757275, "counter-value" : "85715898.000000", "unit" : "", "event" : "IDQ.MITE_UOPS", "event-runtime" : 988376123, "pcnt-running" : 100.00, "metric-value" : "0.000000", "metric-unit" : "(null)"} ... ``` If there is no unit then drop the metric-value too as: Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	1133e7f7dc	perf stat: Display "none" for NaN with metric only json Return earlier for an empty unit case. If snprintf of the fmt doesn't produce digits between vals and ends, as happens with NaN, make the value "none" as happens in print_metric_end. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	9809b2b1f2	perf stat: Fix/add parameter names for print_metric The print_metric parameter names were rearranged, fix and add comments in the stat-shadow callers to ensure they are correct. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	58fc358a3e	perf color: Add printf format checking and resolve issues Add printf format checking to vararg printf routines in color.h. Resolve build errors/bugs that are found through this checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	4585038b8e	perf probe: Fix libdw memory leak Add missing dwarf_cfi_end to free memory associated with probe_finder cfi_eh which is allocated and owned via a call to dwarf_getcfi_elf. Confusingly cfi_dbg shouldn't be freed as its memory is owned by the passed in debuginfo struct. Add comments to highlight this. This addresses leak sanitizer issues seen in: tools/perf/tests/shell/test_uprobe_from_different_cu.sh Fixes: `270bde1e76` ("perf probe: Search both .eh_frame and .debug_frame sections for probe location") Signed-off-by: Ian Rogers <irogers@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241016235622.52166-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:43:14 -07:00
Ian Rogers	1280f012e0	perf disasm: Fix capstone memory leak The insn argument passed to cs_disasm needs freeing. To support accurately having count, add an additional free_count variable. Fixes: `c5d60de181` ("perf annotate: Add support to use libcapstone in powerpc") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: David S. Miller <davem@davemloft.net> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241016235622.52166-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:43:14 -07:00
Athira Rajeev	54f9aa1092	tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events perf list picks the events supported for specific platform from pmu-events/arch/powerpc/<platform>. For example, power10 events are in pmu-events/arch/powerpc/power10, power9 events are part of pmu-events/arch/powerpc/power9. The decision of which platform to pick is determined based on PVR value in powerpc. The PVR value is matched from pmu-events/arch/powerpc/mapfile.csv Example: Format: PVR,Version,JSON/file/pathname,Type 0x004[bcd][[:xdigit:]]{4},1,power8,core 0x0066[[:xdigit:]]{4},1,power8,core 0x004e[[:xdigit:]]{4},1,power9,core 0x0080[[:xdigit:]]{4},1,power10,core 0x0082[[:xdigit:]]{4},1,power10,core The code gets the PVR from system using get_cpuid_str function in arch/powerpc/util/headers.c ( from SPRN_PVR ) and compares with value from mapfile.csv In case of compat mode, say when partition is booted in a power9 mode when the system is a power10, this picks incorrectly, because the PVR will point to power10 whereas it should pick events from the power9 folder. To support generic events, add new folder pmu-events/arch/powerpc/compat to contain the ISA architected events which are supported in compat mode. Also return 0x00ffffff as pvr when booted in compat mode. Based on this pvr value, json will pick events from pmu-events/arch/powerpc/compat Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241010145107.51211-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 11:25:00 -07:00
Athira Rajeev	86f45d0f17	tools/perf/pmu-events/powerpc: Add support for compat events in json perf list picks the events supported for specific platform from pmu-events/arch/powerpc/<platform>. For example, power10 events are in pmu-events/arch/powerpc/power10, power9 events are part of pmu-events/arch/powerpc/power9. The decision of which platform to pick is determined based on PVR value in powerpc. The PVR value is matched from pmu-events/arch/powerpc/mapfile.csv Example: Format: PVR,Version,JSON/file/pathname,Type 0x004[bcd][[:xdigit:]]{4},1,power8,core 0x0066[[:xdigit:]]{4},1,power8,core 0x004e[[:xdigit:]]{4},1,power9,core 0x0080[[:xdigit:]]{4},1,power10,core 0x0082[[:xdigit:]]{4},1,power10,core The code gets the PVR from system using get_cpuid_str function in arch/powerpc/util/headers.c ( from SPRN_PVR ) and compares with value from mapfile.csv In case of compat mode, say when partition is booted in a power9 mode when the system is a power10, add an entry to pick the ISA architected events from "pmu-events/arch/powerpc/compat". Add json file generic-events.json which will contain the events which are supported in compat mode. Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241010145107.51211-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 11:24:52 -07:00
Veronika Molnarova	05a62936e6	perf dso: Fix symtab_type for kmod compression During the rework of the dso structure in patch `ee756ef749` an increment was forgotten for the symtab_type in case the data for the kernel module are compressed. This affects the probing of the kernel modules, which fails if the data are not already cached. Increment the value of the symtab_type to its compressed variant so the data could be recovered successfully. Fixes: `ee756ef749` ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20241010144836.16424-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:59 -07:00
Leo Yan	e34f6ac511	perf probe: Improve log for long event name failure If a symbol name is longer than the maximum event length (64 bytes), the perf tool reports error: # perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)" snprintf() failed: -7; the event name nbase='this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long Error: Failed to add events. The log omits the information that the symbol name and the event name can be set separately, which is recommended when adding a probe for a long symbol. This commit refines the log to remind users of the event syntax. After: # perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)" snprintf() failed: -7; the event name 'this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long Hint: Set a shorter event with syntax "EVENT=PROBEDEF" EVENT: Event name (max length: 64 bytes). Error: Failed to add events. Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012204725.928794-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:59 -07:00
Leo Yan	6768faf9b7	perf probe: Check group string length In the kernel, the probe group string length is limited up to MAX_EVENT_NAME_LEN (including the NULL terminator). Check for this limitation and report an error if it is exceeded. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20241012204725.928794-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Leo Yan	d08e3f14e8	perf probe: Use the MAX_EVENT_NAME_LEN macro The MAX_EVENT_NAME_LEN macro has been defined in the kernel. Use the same definition in the tool for better readability. Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012204725.928794-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Namhyung Kim	3662f82f16	perf test: Speed up some tests using perf list On my system, perf list is very slow to print all the events. I think there's a performance issue in SDT and uprobes event listing. I noticed this issue while running perf test on x86, where it takes a long time to check some CoreSight event which should be skipped quickly. Anyway, some tests use perf list to check whether the required event is available before running the test. The perf list command can take an argument to specify event class or (glob) pattern. But glob pattern is only to suppress output for unmatched ones after checking all events. In this case, specifying event class is better to reduce the number of events it checks and to avoid buggy subsystems entirely. No functional changes intended. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20241016065654.269994-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Arnaldo Carvalho de Melo	39c6a35620	perf trace: The return from 'write' isn't a pid When adding an explicit beautifier for the 'write' syscall when the BPF based buffer collector was introduced there was a cut'n'paste error that carried the syscall_fmt->errpid setting from a nearby syscall (waitid) that returns a pid. So the write return was being suppressed by the return pretty printer, remove that field, reverting it back to the default return handler, that prints positive numbers as-is and interprets negative values as errnos. I actually introduced the problem while making Howard's original patch work just with the 'write' syscall, as we couldn't just look for any buffers, the ones that are filled in by the kernel couldn't use the same sys_enter BPF collector. Fixes: `b257fac12f` ("perf trace: Pretty print buffer data") Reported-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/lkml/bcf50648-3c7e-4513-8717-0d14492c53b9@linaro.org Link: https://lore.kernel.org/all/Zt8jTfzDYgBPvFCd@x1/#t Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-17 10:34:43 -03:00
Dapeng Mi	fbc798316b	perf x86/topdown: Refine helper arch_is_topdown_metrics() Leverage the existing function perf_pmu__name_from_config() to check if an event is a topdown metrics event. perf_pmu__name_from_config() goes through the defined formats and figures out the config of pre-defined topdown events. This avoids figuring out the config of pre-defined topdown events with hard-coded format strings "event=" and "umask=" and provides more flexibility. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241011110207.1032235-2-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:36:47 -07:00
Dapeng Mi	b68b5b36c7	perf x86/topdown: Make topdown metrics comparators be symmetric The commit "3b5edc0421e2 (perf x86/topdown: Don't move topdown metric events in group)" modifies the topdown metrics comparator to move topdown metrics events which are not in the same group as the previous event. But it only modifies the 2nd comparator and causes the comparators to become asymmetric. Thus modify the 1st topdown metrics comparator to make the two comparators symmetric, and refine the comments as well. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241011110207.1032235-1-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:36:41 -07:00
Ian Rogers	42fd7cac57	perf tool_pmu: Remove duplicate io.h header Remove duplicate inclusion of api/io.h. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410131417.ynhvnEJb-lkp@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241016160413.51587-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:35:04 -07:00
Leo Yan	ea2ead4224	perf arm-spe: Add Cortex CPUs to common data source encoding list Add Cortex-A720, Cortex-A725, Cortex-X1C, Cortex-X3 and Cortex-X925 into the common data source encoding list. For every one of these CPUs, its technical reference manual defines the data source packet as the common encoding format. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-8-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:32 -07:00
Besar Wicaksono	041c0e5715	perf arm-spe: Add Neoverse-V2 to common data source encoding list Add Neoverse-V2 MIDR to the common data source encoding range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:32 -07:00
Leo Yan	6bcf54c89b	perf arm-spe: Remove the unused 'midr' field The 'midr' field is replaced by the MIDR values stored in metadata (per CPU wise). Remove the 'midr' field as it is no longer used. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	ba5e7169e5	perf arm-spe: Use metadata to decide the data source feature Use the info in the metadata to decide if the data source feature is supported. The CPU MIDR must be in the CPU list for the common data source encoding. Metadata version 1 doesn't include info for MIDR. In this case, because the info needed to make the decision is absent, print a warning reminding users to upgrade the tool and return false. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	56ae663e76	perf arm-spe: Introduce arm_spe__is_homogeneous() Introduce the arm_spe__is_homogeneous() function, which is used to check whether Arm SPE is homogeneous across all CPUs. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	50b8f1d5bf	perf arm-spe: Rename the common data source encoding The Neoverse CPUs follow the common data source encoding, and other CPU variants can share the same format. Rename the CPU list and data source definitions as common data source names. This change prepares for appending more CPU variants. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	fb98fa3bf8	perf arm-spe: Rename arm_spe__synth_data_source_generic() The arm_spe__synth_data_source_generic() function is invoked when the tool detects that CPUs do not support data source packets and falls back to synthesizing only the memory level. Rename it to arm_spe__synth_memory_level() for better reflecting its purpose. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Howard Chu	0c383c0827	perf test: Delete unused Intel CQM test As Ian Rogers <irogers@google.com> pointed out, intel-cqm.c is neither used nor built. It was deleted in the following commit: commit `b24413180f` ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license") However, it resurfaced soon after in the following commit: commit `5c9295bfe6` ("perf tests: Remove Intel CQM perf test") It should be deleted once and for all. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Matt Fleming <mfleming@cloudflare.com> Link: https://lore.kernel.org/r/20241011055700.4142694-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	1afe05b0cf	perf evsel: Fix missing inherit + sample read check It should not clear the inherit bit simply because the kernel doesn't support the sample read with it. IOW the inherit bit should be kept when the sample read is not requested for the event. Fixes: `90035d3cd8` ("tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events") Acked-by: Ben Gainey <ben.gainey@arm.com> Link: https://lore.kernel.org/r/20241009062250.730192-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Madadi Vineeth Reddy	cd912ab3b6	perf sched timehist: Add pre-migration wait time option pre-migration wait time is the time that a task unnecessarily spends on the runqueue of a CPU but doesn't get switched-in there. In terms of tracepoints, it is the time between sched:sched_wakeup and sched:sched_migrate_task. Let's say a task woke up on CPU2, then it got migrated to CPU4 and then it's switched-in to CPU4. So, here pre-migration wait time is time that it was waiting on runqueue of CPU2 after it is woken up. The general pattern for pre-migration to occur is: sched:sched_wakeup sched:sched_migrate_task sched:sched_switch The sched:sched_waking event is used to capture the wakeup time, as it aligns with the existing code and only introduces a negligible time difference. pre-migrations are generally not useful and they increase migrations. This metric would be helpful in testing patches mainly related to wakeup and load-balancer code paths as better wakeup logic would choose an optimal CPU where task would be switched-in and thereby reducing pre-migrations. 
The sample output(s) when -P or --pre-migrations is used: ================= time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 38456.720806 [0001] schbench[28634/28574] 4.917 4.768 1.004 0.000 38456.720810 [0001] rcu_preempt[18] 3.919 0.003 0.004 0.000 38456.721800 [0006] schbench[28779/28574] 23.465 23.465 1.999 0.000 38456.722800 [0002] schbench[28773/28574] 60.371 60.237 3.955 60.197 38456.722806 [0001] schbench[28634/28574] 0.004 0.004 1.996 0.000 38456.722811 [0001] rcu_preempt[18] 1.996 0.005 0.005 0.000 38456.723800 [0000] schbench[28833/28574] 4.000 4.000 3.999 0.000 38456.723800 [0004] schbench[28762/28574] 42.951 42.839 3.999 39.867 38456.723802 [0007] schbench[28812/28574] 43.947 43.817 3.999 40.866 38456.723804 [0001] schbench[28587/28574] 7.935 7.822 0.993 0.000 Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	af3902bfc1	perf tools: Remove unnecessary parentheses The hashmap API used to require parentheses for the hashmap argument if it's not a pointer type. It's now fixed so let's drop the parentheses. Link: https://lore.kernel.org/r/20241009202009.884884-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	04042674b2	perf tools: Fix possible compiler warnings in hashmap The hashmap__for_each_entry[_safe] is accessing 'map' as if it's a pointer. But it does so without parentheses, so passing a static hash map with an ampersand (like &slab_hash below) caused compiler warnings due to unmatched types. In file included from util/bpf_lock_contention.c:5: util/bpf_lock_contention.c: In function ‘exit_slab_cache_iter’: linux/tools/perf/util/hashmap.h:169:32: error: invalid type argument of ‘->’ (have ‘struct hashmap’) 169 | for (bkt = 0; bkt < map->cap; bkt++) \ | ^~ util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’ 105 | hashmap__for_each_entry(&slab_hash, cur, bkt) | ^~~~~~~~~~~~~~~~~~~~~~~ /home/namhyung/project/linux/tools/perf/util/hashmap.h:170:31: error: invalid type argument of ‘->’ (have ‘struct hashmap’) 170 | for (cur = map->buckets[bkt]; cur; cur = cur->next) | ^~ util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’ 105 | hashmap__for_each_entry(&slab_hash, cur, bkt) | ^~~~~~~~~~~~~~~~~~~~~~~ Cc: bpf@vger.kernel.org Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241009202009.884884-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	77b679453d	Linux 6.12-rc3 Merge tag 'v6.12-rc3' into perf-tools-next To get the fixes in the current perf-tools tree. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:45:28 -07:00
Namhyung Kim	1a3d6a9723	perf tools: Fix compiler error in util/tool_pmu.c util/tool_pmu.c: In function 'evsel__tool_pmu_read': util/tool_pmu.c:419:55: error: passing argument 2 of 'tool_pmu__read_event' from incompatible pointer type [-Werror=incompatible-pointer-types] 419 | if (!tool_pmu__read_event(ev, &val)) { | ^~~~ | | | long unsigned int * util/tool_pmu.c:335:56: note: expected 'u64 *' {aka 'long long unsigned int *'} but argument is of type 'long unsigned int *' 335 | bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result) | ~~~~~^~~~~~ Link: https://lore.kernel.org/r/Zw1XIGML32VaxE0t@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:40:30 -07:00
Athira Rajeev	9ea671d1b2	tools/perf/tests: Remove duplicate evlist__delete in tests/tool_pmu.c The testcase for tool_pmu failed in powerpc as below: ./perf test -v "Parsing without PMU name" 8: Tool PMU : 8.1: Parsing without PMU name : FAILED! This happens when parse_events results in either skip or fail of an event. Because the code invokes evlist__delete(evlist) and "goto out". ret = parse_events(evlist, str, &err); if (ret) { evlist__delete(evlist); But in the "out" section also evlist__delete happens. out: evlist__delete(evlist); return ret; Hence remove the duplicate evlist__delete from the first path in the testcase With the change: # ./perf test -v "Parsing without PMU name" 8: Tool PMU : 8.1: Parsing without PMU name : Ok Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241013170732.71339-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:29:55 -07:00
Athira Rajeev	d94d86cee1	tools/perf/tests: Fix compilation error with strncpy in tests/tool_pmu perf fails to compile on systems with GCC version 11 as below: In file included from /usr/include/string.h:519, from /home/athir/perf-tools-next/tools/include/linux/bitmap.h:5, from /home/athir/perf-tools-next/tools/perf/util/pmu.h:5, from /home/athir/perf-tools-next/tools/perf/util/evsel.h:14, from /home/athir/perf-tools-next/tools/perf/util/evlist.h:14, from tests/tool_pmu.c:3: In function ‘strncpy’, inlined from ‘do_test’ at tests/tool_pmu.c:25:3: /usr/include/bits/string_fortified.h:95:10: error: ‘__builtin_strncpy’ specified bound 128 equals destination size [-Werror=stringop-truncation] 95 | return __builtin___strncpy_chk (__dest, __src, __len, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 96 | __glibc_objsize (__dest)); | ~~~~~~~~~~~~~~~~~~~~~~~~~ The compile error is from the strncpy reference in do_test: strncpy(str, tool_pmu__event_to_str(ev), sizeof(str)); This behaviour is not observed with GCC version 8, but is observed with GCC version 11. This is the message from gcc for detecting truncation while using strncpy. Use snprintf instead of strncpy here to be safe. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241013173742.71882-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:29:14 -07:00
Thomas Falcon	48966a5a48	perf report: Display columns Predicted/Abort/Cycles in --branch-history The original commit message: " Use current sort mechanism but the real .se_cmp() just returns 0 so that new columns "Predicted", "Abort" and "Cycles" are created in display but actually these keys are not the sort keys. For example: Overhead Source:Line Symbol Shared Object Predicted Abort Cycles ........ ............ ........ ............. ......... ..... ...... 38.25% div.c:45 [.] main div 97.6% 0 3 " Update missed commit from series "perf report: Show branch flags/cycles in --branch-history callgraph view" to apply to current repository so that new columns described above are visible. Link to original series: https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/ Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Suggested-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:41:23 -07:00
Ian Rogers	8c25df7af3	perf tests: Add tool PMU test Ensure parsing with and without PMU creates events with the expected config values. This ensures the tool.json doesn't get out of sync with tool_pmu_event enum. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:41:13 -07:00
Ian Rogers	609aa2667f	perf tool_pmu: Switch to standard pmu functions and json descriptions Use the regular PMU approaches with tool json events to reduce the amount of special tool_pmu code - tool_pmu__config_terms and tool_pmu__for_each_event_cb are removed. Some functions remain, like tool_pmu__str_to_event, as conveniences to metricgroups. Add tool_pmu__skip_event/tool_pmu__num_skip_events to handle the case that tool json events shouldn't appear on certain architectures. This isn't done in jevents.py due to complexity in the empty-pmu-events.c and when all vendor json is built into the tool. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:33 -07:00
Ian Rogers	c9b121b7fa	perf jevents: Add tool event json under a common architecture Introduce the notion of a common architecture/model that can be used to find event tables for common PMUs like the tool PMU. By having tool events be json standard PMU attribute configuration, descriptions, etc. can be used and these routines are already optimized for things like binary searching. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:33 -07:00
Ian Rogers	069057239a	perf tool_pmu: Move expr literals to tool_pmu Add the expr literals like "#smt_on" as tool events, this allows stat events to give the values. On my laptop with hyperthreading enabled: ``` $ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true Performance counter stats for 'true': 0 has_pmem 8 num_cores 16 num_cpus 16 num_cpus_online 1 num_dies 1 num_packages 1 smt_on 2,496,000,000 system_tsc_freq 0.001113637 seconds time elapsed 0.001218000 seconds user 0.000000000 seconds sys ``` And with hyperthreading disabled: ``` $ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true Performance counter stats for 'true': 0 has_pmem 8 num_cores 16 num_cpus 8 num_cpus_online 1 num_dies 1 num_packages 0 smt_on 2,496,000,000 system_tsc_freq 0.000802115 seconds time elapsed 0.000000000 seconds user 0.000806000 seconds sys ``` As zero matters for these values, in stat-display should_skip_zero_counter only skip the zero value if it is not the first aggregation index. The tool event implementations are used in expr but not evaluated as events for simplicity. Also core_wide isn't made a tool event as it requires command line parameters. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	b8f1a1b068	perf tool_pmu: Rename perf_tool_event__* to tool_pmu__* Now the events are associated with the tool PMU, rename the functions to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	0709a82c10	perf tool_pmu: Rename enum perf_tool_event to tool_pmu_event To better reflect the events listed are from the tool PMU. Rename the enum values from PERF_TOOL_* to TOOL_PMU__EVENT_*. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	240505b2d0	perf tool_pmu: Factor tool events into their own PMU Rather than treat tool events as a special kind of event, create a tool only PMU where the events/aliases match the existing duration_time, user_time and system_time events. Remove special parsing and printing support for the tool events, but add function calls for when PMU functions are called on a tool_pmu. Move the tool PMU code in evsel into tool_pmu.c to better encapsulate the tool event behavior in that file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	d2f3ecb0ca	perf parse-events: Expose/rename config_term_name Expose config_term_name as parse_events__term_type_str so that PMUs not in pmu.c may access it. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	c798f72c7a	perf pmu: Allow hardcoded terms to be applied to attributes Hard coded terms like "config=10" are skipped by perf_pmu__config assuming they were already applied to a perf_event_attr by parse event's config_attr function. When doing a reverse number to name lookup in perf_pmu__name_from_config, as the hardcoded terms aren't applied the config value is incorrect leading to misses or false matches. Fix this by adding a parameter to have perf_pmu__config apply hardcoded terms too (not just in parse event's config_term_common). Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	c051220d38	perf pmu: Simplify an asprintf error message Use ifs rather than ?: to avoid a large compound statement. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Dr. David Alan Gilbert	c7c1bb78f3	perf tools: Remove unused color_fwrite_lines color_fwrite_lines() was added by 2009's commit `8fc0321f1a` ("perf_counter tools: Add color terminal output support") but has never been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241009003938.254936-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:38:33 -07:00
Thomas Falcon	9f759d41b3	perf test x86: Fix typo in intel-pt-test Change function name "is_hydrid" to "is_hybrid". Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20241007194758.78659-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-09 10:52:08 -07:00
Dr. David Alan Gilbert	3c4e558787	perf probe: Remove unused add_perf_probe_events add_perf_probe_events has been unused since 2015's commit `b02137cc65` ("perf probe: Move print logic into cmd_probe()") which confusingly now uses perf_add_probe_events. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240929010659.430208-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-09 10:52:08 -07:00
Linus Torvalds	b2760b8390	perf tools fixes for v6.12: - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces. - Fix static build compilation error when libdw isn't installed or is too old. - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT. - Add missing refcount put on 32-bit DSOs. - Fix disassembly of user space binaries by setting the binary_type of DSO when loading. - Update headers with the kernel sources, including asound.h, sched.h, fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZwU2dgAKCRCyPKLppCJ+ J8uaAQDEbp0lMf1S/Y6vOGbnP6mGQCewQsXtIpSA4gcRMWlCCgD+O6ZxbnBCHOzn nQfBmbT62qUGuUA38Mg7pCyRXBd8FgU= =s4JZ -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces - Fix static build compilation error when libdw isn't installed or is too old - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT - Add missing refcount put on 32-bit DSOs - Fix disassembly of user space binaries by setting the binary_type of DSO when loading - Update headers with the kernel sources, including asound.h, sched.h, fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h * tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace perf build: Fix build feature-dwarf_getlocations fail for old libdw perf build: Fix static compilation error when libdw is not installed perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT tools headers arm64: Sync arm64's cputype.h with the kernel sources perf 
tools: Cope with differences for lib/list_sort.c copy from the kernel tools check_headers.sh: Add check variant that excludes some hunks perf beauty: Update copy of linux/socket.h with the kernel sources tools headers UAPI: Sync the linux/in.h with the kernel sources perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools include UAPI: Sync linux/fcntl.h copy with the kernel sources tools include UAPI: Sync linux/sched.h copy with the kernel sources tools include UAPI: Sync sound/asound.h copy with the kernel sources perf vdso: Missed put on 32-bit dsos perf symbol: Set binary_type of dso when loading	2024-10-08 10:43:22 -07:00
Veronika Molnarova	6bff76af96	perf test attr: Add back missing topdown events With the patch `0b6c5371c0` "Add missing topdown metrics events", eight topdown metric events with numbers ranging from 0x8000 to 0x8700 were added to the test since they were added as 'perf stat' default events. Later, the patch `951efb9976` "Update no event/metric expectations" kept only 4 of those events (0x8000-0x8300). Currently, the topdown events with numbers 0x8400 to 0x8700 are missing from the list of expected events, resulting in a failure. Add back the missing topdown events. Fixes: `951efb9976` ("perf test attr: Update no event/metric expectations") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: mpetlan@redhat.com Link: https://lore.kernel.org/r/20240311081611.7835-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:50:12 -07:00
Leo Yan	e52abceb4b	perf arm-spe: Dump metadata with version 2 This commit dumps metadata with version 2. It dumps metadata for header and per CPU data respectively in the arm_spe_print_info() function to support metadata version 2 format. After: 0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4 Header version :2 Header size :4 PMU type v2 :13 CPU number :8 Magic :0x1010101010101010 CPU # :0 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :1 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :2 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :3 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :4 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :5 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :6 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :7 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:31 -07:00
Leo Yan	7842a4b6ff	perf arm-spe: Support metadata version 2 This commit adds support for metadata version 2 while remaining backward compatible with the version 1 format. The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field. As version 1 is fixed at two u64 fields, checking the metadata size distinguishes version 1 from version 2 (and from any later versions). For version 2, it reads out the CPU number and retrieves the metadata info for every CPU. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:27 -07:00
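The size-based version check described in the commit above can be sketched roughly as follows. This is an illustration only, not the code from the patch: `spe_metadata_version()` and `SPE_V1_PRIV_SIZE` are hypothetical names, and the sketch assumes that in version 2 the header version is stored in the first u64 of the metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version 1 metadata is fixed at two u64 fields (per the commit message). */
#define SPE_V1_PRIV_SIZE (2 * sizeof(uint64_t))

/*
 * Hypothetical helper: return the metadata version, or 0 if the buffer
 * is too small to hold any known layout.
 * Assumption: in version 2 the first u64 is the header version.
 */
uint64_t spe_metadata_version(const uint64_t *priv, size_t priv_size)
{
	assert(priv != NULL);

	/* Too small to contain even a single u64 field. */
	if (priv_size < sizeof(uint64_t))
		return 0;

	/* Version 1 has no version field, so it is recognised by size alone. */
	if (priv_size == SPE_V1_PRIV_SIZE)
		return 1;

	/* Any other size is assumed to carry its own version number. */
	return priv[0];
}
```

A caller would branch on the returned value before deciding whether per-CPU metadata is present at all.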
Leo Yan	703f344d0c	perf arm-spe: Save per CPU information in metadata Save the Arm SPE information on a per-CPU basis. This approach is easier in the decoding phase for retrieving metadata based on the CPU number of every Arm SPE record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:24 -07:00
Leo Yan	59715b1908	perf arm-spe: Calculate meta data size The metadata is designed to contain a header and per CPU information. The arm_spe_find_cpus() function is introduced to identify how many CPUs support ARM SPE. Based on the number of CPUs, calculate the metadata size. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:20 -07:00
Leo Yan	0ca2c45404	perf arm-spe: Define metadata header version 2 The first version's metadata header structure doesn't include a field to indicate a header version, which is not friendly for extension. Define the metadata version 2 format with a new header structure and extend per CPU's metadata. In the meantime, the old metadata header will still be supported for backward compatibility. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:09 -07:00
Yoshihiro Furudera	f7ef062fe1	perf list: update option desc in man page There is a difference between the SYNOPSIS section of the help message and the man page (tools/perf/Documentation/perf-list.txt) for the perf list command. After checking, we found that the help message reflected the latest specifications. Therefore, revised the SYNOPSIS section of the man page to match the help message. Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Liang Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 10:01:01 -07:00
Veronika Molnarova	f72751a73a	perf test: Restore sample rate for perf_event_attr Test "Setup struct perf_event_attr" consists of multiple test cases that can affect the max sample rate value for perf events. Some test cases check this value as it should not be lowered under the set minimum for the given test. Currently, it is possible for the test cases to affect each other as the previous tests can lower the sample rate, leading to a possible failure of some of the future test cases as the value is not restored at any point. # 10: Setup struct perf_event_attr: --- start --- test child forked, pid 104220 Using CPUID 0x00000000413fd0c1 running './tests/attr/test-record-C0' Current sample rate: 10000 running './tests/attr/test-record-basic' Current sample rate: 900 running './tests/attr/test-record-branch-any' Current sample rate: 600 running './tests/attr/test-record-dummy-C0' Current sample rate: 600 expected sample_period=4000, got 600 FAILED './tests/attr/test-record-dummy-C0' - match failure Restore the max sample rate value for perf events to a reasonable value before each test case if its value was lowered too much to ensure the same conditions for each test case. # 10: Setup struct perf_event_attr: --- start --- test child forked, pid 107222 Using CPUID 0x00000000413fd0c1 running './tests/attr/test-record-C0' Current sample rate: 10000 running './tests/attr/test-record-basic' Current sample rate: 800 running './tests/attr/test-record-branch-any' Current sample rate: 700 unsupp './tests/attr/test-record-branch-any' running './tests/attr/test-record-branch-filter-any' Current sample rate: 10000 running './tests/attr/test-record-count' Current sample rate: 10000 running './tests/attr/test-record-data' Current sample rate: 600 running './tests/attr/test-record-dummy-C0' Current sample rate: 800 running './tests/attr/test-record-freq' Current sample rate: 10000 ... 
Cc: Michael Petlan <mpetlan@redhat.com> Cc: Radostin Stoyanov <rstoyano@redhat.com> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241003125136.15918-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 09:46:12 -07:00
Michael Petlan	d29d92df41	perf trace: Keep exited threads for summary Since `9ffa6c7512` ("perf machine thread: Remove exited threads by default") perf cleans exited threads up, but as said, sometimes they are necessary to be kept. The mentioned commit does not cover all the cases, we also need the information to construct the summary table in perf-trace. Before: # perf trace -s true Summary of events: After: # perf trace -s -- true Summary of events: true (383382), 64 events, 91.4% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ mmap 8 0 0.150 0.013 0.019 0.031 11.90% mprotect 3 0 0.045 0.014 0.015 0.017 6.47% openat 2 0 0.014 0.006 0.007 0.007 9.73% munmap 1 0 0.009 0.009 0.009 0.009 0.00% access 1 1 0.009 0.009 0.009 0.009 0.00% pread64 4 0 0.006 0.001 0.001 0.002 4.53% fstat 2 0 0.005 0.001 0.002 0.003 37.59% arch_prctl 2 1 0.003 0.001 0.002 0.002 25.91% read 1 0 0.003 0.003 0.003 0.003 0.00% close 2 0 0.003 0.001 0.001 0.001 3.86% brk 1 0 0.002 0.002 0.002 0.002 0.00% rseq 1 0 0.001 0.001 0.001 0.001 0.00% prlimit64 1 0 0.001 0.001 0.001 0.001 0.00% set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00% set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00% execve 1 0 0.000 0.000 0.000 0.000 0.00% [namhyung: simplified the condition] Fixes: `9ffa6c7512` ("perf machine thread: Remove exited threads by default") Reported-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20240927151926.399474-1-mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 15:29:25 -07:00
Thomas Richter	5873de9031	perf/test: perf test 86 fails on s390 Command perf test 86 fails on s390: # perf test -F 86 ping 868299 [007] 28248.013596: probe_libc:inet_pton_1: (3ff95948020) 3ff95948020 inet_pton+0x0 (inlined) 3ff9595e6e7 text_to_binary_address+0x1007 (inlined) 3ff9595e6e7 gaih_inet+0x1007 (inlined) FAIL: expected backtrace entry \ "main\+0x[[:xdigit:]]+[[:space:]]$./bin/ping.$$" got "3ff9595e6e7 gaih_inet+0x1007 (inlined)" 86: probe libc's inet_pton & backtrace it with ping : FAILED! # The root cause is a new stack layout, two functions have been added as seen below. # perf script \| tac \| grep -m1 '^ping' -B9 \| tac ping 866856 [007] 25979.494921: probe_libc:inet_pton: (3ff8ec48020) 3ff8ec48020 inet_pton+0x0 (inlined) new --> 3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined) new --> 3ff8ec5e6e7 gaih_inet+0x1007 (inlined) 3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6) 2aa3fe04bf5 main+0xff5 (/usr/bin/ping) 3ff8eb34a5b __libc_start_call_main+0x8b (/usr/lib64/libc.so.6) 3ff8eb34b5d __libc_start_main@GLIBC_2.2+0xad (inlined) 2aa3fe06a1f [unknown] (/usr/bin/ping) # The new functions in the call chain are: - text_to_binary_address() - gaih_inet(). Both functions are inlined and do not show up in the output of the nm command: # nm -a /usr/lib64/libc.so.6 \| \ grep -E '(text_to_binary_address\|gaih_inet)$' # There is no possibility to add these 2 functions depending on their existence in the C library. Add text_to_binary_address() and gaih_inet() to the list of expected functions in a compatible way and extend the regular expression. 
On s390 the backtrace can now be Before After probe_libc:inet_pton probe_libc:inet_pton inet_pton inet_pton getaddrinfo getaddrinfo \| text_to_binary_address main main \| gaih_inet Output after: # perf test -F 86 86: probe libc's inet_pton & backtrace it with ping : Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20241001124224.3370306-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:04 -07:00
Ben Gainey	90035d3cd8	tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events The "perf record" tool will now default to this new mode if the user specifies a sampling group when not in system-wide mode, and when "--no-inherit" is not specified. This change updates evsel to allow the combination of inherit and PERF_SAMPLE_READ. A fallback is implemented for kernel versions where this feature is not supported. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Cc: james.clark@arm.com Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ben Gainey	80c281fca2	tools/perf: Correctly calculate sample period for inherited SAMPLE_READ values Sample period calculation in deliver_sample_value is updated to calculate the per-thread period delta for events that are inherit + PERF_SAMPLE_READ. When the sampling event has this configuration, the read_format.id is used with the tid from the sample to lookup the storage of the previously accumulated counter total before calculating the delta. All existing valid configurations where read_format.value represents some global value continue to use just the read_format.id to locate the storage of the previously accumulated total. perf_sample_id is modified to support tracking per-thread values, along with the existing global per-id values. In the per-thread case, values are stored in a hash by tid within the perf_sample_id, and are dynamically allocated as the number is not known ahead of time. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Cc: james.clark@arm.com Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
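The per-thread delta bookkeeping described in the commit above can be illustrated with a small sketch. It is not the perf implementation: the struct and function names are hypothetical, and a fixed-size open-addressing table stands in for the dynamically allocated hash keyed by tid that the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64 /* hypothetical fixed capacity for this sketch */

/* One previously accumulated counter total, keyed by thread id. */
struct tid_period {
	int used;
	uint32_t tid;
	uint64_t last;
};

/* Stand-in for the per-id storage: one global total plus per-thread totals. */
struct sample_id_sketch {
	uint64_t global_last;
	struct tid_period per_thread[NBUCKETS];
};

/* Locate the storage for the previous total (linear probing on tid). */
uint64_t *period_slot(struct sample_id_sketch *sid, int per_thread, uint32_t tid)
{
	assert(sid != NULL);

	/* Global case: a single accumulated total per id. */
	if (!per_thread)
		return &sid->global_last;

	for (size_t i = 0; i < NBUCKETS; i++) {
		struct tid_period *e = &sid->per_thread[(tid + i) % NBUCKETS];

		if (!e->used) {
			/* First sample from this thread: start from zero. */
			e->used = 1;
			e->tid = tid;
			e->last = 0;
			return &e->last;
		}
		if (e->tid == tid)
			return &e->last;
	}
	return NULL; /* table full; a real hash would grow instead */
}

/* Return the delta since the last sample and remember the new total. */
uint64_t period_delta(struct sample_id_sketch *sid, int per_thread,
		      uint32_t tid, uint64_t value)
{
	uint64_t *slot = period_slot(sid, per_thread, tid);
	uint64_t delta;

	if (!slot)
		return 0;
	delta = value - *slot;
	*slot = value;
	return delta;
}
```

With `per_thread` set, two threads sharing one id each get their own running total, so a sample from one thread is never diffed against the other thread's counter value.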
Ian Rogers	ad321b19d2	perf test: Skip not fail syscall tp fields test when insufficient permissions Clean up return value to be TEST_* rather than unspecific integer. Add test case skip reason. Skip test if EACCES comes back from evsel__newtp. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	7457bcfcfb	perf test: Skip not fail tp fields test when insufficient permissions Clean up return value to be TEST_* rather than unspecific integer. Add test case skip reason. Skip test if EACCES comes back from evsel__newtp. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	1334ee9169	perf test: Fix memory leaks on event-times error paths These error paths occur without sufficient permissions. Fix the memory leaks to make leak sanitizer happier. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	7f6ccb70e4	perf stat: Fix affinity memory leaks on error path Missed cleanup when an error occurs. Fixes: `49de179577` ("perf stat: No need to setup affinities when starting a workload") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Kan Liang	8d7f85e323	perf jevents: Don't stop at the first matched pmu when searching a events table The "perf all PMU test" fails on a Coffee Lake machine. The failure is caused by the below change in the commit `e2641db83f` ("perf vendor events: Add/update skylake events/metrics"). + { + "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles", + "Counter": "FIXED", + "EventCode": "0xff", + "EventName": "UNC_CLOCK.SOCKET", + "PerPkg": "1", + "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.", + "Unit": "cbox_0" } The other cbox events have the unit name "CBOX", while the fixed counter has a unit name "cbox_0". So the events_table will maintain separate entries for cbox and cbox_0. The perf_pmus__print_pmu_events() calculates the total number of events, allocates an aliases buffer, stores all the events into the buffer, sorts them, and prints all the aliases one by one. The problem is that the calculated total number of events doesn't match the number of events stored in the aliases buffer. The perf_pmu__num_events() is used to calculate the number of events. It invokes the pmu_events_table__num_events() to go through the entire events_table to find all events. Because of the pmu_uncore_alias_match(), the suffix of the uncore PMU will be ignored. So the events for cbox and cbox_0 are all counted. When storing events into the aliases buffer, the perf_pmu__for_each_event() only processes the events for cbox. Since a bigger buffer was allocated, the last entries are all 0. When printing all the aliases, null will be output and trigger the failure. The mismatch was introduced by the commit `e3edd6cf63` ("perf pmu-events: Reduce processed events by passing PMU"). The pmu_events_table__for_each_event() stops immediately once a pmu is set. But for uncore, especially in this case, the method is wrong and doesn't match what perf does in the perf_pmu__num_events(). 
With the patch, $ perf list pmu \| grep -A 1 clock.socket unc_clock.socket [This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_cbox_0 $ perf test "perf all PMU test" 107: perf all PMU test : Ok Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/all/202407101021.2c8baddb-oliver.sang@intel.com/ Fixes: `e3edd6cf63` ("perf pmu-events: Reduce processed events by passing PMU") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241001021431.814811-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Ilkka Koskinen	e934a35e3c	perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace If one builds perf with DEBUG=1, captures data on multiple CPUs and finally runs 'perf report -C <cpu>' for only one of the cpus, assert() aborts the program. This happens because there are empty queues with format set. This patch changes the condition to abort only if a queue is not empty and if the format is unset. $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1 $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1 Aborted (core dumped) Fixes: `57880a7966` ("perf: cs-etm: Allocate queues for all CPUs") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Yang Jihong	a530337ba9	perf build: Fix build feature-dwarf_getlocations fail for old libdw For libdw versions below 0.177, libdl.a must be linked in addition to libebl.a during static compilation, otherwise feature-dwarf_getlocations compilation will fail. Before: $ make LDFLAGS=-static BUILD: Doing 'make -j20' parallel build <SNIP> Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157 <SNIP> $ cat ../build/feature/test-dwarf_getlocations.make.output /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend': (.text+0x20): undefined reference to `dlclose' collect2: error: ld returned 1 exit status After: $ make LDFLAGS=-static <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> $ ./perf probe Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...] or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...] or: perf probe [<options>] --del '[GROUP:]EVENT' ... or: perf probe --list [GROUP:]EVENT ... <SNIP> Fixes: `536661da6e` ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Yang Jihong	43f6564f18	perf build: Fix static compilation error when libdw is not installed If libdw is not installed in build environment, the output of 'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be empty and the shell test will have the following error: /bin/sh: 1: test: -lt: unexpected operator Before: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found /bin/sh: 1: test: -lt: unexpected operator After: 1. libdw is not installed: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR 2. libdw version is lower than 0.177 $ pkg-config --modversion libdw 0.176 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> INSTALL libsubcmd_headers INSTALL libapi_headers INSTALL libperf_headers INSTALL libsymbol_headers INSTALL libbpf_headers LINK perf 3. 
libdw version is higher than 0.177 $ pkg-config --modversion libdw 0.186 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> CC util/bpf-utils.o CC util/pfm.o LD util/perf-util-in.o LD perf-util-in.o AR libperf-util.a LINK perf Fixes: `536661da6e` ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
James Clark	008979cc69	perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT The linked fixes commit added an #include "dwarf-aux.h" to disasm.h which gets picked up in a lot of places. Without HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include errno.h to fix the following build error: In file included from util/disasm.h:8, from util/annotate.h:16, from builtin-top.c:23: util/dwarf-aux.h: In function 'die_get_var_range': util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function) 183 \| return -ENOTSUP; \| ^~~~~~~ Fixes: `782959ac24` ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking") Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Arnaldo Carvalho de Melo	36110669dd	perf tools: Cope with differences for lib/list_sort.c copy from the kernel With `6d74e1e371` ("tools/lib/list_sort: remove redundant code for cond_resched handling") we need to use the newly added hunk based exceptions when comparing the copy we carry in tools/lib/ to the original file, do it by adding the hunks that we know will be the expected diff. If at some point the original file is updated in other parts, then we should flag and check the file for update. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 15:07:32 -03:00
Arnaldo Carvalho de Melo	cd46ea5ab4	tools check_headers.sh: Add check variant that excludes some hunks With `6d74e1e371` ("tools/lib/list_sort: remove redundant code for cond_resched handling") we end up with a multi-line variation in the merge_final() implementation, one that the simple line based exceptions we had so far can't cope. Thus this check has been failing: Warning: Kernel ABI header differences: diff -u tools/lib/list_sort.c lib/list_sort.c So add a new check routine that uses grep -vf to exclude some hunks that we store in the tools/perf/check-header_ignore_hunks/ directory. This first patch is just the new check routine, the next one will use it to check lib/list_sort.c. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 14:50:44 -03:00
Dapeng Mi	80f192724e	perf tests: Add more topdown events regroup tests Add more test cases to cover all supported topdown events regroup cases. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-7-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	0836aa6008	perf tests: Add topdown events counting and sampling tests Add counting and leader sampling tests to verify topdown events including raw format can be reordered correctly. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-6-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	387892723a	perf tests: Add leader sampling test in record tests Add leader sampling test to validate event counts are captured into record and the count value is consistent. Suggested-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-5-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	3b5edc0421	perf x86/topdown: Don't move topdown metric events in group When running the perf command below, an error is reported. perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 (cpu) size 168 config 0x400 (slots) sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER read_format ID|GROUP|LOST disabled 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 4 (cpu) size 168 config 0x8000 (topdown-retiring) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER read_format ID|GROUP|LOST freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 sys_perf_event_open failed, error -22 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring). The reason for the error is that the events are regrouped and the topdown-retiring event is moved to immediately after the slots event, so the topdown-retiring event needs to do the sampling, but the Intel PMU driver doesn't support sampling topdown metrics events. Topdown metrics events only need to be in a group that has the slots event as leader; they do not need to immediately follow the slots event. Thus it is overkill to move topdown metrics events immediately after the slots event during event regrouping, and doing so causes the above issue. Thus don't move topdown metrics events forward if they are already in a group. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-4-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	1e53e9d178	perf x86/topdown: Correct leader selection with sample_read enabled Addresses an issue where, in the absence of a topdown metrics event within a sampling group, the slots event was incorrectly bypassed as the sampling leader when sample_read was enabled. perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1 In this case, the slots event should be sampled as leader, but the branches event is in fact sampled, as the verbose output shows. perf_event_attr: type 4 (cpu) size 168 config 0x400 (slots) sample_type IP|TID|TIME|READ|CPU|IDENTIFIER read_format ID|GROUP|LOST disabled 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 168 config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS) { sample_period, sample_freq } 10000 sample_type IP|TID|TIME|READ|CPU|IDENTIFIER read_format ID|GROUP|LOST sample_id_all 1 exclude_guest 1 The sample period of the slots event, instead of the branches event, is reset to 0. This fix ensures the slots event remains the leader under these conditions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-3-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
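
The hunk-based exclusion described in cd46ea5ab4 and 36110669dd (diff the tools/ copy against the kernel original, then drop the diff lines that are known and expected) can be illustrated roughly as follows. This is only a sketch in Python, not the shell code in check_headers.sh: the function name `unexpected_diff` is hypothetical, and it matches ignored lines exactly, whereas `grep -vf` treats each line of the pattern file as a regular expression.

```python
import difflib


def unexpected_diff(original, copy, ignored_lines):
    """Return the unified-diff lines between two texts that are not expected.

    `ignored_lines` plays the role of the stored "ignore hunks" file: any
    diff line that appears in it verbatim is dropped (an exact-match
    simplification of `grep -vf`, which uses regular expressions).
    """
    diff = difflib.unified_diff(
        original.splitlines(),
        copy.splitlines(),
        fromfile="original",  # placeholder labels, not real paths
        tofile="copy",
        lineterm="",
    )
    ignored = set(ignored_lines)
    return [line for line in diff if line not in ignored]


if __name__ == "__main__":
    kernel_text = "a\nb\nc\n"
    tools_text = "a\nc\n"

    # Record the currently expected difference as the ignore list.
    expected = unexpected_diff(kernel_text, tools_text, [])

    # With the expected hunks ignored, nothing is left to warn about.
    print(unexpected_diff(kernel_text, tools_text, expected))

    # A new, unrelated change in the copy still shows up.
    for line in unexpected_diff(kernel_text, "a\nc\nd\n", expected):
        print(line)
```

An empty result means the copy differs from the original only in the expected places; anything left over corresponds to the case where the file should be flagged and checked for an update.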


17720 Commits