linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-03 17:51:23 +00:00

Author	SHA1	Message	Date
Ian Rogers	870b92024e	perf vendor events: Update Rocketlake events/metrics Update events from v1.03 to v1.04. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.04: `015d5a5eab` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	b4152015a9	perf vendor events: Update Meteorlake events/metrics Update events from v1.10 to v1.12. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.12: `d8fe70c91b` `b9dabd05ff` This updates the mapfile.csv for the 0xB5 CPUID variant of meteorlake. `c3094bc9bb` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	23878069de	perf vendor events: Update/add Lunarlake events/metrics Update events from v1.01 to v1.10. Add TMA metrics 5.02. Bring in the event updates v1.11: `af329039e8` `4a1cff8ceb` `cbc3b0dc19` `28f4b24f91` `172900e962` `dab0308f7a` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	c49b050915	perf vendor events: Update IcelakeX events/metrics Update events from v1.26 to v1.27. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.27: `6ee80d0532` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	094b233575	perf vendor events: Update Icelake events/metrics Update events from v1.22 to v1.24. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.24: `d4f10746cf` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	be67d89f79	perf vendor events: Update HaswellX events/metrics Update events from v28 to v29. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v29: `71dbf03aba` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	55bf5d0792	perf vendor events: Update Haswell events/metrics Update events from v35 to v36. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v36: `616ec6fc03` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Remove duplicate event UNC_CLOCK.SOCKET that was erroneously left in uncore-other.json. Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:39 -08:00
Ian Rogers	aaa73d778b	perf vendor events: Update/add Graniterapids events/metrics Update events from v1.02 to v1.06. Add TMA metrics 5.02. Bring in the event updates v1.06: `de5502e51a` `79b9e512ea` `bc74a895e4` The TMA 5.02 addition is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	b52c4123a5	perf vendor events: Update GrandRidge events/metrics Update events from v1.03 to v1.05. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.05: `3b2e3528fb` `9bc1815536` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	5ee60fbf73	perf vendor events: Update EmeraldRapids events/metrics Update events from v1.09 to v1.11. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.11: `bffcec00a1` `a63da6de48` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Update uncore IIO events umask with the change: `d78e8a1665` which should address an issue originally raised by Michael Petlan: Reported-by: Michael Petlan <mpetlan@redhat.com> Closes: https://lore.kernel.org/all/alpine.LRH.2.20.2401300733310.11354@Diego/ Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	e415c1493f	perf vendor events: Add Clearwaterforest events Add events v1.00. Bring in the events from: https://github.com/intel/perfmon/tree/main/CWF/events Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	7487e4fce9	perf vendor events: Update CascadelakeX events/metrics Update events from v1.22 to v1.23. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.23: `8f3665f6be` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	a75d905d64	perf vendor events: Update BroadwellX events/metrics Update events from v22 to v23. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v23: `679982113f` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	11e644eb46	perf vendor events: Update BroadwellDE events/metrics Update events from v11 to v12. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v12: `e0b83388d5` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	240411b048	perf vendor events: Update Broadwell events/metrics Update events from v29 to v30. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v30: `9a1827b2ac` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	ba56a91063	perf vendor events: Add Arrowlake events/metrics Add events v1.07. Add TMA metrics based on v5.02. Bring in the events from: https://github.com/intel/perfmon/tree/main/ARL/events TMA 5.02 is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	b04fe42f6e	perf vendor events: Update AlderlakeN events/metrics Update events from v1.27 to v1.28. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.28: `801f43f22e` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-developed-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Ian Rogers	54169b4663	perf vendor events: Update Alderlake events/metrics Update events from v1.27 to v1.28. Update TMA metrics from 4.8 to 5.02. Bring in the event updates v1.28: `801f43f22e` The TMA 5.02 update is from (with subsequent fixes): `1d72913b2d` Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250211213031.114209-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:54:38 -08:00
Namhyung Kim	70f127c716	perf tools: Use symfs when opening debuginfo by path I found that it failed to load a binary using --symfs option. Say I have a binary in /home/user/prog/xxx and a perf data file with it. If I move them to a different machine and use --symfs, it tries to find the binary in some locations under symfs using dso__read_binary_type_filename(), but not the last one. ${symfs}/usr/lib/debug/home/user/prog/xxx.debug ${symfs}/usr/lib/debug/home/user/prog/xxx ${symfs}/home/user/prog/.debug/xxx /home/user/prog/xxx It should check ${symfs}/home/user/prog/xxx. Let's fix it. Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250212221445.437481-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:16 -08:00
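The path handling described in commit 70f127c716 can be illustrated with a minimal sketch. `symfs_join()` is a hypothetical helper written for this note, not a function from perf; it only shows the idea of prefixing the recorded path with the symfs root so that `/home/user/prog/xxx` is looked up as `${symfs}/home/user/prog/xxx`.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not perf's API): build "<symfs><path>" in buf.
 * A NULL or empty symfs leaves the path unchanged.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int symfs_join(char *buf, size_t len, const char *symfs, const char *path)
{
	int n = snprintf(buf, len, "%s%s", symfs ? symfs : "", path);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With a symfs root of `/mnt/target`, the helper yields `/mnt/target/home/user/prog/xxx`, which is the location the commit message says was previously skipped.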
Namhyung Kim	fc00897c8a	perf trace: Add --summary-mode option The --summary-mode option will select how to show the syscall summary at the end. By default, it'll show the summary for each thread and it's the same as if --summary-mode=thread is passed. The other option is to show total summary, which is --summary-mode=total. I'd like to have this instead of a separate option like --total-summary because we may want to add a new summary mode (by cgroup) later. $ sudo ./perf trace -as --summary-mode=total sleep 1 Summary of events: total, 21580 events syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87% futex 1256 89 13331.197 0.000 10.614 733.722 15.49% poll 669 0 6806.618 0.000 10.174 459.316 11.77% ppoll 220 0 3968.797 0.000 18.040 516.775 25.35% clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00% epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29% nanosleep 16 0 60.515 0.000 3.782 10.123 33.33% ioctl 510 0 4.284 0.001 0.008 0.182 8.84% recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37% write 1393 0 2.854 0.001 0.002 0.017 1.79% read 1063 100 2.236 0.000 0.002 0.083 5.11% ... Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:16 -08:00
Namhyung Kim	bd50a26c9a	perf tools: Get rid of now-unused rb_resort.h It was only used in perf trace and it switched to use hashmap instead. Let's delete the code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:15 -08:00
Namhyung Kim	ef2da619b1	perf trace: Convert syscall_stats to hashmap It was using a RBtree-based int-list as a hash and a custom resort logic for that. As we have hashmap, let's convert to it and add a custom sort function for the hashmap entries using an array. It should be faster and more light-weighted. It's also to prepare supporting system-wide syscall stats. No functional changes intended. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:15 -08:00
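The "copy the hashmap entries into an array, then sort" approach mentioned in commit ef2da619b1 might look roughly like the sketch below. The struct, the field names and the sort key (total time, descending) are assumptions made for illustration; the real perf code uses its own hashmap type and may order entries differently.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flattened view of one hashmap entry. */
struct stat_entry {
	int syscall_nr;     /* key: syscall number */
	double total_msec;  /* value: accumulated time */
};

/* qsort comparator: larger total first (assumed ordering). */
static int cmp_total_desc(const void *a, const void *b)
{
	const struct stat_entry *ea = a;
	const struct stat_entry *eb = b;

	if (ea->total_msec < eb->total_msec)
		return 1;
	if (ea->total_msec > eb->total_msec)
		return -1;
	return 0;
}

/* Sort an array that was filled by iterating over the hashmap. */
static void sort_stats(struct stat_entry *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), cmp_total_desc);
}
```

Sorting a temporary array keeps the hashmap itself unordered, so lookups during tracing stay cheap and the ordering cost is paid once when the summary is printed.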
Namhyung Kim	c7f821b876	perf trace: Allocate syscall stats only if summary is on The syscall stats are used only when summary is requested. Let's avoid unnecessary operations. While at it, let's pass 'trace' pointer directly instead of passing 'output' file pointer and 'summary' option in the 'trace' separately. Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:44:10 -08:00
James Clark	615ec00b06	perf tests: Fix Tool PMU test segfault tool_pmu__event_to_str() now handles skipped events by returning NULL, so it's wrong to re-check for a skip on the resulting string. Calling tool_pmu__skip_event() with a NULL string results in a segfault so remove the unnecessary skip to fix it: $ perf test -vv "parsing with PMU name" 12.2: Parsing with PMU name: ... ---- unexpected signal (11) ---- 12.2: Parsing with PMU name : FAILED! Fixes: `ee8aef2d23` ("perf tools: Add skip check in tool_pmu__event_to_str()") Signed-off-by: James Clark <james.clark@linaro.org> Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250212163859.1489916-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-12 19:34:56 -08:00
Kan Liang	ee8aef2d23	perf tools: Add skip check in tool_pmu__event_to_str() Some topdown related metrics may fail on hybrid machines. $ perf stat -M tma_frontend_bound Cannot resolve IDs for tma_frontend_bound: cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@) In find_tool_events(), tool_pmu__event_to_str() is used to compare the tool_events. It only checks the event name, not the PMU or arch. So tool_events[TOOL_PMU__EVENT_SLOTS] is set to true, because the p-core Topdown metrics have a "slots" event. The tool_events array is shared, so when parsing the e-core metrics, "slots" is automatically added. The "slots" event as a tool event should only be available on arm64. It has a different meaning on X86. tool_pmu__skip_event() is intended to handle this case. Apply it for tool_pmu__event_to_str() as well. There is a lack of sanity check in expr__get_id(). Add the check. Closes: https://lore.kernel.org/lkml/608077bc-4139-4a97-8dc4-7997177d95c4@linux.intel.com/ Fixes: `069057239a` ("perf tool_pmu: Move expr literals to tool_pmu") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: thomas.falcon@intel.com Link: https://lore.kernel.org/r/20250207152844.302167-1-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-10 11:46:30 -08:00
Dr. David Alan Gilbert	1df4b33f62	perf tools: Deadcode removal The last use of machine__fprintf_vmlinux_path() was removed in 2011 by commit `ab81f3fd35` ("perf top: Reuse the 'report' hist_entry/hists classes") mmap_cpu_mask__duplicate() was added in 2021 by commit `6bd006c6eb` ("perf mmap: Introduce mmap_cpu_mask__duplicate()") but hasn't been used since. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250204220545.456435-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-10 11:46:02 -08:00
Namhyung Kim	9e676a024f	Linux 6.14-rc1 Merge tag 'v6.14-rc1' into perf-tools-next To get the various fixes in the current master. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-05 14:57:18 -08:00
Ian Rogers	357b965deb	perf stat: Changes to event name uniquification The existing logic would disable uniquification on an evlist or enable it per evsel, this is unfortunate as uniquification is most needed when events have the same name and so the whole evlist must be considered. Change the initial disable uniquify on an evlist processing to also set a needs_uniquify flag, for cases like the matching event names. This must be done as an initial pass as uniquification of an event name will change the behavior of the check. Keep the per counter uniquification but now only uniquify event names when the needs_uniquify flag is set. Before this change a hwmon like temp1 wouldn't be uniquified and afterwards it will (ie the PMU is added to the temp1 event's name). Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:29:13 -08:00
Ian Rogers	2d9961c690	perf stat: Don't merge counters purely on name Counter merging was added in commit `942c559339` ("perf stat: Add perf_stat_merge_counters()"), however, it merges events with the same name on different PMUs regardless of whether the different PMUs are actually of the same type (ie they differ only in the suffix on the PMU). For hwmon events there may be a temp1 event on every PMU, but the PMU names are all unique and don't differ just by a suffix. The merging is over eager and will merge all the hwmon counters together meaning an aggregated and very large temp1 value is shown. The same would be true for say cache events and memory controller events where the PMUs differ but the event names are the same. Fix the problem by correctly saying two PMUs alias when they differ only by suffix. Note, there is an overlap with evsel's merged_stat with aggregation and the evsel's metric_leader where aggregation happens for metrics. Fixes: `942c559339` ("perf stat: Add perf_stat_merge_counters()") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:29:05 -08:00
Ian Rogers	63e287131c	perf pmu: Rename name matching for no suffix or wildcard variants Wildcard PMU naming will match a name like pmu_1 to a PMU name like pmu_10 but not to a PMU name like pmu_2 as the suffix forms part of the match. No suffix matching will match pmu_10 to either pmu_1 or pmu_2. Add or rename matching functions on PMU to make it clearer what kind of matching is being performed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:46 -08:00
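The two matching styles named in commit 63e287131c can be contrasted with a small sketch. Both function names are hypothetical, and the rules are deliberately simplified: the wildcard variant is modelled as a plain prefix match, and a suffix is assumed to be an underscore followed by decimal digits. The functions in perf itself may handle more cases.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of name with a trailing "_<digits>" suffix removed, if present. */
static size_t len_without_suffix(const char *name)
{
	size_t len = strlen(name);
	size_t i = len;

	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	if (i < len && i > 0 && name[i - 1] == '_')
		return i - 1;
	return len;
}

/* "No suffix" matching: compare the names with their suffixes dropped. */
static bool match_ignoring_suffix(const char *a, const char *b)
{
	size_t la = len_without_suffix(a);
	size_t lb = len_without_suffix(b);

	return la == lb && strncmp(a, b, la) == 0;
}

/* "Wildcard" matching, simplified: pattern acts as "pattern*". */
static bool match_wildcard(const char *pmu_name, const char *pattern)
{
	return strncmp(pmu_name, pattern, strlen(pattern)) == 0;
}
```

Under these assumptions `pmu_1` as a wildcard pattern matches `pmu_10` but not `pmu_2`, while suffix-insensitive matching treats `pmu_10`, `pmu_1` and `pmu_2` as the same PMU family. Names such as `hwmon_a` and `hwmon_b`, which do not differ only by a numeric suffix, stay distinct.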
Ian Rogers	57e13264dc	perf pmus: Restructure pmu_read_sysfs to scan fewer PMUs Rather than scanning core or all PMUs, allow pmu_read_sysfs to read some combination of core, other, hwmon and tool PMUs. The PMUs that should be read and are already read are held as bitmaps. It is known that a "hwmon_" prefix is necessary for a hwmon PMU's name, similarly with "tool", so only scan those PMUs in situations the PMU name or the PMU's type number make sense to. The number of openat system calls reduces from 276 to 98 for a hwmon event. The number of openats for regular perf events isn't changed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:37 -08:00
Ian Rogers	340c345e58	perf evsel: Reduce scanning core PMUs in is_hybrid evsel__is_hybrid returns true if there are multiple core PMUs and the evsel is for a core PMU. Determining the number of core PMUs can require loading/scanning PMUs. There's no point doing the scanning if evsel for the is_hybrid test isn't core so reorder the tests to reduce PMU scanning. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 21:28:25 -08:00
Thomas Richter	888751e4d0	perf test: Fix Hwmon PMU test endianess issue perf test 11 hwmon fails on s390 with this error # ./perf test -Fv 11 --- start --- ---- end ---- 11.1: Basic parsing test : Ok --- start --- Testing 'temp_test_hwmon_event1' Using CPUID IBM,3931,704,A01,3.7,002f temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/ FAILED tests/hwmon_pmu.c:189 Unexpected config for 'temp_test_hwmon_event1', 292470092988416 != 655361 ---- end ---- 11.2: Parsing without PMU name : FAILED! --- start --- Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/' FAILED tests/hwmon_pmu.c:189 Unexpected config for 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/', 292470092988416 != 655361 ---- end ---- 11.3: Parsing with PMU name : FAILED! # The root cause is in member test_event::config which is initialized to 0xA0001 or 655361. During event parsing a long list event parsing functions are called and end up with this gdb call stack: #0 hwmon_pmu__config_term (hwm=0x168dfd0, attr=0x3ffffff5ee8, term=0x168db60, err=0x3ffffff81c8) at util/hwmon_pmu.c:623 #1 hwmon_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, err=0x3ffffff81c8) at util/hwmon_pmu.c:662 #2 0x00000000012f870c in perf_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, zero=false, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1519 #3 0x00000000012f88a4 in perf_pmu__config (pmu=0x168dfd0, attr=0x3ffffff5ee8, head_terms=0x3ffffff5ea8, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1545 #4 0x00000000012680c4 in parse_events_add_pmu (parse_state=0x3ffffff7fb8, list=0x168dc00, pmu=0x168dfd0, const_parsed_terms=0x3ffffff6090, auto_merge_stats=true, alternate_hw_config=10) at util/parse-events.c:1508 #5 0x00000000012684c6 in parse_events_multi_pmu_add (parse_state=0x3ffffff7fb8, event_name=0x168ec10 "temp_test_hwmon_event1", hw_config=10, const_parsed_terms=0x0, listp=0x3ffffff6230, loc_=0x3ffffff70e0) at 
util/parse-events.c:1592 #6 0x00000000012f0e4e in parse_events_parse (_parse_state=0x3ffffff7fb8, scanner=0x16878c0) at util/parse-events.y:293 #7 0x00000000012695a0 in parse_events__scanner (str=0x3ffffff81d8 "temp_test_hwmon_event1", input=0x0, parse_state=0x3ffffff7fb8) at util/parse-events.c:1867 #8 0x000000000126a1e8 in __parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", pmu_filter=0x0, err=0x3ffffff81c8, fake_pmu=false, warn_if_reordered=true, fake_tp=false) at util/parse-events.c:2136 #9 0x00000000011e36aa in parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", err=0x3ffffff81c8) at /root/linux/tools/perf/util/parse-events.h:41 #10 0x00000000011e3e64 in do_test (i=0, with_pmu=false, with_alias=false) at tests/hwmon_pmu.c:164 #11 0x00000000011e422c in test__hwmon_pmu (with_pmu=false) at tests/hwmon_pmu.c:219 #12 0x00000000011e431c in test__hwmon_pmu_without_pmu (test=0x1610368 <suite.hwmon_pmu>, subtest=1) at tests/hwmon_pmu.c:23 where the attr::config is set to value 292470092988416 or 0x10a0000000000 in line 625 of file ./util/hwmon_pmu.c: attr->config = key.type_and_num; However member key::type_and_num is defined as union and bit field: union hwmon_pmu_event_key { long type_and_num; struct { int num :16; enum hwmon_type type :8; }; }; s390 is big endian and Intel is little endian architecture. The events for the hwmon dummy pmu have num = 1 or num = 2 and type is set to HWMON_TYPE_TEMP (which is 10). On s390 this assigns member key::type_and_num the value of 0x10a0000000000 (which is 292470092988416) as shown in above trace output. Fix this and export the structure/union hwmon_pmu_event_key so the test shares the same implementation as the event parsing functions for union and bit fields. This should avoid endianness issues on all platforms.
Output after: # ./perf test -F 11 11.1: Basic parsing test : Ok 11.2: Parsing without PMU name : Ok 11.3: Parsing with PMU name : Ok # Fixes: `531ee0fd48` ("perf test: Add hwmon "PMU" test") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250131112400.568975-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 17:22:40 -08:00
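The two config values quoted in commit 888751e4d0 can be reproduced with plain shifts. The sketch below does not use the union from the commit. It models, under the assumption of a 64-bit `long` and the usual ABI behaviour (bit fields allocated from the least significant bit on little endian and from the most significant bit on big endian), what `type_and_num` would read as on each kind of machine. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed little-endian layout: num occupies bits 0-15 and type
 * occupies bits 16-23 of the 64-bit value.
 */
static uint64_t key_as_little_endian(unsigned int type, unsigned int num)
{
	return ((uint64_t)(type & 0xff) << 16) | (uint64_t)(num & 0xffff);
}

/*
 * Assumed big-endian layout: the bit-field struct overlays the most
 * significant bytes, so num lands in bits 48-63 and type in bits 40-47.
 */
static uint64_t key_as_big_endian(unsigned int type, unsigned int num)
{
	return ((uint64_t)(num & 0xffff) << 48) | ((uint64_t)(type & 0xff) << 40);
}
```

For `type = 10` (HWMON_TYPE_TEMP in the commit message) and `num = 1`, the first function gives 655361 (0xA0001) and the second gives 292470092988416 (0x10a0000000000), the same pair of numbers that appears in the failing test output. This is why a hard-coded 0xA0001 in the test only matched on little-endian hosts, and why sharing the union definition between the test and the parser removes the mismatch.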
Thomas Richter	90d97674d4	perf test: Use cycles event in perf record test for leader_sampling On s390 the event instructions can not be used for recording. This event is only supported by perf stat. Change the event from instructions to cycles in subtest test_leader_sampling. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: James Clark <james.clark@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250131102756.4185235-3-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 11:36:14 -08:00
Thomas Richter	859199431d	perf test: Fix perf record test for precise_max On s390 the event instructions can not be used for recording. This event is only supported by perf stat. Test that each event cycles and instructions supports sampling. If the event can not be sampled, skip it. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: James Clark <james.clark@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250131102756.4185235-2-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-04 11:34:25 -08:00
Anubhav Shelat	23e0a63c6d	perf script: force stdin for flamegraph in live mode Currently, running "perf script flamegraph -a -F 99 sleep 1" should produce flamegraph.html containing the flamegraph. However, it gives a segmentation fault. This happens because the flamegraph.py script is supposed to take as input the output of "perf record", which should be in stdin. This would require passing "-i -" to flamegraph.py. However, the "flamegraph-report" script causes the "perf script" command to take the "-i -" option instead of flamegraph.py, which causes no problem for "perf script", but causes a segmentation fault since flamegraph.py has no input file. To fix this I added the "-i -" option directly to the flamegraph-report script to ensure flamegraph.py gets input from stdin. Signed-off-by: Anubhav Shelat <ashelat@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20250131145704.3164542-2-ashelat@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-03 19:49:10 -08:00
Ian Rogers	bb4b8f9697	perf test: Extra verbosity and hypervisor skip for tpebs test When not running as root and with higher perf event paranoia values the perf record forked by TPEBS can fail to attach to the process. Skip the test in these scenarios. Intel TPEBS test skips on non-Intel CPUs. On Intel CPUs under a hypervisor the cache-misses event may not be present or precise. Skip the test under this condition. Refactor the output code to be placed in a file so that on a signal the file can be dumped. This was necessary to catch the issue above as the failing perf record command would fail without output. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250130170135.5817-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-02-03 19:45:50 -08:00
James Clark	4c4c0724d6	perf: Always feature test reallocarray This is also used in util/comm.c now, so instead of selectively doing the feature test, always do it. If it's ever used anywhere else it's less likely to cause another build failure. This doesn't remove the need to manually include libc_compat.h, and missing that will still cause an error for glibc < 2.26. There isn't a way to fix that without poisoning reallocarray like libbpf did, but that has other downsides like making memory debugging tools less useful. So for Perf keep it like this and we'll have to fix up any missed includes. Fixes the following build error: util/comm.c:152:31: error: implicit declaration of function 'reallocarray' [-Wimplicit-function-declaration] 152 \| tmp = reallocarray(comm_strs->strs, \| ^~~~~~~~~~~~ Fixes: `13ca628716` ("perf comm: Add reference count checking to 'struct comm_str'") Reported-by: Ali Utku Selen <ali.utku.selen@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250129154405.777533-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-31 14:45:19 -08:00
Linus Torvalds	c06310fd6b	perf-tools fixes for 6.14 An early round of random fixes in perf tools for this cycle. perf trace ---------- * Fix loading of BPF program on certain clang versions * Fix out-of-bound access in syscalls with 6 arguments * Skip syscall enum test if landlock syscall is not available perf annotate ------------- * Fix segfaults due to invalid access in disasm arrays perf stat --------- * Fix error handling in topology parsing Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ5vx+gAKCRCMstVUGiXM g8PpAP9fNWvkxEiylqO9GGqMJWnIwWwlz4NCqqOZWyPspcECrgD9Eu0lZlna4tOL 3I8giYN2m7ogNt+ZXP2b0y2np7hOGQc= =lVVJ -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "An early round of random fixes in perf tools for this cycle. perf trace: - Fix loading of BPF program on certain clang versions - Fix out-of-bound access in syscalls with 6 arguments - Skip syscall enum test if landlock syscall is not available perf annotate: - Fix segfaults due to invalid access in disasm arrays perf stat: - Fix error handling in topology parsing" * tag 'perf-tools-fixes-for-v6.14-2025-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cpumap: Fix die and cluster IDs perf test: Skip syscall enum test if no landlock syscall perf trace: Fix runtime error of index out of bounds perf annotate: Use an array for the disassembler preference perf trace: Fix BPF loading failure (-E2BIG)	2025-01-30 17:38:20 -08:00
Ian Rogers	8ce0d2da14	perf stat: Fix find_stat for mixed legacy/non-legacy events Legacy events typically don't have a PMU when added leading to mismatched legacy/non-legacy cases in find_stat. Use evsel__find_pmu to make sure the evsel PMU is looked up. Update the evsel__find_pmu code to look for the PMU using the extended config type or, for legacy hardware/hw_cache events on non-hybrid systems, just use the core PMU. Before: ``` $ perf stat -e cycles,cpu/instructions/ -a sleep 1 Performance counter stats for 'system wide': 215,309,764 cycles 44,326,491 cpu/instructions/ 1.002555314 seconds time elapsed ``` After: ``` $ perf stat -e cycles,cpu/instructions/ -a sleep 1 Performance counter stats for 'system wide': 990,676,332 cycles 1,235,762,487 cpu/instructions/ # 1.25 insn per cycle 1.002667198 seconds time elapsed ``` Fixes: `3612ca8e29` ("perf stat: Fix the hard-coded metrics calculation on the hybrid") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250109222109.567031-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-29 14:06:25 -08:00
Ian Rogers	6ab89b7fc2	perf evsel: Add pmu_name helper Add helper to get the name of the evsel's PMU. This handles the case where there's no sysfs PMU via parse_events event_type helper. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250109222109.567031-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-29 14:05:57 -08:00
James Clark	9fae5884bb	perf cpumap: Fix die and cluster IDs Now that filename__read_int() returns -errno instead of -1 these statements need to be updated otherwise error values will be used as die IDs. This appears as a -2 die ID when the platform doesn't export one: $ perf stat --per-core -a -- true S36-D-2-C0 1 9.45 msec cpu-clock And the session topology test fails: $ perf test -vvv topology CPU 0, core 0, socket 36 CPU 1, core 1, socket 36 CPU 2, core 2, socket 36 CPU 3, core 3, socket 36 FAILED tests/topology.c:137 Cpu map - Die ID doesn't match ---- end(-1) ---- 38: Session topology : FAILED! Fixes: `05be17eed7` ("tool api fs: Correctly encode errno for read/write open failures") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241218115552.912517-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 10:03:26 -08:00
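The error-handling change described in the commit above can be sketched in C. This is a minimal illustration, not the actual perf code: `read_int_or_errno()` and `get_die_id()` are hypothetical stand-ins, and the only behavior taken from the message is that the read helper now returns a negative errno value instead of -1.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a read helper that, like filename__read_int()
 * after the change described above, returns 0 on success and a negative
 * errno value on failure.
 */
static int read_int_or_errno(int have_file, int file_value, int *value)
{
    if (!have_file)
        return -ENOENT;
    *value = file_value;
    return 0;
}

/*
 * Return the die ID, or -1 when the platform does not export one.
 * Testing "ret < 0" instead of "ret == -1" keeps -ENOENT (-2) from
 * leaking out as a bogus die ID.
 */
static int get_die_id(int have_file, int file_value)
{
    int value;
    int ret = read_int_or_errno(have_file, file_value, &value);

    return ret < 0 ? -1 : value;
}
```

With this shape, a missing die ID is reported as -1 regardless of which errno the read helper produced.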
Namhyung Kim	72d81e1062	perf test: Skip syscall enum test if no landlock syscall The perf trace enum augmentation test specifically targets the landlock_add_rule syscall but IIUC it's optional and can be opted out of by a kernel config. Currently trace_landlock() runs `perf test -w landlock` before the actual testing to check the availability but it's not enough since the workload always returns 0. Instead it could check if perf trace output has the 'landlock' string. Fixes: `d66763fed3` ("perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'") Reviewed-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250128170629.1251574-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 09:29:39 -08:00
Howard Chu	c7b87ce0dd	perf trace: Fix runtime error of index out of bounds libtraceevent parses and returns an array of argument fields, sometimes larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr", idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6 elements max, creating an out-of-bounds access. This runtime error is found by UBsan. The error message: $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1 builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]' #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966 #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110 #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436 #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897 #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335 #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502 #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351 #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404 #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448 #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556 #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360 #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6) 0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1) = 1 Fixes: `5e58fcfaf4` ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint") Signed-off-by: Howard Chu 
<howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250122025519.361873-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-28 09:27:27 -08:00
Ian Rogers	bde4ccfd5a	perf annotate: Use an array for the disassembler preference Prior to this change a string was used which could cause issues with an unrecognized disassembler in symbol__disassembler. Change to initializing an array of perf_disassembler enum values. If a value already exists then adding it a second time is ignored to avoid array out of bounds problems present in the previous code, it also allows a statically sized array and removes memory allocation needs. Errors in the disassembler string are reported when the config is parsed during perf annotate or perf top start up. If the array is uninitialized after processing the config file the default llvm, capstone then objdump values are added but without a need to parse a string. Fixes: `a6e8a58de6` ("perf disasm: Allow configuring what disassemblers to use") Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250124043856.1177264-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-27 15:58:01 -08:00
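The idea in the commit above (a statically sized array of enum values where a repeated entry is ignored) can be sketched as follows. The enum and function names are hypothetical and chosen only for illustration; they are not the identifiers used in perf.

```c
#include <stdbool.h>

enum disassembler {          /* hypothetical names, for illustration only */
    DISASM_UNKNOWN = 0,      /* also marks an unused slot */
    DISASM_LLVM,
    DISASM_CAPSTONE,
    DISASM_OBJDUMP,
    DISASM_MAX,
};

/* One slot per real disassembler. */
#define NUM_SLOTS (DISASM_MAX - 1)

/*
 * Append d to the preference array unless it is already present.
 * Because duplicates are ignored, the array can never need more
 * than NUM_SLOTS entries, so no allocation or resizing is required.
 */
static bool add_disassembler(enum disassembler prefs[NUM_SLOTS], enum disassembler d)
{
    if (d == DISASM_UNKNOWN || d >= DISASM_MAX)
        return false;                /* unrecognized value */

    for (int i = 0; i < NUM_SLOTS; i++) {
        if (prefs[i] == d)
            return true;             /* duplicate: ignore */
        if (prefs[i] == DISASM_UNKNOWN) {
            prefs[i] = d;            /* first free slot */
            return true;
        }
    }
    return false;                    /* not reached for valid, deduplicated input */
}
```

A caller would start from a zero-initialized array and call `add_disassembler()` once per configured name; an array that is still empty afterwards could then be filled with the defaults.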
James Clark	66e99fd5a1	perf vendor events arm64: Add V3 events/metrics Using the scripts at: https://gitlab.arm.com/telemetry-solution/telemetry-solution/ Generate perf json for neoverse-v3 using the following command: ``` $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-v3.json ``` Signed-off-by: Ian Rogers <irogers@google.com> [Re-generate after updating script] Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250122163504.2061472-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 15:14:15 -08:00
James Clark	994256a798	perf vendor events arm64: Add N3 events/metrics Using the scripts at: https://gitlab.arm.com/telemetry-solution/telemetry-solution/ Generate perf json for neoverse-n3 using the following command: ``` $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-n3.json ``` Signed-off-by: Ian Rogers <irogers@google.com> [Re-generate after updating script] Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250122163504.2061472-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 15:14:06 -08:00
Benjamin Peterson	0aefb3df8b	perf trace: Fix return value of trace__fprintf_tp_fields This function formerly returned twice the number of bytes printed. Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250123-void-fprintf_tp_fields-v2-1-6038f8224987@engflow.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-24 13:21:49 -08:00
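One way a print helper can end up reporting twice the bytes it printed is by counting the formatted length and then adding the writer's return value on top. The sketch below is a generic illustration under that assumption; `print_fields()` is a hypothetical function and does not reproduce the code of trace__fprintf_tp_fields.

```c
#include <stdio.h>

/*
 * Format two fields into a buffer, write the buffer once, and return
 * the number of bytes written. The formatted length is used only to
 * size the write; it is not added to the return value.
 */
static size_t print_fields(FILE *out, const char *first, const char *second)
{
    char buf[128];
    int len = snprintf(buf, sizeof(buf), "%s, %s", first, second);

    if (len < 0)
        return 0;
    if ((size_t)len >= sizeof(buf))
        len = (int)sizeof(buf) - 1;  /* output was truncated */

    return fwrite(buf, 1, (size_t)len, out);
}
```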
Linus Torvalds	7685b334d1	perf-tools changes for v6.14 There are a lot of changes in the perf tools in this cycle. build ----- * Use generic syscall table to generate syscall numbers on supported archs. * This also enables to get rid of libaudit which was used for syscall numbers. * Remove python2 support as it's deprecated for years. * Fix issues on static build with libzstd. perf record ----------- * Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing. # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires the kernel support (which was added in v6.13). perf lock --------- * 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks. # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires the kernel/BPF support (which was added in v6.13). perf ftrace ----------- * 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets. 
# perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 186 \| ### \| 200 - 300 ns \| 256 \| ##### \| 300 - 400 ns \| 364 \| ####### \| 400 - 500 ns \| 223 \| #### \| 500 - 600 ns \| 111 \| ## \| 600 - 700 ns \| 41 \| \| 700 - 800 ns \| 141 \| ## \| 800 - ... ns \| 169 \| ### \| # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 * As you can see in the above example, it now shows the statistics at the end so that users can see the avg/max/min latencies easily. * 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold. perf script ----------- * Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file. # perf script mem-phys-addr -- find / ... Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others ------ * 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky. * Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) * Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat'. 
* Update ARM Neoverse N2/V2 JSON events and metrics Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ5AgiQAKCRCMstVUGiXM g0WhAP43Dpfatrm1jicTyAogk5D/JrIMOgjGtrJJi5RXG/r0gwD8DSWFzLppS9xy KGtjLHrN6v6BqR4DCubdlZmRfh9Qjgg= =M0Kz -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf-tools updates from Namhyung Kim: "There are a lot of changes in the perf tools in this cycle. build: - Use generic syscall table to generate syscall numbers on supported archs - This also enables to get rid of libaudit which was used for syscall numbers - Remove python2 support as it's deprecated for years - Fix issues on static build with libzstd perf record: - Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing: # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires kernel support (which was added in v6.13) perf lock: - 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. 
Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires kernel/BPF support (which was added in v6.13) perf ftrace: - 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets: # perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 186 \| ### \| 200 - 300 ns \| 256 \| ##### \| 300 - 400 ns \| 364 \| ####### \| 400 - 500 ns \| 223 \| #### \| 500 - 600 ns \| 111 \| ## \| 600 - 700 ns \| 41 \| \| 700 - 800 ns \| 141 \| ## \| 800 - ... ns \| 169 \| ### \| # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 - As you can see in the above example, it now shows the statistics at the end so that users can see the avg/max/min latencies easily - 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold perf script: - Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file # perf script mem-phys-addr -- find / ... 
Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others: - 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky - Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) - Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat' - Update ARM Neoverse N2/V2 JSON events and metrics" * tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits) perf test: Update event_groups test to use instructions perf bench: Fix undefined behavior in cmpworker() perf annotate: Prefer passing evsel to evsel->core.idx perf lock: Rename fields in lock_type_table perf lock: Add percpu-rwsem for type filter perf lock: Fix parse_lock_type which only retrieve one lock flag perf lock: Fix return code for functions in __cmd_contention perf hist: Fix width calculation in hpp__fmt() perf hist: Fix bogus profiles when filters are enabled perf hist: Deduplicate cmp/sort/collapse code perf test: Improve verbose documentation perf test: Add a runs-per-test flag perf test: Fix parallel/sequential option documentation perf test: Send list output to stdout rather than stderr perf test: Rename functions and variables for better clarity perf tools: Expose quiet/verbose variables in Makefile.perf perf config: Add a function to set one variable in .perfconfig perf test perftool_testsuite: Return correct value for skipping perf test 
perftool_testsuite: Add missing description perf test record+probe_libc_inet_pton: Make test resilient ...	2025-01-24 05:45:40 -08:00
Howard Chu	013eb043f3	perf trace: Fix BPF loading failure (-E2BIG) As reported by Namhyung Kim and acknowledged by Qiao Zhao (link: https://lore.kernel.org/linux-perf-users/20241206001436.1947528-1-namhyung@kernel.org/), on certain machines, perf trace failed to load the BPF program into the kernel. The verifier runs perf trace's BPF program for up to 1 million instructions, returning an E2BIG error, whereas the perf trace BPF program should be much less complex than that. This patch aims to fix the issue described above. The E2BIG problem from clang-15 to clang-16 is caused by this line: } else if (size < 0 && size >= -6) { /* buffer */ Specifically this check: size < 0. It seems like clang generates a cool optimization for this sign check that breaks things. Making 'size' s64, and using } else if ((int)size < 0 && size >= -6) { /* buffer */ solves the problem. This is some Hogwarts magic. And the unbounded access of clang-12 and clang-14 (clang-13 works this time) is fixed by making variable 'aug_size' s64. As for this: -if (aug_size > TRACE_AUG_MAX_BUF) - aug_size = TRACE_AUG_MAX_BUF; +aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index]; This makes the BPF skel generated by clang-18 work. Yes, new clangs introduce problems too. Sorry, I only know that it works, but I don't know how it works. I'm not an expert in the BPF verifier. I really hope this is not a kernel version issue, as that would make the test case (kernel_nr) * (clang_nr), a true horror story. I will test it on more kernel versions in the future. Fixes: `395d38419f`: ("perf trace augmented_raw_syscalls: Add more checks to pass the verifier") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241213023047.541218-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-23 15:55:52 -08:00
Linus Torvalds	9ad09c4f28	arm64 updates for 6.14 Confidential Computing: * Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules. CPU Features: * Update a bunch of system register definitions to pick up new field encodings from the architectural documentation. * Add hwcaps and selftests for the new (2024) dpISA extensions. Documentation: * Update EL3 (firmware) requirements for booting Linux on modern arm64 designs. * Remove stale information about the kernel virtual memory map. Miscellaneous: * Minor cleanups and typo fixes. Memory management: * Fix vmemmap_check_pmd() to look at the PMD type bits * LPA2 (52-bit physical addressing) cleanups and minor fixes. * Adjust physical address space depending upon whether or not LPA2 is enabled. Perf and PMUs: * Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU * Extend AXI filtering support for the DDR PMU on NXP IMX SoCs * Fix Designware PCIe PMU event numbering. * Add generic branch events for the Apple M1 CPU PMU. * Add support for Marvell Odyssey DDR and LLC-TAD PMUs. * Cleanups to the Hisilicon DDRC and Uncore PMU code. * Advertise discard mode for the SPE PMU. * Add the perf users mailing list to our MAINTAINERS entry. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeKZLcQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNEQzB/0X2U89ZiqxIkTPQvfFrjN/uUGybkq59rEL DfeoGukTgJIwc3GHWXXtQ//wuuYKdTeCXaIz5NFK3+7/wmKSLvjkexmue8pta6EY 5rx9bAPr/D8lAUvhKIN2l3pF/ygoRwDz+nT2yVQ1xlZxYJWX7ZIsMj7W7ceb5kdx HRrTSQuhEEPREAWWO4oCMWl5SQZSrIflSE3Be/PsP0OhW6k//ZmWbcJTgUcHbKam o2WtNjITyGzxMpRCcrGEZKoe9YcwSxiut/PoD7JuoB4C/rbsf1cdJ6uLmtvGJcZj qsdRHhVfBzP1+ahONrDbiT3C2+s1UZySKdCDIxiYy6lB39wpP0dd =E7Mf -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "We've got a little less than normal thanks to the holidays in December, but there's the usual summary below. 
The highlight is probably the 52-bit physical addressing (LPA2) clean-up from Ard. Confidential Computing: - Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules CPU Features: - Update a bunch of system register definitions to pick up new field encodings from the architectural documentation - Add hwcaps and selftests for the new (2024) dpISA extensions Documentation: - Update EL3 (firmware) requirements for booting Linux on modern arm64 designs - Remove stale information about the kernel virtual memory map Miscellaneous: - Minor cleanups and typo fixes Memory management: - Fix vmemmap_check_pmd() to look at the PMD type bits - LPA2 (52-bit physical addressing) cleanups and minor fixes - Adjust physical address space depending upon whether or not LPA2 is enabled Perf and PMUs: - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs - Fix Designware PCIe PMU event numbering - Add generic branch events for the Apple M1 CPU PMU - Add support for Marvell Odyssey DDR and LLC-TAD PMUs - Cleanups to the Hisilicon DDRC and Uncore PMU code - Advertise discard mode for the SPE PMU - Add the perf users mailing list to our MAINTAINERS entry" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) Documentation: arm64: Remove stale and redundant virtual memory diagrams perf docs: arm_spe: Document new discard mode perf: arm_spe: Add format option for discard mode MAINTAINERS: Add perf list for drivers/perf/ arm64: Remove duplicate included header drivers/perf: apple_m1: Map generic branch events arm64: rsi: Add automatic arm-cca-guest module loading kselftest/arm64: Add 2024 dpISA extensions to hwcap test KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1 arm64/hwcap: Describe 2024 dpISA extensions to userspace arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12 arm64: Filter out SVE hwcaps 
when FEAT_SVE isn't implemented drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu() arm64: mm: Test for pmd_sect() in vmemmap_check_pmd() arm64/mm: Replace open encodings with PXD_TABLE_BIT arm64/mm: Rename pte_mkpresent() as pte_mkvalid() arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09 ...	2025-01-20 21:21:49 -08:00
Athira Rajeev	91b7747dc7	perf test: Update event_groups test to use instructions In some of the powerpc platforms, event group testcase fails as below: # perf test -v 'Event groups' 69: Event groups : --- start --- test child forked, pid 9765 Using CPUID 0x00820200 Using hv_24x7 for uncore pmu event 0x0 0x0, 0x0 0x0, 0x0 0x0: Fail 0x0 0x0, 0x0 0x0, 0x1 0x3: Pass The testcase creates various combinations of hw, sw and uncore PMU events and verifies that group creation succeeds or fails as expected. This tests one of the limitations in perf where it doesn't allow creating a group of events from different hw PMUs. The testcase starts a leader event and opens two sibling events. The combination that fails is three hardware events in a group. "0x0 0x0, 0x0 0x0, 0x0 0x0: Fail" Type zero and config zero translate to PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLES. There is an event constraint in powerpc that events using the same counter cannot be programmed in a group. Here there is one alternative event for cycles, hence one leader and only one sibling event can go in as a group. If all three events (leader and two sibling events) are hardware events, use instructions as one of the sibling events. Since PERF_COUNT_HW_INSTRUCTIONS is a generic hardware event and present in all architectures, use this as the third event. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250110094620.94976-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:32:57 -08:00
Kuan-Wei Chiu	62892e77b8	perf bench: Fix undefined behavior in cmpworker() The comparison function cmpworker() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity: Symmetry: If x < y, then y > x. Transitivity: If x < y and y < z, then x < z. In its current implementation, cmpworker() incorrectly returns 0 when w1->tid < w2->tid, which breaks both symmetry and transitivity. This violation causes undefined behavior, potentially leading to issues such as memory corruption in glibc [1]. Fix the issue by returning -1 when w1->tid < w2->tid, ensuring compliance with the C standard and preventing undefined behavior. Link: https://www.qualys.com/2024/01/30/qsort.txt [1] Fixes: `121dd9ea01` ("perf bench: Add epoll parallel epoll_wait benchmark") Cc: stable@vger.kernel.org Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:14:36 -08:00
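The requirement described in the commit above (a qsort() comparator must return a negative, zero, or positive value consistently) can be illustrated with a short sketch. `struct worker` here is a reduced, hypothetical version with only a `tid` field, and `cmp_worker()` is not the patched function itself; the commit returns -1 explicitly, while this sketch uses an equivalent branch-free idiom.

```c
#include <stdlib.h>

struct worker {              /* reduced, hypothetical worker record */
    int tid;
};

/*
 * Three-way comparison on tid. (a > b) - (a < b) yields -1, 0 or 1,
 * so cmp(x, y) and cmp(y, x) always have opposite signs, and the
 * subtraction of two 0/1 values cannot overflow.
 */
static int cmp_worker(const void *p1, const void *p2)
{
    const struct worker *w1 = p1;
    const struct worker *w2 = p2;

    return (w1->tid > w2->tid) - (w1->tid < w2->tid);
}

/* Sort an array of workers by ascending tid. */
static void sort_workers(struct worker *workers, size_t n)
{
    qsort(workers, n, sizeof(*workers), cmp_worker);
}
```

A comparator that returns 0 for "less than" tells qsort() that the two elements are equal in one direction and ordered in the other, which is the inconsistency the commit removes.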
Ian Rogers	035f0c279b	perf annotate: Prefer passing evsel to evsel->core.idx An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in annotate code. Internally the code just reads evsel->core.idx so behavior is unchanged. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-18 10:02:10 -08:00
Chun-Tse Shao	ac22d75377	perf lock: Rename fields in lock_type_table `lock_type_table` contains `name` and `str` which can be confusing. Rename them to `flags_name` and `lock_name` and add descriptions to enhance understanding. Tested by building perf for x86. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:12:41 -08:00
Chun-Tse Shao	e9188ae3cd	perf lock: Add percpu-rwsem for type filter percpu-rwsem was missing in man page. And for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: `4f701063bf` ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:12:40 -08:00
Chun-Tse Shao	1be9264158	perf lock: Fix parse_lock_type which only retrieve one lock flag `parse_lock_type` can only add the first lock flag in `lock_type_table` given input `str`. For example, for `-Y rwlock`, it only adds `rwlock:R` into this perf session. Another example is for `-Y mutex`, it only adds the mutex without `LCB_F_SPIN` flag. The patch fixes this issue, makes sure both `rwlock:R` and `rwlock:W` will be added with `-Y rwlock`, and so on. Testing: $ ./perf lock con -ab -Y mutex,rwlock -- perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 9.313 [sec] 9.313976 usecs/op 107365 ops/sec contended total wait max wait avg wait type caller 176 1.65 ms 19.43 us 9.38 us mutex pipe_read+0x57 34 180.14 us 10.93 us 5.30 us mutex pipe_write+0x50 7 77.48 us 16.09 us 11.07 us mutex do_epoll_wait+0x24d 7 74.70 us 13.50 us 10.67 us mutex do_epoll_wait+0x24d 3 35.97 us 14.44 us 11.99 us rwlock:W ep_done_scan+0x2d 3 35.00 us 12.23 us 11.66 us rwlock:W do_epoll_wait+0x255 2 15.88 us 11.96 us 7.94 us rwlock:W do_epoll_wait+0x47c 1 15.23 us 15.23 us 15.23 us rwlock:W do_epoll_wait+0x4d0 1 14.26 us 14.26 us 14.26 us rwlock:W ep_done_scan+0x2d 2 14.00 us 7.99 us 7.00 us mutex pipe_read+0x282 1 12.29 us 12.29 us 12.29 us rwlock:R ep_poll_callback+0x35 1 12.02 us 12.02 us 12.02 us rwlock:W do_epoll_ctl+0xb65 1 10.25 us 10.25 us 10.25 us rwlock:R ep_poll_callback+0x35 1 7.86 us 7.86 us 7.86 us mutex do_epoll_ctl+0x6c1 1 5.04 us 5.04 us 5.04 us mutex do_epoll_ctl+0x3d4 [namhyung: Add a comment and rename to 'mutex:spin' for consistency] Fixes: `d783ea8f62` ("perf lock contention: Simplify parse_lock_type()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-1-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:11:57 -08:00
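The behavior the commit above describes (collect every table entry that matches the requested lock name, not just the first) can be sketched like this. The table contents, flag values, and the name `collect_lock_flags()` are assumptions made for illustration; only the field names `flags_name` and `lock_name` and the example lock names come from the log.

```c
#include <stddef.h>
#include <string.h>

struct lock_type_entry {         /* hypothetical table layout */
    unsigned int flags;          /* made-up flag values */
    const char *flags_name;      /* specific name, e.g. "rwlock:R" */
    const char *lock_name;       /* generic name, e.g. "rwlock" */
};

static const struct lock_type_entry lock_types[] = {
    { 0x1, "rwlock:R",   "rwlock" },
    { 0x2, "rwlock:W",   "rwlock" },
    { 0x4, "mutex",      "mutex"  },
    { 0x8, "mutex:spin", "mutex"  },
};

/*
 * Store the flags of every entry whose specific or generic name equals
 * str, up to max entries, and return how many were stored. The loop
 * deliberately does not stop at the first match.
 */
static size_t collect_lock_flags(const char *str, unsigned int *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(lock_types) / sizeof(lock_types[0]); i++) {
        const struct lock_type_entry *e = &lock_types[i];

        if (strcmp(e->flags_name, str) != 0 && strcmp(e->lock_name, str) != 0)
            continue;
        if (n < max)
            out[n++] = e->flags;
    }
    return n;
}
```

With this table, asking for `rwlock` yields both the read and write variants, while asking for `rwlock:R` yields only one.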
Athira Rajeev	83196dd349	perf lock: Fix return code for functions in __cmd_contention perf lock contention returns zero exit value even if the lock contention BPF setup failed. # ./perf lock con -b true libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: Error loading vmlinux BTF: -ESRCH libbpf: failed to load object 'lock_contention_bpf' libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH Failed to load lock-contention BPF skeleton lock contention BPF setup failed # echo $? 0 Fix this by saving the return code for lock_contention_prepare so that command exits with proper return code. Similarly set the return code properly for two other functions in builtin-lock, namely setup_output_field() and select_key(). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110093730.93610-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 10:09:54 -08:00
Dmitry Vyukov	036e2faa99	perf hist: Fix width calculation in hpp__fmt() hpp__width_fn() round up width to length of the field name, hpp__fmt() should do it too. Otherwise, the numbers may end up unaligned if the field name is long. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108065949.235718-1-dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-17 09:51:18 -08:00
Dmitry Vyukov	8b4799e4f0	perf hist: Fix bogus profiles when filters are enabled When a filtered column is not present in the sort order, profiles become arbitrary broken. Filtered and non-filtered entries are collapsed together, and the filtered-by field ends up with a random value (either from a filtered or non-filtered entry). If we end up with filtered entry/value, then the whole collapsed entry will be filtered out and will be missing in the profile. If we end up with non-filtered entry/value, then the overhead value will be wrongly larger (include some subset of filtered out samples). This leads to very confusing profiles. The problem is hard to notice, and if noticed hard to understand. If the filter is for a single value, then it can be fixed by adding the corresponding field to the sort order (provided user understood the problem). But if the filter is for multiple values, it's impossible to fix b/c there is no concept of binary sorting based on filter predicate (we want to group all non-filtered values in one bucket, and all filtered values in another). Examples of affected commands: perf report --tid=123 perf report --sort overhead,symbol --comm=foo,bar Fix this by considering filtered status as the highest priority sort/collapse predicate. As a side effect this effectively adds a new feature of showing profile where several lines are combined based on arbitrary filtering predicate. For example, showing symbols from binaries foo and bar combined together, but not from other binaries; or showing combined overhead of several particular threads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/359dc444ce94d20e59d3a9e360c36fbeac833a04.1736927981.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 13:43:28 -08:00
Dmitry Vyukov	cd57c04c38	perf hist: Deduplicate cmp/sort/collapse code Application of cmp/sort/collapse fmt callbacks is duplicated 6 times. Factor it into a common helper function. NFC. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/84c4b55614e24a344f86ae0db62e8fa8f251f874.1736927981.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 13:43:28 -08:00
Ian Rogers	4e38f2814f	perf test: Improve verbose documentation Add a little more detail on the output expectations for each verbose level. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	1c0d9816e9	perf test: Add a runs-per-test flag To detect flakes it is useful to run tests more than once. Add a runs-per-test flag that will run each test multiple times. Example output: ``` $ perf test -r 3 lbr -v 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok ``` Update the documentation for the runs-per-test option. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	4dd8bc4bf5	perf test: Fix parallel/sequential option documentation The parallel option was removed in commit `94d1a913bd` ("perf test: Make parallel testing the default"). Update the sequential documentation to reflect it isn't the default except for "exclusive" tests. Fixes: `94d1a913bd` ("perf test: Make parallel testing the default") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	2b7b78efc8	perf test: Send list output to stdout rather than stderr Follow the workload listing in using stdout rather than stderr. Correct the numbering of sub-tests to be 1.1 rather than 1:1. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Ian Rogers	2e47c503de	perf test: Rename functions and variables for better clarity The relationship between subtests and test cases is somewhat confusing, so let's do away with the notion of sub-tests and switch to just working with some number of test cases. Add a test_suite__for_each_test_case as in many cases, except the special one test case situation, the iteration can just be on all test cases. Switch variable names to be more intention revealing of what their value is. This work was motivated by discussion with Kan where it was noted the code is becoming overly indented: https://lore.kernel.org/lkml/20241109160219.49976-1-irogers@google.com/ Unifying more of the sub-test/no-sub-tests avoids one level of indentation in a number of places. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 11:01:03 -08:00
Charlie Jenkins	f2868b1a66	perf tools: Expose quiet/verbose variables in Makefile.perf The variables to make builds silent/verbose live inside tools/build/Makefile.build. Move those variables to the top-level Makefile.perf to be generally available. Committer testing: See the SYSCALL lines, now they are consistent with the other operations in other lines: SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_32.h SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_64.h GEN /tmp/build/perf-tools-next/common-cmds.h GEN /tmp/build/perf-tools-next/arch/arm64/include/generated/asm/sysreg-defs.h PERF_VERSION = 6.13.rc2.g3d94bb6ed1d0 GEN perf-archive MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ MKDIR /tmp/build/perf-tools-next/jvmti/ GEN perf-iostat CC /tmp/build/perf-tools-next/jvmti/libjvmti.o Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20250114-perf_make_test-v1-1-decc1c517b11@rivosinc.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-01-16 10:59:20 -08:00
Arnaldo Carvalho de Melo	e9cbc854d8	perf config: Add a function to set one variable in .perfconfig To allow for setting a variable from some other tool, like with the "wallclock" patchset needs to allow the user to opt-in to having that key in the sort order for 'perf report'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/Z4akewi7UPXpagce@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 15:05:56 -03:00
Veronika Molnarova	1ab138febc	perf test perftool_testsuite: Return correct value for skipping In 'perf test', a return value 2 represents that the test case was skipped. Fix this value for perftool_testsuite test cases to differentiate between skip and pass values. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-3-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Veronika Molnarova	5afd6d38cf	perf test perftool_testsuite: Add missing description Properly name the test cases of perftool_testsuite instead of the license being taken as the name for 'perf test'. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-2-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Leo Yan	9a7b618ef6	perf test record+probe_libc_inet_pton: Make test resilient The test failed back and forth due to the call chain being heavily impacted by the libc, which varies across different architectures and distros. The libc contains the symbols for "gaih_inet" and "getaddrinfo" in some cases, but not always. Moreover, these symbols can be either normal symbols or dynamic symbols, making it difficult to decide the call chain entries due to the symbols are inconsistent. To fix the issue, this commit identifies three call chain entries are always present. These entries are matched by iterating through the lines in the "perf script" result. The recording attribute max-stack is set to 4 for the possible maximum call chain depth. After: # perf test -vF pton --- start --- Pattern: ping[][0-9 \.:]+probe_libc:inet_pton: $[[:xdigit:]]+$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Pattern: .inet_pton\+0x[[:xdigit:]]+[[:space:]]$/usr/lib/aarch64-linux-gnu/libc-2.31.so\|inlined$$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Pattern: .(\+0x[[:xdigit:]]+\|\[unknown\])[[:space:]]$./bin/ping.$$ Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0) Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Matching: ffffa1488040 getaddrinfo+0xe8 (/usr/lib/aarch64-linux-gnu/libc-2.31.so) Matching: aaaab8672da4 [unknown] (/usr/bin/ping) ---- end ---- 82: probe libc's inet_pton & backtrace it with ping : Ok Closes: https://lore.kernel.org/linux-perf-users/1728978807-81116-1-git-send-email-renyu.zj@linux.alibaba.com/ Closes: https://lore.kernel.org/linux-perf-users/Z0X3AYUWkAgfPpWj@x1/T/#m57327e135b156047e37d214a0d453af6ae1e02be Reported-by: Guilherme Amadio <amadio@gentoo.org> Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241202111958.553403-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:20 -03:00
Ian Rogers	8e246a1b2a	perf inject: Fix use without initialization of local variables Local variables were missing initialization and command line processing didn't provide default values. Fixes: `64eed019f3` ("perf inject: Lazy build-id mmap2 event insertion") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241211060831.806539-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
James Clark	6804a7192a	perf probe: Rename err label Rename err to out to avoid confusion because buf is still supposed to be freed in non error cases. Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241211085525.519458-3-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Ian Rogers	f9c506fb69	perf test stat: Avoid hybrid assumption when virtualized The cycles event will fallback to task-clock in the hybrid test when running virtualized. Change the test to not fail for this. Fixes: `65d1182191` ("perf test: Add a test for default perf stat command") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241212173354.9860-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Athira Rajeev	2adbf5349a	perf record: Fix segfault with --off-cpu when debuginfo is not enabled When kernel is built without debuginfo, running 'perf record' with --off-cpu results in segfault as below: ./perf record --off-cpu -e dummy sleep 1 libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF Segmentation fault (core dumped) The backtrace pointed to: #0 0x00000000100fb17c in btf.type_cnt () #1 0x00000000100fc1a8 in btf_find_by_name_kind () #2 0x00000000100fc38c in btf.find_by_name_kind () #3 0x00000000102ee3ac in off_cpu_prepare () #4 0x000000001002f78c in cmd_record () #5 0x00000000100aee78 in run_builtin () #6 0x00000000100af3e4 in handle_internal_command () #7 0x000000001001004c in main () Code sequence is: static void check_sched_switch_args(void) { struct btf *btf = btf__load_vmlinux_btf(); const struct btf_type *t1, *t2, *t3; u32 type_id; type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch", BTF_KIND_TYPEDEF); btf__load_vmlinux_btf() fails when CONFIG_DEBUG_INFO_BTF is not enabled. Here btf__find_by_name_kind() calls btf__type_cnt() with NULL btf value and results in segfault. To fix this, add a check to see if btf is not NULL before invoking btf__find_by_name_kind(). Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://lore.kernel.org/r/20241223135813.8175-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Athira Rajeev	8c1a106635	perf tests base_probe: Fix check for the count of existing probes in test_adding_kernel perftool-testsuite_probe fails in test_adding_kernel as below: Regexp not found: "probe:inode_permission_11" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force) (output regexp parsing) event syntax error: 'probe:inode_permission_11' \___ unknown tracepoint Error: File /sys/kernel/tracing//events/probe/inode_permission_11 not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. The test does the following: 1) Adds a probe point first using: $CMD_PERF probe --add $TEST_PROBE 2) Then tries to add same probe again without --force and expects it to fail. Next tries to add same probe again with --force. In this case, perf probe succeeds and adds the probe with a suffix number. Example: ./perf probe --add inode_permission Added new event: probe:inode_permission (on inode_permission) ./perf probe --add inode_permission --force Added new event: probe:inode_permission_1 (on inode_permission) ./perf probe --add inode_permission --force Added new event: probe:inode_permission_2 (on inode_permission) Each time, suffix is added to existing probe name. To get the suffix number, test cases uses: NO_OF_PROBES=`$CMD_PERF probe -l \| wc -l` This will work if there is no other probe existing in the system. If there are any other probes other than kernel probes or inode_permission, ( example: any probe), "perf probe -l" will include count for other probes too. Example, in the system where this failed, already some probes were default added. So count became 10 ./perf probe -l \| wc -l 10 So to be specific for "inode_permission", restrict the probe count check to that probe point alone using: NO_OF_PROBES=`$CMD_PERF probe -l $TEST_PROBE\| wc -l` Similarly while removing the probe using "probe --del *", (removing all probes), check uses: ../common/check_all_lines_matched.pl "Removed event: probe:$TEST_PROBE" But if there are other probes in the system, the log will contain reference to other existing probe too. Hence change usage of check_all_lines_matched.pl to check_all_patterns_found.pl This will make sure expecting string comes in the result Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110094324.94604-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Michel Lind	8bf18c5cef	perf MANIFEST: Add license files The standalone tarballs should include the license files - both the COPYING declaration as well as the text of GPLv2. Signed-off-by: Michel Lind <michel@michel-slm.name> Link: https://lore.kernel.org/r/Z0Zcx0WRqtlUYpgw@hyperscale.parallels Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
James Clark	3178155d29	perf test brstack: Speed up running test by using tr -s instead of xargs The brstack test runs quite slowly in software models. Part of the reason is "xargs -n1" is quite inefficient in replacing spaces with newlines. While that's not noticeable on normal machines, it is on software models. Use "tr -s ' ' '\n'" instead which can do the same transformation, but is much faster. For comparison on an M1 Macbook Pro: $ time seq -s ' ' 10000 \| xargs -n1 > /dev/null real 0m2.729s user 0m2.009s sys 0m0.914s $ time seq -s ' ' 10000 \| tr -s ' ' '\n' \| grep '.' > /dev/null real 0m0.002s user 0m0.001s sys 0m0.001s The "grep '.'" is also needed to remove any remaining blank lines. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241213231312.2640687-2-robh@kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> [robh: Drop changing loop iterations on arm64. Squash blank line fix and redo commit msg] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-14 14:57:19 -03:00
Charlie Jenkins	b1bb6fc06b	perf tools mips: Fix mips syscall generation The mips syscall generation was still based on the old method. Delete the Makefile since it is no longer needed with the new method of generation. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Fixes: `619ffe6694` ("perf tools mips: Use generic syscall scripts") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250110-perf_fix_mips-v1-1-4e661c3b710a@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:46:41 -03:00
James Clark	05cd60e4d0	perf tests arm_spe: Add test for discard mode Add a test that checks that there were no AUX or AUXTRACE events recorded when discard mode is used. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-6-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:45:05 -03:00
James Clark	9c3164ea7e	perf tools arm-spe: Don't allocate buffer or tracking event in discard mode The buffer will never be written to so don't bother allocating it. The tracking event is also not required. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-5-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:45:03 -03:00
James Clark	23a65c5e8b	perf tools arm-spe: Pull out functions for aux buffer and tracking setup These won't be used in the next commit in discard mode, so put them in their own functions. No functional changes intended. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108142904.401139-4-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-13 11:43:15 -03:00
Jiachen Zhang	ac0ac75189	perf report: Fix misleading help message about --demangle The wrong help message may mislead users. This commit fixes it. Fixes: `328ccdace8` ("perf report: Add --no-demangle option") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiachen Zhang <me@jcix.top> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250109152220.1869581-1-me@jcix.top Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:46:09 -03:00
Namhyung Kim	510f0247cd	perf ftrace: Fix display for range of the first bucket When min_latency is not given, it prints 0 - 0. It should be 0 - 1. Before: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 321 \| ########### \| ... After: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 699 \| ############ \| ... Fixes: `08b875b6bf` ("perf ftrace latency: Introduce --min-latency to narrow down into a latency range") Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108210015.1188531-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:45:43 -03:00
Namhyung Kim	dd01b985c5	perf ftrace: Check min/max latency only with bucket range It's an optional feature and remains 0 when bucket range is not given. And it makes the histogram goes to the last entry always because any latency (num) is greater than or equal to 0. Before: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 0 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - 2 ms \| 0 \| \| 2 - 4 ms \| 0 \| \| 4 - 8 ms \| 0 \| \| 8 - 16 ms \| 0 \| \| 16 - 32 ms \| 0 \| \| 32 - 64 ms \| 0 \| \| 64 - 128 ms \| 0 \| \| 128 - 256 ms \| 0 \| \| 256 - 512 ms \| 0 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 1353 \| ############################################## \| After: $ sudo ./perf ftrace latency -a -T do_futex sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 0 us \| 321 \| ########### \| 1 - 2 us \| 132 \| #### \| 2 - 4 us \| 202 \| ####### \| 4 - 8 us \| 188 \| ###### \| 8 - 16 us \| 16 \| \| 16 - 32 us \| 12 \| \| 32 - 64 us \| 30 \| # \| 64 - 128 us \| 98 \| ### \| 128 - 256 us \| 53 \| # \| 256 - 512 us \| 57 \| ## \| 512 - 1024 us \| 9 \| \| 1 - 2 ms \| 9 \| \| 2 - 4 ms \| 1 \| \| 4 - 8 ms \| 98 \| ### \| 8 - 16 ms \| 5 \| \| 16 - 32 ms \| 7 \| \| 32 - 64 ms \| 32 \| # \| 64 - 128 ms \| 10 \| \| 128 - 256 ms \| 10 \| \| 256 - 512 ms \| 2 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 0 \| \| Fixes: `690a052a6d` ("perf ftrace latency: Add --max-latency option") Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108210015.1188531-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 14:43:55 -03:00
James Clark	ba113ecad8	perf docs: arm_spe: Document new discard mode Document the flag along with PMU events to hint what it's used for and give an example with other useful options to get minimal output. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org Signed-off-by: Will Deacon <will@kernel.org>	2025-01-10 14:50:55 +00:00
Arnaldo Carvalho de Melo	74c033b6aa	perf MANIFEST: Add arch/*/include/uapi/asm/bpf_perf_event.h to the perf tarball Needed to build tools/lib/bpf/ on various arches other than x86_64, notably arm64 when using the perf tarballs generated by: $ make help \| grep perf- perf-tar-src-pkg - Build the perf source tarball with no compression perf-targz-src-pkg - Build the perf source tarball with gzip compression perf-tarbz2-src-pkg - Build the perf source tarball with bz2 compression perf-tarxz-src-pkg - Build the perf source tarball with xz compression perf-tarzst-src-pkg - Build the perf source tarball with zst compression $ Building with BPF support was opt-in in perf for a long time, and testing it via the tarball main kernel Makefile targets in an architecture other than x86_64 was an odd case. I had noticed this at some point earlier this year while cross building perf to some arches, including arm64, but it fell thru the cracks, see the Link tag below. Fix it now by adding those arch/*/include/uapi/asm/bpf_perf_event.h files to the MANIFEST file used in building the perf source tarball. 
Tested with: perfbuilder@number:~$ time dm debian:experimental-x-arm64 1 21.60 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 14.1.0-5) 14.1.0 flex 2.6.4 BUILD_TARBALL_HEAD=d31a974f6edc576f84c35be9526fec549a3b3520 $ $ git log --oneline -1 d31a974f6edc576f84c35be9526fec549a3b3520 d31a974f6edc576f (HEAD -> perf-tools-next) perf MANIFEST: Add arch/*/include/uapi/asm/bpf_perf_event.h to the perf tarball $ That was previously failing: perfbuilder@number:~$ grep debian:experimental-x-arm64 dm.log.old/summary 19 4.80 debian:experimental-x-arm64 : FAIL gcc version 14.1.0 (Debian 14.1.0-5) $ perfbuilder@number:~$ grep -B6 'Error 1' dm.log.old/debian:experimental-x-arm64 In file included from /git/perf-6.12.0-rc6/tools/include/uapi/linux/bpf_perf_event.h:11, from libbpf.c:36: /git/perf-6.12.0-rc6/tools/include/uapi/asm/bpf_perf_event.h:2:10: fatal error: ../../arch/arm64/include/uapi/asm/bpf_perf_event.h: No such file or directory 2 \| #include "../../arch/arm64/include/uapi/asm/bpf_perf_event.h" \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. 
make[4]: *** [/git/perf-6.12.0-rc6/tools/build/Makefile.build:105: /tmp/build/perf/libbpf/staticobjs/libbpf.o] Error 1 perfbuilder@number:~$ Closes: https://lore.kernel.org/all/Z0UNRCRYKunbDYxP@hyperscale.parallels Fixes: `9eea8fafe3` ("libbpf: fix __arg_ctx type enforcement for perf_event programs") Reported-by: Michel Lind <michel@michel-slm.name> Tested-by: Michel Lind <michel@michel-slm.name> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: 317c11923cf676437456e44a7f408d4ce589a9c0.camel@michel-slm.name Link: https://lore.kernel.org/bpf/ZfyEgoG3JFiOs2Fs@x1/ Link: https://lore.kernel.org/r/Z0Yy5u42Q1hWoEzz@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Yoshihiro Furudera	e5e34e9995	perf vendor events arm64: Add FUJITSU-MONAKA PMU event Add PMU events for FUJITSU-MONAKA. Also update common-and-microarch.json and recommended.json. FUJITSU-MONAKA Specification URL: https://github.com/fujitsu/FUJITSU-MONAKA Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Akio Kakuno <fj3333bs@aa.jp.fujitsu.com> Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241217065751.1448755-1-fj5100bi@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Namhyung Kim	876e80cf83	perf tools: Fixup end address of modules In machine__create_module(), it reads /proc/modules to get a list of modules in the system. The file shows the start address (of text) and the size of the module so it uses the info to reconstruct system memory maps for symbol resolution. But module memory consists of multiple segments and they can be scattered. Currently perf tools assume they are contiguous and see some overlaps. This can confuse the tool when it looks for the map containing a given address. As we mostly care about the function symbols in the text segment, it can fix up the size or end address of modules when there's an overlap. We can use maps__fixup_end() which updates the end address using the start address of the next map. Ideally it should be able to track other segments (like data/rodata), but that would require some changes in /proc/modules IMHO. Reported-by: Blake Jones <blakejones@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20241218220453.203069-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
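Note on 876e80cf83: the overlap fix-up described above can be sketched as follows. This is an illustration only, not the body of maps__fixup_end(); `struct demo_range` and `demo_fixup_end` are hypothetical names, and the sketch assumes the ranges are already sorted by start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a map: a half-open address range [start, end). */
struct demo_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Clamp each range so it stops where the next one begins.
 * Assumes `maps` is sorted by ascending start address.
 */
static void demo_fixup_end(struct demo_range *maps, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (maps[i].end > maps[i + 1].start)
            maps[i].end = maps[i + 1].start;
    }
}
```

After the fix-up, an address lookup can match at most one range, which is the property the commit message is after.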
Namhyung Kim	8c2eafbbfd	perf symbol: Prefer non-label symbols with same address When more than one symbol shares the same address, perf needs to choose the better one. choose_best_symbol() didn't check the type of the symbols. It's possible to have labels inside other symbols, and in that case it is better to pick the actual symbol over the label. To minimize the possible impact on other symbols, I only check NOTYPE symbols specifically. $ readelf -sW vmlinux \| grep -e __do_softirq -e __softirqentry_text_start 105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq 111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start The commit `77b004f4c5` tried to do the same by not giving a size to the label symbols, but it seems there are some label-only symbols in asm code. Let's restore the original code and choose the right symbol using the symbol type. Fixes: `77b004f4c5` ("perf symbol: Do not fixup end address of labels") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/Z3b-DqBMnNb4ucEm@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
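Note on 8c2eafbbfd: a minimal sketch of the tie-break, assuming two symbols at the same address. The types and the `demo_prefer_typed` helper are hypothetical and far simpler than choose_best_symbol(); the numeric values mirror the ELF STT_NOTYPE (0) and STT_FUNC (2) constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol types; values mirror ELF STT_NOTYPE and STT_FUNC. */
enum demo_sym_type { DEMO_NOTYPE = 0, DEMO_FUNC = 2 };

struct demo_sym {
    const char *name;
    uint64_t addr;
    enum demo_sym_type type;
};

/*
 * Given two symbols at the same address, prefer the one that is not a
 * bare NOTYPE label. On a tie, keep the first argument.
 */
static const struct demo_sym *demo_prefer_typed(const struct demo_sym *a,
                                                const struct demo_sym *b)
{
    if (a->type == DEMO_NOTYPE && b->type != DEMO_NOTYPE)
        return b;
    return a;
}
```

Applied to the readelf output quoted in the commit, the FUNC symbol __do_softirq wins over the NOTYPE label __softirqentry_text_start regardless of argument order.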
Ian Rogers	368781025a	perf symbol-elf: Avoid a weak cxx_demangle_sym function cxx_demangle_sym is weak in case demangle-cxx.c replaces the definition in symbol-elf.c. When demangle-cxx.c is built HAVE_CXA_DEMANGLE_SUPPORT is defined, as such the define can be used to avoid a weak symbol. As weak symbols are outside of the C standard their use can lead to strange behaviors, in particular with LTO, as well as causing issues to be hidden at link time. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119031754.1021858-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
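Note on 368781025a: the general pattern of replacing a weak definition with a preprocessor guard can be sketched as below. Only the HAVE_CXA_DEMANGLE_SUPPORT macro name comes from the commit message; `demo_demangle_sym` and its signature are hypothetical and are not claimed to match cxx_demangle_sym.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef HAVE_CXA_DEMANGLE_SUPPORT
/* The real implementation is built in another object file. */
char *demo_demangle_sym(const char *str, bool params, bool modifiers);
#else
/*
 * Fallback stub, compiled only when the real implementation is not built.
 * Exactly one definition exists in either configuration, so no weak
 * symbol is needed and a missing definition shows up at link time.
 */
static inline char *demo_demangle_sym(const char *str, bool params,
                                      bool modifiers)
{
    (void)str;
    (void)params;
    (void)modifiers;
    return NULL;
}
#endif
```

The design choice is that the build configuration, not the linker, decides which definition is used.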
Namhyung Kim	4f90ed0ae3	perf trace: Fix unaligned access for augmented args Some compiler versions reported unaligned accesses in perf trace when the undefined-behavior sanitizer is on. I found that it uses raw data in the sample directly and assumes it's properly aligned. Unlike other sample fields, the raw data is not 8-byte aligned because there's a size field (u32) before the actual data. So I added a static buffer in syscall__augmented_args() and return it instead. This is not ideal but should work well as perf trace is single-threaded. A better approach would be aligning the raw data by adding 4 bytes of data before the augmented args, but I'm afraid it'd break backward compatibility. Committer testing: To build with the undefined behaviour sanitizer: $ make CC=clang EXTRA_CFLAGS=-fsanitize=undefined -C tools/perf Checking if the resulting binary is instrumented: root@number:~# nm ~/bin/perf \| grep ubsan \| wc -l 113 root@number:~# nm ~/bin/perf \| grep ubsan \| tail -5 000000000043d5b0 t _ZN7__ubsanL19UBsanOnDeadlySignalEiPvS0_ 000000000043ce50 T _ZNK7__ubsan5Value12getSIntValueEv 000000000043cf40 T _ZNK7__ubsan5Value12getUIntValueEv 000000000043d140 T _ZNK7__ubsan5Value13getFloatValueEv 000000000043cfd0 T _ZNK7__ubsan5Value19getPositiveIntValueEv root@number:~# Now running something that will access timespec, as reported in the Closes URL: root@number:~# perf trace --max-events=1 -e nano sleep 1.1 trace/beauty/timespec.c:10:64: runtime error: member access within misaligned address 0x7fc583cfb2a4 for type 'struct augmented_arg', which requires 8 byte alignment 0x7fc583cfb2a4: note: pointer points here 99 99 11 00 10 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 e1 f5 05 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior trace/beauty/timespec.c:10:64 <SNIP> As Namhyung said we need to make the raw_data 64-bit aligned, probably we need to add a PERF_SAMPLE_ALIGNED_RAW with a 64-bit raw_size instead of the current u32 done at kernel/events/core.c, perf_output_sample(), that perf_output_put(handle, raw->size) where raw->size is a u32 and then the raw_data is always 64-bit unaligned... After the patch: root@number:~# perf trace -e nano sleep 1.1 0.000 (1100.064 ms): sleep/1984224 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 100000001 }, rmtp: 0x7fff5b3fe970) = 0 root@number:~# Closes: https://lore.kernel.org/r/Z2STgyD1p456Qqhg@google.com Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250102201248.790841-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
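Note on 4f90ed0ae3: the static-buffer workaround can be sketched as follows. This is an illustration under assumptions, not the code in syscall__augmented_args(); `demo_aligned_copy` and the 4096-byte buffer size are hypothetical. The idea is to memcpy the payload, which sits 4 bytes past an 8-byte boundary, into storage that is aligned for 64-bit access before any struct member is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a possibly misaligned payload into a static buffer declared as
 * uint64_t, so it carries the alignment of that type. Returns NULL if
 * the payload does not fit. Like the approach in the commit, this is
 * only safe for single-threaded callers, and the result is overwritten
 * by the next call.
 */
static void *demo_aligned_copy(const void *raw, size_t len)
{
    static uint64_t buf[512]; /* assumed capacity: 4096 bytes */

    if (len > sizeof(buf))
        return NULL;
    memcpy(buf, raw, len);
    return buf;
}
```

memcpy makes no alignment assumption about its source, so the copy itself is well defined even when `raw` is misaligned.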
James Clark	0ba2022410	perf test: Mark remaining probe tests as exclusive Probes are global and other probe tests are already exclusive. These two tests can throw warnings when run at the same time so mark them as exclusive too: $ perf test -vvv 81 79 79: perftool-testsuite_probe: --- start --- test child forked, pid 46419 ../common/init.sh: line 137: /sys/kernel/debug/tracing/uprobe_events: Device or resource busy Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20250107165933.292225-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	3cc550f5bb	perf tools: Remove dependency on libaudit All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is no longer needed. With the removal of the flag, the related GENERIC_SYSCALL_TABLE can also be removed. libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT was not defined, so libaudit is also no longer needed for any architecture. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	00d1bfae1b	perf tools s390: Use generic syscall table scripts Use the generic scripts to generate headers from the syscall table instead of the custom ones for s390. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-15-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:59:42 -03:00
Charlie Jenkins	4c02c7e0a2	perf tools powerpc: Use generic syscall table scripts Use the generic scripts to generate headers from the syscall table instead of the custom ones for powerpc. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-14-7543b5293098@rivosinc.com Link: https://lore.kernel.org/lkml/20250110100505.78d81450@canb.auug.org.au [ Stephen Rothwell noticed on linux-next that the powerpc build for perf was broken and ...] Link: https://lore.kernel.org/lkml/20250109-perf_powerpc_spu-v1-1-c097fc43737e@rivosinc.com [ ... Charlie fixed it up and asked for it to be squashed to avoid breaking bisection. ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-10 10:57:01 -03:00
Charlie Jenkins	619ffe6694	perf tools mips: Use generic syscall scripts Use the generic scripts to generate headers from the syscall table for mips. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-13-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:56:20 -03:00
Charlie Jenkins	fa70857a27	perf tools loongarch: Use syscall table loongarch uses a syscall table, use that in perf instead of using unistd.h. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-12-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:56:00 -03:00
Charlie Jenkins	cb8197db8c	perf tools arm64: Use syscall table arm64 uses a syscall table, use that in perf instead of using unistd.h. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-11-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:55:36 -03:00
Charlie Jenkins	02f2d58f23	perf tools parisc: Support syscall header parisc uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-10-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:55:13 -03:00
Charlie Jenkins	bb4f842891	perf tools alpha: Support syscall header alpha uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-9-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:54:54 -03:00
Charlie Jenkins	a874d1f6f1	perf tools x86: Use generic syscall scripts Use the generic scripts to generate headers from the syscall table for both 32- and 64-bit x86. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-8-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:53:49 -03:00
Charlie Jenkins	24f122dc09	perf tools xtensa: Support syscall header xtensa uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-7-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:53:28 -03:00
Charlie Jenkins	1f44829e5e	perf tools sparc: Support syscall headers sparc uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-6-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:52:51 -03:00
Charlie Jenkins	430a6dfe41	perf tools sh: Support syscall headers sh uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-5-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:52:28 -03:00
Charlie Jenkins	9605665a64	perf tools arm: Support syscall headers arm uses a syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-4-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:51:30 -03:00
Charlie Jenkins	c68825eed9	perf tools csky: Support generic syscall headers csky uses the generic syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-3-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:51:18 -03:00
Charlie Jenkins	26db672256	perf tools arc: Support generic syscall headers Arc uses the generic syscall table, use that in perf instead of requiring libaudit. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-2-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:50:56 -03:00
Charlie Jenkins	4a73aff8c5	perf tools: Create generic syscall table support Currently each architecture in perf independently generates syscall headers. Adapt the work that has gone into unifying syscall header implementations in the kernel to work with perf tools. Introduce this framework with riscv at first. riscv previously relied on libaudit, but with this change, perf tools for riscv no longer needs this external dependency. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-1-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-09 12:49:49 -03:00
Ian Rogers	6bfb4c571b	perf test cpumap: Avoid use-after-free following merge Previously cpu maps in the test weren't modified by calls to the cpu map API, however, perf_cpu_map__merge was modified so the left hand argument was updated. In the test this meant the maps copy of the "two" map was put/deleted in the merge meaning when accessed via maps, the pointer was stale and to the put/deleted memory. To fix this add an extra layer of indirection to the maps array, so the updated value of two is accessed. Fixes: `a9d2217556` ("libperf cpumap: Refactor perf_cpu_map__merge()") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108051511.1720369-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:40:05 -03:00
Dmitry Vyukov	9c64c7c658	perf llvm-add2line: Remove unused symbol_conf.h include Remove unused symbol_conf.h include. First, it's just unused. Second, it's problematic since this is a C++ file, and most perf headers don't compile as C++. So if any other includes are added to symbol_conf.h, it may break the build. Signed-off-by: Dmitriy Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250108070248.237943-1-dvyukov@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:38:32 -03:00
James Clark	58f4f294b3	perf test trace_btf_general: Fix shellcheck warning Shellcheck versions < v0.7.2 can't follow this path so add the helper to fix the following warning: tests/shell/trace_btf_general.sh line 8: . "$(dirname $0)"/lib/probe.sh ^--------------------------^ SC1090: Can't follow non-constant source. Use a directive to specify location. Fixes: `0255338d69` ("perf trace: Add tests for BTF general augmentation") Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250106164300.734202-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:36:04 -03:00
Arnaldo Carvalho de Melo	64a7617efd	perf namespaces: Fixup the nsinfo__in_pidns() return type, its bool When adding support for refcount checking a cut'n'paste made this function, that is just an accessor to a bool member of 'struct nsinfo', return a pid_t, when that member is a boolean, fix it. Fixes: `bcaf0a9785` ("perf namespaces: Add functions to access nsinfo") Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:31:06 -03:00
Arnaldo Carvalho de Melo	74833e37df	perf jitdump: Fixup in_pidns member when java agent and 'perf record' are not in the same pidns When running 'perf record' outside a container and the java agent inside a container the jit_repipe_code_load() and friends will emit PERF_RECORD_MMAP2 entries for the jitdump records and will check if we need to fixup the pid/tid: nspid = jr->load.pid; pid = jr_entry_pid(jd, jr); tid = jr_entry_tid(jd, jr); The jr_entry_pid() function looks if we're in the same pidns: static pid_t jr_entry_pid(struct jit_buf_desc *jd, union jr_entry *jr) { if (jd->nsi && nsinfo__in_pidns(jd->nsi)) return nsinfo__tgid(jd->nsi); return jr->load.pid; } But since the thread, populated from perf.data records, try to figure out if in the same pidns by actually trying, on the system where 'perf inject' is running to open a procfs file (a bug that remains to be fixed), assuming that if it is not possible that is because that thread terminated and thus we can't get its namespace info and tolerates nsinfo__init() failing, noting only that that namespace can't be entered, so don't even try. But we can kinda get at least that info (thread->nsinfo->in_pidns) from the data in the perf.data file, namely the pid and tid in the PERF_RECORD_MMAP2 for the jit-<PID>.dump file generated from the java agent, if the PERF_RECORD_MMAP2->pid is the same as what is in the jitdump file, then we're in the same namespace, otherwise we need to use the PERF_RECORD_MMAP2->pid. This all has to be revamped for this jitdump + running perf from outside, as the meaning of in_pidns is being abused, the initialization of nsinfo->pid with the value coming from the PERF_RECORD_MMAP2 data is wrong as it is the pid _outside_ the container since perf was running there. 
The hack in this patch at least produces the expected result in this scenario by following the assumptions in the current codebase for finding maps and for generating the PERF_RECORD_MMAP2 for the ELF files synthesized from the jitdump records in jit_repipe_code_load(), etc. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:30:50 -03:00
Arnaldo Carvalho de Melo	9c6a585d25	perf namespaces: Introduce nsinfo__set_in_pidns() When we're processing a perf.data file we will, for every thread in that file do a machine__findnew_thread(machine, pid, tid) that when that pid is seen for the first time will create a 'struct thread' representing it. That in turn will call nsinfo__new() -> nsinfo__init() and there it will assume we're running live, which is wrong and will need to be addressed in a followup patch. The nsinfo__new() assumes that if we can't access that thread it has already finished and will ignore the -1 return from nsinfo__init(), just taking notes to avoid trying to enter in that namespace, since it isn't there anymore, a race. When doing this from 'perf inject', though, we can fill in parts of that nsinfo from what we get from the PERF_RECORD_MMAP2 (pid, tid) and in the jitdump file name, that has the form of jit-<PID>.dump. So if the pid in the jitdump file name is not the one in the PERF_RECORD_MMAP2, we can assume that it's the pid of the process _inside_ the namespace, and that perf was running outside that namespace. This will be done in the following patch. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:30:24 -03:00
Arnaldo Carvalho de Melo	f523347ba6	perf jitdump: Accept jitdump mmaps emitted from inside containers When the java agent is running inside a container it will emit mmaps with the format: ⬢ [acme@toolbox a]$ perf report -D \| grep PERF_RECORD_MMAP \| grep \.dump 0 0x15c400 [0x90]: PERF_RECORD_MMAP2 3308868/3308868: [0x7fb8de6cb000(0x1000) @ 0 08:14 3222905945 0]: r-xp /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jit-1.dump ⬢ [acme@toolbox a]$ Since perf is running from outside the container it sees the pid 3308868 in PERF_RECORD_MMAP2, while the agent saw the pid of the profiled app inside the container, 1. The previous validation was: if (pid && pid2 != nsinfo__nstgid(nsi)) pid2 at this point is '1' (/jit-1.dump), so it considers this as a malformed jitdump mmap and refuses to process it. The test ends up as: if (3308868 && 1 != 3308868) which is true and the jitdump is not processed. Since 1 in the container namespace is really 3308868 in the namespace that perf is running, consider this a valid mmap. We need to make perf realize this and behave accordingly, for now checking instead: if (pid && pid2 && pid != nsinfo__nstgid(nsi)) Translating to: if (3308868 && 1 && 3308868 != 3308868) Will make the jitdump mmap to be considered valid and processed. The jitdump is described in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jitdump-specification.txt Now we end up with the expected flurry of MMAPs, one per jitted function transformed into a little ELF file that should then be processable by the other perf features, like code annotation: [acme@toolbox a]$ echo $JITDUMPDIR /tmp/.debug/jit [acme@toolbox a]$ First use 'perf inject': ⬢ [acme@toolbox a]$ time perf inject -i perf.data -o acme-perf-injected.data -j Then look at the PERF_RECORD_MMAP events in the result file, that went thru the JIT map file: ⬢ [acme@toolbox a]$ ls -la /tmp/.map -rw-r--r--. 
1 acme acme 2989559 Nov 27 16:11 /tmp/perf-3308868.map [acme@toolbox a]$ It is a symbol table: ⬢ [acme@toolbox a]$ head /tmp/.map 0x00007fb8bda5c1a0 0x00000000000000d0 java.lang.String java.lang.module.ModuleDescriptor.name() 0x00007fb8bda5c4a0 0x0000000000000178 int java.lang.StringLatin1.hashCode(byte[]) 0x00007fb8bda5c9a0 0x00000000000000d0 java.lang.String org.springframework.boot.context.config.ConfigDataLocation.getValue() 0x00007fb8bda5cca0 0x00000000000000d0 java.lang.module.ModuleDescriptor java.lang.module.ModuleReference.descriptor() 0x00007fb8bda5cfa0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getKey() 0x00007fb8bda5d2a0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getValue() 0x00007fb8bda5d5a0 0x0000000000000218 boolean jdk.internal.misc.Unsafe.compareAndSetReference(java.lang.Object, long, java.lang.Object, java.lang.Object) 0x00007fb8bda5d9a0 0x00000000000001f0 boolean jdk.internal.misc.Unsafe.compareAndSetLong(java.lang.Object, long, long, long) 0x00007fb8bda5dda0 0x00000000000001f8 void java.lang.System.arraycopy(java.lang.Object, int, java.lang.Object, int, int) 0x00007fb8bda5e1a0 0x00000000000001e8 int java.lang.Object.hashCode() ⬢ [acme@toolbox a]$ As specified in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt This was collected from inside the container, so came as /tmp/perf-1.map. To make perf, running outside the container, use it we need to copy it to /tmp/perf-3308868.map. This is another logic that has to be added to perf to work on this scenario of running outside the container but processing things created by the java agent running inside the container. 
With all this in place we get to: ⬢ [acme@toolbox a]$ perf report -D -i acme-perf-injected.data \| \ grep PERF_RECORD_MMAP > /tmp/acme-perf-injected.data.mmaps ; \ wc -l /tmp/acme-perf-injected.data.mmaps 44182 /tmp/acme-perf-injected.data.mmaps ⬢ [acme@toolbox a]$ tail /tmp/acme-perf-injected.data.mmaps 1030266786574466 0x7bc9e0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0ceb1c0(0x8d0) @ 0x80 00:2c 238715 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so 1030266795288774 0x7bca78 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cecc00(0x7e8) @ 0x80 00:2c 238716 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so 1030266895967339 0x7bcb10 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cee500(0x3328) @ 0x80 00:2c 238717 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43991.so 1030266915748306 0x7bcba8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0aae0a0(0x138) @ 0x80 00:2c 238718 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43992.so 1030267185851220 0x7bcc40 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cf61e0(0x3b50) @ 0x80 00:2c 238719 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43993.so 1030267231364524 0x7bccd8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cfea80(0x14a0) @ 0x80 00:2c 238720 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43994.so 1030267425498831 0x7bcd70 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c054b4a0(0x338) @ 0x80 00:2c 238721 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43995.so 1030267506147888 0x7bce08 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0a995c0(0x1e8) @ 0x80 00:2c 238722 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43996.so 1030268112586116 0x7bcea0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02520(0x258) @ 0x80 00:2c 238723 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43997.so 1030269435398150 0x7bcf38 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02dc0(0x278) @ 0x80 00:2c 238724 1]: --xs 
/tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43998.so ⬢ [acme@toolbox a]$ And if we look at those tiny ELF files generated by the jitdump code used by 'perf inject' we see: ⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=790591db95a77d644657dfe5058658b200000000, with debug_info, not stripped ⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=762f932acbee53a22638bf4c2b86780200000000, with debug_info, not stripped ⬢ [acme@toolbox a]$ ⬢ [acme@toolbox a]$ ls -la /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so -rw-r--r--. 1 acme acme 9432 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so -rw-r--r--. 
1 acme acme 7504 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so ⬢ [acme@toolbox a]$ And: ⬢ [acme@toolbox a]$ objdump -dS /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so \| head -20 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: file format elf64-x86-64 Disassembly of section .text: 0000000000000080 <Lredacted/REDACTED/REDACTED/logging/RedactedRedacted;Redacted(Lredacted/REDACTED/REDACTED/redactedRedacted/Redacted;)V>: 80: 44 8b 56 08 mov 0x8(%rsi),%r10d 84: 49 c1 e2 03 shl $0x3,%r10 88: 49 3b c2 cmp %r10,%rax 8b: 0f 85 6f 15 83 fc jne fffffffffc831600 <Lredacted/REDACTED/REDACTED/redacted/RedactedRedactedRedacted;Redacted(Lredacted/Redacted/Redacted/redactedRedacted/Redacted;)V+0xfffffffffc831580> 91: 66 66 90 data16 xchg %ax,%ax 94: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 9b: 00 9c: 66 66 66 90 data16 data16 xchg %ax,%ax a0: 89 84 24 00 c0 fe ff mov %eax,-0x14000(%rsp) a7: 55 push %rbp a8: 48 8b ec mov %rsp,%rbp ab: 48 83 ec 40 sub $0x40,%rsp af: 48 89 34 24 mov %rsi,(%rsp) ⬢ [acme@toolbox a]$ The thing now being investigated is why we can't annotate anything here, maybe that JITDUMPDIR is getting in the way: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$ In the tests I performed while merging this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d518ac7be6223811ab947897273b1bbef846180 It works, but then there was no JITDUMPDIR involved: /home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4191.so ⬢ [acme@toolbox perf-tools-next]$ perf report --call-graph none --no-child -i perf-injected.data \| grep jitted- \| head 1.36% java 
jitted-3912413-54.so [.] Interpreter 0.30% C1 CompilerThre jitted-3912413-1.so [.] flush_icache_stub 0.18% java jitted-3912413-4184.so [.] org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean) 0.18% java jitted-3912413-4177.so [.] org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int) 0.13% java jitted-3912413-3845.so [.] java.text.DecimalFormat.subformatNumber(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int) 0.11% java jitted-3912413-4191.so [.] org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode) 0.09% java jitted-3912413-2418.so [.] org.apache.fop.fo.XMLWhiteSpaceHandler.handleWhiteSpace() 0.08% Reference Handl jitted-3912413-54.so [.] Interpreter 0.08% java jitted-3912413-3326.so [.] org.apache.xmlgraphics.fonts.Glyphs.stringToGlyph(java.lang.String) 0.08% java jitted-3912413-3953.so [.] org.apache.fop.layoutmgr.BreakingAlgorithm.considerLegalBreak(org.apache.fop.layoutmgr.KnuthElement, int) ⬢ [acme@toolbox perf-tools-next]$ And then: ⬢ [acme@toolbox perf-tools-next]$ perf annotate --stdio2 -i perf-injected.data 'org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)' \| head -20 Samples: 8 of event 'cpu_atom/cycles/Pu', 4000 Hz, Event count (approx.): 8112794, [percent: local period] org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)() /home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4177.so Percent 0x80 <org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)>: nop movl 0x8(%rsi),%r10d cmpl 0x8(%rax),%r10d → jne 0 movl %eax,-0x14000(%rsp) pushq %rbp subq $0xb0,%rsp nop cmpl $0x3,0x20(%r15) ↓ jne 7037 2e: movl %ecx,0x28(%rsp) movq %rdx,%rbp movl 0x64(%rdx),%ebx cmpb $0x0,0x38(%r15) ↓ jne 3a44 movq 
%rsi,0x30(%rsp) 48: movq 0x30(%rsp),%r10 ⬢ [acme@toolbox perf-tools-next]$ No source code nor line numbers, that I saw in another build of perf for RHEL9, for the same workload described in the cset above (a publicly available java benchmark), so something to investigate on perf upstream running on fedora, maybe some quirk with the jdk used when building perf for RHEL 9 and for Fedora 40. A related patch that should have made this all work is: "perf inject jit: Add namespaces support" https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67dec926931448d688efb5fe34f7b5a22470fc0a But we still need to polish this some more, maybe there are differences in the agent used in NodeJS with --perf-prof and the jvmti one we're using. Hopefully describing all the steps while we investigate this case will help us improve perf support for profiling JITed environments running in containers while profiling from inside and outside it. Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/r/20241206204828.507527-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:29:51 -03:00
Christophe Leroy	7a93786c30	perf machine: Don't ignore _etext when not a text symbol Depending on how vmlinux.lds is written, _etext might be the very first data symbol instead of the very last text symbol. Don't require it to be a text symbol, accept any symbol type. Committer notes: See the first Link for further discussion, but it all boils down to this: --- # grep -e _stext -e _etext -e _edata /proc/kallsyms c0000000 T _stext c08b8000 D _etext So there is no _edata and _etext is not text $ ppc-linux-objdump -x vmlinux \| grep -e _stext -e _etext -e _edata c0000000 g .head.text 00000000 _stext c08b8000 g .rodata 00000000 _etext c1378000 g .sbss 00000000 _edata --- Fixes: `ed9adb2035` ("perf machine: Read also the end of the kernel") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/b3ee1994d95257cb7f2de037c5030ba7d1bed404.1736327613.git.christophe.leroy@csgroup.eu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Christophe Leroy	dae29277fd	perf maps: Fix display of kernel symbols Since commit `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses"), perf no longer displays kernel symbols on powerpc, although it still detects them as kernel addresses. # Overhead Command Shared Object Symbol # ........ .......... ............. ...................................... # 80.49% Coeur main [unknown] [k] 0xc005f0f8 3.91% Coeur main gau [.] engine_loop.constprop.0.isra.0 1.72% Coeur main [unknown] [k] 0xc005f11c 1.09% Coeur main [unknown] [k] 0xc01f82c8 0.44% Coeur main libc.so.6 [.] epoll_wait 0.38% Coeur main [unknown] [k] 0xc0011718 0.36% Coeur main [unknown] [k] 0xc01f45c0 This is because function maps__find_next_entry() now returns current entry instead of next entry, leading to kernel map end address getting mis-configured with its own start address instead of the start address of the following map. Fix it by really taking the next entry, also make sure that entry follows current one by making sure entries are sorted. Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/2ea4501209d5363bac71a6757fe91c0747558a42.1736329923.git.christophe.leroy@csgroup.eu Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Namhyung Kim	c738a34417	perf test: Update ftrace test to use --graph-opts I found it failed on machines with limited memory because 16M byte per-cpu buffer is too big. The reason it added the option is not to miss tracing data. Thus we can limit the data size by reducing the function call depth instead of increasing the buffer size to handle the whole data. As it used the same option in the test_ftrace_trace() and it was able to find the sleep function, it should work with the profile subcommand. Get rid of other grep commands which might be affected by the depth change. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20250107224352.1128669-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Namhyung Kim	e5f2024cb9	perf ftrace profile: Add --graph-opts option Like trace subcommand, it should be able to pass some options to control the tracing behavior for the function graph tracer. But some options are limited in order to maintain the internal behavior. For example, it can limit the function call depth like below: # perf ftrace profile --graph-opts depth=5 -- myprog Committer testing: root@number:~# perf ftrace profile --graph-opts thresh=1000 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001419.301 500709.650 1000032.000 2 x64_sys_call 1000032.000 1000032.000 1000032.000 1 __x64_sys_clock_nanosleep 1000032.000 1000032.000 1000032.000 1 common_nsleep 1000031.000 1000031.000 1000031.000 1 do_nanosleep 1000031.000 1000031.000 1000031.000 1 hrtimer_nanosleep 1000024.000 1000024.000 1000024.000 1 schedule 1387.208 1387.208 1387.208 1 __x64_sys_execve 1386.691 1386.691 1386.691 1 do_execveat_common.isra.0 1334.170 1334.170 1334.170 1 bprm_execve 1258.413 1258.413 1258.413 1 load_elf_binary 1123.068 1123.068 1123.068 1 begin_new_exec 1113.550 1113.550 1113.550 1 mmput 1109.237 1109.237 1109.237 1 exit_mmap root@number:~# perf ftrace profile --graph-opts thresh=1200 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001448.204 500724.102 1000018.000 2 x64_sys_call 1000017.000 1000017.000 1000017.000 1 __x64_sys_clock_nanosleep 1000017.000 1000017.000 1000017.000 1 common_nsleep 1000017.000 1000017.000 1000017.000 1 hrtimer_nanosleep 1000016.000 1000016.000 1000016.000 1 do_nanosleep 1000012.000 1000012.000 1000012.000 1 schedule 1430.112 1430.112 1430.112 1 __x64_sys_execve 1429.581 1429.581 1429.581 1 do_execveat_common.isra.0 1376.289 1376.289 1376.289 1 bprm_execve 1301.743 1301.743 1301.743 1 load_elf_binary root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers 
<irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Namhyung Kim	86a12b92a9	perf ftrace: Display latency statistics at the end Sometimes users also want to see average latency as well as histogram. Display latency statistics like avg, max, min at the end. $ sudo ./perf ftrace latency -ab -T synchronize_rcu -- ... # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 0 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - 2 ms \| 0 \| \| 2 - 4 ms \| 0 \| \| 4 - 8 ms \| 0 \| \| 8 - 16 ms \| 1 \| ##### \| 16 - 32 ms \| 7 \| ######################################## \| 32 - 64 ms \| 0 \| \| 64 - 128 ms \| 0 \| \| 128 - 256 ms \| 0 \| \| 256 - 512 ms \| 0 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 0 \| \| # statistics (in usec) total time: 171832 avg time: 21479 max time: 30906 min time: 15869 count: 8 Committer testing: root@number:~# perf ftrace latency -nab --bucket-range 100 --max-latency 512 -T switch_mm_irqs_off sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 100 ns \| 314 \| ## \| 100 - 200 ns \| 1843 \| ############# \| 200 - 300 ns \| 1390 \| ########## \| 300 - 400 ns \| 844 \| ###### \| 400 - 500 ns \| 480 \| ### \| 500 - 512 ns \| 315 \| ## \| 512 - ... ns \| 16 \| \| # statistics (in nsec) total time: 2448936 avg time: 387 max time: 3285 min time: 82 count: 6328 root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Ian Rogers	05efa0ab01	perf evsel: Improve the evsel__open_strerror() for EBUSY The existing EBUSY strerror message is: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg \| grep -i perf" may provide additional information. The dmesg won't be useful. What is more useful is knowing what processes are potentially using the PMU, which some procfs scanning can reveal. When parallel testing tests/shell/stat_all_pmu.sh this yields: Testing intel_bts// Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 2585882 perf list 2585902 perf list -j -o /tmp/__perf_test.list_output.json.KF9MY 2585904 perf list `2585911` perf record -e task-clock --filter period > 1 -o /dev/null --quiet true 2585912 perf list 2585915 perf list 2586042 /tmp/perf/perf record -asdg -e cpu-clock -o /tmp/perftool-testsuite_report.dIF/perf_report/perf.data -- sleep 2 2589078 perf record -g -e task-clock:u -o - perf test -w noploop 2589148 /tmp/perf/perf record --control=fifo:control,ack -e cpu-clock -m 1 sleep 10 2589379 perf --buildid-dir /tmp/perf.debug.Umx record --buildid-all -o /tmp/perf.data.YBm /tmp/perf.ex.MD5.ZQW 2589568 perf record -o /tmp/__perf_test.program.mtcZH/perf.data --branch-filter any,save_type,u -- perf test -w brstack 2589649 perf record --per-thread -o /tmp/__perf_test.perf.data.5d3dc perf test -w thloop 2589898 perf record -o /tmp/perf-test-script.BX2b27Dcnj/pp-perf.data --sample-cpu uname Which gets a little closer to finding the issue. Committer testing: root@number:~# root@number:~# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:~# Before: root@number:~# perf stat -e intel_bts// & [1] 197954 root@number:~# perf test "perf all PMU test" 124: perf all PMU test : FAILED! 
root@number:~# perf test -v "perf all PMU test" \|& tail Testing i915/vecs0-busy/ Testing i915/vecs0-sema/ Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg \| grep -i perf" may provide additional information. ---- end(-1) ---- 124: perf all PMU test : FAILED! root@number:~# After: root@number:~# perf stat -e intel_bts// & [1] 200195 root@number:~# perf test "perf all PMU test" 123: perf all PMU test : FAILED! root@number:~# perf test -v "perf all PMU test" \|& tail Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 200195 perf stat -e intel_bts// 2319766 /root/bin/perf top --stdio ---- end(-1) ---- 123: perf all PMU test : FAILED! root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ze Gao <zegao2021@gmail.com> Change-Id: Ie1ed8688286c44e8f44a35e98fed8be3e2a344df Link: https://lore.kernel.org/r/20241106003007.2112584-1-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Arnaldo Carvalho de Melo	d52af4b8c6	perf tests shell task_analyzer: Run this test exclusively When running in the now default parallel mode this test has been frequently failing, while when running exclusively, on a quiet system, it passes. Since its expectations were established when serial testing was the norm, mark it as exclusive to get this kind of result: root@x1:~# perf test 106 106: perf script task-analyzer tests : Ok root@x1:~# set -o vi root@x1:~# perf stat --null --repeat 10 perf test 106 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok 106: perf script task-analyzer tests : Ok Performance counter stats for 'perf test 106' (10 runs): 4.8872 +- 0.0179 seconds time elapsed ( +- 0.37% ) root@x1:~# Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Charlie Jenkins	0f9ad973b0	perf tests code-reading: Handle change in objdump output from binutils >= 2.41 on riscv After binutils commit e43d876, which was first included in binutils 2.41, riscv no longer supports dumping in the middle of instructions. Increase the objdump window by 2 bytes to ensure that any instruction that sits on the boundary of the specified stop-address is not cut in half. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241219-perf_fix_riscv_obj_reading-v3-1-a7d644dcfa50@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:20:42 -03:00
Arnaldo Carvalho de Melo	058b38ccd2	perf top: Don't complain about lack of vmlinux when not resolving some kernel samples Recently we got a case where a kernel sample wasn't being resolved due to a bug that was not setting the end address on kernel functions implemented in assembly (see Link: tag), and then those were not being found by machine__resolve() -> map__find_symbol(). So we ended up with: # perf top --stdio PerfTop: 0 irqs/s kernel: 0% exact: 0% lost: 0/0 drop: 0/0 [cycles/P] ----------------------------------------------------------------------- Warning: A vmlinux file was not found. Kernel samples will not be resolved. ^Z [1]+ Stopped perf top --stdio # But then resolving all other kernel symbols. So just fixup the logic to only print that warning when there are no symbols in the kernel map. Fixes: `d88205db9c` ("perf dso: Add dso__has_symbols() method") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/Z3buKhcCsZi3_aGb@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2025-01-08 17:18:31 -03:00
James Clark	ed60738a9b	perf stat: Document and clarify outstate members Not all of these are "state" so separate them into two sections. Rename and document to make all clearer. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-6-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:34:52 -03:00
James Clark	dd566687ef	perf stat: Document and simplify interval timestamps Rename 'prefix' to 'timestamp' because that's all it does, except in iostat mode where it's slightly overloaded, but still includes a timestamp. This reveals a problem with iostat and JSON mode so document this. Make it more explicit that these are printed in interval mode by changing 'if (prefix)' to 'if (interval)' which reveals an unnecessary 'else if (... && !interval)' which can be removed. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-5-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:34:21 -03:00
James Clark	d226f434fb	perf stat: Remove empty new_line_metric function Despite the name new_line_metric doesn't make a new line, it actually does nothing. Change it to NULL to avoid confusion. Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-4-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:33:29 -03:00
James Clark	9f1df75509	perf stat: Also hide metric-units from JSON when event didn't run We decided to hide NULL metric-units rather than showing it as "(null)" when a dependent event for a metric doesn't exist. But on hybrid systems if the process doesn't hit a PMU you get an empty string metric unit instead. To make it consistent change all empty strings to NULL. Note that metric-threshold is already hidden in this case without this change. Where a process only runs on cpu_core and never hits cpu_atom: Before: $ perf stat -j -- true ... {"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00, "metric-value" : "0.000000", "metric-unit" : ""} {"counter-value" : "6326.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 293786, "pcnt-running" : 100.00, "metric-value" : "3.553394", "metric-unit" : "of all branches", "metric-threshold" : "good"} ... After: ... {"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00} {"counter-value" : "5778.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 282240, "pcnt-running" : 100.00, "metric-value" : "3.226797", "metric-unit" : "of all branches", "metric-threshold" : "good"} ... 
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-3-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:30:21 -03:00
James Clark	967364894e	perf stat: Fix trailing comma when there is no metric unit Now that printing metric-value and metric-unit is optional, print_running_json() shouldn't add the comma in case it becomes trailing. Replace all manual JSON comma stuff with a json_out() function that uses the existing os->first tracking and auto inserts a comma if it's needed. Update the test to handle that two of the fields can be missing. This fixes the following test failure on Cortex A57 where the branch misses metric is missing a required event: $ perf test -vvv "json output" 106: perf stat JSON output linter: --- start --- test child forked, pid 665682 Checking json output: no args Test failed for input: {"counter-value" : "3112.000000", "unit" : "", "event" : "armv8_pmuv3_1/branch-misses/", "event-runtime" : 20699340, "pcnt-running" : 100.00, } ... json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 12 column 144 (char 2109) ---- end(-1) ---- 106: perf stat JSON output linter : FAILED! Fixes: `e1cc918b6c` ("perf stat: Drop metric-unit if unit is NULL") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241112160048.951213-2-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:20:43 -03:00
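The comma handling described in the commit above, where a single output helper tracks whether it is emitting the first field and inserts the separator itself, can be sketched in Python. This is an illustration of the idea only: the class `JsonLineWriter` and the function `format_counter` are hypothetical names, and the real change is a C function `json_out()` that uses the existing `os->first` flag.

```python
import json


class JsonLineWriter:
    """Builds one JSON object, inserting ", " before every field but the first."""

    def __init__(self):
        self.first = True
        self.parts = []

    def out(self, key, value):
        # The separator is decided here, so callers never emit a comma themselves.
        if not self.first:
            self.parts.append(", ")
        self.first = False
        self.parts.append(f"{json.dumps(key)} : {json.dumps(value)}")

    def finish(self):
        line = "{" + "".join(self.parts) + "}"
        self.first = True
        self.parts = []
        return line


def format_counter(event, value, metric_value=None, metric_unit=None):
    """Format one counter line; the metric fields are optional (hypothetical helper)."""
    w = JsonLineWriter()
    w.out("counter-value", value)
    w.out("event", event)
    if metric_value is not None:
        w.out("metric-value", metric_value)
    if metric_unit:
        w.out("metric-unit", metric_unit)
    return w.finish()
```

Because the separator is written before a field rather than after it, dropping any trailing optional field cannot leave a dangling comma, which is the failure the JSON linter test reported.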
Howard Chu	00c640595e	perf docs: Add documentation for --force-btf option The --force-btf option is intended for debugging purposes and is currently undocumented. Add documentation for it. Committer notes: We need a follow up patch expanding on what can be done via BTF and what isn't possible and thus needs further work to convert kernel C source code into tables that can then be associated with syscall integer args and struct members, as discussed in: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-3-howardchu95@gmail.com Link: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:19:26 -03:00
Howard Chu	0255338d69	perf trace: Add tests for BTF general augmentation Currently, we only have 'perf trace' augmentation tests for enum arguments. This patch adds tests for more general syscall arguments, such as struct pointers, strings, and buffers. These tests utilize the 'perf config' system to configure the 'perf trace' output, as suggested by Arnaldo Carvalho de Melo <acme@kernel.org>. Committer testing: root@number:~# perf test "BTF general" 109: perf trace BTF general tests : Ok root@number:~# perf test -v "BTF general" 109: perf trace BTF general tests : Ok root@number:~# perf test -vv "BTF general" 109: perf trace BTF general tests: --- start --- test child forked, pid 1410451 Checking if vmlinux BTF exists Testing perf trace's string augmentation Testing perf trace's buffer augmentation Testing perf trace's struct augmentation ---- end(0) ---- 109: perf trace BTF general tests : Ok root@number:~# It still fails sometimes, for instance when tested with: root@number:~# perf stat --null -r 10 perf test "BTF general" 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : FAILED! 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : FAILED! 109: perf trace BTF general tests : Ok 109: perf trace BTF general tests : Ok Performance counter stats for 'perf test BTF general' (10 runs): 2.148 +- 0.293 seconds time elapsed ( +- 13.63% ) root@number:~# But we can go on from here and fix things up with followup patches. 
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-26 12:18:11 -03:00
Dr. David Alan Gilbert	e5de3f9da5	perf path: Remove unused is_executable_file() is_executable_file() has been unused since 2022's commit `7391db6459` ("perf test: Refactor shell tests allowing subdirs") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241222215831.283248-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Ian Rogers	2f4847b5d6	perf values: Use evsel rather than evsel->idx An evsel idx may not be stable due to sorting, evlist removal, etc. Avoid use of the idx where the evsel itself can be used to avoid these problems. This removed 1 values array and duplicated evsel name strings. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241114230713.330701-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
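Why an index is a fragile key while the object itself is a stable one can be shown with a small Python sketch. It is not perf code: the `Evsel` class here is a hypothetical stand-in, and the dictionary merely illustrates keying values by the event object instead of by its position in a list.

```python
class Evsel:
    """Hypothetical stand-in for an event selector; only the name matters here."""

    def __init__(self, name):
        self.name = name


def values_by_index(evsels, counts):
    """Fragile: counts[i] belongs to whatever event happens to sit at position i."""
    return {evsels[i].name: counts[i] for i in range(len(evsels))}


def values_by_evsel(table):
    """Stable: the table is keyed by the event object, so list order is irrelevant."""
    return {evsel.name: count for evsel, count in table.items()}


cycles, instructions = Evsel("cycles"), Evsel("instructions")
evsels = [cycles, instructions]
counts = [10, 20]                          # positional: counts[0] is for cycles
table = {cycles: 10, instructions: 20}     # keyed by the object itself

# Re-sorting the list (or removing an entry) silently changes what each index means.
evsels.sort(key=lambda e: e.name, reverse=True)

stale = values_by_index(evsels, counts)    # now pairs counts with the wrong events
stable = values_by_evsel(table)            # unaffected by the reordering
```

After the sort, `stale` attributes the cycles count to instructions and vice versa, while `stable` still maps each event to its own value.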
Ian Rogers	2f0539fa02	perf stream: Use evsel rather than evsel->idx An evsel idx may not be stable due to sorting, evlist removal, etc. Avoid use of the idx where the evsel itself can be used to avoid these problems. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241114230713.330701-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Ian Rogers	26f45ec8f0	perf jevents: Provide better path information for broken JSON If the JSON input to jevents.py is broken it can be problematic to work out which particular JSON file is broken. When processing files, catch any exceptions that occur and re-raise them with path details added. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20241114172309.840241-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
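The pattern the jevents commit describes, catching a failure while processing a file and surfacing it again with the file's path attached, might look like the sketch below. It is not the code in jevents.py: the function name `load_events` and the choice of `RuntimeError` are assumptions made for illustration.

```python
import json


def load_events(path):
    """Parse one JSON file, naming the file in the error if parsing fails.

    Hypothetical helper; jevents.py may report the path differently.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except Exception as err:
        # Chain the original exception so the decoder's line/column detail is kept.
        raise RuntimeError(f"Exception processing {path}") from err
```

Using `raise ... from err` keeps the original `JSONDecodeError` available as `__cause__`, so the traceback shows both which file was being read and where in that file the parser gave up.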
Namhyung Kim	91a5bffa56	perf lock contention: Handle slab objects in -L/--lock-filter option This is to filter lock contention from specific slab objects only. Like in the lock symbol output, we can use '&' prefix to filter slab object names. root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 3 14.99 us 14.44 us 5.00 us ffffffff851c0940 pack_mutex (mutex) 2 2.75 us 2.56 us 1.38 us ffff98d7031fb498 &task_struct (mutex) 4 1.42 us 557 ns 355 ns ffff98d706311400 &kmalloc-cg-512 (mutex) 2 953 ns 714 ns 476 ns ffffffff851c3620 delayed_uprobe_lock (mutex) 1 929 ns 929 ns 929 ns ffff98d7031fb538 &task_struct (mutex) 3 561 ns 210 ns 187 ns ffffffff84a8b3a0 text_mutex (mutex) 1 479 ns 479 ns 479 ns ffffffff851b4cf8 tracepoint_srcu_srcu_usage (mutex) 2 320 ns 195 ns 160 ns ffffffff851cf840 pcpu_alloc_mutex (mutex) 1 212 ns 212 ns 212 ns ffff98d7031784d8 &signal_cache (mutex) 1 177 ns 177 ns 177 ns ffffffff851b4c28 tracepoint_srcu_srcu_usage (mutex) With the filter, it can show contentions from the task_struct only. 
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl -L '&task_struct' sleep 1 contended total wait max wait avg wait address symbol 2 1.97 us 1.71 us 987 ns ffff98d7032fd658 &task_struct (mutex) 1 1.20 us 1.20 us 1.20 us ffff98d7032fd6f8 &task_struct (mutex) It can work with other aggregation mode: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -ab -L '&task_struct' sleep 1 contended total wait max wait avg wait type caller 1 25.10 us 25.10 us 25.10 us mutex perf_event_exit_task+0x39 1 21.60 us 21.60 us 21.60 us mutex futex_exit_release+0x21 1 5.56 us 5.56 us 5.56 us mutex futex_exec_release+0x21 Committer testing: root@number:~# perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 1 20.80 us 20.80 us 20.80 us ffff9d417fbd65d0 (spinlock) 8 12.85 us 2.41 us 1.61 us ffff9d415eeb6a40 rq_lock (spinlock) 1 2.55 us 2.55 us 2.55 us ffff9d415f636a40 rq_lock (spinlock) 7 1.92 us 840 ns 274 ns ffff9d39c2cbc8c4 (spinlock) 1 1.23 us 1.23 us 1.23 us ffff9d415fb36a40 rq_lock (spinlock) 2 928 ns 738 ns 464 ns ffff9d39c1fa6660 &kmalloc-rnd-14-192 (rwlock) 4 788 ns 252 ns 197 ns ffffffffb8608a80 jiffies_lock (spinlock) 1 304 ns 304 ns 304 ns ffff9d39c2c979c4 (spinlock) 1 216 ns 216 ns 216 ns ffff9d3a0225c660 &kmalloc-rnd-14-192 (rwlock) 1 89 ns 89 ns 89 ns ffff9d3a0adbf3e0 &kmalloc-rnd-14-192 (rwlock) 1 61 ns 61 ns 61 ns ffff9d415f9b6a40 rq_lock (spinlock) root@number:~# uname -r 6.13.0-rc2 root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin 
<roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Namhyung Kim	0c631ef07c	perf lock contention: Resolve slab object name using BPF The bpf_get_kmem_cache() kfunc can return an address of the slab cache (kmem_cache). As it has the name of the slab cache from the iterator, we can use it to symbolize some dynamic kernel locks in a slab. Before: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 2 3.34 us 2.87 us 1.67 us ffff9d7800ad9600 (mutex) 2 2.16 us 1.93 us 1.08 us ffff9d7804b992d8 (mutex) 4 1.37 us 517 ns 343 ns ffff9d78036e6e00 (mutex) 1 1.27 us 1.27 us 1.27 us ffff9d7804b99378 (mutex) 2 845 ns 599 ns 422 ns ffffffff9e1c3620 delayed_uprobe_lock (mutex) 1 845 ns 845 ns 845 ns ffffffff9da0b280 jiffies_lock (spinlock) 2 377 ns 259 ns 188 ns ffffffff9e1cf840 pcpu_alloc_mutex (mutex) 1 305 ns 305 ns 305 ns ffffffff9e1b4cf8 tracepoint_srcu_srcu_usage (mutex) 1 295 ns 295 ns 295 ns ffffffff9e1c0940 pack_mutex (mutex) 1 232 ns 232 ns 232 ns ffff9d7804b7d8d8 (mutex) 1 180 ns 180 ns 180 ns ffffffff9e1b4c28 tracepoint_srcu_srcu_usage (mutex) 1 165 ns 165 ns 165 ns ffffffff9da8b3a0 text_mutex (mutex) After: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) 3 421 ns 164 ns 140 ns ffffffffa3a8b3a0 text_mutex (mutex) 1 409 ns 409 ns 409 ns ffffffffa41b4cf8 tracepoint_srcu_srcu_usage (mutex) 2 362 ns 239 ns 181 ns ffffffffa41cf840 pcpu_alloc_mutex (mutex) 1 220 ns 220 ns 220 ns ffff9d5e82b534d8 &signal_cache (mutex) 1 215 ns 215 ns 215 ns ffffffffa41b4c28 tracepoint_srcu_srcu_usage (mutex) Note that the name starts with '&' sign for slab 
objects to indicate that they are dynamic locks. It won't give the accurate lock or type names but it's still useful. We may later add type info to the slab cache to get the exact name of the lock in the type. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:53:08 -03:00
Namhyung Kim	e2c4dc54cd	perf lock contention: Run BPF slab cache iterator Recently the kernel got the kmem_cache iterator to traverse metadata of slab objects. This can be used to symbolize dynamic locks in a slab. The new slab_caches hash map will have the pointer of the kmem_cache as a key and save the name and an id. The id will be saved in the flags part of the lock. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-3-namhyung@kernel.org [ Added change from Namhyung addressing review from Alexei: ] Link: https://lore.kernel.org/r/Z2dVdH3o5iF-KrWj@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-23 13:52:03 -03:00
Namhyung Kim	d8cc6da406	perf lock contention: Add and use LCB_F_TYPE_MASK This is a preparation for the later change. It'll use more bits in the flags so let's rename the type part and use the mask to extract the type. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-20 17:36:06 -03:00
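Extracting the lock type from a flags word with a mask, so that the remaining bits are free for other uses, can be sketched as below. The numeric values are assumptions chosen for illustration and may not match the definitions in the perf sources; `EXTRA_SHIFT`, `lock_type` and `extra_bits` are hypothetical names.

```python
# Illustrative values only; the real definitions live in the perf sources.
LCB_F_SPIN = 1 << 0
LCB_F_READ = 1 << 1
LCB_F_WRITE = 1 << 2
LCB_F_MUTEX = 1 << 5

LCB_F_TYPE_MASK = 0xFF   # assumed: the low byte holds the lock type bits
EXTRA_SHIFT = 16         # hypothetical: where additional data would be packed


def lock_type(flags):
    """Return only the type bits, ignoring anything stored above them."""
    return flags & LCB_F_TYPE_MASK


def extra_bits(flags):
    """Return whatever was packed above the type bits (hypothetical layout)."""
    return flags >> EXTRA_SHIFT


# A flags word carrying a mutex type plus an extra value of 7 in the upper bits.
flags = (7 << EXTRA_SHIFT) | LCB_F_MUTEX
```

Comparing `flags == LCB_F_MUTEX` would fail for this word, whereas `lock_type(flags) == LCB_F_MUTEX` still holds, which is why the type has to be read through the mask once the upper bits are in use.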
Arnaldo Carvalho de Melo	efff5add20	perf script: Cache the output type Right now every time we need to figure out the type of an evsel for output purposes we do a quick sequence of ifs, but there are new cases where there is a need to do more complex iterations over multiple data structures, so allow for caching this operation on a hole of 'struct evsel'. This should really be done on the evsel->priv area that 'perf script' sets up, but more work is needed to make sure that it is allocated when we need it; right now it is only used conditionally. Add some comments so that we move this to that 'perf script' specific area when the conditions are in place for that. Acked-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z2XCi3PgstSrV0SE@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-20 17:35:54 -03:00
Ian Rogers	233157785a	perf python: Correctly throw IndexError Correctly throw IndexError for out-of-bound accesses to evlist: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0, '/tmp/perf/python') >>> import perf >>> x=perf.parse_events('cycles') >>> print(x) evlist([cycles]) >>> x[2] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: Index out of range Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-23-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
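What raising `IndexError` for an out-of-range access means for a sequence-like object can be illustrated in pure Python. The class below is a hypothetical stand-in, not the `perf` extension module, which implements its evlist in C; for simplicity the sketch also rejects negative indices.

```python
class EventList:
    """Hypothetical sequence of event names with explicit bounds checking."""

    def __init__(self, names):
        self._names = list(names)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, i):
        # Out-of-range access must raise IndexError, not return a bogus item.
        if not isinstance(i, int) or i < 0 or i >= len(self._names):
            raise IndexError("Index out of range")
        return self._names[i]

    def __repr__(self):
        return "evlist([" + ",".join(self._names) + "])"


def collect(seq):
    """Iterate a sequence that only defines __getitem__, as a for loop would."""
    return [item for item in seq]
```

The exception type matters beyond the error message: when an object provides `__getitem__` but no `__iter__`, Python iterates it by calling `__getitem__` with 0, 1, 2, ... and stops when `IndexError` is raised, so `collect(EventList(["cycles"]))` terminates only because the bounds check raises that specific exception.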
Ian Rogers	24fb6de241	perf python: Add __str__ and __repr__ functions to evsel This allows evsel to be shown in the REPL like: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0, '/tmp/perf/python') >>> import perf >>> x=perf.parse_events('cycles,data_read') >>> print(x) evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/]) >>> x[0] evsel(cycles) >>> x[1] evsel(uncore_imc_free_running_0/data_read/) >>> x[2] evsel(uncore_imc_free_running_1/data_read/) Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-22-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	3c0401a081	perf python: Add __str__ and __repr__ functions to evlist This allows the values in the evlist to be shown in the REPL like: Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.insert(0,'/tmp/perf/python') >>> import perf >>> perf.parse_events('cycles,data_read') evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/]) Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	f081defccd	perf python: Add parse_events function Add basic parse_events function that takes a string and returns an evlist. As the python evlist is embedded in a pyrf_evlist, and the evsels are embedded in pyrf_evsels, copy the parsed data into those structs and update evsel__clone to enable this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	5c10f3b446	perf build: Remove test library from python shared object With the attr.c code moved to a shell test, there is no need to link the test code into the python dso to avoid a missing reference to test_attr__open. Drop the test code from the python library. With the bench and test code removed from the python library on my x86 debian derived laptop the python library is reduced in size by 508,712 bytes or nearly 5%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-19-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	9cf133c25c	perf kwork: Make perf_kwork_add_work a callback perf_kwork_add_work is declared in builtin-kwork, whereas much kwork code is in util. To avoid needing to stub perf_kwork_add_work in python.c, add a callback to struct perf_kwork and initialize it in builtin-kwork to perf_kwork_add_work - this is the only struct perf_kwork. This removes the need for the stub in python.c. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-18-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	df487111bd	perf bench: Remove reference to cmd_inject Avoid `perf bench internals inject-build-id` referencing the cmd_inject sub-command that requires perf-bench to backward reference internals of builtins. Replace the reference to cmd_inject with a call to main. To avoid python.c needing to link with something providing main, drop the libperf-bench library from the python shared object. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	1a12ed09bc	perf lock: Move common lock contention code to new file Avoid references from util code to builtin-lock that require python stubs. Move the functions and related variables to util/lock-contention.c. Add max_stack_depth parameter to match_callstack_filter to avoid sharing a global variable. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	16ecb4316f	perf env: Move arch errno function to only use in env Move arch_syscalls__strerrno_function out of builtin-trace.c to env.c so that there isn't a util to builtin function call. This allows the python.c stub to be removed. Also, remove declaration/prototype from env.h and make static to reduce scope. The include is moved inside ifdefs to avoid, "defined but unused warnings". Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-15-irogers@google.com perf: perf python: Correctly throw IndexError Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	254a867b98	perf intel-pt: Remove stale build comment Commit `00a263902a` ("perf intel-pt: Use shared x86 insn decoder") removed the use of diff, so remove stale busybox comment. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	e7bb49e3f6	perf x86: Define arch_fetch_insn in NO_AUXTRACE builds archinsn.c containing arch_fetch_insn was only enabled with CONFIG_AUXTRACE, but this meant that a NO_AUXTRACE build on x86 would use the empty weak version of arch_fetch_insn - weak symbols are a frequent source of errors like this and are outside of the C specification. Change it so that archinsn.c is always built on x86 and make the weak symbol empty version of arch_fetch_insn a strong one guarded by ifdefs. arch_fetch_insn on x86 depends on insn_decode which is a function included then built into intel-pt-insn-decoder.c. intel-pt-insn-decoder.c isn't built in a NO_AUXTRACE=1 build. Separate the insn_decode function from intel-pt-insn-decoder.c by just directly compiling the relevant file. Guard this compilation to be for either always on x86 (because of the use in arch_fetch_insn) or when auxtrace is enabled. Apply the CFLAGS overrides as necessary, reducing the amount of code where warnings are disabled. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	dc7be5e4c0	perf script: Move perf_sample__sprintf_flags to trace-event-scripting.c perf_sample__sprintf_flags is used in the python C code and so needs to be in the util library rather than a builtin. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241119011644.971342-12-irogers@google.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:33 -03:00
Ian Rogers	1ff2ca39b3	perf script: Move script_fetch_insn to trace-event-scripting.c Add native_arch as a parameter to script_fetch_insn rather than relying on the builtin-script value that won't be initialized for the dlfilter and python Context use cases. Assume both of those cases are running natively. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	04051b4a93	perf script: Move script_spec code to trace-event-scripting.c The script_spec code is referenced in util/trace-event-scripting but the list was in builtin-script, accessed via a function that required a stub function in python.c. Move all the logic to trace-event-scripting, with lookup and foreach functions exposed for builtin-script's benefit. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	9557d1562a	perf stat: Move stat_config into config.c stat_config is accessed by config.c via helper functions, but declared in builtin-stat. Move to util/config.c so that stub functions aren't needed in python.c which doesn't link against the builtin files. To avoid name conflicts change builtin-script to use the same stat_config as builtin-stat. Rename local variables in tests to avoid shadow declaration warnings. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	d927e30ca0	perf script: Move find_scripts to browser/scripts.c The only use of find_scripts is in browser/scripts.c but the definition in builtin causes linking problems requiring a stub in python.c. Move the function to allow the stub to be removed. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	f76f94dc78	perf script: Use openat for directory iteration Rewrite the directory iteration to use openat so that large character arrays aren't needed. The arrays are warned about potential buffer overflows by GCC when the code exists in a single C file. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	3f1889422a	perf kvm: Move functions used in util out of builtin The util library code is used by the python module but doesn't have access to the builtin files. Make a util/kvm-stat.c to match the kvm-stat.h file that declares the functions and move the functions there. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	702c7a4aec	perf script: Move scripting_max_stack out of builtin scripting_max_stack is used in util code which is linked into the python module. Move the variable declaration to util/trace-event-scripting.c to avoid conditional compilation. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	c027e637bb	perf python: Remove unused #include Remove unused #include of bpf-filter.h. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	b8816289ab	perf python: Constify variables and parameters Opportunistically constify variables and parameters when possible. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Ian Rogers	e7e9943c87	perf python: Remove python 2 scripting support Python2 was deprecated 4 years ago, remove support and workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	4c7f9ee2eb	perf intel-pt: Add a test for pause / resume Add a simple sub-test to the "Miscellaneous Intel PT testing" test to check pause / resume. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f8b301e0a4	perf intel-pt: Add documentation for pause / resume Document the use of aux-action config term and provide a simple example. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f38ec2274c	perf intel-pt: Improve man page format Improve format of config terms and section references. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	bf66b5fd6e	perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume Display "feature is not supported" error message if aux_start_paused, aux_pause or aux_resume result in a perf_event_open() error. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	8a0f49a7f1	perf tools: Parse aux-action Add parsing for aux-action to accept "pause", "resume" or "start-paused" values. "start-paused" is valid only for AUX area events. "pause" and "resume" are valid only for events grouped with an AUX area event as the group leader. However, like with aux-output, the events will be automatically grouped if they are not currently in a group, and the AUX area event precedes the other events. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	314bf84e03	perf tools: Add aux-action config term Add a new common config term "aux-action" to use for configuring AUX area trace pause / resume. The value is a string that will be parsed in a subsequent patch. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Adrian Hunter	f3e7194756	perf tools: Add aux_start_paused, aux_pause and aux_resume Add 'struct perf_event_attr' members to support pause and resume of AUX area tracing. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Leo Yan	44b44ffd5d	perf build: Minor improvement for linking libzstd The zstd library will be automatically linked by detecting the feature libzstd. It is no need to explicitly link it for static builds, so remove the redundant linkage. It is contradictory to detect the feature libelf-zstd while the build configuration NO_LIBZSTD is set. Report an error for reminding users not to set NO_LIBZSTD. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241215221223.293205-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:32 -03:00
Athira Rajeev	ea3683fda6	perf tools tests shell base_probe: Enhance print_overall_results to print summary information Currently print_overall_results prints only the number of failures in the summary. Example from the base_probe tests in testsuite_probe: ## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY :: 11 failures found test_invalid_options contains multiple tests, of which 11 failed. Sometimes this is due to a missing build or environment dependency. For example, perf probe -L requires DWARF to be enabled; otherwise it fails as below: ./perf probe -L Error: switch `L' is not available because NO_DWARF=1 "-L" is tested as one of the options in: for opt in '-a' '-d' '-L' '-V'; do <<perf probe test>> print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "missing argument for $opt" Here -a and -d don't require DWARF. Similarly, there are a few other tests requiring DWARF. To hint to the user that missing DWARF could be one issue, update print_overall_results to print a comment string along with the summary, hinting at the possible cause. Update test_invalid_options.sh and test_line_semantics.sh to pass the info about the DWARF requirement, since these tests failed when perf is built without DWARF. Use the check for presence of DWARF with "perf check feature" and append the hint message based on the result. With the change: ## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY :: 11 failures found :: Some of the tests need DWARF to run Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241206135254.35727-1-atrajeev@linux.vnet.ibm.com [ Minor edits changing "dwarf" to "DWARF" as it's an acronym ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
Athira Rajeev	2aad2130c2	perf tools arch powerpc: Add register mask for power11 PVR in extended regs Perf tools side uses extended mask to display the platform supported register names (with -I? option) to the user and also send this mask to the kernel to capture the extended registers as part of each sample. This mask value is decided based on the processor version ( from PVR ). Add PVR value for power11 to enable capturing the extended regs as part of sample in power11. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241206135637.36166-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
Namhyung Kim	a5bbe6dd69	perf ftrace latency: Fix compiler error for clang 12 I noticed this error on CentOS 8. CLANG /build/util/bpf_skel/.tmp/func_latency.bpf.o Error at line 119: Unsupport signed division for DAG: 0x55829ee68a10: i64 = sdiv 0x55829ee68bb0, 0x55829ee69090, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ]Please convert to unsigned div/mod. fatal error: error in backend: Cannot select: 0x55829ee68a10: i64 = sdiv 0x55829ee68bb0, 0x55829ee69090, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68bb0: i64,ch = CopyFromReg 0x55829edc9a78, Register:i64 %5, util/bpf_skel/func_latency.bpf.c:119:17 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68e20: i64 = Register %5 0x55829ee69090: i64,ch = load<(volatile dereferenceable load 4 from @bucket_range, !tbaa !160), zext from i32> 0x55829edc9a78, 0x55829ee68fc0, undef:i64, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68fc0: i64 = BPFISD::Wrapper TargetGlobalAddress:i64<i32* @bucket_range> 0, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68808: i64 = TargetGlobalAddress<i32* @bucket_range> 0, util/bpf_skel/func_latency.bpf.c:119:19 @[ util/bpf_skel/func_latency.bpf.c:84:5 ] 0x55829ee68530: i64 = undef In function: func_end PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. It complains about sdiv which is (s64)delta / (u32)bucket_range. Let's cast the delta to u64 for division. 
Committer testing: Tested on: $ head -2 /etc/os-release NAME="Fedora Linux" VERSION="40 (Toolbx Container Image)" $ clang --version \|& head -1 clang version 18.1.8 (Fedora 18.1.8-1.fc40) $ root@number:~# perf ftrace latency --use-nsec --bucket-range=200 --min-latency 250 --max-latency=5000 -T switch_mm_irqs_off -a sleep 10 # DURATION \| COUNT \| GRAPH \| 0 - 250 ns \| 28 \| ##### \| 250 - 450 ns \| 12 \| ## \| 450 - 650 ns \| 10 \| # \| 650 - 850 ns \| 9 \| # \| 850 - 1050 ns \| 20 \| ### \| 1.05 - 1.25 us \| 14 \| ## \| 1.25 - 1.45 us \| 16 \| ### \| 1.45 - 1.65 us \| 8 \| # \| 1.65 - 1.85 us \| 11 \| ## \| 1.85 - 2.05 us \| 7 \| # \| 2.05 - 2.25 us \| 11 \| ## \| 2.25 - 2.45 us \| 10 \| # \| 2.45 - 2.65 us \| 7 \| # \| 2.65 - 2.85 us \| 8 \| # \| 2.85 - 3.05 us \| 7 \| # \| 3.05 - 3.25 us \| 7 \| # \| 3.25 - 3.45 us \| 10 \| # \| 3.45 - 3.65 us \| 5 \| \| 3.65 - 3.85 us \| 9 \| # \| 3.85 - 4.05 us \| 2 \| \| 4.05 - 4.25 us \| 6 \| # \| 4.25 - ... us \| 23 \| #### \| root@number:~# Fixes: `e8536dd47a` ("perf ftrace latency: Introduce --bucket-range to ask for linear bucketing") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241214002938.1027546-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:31 -03:00
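The cast described in this commit can be illustrated with a minimal user-space sketch. The helper name `linear_bucket` and its parameters are hypothetical and are not the code in util/bpf_skel/func_latency.bpf.c; the sketch only shows that converting the signed delta to an unsigned type before dividing yields an unsigned division, which is what the error message from the older BPF backend asks for.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the code in func_latency.bpf.c: pick a linear
 * bucket for a latency delta. Negative deltas are sent to bucket 0 first,
 * so the cast to uint64_t below never changes the value being divided.
 */
static inline uint32_t linear_bucket(int64_t delta, uint32_t bucket_range,
				     uint32_t nr_buckets)
{
	uint64_t idx;

	assert(bucket_range > 0 && nr_buckets > 0);
	if (delta < 0)
		return 0;

	/* Both operands are unsigned here, so this is an unsigned division. */
	idx = (uint64_t)delta / bucket_range;

	/* Everything past the last bucket is counted in the last bucket. */
	return idx < nr_buckets ? (uint32_t)idx : nr_buckets - 1;
}
```

Without the cast, `delta / bucket_range` would be evaluated as a signed 64-bit division, because the 32-bit unsigned divisor is converted to the wider signed type.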
Arnaldo Carvalho de Melo	055f0ce7d8	tools build: Test for presence of libtraceevent and libtracefs in test-all.c Since these are so far considered part of the basic set of libraries to be present when building perf, have them in tools/build/features/test-all.c. They were already in the FEATURE_TESTS_BASIC variable of tools/build/Makefile.feature, meaning if test-all.c builds, those features would be set as present, but then we were calling "again" (well, they were not in test-all.c, so were not really being tested) for it to be detected. Fix this all up by not calling feature_check for those features but instead have them in test-all.c to be tested together with the set of basic expected libraries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20241213195052.914914-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-18 16:24:28 -03:00
Arnaldo Carvalho de Melo	dea654e34a	perf tests switch-tracking: Set this test to run exclusively This test was failing when run with the default 'perf test' mode, which is to run multiple regression tests in parallel. Since it checks system_wide mode, set it to run in exclusive mode. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z1yPYqYYs_isO1PJ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 18:30:48 -03:00
Ravi Bangoria	4cd67bac9d	perf test: Introduce DEFINE_SUITE_EXCLUSIVE() A variant of DEFINE_SUITE() but sets ->exclusive bit for the test so the test will be executed sequentially. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20241210093449.1662-10-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 16:55:09 -03:00
Arnaldo Carvalho de Melo	aec95d7ce1	Merge remote-tracking branch 'torvalds/master' into perf-tools-next To get the fixes that went thru perf-tools for v6.13. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-13 11:53:27 -03:00
Levi Yun	1d18ebcfd3	perf expr: Initialize is_test value in expr__ctx_new() When expr_parse_ctx is allocated by expr__ctx_new(), expr_scanner_ctx->is_test isn't initialized, so it holds a garbage value. Depending on that value, this can affect the return value of expr__parse() when it parses a non-existent event literal. Use calloc instead of malloc in expr__ctx_new() to fix this. Fixes: `3340a08354` ("perf pmu-events: Fix testing with JEVENTS_ARCH=all") Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241108143424.819126-1-yeoreum.yun@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 16:12:37 -03:00
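As a rough illustration of why switching from malloc to calloc fixes this class of bug, here is a minimal sketch. The type `struct demo_ctx` and the function `demo_ctx_new()` are hypothetical stand-ins, not perf's struct expr_parse_ctx or its constructor.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parse context; not perf's real structure. */
struct demo_ctx {
	int runtime;
	bool is_test;	/* never assigned explicitly by the constructor */
};

/*
 * calloc() zero-fills the allocation, so every field the constructor does
 * not set explicitly (here: is_test) starts out as 0/false. With malloc()
 * the same field would hold whatever bytes were left in that memory.
 */
static struct demo_ctx *demo_ctx_new(void)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->runtime = 0;
	return ctx;
}
```

The design point is that zero-initializing the whole structure also covers fields added later, whereas per-field initialization has to be kept in sync by hand.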
Jiapeng Chong	9ba3462c1c	perf tests: Fix an incorrect type in append_script() The return value from the call to readlink() is ssize_t. However, the return value is being assigned to a size_t variable 'len', so make 'len' an ssize_t. ./tools/perf/tests/tests-scripts.c:182:5-8: WARNING: Unsigned expression compared with zero: len < 0. Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11909 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241115091527.128923-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 16:08:36 -03:00
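The type issue can be shown with a small sketch. The helper `read_link_target()` is hypothetical and is not the append_script() code; it only demonstrates that the `len < 0` error check works because the result of readlink() is kept in a signed ssize_t.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a symlink target into buf and NUL-terminate it.
 * Returns the number of bytes placed in buf, or -1 on error.
 */
static ssize_t read_link_target(const char *path, char *buf, size_t size)
{
	ssize_t len;	/* signed: readlink() reports errors as -1 */

	if (size == 0)
		return -1;

	len = readlink(path, buf, size - 1);
	if (len < 0)	/* with a size_t 'len' this test could never be true */
		return -1;

	buf[len] = '\0';	/* readlink() does not NUL-terminate */
	return len;
}
```

Had `len` been declared as size_t, the -1 error return would wrap to a huge positive value and `buf[len]` would write far out of bounds.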
Ruffalo Lavoisier	8791a78fb7	perf test: Remove duplicate word - Remove duplicate word, 'the'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ruffalo Lavoisier <RuffaloLavoisier@gmail.com> Cc: linux-security-module@vger.kernel.org Link: https://lore.kernel.org/r/20241120043503.80530-1-RuffaloLavoisier@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:55:16 -03:00
Ian Rogers	61e0a94463	perf string: Avoid undefined NULL+1 While the value NULL+1 is never used it triggers a ubsan warning. Restructure and comment the loop to avoid this. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241120065224.286813-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:53:36 -03:00
James Clark	7269846617	perf vendor events arm64: Update N2/V2 events from source Update using the new data [1] for these changes: * Scale some metrics like dtlb_walk_ratio to percent so they display better with Perf's 2 dp precision * Description typos, grammar and clarifications * Unnecessary metric formula brackets seem to have been removed in the source but this is not a functional change * New sve_all_percentage metric The following command was used to generate this commit: $ telemetry-solution/tools/perf_json_generator/generate.py \ tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-v2.json:neoverse-n2-v2 [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v2.json Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241120143739.243728-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:41:27 -03:00
Namhyung Kim	ad5d76aecd	perf tools: Avoid unaligned pointer operations The sample data is basically 64-bit aligned, but raw data starts with a 32-bit length field followed by the data. perf_event__synthesize_sample treats the sample data as a 64-bit array, and it needs some trick to update the raw data properly. But it seems some compilers are not happy with this and the program dies silently. I found the sample parsing test failed without any messages on affected systems. Let's update the code to use a 32-bit pointer directly and make sure the result is 64-bit aligned again. No functional changes intended. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241128010325.946897-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-12 15:36:46 -03:00
James Clark	434fffa926	perf probe: Fix uninitialized variable Since the linked fixes: commit, err is returned uninitialized due to the removal of "return 0". Initialize err to fix it. This fixes the following intermittent test failure on release builds: $ perf test "testsuite_probe" ... -- [ FAIL ] -- perf_probe :: test_invalid_options :: mutually exclusive options :: -L foo -V bar (output regexp parsing) Regexp not found: \"Error: switch .+ cannot be used with switch .+\" ... Fixes: `080e47b2a2` ("perf probe: Introduce quotation marks support") Tested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241211085525.519458-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-11 21:40:46 -08:00
Ian Rogers	a93a620c38	perf test expr: Fix system_tsc_freq for only x86 The refactoring of tool PMU events to have a PMU then adding the expr literals to the tool PMU made it so that the literal system_tsc_freq was only supported on x86. Update the test expectations to match - namely the parsing is x86 specific and only yields a non-zero value on Intel. Fixes: `609aa2667f` ("perf tool_pmu: Switch to standard pmu functions and json descriptions") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/20241022140156.98854-1-atrajeev@linux.vnet.ibm.com/ Co-developed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241205022305.158202-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-11 09:19:44 -08:00
Zhongqiu Han	03edb7020b	perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info() If perf_env__insert_bpf_prog_info() returns false due to a duplicate bpf prog info node insertion, the temporary info_node and info_linear memory will leak. Add a check to ensure the memory is freed if the function returns false. Fixes: `d56354dc49` ("perf tools: Save bpf_prog_info and BTF of new BPF programs") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-4-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
Zhongqiu Han	a7da6c7030	perf header: Fix one memory leakage in process_bpf_prog_info() Function __perf_env__insert_bpf_prog_info() will return without inserting bpf prog info node into perf env again due to a duplicate bpf prog info node insertion, causing the temporary info_linear and info_node memory to leak. Modify the return type of this function to bool and add a check to ensure the memory is freed if the function returns false. Fixes: `606f972b13` ("perf bpf: Save bpf_prog_info information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-3-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
Zhongqiu Han	875d22980a	perf header: Fix one memory leakage in process_bpf_btf() If __perf_env__insert_btf() returns false due to a duplicate btf node insertion, the temporary node will leak. Add a check to ensure the memory is freed if the function returns false. Fixes: `a70a112317` ("perf bpf: Save BTF information as headers to perf.data") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241205084500.823660-2-quic_zhonhan@quicinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:32 -03:00
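The three leak fixes above share one pattern: when an insert helper reports a duplicate by returning false, the caller still owns the temporary node and must free it. A minimal sketch of that pattern follows; `struct demo_env`, `demo_env_insert()` and `demo_env_add_id()` are hypothetical names, and the fixed-size array stands in for whatever container the real perf_env code uses.

```c
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_MAX 8

/* Hypothetical node and container; not perf's bpf_prog_info/btf nodes. */
struct demo_node {
	int id;
};

struct demo_env {
	struct demo_node *nodes[DEMO_MAX];
	size_t nr;
};

/* Returns false (and does not take ownership) on duplicate id or when full. */
static bool demo_env_insert(struct demo_env *env, struct demo_node *node)
{
	for (size_t i = 0; i < env->nr; i++) {
		if (env->nodes[i]->id == node->id)
			return false;
	}
	if (env->nr == DEMO_MAX)
		return false;
	env->nodes[env->nr++] = node;
	return true;
}

/* Caller side: free the temporary node when the insert is rejected. */
static bool demo_env_add_id(struct demo_env *env, int id)
{
	struct demo_node *node = malloc(sizeof(*node));

	if (!node)
		return false;
	node->id = id;

	if (!demo_env_insert(env, node)) {
		free(node);	/* without this, a rejected node leaks */
		return false;
	}
	return true;
}
```

Returning bool from the insert helper is what lets the caller distinguish "ownership transferred" from "still mine to free".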
Ian Rogers	7504a1c20e	perf jevents: Fix build issue in '/' in event descriptions For big string offsets we output comments for what string the offset is for. If the string contains a '/' as seen in Intel Arrowlake event descriptions, then this causes C parsing issues for the generated pmu-events.c. Catch such '/' values and escape to avoid this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20241113165558.628856-1-irogers@google.com [ Used return s.replace('/', r'\*\/') based on failure followed by request by Ian ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 16:59:09 -03:00
Veronika Molnarova	625f4de23f	perf test: Parse 'perf stat' Topdown events for aarch64 The 'perf stat' output on aarch64 machines with topdown events wasn't accounted for in the 'perf stat STD output linter' test case. Add the topdown metric to the skip_metric list, as is done for topdown events on other systems. The Topdown events are also disabled on aarch64 KVM guests because the value of caps/slots is set to 0 due to part of the system register being a stub. This prevents the metric for the topdown events from being computed, leaving the 'perf stat' topdown metric without any value at all. Add "TopdownL1" to the skip_metric list as well to handle this possibility. Before aarch64: 100: perf stat STD output linter: --- start --- test child forked, pid 403305 Checking STD output: no args Unknown event name in TopdownL1 # 4.3 percent of slots slots_lost_misspeculation_fraction ---- end(-1) ---- 100: perf stat STD output linter : FAILED! Before aarch64 KVM: 100: perf stat STD output linter: --- start --- test child forked, pid 404671 Checking STD output: no args Unknown event name in TopdownL1 ---- end(-1) ---- 100: perf stat STD output linter : FAILED! 
After: 100: perf stat STD output linter: --- start --- test child forked, pid 404777 Checking STD output: no args [Success] Checking STD output: system wide [Success] Checking STD output: interval [Success] Checking STD output: per thread [Success] Checking STD output: per node [Success] Checking STD output: system wide no aggregation [Success] Checking STD output: per core [Success] Checking STD output: per cache instance [Success] Checking STD output: per cluster [Success] Checking STD output: per die [Success] Checking STD output: per socket [Success] ---- end(0) ---- 100: perf stat STD output linter : Ok Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20241029144347.25651-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:44:07 -03:00
Masami Hiramatsu (Google)	b223564fe1	perf probe: Replace unacceptable characters when generating event name Replace unacceptable characters with '_' when generating the event name from the probed function name. This does not apply to C programs; for a C program, perf probe continues to remove suffixes. Note that this language check depends on debuginfo, so without debuginfo perf probe will always replace unacceptable characters with '_'. For example: $ ./perf probe -x cro3 -D \"cro3::cmd::servo::run_show\" p:probe_cro3/cro3_cmd_servo_run_show /work/cro3/target/x86_64-unknown-linux-gnu/debug/cro3:0x197530 $ ./perf probe -x /work/go/example/outyet/main -D 'main.(*Server).poll' p:probe_main/main_Server_poll /work/go/example/outyet/main:0x353040 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173145728160.2747044.18089011235495186810.stgit@mhiramat.roam.corp.google.com [ Removed some extra tabs in the new struct fields ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:41:10 -03:00
Gabriele Monaco	690a052a6d	perf ftrace latency: Add --max-latency option This patch adds a max-latency option as discussed, in case the number of buckets is more than 22, we don't observe the setting (for now, let's say). By default or if 0 is passed, the value is automatically determined based on the number of buckets, range and minimum, so that we fill all available buffers (equivalent to the behaviour before this patch). We now get something like this: # perf ftrace latency --bucket-range=20 \ --min-latency 10 \ --max-latency=100 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 10 us \| 1731 \| ################ \| 10 - 30 us \| 1 \| \| 30 - 50 us \| 0 \| \| 50 - 70 us \| 0 \| \| 70 - 90 us \| 0 \| \| 90 - 100 us \| 0 \| \| 100 - ... us \| 0 \| \| Note the maximum is observed also if it doesn't cover completely a full range (the second to last range is 10us long to let the last start at 100 sharp), this looks to me more sensible and eases the computations, since we don't need to account for the range while filling the buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:40 -03:00
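The bucket layout in the sample output above (an underflow bucket, linear buckets of width bucket-range starting at min-latency, a shortened final linear bucket ending at max-latency, and an overflow bucket) can be reproduced with the following sketch. It is an assumption-laden illustration, not the perf implementation: the name `latency_bucket()`, the parameter names, and the choice of half-open intervals (a value equal to a boundary goes into the upper bucket) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of linear bucketing with min/max limits:
 *   bucket 0          : delta < min
 *   buckets 1..inner  : [min + k*range, min + (k+1)*range), clipped at max
 *   bucket inner + 1  : delta >= max
 * where inner = ceil((max - min) / range). The boundary convention is an
 * assumption made for this example.
 */
static unsigned int latency_bucket(uint64_t delta, uint64_t min,
				   uint64_t max, uint64_t range)
{
	uint64_t inner;

	assert(range > 0 && max > min);
	inner = (max - min + range - 1) / range;	/* round up */

	if (delta < min)
		return 0;
	if (delta >= max)
		return (unsigned int)(inner + 1);
	return (unsigned int)(1 + (delta - min) / range);
}
```

With min = 10, max = 100 and range = 20 this gives five linear buckets (10-30, 30-50, 50-70, 70-90, 90-100) between the two outlier buckets, matching the seven rows in the output shown in the commit message.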
Arnaldo Carvalho de Melo	08b875b6bf	perf ftrace latency: Introduce --min-latency to narrow down into a latency range Things below and over will be in the first and last, outlier, buckets. Without it: # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 200 ns \| 0 \| \| 200 - 400 ns \| 44 \| \| 400 - 600 ns \| 291 \| # \| 600 - 800 ns \| 506 \| ## \| 800 - 1000 ns \| 148 \| \| 1.00 - 1.20 us \| 581 \| ## \| 1.20 - 1.40 us \| 2199 \| ########## \| 1.40 - 1.60 us \| 1048 \| #### \| 1.60 - 1.80 us \| 1448 \| ###### \| 1.80 - 2.00 us \| 1091 \| ##### \| 2.00 - 2.20 us \| 517 \| ## \| 2.20 - 2.40 us \| 318 \| # \| 2.40 - 2.60 us \| 370 \| # \| 2.60 - 2.80 us \| 271 \| # \| 2.80 - 3.00 us \| 150 \| \| 3.00 - 3.20 us \| 85 \| \| 3.20 - 3.40 us \| 48 \| \| 3.40 - 3.60 us \| 40 \| \| 3.60 - 3.80 us \| 22 \| \| 3.80 - 4.00 us \| 13 \| \| 4.00 - 4.20 us \| 14 \| \| 4.20 - ... us \| 626 \| ## \| # # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=20 --min-latency=1200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1200 ns \| 1243 \| ##### \| 1.20 - 1.22 us \| 141 \| \| 1.22 - 1.24 us \| 202 \| \| 1.24 - 1.26 us \| 209 \| \| 1.26 - 1.28 us \| 219 \| \| 1.28 - 1.30 us \| 208 \| \| 1.30 - 1.32 us \| 245 \| # \| 1.32 - 1.34 us \| 246 \| # \| 1.34 - 1.36 us \| 224 \| # \| 1.36 - 1.38 us \| 219 \| \| 1.38 - 1.40 us \| 206 \| \| 1.40 - 1.42 us \| 190 \| \| 1.42 - 1.44 us \| 190 \| \| 1.44 - 1.46 us \| 146 \| \| 1.46 - 1.48 us \| 140 \| \| 1.48 - 1.50 us \| 125 \| \| 1.50 - 1.52 us \| 115 \| \| 1.52 - 1.54 us \| 102 \| \| 1.54 - 1.56 us \| 87 \| \| 1.56 - 1.58 us \| 90 \| \| 1.58 - 1.60 us \| 85 \| \| 1.60 - ... 
us \| 5487 \| ######################## \| # Now we want to focus on the latencies starting at 1.2us, with a finer grained range of 20ns: This is all on a live system, so statistically interesting, but not narrowing down on the same numbers, so a 'perf ftrace latency record' seems interesting to then use all on the same snapshot of latencies. A --max-latency counterpart should come next, at first limiting the max-latency to 20 * bucket-size, as we have a fixed buckets array with 20 + 2 entries (+ for the outliers) and thus would need to make it larger for higher latencies. We also may need a way to ask for not considering the out of range values (first and last buckets) when drawing the buckets bars. Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:27 -03:00
Arnaldo Carvalho de Melo	e8536dd47a	perf ftrace latency: Introduce --bucket-range to ask for linear bucketing In addition to showing it exponentially, using log2() to figure out the histogram index, allow for showing it linearly: The preexisting mode, the default: # perf ftrace latency --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 0 \| \| 64 - 128 ns \| 238 \| # \| 128 - 256 ns \| 1704 \| ########## \| 256 - 512 ns \| 672 \| ### \| 512 - 1024 ns \| 4458 \| ########################## \| 1 - 2 us \| 677 \| #### \| 2 - 4 us \| 5 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| # The new histogram mode: # perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 151 ns \| 265 \| # \| 151 - 301 ns \| 1797 \| ########### \| 301 - 451 ns \| 258 \| # \| 451 - 601 ns \| 289 \| # \| 601 - 751 ns \| 2049 \| ############# \| 751 - 901 ns \| 967 \| ###### \| 901 - 1051 ns \| 513 \| ### \| 1.05 - 1.20 us \| 114 \| \| 1.20 - 1.35 us \| 559 \| ### \| 1.35 - 1.50 us \| 189 \| # \| 1.50 - 1.65 us \| 137 \| \| 1.65 - 1.80 us \| 32 \| \| 1.80 - 1.95 us \| 2 \| \| 1.95 - 2.10 us \| 0 \| \| 2.10 - 2.25 us \| 1 \| \| 2.25 - 2.40 us \| 1 \| \| 2.40 - 2.55 us \| 0 \| \| 2.55 - 2.70 us \| 0 \| \| 2.70 - 2.85 us \| 0 \| \| 2.85 - 3.00 us \| 1 \| \| 3.00 - ... 
us \| 4 \| \| # Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-3-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:16:01 -03:00
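The two bucketing modes described in this commit can be sketched as a single index function. This is only an illustration, not the code from builtin-ftrace.c: the name `latency_bucket` is hypothetical, the 22-slot array size comes from the "20 + 2 entries" remark in the follow-up commit, and the exact boundary handling (which bucket gets a value that sits on an edge) is an assumption inferred from the sample output above.

```c
#include <stdint.h>

/* 20 regular buckets plus 2 outlier buckets (first and last). */
#define NUM_BUCKET 22

/*
 * Map a latency value to a histogram slot.
 *
 * bucket_range == 0: default mode, index derived from log2(delta).
 * bucket_range  > 0: linear mode, each regular bucket is bucket_range wide.
 *
 * Slot 0 holds the "0 - 1" values and the last slot collects everything
 * beyond the final regular bucket.
 */
static int latency_bucket(uint64_t delta, unsigned int bucket_range)
{
	uint64_t idx;

	if (delta == 0)
		return 0;

	if (bucket_range == 0) {
		/* floor(log2(delta)) + 1, computed with shifts instead of libm */
		idx = 0;
		while (delta >>= 1)
			idx++;
		idx += 1;
	} else {
		/* assumed edges: 1..range -> slot 1, range+1..2*range -> slot 2, ... */
		idx = (delta - 1) / bucket_range + 1;
	}

	/* clamp into the overflow slot before narrowing to int */
	if (idx >= NUM_BUCKET)
		idx = NUM_BUCKET - 1;

	return (int)idx;
}
```

With `bucket_range` set to 150, a 700 ns sample lands in the "601 - 751 ns" slot, while in the default mode it lands in the "512 - 1024 ns" slot, which matches how the two histograms above split the same peak.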
Arnaldo Carvalho de Melo	12115c6037	perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args The ftrace->use_nsec arg is being passed to both make_histogram() and display_histogram(). Since another ftrace field will be passed to those functions in a follow-up patch, make them look like other functions in this codebase that receive the 'struct perf_ftrace' pointer. No change in logic. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-10 15:15:55 -03:00
Ian Rogers	d4e17a322a	perf test hwmon_pmu: Fix event file location The temp directory is made and a known fake hwmon PMU created within it. Prior to this fix the events were being incorrectly written to the temp directory rather than the fake PMU directory. This didn't impact the test as the directory fd matched the wrong location, but it doesn't mirror what a hwmon PMU would actually look like. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 15:00:26 -08:00
Ian Rogers	3f61a12b08	perf hwmon_pmu: Use openat rather than dup to refresh directory The hwmon PMU test will make a temp directory, open the directory with O_DIRECTORY then fill it with contents. As the open is before the filling the contents the later fdopendir may reflect the initial empty state, meaning no events are seen. Change to re-open the directory, rather than dup the fd, so the latest contents are seen. Minor tweaks/additions to debug messages. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 15:00:03 -08:00
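The re-open approach described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the perf code: `count_entries_fresh` is a hypothetical helper, and it simply counts directory entries to show that a descriptor obtained with `openat(dirfd, ".", ...)` gets its own open file description, so a directory stream built on it starts from the directory's current contents.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Count the entries (excluding "." and "..") currently visible in the
 * directory referred to by dirfd.
 *
 * The directory is re-opened relative to itself rather than dup()ed, so the
 * stream does not share a file offset or any cached state with dirfd.
 * dirfd is left open and untouched. Returns -1 on error.
 */
static int count_entries_fresh(int dirfd)
{
	struct dirent *ent;
	DIR *dir;
	int count = 0;
	int fd;

	fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	dir = fdopendir(fd);
	if (!dir) {
		close(fd);
		return -1;
	}

	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
			count++;
	}

	closedir(dir);	/* also closes fd */
	return count;
}
```

A caller can open a directory once, populate it, and then call `count_entries_fresh()` on the original descriptor; each call reflects whatever the directory contains at that moment.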
Kuan-Wei Chiu	246dfe3dc1	perf ftrace: Fix undefined behavior in cmp_profile_data() The comparison function cmp_profile_data() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity: * Symmetry: If x < y, then y > x. * Transitivity: If x < y and y < z, then x < z. When v1 and v2 are equal, the function incorrectly returns 1, breaking symmetry and transitivity. This causes undefined behavior, which can lead to memory corruption in certain versions of glibc [1]. Fix the issue by returning 0 when v1 and v2 are equal, ensuring compliance with the C standard and preventing undefined behavior. Link: https://www.qualys.com/2024/01/30/qsort.txt [1] Fixes: `0f223813ed` ("perf ftrace: Add 'profile' command") Fixes: `74ae366c37` ("perf ftrace profile: Add -s/--sort option") Cc: stable@vger.kernel.org Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: jserv@ccns.ncku.edu.tw Cc: chuang@cs.nycu.edu.tw Link: https://lore.kernel.org/r/20241209134226.1939163-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-09 13:54:08 -08:00
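The rule this fix relies on can be shown with a small comparator. The sketch below is not the perf function: `struct profile_entry` and `cmp_entry_time_desc` are hypothetical stand-ins, and the only point is the three-way result, where equal keys return 0 so that comparing (a, b) and (b, a) never both claim "greater".

```c
#include <stdlib.h>

/* Hypothetical record used only for this illustration. */
struct profile_entry {
	const char *name;
	unsigned long long time;
};

/*
 * Sort by time, largest first.
 * Returning 0 for ties keeps the comparison consistent in both directions,
 * which is what qsort() requires of its callback.
 */
static int cmp_entry_time_desc(const void *a, const void *b)
{
	const struct profile_entry *pa = a;
	const struct profile_entry *pb = b;

	if (pa->time > pb->time)
		return -1;
	if (pa->time < pb->time)
		return 1;
	return 0;
}

/* Convenience wrapper around qsort() for an array of entries. */
static void sort_entries(struct profile_entry *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), cmp_entry_time_desc);
}
```

A comparator that returned 1 for equal keys would report both "a after b" and "b after a" for the same pair, which is the inconsistency the commit message describes.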
Ian Rogers	c95584e07b	perf test hwmon_pmu: Fix event file location The temp directory is made and a known fake hwmon PMU created within it. Prior to this fix the events were being incorrectly written to the temp directory rather than the fake PMU directory. This didn't impact the test as the directory fd matched the wrong location, but it doesn't mirror what a hwmon PMU would actually look like. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 18:15:38 -03:00
Ian Rogers	9a4426120d	perf hwmon_pmu: Use openat rather than dup to refresh directory The hwmon PMU test will make a temp directory, open the directory with O_DIRECTORY then fill it with contents. As the open is before the filling the contents the later fdopendir may reflect the initial empty state, meaning no events are seen. Change to re-open the directory, rather than dup the fd, so the latest contents are seen. Minor tweaks/additions to debug messages. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 18:15:30 -03:00
Ian Rogers	5e530a8287	perf tests: Enable tests disabled due to tracepoint parsing Tracepoint parsing required libtraceevent but no longer does. Remove the Build logic and #ifdefs that caused the tests not to be run. Test code that directly uses libtraceevent is still guarded. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	6c8310e838	perf evsel: Allow evsel__newtp without libtraceevent Switch from reading the tracepoint format to reading the id directly for the evsel config. This avoids the need to initialize libtraceevent, plugins, etc. It is sufficient for many tracepoint commands to work like: $ perf stat -e sched:sched_switch true To populate evsel->tp_format, do lazy initialization using libtraceevent in the evsel__tp_format function (the sys and name are saved in evsel__newtp_idx for this purpose). Reading the id should be indicative of the format failing to load, but if not an error is reported in evsel__tp_format. This could happen for a tracepoint with a format that fails to parse. As tracepoints can be parsed without libtraceevent with this, remove the associated #ifdefs in parse-events.c. By only lazily parsing the tracepoint format information it is hoped this will help improve the performance of code using tracepoints but not the format information. It also cuts down on the build and ifdef logic. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. 
Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	c46d634a03	perf evsel: Add/use accessor for tp_format Add an accessor function for tp_format. Rather than search+replace uses try to use a variable and reuse it. Add additional NULL checks when accessing/using the value. Make sure the PTR_ERR is nulled out on error path in evsel__newtp_idx. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	800c93ffaf	perf trace-event: Always build trace-event-info.c trace-event-info.c has no libtraceevent dependencies, always build it and use it in builtin-record and perf_event_attr printing. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	f7264150b4	perf trace-event: Constify print arguments Capture that these functions don't mutate their input. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Ian Rogers	925c25efca	perf env: Ensure failure broken topology file reads are always -1 encoded get_core_id returns 0 on success and a negative errno value on error. Currently the error can only be -1, but fixing this to be any errno value breaks perf: https://lore.kernel.org/lkml/Zzu4Sdebve-NXEMX@google.com/ To avoid this, make sure all error values are written as -1. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:42 -03:00
Arnaldo Carvalho de Melo	dcf900429d	perf btf: Make the sigtrap test helper to find a member by name widely available By introducing a tools/perf/util/btf.c to collect utilities not yet available via libbpf, the first being a way to find a member by name once we get the type_id for the struct. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ian Rogers	4b8a7c0327	perf pmu: Remove use of perf_cpu_map__read() Remove use of a FILE and switch to reading a string that is then passed to perf_cpu_map__new(). Being able to remove perf_cpu_map__read() avoids duplicated parsing logic. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ian Rogers	02b5ed8a6a	perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h which is internal. The preferred dependency should be the definition in tools/perf/perf.h. Add the includes of perf.h so that MAX_NR_CPUS can be hidden in libperf. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kyle Meyer <kyle.meyer@hpe.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Kyle Meyer	9a1e106550	perf: Increase MAX_NR_CPUS to 4096 Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096. Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B, cpus_runtime will increase from 81960B to 163880B, and max_entries will increase from 8192B to 16384B. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241206044035.1062032-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
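The size figures quoted in this commit can be reproduced with simple arithmetic. The helpers below are hypothetical and the element sizes are inferred from the numbers in the message, not read from the perf source: one bit per CPU for the bitmaps, 40 bytes per element with one extra element for cpus_runtime, and 4 bytes per CPU for max_entries.

```c
#include <stddef.h>

/* Bytes needed for a bitmap with one bit per CPU, rounded up to whole bytes. */
static size_t bitmap_bytes(size_t nr_cpus)
{
	return (nr_cpus + 7) / 8;
}

/*
 * Bytes needed for an array holding one element per CPU plus 'extra'
 * additional elements, each 'elem_size' bytes wide.
 */
static size_t per_cpu_array_bytes(size_t nr_cpus, size_t extra, size_t elem_size)
{
	return (nr_cpus + extra) * elem_size;
}
```

Under those assumptions, `bitmap_bytes(4096)` gives 512, `per_cpu_array_bytes(4096, 1, 40)` gives 163880 and `per_cpu_array_bytes(4096, 0, 4)` gives 16384, matching the new sizes listed above; the same calls with 2048 give the old ones.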
Ilkka Koskinen	9e7a00ec6a	perf arm-spe: Add support for SPE Data Source packet on AmpereOne Decode SPE Data Source packets on AmpereOne. The field is IMPDEF. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Ilkka Koskinen	ccdc9e9c5e	perf arm-spe: Prepare for adding data source packet implementations for other cores Split Data Source Packet handling to prepare adding support for other implementations. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	9eef3ec920	perf cpumap: Add checking for reference counter For the CPU map merging test, add an extra check for the reference counter before releasing the last CPU map. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-4-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	fb953dfa66	perf cpumap: Add more tests for CPU map merging Add additional tests for CPU map merging to cover more cases. These tests include different types of arguments, such as when one CPU map is a subset of another, as well as cases with or without overlap between the two maps. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
Leo Yan	a9d2217556	libperf cpumap: Refactor perf_cpu_map__merge() The perf_cpu_map__merge() function has two arguments, 'orig' and 'other'. The function definition might cause confusion as it could give the impression that the CPU maps in the two arguments are copied into a new allocated structure, which is then returned as the result. The purpose of the function is to merge the CPU map 'other' into the CPU map 'orig'. This commit changes the 'orig' argument to a pointer to pointer, so the new result will be updated into 'orig'. The return value is changed to an int type, as an error number or 0 for success. Update callers and tests for the new function definition. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241107125308.41226-2-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:52:41 -03:00
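The calling convention introduced here (pointer-to-pointer for 'orig', int return value) can be sketched with a simplified stand-in type. Nothing below is libperf code: `struct cpu_set`, `cpu_set_new` and `cpu_set_merge` are hypothetical names, the set is a plain sorted array, and the old set is released with free() where the real library would presumably drop a reference.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CPU map: a sorted, duplicate-free array. */
struct cpu_set {
	int nr;
	int cpus[];
};

/* Build a set from an already sorted, duplicate-free array. */
static struct cpu_set *cpu_set_new(const int *cpus, int nr)
{
	struct cpu_set *set = malloc(sizeof(*set) + (size_t)nr * sizeof(int));

	if (!set)
		return NULL;
	set->nr = nr;
	for (int i = 0; i < nr; i++)
		set->cpus[i] = cpus[i];
	return set;
}

/*
 * Merge 'other' into '*orig'.
 * On success '*orig' points at the merged set and 0 is returned.
 * On failure '*orig' is left untouched and a negative errno is returned.
 */
static int cpu_set_merge(struct cpu_set **orig, const struct cpu_set *other)
{
	struct cpu_set *old = *orig, *merged;
	int i = 0, j = 0, k = 0;

	if (!other || other->nr == 0)
		return 0;	/* nothing to add */

	if (!old) {
		merged = cpu_set_new(other->cpus, other->nr);
		if (!merged)
			return -ENOMEM;
		*orig = merged;
		return 0;
	}

	merged = malloc(sizeof(*merged) + (size_t)(old->nr + other->nr) * sizeof(int));
	if (!merged)
		return -ENOMEM;

	/* classic two-way merge of sorted arrays, dropping duplicates */
	while (i < old->nr && j < other->nr) {
		if (old->cpus[i] < other->cpus[j]) {
			merged->cpus[k++] = old->cpus[i++];
		} else if (old->cpus[i] > other->cpus[j]) {
			merged->cpus[k++] = other->cpus[j++];
		} else {
			merged->cpus[k++] = old->cpus[i++];
			j++;
		}
	}
	while (i < old->nr)
		merged->cpus[k++] = old->cpus[i++];
	while (j < other->nr)
		merged->cpus[k++] = other->cpus[j++];
	merged->nr = k;

	free(old);
	*orig = merged;
	return 0;
}
```

The design point is that the caller never has to guess whether the return value is a new object: the result always replaces '*orig', and the int return is reserved for success or an error code.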
Arnaldo Carvalho de Melo	161c3402fd	perf config: Fix trivial typo 'an' -> 'can' Just a trivial typo, it should be 'can'. A spell check on the rest of the file turned up nothing else. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Ian Rogers	d78e20c081	perf script python: Improve physical mem type resolution Previously system RAM and persistent memory were hard code matched, change so that the label of the memory region is just read from /proc/iomem. This avoids frequent N/A samples. Change the /proc/iomem reading, event processing and output so that nested entries appear and their counts count toward their parent. As labels may be repeated, include the memory ranges in the output to make it clear why, for example, "System RAM" appears twice. Before: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- System RAM 9460 96.5% N/A 998 3.5% After: Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 36741 96.5 841400000-8416599ff : Kernel data 89 0.2 840800000-8412a6fff : Kernel rodata 60 0.2 841ebe000-8423fffff : Kernel bss 34 0.1 0-fff : Reserved 1345 3.5 100000-89dd9fff : System RAM 2 0.0 Before: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ----------- ----------- System RAM 9460 90.5% N/A 998 9.5% After: Event: mem_inst_retired.any:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-105f7fffff : System RAM 9460 90.5 841400000-8416599ff : Kernel data 45 0.4 840800000-8412a6fff : Kernel rodata 19 0.2 841ebe000-8423fffff : Kernel bss 12 0.1 0-fff : Reserved 998 9.5 The code has been updated to python 3 with type hints and resolving issues reported by mypy and pylint. Tabs are swapped to spaces as preferred in PEP8, because most lines of code were modified (of this small file) and this makes pylint significantly less noisy. 
Committer testing: root@number:/tmp# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:/tmp# root@number:/tmp# perf script mem-phys-addr -a find / /bin /lib /lib64 /sbin Warning: 744 out of order events recorded. Event: cpu_core/mem_inst_retired.all_loads/P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-8bfbfffff : System RAM 364561 76.5 621400000-6223a6fff : Kernel rodata 10474 2.2 622400000-62283d4bf : Kernel data 4828 1.0 623304000-6237fffff : Kernel bss 1063 0.2 620000000-6213fffff : Kernel code 98 0.0 0-fff : Reserved 111480 23.4 100000-2b0ca017 : System RAM 337 0.1 2fbad000-30d92fff : System RAM 44 0.0 2c79d000-2fbabfff : System RAM 30 0.0 30d94000-316d5fff : System RAM 16 0.0 2b131a58-2c71dfff : System RAM 7 0.0 root@number:/tmp# Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241119180130.19160-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Arnaldo Carvalho de Melo	b2b95a2d78	perf disasm: Return a proper error when not determining the file type Before: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Internal error: Invalid -1 error code ⬢ [acme@toolbox a]$ After: ⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)' Error: Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int): Couldn't determine the file /tmp/perf-3308868.map type. ⬢ [acme@toolbox a]$ Reported-by: Francesco Nigro <fnigro@redhat.com> Reported-by: Ilan Green <igreen@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Link: https://lore.kernel.org/lkml/Z092D9-r_iOgwIWM@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Arnaldo Carvalho de Melo	176c9d1e6a	tools features: Don't check for libunwind devel files by default Since `13e17c9ff4` ("perf build: Make libunwind opt-in rather than opt-out") libunwind is opt-in, so we shouldn't by default be testing for its availability at build time in tools/build/features/test-all.c. That test was designed to test the features we expect to be the most common ones in most builds, so if test building just that file succeeds, we assume the features there are present and do not test them one by one. Removing it from test-all.c gets rid of the first impediment for test-all.c to build successfully: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output In file included from test-all.c:62: test-libunwind.c:2:10: fatal error: libunwind.h: No such file or directory 2 \| #include <libunwind.h> \| ^~~~~~~~~~~~~ compilation terminated. $ We then get to: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output /usr/bin/ld: cannot find -lunwind-x86_64: No such file or directory /usr/bin/ld: cannot find -lunwind: No such file or directory collect2: error: ld returned 1 exit status $ So make all the logic related to setting CFLAGS, LDFLAGS, etc for libunwind conditional on NO_LIBUNWIND=1, which is now the default. Now we get a faster build: $ cat /tmp/build/perf-tools-next/feature/test-all.make.output $ ldd /tmp/build/perf-tools-next/feature/test-all.bin linux-vdso.so.1 (0x00007fef04cde000) libdw.so.1 => /lib64/libdw.so.1 (0x00007fef04a49000) libpython3.12.so.1.0 => /lib64/libpython3.12.so.1.0 (0x00007fef04478000) libm.so.6 => /lib64/libm.so.6 (0x00007fef04394000) libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007fef0436c000) libtracefs.so.1 => /lib64/libtracefs.so.1 (0x00007fef04345000) libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007fef03e95000) libz.so.1 => /lib64/libz.so.1 (0x00007fef03e72000) libelf.so.1 => /lib64/libelf.so.1 (0x00007fef03e56000) libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fef03e48000) libslang.so.2 => /lib64/libslang.so.2 
(0x00007fef03b65000) libperl.so.5.38 => /lib64/libperl.so.5.38 (0x00007fef037c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fef035d5000) liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fef035a0000) libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fef034e1000) libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fef034cd000) /lib64/ld-linux-x86-64.so.2 (0x00007fef04ce0000) libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007fef03495000) $ Fixes: `13e17c9ff4` ("perf build: Make libunwind opt-in rather than opt-out") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Z09zTztD8X8qIWCX@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-12-09 17:51:53 -03:00
Namhyung Kim	c33aea446b	perf tools: Fix precise_ip fallback logic Sometimes an error other than EOPNOTSUPP is returned for an invalid precise_ip, so the fallback cannot rely on the error code. Let's move the fallback after the missing feature checks so that it can handle EINVAL as well. This also aligns well with the existing behavior, which blindly turns off precise_ip, but the missing features are now checked correctly. Fixes: `af954f76ee` ("perf tools: Check fallback error and order") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Closes: https://lore.kernel.org/oe-lkp/202411301431.799e5531-lkp@intel.com Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/Z1DV0lN8qHSysX7f@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-05 15:15:29 -08:00
Namhyung Kim	968121f0a6	perf tools: Fix build error on generated/fs_at_flags_array.c It should only have generic flags in the array but the recent header sync brought new flags to fcntl.h and caused a build error. Let's update the shell script to exclude flags specific to name_to_handle_at(). CC trace/beauty/fs_at_flags.o In file included from trace/beauty/fs_at_flags.c:21: tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: error: initialized field overwritten [-Werror=override-init] 13 \| [ilog2(0x002) + 1] = "HANDLE_CONNECTABLE", \| ^~~~~~~~~~~~~~~~~~~~ tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: note: (near initialization for ‘fs_at_flags[2]’) Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241203035349.1901262-12-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	c994ac74cc	tools headers: Sync uapi/linux/prctl.h with the kernel sources To pick up the changes in this cset: `09d6775f50` riscv: Add support for userspace pointer masking `91e102e797` prctl: arch-agnostic prctl for shadow stack This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Mark Brown <broonie@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241203035349.1901262-11-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	02116fcfd8	tools headers: Sync uapi/linux/mount.h with the kernel sources To pick up the changes in this cset: `aefff51e1c` statmount: retrieve security mount options `2f4d4503e9` statmount: add flag to retrieve unescaped options `44010543fc` fs: add the ability for statmount() to report the sb_source `ed9d95f691` fs: add the ability for statmount() to report the fs_subtype This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	6d442c69cb	tools headers: Sync uapi/linux/fcntl.h with the kernel sources To pick up the changes in this cset: `c374196b2b` ("fs: name_to_handle_at() support for "explicit connectable" file handles") `95f567f81e` ("fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Namhyung Kim	81b483f722	tools headers: Sync xattrat syscall changes with the kernel sources To pick up the changes in this cset: `6140be90ec` ("fs/xattr: add at family syscalls") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_32.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl The arm64 changes are not included as they require more changes in the tools. They will be worked on in a later cycle. Please see tools/include/uapi/README for further details. Reviewed-by: James Clark <james.clark@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-mips@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org Link: https://lore.kernel.org/r/20241203035349.1901262-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-04 14:34:50 -08:00
Arnaldo Carvalho de Melo	88a6e2f67c	perf machine: Initialize machine->env to address a segfault It's used from trace__run(), for the 'perf trace' live mode, i.e. its strace-like, non-perf.data file processing mode, the most common one. The trace__run() function will set trace->host using machine__new_host() that is supposed to give a machine instance representing the running machine, and since we'll use perf_env__arch_strerrno() to get the right errno -> string table, we need to use machine->env, so initialize it in machine__new_host(). Before the patch: (gdb) run trace --errno-summary -a sleep 1 <SNIP> Summary of events: gvfs-afc-volume (3187), 2 events, 0.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ pselect6 1 0 0.000 0.000 0.000 0.000 0.00% GUsbEventThread (3519), 2 events, 0.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 1 0 0.000 0.000 0.000 0.000 0.00% <SNIP> Program received signal SIGSEGV, Segmentation fault.
0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 478 if (env->arch_strerrno == NULL) (gdb) bt #0 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 #1 0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673 #2 0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708 #3 0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747 #4 0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456 #5 0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487 #6 0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351 #7 0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404 #8 0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448 #9 0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560 (gdb) After: root@number:~# perf trace -a --errno-summary sleep 1 <SNIP> pw-data-loop (2685), 1410 events, 16.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 188 0 983.428 0.000 5.231 15.595 8.68% ioctl 94 0 0.811 0.004 0.009 0.016 2.82% read 188 0 0.322 0.001 0.002 0.006 5.15% write 141 0 0.280 0.001 0.002 0.018 8.39% timerfd_settime 94 0 0.138 0.001 0.001 0.007 6.47% gnome-control-c (179406), 1848 events, 20.9% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 222 0 959.577 0.000 4.322 21.414 11.40% recvmsg 150 0 0.539 0.001 0.004 0.013 5.12% 
write 300 0 0.442 0.001 0.001 0.007 3.29% read 150 0 0.183 0.001 0.001 0.009 5.53% getpid 102 0 0.101 0.000 0.001 0.008 7.82% root@number:~# Fixes: `54373b5d53` ("perf env: Introduce perf_env__arch_strerrno()") Reported-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-03 10:07:31 -08:00
James Clark	f54cd8f43f	perf test: Don't signal all processes on system when interrupting tests This signal handler loops over all tests on ctrl-C, but it's active while the test list is being constructed. process.pid is 0, then -1, then finally set to the child pid on fork. If the Ctrl-C is received during this point a kill(-1, SIGINT) can be sent which affects all processes. Make sure the child has forked first before forwarding the signal. This can be reproduced with ctrl-C immediately after launching perf test which terminates the ssh connection. Fixes: `553d5efeb3` ("perf test: Add a signal handler to kill forked child processes") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241129151948.3199732-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-02 12:36:35 -08:00
Namhyung Kim	23c44f6c83	perf tools: Fix build-id event recording The build-id events written at the end of the record session are broken due to unexpected data. The write_buildid() writes the fixed length event first and then variable length filename. But a recent change made it write more data in the padding area accidentally. So readers of the event see zero-filled data for the next entry and treat it incorrectly. This resulted in wrong kernel symbols because the kernel DSO loaded a random vmlinux image in the path as it didn't have a valid build-id. Fixes: `ae39ba1655` ("perf inject: Fix build ID injection") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/Z0aRFFW9xMh3mqKB@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-12-02 12:36:20 -08:00
Linus Torvalds	b50ecc5aca	perf tools changes for v6.13 perf record ----------- * Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report ----------- * Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - \| ---xas_load xarray.h:171 \| \|--5.68%--xas_load xarray.c:245 (cycles:1) \| xas_load xarray.c:242 \| xas_load xarray.h:1260 (cycles:1) \| xas_descend xarray.c:146 \| xas_load xarray.c:244 (cycles:2) \| xas_load xarray.c:245 \| xas_descend xarray.c:218 (cycles:10) ... perf stat --------- * Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. $ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys * Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". 
# perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched ---------- * Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. 
$ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build ----- * Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw, which has unwinder functionality. The libunwind support predates it and there is no need to have duplicate unwinders by default. * Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals --------- * Do not set exclude_guest bit in the perf_event_attr by default. This was causing trouble in the AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not to clear supported bits unnecessarily. * Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events ------------------ * Add AMD Zen 5 events and metrics. * Add i.MX91 and i.MX95 DDR metrics * Fix HiSilicon HIP08 Topdown metric name. * Support compat events on PowerPC.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ0Qi3gAKCRCMstVUGiXM g6NIAP49eoSmQF40u55sJN0J7RpYd+bTgXZkahv0IUCBX98TLwEA2NrK0oUcB84C xeanq28/3JxNM/oBpsEvvB8mb/0lGwI= =FAVF -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Enable leader sampling for inherited task events. It was supported only for system-wide events but the kernel started to support such a setup since v6.12. This is to reduce the number of PMU interrupts. The samples of the leader event will contain counts of other events and no samples will be generated for the other member events. $ perf record -e '{cycles,instructions}:S' ${MYPROG} perf report: - Fix --branch-history option to display more branch-related information like prediction, abort and cycles which is available on Intel machines. $ perf record -bg -- perf test -w brstack $ perf report --branch-history ... # # Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage] # ........ ........................ .............. .................... ......... ..... ...... .................... # 8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - - \| ---xas_load xarray.h:171 \| \|--5.68%--xas_load xarray.c:245 (cycles:1) \| xas_load xarray.c:242 \| xas_load xarray.h:1260 (cycles:1) \| xas_descend xarray.c:146 \| xas_load xarray.c:244 (cycles:2) \| xas_load xarray.c:245 \| xas_descend xarray.c:218 (cycles:10) ... perf stat: - Add HWMON PMU support. The HWMON provides various system information like CPU/GPU temperature, fan speed and so on. Expose them as PMU events so that users can see the values using perf stat commands. 
$ perf stat -e temp_cpu,fan1 true Performance counter stats for 'true': 60.00 'C temp_cpu 0 rpm fan1 0.000745382 seconds time elapsed 0.000883000 seconds user 0.000000000 seconds sys - Display metric threshold in JSON output. Some metrics define thresholds to classify value ranges. It used to be in a different color but it won't work for JSON. Add "metric-threshold" field to the JSON that can be one of "good", "less good", "nearly bad" and "bad". # perf stat -a -M TopdownL1 -j true {"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"} {"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, } {"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, } ... perf sched: - Add -P/--pre-migrations option for 'timehist' sub-command to track time a task waited on a run-queue before migrating to a different CPU. 
$ perf sched timehist -P time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000 585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000 585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000 585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000 585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000 585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000 585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000 585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000 585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001 585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000 ... Build: - Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf tools are generally built with libelf and libdw, which has unwinder functionality. The libunwind support predates it and there is no need to have duplicate unwinders by default. - Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's using libdw for handling DWARF information. Internals: - Do not set exclude_guest bit in the perf_event_attr by default. This was causing trouble in the AMD IBS PMU as it doesn't support the bit. The bit will be set when it's needed later by the fallback logic. Also update the missing feature detection logic to make sure not to clear supported bits unnecessarily. - Run perf test in parallel by default and mark flaky tests "exclusive" to run them serially at the end. Some test numbers are changed but the test can complete in less than half the time. JSON vendor events: - Add AMD Zen 5 events and metrics. - Add i.MX91 and i.MX95 DDR metrics - Fix HiSilicon HIP08 Topdown metric name.
- Support compat events on PowerPC" * tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits) perf tests: Fix hwmon parsing with PMU name test perf hwmon_pmu: Ensure hwmon key union is zeroed before use perf tests hwmon_pmu: Remove double evlist__delete() perf/test: fix perf ftrace test on s390 perf bpf-filter: Return -ENOMEM directly when pfi allocation fails perf test: Correct hwmon test PMU detection perf: Remove unused del_perf_probe_events() perf pmu: Move pmu_metrics_table__find and remove ARM override perf jevents: Add map_for_cpu() perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str perf header: Avoid transitive PMU includes perf arm64 header: Use cpu argument in get_cpuid perf header: Refactor get_cpuid to take a CPU for ARM perf header: Move is_cpu_online to numa bench perf jevents: fix breakage when do perf stat on system metric perf test: Add missing __exit calls in tool/hwmon tests perf tests: Make leader sampling test work without branch event perf util: Remove kernel version deadcode perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved perf test shell trace_exit_race: Show what went wrong in verbose mode ...	2024-11-26 14:54:00 -08:00
Ian Rogers	6d78089da9	perf tests: Fix hwmon parsing with PMU name test Incorrectly the hwmon with PMU name test didn't pass "true". Fix and address issue with hwmon_pmu__config_terms needing to load events - a load bearing assert fired. Also fix missing list deletion when putting the hwmon test PMU and lower some debug warnings to make the hwmon PMU less spammy in verbose mode. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241121000955.536930-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:39 -08:00
Ian Rogers	62878b400f	perf hwmon_pmu: Ensure hwmon key union is zeroed before use Non-zero values led to mismatches in testing. This was reproducible with -fsanitize=undefined. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Closes: https://lore.kernel.org/lkml/Zzdtj0PEWEX3ATwL@x1/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241119230033.115369-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:39 -08:00
Arnaldo Carvalho de Melo	870748fa1f	perf tests hwmon_pmu: Remove double evlist__delete() In the error path when failing to parse events the evlist is being deleted twice, keep the one after the out label. Fixes: `531ee0fd48` ("perf test: Add hwmon "PMU" test") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ZzzoJNNcJJVnPCCe@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:38:27 -08:00
Thomas Richter	5f2c8f4e10	perf/test: fix perf ftrace test on s390 On s390 the perf test case ftrace sometimes fails as follows: # ./perf test ftrace 79: perf ftrace tests : FAILED! # The failure depends on the kernel .config file. Some configurations always work fine, some do not. The ftrace profile test fails mostly because the ring buffer was not large enough, and some lines (especially the interesting ones with nanosleep in them) were dropped. To achieve success for all tested kernel configurations, enlarge the buffer to store the traces completely without wrapping. The default buffer size is too small for all kernel configurations. Set the buffer size for the ftrace profile test to 16 MB. Output after: # ./perf test ftrace 79: perf ftrace tests : Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20241119064856.641446-1-tmricht@linux.ibm.com Suggested-by: Sven Schnelle <svens@linux.ibm.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:36:44 -08:00
Hao Ge	bd077a53ad	perf bpf-filter: Return -ENOMEM directly when pfi allocation fails Directly return -ENOMEM when pfi allocation fails, instead of performing other operations on pfi. Fixes: `0fe2b18ddc` ("perf bpf-filter: Support multiple events properly") Signed-off-by: Hao Ge <gehao@kylinos.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: hao.ge@linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241113030537.26732-1-hao.ge@linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:36:00 -08:00
Ian Rogers	fc26637d70	perf test: Correct hwmon test PMU detection Use name to avoid potential other hwmon PMUs. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241118052638.754981-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-22 13:34:31 -08:00
Dr. David Alan Gilbert	85c60a01b8	perf: Remove unused del_perf_probe_events() del_perf_probe_events() last use was removed by commit `3d6dfae889` ("perf parse-events: Remove BPF event support") Remove it. It was the last user of probe_file__del_events(), so remove it as well. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241022002940.302946-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 17:07:31 -03:00
Ian Rogers	8f997865ee	perf pmu: Move pmu_metrics_table__find and remove ARM override Move pmu_metrics_table__find() to the jevents.py generated pmu-events.c and remove indirection override for ARM. The movement removes perf_pmu__find_metrics_table that exists to enable the ARM override. The ARM override isn't necessary as just the CPUID, not PMU, is used in the metric table lookup. On non-ARM the CPU argument is just ignored for the CPUID, for ARM -1 is passed so that the CPUID for the first logical CPU is read. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:42:36 -03:00
Ian Rogers	0434410fa4	perf jevents: Add map_for_cpu() The PMU is no longer part of the map finding process and for metrics doesn't make sense as they lack a PMU. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:41:42 -03:00
Ian Rogers	494c403ff1	perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str On ARM the cpuid is dependent on the core type of the CPU in question. The PMU was passed for the sake of the CPU map but this means in places a temporary PMU is created just to pass a CPU value. Just pass the CPU and fix up the callers. As there are no longer PMU users in header.h, shuffle forward declarations earlier to work around build failures. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:40:30 -03:00
Ian Rogers	7463ee17a7	perf header: Avoid transitive PMU includes Currently satisfied via header.h. Note, pmu.h includes parse-events.h. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:39:59 -03:00
Ian Rogers	538737da96	perf arm64 header: Use cpu argument in get_cpuid Use the cpu to read the MIDR file requested. If the "any" value (-1) is passed, then keep the behavior of returning the first MIDR file that can be read. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:39:04 -03:00
Ian Rogers	cec0d6572a	perf header: Refactor get_cpuid to take a CPU for ARM ARM BIG.little has no notion of a constant CPUID for both core types. To reflect this reality, change the get_cpuid function to also pass in a possibly unused logical cpu. If the dummy value (-1) is passed in then ARM can, as currently happens, select the first logical CPU's "CPUID". The changes to ARM getcpuid happen in a follow up change. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:37:54 -03:00
Ian Rogers	c6fafe36ba	perf header: Move is_cpu_online to numa bench The helper function is only used in the NUMA benchmark as typically online CPUs are determined through perf_cpu_map__new_online_cpus(). Reduce the scope of the function for now. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:36:47 -03:00
Xu Yang	4a159e6049	perf jevents: fix breakage when do perf stat on system metric When running perf stat on a sys metric, the perf tool currently outputs nothing: $ perf stat -a -M imx95_ddr_read.all -I 1000 $ This command runs on an arm64 machine whose SoC has one DDR hw pmu in addition to one armv8_cortex_a55 pmu. Their maps are as follows: const struct pmu_events_map pmu_events_map[] = { { .arch = "arm64", .cpuid = "0x00000000410fd050", .event_table = { .pmus = pmu_events__arm_cortex_a55, .num_pmus = ARRAY_SIZE(pmu_events__arm_cortex_a55) }, .metric_table = { .pmus = NULL, .num_pmus = 0 } }, static const struct pmu_sys_events pmu_sys_event_tables[] = { { .event_table = { .pmus = pmu_events__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_events__freescale_imx95_sys) }, .metric_table = { .pmus = pmu_metrics__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_metrics__freescale_imx95_sys) }, .name = "pmu_events__freescale_imx95_sys", }, Currently, pmu_metrics_table__find() returns NULL when perf stat is run only on a sys metric. Then parse_groups() is never called to parse the sys metric_name, and the perf tool exits directly. This should be a common problem. To fix the issue, restore the logic from before commit `f20c15d13f` ("perf pmu-events: Remember the perf_events_map for a PMU") and return an empty metric table rather than a NULL pointer. This should be fine since the removed part only checks whether the table matches the provided metric_name. Without that code, parse_groups() still checks the validity of metric_name.
Fixes: `f20c15d13f` ("perf pmu-events: Remember the perf_events_map for a PMU") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241107162035.52206-2-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:34:15 -03:00
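The behavior described in the commit above (return the CPU's metric table even when it is empty, so the caller can go on to look up sys metrics) can be illustrated with a minimal sketch. The `sketch_*` types and function below are hypothetical stand-ins for illustration only and are not the perf structures or the actual patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the perf table types. */
struct sketch_table {
	const void *pmus;
	size_t num_pmus;
};

struct sketch_map {
	const char *cpuid;
	struct sketch_table metric_table;
};

/*
 * Return the metric table for a CPU map. An empty table (num_pmus == 0)
 * is still returned as a valid pointer; NULL is reserved for "no map".
 */
static const struct sketch_table *
sketch_metrics_table_find(const struct sketch_map *map)
{
	if (!map)
		return NULL;
	return &map->metric_table;
}
```

With this shape, a caller that only bails out on NULL keeps going for a CPU whose map has no metrics of its own, which is the case the commit message describes for the Cortex-A55 map.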
Ian Rogers	db26a8c9e3	perf test: Add missing __exit calls in tool/hwmon tests Address sanitizer flagged the missing parse_events_error__exit when testing on ARM. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241115201258.509477-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:40 -03:00
James Clark	180fd0c1ea	perf tests: Make leader sampling test work without branch event Arm a57 only has speculative branch events so this test fails there. The test doesn't depend on branch instructions so change it to instructions which is pretty much guaranteed to be everywhere. The test_branch_counter() test above already tests for the existence of the branches event and skips if it's not present. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241115161600.228994-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:39 -03:00
Dr. David Alan Gilbert	264708b8ac	perf util: Remove kernel version deadcode fetch_kernel_version() has been unused since Ian's 2023 commit `3d6dfae889` ("perf parse-events: Remove BPF event support"). Remove it, and its helpers. I noticed there are a bunch of kernel-version macros that are also unused nearby. Also remove them. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241116155850.113129-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:39 -03:00
Arnaldo Carvalho de Melo	0b687912c9	perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved The purpose of this test is to test for races in the exit of 'perf trace' missing the last events, it was failing when the COMM wasn't resolved either because we missed some PERF_RECORD_COMM or somehow raced on getting it from procfs. Add --no-comm to the 'perf trace' command line so that we get a consistent, pid only output, which allows the test to achieve its goal. This is the output from 'perf trace --no-comm -e syscalls:sys_enter_exit_group': 0.000 21953 syscalls:sys_enter_exit_group() 0.000 21955 syscalls:sys_enter_exit_group() 0.000 21957 syscalls:sys_enter_exit_group() 0.000 21959 syscalls:sys_enter_exit_group() 0.000 21961 syscalls:sys_enter_exit_group() 0.000 21963 syscalls:sys_enter_exit_group() 0.000 21965 syscalls:sys_enter_exit_group() 0.000 21967 syscalls:sys_enter_exit_group() 0.000 21969 syscalls:sys_enter_exit_group() 0.000 21971 syscalls:sys_enter_exit_group() Now it passes: root@number:~# perf test "trace exit race" 110: perf trace exit race : Ok root@number:~# root@number:~# perf test -v "trace exit race" 110: perf trace exit race : Ok root@number:~# If we artificially make it run just 9 times instead of the 10 it runs, i.e. by manually doing: trace_shutdown_race() { for _ in $(seq 9); do that 9 is $iter, 10 in the patch, we get: root@number:~# vim ~acme/libexec/perf-core/tests/shell/trace_exit_race.sh root@number:~# perf test -v "trace exit race" --- start --- test child forked, pid 24629 Missing output, expected 10 but only got 9 ---- end(-1) ---- 110: perf trace exit race : FAILED! root@number:~# I.e. 9 'perf trace' calls produced the expected output, the inverse grep didn't show anything, so the patch provided by Howard for the previous patch kicks in and shows a more informative message. 
Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Peterson <benjamin@engflow.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-16 16:30:32 -03:00
Arnaldo Carvalho de Melo	7ca41faa5f	perf test shell trace_exit_race: Show what went wrong in verbose mode If it fails we need to check what was the reason, what were the lines that didn't match the expected format, so: root@number:~# perf test -v "trace exit race" --- start --- test child forked, pid 2028724 Lines not matching the expected regexp: ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group$': 0.000 :2028750/2028750 syscalls:sys_enter_exit_group() ---- end(-1) ---- 110: perf trace exit race : FAILED! root@number:~# In this case we're not resolving the process COMM for some reason and fall back to printing just the pid/tid; this will be fixed in a followup patch. Howard Chu spotted a problem with single quotes surrounding a regexp that made the test always fail, but since there were some failures when I tested (COMM not being resolved in some of the results) the final inverse grep would show some lines and thus I didn't notice the single quote problem. He also provided a patch to test if fewer than the number of expected matches took place but all of them with the expected output, in which case the inverse grep wouldn't show anything, confusing the tester. Reviewed-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Peterson <benjamin@engflow.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-15 12:31:01 -03:00
Benjamin Peterson	f72bcb92e9	perf tests: Add test for trace output loss Add a test that checks that trace output is not lost to races. This is accomplished by tracing the exit_group syscall of "true" multiple times and checking for correct output. Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-3-benjamin@engflow.com [ Addressed two ShellCheck warnings ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:10:40 -03:00
Benjamin Peterson	1302e352b2	perf trace: Avoid garbage when not printing a syscall's arguments syscall__scnprintf_args may not place anything in the output buffer (e.g., because the arguments are all zero). If that happened in trace__fprintf_sys_enter, its fprintf would receive an uninitialized buffer, leading to garbage output. Fix the problem by passing the (possibly zero) bounds of the argument buffer to the output fprintf. Fixes: `a98392bb1e` ("perf trace: Use beautifiers on syscalls:sys_enter_ handlers") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-2-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:06:52 -03:00
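One common way to pass a (possibly zero) length bound to fprintf is the `%.*s` conversion, which writes at most the given number of bytes. The sketch below illustrates that technique only; `sketch_print_args` and the `name(...)` output format are hypothetical, and whether the actual patch uses exactly this form is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: print only the first `printed` bytes of `buf`.
 * With printed == 0 no bytes of `buf` are written, so a buffer that was
 * never filled in cannot leak garbage into the output.
 */
static int sketch_print_args(FILE *fp, const char *buf, size_t printed)
{
	return fprintf(fp, "name(%.*s)\n", (int)printed, buf);
}
```

Because the precision bounds the output, the buffer does not need to be NUL-terminated or even initialized when the length is zero.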
Benjamin Peterson	3fd7c36973	perf trace: Do not lose last events in a race If a perf trace event selector specifies a maximum number of events to output (i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler, disables the event selector after the maximum number events are printed. Furthermore, trace__event_handler checked if the event selector was disabled before doing any work. This avoided exceeding the maximum number of events to print if more events were in the buffer before the selector was disabled. However, the event selector can be disabled for reasons other than exceeding the maximum number of events. In particular, when the traced subprocess exits, the main loop disables all event selectors. This meant the last events of a traced subprocess might be lost to the printing handler's short-circuiting logic. This nondeterministic problem could be seen by running the following many times: $ perf trace -e syscalls:sys_enter_exit_group true trace__event_handler should simply check for exceeding the maximum number of events to print rather than the state of the event selector. Fixes: `a9c5e6c1e9` ("perf trace: Introduce per-event maximum number of events property") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-1-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 18:05:48 -03:00
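The change in gating logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: `struct sketch_evsel` and `sketch_event_handler` are hypothetical names, not perf's `struct evsel` or `trace__event_handler`, and the actual printing is elided.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an event selector; not perf's struct evsel. */
struct sketch_evsel {
	unsigned long max_events;	/* limit taken from "/nr=N/" */
	unsigned long nr_events_printed;
	bool disabled;			/* may also be set by the main loop on exit */
};

/* Returns true if the event was printed. */
static bool sketch_event_handler(struct sketch_evsel *evsel)
{
	/* Gate on the per-event count, not on evsel->disabled. */
	if (evsel->nr_events_printed >= evsel->max_events)
		return false;

	/* ... the event would be printed here ... */

	/* Once the limit is reached, stop further delivery. */
	if (++evsel->nr_events_printed == evsel->max_events)
		evsel->disabled = true;
	return true;
}
```

In this sketch, events still sitting in the buffer are printed even if the selector was disabled because the traced process exited, while the `/nr=N/` limit is still honored.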
Masami Hiramatsu (Google)	080e47b2a2	perf probe: Introduce quotation marks support In non-C languages, it is possible to have ':' in the function names. It is possible to escape it with backslashes, but if there are too many backslashes, it is annoying. This introduces quotation marks (`"` or `'`) support. For example, without quotes, we have to pass it as below: $ perf probe -x cro3 -L "cro3\:\:cmd\:\:servo\:\:run_show" <run_show@/work/cro3/src/cmd/servo.rs:0> 0 fn run_show(args: &ArgsShow) -> Result<()> { 1 let list = ServoList::discover()?; 2 let s = list.find_by_serial(&args.servo)?; 3 if args.json { 4 println!("{s}"); With quotes, we can more naturally write the function name as below: $ perf probe -x cro3 -L \"cro3::cmd::servo::run_show\" <run_show@/work/cro3/src/cmd/servo.rs:0> 0 fn run_show(args: &ArgsShow) -> Result<()> { 1 let list = ServoList::discover()?; 2 let s = list.find_by_serial(&args.servo)?; 3 if args.json { 4 println!("{s}"); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099116941.2431889.11609129616090100386.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	313026f3ce	perf string: Add strpbrk_esq() and strdup_esq() for escape and quote strpbrk_esq() and strdup_esq() are new variants of strpbrk() and strdup() which handle escaped characters and quoted strings. - strpbrk_esq() searches for the specified set of characters but ignores escaped characters and quoted strings. e.g. strpbrk_esq("'quote\d' \queue quiz", "qd") returns "quiz". - strdup_esq() duplicates the string but removes the backslashes and quotes that are used for quoting. It keeps the string (including backslashes) in the quoted part as is. e.g. strdup_esq("'quote\d' \queue quiz") returns "quote\d queue quiz". Quotes (single or double) inside the quoted part should be escaped with a backslash. In this case, strdup_esq() removes that backslash. Quotes must be paired with the same kind: if you open the quoted part with a double quote, you need to close it with a double quote. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099116045.2431889.15772916605719019533.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	b9e577225c	perf probe: Accept FUNC@* to specify function name explicitly In Golang, function names contain '.', and 'perf probe' misinterprets such a name as a file name. To mitigate this situation, introduce `function@*` so that the user can explicitly specify that it is a function name. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099115149.2431889.13682110856853358354.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	47fa0f99a9	perf probe: Fix to ignore escaped characters in --lines option Use strpbrk_esc() and strdup_esc() to ignore escaped characters in the --lines option. This has already been done for the other options, but not for --lines. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099114272.2431889.4820591557298941207.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
Masami Hiramatsu (Google)	e7c70ee7c9	perf probe: Fix error message for failing to find line range With the --lines option, if perf-probe fails to find the specified line, it warns "Debuginfo analysis failed." but this misleads the user into thinking the debuginfo is broken. Change this message to "Specified source line(LINESPEC) is not found." so that the user can understand the error correctly. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/173099113381.2431889.16263147678401426107.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:56:32 -03:00
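The search rule described for strpbrk_esq() can be illustrated with a small sketch. `sketch_strpbrk_esq` below is a hypothetical reimplementation written only from the description in the commit message; it is not the code in perf's string utilities, and its handling of edge cases such as an unterminated quote is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return a pointer to the first character of `str`
 * that is in `accept`, skipping backslash-escaped characters and anything
 * inside a matching pair of ' or " quotes. Returns NULL if none is found.
 */
static const char *sketch_strpbrk_esq(const char *str, const char *accept)
{
	while (*str) {
		if (*str == '\\') {
			/* Skip the backslash and the character it escapes. */
			if (str[1] == '\0')
				break;
			str += 2;
			continue;
		}
		if (*str == '\'' || *str == '"') {
			char quote = *str++;

			/* Skip to the matching closing quote of the same kind. */
			while (*str && *str != quote) {
				if (*str == '\\' && str[1] != '\0')
					str++;
				str++;
			}
			if (*str == '\0')
				break;	/* unterminated quote: treated as no match */
			str++;		/* step over the closing quote */
			continue;
		}
		if (strchr(accept, *str))
			return str;
		str++;
	}
	return NULL;
}
```

For the input from the commit message, `'quote\d' \queue quiz` with the set `qd`, the quoted part and the escaped `q` are skipped, so the first match is the `q` of `quiz`.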
Howard Chu	fe4f9b4124	perf trace: Fix tracing itself, creating feedback loops There exists a pids_filtered map in augmented_raw_syscalls.bpf.c that ceases to provide functionality after the BPF skeleton migration done in: `5e6da6be30` ("perf trace: Migrate BPF augmentation to use a skeleton") Before the migration, pid_filtered map works, courtesy of Arnaldo Carvalho de Melo <acme@kernel.org>: ⬢ [acme@toolbox perf-tools]$ git log --oneline -5 `6f769c3458` (HEAD) perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename `7777ac3dfe` perf test trace+probe_vfs_getname.sh: Remove stray \ before / `33d9c50621` perf script python: Add stub for PMU symbol to the python binding `e59fea47f8` perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols `878460e8d0` perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0 root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30 & [1] 180632 root@x1:/home/acme/git/perf-tools# 0.000 ( 0.051 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 0.115 ( 0.010 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 0.916 ( 0.068 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 246) = 246 1.699 ( 0.047 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 2.167 ( 0.041 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 2.739 ( 0.042 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.138 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.477 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.738 ( 0.023 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 3.946 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121 4.195 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, 
count: 121) = 121 4.212 ( 0.026 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 4.285 ( 0.006 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8 4.445 ( 0.018 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 260) = 260 4.508 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 124) = 124 4.592 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 116) = 116 4.666 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 130) = 130 4.715 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 95) = 95 4.765 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 102) = 102 4.815 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79 4.890 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 57) = 57 4.937 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89 5.009 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112 5.059 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112 5.116 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79 5.152 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 33) = 33 5.215 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 37) = 37 5.293 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 128) = 128 5.339 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89 5.384 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 100) = 100 [1]+ Done perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30 root@x1:/home/acme/git/perf-tools# No events for the 'perf trace' (pid 180632), i.e. no feedback loop. 
If we leave it running: root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e landlock_add_rule & [1] 181068 root@x1:/home/acme/git/perf-tools# And then look at what maps it sets up: root@x1:/home/acme/git/perf-tools# bpftool map | grep pids_filtered -A3 1190: hash name pids_filtered flags 0x0 key 4B value 1B max_entries 64 memlock 7264B btf_id 1613 pids perf(181068) root@x1:/home/acme/git/perf-tools# And ask for dumping its contents: We see that we are _also_ setting it to filter those: root@x1:/home/acme/git/perf-tools# bpftool map dump id 1190 [{ "key": 181068, "value": 1 },{ "key": 156801, "value": 1 } ] Now testing the migration commit: perf $ git log commit `5e6da6be30` (HEAD) Author: Ian Rogers <irogers@google.com> Date: Thu Aug 10 11:48:51 2023 -0700 perf trace: Migrate BPF augmentation to use a skeleton perf $ ./perf trace -e write --max-events=10 & echo #! [1] 1808653 perf $ 0.000 ( 0.010 ms): :1808671/1808671 write(fd: 1, buf: 0x6003f5b26fc0, count: 11) = 11 0.162 ( ): perf/1808653 write(fd: 2, buf: 0x7fffc2174e50, count: 11) ... 0.174 ( ): perf/1808653 write(fd: 2, buf: 0x74ce21804563, count: 1) ... 0.184 ( ): perf/1808653 write(fd: 2, buf: 0x57b936589052, count: 5) The feedback loop is there. Keep it running, look into the bpf map: perf $ bpftool map | grep pids_filtered 10675: hash name pids_filtered flags 0x0 perf $ bpftool map dump id 10675 [] The map is empty. 
Now, this commit: `64917f4df0` ("perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string") temporarily fixed the feedback loop for perf trace -e write. That is because, before the heuristic was used, write was hooked to sys_enter_openat: perf $ git log commit `83a0943b18` (HEAD) Author: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Thu Aug 17 12:11:51 2023 -0300 perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing Reusing "openat" BPF sys_enter augmenter for "write" And after the heuristic fix, it's unaugmented: perf $ git log commit `64917f4df0` (HEAD) Author: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Thu Aug 17 15:14:21 2023 -0300 perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing perf $ After using the heuristic, write is hooked to syscall_unaugmented, which returns 1. SEC("tp/raw_syscalls/sys_enter") int syscall_unaugmented(struct syscall_enter_args *args) { return 1; } If the BPF program returns 1, the tracepoint filter will filter it (since the tracepoint filter for perf is correctly set), but before the heuristic, when it was hooked to sys_enter_openat(), which is a BPF program that calls bpf_perf_event_output() and writes to the buffer, it didn't get filtered, thus creating a feedback loop. So switching write to unaugmented accidentally fixed the problem. But some syscalls are not so lucky, for example newfstatat: perf $ ./perf trace -e newfstatat --max-events=100 & echo #! [1] 2166948 457.718 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ... 457.749 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/2166950/ns/mnt", statbuf: 0x7fff0132aa80) ... 
457.962 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ... Currently, write is augmented by the new BTF general augmenter (which calls bpf_perf_event_output()). The problem, which luckily got fixed, resurfaced, and that’s how it was discovered. Fixes: `5e6da6be30` ("perf trace: Migrate BPF augmentation to use a skeleton") Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241030052431.2220130-1-howardchu95@gmail.com [ Check if trace->skel is non-NULL, as it is only initialized if trace->trace_syscalls is set ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-14 16:55:36 -03:00
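The feedback loop described in the commit message above can be illustrated with a toy user-space model. This is only a sketch, not the BPF code from augmented_raw_syscalls.bpf.c: `run_tracer` and its parameters are hypothetical names, and the only behaviour modelled is that reporting an event costs the tracer one `write` of its own, which is traced in turn unless the tracer's pid is in the filter set.

```python
from collections import deque


def run_tracer(initial_events, tracer_pid, pids_filtered, max_events=10):
    """Toy model: every reported event makes the tracer issue one write()."""
    queue = deque(initial_events)  # pending (pid, syscall) events
    reported = []
    while queue and len(reported) < max_events:
        pid, syscall = queue.popleft()
        if pid in pids_filtered:
            # This is the check that an empty pids_filtered map cannot provide.
            continue
        reported.append((pid, syscall))
        # Printing the event is itself a write() performed by the tracer.
        queue.append((tracer_pid, "write"))
    return reported


events = [(1127, "write"), (156867, "write")]

# Empty filter set: the tracer keeps reporting its own writes until max_events.
unfiltered = run_tracer(events, tracer_pid=180632, pids_filtered=set())

# Tracer pid present in the filter set: only the two foreign events are reported.
filtered = run_tracer(events, tracer_pid=180632, pids_filtered={180632})
```

With the empty set the run only stops because of `max_events`, which mirrors the `--max-events` cap used in the sessions quoted in the commit message.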
Luo Yifan	b81bb70337	perf timechart: Remove redundant variable assignment This patch makes a minor change that removes a redundant variable assignment. The assignment before the for loop is duplicated by the initialization within the loop header. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241111095209.276332-1-luoyifan@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Jean-Philippe Romain	d99b312572	perf list: Fix topic and pmu_name argument order Fix function definitions to match header file declaration. Fix two callers to pass the arguments in the right order. On Intel Tigerlake, before: ``` $ perf list -j|grep "\"Topic\""|sort|uniq "Topic": "cache", "Topic": "cpu", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j|grep "\"Unit\""|sort|uniq "Unit": "cache", "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` After: ``` $ perf list -j|grep "\"Topic\""|sort|uniq "Topic": "cache", "Topic": "floating point", "Topic": "frontend", "Topic": "memory", "Topic": "other", "Topic": "pfm icl", "Topic": "pfm ix86arch", "Topic": "pfm perf_raw", "Topic": "pipeline", "Topic": "tool", "Topic": "uncore interconnect", "Topic": "uncore memory", "Topic": "uncore other", "Topic": "virtual memory", $ perf list -j|grep "\"Unit\""|sort|uniq "Unit": "cpu", "Unit": "cstate_core", "Unit": "cstate_pkg", "Unit": "i915", "Unit": "icl", "Unit": "intel_bts", "Unit": "intel_pt", "Unit": "ix86arch", "Unit": "msr", "Unit": "perf_raw", "Unit": "power", "Unit": "tool", "Unit": "uncore_arb", "Unit": "uncore_clock", "Unit": "uncore_imc_free_running_0", "Unit": "uncore_imc_free_running_1", ``` Fixes: `e5c6109f48` ("perf list: Reorganize to use callbacks to allow honouring command line options") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jean-Philippe Romain 
<jean-philippe.romain@foss.st.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Junhao He <hejunhao3@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241109025801.560378-1-irogers@google.com [ I fixed the two callers and added it to Jean-Philippe's original change. ] Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Andrew Kreimer	463c203165	perf tools: Fix typos Muliplier -> Multiplier There are some typos in fprintf messages. Fix them via codespell. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108134728.25515-1-algonell@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
Arnaldo Carvalho de Melo	a6e8a58de6	perf disasm: Allow configuring what disassemblers to use For a long time the perf tools annotation code relied on parsing the output of binutils's objdump (or its reimplementations, like llvm's), which it then augmented with samples, navigation, etc. More recently, disassemblers based on the capstone and llvm libraries (as opposed to parsing the output of tools that use those libraries to mimic binutils's objdump output) were introduced. So when all those methods are available, there is a static preference for a series of attempts at disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded. This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which disassemblers should be attempted and in what order. As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, eliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods. 
root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. 
[annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8 /* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp } /* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~# After adding a way to select the disassembler from the command line, a 'perf test' comparing the output of the various disassemblers should be introduced, to test these codebases. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:27:35 -03:00
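The ordered-fallback behaviour that the 'annotate.disassemblers' entry controls can be sketched as follows. This is a simplified model under stated assumptions, not perf's C implementation: `parse_disassemblers`, `disassemble`, and the stand-in backends are hypothetical, and only the comma-separated value format and the default 'llvm,capstone,objdump' order are taken from the commit message.

```python
DEFAULT_ORDER = "llvm,capstone,objdump"  # hard-coded sequence named in the commit


def parse_disassemblers(value):
    """Split a comma-separated preference string into an ordered list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def disassemble(symbol, order, backends):
    """Try each configured backend in order and return the first success."""
    for name in order:
        backend = backends.get(name)
        if backend is None:  # backend not available in this build
            continue
        output = backend(symbol)
        if output is not None:
            return name, output
    raise RuntimeError(f"no disassembler could handle {symbol}")


# Stand-in backends (assumption: a build where llvm support is missing).
backends = {
    "capstone": lambda sym: f"capstone listing for {sym}",
    "objdump": lambda sym: f"objdump listing for {sym}",
}

used_default, _ = disassemble(
    "update_load_avg", parse_disassemblers(DEFAULT_ORDER), backends)
used_custom, _ = disassemble(
    "update_load_avg", parse_disassemblers("objdump,capstone"), backends)
```

In this model the default order falls through to capstone because llvm is absent, while the configured 'objdump,capstone' order picks objdump first, which is the kind of switch the sessions above perform without rebuilding the tool.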
Arnaldo Carvalho de Melo	1f7393adf6	perf disasm: Define stubs for the LLVM and capstone disassemblers This reduces the number of ifdefs in the main symbol__disassemble() method and paves the way for allowing the user to configure the disassemblers of preference. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-3-acme@kernel.org [ Applied fixes from Masami Hiramatsu and Aditya Bodkhe for when capstone devel files are not available ] Link: https://lore.kernel.org/r/B78FB6DF-24E9-4A3C-91C9-535765EC0E2A@ibm.com Link: https://lore.kernel.org/r/173145729034.2747044.453926054000880254.stgit@mhiramat.roam.corp.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-13 16:20:32 -03:00
Arnaldo Carvalho de Melo	4c1d8f0547	perf disasm: Introduce symbol__disassemble_objdump() Introduce it for the first disassemble method in perf, the parsing of objdump output, just like we have for llvm and capstone. This paves the way to allow the user to specify what disassemblers are preferred and also, at some point, to allow building without the objdump method. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-11-11 14:26:37 -03:00
Ian Rogers	ddbfb6f20c	perf build: Remove PERF_HAVE_DWARF_REGS PERF_HAVE_DWARF_REGS was true when an architecture had a dwarf-regs.c file. There are no more architecture dwarf-regs.c files; selection is done using constants from the ELF file rather than conditional compilation. Where PERF_HAVE_DWARF_REGS was the only variable in a Makefile, remove the Makefile. Add missing SPDX for RISC-V Makefile. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. 
David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	3ef6b89a12	perf dwarf-regs: Remove get_arch_regstr code get_arch_regstr no longer exists so remove declaration. Associated ifs and switches are made unconditional. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	a4747c0950	perf xtensa: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/xtensa/util clean up Build files. Tidy up the EM_NONE cases for xtensa in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	85567a2a8d	perf sparc: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/sparc/util clean up Build files. Tidy up the EM_NONE cases for sparc in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	04150f29e2	perf sh: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. As this is the only file in the arch/sh/util clean up Build files. Tidy up the EM_NONE cases for sh in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	b232b704a7	perf s390: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for s390 in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:14 -08:00
Ian Rogers	a90c451918	perf riscv: Remove dwarf-regs.c and add dwarf-regs-table.h The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case, and the register table is provided in a header file, the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for riscv in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. 
David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	285b523c2d	perf dwarf-regs: Move powerpc dwarf-regs out of arch Move arch/powerpc/util/dwarf-regs.c to util/dwarf-regs-powerpc.c and compile in unconditionally. get_arch_regstr is redundant when EM_NONE is treated as EM_HOST so remove and update dwarf-regs.c conditions. Make get_powerpc_regs unconditionally available when libdw is. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	8a768a2f65	perf mips: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for mips in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	1d37bd8366	perf loongarch: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for loongarch in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	d4a0c4f221	perf dwarf-regs: Move csky dwarf-regs out of arch Move arch/csky/util/dwarf-regs.c to util/dwarf-regs-csky.c and compile in unconditionally. To avoid get_arch_regstr being duplicated, rename to get_csky_regstr and add to get_dwarf_regstr switch. Update #ifdefs to allow ABI V1 and V2 tables at the same time. Determine the table from the ELF flags. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
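The csky change above picks the ABI V1 or V2 register table from the ELF flags at run time rather than with #ifdefs. A minimal Python sketch of that selection follows; it is not the perf code (which is C). The EF_CSKY_* values are assumptions recalled from binutils and should be verified, the register names are placeholders, and falling back to the V1 table for unrecognized flags is also an assumption.

```python
from typing import Optional

# Assumed values (recalled from binutils' elf/csky.h); verify before relying on them.
EF_CSKY_ABIMASK = 0xF0000000
EF_CSKY_ABIV1 = 0x10000000
EF_CSKY_ABIV2 = 0x20000000

# Placeholder tables: the real names live in util/dwarf-regs-csky.c.
CSKY_V1_REGS = ("v1_reg0", "v1_reg1", "v1_reg2")
CSKY_V2_REGS = ("v2_reg0", "v2_reg1", "v2_reg2")


def get_csky_regstr(n: int, flags: int) -> Optional[str]:
    """Return the register name for DWARF register n, choosing the table by ABI."""
    abi = flags & EF_CSKY_ABIMASK
    # Assumption: anything that is not explicitly ABI V2 uses the V1 table.
    table = CSKY_V2_REGS if abi == EF_CSKY_ABIV2 else CSKY_V1_REGS
    return table[n] if 0 <= n < len(table) else None
```

Because both tables are always present, the same binary can resolve register names for either ABI variant.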
Ian Rogers	0c0a20ecdf	perf arm: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for arm in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	6f8e8add5a	perf arm64: Remove dwarf-regs.c The file just provides the function get_arch_regstr, however, if in the only caller get_dwarf_regstr EM_HOST is used for the EM_NONE case the function can never be called. So remove as dead code. Tidy up the EM_NONE cases for arm64 in dwarf-regs.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	bf4e799a0a	perf dwarf-regs: Move x86 dwarf-regs out of arch Move arch/x86/util/dwarf-regs.c to util/dwarf-regs-x86.c and compile in unconditionally. To avoid get_arch_regnum being duplicated, rename to get_x86_regnum and add to get_dwarf_regnum switch. For get_arch_regstr, this was unused on x86 unless the machine type was EM_NONE. Map that case to EM_HOST and remove get_arch_regstr from dwarf-regs-x86.c. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	a784847c2d	perf dwarf-regs: Pass ELF flags to get_dwarf_regstr Pass a flags value as architectures like csky need the flags to determine the ABI variant. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	9fc4489a16	perf dwarf-regs: Pass accurate disassembly machine to get_dwarf_regnum Rather than pass 0/EM_NONE, use the value computed in the disasm struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if it were passed to get_dwarf_regnum. Pass a flags value as architectures like csky need the flags to determine the ABI variant. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
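The EM_NONE to EM_HOST rewrite described in the commits above can be sketched as follows. This is a Python illustration under stated assumptions, not the perf implementation: the function signature, the `host` parameter, and the trimmed register tables are hypothetical. The e_machine values are the standard ELF constants, and the register numbering follows the x86-64 and AArch64 DWARF mappings.

```python
from typing import Optional

# Standard ELF e_machine values.
EM_NONE = 0
EM_X86_64 = 62
EM_AARCH64 = 183

# First eight x86-64 DWARF registers; spelling is illustrative.
X86_64_REGS = ("rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp")
# AArch64: DWARF 0..30 are x0..x30, 31 is sp.
AARCH64_REGS = tuple(f"x{i}" for i in range(31)) + ("sp",)


def get_dwarf_regstr(n: int, machine: int, flags: int = 0,
                     host: int = EM_X86_64) -> Optional[str]:
    """Map DWARF register n to a name for the given ELF machine.

    flags is accepted for ABI-dependent architectures such as csky,
    but is unused by the two tables in this sketch.
    """
    if machine == EM_NONE:
        # An unknown machine is treated as the host, so no per-arch
        # fallback function is needed.
        machine = host
    if machine == EM_X86_64:
        table = X86_64_REGS
    elif machine == EM_AARCH64:
        table = AARCH64_REGS
    else:
        return None
    return table[n] if 0 <= n < len(table) else None
```

With every architecture handled in one switch, a fallback such as get_arch_regstr can never be reached, which is why the series removes it as dead code.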
Ian Rogers	cd6c9dca9d	perf disasm: Add e_machine/e_flags to struct arch Currently functions like get_dwarf_regnum only work with the host architecture. Carry the elf machine and flags in struct arch so that in disassembly these can be used to allow cross platform disassembly. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	ae894b7792	perf dwarf-regs: Add EM_HOST and EF_HOST defines Computed from the build architecture defines, EM_HOST and EF_HOST give values that can be used in dwarf register lookup. Place in dwarf-regs.h so the value can be shared. Move some dwarf-regs.c constants used for EM_HOST to dwarf-regs.h. Add CSky constants that may be missing. In disasm.c add an include of dwarf-regs.h as the included arch/*/annotate/instructions.c files make use of the constants and we want the elf.h/dwarf-regs.h dependency to be explicit. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:13 -08:00
Ian Rogers	6ac75289b2	perf dwarf-regs: Remove PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET was used for BPF prologue support which was removed in Commit `3d6dfae889` ("perf parse-events: Remove BPF event support"). The code is no longer used so remove. Remove the offset from various dwarf-regs.c tables and the dependence on ptrace.h. Rename structs starting pt_ as the ptrace derived offset is now removed. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:12 -08:00
Ian Rogers	2bf7692ead	perf bpf-prologue: Remove unused file Commit `4a73fca226` ("perf bpf-prologue: Remove unused file") missed cleaning up the header file. The code was unnecessary as Commit `3d6dfae889` ("perf parse-events: Remove BPF event support") removed building bpf-prologue.c. Fixes: `4a73fca226` ("perf bpf-prologue: Remove unused file") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241108234606.429459-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:39:12 -08:00
Ian Rogers	6d5d90a6ab	perf docs: Document tool and hwmon events Add a few paragraphs on tool and hwmon events. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	531ee0fd48	perf test: Add hwmon "PMU" test Based on a mix of the sysfs PMU test (for creating the reference files) and the tool PMU test, test that parsing given hwmon events with their aliases creates the expected config values. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	654986ed5d	perf pmu: Add calls enabling the hwmon_pmu Add the base PMU calls necessary for hwmon_pmu(s) to be created/deleted and events found, listed, opened and read. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
Ian Rogers	53cc0b351e	perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs Add a tool PMU for hwmon events but don't enable. The hwmon sysfs ABI is defined in Documentation/hwmon/sysfs-interface.rst. Create a PMU that reads the hwmon input and can be used in `perf stat` and metrics much as an uncore PMU can. For example, when enabled by a later patch, the following shows reading the CPU temperature and 2 fan speeds alongside the uncore frequency: ``` $ perf stat -e temp_cpu,fan1,hwmon_thinkpad/fan2/,tool/num_cpus_online/ -M UNCORE_FREQ -I 1000 1.001153138 52.00 'C temp_cpu 1.001153138 2,588 rpm fan1 1.001153138 2,482 rpm hwmon_thinkpad/fan2/ 1.001153138 8 tool/num_cpus_online/ 1.001153138 1,077,101,397 UNC_CLOCK.SOCKET # 1.08 UNCORE_FREQ 1.001153138 1,012,773,595 duration_time ... ``` The PMUs are named from /sys/class/hwmon/hwmon<num>/name and have an alias of hwmon<num>. Hwmon data is presented in multiple <type><number>_<item> files. The <type><number> is used to identify the event as is the <type> followed by the contents of the <type>_label file if it exists. The <type><number>_input file gives the data read by perf. When enabled by a later patch, in `perf list` the other hwmon <item> files are used to give a richer description, for example: ``` hwmon: temp1 [Temperature in unit acpitz named temp1. Unit: hwmon_acpitz] in0 [Voltage in unit bat0 named in0. Unit: hwmon_bat0] temp_core_0 OR temp2 [Temperature in unit coretemp named Core 0. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] temp_core_1 OR temp3 [Temperature in unit coretemp named Core 1. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] ... temp_package_id_0 OR temp1 [Temperature in unit coretemp named Package id 0. crit=100'C,max=100'C crit_alarm=0'C. Unit: hwmon_coretemp] temp1 [Temperature in unit iwlwifi_1 named temp1. Unit: hwmon_iwlwifi_1] temp_composite OR temp1 [Temperature in unit nvme named Composite. alarm=0'C,crit=86.85'C,max=75.85'C, min=-273.15'C. Unit: hwmon_nvme] temp_sensor_1 OR temp2 [Temperature in unit nvme named Sensor 1. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme] temp_sensor_2 OR temp3 [Temperature in unit nvme named Sensor 2. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme] fan1 [Fan in unit thinkpad named fan1. Unit: hwmon_thinkpad] fan2 [Fan in unit thinkpad named fan2. Unit: hwmon_thinkpad] ... temp_cpu OR temp1 [Temperature in unit thinkpad named CPU. Unit: hwmon_thinkpad] temp_gpu OR temp2 [Temperature in unit thinkpad named GPU. Unit: hwmon_thinkpad] curr1 [Current in unit ucsi_source_psy_usbc000_0 named curr1. max=1.5A. Unit: hwmon_ucsi_source_psy_usbc000_0] in0 [Voltage in unit ucsi_source_psy_usbc000_0 named in0. max=5V,min=5V. Unit: hwmon_ucsi_source_psy_usbc000_0] ``` As there may be multiple hwmon devices a range of PMU types are reserved for their use and to identify the PMU as belonging to the hwmon types. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:28:03 -08:00
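The naming scheme in the commit above (PMU named after the hwmon name file with a hwmon<num> alias, events named by <type><number> or by <type> plus the label) can be illustrated with a short Python sketch. The function names are hypothetical, and the label normalisation (lowercase, runs of non-alphanumerics collapsed to "_") is an assumption inferred from the `perf list` examples such as "Core 0" becoming temp_core_0; the real logic is C code in perf.

```python
import re
from typing import List, Optional, Tuple


def hwmon_pmu_names(num: int, name: str) -> Tuple[str, str]:
    """Return (PMU name, alias) for /sys/class/hwmon/hwmon<num> with the given name."""
    return f"hwmon_{name}", f"hwmon{num}"


def hwmon_event_names(hwmon_type: str, number: int,
                      label: Optional[str] = None) -> List[str]:
    """Return the event names for one sensor, label-based name first if any."""
    names = [f"{hwmon_type}{number}"]
    if label:
        # Assumed normalisation, inferred from the examples in the commit message.
        slug = re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")
        if slug:
            names.insert(0, f"{hwmon_type}_{slug}")
    return names
```

For instance, a temp2 sensor labelled "Core 0" would be reachable as either temp_core_0 or temp2, matching the "temp_core_0 OR temp2" entry shown in the listing.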
Ian Rogers	8c329057de	perf test: Add hwmon filename parser test Filename parsing maps a hwmon filename to its constituent enum/int parts for the hwmon config value. Add a test case for the parsing. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> [namhyung: add #include <linux/string.h> for strlcpy()] Link: https://lore.kernel.org/r/20241109003759.473460-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:27:44 -08:00
Ian Rogers	4810b761f8	perf hwmon_pmu: Add hwmon filename parser hwmon filenames have a specific encoding that will be used to give a config value. The encoding is described in: Documentation/hwmon/sysfs-interface.rst Add a function to parse the filename into constituent enums/ints that will then be amenable to config encoding. Note, things are done this way to allow mapping names to config and back without the use of hash/dynamic lookup tables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> [namhyung: add #include <linux/string.h> for strlcpy()] Link: https://lore.kernel.org/r/20241109003759.473460-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-09 08:26:53 -08:00
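To make the <type><number>_<item> filename encoding concrete, here is a minimal Python sketch of such a parser. It is an illustration only: perf's parser is written in C and maps to enums, whereas this sketch uses a regular expression and strings. The HwmonFile type, the function name, and the short list of accepted types are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Illustrative subset of hwmon sensor types; the full set is defined in
# Documentation/hwmon/sysfs-interface.rst.
HWMON_TYPES = ("curr", "fan", "in", "power", "temp")

_PATTERN = re.compile(r"^(?P<type>[a-z]+)(?P<number>\d+)_(?P<item>[a-z_]+)$")


class HwmonFile(NamedTuple):
    type: str    # e.g. "temp"
    number: int  # e.g. 1
    item: str    # e.g. "input" or "crit_alarm"


def parse_hwmon_filename(filename: str) -> Optional[HwmonFile]:
    """Split a hwmon sysfs filename into type, number and item, or return None."""
    m = _PATTERN.match(filename)
    if m is None or m.group("type") not in HWMON_TYPES:
        return None
    return HwmonFile(m.group("type"), int(m.group("number")), m.group("item"))
```

Since each part comes from a small fixed vocabulary, the three fields can be packed into a config value and unpacked again without any lookup table, which is the property the commit message calls out.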
Yicong Yang	35de42cdfb	perf build: Include libtraceevent headers directly indicated by pkg-config Currently libtraceevent is found by pkg-config, which gives the include path as: [root@localhost tmp]# pkg-config --cflags libtraceevent -I/usr/local/include/traceevent So we should include the libtraceevent headers directly without "traceevent/" prefix. Update all the users. Fixes: `0f0e1f4456` ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Suggested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZyF5_Hf1iL01kldE@google.com/ Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: leo.yan@arm.com Cc: amadio@gentoo.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241105105649.45399-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:57 -08:00
Steve Clevenger	e8328bf3cd	perf script python: Adjust objdump start/end per map pgoff parameter Extract map_pgoff parameter from the dictionary, and adjust start/end range passed to objdump based on the value. A zero start_addr is filtered to prevent output of dso address range check failures. This script repeatedly sees a zero value passed in for start_addr = cpu_data[str(cpu) + 'addr'] These zero values are not a new problem. The start_addr/stop_addr warning clutters the instruction trace output, hence this change. Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: suzuki.poulose@arm.com Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: ilkka@os.amperecomputing.com Link: https://lore.kernel.org/r/21ccdd22e664bdeccb878672d4b2c0518873c1e5.1731027120.git.scclevenger@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:57 -08:00
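The range adjustment described above might look roughly like the following Python sketch. It is not the script's code: the function name and return convention are hypothetical, and adding map_pgoff to both ends of the range is an assumption about the direction of the adjustment.

```python
from typing import Optional, Tuple


def objdump_range(start_addr: int, stop_addr: int,
                  map_pgoff: int = 0) -> Optional[Tuple[int, int]]:
    """Return the (start, stop) range to hand to objdump, or None to skip it."""
    if start_addr == 0:
        # A zero start address would only trigger the dso address range
        # check warning, so the sample is filtered out.
        return None
    # Assumption: the page offset of the map is added to both ends.
    return start_addr + map_pgoff, stop_addr + map_pgoff


# Hypothetical usage with a perf-script style parameter dictionary.
param_dict = {"map_pgoff": 0x2000}
rng = objdump_range(0x1000, 0x1010, param_dict.get("map_pgoff", 0))
```

Returning None for the zero-start case lets the caller skip disassembly quietly instead of cluttering the instruction trace with warnings.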
Steve Clevenger	26ec3d7cc3	perf script cs_etm: Add map_pgoff to python dictionary Extract map_pgoff parameter from the dictionary, and adjust start/end range passed to objdump based on the value. A zero start_addr is filtered to prevent output of dso address range check failures. This script repeatedly sees a zero value passed in for start_addr = cpu_data[str(cpu) + 'addr'] These zero values are not a new problem. The start_addr/stop_addr warning clutters the instruction trace output, hence this change. Signed-off-by: Steve Clevenger <scclevenger@os.amperecomputing.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: suzuki.poulose@arm.com Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: ilkka@os.amperecomputing.com Link: https://lore.kernel.org/r/8d9a1142dc58ffa34a000cb7b7a26055df0a37ec.1731027120.git.scclevenger@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-08 22:42:56 -08:00
Ian Rogers	62a6d092f1	perf stat: Expand metric+unit buffer size Long metric names combined with units may exceed the metric_bf and lead to truncation. Double metric_bf in size to avoid this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20241106004818.2174593-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 11:49:50 -08:00
Haiyue Wang	d8c0f8b4ee	perf tools: Add the empty-pmu-events build to .gitignore The commit `0fe881f10c` ("perf jevents: Autogenerate empty-pmu-events.c") build will generate two files, add them to .gitignore: tools/perf/pmu-events/empty-pmu-events.log tools/perf/pmu-events/test-empty-pmu-events.c Signed-off-by: Haiyue Wang <haiyuewa@163.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241106121254.2869-1-haiyuewa@163.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 10:51:56 -08:00
Dr. David Alan Gilbert	9ac98662db	perf: event: Remove deadcode event_format__print() last use was removed by 2017's commit `894f3f1732` ("perf script: Use event_format__fprintf()") evlist__find_tracepoint_by_id() last use was removed by 2012's commit `e60fc847ce` ("perf evlist: Remove some unused methods") evlist__set_tp_filter_pid() last use was removed by 2017's commit `dd1a50377c` ("perf trace: Introduce filter_loop_pids()") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241106144826.91728-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-07 10:51:56 -08:00
Benjamin Peterson	5fb8e56542	perf trace: avoid garbage when not printing a trace event's arguments trace__fprintf_tp_fields may not print any tracepoint arguments. E.g., if the argument values are all zero. Previously, this would result in a totally uninitialized buffer being passed to fprintf, which could lead to garbage on the console. Fix the problem by passing the number of initialized bytes to fprintf. Fixes: `f11b2803bb` ("perf trace: Allow choosing how to augment the tracepoint arguments") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241103204816.7834-1-benjamin@engflow.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-05 23:27:17 -08:00
Kuan-Wei Chiu	8f0d91f410	perf tools: update expected diff for lib/list_sort.c Since there are no longer any header include differences between lib/list_sort.c and tools/lib/list_sort.c, update the expected diff in check-header_ignore_hunks accordingly. Link: https://lkml.kernel.org/r/20241012042828.471614-4-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:33 -08:00
Namhyung Kim	29bf07bc9a	perf test: Fix ftrace test with regex patterns During the parallel testing, I've noticed some ftrace test failures. It seems the regex pattern checks 100 msec of nanosleep with the error range of 10 msec. But sometimes it's affected by other processes and resulted in more time in the syscall. The following output shows that it took more than 120 msec and failed. Let's update the regex pattern so that it can allow more drifts. perf ftrace profile test # Total (us) Avg (us) Max (us) Count Function 121279.500 121279.500 121279.500 1 __x64_sys_clock_nanosleep 121278.400 121278.400 121278.400 1 common_nsleep 121277.800 121277.800 121277.800 1 hrtimer_nanosleep 121277.100 121277.100 121277.100 1 do_nanosleep 341760.289 56960.048 121273.400 6 schedule 176.200 25.171 31.616 7 scheduler_tick 0.923 0.923 0.923 1 native_smp_send_reschedule 345522.360 69104.472 345320.600 5 __x64_sys_execve 345486.585 69097.317 345312.700 5 do_execveat_common.isra.0 340730.300 340730.300 340730.300 1 bprm_execve 1.758 0.879 0.883 2 sched_mm_cid_before_execve 1.112 1.112 1.112 1 sched_mm_cid_after_execve ---- end(-1) ---- 81: perf ftrace tests : FAILED! Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241102231702.2262258-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:28:25 -08:00
Arnaldo Carvalho de Melo	a52143aa21	perf test: Remove dangling CFLAGS for removed attr.o object Since the C test wrapper for attr.py was removed we don't have an attr.o object for that CFLAGS_attr.o to apply for, remove it. Fixes: `3a447031f5` ("perf test: Remove C test wrapper for attr.py") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/ZyjbksKYnV22zmz-@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:23:26 -08:00
Charlie Jenkins	6e0e0a1863	perf tools: Add all shellcheck_log to gitignore Instead of adding specific shellcheck_log files to the gitignore, add all of them to prevent these files from cluttering the git status. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241104-shellcheck_gitignore-v1-1-ffc179f57dc9@rivosinc.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:23:20 -08:00
Yicong Yang	d5a0a4ab4a	perf build: Add missing cflags when building with custom libtraceevent When building with custom libtraceevent, below errors occur: $ make -C tools/perf NO_LIBPYTHON=1 PKG_CONFIG_PATH=<custom libtraceevent> In file included from util/session.h:5, from builtin-buildid-list.c:17: util/trace-event.h:153:10: fatal error: traceevent/event-parse.h: No such file or directory 153 \| #include <traceevent/event-parse.h> \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ <snip similar errors of missing headers> This is because the include path is missed in the cflags. Add it. Fixes: `0f0e1f4456` ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Guilherme Amadio <amadio@gentoo.org> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241024133236.31016-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:11:32 -08:00
Michael Petlan	c741c7b5e9	perf test: Remove cpu-list BPF cgroup counter test The cpu-list part of this testcase has proven itself to be unreliable. Sometimes, we get "<not counted>" for system.slice when pinned to CPUs 0 and 1. In such case, the test fails. Since we cannot simply guarantee that any system.slice load will run on any arbitrary list of CPUs, except the whole set of all CPUs, let's rather remove the cpu-list subtest. Fixes: `a84260e314` ("perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: vmolnaro@redhat.com Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241101102812.576425-1-mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 22:10:48 -08:00
Ian Rogers	13e17c9ff4	perf build: Make libunwind opt-in rather than opt-out Having multiple unwinding libraries makes the perf code harder to understand and we have unused/untested code paths. Perf made BPF support an opt-out rather than opt-in feature. As libbpf has a libelf dependency, elfutils that provides libelf will also provide libdw. When libdw is present perf will use libdw unwinding rather than libunwind unwinding even if libunwind support is compiled in. Rather than have libunwind built into perf and never used, explicitly disable the support and make it opt-in. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241028193619.247727-1-irogers@google.com Closes: https://lore.kernel.org/linux-perf-users/CAP-5=fUXkp-d7gkzX4eF+nbjb2978dZsiHZ9abGHN=BN1qAcbg@mail.gmail.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-04 11:32:35 -08:00
Namhyung Kim	aa5c90601b	Merge 'origin/master' into perf-tools-next To get the fixes in the perf-tools branch. Resolved a conflict due to RISC-V's syscall table change. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-03 23:18:20 -08:00
Tengda Wu	d36e5b36a2	perf test: Use sqrtloop workload to test bperf event Replace `brstack` workload with `sqrtloop` workload, because `sqrtloop` workload contains fork(), which is suitable for testing the bperf event inheritance feature. Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Cc: song@kernel.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241021110201.325617-3-wutengda@huaweicloud.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-01 23:31:08 -07:00
Tengda Wu	07dc3a6de3	perf stat: Support inherit events during fork() for bperf bperf has a nice ability to share PMUs, but it still does not support inherit events during fork(), resulting in some deviations in its stat results compared with perf. perf stat result: $ ./perf stat -e cycles,instructions -- ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 2,316,038,116 cycles 2,859,350,725 instructions 1.009603637 seconds time elapsed 1.004196000 seconds user 0.003950000 seconds sys bperf stat result: $ ./perf stat --bpf-counters -e cycles,instructions -- \ ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 18,762,093 cycles 23,487,766 instructions 1.008913769 seconds time elapsed 1.003248000 seconds user 0.004069000 seconds sys In order to support event inheritance, two new bpf programs are added to monitor the fork and exit of tasks respectively. When a task is created, add it to the filter map to enable counting, and reuse the `accum_key` of its parent task to count together with the parent task. When a task exits, remove it from the filter map to disable counting. After support: $ ./perf stat --bpf-counters -e cycles,instructions -- \ ./perf test -w sqrtloop Performance counter stats for './perf test -w sqrtloop': 2,316,252,189 cycles 2,859,946,547 instructions 1.009422314 seconds time elapsed 1.003597000 seconds user 0.004270000 seconds sys Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Cc: song@kernel.org Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20241021110201.325617-2-wutengda@huaweicloud.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-11-01 23:31:08 -07:00
James Clark	ba993e5ada	perf arm-spe: Use old behavior when opening old SPE files Since the linked commit, we stopped interpreting data source if the perf.data file doesn't have the new metadata version. This means that perf c2c will show no samples in this case. Keep the old behavior so old files can be opened, but also still show the new warning that updating might improve the decoding. Also re-write the warning to be more concise and specific to a user. Fixes: `ba5e7169e5` ("perf arm-spe: Use metadata to decide the data source feature") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: Julio.Suarez@arm.com Cc: Kiel.Friedt@arm.com Cc: Ryan.Roberts@arm.com Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241029143734.291638-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:50:47 -07:00
Arnaldo Carvalho de Melo	064d569e20	perf ftrace latency: Fix unit on histogram first entry when using --use-nsec The use_nsec arg wasn't being taken into account when printing the first histogram entry, fix it: root@number:~# perf ftrace latency --use-nsec -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 125 \| \| 64 - 128 ns \| 335 \| \| 128 - 256 ns \| 2155 \| #### \| 256 - 512 ns \| 9996 \| ################### \| 512 - 1024 ns \| 4958 \| ######### \| 1 - 2 us \| 4636 \| ######### \| 2 - 4 us \| 1053 \| ## \| 4 - 8 us \| 15 \| \| 8 - 16 us \| 1 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| root@number:~# After: root@number:~# perf ftrace latency --use-nsec -T switch_mm_irqs_off -a sleep 2 # DURATION \| COUNT \| GRAPH \| 0 - 1 ns \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 19 \| \| 64 - 128 ns \| 94 \| \| 128 - 256 ns \| 2191 \| #### \| 256 - 512 ns \| 9719 \| #################### \| 512 - 1024 ns \| 5330 \| ########### \| 1 - 2 us \| 4104 \| ######## \| 2 - 4 us \| 807 \| # \| 4 - 8 us \| 9 \| \| 8 - 16 us \| 0 \| \| 16 - 32 us \| 0 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - ... ms \| 0 \| \| root@number:~# Fixes: `84005bb614` ("perf ftrace latency: Add -n/--use-nsec option") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/ZyE3frB-hMXHCnMO@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:46:43 -07:00
Björn Töpel	8c0d1202ba	perf, riscv: Wire up perf trace support for RISC-V RISC-V does not currently support perf trace, since the system call table is not generated. Perform the copy/paste exercise, wiring up RISC-V system call table generation. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Anup Patel <anup@brainfault.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Cc: Atish Patra <atishp@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20241024190353.46737-1-bjorn@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-30 23:39:34 -07:00
Arnaldo Carvalho de Melo	54afc56db2	perf probe: Fix retrieval of source files from a debuginfod server When perf is linked with libdebuginfod: root@number:~# ldd ~/bin/perf \| grep debuginfod libdebuginfod.so.1 => /lib64/libdebuginfod.so.1 (0x00007ff5c3930000) root@number:~# perf check feature debuginfod debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT root@number:~# And we don't have a debuginfo package installed for the binary we're trying to use, vmlinux in this case as we didn't specify any using 'perf probe -x', it will use the build for the running kernel: root@number:~# perf buildid-list -k 38e927fd7799d50dbc4d99ec5e3f781b6105a6a9 root@number:~# And communicate with a debuginfo server, be it configured in a ~/.perfconfig file, excerpt from the 'perf config' man page: buildid-cache.* buildid-cache.debuginfod=URLs Specify debuginfod URLs to be used when retrieving perf.data binaries, it follows the same syntax as the DEBUGINFOD_URLS variable, like: buildid-cache.debuginfod=http://192.168.122.174:8002 Or via the DEBUGINFOD_URLS env var, as distros like fedora do by default: root@number:~# echo $DEBUGINFOD_URLS https://debuginfod.fedoraproject.org/ root@number:~# To pick and cache just what is needed, instead of requiring the manual installation of the entire kernel-debuginfo package, which is really large. It will, in this example, use the following cache files, deleted before/after this patch just to test the whole process: root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/debuginfo Before this patch: root@number:~# perf probe -L icmp_rcv Failed to find source file path. Error: Failed to show lines. 
root@number:~# This is because 'perf probe' was using just the relative file name, in this case "net/ipv4/icmp.c", that is where the 'icmp_rcv' function is located, if we add it and comply with the debuginfo_find_source() function man page, it contacts the server, finds the necessary files, cache them locally and all works: root@number:~# perf probe -L icmp_rcv \| head <icmp_rcv@/root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c:0> 0 int icmp_rcv(struct sk_buff *skb) { 2 enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct rtable *rt = skb_rtable(skb); struct net *net = dev_net(rt->dst.dev); struct icmphdr *icmph; if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { 8 struct sec_path *sp = skb_sec_path(skb); root@number:~# Acked-by: Frank Ch. Eigler <fche@redhat.com> Cc: Aaron Merey <amerey@redhat.com> Cc: Francesco Nigro <fnigro@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ZyACsIFUETsr7-09@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:36:39 -07:00
Graham Woodward	35f5aa9ccc	perf arm-spe: Update --itrace help text The --itrace help now needs updating to reflect that the --itrace=b argument synthesises branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-5-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:17 -07:00
Graham Woodward	edff8dad3f	perf arm-spe: Correctly set sample flags Set flags on all synthesized instruction and branch samples. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-4-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:14 -07:00
Graham Woodward	c1b67c8510	perf arm-spe: Use ARM_SPE_OP_BRANCH_ERET when synthesizing branches Instead of checking the type for just branch misses, we can instead check for the OP_BRANCH_ERET and synthesise branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-3-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:10 -07:00
Graham Woodward	19966d792b	perf arm-spe: Set sample.addr to target address for instruction sample For an instruction sample, assign the target address to the field 'to_ip'. If it is a non-branch record, to_ip will be 0, presenting a non-valid target address. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-2-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-29 16:10:05 -07:00
Xu Yang	e3b2949e3f	perf vendor events arm64: Add i.MX91 DDR Performance Monitor metrics Add JSON metrics for i.MX91 DDR Performance Monitor. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: festevam@gmail.com Cc: conor+dt@kernel.org Cc: krzk+dt@kernel.org Cc: robh@kernel.org Cc: shawnguo@kernel.org Cc: will@kernel.org Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: leo.yan@linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: imx@lists.linux.dev Cc: Frank.li@nxp.com Cc: john.g.garry@oracle.com Cc: kernel@pengutronix.de Cc: s.hauer@pengutronix.de Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20240924061251.3387850-3-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:37:02 -07:00
Ian Rogers	7449a4d674	perf test: Sort tests placing exclusive tests last This allows a uniform test numbering even though two passes are used to execute them. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	553d5efeb3	perf test: Add a signal handler to kill forked child processes If the `perf test` process is killed the child tests continue running and may run indefinitely. Propagate SIGINT (ctrl-C) and SIGTERM (kill) signals to the running child processes so that they terminate when the parent is killed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	94d1a913bd	perf test: Make parallel testing the default Now C tests can have the "exclusive" flag to run without other tests, and shell tests can add "(exclusive)" to their description, run tests in parallel by default. Tests which flake when run in parallel can be marked exclusive to resolve the problem. Non-scientifically, the reduction on `perf test` execution time is from 8m35.890s to 3m55.115s on a Tigerlake laptop. So the tests complete in less than half the time. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	79e72f384d	perf test: Run parallel tests in two passes In pass 1 run all tests that succeed when run in parallel. In pass 2 sequentially run all remaining tests that are flagged as "exclusive". Sequential and dont_fork tests keep to run in pass 1. Read the exclusive flag from the shell test descriptions, but remove from display to avoid >100 characters. Add error handling to finish tests if starting a later test fails. Mark the task-exit test as exclusive due to issues reported-by James Clark. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	a6fffc6094	perf test: Add a signal handler around running a test Add a signal handler around running a test. If a signal occurs during the test a siglongjmp unwinds the stack and output is flushed. The global run_test_jmp_buf is either unique per forked child or not shared during sequential execution. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	2532be3d21	perf test: Tag parallel failing shell tests with "(exclusive)" Some shell tests compete for resources and so can't run with other tests, tag such tests. The "(exclusive)" stems from shared/exclusive to describe how the tests run as if holding a lock. For ARM/coresight tests: Suggested-by: James Clark <james.clark@linaro.org> Additional failing tests: Suggested-by: Namhyung Kim <namhyung@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:58 -07:00
Ian Rogers	2c66343927	perf test: Avoid list test blocking on writing to stdout Python's json.tool will output the input json to stdout. Redirect to /dev/null to avoid blocking on stdout writes. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	d50318fe00	perf test: Reduce scope of parallel variable The variable duplicates sequential but is only used for command line argument processing. Reduce scope to make the behavior clearer. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	0e036dcad4	perf test: Display number of active running tests Before polling or sleeping to wait for a test to complete, print out ": Running (<num> active)" where the number of active tests is determined by iterating over the tests and seeing which return false for check_if_command_finished. The line erasing and printing out only occur if the number of running tests changes to avoid the line flickering excessively. Knowing tests are running allows a user to know a test is running and in parallel mode how many of the tests are waiting to complete. If color mode is disabled then avoid displaying the "Running" message as deleting the line isn't reliable. Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241025192109.132482-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-28 09:32:57 -07:00
Ian Rogers	a5384c4267	perf cap: Add __NR_capget to arch/x86 unistd As there are duplicated kernel headers in tools/include libc can pick up the wrong definitions. This was causing the wrong system call for capget in perf. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: `e25ebda78e` ("perf cap: Tidy up and improve capability testing") Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-28 13:04:52 -03:00
Arnaldo Carvalho de Melo	55f1b540d8	tools headers: Update the linux/unaligned.h copy with the kernel sources To pick up the changes in: `7f053812da` ("random: vDSO: minimize and simplify header includes") That required adding a copy of include/vdso/unaligned.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/unaligned.h include/linux/unaligned.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-28 12:34:28 -03:00
Li Huafei	150dab31d5	perf disasm: Fix not cleaning up disasm_line in symbol__disassemble_raw() In symbol__disassemble_raw(), the created disasm_line should be discarded before returning an error. When creating disasm_line fails, break the loop and then release the created lines. Fixes: `0b971e6bf1` ("perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-3-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:36:14 -07:00
Li Huafei	908d50e50e	perf disasm: Use disasm_line__free() to properly free disasm_line symbol__disassemble_capstone_powerpc() goto the 'err' label when it failed in the loop that created disasm_line, and then used free() directly to free disasm_line. Since the structure disasm_line contains members that allocate memory dynamically, this can result in a memory leak. In fact, we can simply break the loop when it fails in the middle of the loop, and disasm_line__free() will then be called to properly free the created line. Other error paths do not need to consider freeing disasm_line. Fixes: `c5d60de181` ("perf annotate: Add support to use libcapstone in powerpc") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-2-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:36:06 -07:00
Li Huafei	b4e0e9a1e3	perf disasm: Use disasm_line__free() to properly free disasm_line The structure disasm_line contains members that require dynamically allocated memory and need to be freed correctly using disasm_line__free(). This patch fixes the incorrect release in symbol__disassemble_capstone(). Fixes: `6d17edc113` ("perf annotate: Use libcapstone to disassemble") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-1-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-23 15:35:38 -07:00
Arnaldo Carvalho de Melo	758f181589	perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT Noticed while building on a raspbian arm 32-bit system. There was also this other case, fixed by adding a missing util/stat.h with the prototypes: /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1396:6: error: no previous prototype for ‘perf_stat__set_no_csv_summary’ [-Werror=missing-prototypes] 1396 \| void perf_stat__set_no_csv_summary(int set __maybe_unused) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1400:6: error: no previous prototype for ‘perf_stat__set_big_num’ [-Werror=missing-prototypes] 1400 \| void perf_stat__set_big_num(int set __maybe_unused) \| ^~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors In other architectures this must be building due to some lucky indirect inclusion of that header. Fixes: `9dabf40034` ("perf python: Switch module to linking libraries from building source") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZxllAtpmEw5fg9oy@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 19:29:50 -03:00
Veronika Molnarova	06a130e42a	perf test: Handle perftool-testsuite_probe failure due to broken DWARF Test case test_adding_blacklisted ends in failure if the blacklisted probe is of an assembler function with no DWARF available. At the same time, probing the blacklisted function with ASM DWARF doesn't test the blacklist itself as the failure is a result of the broken DWARF. When the broken DWARF output is encountered, check if the probed function was compiled by the assembler. If so, the broken DWARF message is expected and does not report a perf issue, else report a failure. If the ASM DWARF affected the probe, try the next probe on the blacklist. If the first 5 probes are defective due to broken DWARF, skip the test case. Fixes: `def5480d63` ("perf testsuite probe: Add test for blacklisted kprobes handling") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241017161555.236769-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 17:23:09 -03:00
Jiri Slaby	5d35634ecc	perf trace: Fix non-listed archs in the syscalltbl routines This fixes a build breakage on 32-bit arm, where the syscalltbl__id_at_idx() function was missing. Committer notes: Generating a proper syscall table from a copy of arch/arm/tools/syscall.tbl ends up being too big a patch for this rc stage, I started doing it but while testing noticed some other problems with using BPF to collect pointer args on arm7 (32-bit) will maybe continue trying to make it work on the next cycle... Fixes: `7a2fb5619c` ("perf trace: Fix iteration of syscall ids in syscalltbl->entries") Suggested-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: <jslaby@suse.cz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/lkml/3a592835-a14f-40be-8961-c0cee7720a94@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Howard Chu	7fbff3c0e0	perf build: Change the clang check back to 12.0.1 This serves as a revert for this patch: https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/ Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Howard Chu	395d38419f	perf trace augmented_raw_syscalls: Add more checks to pass the verifier Add some more checks to pass the verifier in more kernels. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com [ Reduced the patch removing things that can be done later ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Arnaldo Carvalho de Melo	ecabac70ff	perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise kernels go, have backports from modern kernels, the verifier complains about lack of bounds check for the index into the array of syscall arguments, on a BPF bytecode generated by clang 17, with: ; } else if (size < 0 && size >= -6) { /* buffer */ 116: (b7) r1 = -6 117: (2d) if r1 > r6 goto pc-30 R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value ; index = -(size + 1); 118: (a7) r6 ^= -1 119: (67) r6 <<= 32 120: (77) r6 >>= 32 ; aug_size = args->args[index]; 121: (67) r6 <<= 3 122: (79) r1 = *(u64 *)(r10 -24) 123: (0f) r1 += r6 last_idx 123 first_idx 116 regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24) regs=40 stack=0 before 121: (67) r6 <<= 3 regs=40 stack=0 before 120: (77) r6 >>= 32 regs=40 stack=0 before 119: (67) r6 <<= 32 regs=40 stack=0 before 118: (a7) r6 ^= -1 regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30 regs=42 stack=0 before 116: (b7) r1 = -6 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value parent didn't have regs=40 stack=0 marks last_idx 110
first_idx 98 regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5 regs=42 stack=0 before 109: (b7) r1 = 1 regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7 regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value parent already had regs=40 stack=0 marks 124: (79) r6 = *(u64 *)(r1 +16) R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value R1 unbounded memory access, make sure to bounds check any such access processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3 If we add this line, as used in other BPF programs, to cap that index: index &= 7; The generated BPF program is considered safe by that version of the BPF verifier, allowing perf to collect the syscall args in one more kernel using the BPF based pointer contents collector.
With the above one-liner it works with that kernel: [root@dell-per740-01 ~]# uname -a Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux [root@dell-per740-01 ~]# ~acme/bin/perf trace -e sleep* sleep 1.234567890 0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }) = 0 [root@dell-per740-01 ~]# As well as with the one in Fedora 40: root@number:~# uname -a Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux root@number:~# perf trace -e sleep sleep 1.234567890 0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0 root@number:~# Song Liu reported that this one-liner was being optimized out by clang 18, so I suggested and he tested that adding a compiler barrier before it made clang v18 to keep it and the verifier in the kernel in Song's case (Meta's 5.12 based kernel) also was happy with the resulting bytecode. I'll investigate using virtme-ng[1] to have all the perf BPF based functionality thoroughly tested over multiple kernels and clang versions. [1] https://kernel-recipes.org/en/2024/virtme-ng/ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <andrea.righi@linux.dev> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-23 11:34:56 -03:00
Namhyung Kim	36fae9f93e	perf test: Add precise_max subtest to the perf record shell test It's a very simple test just to run with cycles:P and instructions:P events. Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	634d36f825	perf record: Just use "cycles:P" as the default event The fallback logic can add ":u" modifier if needed. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	af954f76ee	perf tools: Check fallback error and order The perf_event_open might fail due to various reasons, so blindly reducing precise_ip level might not be the best way to deal with it. It seems the kernel return -EOPNOTSUPP when PMU doesn't support the given precise level. Let's try again with the correct error code. This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but user events with exclude_kernel=1 cannot make progress. Let's add the evsel__handle_error_quirks() to this case specially. I plan to work on the kernel side to improve this situation but it'd still need some special handling for IBS. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:08 -07:00
Namhyung Kim	28398ce172	perf tools: Move x86__is_amd_cpu() to util/env.c It can be called from non-x86 platform so let's move it to the general util directory. Also add a new helper perf_env__is_x86_amd_cpu() so that it can be called with an existing perf_env as well. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:55:07 -07:00
Namhyung Kim	3b193a57ba	perf tools: Detect missing kernel features properly The evsel__detect_missing_features() is to check if the attributes of the evsel are supported or not. But since it checks the attributes based on the given evsel, it might miss something if the attr doesn't have the bit, or give incorrect results if the event is special. Also it maintains the order in which the features were added to the kernel, which means it can assume older features are supported once it detects the current feature is working. To minimize the confusion and to accurately check the kernel features, I think it's better to use a software event and go through all the features at once. Also make the function static since it's only used in evsel.c. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	88bc63d00e	perf tools: Do not set exclude_guest for precise_ip It seems perf sets the exclude_guest bit because of Intel PEBS implementation which uses a virtual address. IIUC now kernel disables PEBS when it goes to the guest mode regardless of this bit so we don't need to set it explicitly. At least for the other archs/vendors. I found the commit `1342798cc1` set the exclude_guest for precise_ip in the tool and the commit `20b279ddb3` added kernel side enforcement which was reverted by commit `a706d965dc` later. Actually it doesn't set the exclude_guest for the default event (cycles:P) already. $ grep -m1 vendor /proc/cpuinfo vendor_id : GenuineIntel $ perf record -e cycles:P true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v \| tr ',' '\n' \| grep -e exclude -e precise precise_ip: 3 But having lower 'p' modifier set the bit for some reason. $ perf record -e cycles:pp true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v \| tr ',' '\n' \| grep -e exclude -e precise precise_ip: 2 exclude_guest: 1 Actually AMD IBS suffers from this because it doesn't support excludes and having this bit effectively disables new features in the current implementation (due to the missing feature check). $ grep -m1 vendor /proc/cpuinfo vendor_id : AuthenticAMD $ perf record -W -e cycles:p -vv true 2>&1 \| grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host By not setting exclude_guest, we can fix this inconsistency and the troubles. 
Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	d9e0970f77	perf tools: Simplify evsel__add_modifier() Since it doesn't set the exclude_guest, no need to special handle the bit and simply show only if one of host or guest bit is set. Now the default event name might not have :H prefix anymore so change the dlfilter test not to compare the ":" at the end. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	35c8d21371	perf tools: Don't set attr.exclude_guest by default The exclude_guest in the event attribute is to limit profiling in the host environment. But I'm not sure why we want to set it by default because we don't care about it in most cases and I feel like it just makes new PMU implementation complicated. Of course it's useful for perf kvm command so I added the exclude_GH_default variable to preserve the old behavior for perf kvm and other commands like perf record and stat won't set the exclude bit. This is helpful for AMD IBS case since having exclude_guest bit will clear new feature bit due to the missing feature check logic. $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = 0 $ perf record -W -e ibs_op// -vv true 2>&1 \| grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host Interestingly, I found it sets the exclude_bit if "u" modifier is used. I don't know why but it's neither intuitive nor consistent. Let's remove the bit there too. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:52:11 -07:00
Namhyung Kim	bb6e7cb11d	perf tools: Add fallback for exclude_guest Commit `7b100989b4` ("perf evlist: Remove __evlist__add_default") changed to parse "cycles:P" event instead of creating a new cycles event for perf record. But it also changed the way how modifiers are handled so it doesn't set the exclude_guest bit by default. It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP if not. Let's add a fallback so that it can work with default events. Also update perf stat hybrid tests to handle possible u or H modifiers. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-2-namhyung@kernel.org Fixes: `7b100989b4` ("perf evlist: Remove __evlist__add_default") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-22 09:51:22 -07:00
Brian Geffon	3e2d4df574	perf tools: sched-pipe bench: add (-n) nonblocking benchmark The -n mode will benchmark pipes in a non-blocking mode using epoll_wait. This specific mode was added to demonstrate the broken sync nature of epoll: https://lore.kernel.org/lkml/20240426-zupfen-jahrzehnt-5be786bcdf04@brauner Signed-off-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20241016190009.866615-1-bgeffon@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:23:01 -07:00
Arnaldo Carvalho de Melo	915a377627	perf test: Document the -w/--workload option Wasn't documented so far, mention that it is mostly used in the shell regression tests. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-4-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:50 -07:00
Arnaldo Carvalho de Melo	13c138308d	perf test: Introduce --list-workloads to list the available workloads Using it: $ perf test -w noplop No workload found: noplop $ $ perf test -w Error: switch `w' requires a value Usage: perf test [<options>] [{list <test-name-fragment>\|[<test-name-fragments>\|<test-numbers>]}] -w, --workload <work> workload to run for testing, use '--list-workloads' to list the available ones. $ $ perf test --list-workloads noploop thloop leafloop sqrtloop brstack datasym landlock $ Would be good at some point to have a description in 'struct test_workload'. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:33 -07:00
Arnaldo Carvalho de Melo	18b63d63cd	perf test: Introduce workloads__for_each() And use it in run_workload(). Testing it: root@x1:~# perf trace -e landlock perf test -w landlock 0.000 ( 0.015 ms): :1274331/1274331 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd3fea55e0, flags: 45) = -1 EINVAL (Invalid argument) 0.018 ( 0.003 ms): :1274331/1274331 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd3fea55f0, flags: 45) = -1 EINVAL (Invalid argument) root@x1:~# perf test -w bla No workload found: bla root@x1:~# Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-21 21:10:06 -07:00
Sandipan Das	46610ba41e	perf vendor events amd: Update Zen 5 data cache fill events For events that count data cache fills, some combinations of the unit mask bits are useful for counting fills from local caches, DRAM or any far sources. However, named events currently exist for PMCx044 (Any Data Cache Fills) only. Add similar events for the following base events. * PMCx043 (Demand Data Cache Fills) * PMCx059 (Software Prefetch Data Cache Fills) * PMCx05A (Hardware Prefetch Data Cache Fills) While at it, remove "ls_any_fills_from_sys.all_dram_io" since it is a duplicate of "ls_any_fills_from_sys.dram_io_all". Event descriptions can be found in Section 2.1.16.5.2 "Load/Store (LS) Events" of the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/e036e3c9fb962c939fa06c855b68e532ee609e01.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Sandipan Das	17aedce6e0	perf vendor events amd: Add Zen 5 data fabric metrics Add data fabric metrics taken from Section 2.1.16.2 "Performance Measurement" in the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. The recommended metrics are sourced from Table 28 "Guidance for Common Performance Statistics with Complex Event Selects". They capture data bandwidth for various links and interfaces in the data fabric. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/e8757bb9f511907a52bc182de9395c5edec2fccf.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Sandipan Das	f101a8e345	perf vendor events amd: Add Zen 5 data fabric events Add data fabric events taken from Section 2.1.16.2 "Performance Measurement" in the Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h Revision C1 Processors document available at the link below. This constitutes events which capture the flow of data beats at various links and interfaces in the data fabric. Link: https://bugzilla.kernel.org/attachment.cgi?id=307010 Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Link: https://lore.kernel.org/r/198049e27366f3980e4991b95cec5eaac6d31d75.1729242778.git.sandipan.das@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:41:51 -07:00
Thomas Richter	21677f653f	perf test: Fix perf test case 84 on s390 Perf test case 84 'perf pipe recording and injection test' sometimes fails on s390, especially on z/VM virtual machines. This is caused by a very short run time of workload # perf test -w noploop which runs for 1 second. Occasionally this is not long enough and the perf report has no samples for symbol noploop. Fix this and enlarge the runtime for the perf workload to 3 seconds. This ensures the symbol noploop is always present. Since only s390 is affected, make this loop architecture dependent. Output before: Inject -b build-ids test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.277 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.160 MB /tmp/perf.data.ELzRdq (4031 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] Inject -b build-ids test [Success] Inject --buildid-all build-ids test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB - ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB - ] Inject --buildid-all build-ids test [Failed - cannot find noploop function in pipe #2] Output after: Successful execution for over 10 times in a loop.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Link: https://lore.kernel.org/r/20241018081732.1391060-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:39:54 -07:00
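The architecture-dependent runtime described in the commit above can be sketched as follows. This is only an illustration in Python, not the shell code from the actual patch: the helper names `noploop_seconds` and `workload_command` are hypothetical, and it assumes that the machine name reported on s390 starts with "s390" and that the noploop workload takes its runtime in seconds as an argument.

```python
import platform


def noploop_seconds(machine):
    """Return the workload runtime in seconds for a given machine name.

    Hypothetical helper: 3 seconds on s390, the 1 second default elsewhere,
    matching the values quoted in the commit message.
    """
    return 3 if machine.startswith("s390") else 1


def workload_command(machine):
    """Build the perf workload command line as a list of arguments."""
    return ["perf", "test", "-w", "noploop", str(noploop_seconds(machine))]


if __name__ == "__main__":
    # Print the command that would be used on the current machine.
    print(" ".join(workload_command(platform.machine())))
```

Keeping the longer runtime behind an architecture check leaves the test duration unchanged on platforms that were never affected.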
Namhyung Kim	e2cb1db7da	perf test: Update all metrics test like metricgroups test Like in the metricgroup tests, it should check the permission first and then skip relevant failures accordingly. Also it needs to try again with the system wide flag properly. On the second round, check if the result has the metric name because other failure cases are checked in the first round already. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241018204306.741972-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-19 09:34:56 -07:00
Ian Rogers	5455d89bf3	perf build: Rename CONFIG_DWARF to CONFIG_LIBDW In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that CONFIG_DWARF is really just defined when libdw is present by renaming to CONFIG_LIBDW. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	8838abf626	perf build: Rename HAVE_DWARF_SUPPORT to HAVE_LIBDW_SUPPORT In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	5eb2242513	perf libdw: Remove unnecessary defines As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always match HAVE_DWARF_SUPPORT remove the macros and use HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF then remove all ifs. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	91e81e988f	perf probe: Move elfutils support check to libdw check The test _ELFUTILS_PREREQ(0, 142) is false for elfutils before 2009-06-13, but that is 15 years ago and very unlikely. Add a test to test-libdw.c and assume the libdw version is at least 0.142 to simplify the build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	26385fd237	perf build: Combine test-dwarf-getcfi into test-libdw dwarf_getcfi support in libdw is 15 years old. Make libdw imply dwarf_getcfi support and simplify build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	23580d7bb1	perf build: Combine test-dwarf-getlocations into test-libdw dwarf_getlocations support in libdw is more than 10 years old. Make libdw imply dwarf_getlocations support and simplify build logic. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	3034b48a4b	perf build: Combine libdw-dwarf-unwind into libdw feature tests Support in libdw has been present for 10 years so let's simplify the build logic with a single feature test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	7c943261a1	perf build: Rename test-dwarf to test-libdw Be more intention revealing that the dwarf test is actually testing for libdw support. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	a6c55df973	perf build: Remove defined but never used variable Previously NO_DWARF_UNWIND was part of conditional compilation but it is now unused so remove. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	54a1368567	perf build: Rename NO_DWARF to NO_LIBDW NO_DWARF could mean more than NO_LIBDW support, in particular no libunwind support. Rename to be more intention revealing. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	a9823dae4c	perf build: Fix LIBDW_DIR Testing with a LIBDW_DIR showed that in Makefile.config the dwarf feature tests need the LIBDW_DIR setting in the CFLAGS/LDFLAGS. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-18 10:17:40 -07:00
Ian Rogers	8296aa0f28	perf test: Move attr files into shell directory where they are used Now that the attr tests are shell tests, move the associated Python and configuration files. Update the installation build rules for the new directories. Recycle the lib install rules for Python files, allowing the explicit attr.py install line to be dropped. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Ian Rogers	3a447031f5	perf test: Remove C test wrapper for attr.py Remove the C wrapper now a shell script wrapper exists. Move perf_event_attr dumping functions to evsel.c and reduce the scope of variables/defines. Use fprintf to avoid snprintf complexities in WRITE_ASS. Add __SANE_USERSPACE_TYPES__ to evsel.c to fix format flag issues on PowerPC triggered by moving attr.c functions to evsel.c. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Ian Rogers	8519e4f44c	perf test: Add a shell wrapper for "Setup struct perf_event_attr" The "Setup struct perf_event_attr" test in attr.c does a bunch of directory finding to set up running a python test that in general is more brittle than similar logic we have in shell tests. Add a shell test that invokes and runs the tests in the python attr.py script. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 13:17:36 -07:00
Leo Yan	314909f13c	perf probe: Correct demangled symbols in C++ program An issue can be observed when probing a C++ demangled symbol with the following steps: # nm test_cpp_mangle | grep print_data 0000000000000c94 t _GLOBAL__sub_I__Z10print_datai 0000000000000afc T _Z10print_datai 0000000000000b38 T _Z10print_dataR5Point # perf probe -x /home/niayan01/test_cpp_mangle -F --demangle ... print_data(Point&) print_data(int) ... # perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)" probe-definition(0): test=print_data(int) symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(int) address found : afc Matched function: print_data [2ccf] Probe point found: print_data+0 Found 1 probe_trace_events. Opening /sys/kernel/tracing//uprobe_events write=1 Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xb38 ... When trying to probe the symbol "print_data(int)", the log shows: Symbol print_data(int) address found : afc The found address is 0xafc, which is correct, as verified against the nm output. Afterwards, when writing the event, the command uses offset 0xb38 in the last log line, which is the wrong address. dwarf_diename() returns the common function name; in the above case, it returns the string "print_data". As a result, the tool parses the offset based on the common name. This leads to probing the wrong symbol "print_data(Point&)". To fix the issue, use the die_get_linkage_name() function to retrieve the distinct linkage name - this is the mangled name for the C++ case. Based on this unique name, the tool can get the correct offset for probing. According to the DWARF documentation, the linkage name may be missing from the DIE; in that case, fall back to dwarf_diename().
After: # perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)" probe-definition(0): test=print_data(int) symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(int) address found : afc Matched function: print_data [2d06] Probe point found: print_data+0 Found 1 probe_trace_events. Opening /sys/kernel/tracing//uprobe_events write=1 Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xafc Added new event: probe_test_cpp_mangle:test (on print_data(int) in /home/niayan01/test_cpp_mangle) You can now use it in all perf tools, such as: perf record -e probe_test_cpp_mangle:test -aR sleep 1 # perf --debug verbose=3 probe -x test_cpp_mangle --add "test2=print_data(Point&)" probe-definition(0): test2=print_data(Point&) symbol:print_data(Point&) file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /home/niayan01/test_cpp_mangle Try to find probe point from debuginfo. Symbol print_data(Point&) address found : b38 Matched function: print_data [2ccf] Probe point found: print_data+0 Found 1 probe_trace_events. 
Opening /sys/kernel/tracing//uprobe_events write=1 Parsing probe_events: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0x0000000000000afc Group:probe_test_cpp_mangle Event:test probe:p Opening /sys/kernel/tracing//README write=0 Writing event: p:probe_test_cpp_mangle/test2 /home/niayan01/test_cpp_mangle:0xb38 Added new event: probe_test_cpp_mangle:test2 (on print_data(Point&) in /home/niayan01/test_cpp_mangle) You can now use it in all perf tools, such as: perf record -e probe_test_cpp_mangle:test2 -aR sleep 1 Fixes: `fb1587d869` ("perf probe: List probes with line number and file name") Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012141432.877894-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:55:48 -07:00
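The name-selection logic described in the commit above can be illustrated with a small sketch. It is not the perf implementation: the `Die` class, `probe_symbol_name` and `resolve` are hypothetical stand-ins for a DWARF DIE, for the choice between die_get_linkage_name() and dwarf_diename(), and for the symbol lookup. The symbol table reuses the addresses from the nm output quoted in the commit message.

```python
# Symbol table taken from the nm output quoted in the commit message.
SYMTAB = {
    "_Z10print_datai": 0xAFC,
    "_Z10print_dataR5Point": 0xB38,
}


class Die:
    """Hypothetical stand-in for a DWARF DIE with an optional linkage name."""

    def __init__(self, name, linkage_name=None):
        self.name = name                  # common name, e.g. "print_data"
        self.linkage_name = linkage_name  # mangled name, may be absent


def probe_symbol_name(die):
    """Prefer the unique linkage name; fall back to the common name."""
    return die.linkage_name if die.linkage_name else die.name


def resolve(die):
    """Look up the address for the chosen name, or None if it is unknown."""
    return SYMTAB.get(probe_symbol_name(die))


if __name__ == "__main__":
    for die in (Die("print_data", "_Z10print_datai"),
                Die("print_data", "_Z10print_dataR5Point")):
        print(probe_symbol_name(die), hex(resolve(die)))
```

Both overloads share the common name "print_data", so only the linkage name distinguishes 0xafc from 0xb38.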
Ian Rogers	17df33fe22	perf stat: Disable metric thresholds for CSV and JSON metric-only mode These modes don't use the threshold, so don't compute it saving time and potentially reducing events. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	f9825601aa	perf stat: Add metric-threshold to json output When the threshold isn't unknown add a value to the json like: "metric-threshold" : "good" A more complete example: ``` $ perf stat -a -j -I 1000 {"interval" : 1.001089747, "counter-value" : "16045.281449", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 16045355135, "pcnt-running" : 100.00, "metric-value" : "16.045281", "metric-unit" : "CPUs utilized"} {"interval" : 1.001089747, "counter-value" : "10003.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 16045314844, "pcnt-running" : 100.00, "metric-value" : "623.423156", "metric-unit" : "/sec"} {"interval" : 1.001089747, "counter-value" : "328.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 16045321403, "pcnt-running" : 100.00, "metric-value" : "20.442147", "metric-unit" : "/sec"} {"interval" : 1.001089747, "counter-value" : "20114.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 16045355927, "pcnt-running" : 100.00, "metric-value" : "1.253577", "metric-unit" : "K/sec"} {"interval" : 1.001089747, "counter-value" : "4066679471.000000", "unit" : "", "event" : "instructions", "event-runtime" : 16045369123, "pcnt-running" : 100.00, "metric-value" : "1.628330", "metric-unit" : "insn per cycle"} {"interval" : 1.001089747, "counter-value" : "2497454658.000000", "unit" : "", "event" : "cycles", "event-runtime" : 16045374810, "pcnt-running" : 100.00, "metric-value" : "0.155650", "metric-unit" : "GHz"} {"interval" : 1.001089747, "counter-value" : "914974294.000000", "unit" : "", "event" : "branches", "event-runtime" : 16045379877, "pcnt-running" : 100.00, "metric-value" : "57.024509", "metric-unit" : "M/sec"} {"interval" : 1.001089747, "counter-value" : "9237201.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 16045375017, "pcnt-running" : 100.00, "metric-value" : "1.009559", "metric-unit" : "of all branches", "metric-threshold" : "good"} {"interval" : 1.001089747, 
"event-runtime" : 16045397172, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"} {"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"} {"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"} {"interval" : 1.001089747, "metric-value" : "36.729687", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"} {"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"} ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
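A consumer of the JSON output shown in the commit above might read the new field roughly as follows. This is a minimal sketch, assuming one JSON object per line as in the sample; the function name `thresholds` is hypothetical, and the sample records are copied from the commit message.

```python
import json

# Three records copied from the sample output in the commit message.
SAMPLE = """\
{"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"}
"""


def thresholds(text):
    """Map each metric unit to its threshold, skipping records without one."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # The field is only present when the threshold is known.
        if "metric-threshold" in record:
            result[record["metric-unit"]] = record["metric-threshold"]
    return result


if __name__ == "__main__":
    for unit, state in thresholds(SAMPLE).items():
        print(f"{unit}: {state}")
```

Records such as tma_retiring carry no "metric-threshold" key, so a reader should treat the field as optional.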
Ian Rogers	37b77ae954	perf stat: Change color to threshold in print_metric Colors don't mean things in CSV and JSON output, switch to a threshold enum value that the standard output can convert to a color. Updating the CSV and JSON output will be later changes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
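A consumer of the JSON output from the metric-threshold change above might filter records by that field. The following is a minimal sketch, not part of perf: the helper name `metrics_with_threshold` is hypothetical, and the sample records are copied from the commit message rather than read from a live `perf stat -j` run.

```python
import json

# Three records copied from the sample output in the commit message.
# A real consumer would read these lines from perf's output instead.
SAMPLE = '''\
{"interval" : 1.001089747, "counter-value" : "9237201.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 16045375017, "pcnt-running" : 100.00, "metric-value" : "1.009559", "metric-unit" : "of all branches", "metric-threshold" : "good"}
{"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"}
'''


def metrics_with_threshold(text, wanted):
    """Return (metric-unit, metric-value) pairs whose metric-threshold equals `wanted`.

    Hypothetical helper: each non-empty line is parsed as one JSON object.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # The key is absent when the threshold is unknown, so use .get().
        if record.get("metric-threshold") == wanted:
            hits.append((record["metric-unit"], float(record["metric-value"])))
    return hits


bad = metrics_with_threshold(SAMPLE, "bad")
good = metrics_with_threshold(SAMPLE, "good")
print(bad)
print(good)
```

Records without a threshold, such as `tma_retiring` in the sample, are skipped rather than treated as errors.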
Ian Rogers	e1cc918b6c	perf stat: Drop metric-unit if unit is NULL Avoid cases like: ``` $ perf stat -a -M topdownl1 -j -I 1000 ... {"interval" : 11.127757275, "counter-value" : "85715898.000000", "unit" : "", "event" : "IDQ.MITE_UOPS", "event-runtime" : 988376123, "pcnt-running" : 100.00, "metric-value" : "0.000000", "metric-unit" : "(null)"} ... ``` If there is no unit then drop the metric-value too as: Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	1133e7f7dc	perf stat: Display "none" for NaN with metric only json Return earlier for an empty unit case. If snprintf of the fmt doesn't produce digits between vals and ends, as happens with NaN, make the value "none" as happens in print_metric_end. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	9809b2b1f2	perf stat: Fix/add parameter names for print_metric The print_metric parameter names were rearranged, fix and add comments in the stat-shadow callers to ensure they are correct. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	58fc358a3e	perf color: Add printf format checking and resolve issues Add printf format checking to vararg printf routines in color.h. Resolve build errors/bugs that are found through this checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:44:26 -07:00
Ian Rogers	4585038b8e	perf probe: Fix libdw memory leak Add missing dwarf_cfi_end to free memory associated with probe_finder cfi_eh which is allocated and owned via a call to dwarf_getcfi_elf. Confusingly cfi_dbg shouldn't be freed as its memory is owned by the passed in debuginfo struct. Add comments to highlight this. This addresses leak sanitizer issues seen in: tools/perf/tests/shell/test_uprobe_from_different_cu.sh Fixes: `270bde1e76` ("perf probe: Search both .eh_frame and .debug_frame sections for probe location") Signed-off-by: Ian Rogers <irogers@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241016235622.52166-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:43:14 -07:00
Ian Rogers	1280f012e0	perf disasm: Fix capstone memory leak The insn argument passed to cs_disasm needs freeing. To support accurately having count, add an additional free_count variable. Fixes: `c5d60de181` ("perf annotate: Add support to use libcapstone in powerpc") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: David S. Miller <davem@davemloft.net> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241016235622.52166-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 12:43:14 -07:00
Athira Rajeev	54f9aa1092	tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events perf list picks the events supported for a specific platform from pmu-events/arch/powerpc/<platform>. For example, power10 events are in pmu-events/arch/powerpc/power10 and power9 events are in pmu-events/arch/powerpc/power9. Which platform to pick is decided based on the PVR value in powerpc. The PVR value is matched against pmu-events/arch/powerpc/mapfile.csv Example: Format: PVR,Version,JSON/file/pathname,Type 0x004[bcd][[:xdigit:]]{4},1,power8,core 0x0066[[:xdigit:]]{4},1,power8,core 0x004e[[:xdigit:]]{4},1,power9,core 0x0080[[:xdigit:]]{4},1,power10,core 0x0082[[:xdigit:]]{4},1,power10,core The code gets the PVR from the system using the get_cpuid_str function in arch/powerpc/util/headers.c (from SPRN_PVR) and compares it with the value from mapfile.csv In compat mode, say when a partition is booted in power9 mode on a power10 system, this picks the wrong events, because the PVR points to power10 whereas events should be picked from the power9 folder. To support generic events, add a new folder pmu-events/arch/powerpc/compat to contain the ISA architected events that are supported in compat mode. Also return 0x00ffffff as the pvr when booted in compat mode. Based on this pvr value, json will pick events from pmu-events/arch/powerpc/compat Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241010145107.51211-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 11:25:00 -07:00
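The PVR lookup described above can be sketched roughly as follows. This is an illustration only, not perf's implementation: the function names are hypothetical, the `0x00ffffff,1,compat,core` row is an assumed mapfile entry (the commit message gives the PVR value and the folder but not the row itself), and the `0x%08x`-style PVR strings are likewise an assumption. Python's `re` has no POSIX character classes, so `[[:xdigit:]]` is translated first.

```python
import re

# The first five rows are copied from the commit message.
# The last row is HYPOTHETICAL: it assumes the compat PVR maps to "compat".
MAPFILE = """\
0x004[bcd][[:xdigit:]]{4},1,power8,core
0x0066[[:xdigit:]]{4},1,power8,core
0x004e[[:xdigit:]]{4},1,power9,core
0x0080[[:xdigit:]]{4},1,power10,core
0x0082[[:xdigit:]]{4},1,power10,core
0x00ffffff,1,compat,core
"""


def posix_to_python(pattern):
    """Translate the one POSIX class used in these rows into Python syntax."""
    return pattern.replace("[[:xdigit:]]", "[0-9a-fA-F]")


def lookup_events_dir(pvr, mapfile=MAPFILE):
    """Return the events directory of the first row whose PVR regex matches."""
    for row in mapfile.splitlines():
        pvr_regex, _version, directory, _type = row.split(",")
        if re.fullmatch(posix_to_python(pvr_regex), pvr):
            return directory
    return None


print(lookup_events_dir("0x00800200"))  # a power10-style PVR
print(lookup_events_dir("0x00ffffff"))  # the value returned in compat mode
```

With a table like this, returning `0x00ffffff` in compat mode selects the generic events regardless of the PVR of the underlying hardware.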
Athira Rajeev	86f45d0f17	tools/perf/pmu-events/powerpc: Add support for compat events in json perf list picks the events supported for a specific platform from pmu-events/arch/powerpc/<platform>. For example, power10 events are in pmu-events/arch/powerpc/power10 and power9 events are in pmu-events/arch/powerpc/power9. Which platform to pick is decided based on the PVR value in powerpc. The PVR value is matched against pmu-events/arch/powerpc/mapfile.csv Example: Format: PVR,Version,JSON/file/pathname,Type 0x004[bcd][[:xdigit:]]{4},1,power8,core 0x0066[[:xdigit:]]{4},1,power8,core 0x004e[[:xdigit:]]{4},1,power9,core 0x0080[[:xdigit:]]{4},1,power10,core 0x0082[[:xdigit:]]{4},1,power10,core The code gets the PVR from the system using the get_cpuid_str function in arch/powerpc/util/headers.c (from SPRN_PVR) and compares it with the value from mapfile.csv In compat mode, say when a partition is booted in power9 mode on a power10 system, add an entry to pick the ISA architected events from "pmu-events/arch/powerpc/compat". Add the json file generic-events.json, which will contain the events that are supported in compat mode. Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241010145107.51211-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 11:24:52 -07:00
Veronika Molnarova	05a62936e6	perf dso: Fix symtab_type for kmod compression During the rework of the dso structure in patch `ee756ef749` an increment was forgotten for the symtab_type in case the data for the kernel module are compressed. This affects the probing of the kernel modules, which fails if the data are not already cached. Increment the value of the symtab_type to its compressed variant so the data could be recovered successfully. Fixes: `ee756ef749` ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20241010144836.16424-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:59 -07:00
Leo Yan	e34f6ac511	perf probe: Improve log for long event name failure If a symbol name is longer than the maximum event length (64 bytes), the perf tool reports an error: # perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)" snprintf() failed: -7; the event name nbase='this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long Error: Failed to add events. The log omits the fact that the symbol name and the event name can be set separately, which is the recommended way to add a probe for a long symbol. This commit refines the log to remind users of the event syntax. After: # perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)" snprintf() failed: -7; the event name 'this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long Hint: Set a shorter event with syntax "EVENT=PROBEDEF" EVENT: Event name (max length: 64 bytes). Error: Failed to add events. Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012204725.928794-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:59 -07:00
Leo Yan	6768faf9b7	perf probe: Check group string length In the kernel, the probe group string length is limited up to MAX_EVENT_NAME_LEN (including the NULL terminator). Check for this limitation and report an error if it is exceeded. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20241012204725.928794-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Leo Yan	d08e3f14e8	perf probe: Use the MAX_EVENT_NAME_LEN macro The MAX_EVENT_NAME_LEN macro has been defined in the kernel. Use the same definition in the tool for better readability. Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012204725.928794-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Namhyung Kim	3662f82f16	perf test: Speed up some tests using perf list On my system, perf list is very slow to print all events. I think there's a performance issue in SDT and uprobes event listing. I noticed this issue while running perf test on x86, where it takes a long time to check for some CoreSight event that should be skipped quickly. Anyway, some tests use perf list to check whether the required event is available before running the test. The perf list command can take an argument to specify an event class or a (glob) pattern. But a glob pattern only suppresses output for unmatched events after checking all events. In this case, specifying an event class is better: it reduces the number of events checked and avoids buggy subsystems entirely. No functional changes intended. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20241016065654.269994-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-17 09:55:58 -07:00
Arnaldo Carvalho de Melo	39c6a35620	perf trace: The return from 'write' isn't a pid When adding an explicit beautifier for the 'write' syscall when the BPF based buffer collector was introduced, there was a cut'n'paste error that carried the syscall_fmt->errpid setting from a nearby syscall (waitid) that returns a pid. So the write return was being suppressed by the return pretty printer. Remove that field, reverting to the default return handler, which prints positive numbers as-is and interprets negative values as errnos. I actually introduced the problem while making Howard's original patch work just with the 'write' syscall, as we couldn't just look for any buffers, the ones that are filled in by the kernel couldn't use the same sys_enter BPF collector. Fixes: `b257fac12f` ("perf trace: Pretty print buffer data") Reported-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/lkml/bcf50648-3c7e-4513-8717-0d14492c53b9@linaro.org Link: https://lore.kernel.org/all/Zt8jTfzDYgBPvFCd@x1/#t Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-17 10:34:43 -03:00
Dapeng Mi	fbc798316b	perf x86/topdown: Refine helper arch_is_topdown_metrics() Leverage the existing function perf_pmu__name_from_config() to check whether an event is a topdown metrics event. perf_pmu__name_from_config() goes through the defined formats and figures out the config of pre-defined topdown events. This avoids working out the config of pre-defined topdown events from the hard-coded format strings "event=" and "umask=" and provides more flexibility. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241011110207.1032235-2-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:36:47 -07:00
Dapeng Mi	b68b5b36c7	perf x86/topdown: Make topdown metrics comparators be symmetric The commit 3b5edc0421e2 ("perf x86/topdown: Don't move topdown metric events in group") modified the topdown metrics comparator to move topdown metrics events which are not in the same group as the previous event. But it modified only the 2nd comparator, which made the comparators asymmetric. Thus modify the 1st topdown metrics comparator so that the two comparators are symmetric, and refine the comments as well. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241011110207.1032235-1-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:36:41 -07:00
Ian Rogers	42fd7cac57	perf tool_pmu: Remove duplicate io.h header Remove duplicate inclusion of api/io.h. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410131417.ynhvnEJb-lkp@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241016160413.51587-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-16 13:35:04 -07:00
Leo Yan	ea2ead4224	perf arm-spe: Add Cortex CPUs to common data source encoding list Add Cortex-A720, Cortex-A725, Cortex-X1C, Cortex-X3 and Cortex-X925 into the common data source encoding list. For each of these CPUs, its technical reference manual defines the data source packet as the common encoding format. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-8-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:32 -07:00
Besar Wicaksono	041c0e5715	perf arm-spe: Add Neoverse-V2 to common data source encoding list Add Neoverse-V2 MIDR to the common data source encoding range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:32 -07:00
Leo Yan	6bcf54c89b	perf arm-spe: Remove the unused 'midr' field The 'midr' field is replaced by the MIDR values stored in metadata (per CPU wise). Remove the 'midr' field as it is no longer used. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	ba5e7169e5	perf arm-spe: Use metadata to decide the data source feature Use the info in the metadata to decide if the data source feature is supported. The CPU MIDR must be in the CPU list for the common data source encoding. Metadata version 1 doesn't include MIDR info. In this case, because the info needed for the decision is absent, print a warning reminding users to upgrade the tool and return false. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	56ae663e76	perf arm-spe: Introduce arm_spe__is_homogeneous() Introduce the arm_spe__is_homogeneous() function, which checks whether Arm SPE is homogeneous across all CPUs. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	50b8f1d5bf	perf arm-spe: Rename the common data source encoding The Neoverse CPUs follow the common data source encoding, and other CPU variants can share the same format. Rename the CPU list and data source definitions as common data source names. This change prepares for appending more CPU variants. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Leo Yan	fb98fa3bf8	perf arm-spe: Rename arm_spe__synth_data_source_generic() The arm_spe__synth_data_source_generic() function is invoked when the tool detects that CPUs do not support data source packets and falls back to synthesizing only the memory level. Rename it to arm_spe__synth_memory_level() for better reflecting its purpose. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Howard Chu	0c383c0827	perf test: Delete unused Intel CQM test As Ian Rogers <irogers@google.com> pointed out, intel-cqm.c is neither used nor built. It was deleted in the following commit: commit `b24413180f` ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license") However, it resurfaced soon after in the following commit: commit `5c9295bfe6` ("perf tests: Remove Intel CQM perf test") It should be deleted once and for all. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Matt Fleming <mfleming@cloudflare.com> Link: https://lore.kernel.org/r/20241011055700.4142694-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	1afe05b0cf	perf evsel: Fix missing inherit + sample read check It should not clear the inherit bit simply because the kernel doesn't support the sample read with it. IOW the inherit bit should be kept when the sample read is not requested for the event. Fixes: `90035d3cd8` ("tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events") Acked-by: Ben Gainey <ben.gainey@arm.com> Link: https://lore.kernel.org/r/20241009062250.730192-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Madadi Vineeth Reddy	cd912ab3b6	perf sched timehist: Add pre-migration wait time option Pre-migration wait time is the time that a task unnecessarily spends on the runqueue of a CPU without getting switched in there. In terms of tracepoints, it is the time between sched:sched_wakeup and sched:sched_migrate_task. Let's say a task woke up on CPU2, then got migrated to CPU4 and was then switched in on CPU4. Here the pre-migration wait time is the time it spent waiting on the runqueue of CPU2 after it was woken up. The general pattern for pre-migration to occur is: sched:sched_wakeup sched:sched_migrate_task sched:sched_switch The sched:sched_waking event is used to capture the wakeup time, as it aligns with the existing code and only introduces a negligible time difference. Pre-migrations are generally not useful and they increase migrations. This metric would be helpful in testing patches mainly related to wakeup and load-balancer code paths, as better wakeup logic would choose an optimal CPU where the task would be switched in, thereby reducing pre-migrations. 
The sample output(s) when -P or --pre-migrations is used:
=================
           time    cpu task name                      wait time sch delay  run time pre-mig time
                       [tid/pid]                         (msec)    (msec)    (msec)    (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
   38456.720806 [0001] schbench[28634/28574]              4.917     4.768     1.004     0.000
   38456.720810 [0001] rcu_preempt[18]                    3.919     0.003     0.004     0.000
   38456.721800 [0006] schbench[28779/28574]             23.465    23.465     1.999     0.000
   38456.722800 [0002] schbench[28773/28574]             60.371    60.237     3.955    60.197
   38456.722806 [0001] schbench[28634/28574]              0.004     0.004     1.996     0.000
   38456.722811 [0001] rcu_preempt[18]                    1.996     0.005     0.005     0.000
   38456.723800 [0000] schbench[28833/28574]              4.000     4.000     3.999     0.000
   38456.723800 [0004] schbench[28762/28574]             42.951    42.839     3.999    39.867
   38456.723802 [0007] schbench[28812/28574]             43.947    43.817     3.999    40.866
   38456.723804 [0001] schbench[28587/28574]              7.935     7.822     0.993     0.000
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
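The definition of pre-migration wait time in the commit above (time between wakeup and migration, before the task is switched in) can be sketched as below. This is a simplified model, not the `perf sched timehist` code: the function name, the `(timestamp, event, tid)` tuple layout, integer microsecond timestamps, and the convention that a `sched_switch` tuple carries the tid of the task being switched in are all assumptions made for illustration.

```python
def pre_migration_wait(events):
    """Sum, per tid, the time between a wakeup and a following migration.

    `events` is an iterable of (timestamp_us, event_name, tid) tuples sorted
    by time. In this simplified model a "sched_switch" tuple names the task
    being switched in.
    """
    waking_at = {}  # tid -> timestamp of a wakeup not yet followed by a switch-in
    pre_mig = {}    # tid -> accumulated pre-migration wait in microseconds
    for ts_us, name, tid in events:
        if name == "sched_waking":
            waking_at[tid] = ts_us
        elif name == "sched_migrate_task":
            # Only a migration that follows a pending wakeup counts.
            if tid in waking_at:
                pre_mig[tid] = pre_mig.get(tid, 0) + ts_us - waking_at.pop(tid)
        elif name == "sched_switch":
            # Switched in without migrating: no pre-migration wait.
            waking_at.pop(tid, None)
    return pre_mig


EVENTS = [
    (1000, "sched_waking", 42),        # task 42 wakes on one CPU
    (1060, "sched_migrate_task", 42),  # and is migrated 60 us later
    (1063, "sched_switch", 42),        # then switched in on the new CPU
    (2000, "sched_waking", 7),         # task 7 wakes
    (2004, "sched_switch", 7),         # and runs where it woke up
]
result = pre_migration_wait(EVENTS)
print(result)
```

In this model task 7 never appears in the result because it was switched in on the CPU where it woke up.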
Namhyung Kim	af3902bfc1	perf tools: Remove unnecessary parentheses The hashmap API used to require parentheses for the hashmap argument if it's not a pointer type. It's now fixed so let's drop the parentheses. Link: https://lore.kernel.org/r/20241009202009.884884-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	04042674b2	perf tools: Fix possible compiler warnings in hashmap The hashmap__for_each_entry[_safe] is accessing 'map' as if it's a pointer. But it does without parentheses so passing a static hash map with an ampersand (like &slab_hash below) caused compiler warnings due to unmatched types. In file included from util/bpf_lock_contention.c:5: util/bpf_lock_contention.c: In function ‘exit_slab_cache_iter’: linux/tools/perf/util/hashmap.h:169:32: error: invalid type argument of ‘->’ (have ‘struct hashmap’) 169 \| for (bkt = 0; bkt < map->cap; bkt++) \ \| ^~ util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’ 105 \| hashmap__for_each_entry(&slab_hash, cur, bkt) \| ^~~~~~~~~~~~~~~~~~~~~~~ /home/namhyung/project/linux/tools/perf/util/hashmap.h:170:31: error: invalid type argument of ‘->’ (have ‘struct hashmap’) 170 \| for (cur = map->buckets[bkt]; cur; cur = cur->next) \| ^~ util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’ 105 \| hashmap__for_each_entry(&slab_hash, cur, bkt) \| ^~~~~~~~~~~~~~~~~~~~~~~ Cc: bpf@vger.kernel.org Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241009202009.884884-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 12:04:31 -07:00
Namhyung Kim	77b679453d	Linux 6.12-rc3 Merge tag 'v6.12-rc3' into perf-tools-next To get the fixes in the current perf-tools tree. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:45:28 -07:00
Namhyung Kim	1a3d6a9723	perf tools: Fix compiler error in util/tool_pmu.c util/tool_pmu.c: In function 'evsel__tool_pmu_read': util/tool_pmu.c:419:55: error: passing argument 2 of 'tool_pmu__read_event' from incompatible pointer type [-Werror=incompatible-pointer-types] 419 \| if (!tool_pmu__read_event(ev, &val)) { \| ^~~~ \| \| \| long unsigned int * util/tool_pmu.c:335:56: note: expected 'u64 *' {aka 'long long unsigned int *'} but argument is of type 'long unsigned int *' 335 \| bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result) \| ~~~~~^~~~~~ Link: https://lore.kernel.org/r/Zw1XIGML32VaxE0t@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:40:30 -07:00
Athira Rajeev	9ea671d1b2	tools/perf/tests: Remove duplicate evlist__delete in tests/tool_pmu.c The testcase for tool_pmu failed in powerpc as below: ./perf test -v "Parsing without PMU name" 8: Tool PMU : 8.1: Parsing without PMU name : FAILED! This happens when parse_events results in either skip or fail of an event. Because the code invokes evlist__delete(evlist) and "goto out". ret = parse_events(evlist, str, &err); if (ret) { evlist__delete(evlist); But in the "out" section also evlist__delete happens. out: evlist__delete(evlist); return ret; Hence remove the duplicate evlist__delete from the first path in the testcase With the change: # ./perf test -v "Parsing without PMU name" 8: Tool PMU : 8.1: Parsing without PMU name : Ok Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241013170732.71339-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:29:55 -07:00
Athira Rajeev	d94d86cee1	tools/perf/tests: Fix compilation error with strncpy in tests/tool_pmu perf fails to compile on systems with GCC version 11 as below: In file included from /usr/include/string.h:519, from /home/athir/perf-tools-next/tools/include/linux/bitmap.h:5, from /home/athir/perf-tools-next/tools/perf/util/pmu.h:5, from /home/athir/perf-tools-next/tools/perf/util/evsel.h:14, from /home/athir/perf-tools-next/tools/perf/util/evlist.h:14, from tests/tool_pmu.c:3: In function ‘strncpy’, inlined from ‘do_test’ at tests/tool_pmu.c:25:3: /usr/include/bits/string_fortified.h:95:10: error: ‘__builtin_strncpy’ specified bound 128 equals destination size [-Werror=stringop-truncation] 95 \| return __builtin___strncpy_chk (__dest, __src, __len, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 96 \| __glibc_objsize (__dest)); \| ~~~~~~~~~~~~~~~~~~~~~~~~~ The compile error is from the strncpy reference in do_test: strncpy(str, tool_pmu__event_to_str(ev), sizeof(str)); This behaviour is not observed with GCC version 8, but is observed with GCC version 11. This message is from gcc detecting truncation while using strncpy. Use snprintf instead of strncpy here to be safe. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: akanksha@linux.ibm.com Cc: hbathini@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20241013173742.71882-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-14 10:29:14 -07:00
Thomas Falcon	48966a5a48	perf report: Display columns Predicted/Abort/Cycles in --branch-history The original commit message: " Use current sort mechanism but the real .se_cmp() just returns 0 so that new columns "Predicted", "Abort" and "Cycles" are created in display but actually these keys are not the sort keys. For example: Overhead Source:Line Symbol Shared Object Predicted Abort Cycles ........ ............ ........ ............. ......... ..... ...... 38.25% div.c:45 [.] main div 97.6% 0 3 " Update missed commit from series "perf report: Show branch flags/cycles in --branch-history callgraph view" to apply to current repository so that new columns described above are visible. Link to original series: https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/ Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Suggested-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:41:23 -07:00
Ian Rogers	8c25df7af3	perf tests: Add tool PMU test Ensure parsing with and without PMU creates events with the expected config values. This ensures the tool.json doesn't get out of sync with tool_pmu_event enum. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:41:13 -07:00
Ian Rogers	609aa2667f	perf tool_pmu: Switch to standard pmu functions and json descriptions Use the regular PMU approaches with tool json events to reduce the amount of special tool_pmu code - tool_pmu__config_terms and tool_pmu__for_each_event_cb are removed. Some functions remain, like tool_pmu__str_to_event, as conveniences to metricgroups. Add tool_pmu__skip_event/tool_pmu__num_skip_events to handle the case that tool json events shouldn't appear on certain architectures. This isn't done in jevents.py due to complexity in the empty-pmu-events.c and when all vendor json is built into the tool. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:33 -07:00
Ian Rogers	c9b121b7fa	perf jevents: Add tool event json under a common architecture Introduce the notion of a common architecture/model that can be used to find event tables for common PMUs like the tool PMU. By having tool events be json standard PMU attribute configuration, descriptions, etc. can be used and these routines are already optimized for things like binary searching. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:33 -07:00
Ian Rogers	069057239a	perf tool_pmu: Move expr literals to tool_pmu Add the expr literals like "#smt_on" as tool events, this allows stat events to give the values. On my laptop with hyperthreading enabled: ``` $ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true Performance counter stats for 'true': 0 has_pmem 8 num_cores 16 num_cpus 16 num_cpus_online 1 num_dies 1 num_packages 1 smt_on 2,496,000,000 system_tsc_freq 0.001113637 seconds time elapsed 0.001218000 seconds user 0.000000000 seconds sys ``` And with hyperthreading disabled: ``` $ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true Performance counter stats for 'true': 0 has_pmem 8 num_cores 16 num_cpus 8 num_cpus_online 1 num_dies 1 num_packages 0 smt_on 2,496,000,000 system_tsc_freq 0.000802115 seconds time elapsed 0.000000000 seconds user 0.000806000 seconds sys ``` As zero matters for these values, in stat-display should_skip_zero_counter only skip the zero value if it is not the first aggregation index. The tool event implementations are used in expr but not evaluated as events for simplicity. Also core_wide isn't made a tool event as it requires command line parameters. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	b8f1a1b068	perf tool_pmu: Rename perf_tool_event__* to tool_pmu__* Now the events are associated with the tool PMU, rename the functions to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	0709a82c10	perf tool_pmu: Rename enum perf_tool_event to tool_pmu_event To better reflect the events listed are from the tool PMU. Rename the enum values from PERF_TOOL_* to TOOL_PMU__EVENT_*. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	240505b2d0	perf tool_pmu: Factor tool events into their own PMU Rather than treat tool events as a special kind of event, create a tool only PMU where the events/aliases match the existing duration_time, user_time and system_time events. Remove special parsing and printing support for the tool events, but add function calls for when PMU functions are called on a tool_pmu. Move the tool PMU code in evsel into tool_pmu.c to better encapsulate the tool event behavior in that file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	d2f3ecb0ca	perf parse-events: Expose/rename config_term_name Expose config_term_name as parse_events__term_type_str so that PMUs not in pmu.c may access it. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	c798f72c7a	perf pmu: Allow hardcoded terms to be applied to attributes Hard coded terms like "config=10" are skipped by perf_pmu__config assuming they were already applied to a perf_event_attr by parse event's config_attr function. When doing a reverse number to name lookup in perf_pmu__name_from_config, as the hardcoded terms aren't applied the config value is incorrect leading to misses or false matches. Fix this by adding a parameter to have perf_pmu__config apply hardcoded terms too (not just in parse event's config_term_common). Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Ian Rogers	c051220d38	perf pmu: Simplify an asprintf error message Use ifs rather than ?: to avoid a large compound statement. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:40:32 -07:00
Dr. David Alan Gilbert	c7c1bb78f3	perf tools: Remove unused color_fwrite_lines color_fwrite_lines() was added by 2009's commit `8fc0321f1a` ("perf_counter tools: Add color terminal output support") but has never been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241009003938.254936-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-10 23:38:33 -07:00
Thomas Falcon	9f759d41b3	perf test x86: Fix typo in intel-pt-test Change function name "is_hydrid" to "is_hybrid". Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20241007194758.78659-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-09 10:52:08 -07:00
Dr. David Alan Gilbert	3c4e558787	perf probe: Remove unused add_perf_probe_events add_perf_probe_events has been unused since 2015's commit `b02137cc65` ("perf probe: Move print logic into cmd_probe()") which confusingly now uses perf_add_probe_events. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240929010659.430208-1-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-09 10:52:08 -07:00
Linus Torvalds	b2760b8390	perf tools fixes for v6.12: - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces. - Fix static build compilation error when libdw isn't installed or is too old. - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT. - Add missing refcount put on 32-bit DSOs. - Fix disassembly of user space binaries by setting the binary_type of DSO when loading. - Update headers with the kernel sources, including asound.h, sched.h, fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZwU2dgAKCRCyPKLppCJ+ J8uaAQDEbp0lMf1S/Y6vOGbnP6mGQCewQsXtIpSA4gcRMWlCCgD+O6ZxbnBCHOzn nQfBmbT62qUGuUA38Mg7pCyRXBd8FgU= =s4JZ -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces - Fix static build compilation error when libdw isn't installed or is too old - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT - Add missing refcount put on 32-bit DSOs - Fix disassembly of user space binaries by setting the binary_type of DSO when loading - Update headers with the kernel sources, including asound.h, sched.h, fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h * tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace perf build: Fix build feature-dwarf_getlocations fail for old libdw perf build: Fix static compilation error when libdw is not installed perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT tools headers arm64: Sync arm64's cputype.h with the kernel sources perf 
tools: Cope with differences for lib/list_sort.c copy from the kernel tools check_headers.sh: Add check variant that excludes some hunks perf beauty: Update copy of linux/socket.h with the kernel sources tools headers UAPI: Sync the linux/in.h with the kernel sources perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools include UAPI: Sync linux/fcntl.h copy with the kernel sources tools include UAPI: Sync linux/sched.h copy with the kernel sources tools include UAPI: Sync sound/asound.h copy with the kernel sources perf vdso: Missed put on 32-bit dsos perf symbol: Set binary_type of dso when loading	2024-10-08 10:43:22 -07:00
Veronika Molnarova	6bff76af96	perf test attr: Add back missing topdown events With the patch `0b6c5371c0` "Add missing topdown metrics events" eight topdown metric events with numbers ranging from 0x8000 to 0x8700 were added to the test since they were added as 'perf stat' default events. Later the patch `951efb9976` "Update no event/metric expectations" kept only 4 of those events(0x8000-0x8300). Currently, the topdown events with numbers 0x8400 to 0x8700 are missing from the list of expected events resulting in a failure. Add back the missing topdown events. Fixes: `951efb9976` ("perf test attr: Update no event/metric expectations") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: mpetlan@redhat.com Link: https://lore.kernel.org/r/20240311081611.7835-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:50:12 -07:00
Leo Yan	e52abceb4b	perf arm-spe: Dump metadata with version 2 This commit dumps metadata with version 2. It dumps metadata for header and per CPU data respectively in the arm_spe_print_info() function to support metadata version 2 format. After: 0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4 Header version :2 Header size :4 PMU type v2 :13 CPU number :8 Magic :0x1010101010101010 CPU # :0 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :1 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :2 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :3 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :4 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :5 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :6 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :7 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:31 -07:00
Leo Yan	7842a4b6ff	perf arm-spe: Support metadata version 2 This commit supports metadata version 2 while remaining backward compatible with version 1's format. The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field. As version 1 is fixed at two u64 fields, checking the metadata size distinguishes version 1 from version 2 (and from any later versions). For version 2, it reads out the CPU number and retrieves the metadata info for every CPU. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:27 -07:00
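The size-based version check described in 7842a4b6ff could look roughly like the sketch below. All names here are hypothetical, and placing the version field at index 0 of the version 2 header is an assumption inferred from the field order in the dump shown for e52abceb4b, not taken from the perf sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version 1 metadata is exactly two u64 fields, per the commit message. */
#define EXAMPLE_V1_PRIV_SIZE (2 * sizeof(uint64_t))

/* Assumed position of the version field in the v2 header (hypothetical). */
#define EXAMPLE_HEADER_VERSION_IDX 0

/*
 * Return the metadata version, or 0 if the blob is too small to be valid.
 * Version 1 carries no version field, so it is recognised purely by size;
 * anything larger is assumed to start with a header holding the version.
 */
static uint64_t example_metadata_version(const uint64_t *priv, size_t priv_size)
{
	assert(priv != NULL);

	if (priv_size < EXAMPLE_V1_PRIV_SIZE)
		return 0;
	if (priv_size == EXAMPLE_V1_PRIV_SIZE)
		return 1;

	return priv[EXAMPLE_HEADER_VERSION_IDX];
}
```

The design point is that old perf.data files need no conversion: a reader only has to compare the metadata size against the fixed version 1 size before trusting any header field.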
Leo Yan	703f344d0c	perf arm-spe: Save per CPU information in metadata Save the Arm SPE information on a per-CPU basis. This approach is easier in the decoding phase for retrieving metadata based on the CPU number of every Arm SPE record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:24 -07:00
Leo Yan	59715b1908	perf arm-spe: Calculate meta data size The metadata is designed to contain a header and per CPU information. The arm_spe_find_cpus() function is introduced to identify how many CPUs support ARM SPE. Based on the CPU number, calculates the metadata size. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:20 -07:00
Leo Yan	0ca2c45404	perf arm-spe: Define metadata header version 2 The first version's metadata header structure doesn't include a field to indicate a header version, which is not friendly for extension. Define the metadata version 2 format with a new header structure and extend per CPU's metadata. In the meantime, the old metadata header will still be supported for backward compatibility. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 15:23:09 -07:00
Yoshihiro Furudera	f7ef062fe1	perf list: update option desc in man page There is a difference between the SYNOPSIS section of the help message and the man page (tools/perf/Documentation/perf-list.txt) for the perf list command. After checking, we found that the help message reflected the latest specifications. Therefore, revised the SYNOPSIS section of the man page to match the help message. Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Liang Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 10:01:01 -07:00
Veronika Molnarova	f72751a73a	perf test: Restore sample rate for perf_event_attr Test "Setup struct perf_event_attr" consists of multiple test cases that can affect the max sample rate value for perf events. Some test cases check this value as it should not be lowered under the set minimum for the given test. Currently, it is possible for the test cases to affect each other as the previous tests can lower the sample rate, leading to a possible failure of some of the future test cases as the value is not restored at any point. # 10: Setup struct perf_event_attr: --- start --- test child forked, pid 104220 Using CPUID 0x00000000413fd0c1 running './tests/attr/test-record-C0' Current sample rate: 10000 running './tests/attr/test-record-basic' Current sample rate: 900 running './tests/attr/test-record-branch-any' Current sample rate: 600 running './tests/attr/test-record-dummy-C0' Current sample rate: 600 expected sample_period=4000, got 600 FAILED './tests/attr/test-record-dummy-C0' - match failure Restore the max sample rate value for perf events to a reasonable value before each test case if its value was lowered too much to ensure the same conditions for each test case. # 10: Setup struct perf_event_attr: --- start --- test child forked, pid 107222 Using CPUID 0x00000000413fd0c1 running './tests/attr/test-record-C0' Current sample rate: 10000 running './tests/attr/test-record-basic' Current sample rate: 800 running './tests/attr/test-record-branch-any' Current sample rate: 700 unsupp './tests/attr/test-record-branch-any' running './tests/attr/test-record-branch-filter-any' Current sample rate: 10000 running './tests/attr/test-record-count' Current sample rate: 10000 running './tests/attr/test-record-data' Current sample rate: 600 running './tests/attr/test-record-dummy-C0' Current sample rate: 800 running './tests/attr/test-record-freq' Current sample rate: 10000 ... 
Cc: Michael Petlan <mpetlan@redhat.com> Cc: Radostin Stoyanov <rstoyano@redhat.com> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241003125136.15918-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-03 09:46:12 -07:00
Michael Petlan	d29d92df41	perf trace: Keep exited threads for summary Since `9ffa6c7512` ("perf machine thread: Remove exited threads by default") perf cleans exited threads up, but as said, sometimes they are necessary to be kept. The mentioned commit does not cover all the cases, we also need the information to construct the summary table in perf-trace. Before: # perf trace -s true Summary of events: After: # perf trace -s -- true Summary of events: true (383382), 64 events, 91.4% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ mmap 8 0 0.150 0.013 0.019 0.031 11.90% mprotect 3 0 0.045 0.014 0.015 0.017 6.47% openat 2 0 0.014 0.006 0.007 0.007 9.73% munmap 1 0 0.009 0.009 0.009 0.009 0.00% access 1 1 0.009 0.009 0.009 0.009 0.00% pread64 4 0 0.006 0.001 0.001 0.002 4.53% fstat 2 0 0.005 0.001 0.002 0.003 37.59% arch_prctl 2 1 0.003 0.001 0.002 0.002 25.91% read 1 0 0.003 0.003 0.003 0.003 0.00% close 2 0 0.003 0.001 0.001 0.001 3.86% brk 1 0 0.002 0.002 0.002 0.002 0.00% rseq 1 0 0.001 0.001 0.001 0.001 0.00% prlimit64 1 0 0.001 0.001 0.001 0.001 0.00% set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00% set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00% execve 1 0 0.000 0.000 0.000 0.000 0.00% [namhyung: simplified the condition] Fixes: `9ffa6c7512` ("perf machine thread: Remove exited threads by default") Reported-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20240927151926.399474-1-mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 15:29:25 -07:00
Thomas Richter	5873de9031	perf/test: perf test 86 fails on s390 Command perf test 86 fails on s390: # perf test -F 86 ping 868299 [007] 28248.013596: probe_libc:inet_pton_1: (3ff95948020) 3ff95948020 inet_pton+0x0 (inlined) 3ff9595e6e7 text_to_binary_address+0x1007 (inlined) 3ff9595e6e7 gaih_inet+0x1007 (inlined) FAIL: expected backtrace entry \ "main\+0x[[:xdigit:]]+[[:space:]]$./bin/ping.$$" got "3ff9595e6e7 gaih_inet+0x1007 (inlined)" 86: probe libc's inet_pton & backtrace it with ping : FAILED! # The root cause is a new stack layout: two functions have been added as seen below. # perf script \| tac \| grep -m1 '^ping' -B9 \| tac ping 866856 [007] 25979.494921: probe_libc:inet_pton: (3ff8ec48020) 3ff8ec48020 inet_pton+0x0 (inlined) new --> 3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined) new --> 3ff8ec5e6e7 gaih_inet+0x1007 (inlined) 3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6) 2aa3fe04bf5 main+0xff5 (/usr/bin/ping) 3ff8eb34a5b __libc_start_call_main+0x8b (/usr/lib64/libc.so.6) 3ff8eb34b5d __libc_start_main@GLIBC_2.2+0xad (inlined) 2aa3fe06a1f [unknown] (/usr/bin/ping) # The new functions in the call chain are: - text_to_binary_address() - gaih_inet(). Both functions are inlined and do not show up in the output of the nm command: # nm -a /usr/lib64/libc.so.6 \| \ grep -E '(text_to_binary_address\|gaih_inet)$' # There is no possibility to add these 2 functions depending on their existence in the C library. Add text_to_binary_address() and gaih_inet() to the list of expected functions in a compatible way and extend the regular expression. 
On s390 the backtrace can now be Before After probe_libc:inet_pton probe_libc:inet_pton inet_pton inet_pton getaddrinfo getaddrinfo \| text_to_binary_address main main \| gaih_inet Output after: # perf test -F 86 86: probe libc's inet_pton & backtrace it with ping : Ok # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: agordeev@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20241001124224.3370306-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:04 -07:00
Ben Gainey	90035d3cd8	tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events The "perf record" tool will now default to this new mode if the user specifies a sampling group when not in system-wide mode, and when "--no-inherit" is not specified. This change updates evsel to allow the combination of inherit and PERF_SAMPLE_READ. A fallback is implemented for kernel versions where this feature is not supported. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Cc: james.clark@arm.com Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ben Gainey	80c281fca2	tools/perf: Correctly calculate sample period for inherited SAMPLE_READ values Sample period calculation in deliver_sample_value is updated to calculate the per-thread period delta for events that are inherit + PERF_SAMPLE_READ. When the sampling event has this configuration, the read_format.id is used with the tid from the sample to lookup the storage of the previously accumulated counter total before calculating the delta. All existing valid configurations where read_format.value represents some global value continue to use just the read_format.id to locate the storage of the previously accumulated total. perf_sample_id is modified to support tracking per-thread values, along with the existing global per-id values. In the per-thread case, values are stored in a hash by tid within the perf_sample_id, and are dynamically allocated as the number is not known ahead of time. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Cc: james.clark@arm.com Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
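The per-thread delta calculation described in 80c281fca2 can be sketched as follows. The commit message says perf stores per-thread values in a dynamically allocated hash keyed by tid; this sketch swaps in a small fixed-size open-addressing table to stay short, and every name in it is hypothetical rather than taken from perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_SLOTS 64 /* fixed capacity; the real code allocates dynamically */

struct example_period {
	bool used;
	uint32_t tid;
	uint64_t last_total; /* previously accumulated counter total */
};

struct example_sample_id {
	bool per_thread;       /* true for inherit + PERF_SAMPLE_READ */
	uint64_t global_total; /* used when the value is global per id */
	struct example_period slots[EXAMPLE_SLOTS];
};

/* Find (or create) the storage for the previous total of this id/tid. */
static uint64_t *example_total_slot(struct example_sample_id *sid, uint32_t tid)
{
	if (!sid->per_thread)
		return &sid->global_total;

	for (unsigned int i = 0; i < EXAMPLE_SLOTS; i++) {
		struct example_period *p = &sid->slots[(tid + i) % EXAMPLE_SLOTS];

		if (!p->used) {
			p->used = true;
			p->tid = tid;
			p->last_total = 0;
			return &p->last_total;
		}
		if (p->tid == tid)
			return &p->last_total;
	}
	return NULL; /* table full */
}

/* Period for this sample = new accumulated value minus the previous one. */
static uint64_t example_period_delta(struct example_sample_id *sid,
				     uint32_t tid, uint64_t value)
{
	uint64_t *last = example_total_slot(sid, tid);
	uint64_t delta;

	assert(last != NULL);
	delta = value - *last;
	*last = value;
	return delta;
}
```

In the per-thread case, samples from tid 100 and tid 200 keep independent running totals, so interleaved samples do not distort each other's deltas; in the global case the tid is ignored and a single total is used.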
Ian Rogers	ad321b19d2	perf test: Skip not fail syscall tp fields test when insufficient permissions Clean up return value to be TEST_* rather than unspecific integer. Add test case skip reason. Skip test if EACCES comes back from evsel__newtp. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	7457bcfcfb	perf test: Skip not fail tp fields test when insufficient permissions Clean up return value to be TEST_* rather than unspecific integer. Add test case skip reason. Skip test if EACCES comes back from evsel__newtp. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	1334ee9169	perf test: Fix memory leaks on event-times error paths These error paths occur without sufficient permissions. Fix the memory leaks to make leak sanitizer happier. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Ian Rogers	7f6ccb70e4	perf stat: Fix affinity memory leaks on error path Missed cleanup when an error occurs. Fixes: `49de179577` ("perf stat: No need to setup affinities when starting a workload") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
Kan Liang	8d7f85e323	perf jevents: Don't stop at the first matched pmu when searching a events table The "perf all PMU test" fails on a Coffee Lake machine. The failure is caused by the below change in the commit `e2641db83f` ("perf vendor events: Add/update skylake events/metrics"). + { + "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles", + "Counter": "FIXED", + "EventCode": "0xff", + "EventName": "UNC_CLOCK.SOCKET", + "PerPkg": "1", + "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.", + "Unit": "cbox_0" } The other cbox events have the unit name "CBOX", while the fixed counter has a unit name "cbox_0". So the events_table will maintain separate entries for cbox and cbox_0. The perf_pmus__print_pmu_events() calculates the total number of events, allocates an aliases buffer, stores all the events into the buffer, sorts them, and prints all the aliases one by one. The problem is that the calculated total number of events doesn't match the stored events in the aliases buffer. The perf_pmu__num_events() is used to calculate the number of events. It invokes the pmu_events_table__num_events() to go through the entire events_table to find all events. Because of the pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So the events for cbox and cbox_0 are all counted. When storing events into the aliases buffer, the perf_pmu__for_each_event() only processes the events for cbox. Since a bigger buffer was allocated, the trailing entries are all 0. When printing all the aliases, null is output, which triggers the failure. The mismatch was introduced by commit `e3edd6cf63` ("perf pmu-events: Reduce processed events by passing PMU"). The pmu_events_table__for_each_event() stops immediately once a pmu is set. But for uncore, especially this case, this approach is wrong and does not match what perf does in the perf_pmu__num_events(). 
With the patch, $ perf list pmu \| grep -A 1 clock.socket unc_clock.socket [This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_cbox_0 $ perf test "perf all PMU test" 107: perf all PMU test : Ok Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/all/202407101021.2c8baddb-oliver.sang@intel.com/ Fixes: `e3edd6cf63` ("perf pmu-events: Reduce processed events by passing PMU") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241001021431.814811-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-10-02 14:58:03 -07:00
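The count/store mismatch described in 8d7f85e323 can be reproduced in miniature. The sketch below is an illustration only: the table layout, `example_uncore_match()` and `example_count()` are hypothetical, and the matcher is a simplified guess at "ignore the uncore_ prefix and a numeric suffix", not the real pmu_uncore_alias_match().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One per-PMU entry of a (hypothetical) events table. */
struct example_table {
	const char *pmu_name;
	size_t num_events;
};

/* Match "uncore_cbox_0" against table names "cbox" and "cbox_0". */
static bool example_uncore_match(const char *pmu, const char *table_name)
{
	size_t len = strlen(table_name);
	const char *rest;

	if (strncmp(pmu, "uncore_", 7) == 0)
		pmu += 7;
	if (strncmp(pmu, table_name, len) != 0)
		return false;

	rest = pmu + len;
	if (*rest == '\0')
		return true; /* exact match */
	if (*rest != '_' || rest[1] == '\0')
		return false;
	for (rest++; *rest; rest++) /* remaining part must be "_<digits>" */
		if (!isdigit((unsigned char)*rest))
			return false;
	return true;
}

/*
 * Sum the events of every matching table entry. With stop_at_first set,
 * the walk ends after the first match, mimicking the behaviour the commit
 * message identifies as the source of the mismatch.
 */
static size_t example_count(const struct example_table *tables, size_t n,
			    const char *pmu, bool stop_at_first)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (!example_uncore_match(pmu, tables[i].pmu_name))
			continue;
		total += tables[i].num_events;
		if (stop_at_first)
			break;
	}
	return total;
}
```

With entries for "cbox" (say 10 events) and "cbox_0" (1 event), a full walk for "uncore_cbox_0" sees 11 events while a walk that stops at the first match sees only 10, which is the kind of gap that leaves zeroed entries in a buffer sized from the larger count.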
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Ilkka Koskinen	e934a35e3c	perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace If one builds perf with DEBUG=1, captures data on multiple CPUs and finally runs 'perf report -C <cpu>' for only one of the cpus, assert() aborts the program. This happens because there are empty queues with format set. This patch changes the condition to abort only if a queue is not empty and if the format is unset. $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1 $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1 Aborted (core dumped) Fixes: `57880a7966` ("perf: cs-etm: Allocate queues for all CPUs") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Yang Jihong	a530337ba9	perf build: Fix build feature-dwarf_getlocations fail for old libdw For libdw versions below 0.177, libdl.a must be linked in addition to libebl.a during static compilation, otherwise feature-dwarf_getlocations compilation will fail. Before: $ make LDFLAGS=-static BUILD: Doing 'make -j20' parallel build <SNIP> Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157 <SNIP> $ cat ../build/feature/test-dwarf_getlocations.make.output /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend': (.text+0x20): undefined reference to `dlclose' collect2: error: ld returned 1 exit status After: $ make LDFLAGS=-static <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> $ ./perf probe Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...] or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...] or: perf probe [<options>] --del '[GROUP:]EVENT' ... or: perf probe --list [GROUP:]EVENT ... <SNIP> Fixes: `536661da6e` ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Yang Jihong	43f6564f18	perf build: Fix static compilation error when libdw is not installed If libdw is not installed in build environment, the output of 'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be empty and the shell test will have the following error: /bin/sh: 1: test: -lt: unexpected operator Before: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found /bin/sh: 1: test: -lt: unexpected operator After: 1. libdw is not installed: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR 2. libdw version is lower than 0.177 $ pkg-config --modversion libdw 0.176 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> INSTALL libsubcmd_headers INSTALL libapi_headers INSTALL libperf_headers INSTALL libsymbol_headers INSTALL libbpf_headers LINK perf 3. libdw version is higher than 0.177 $ pkg-config --modversion libdw 0.186 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> CC util/bpf-utils.o CC util/pfm.o LD util/perf-util-in.o LD perf-util-in.o AR libperf-util.a LINK perf Fixes: `536661da6e` ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
James Clark	008979cc69	perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT The linked fixes commit added an #include "dwarf-aux.h" to disasm.h which gets picked up in a lot of places. Without HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include errno.h to fix the following build error: In file included from util/disasm.h:8, from util/annotate.h:16, from builtin-top.c:23: util/dwarf-aux.h: In function 'die_get_var_range': util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function) 183 \| return -ENOTSUP; \| ^~~~~~~ Fixes: `782959ac24` ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking") Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 18:21:49 -03:00
Arnaldo Carvalho de Melo	36110669dd	perf tools: Cope with differences for lib/list_sort.c copy from the kernel With `6d74e1e371` ("tools/lib/list_sort: remove redundant code for cond_resched handling") we need to use the newly added hunk based exceptions when comparing the copy we carry in tools/lib/ to the original file, do it by adding the hunks that we know will be the expected diff. If at some point the original file is updated in other parts, then we should flag and check the file for update. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 15:07:32 -03:00
Arnaldo Carvalho de Melo	cd46ea5ab4	tools check_headers.sh: Add check variant that excludes some hunks With `6d74e1e371` ("tools/lib/list_sort: remove redundant code for cond_resched handling") we end up with a multi-line variation in the merge_final() implementation, one that the simple line based exceptions we had so far can't cope. Thus this check has been failing: Warning: Kernel ABI header differences: diff -u tools/lib/list_sort.c lib/list_sort.c So add a new check routine that uses grep -vf to exclude some hunks that we store in the tools/perf/check-header_ignore_hunks/ directory. This first patch is just the new check routine, the next one will use it to check lib/list_sort.c. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-10-02 14:50:44 -03:00
Dapeng Mi	80f192724e	perf tests: Add more topdown events regroup tests Add more test cases to cover all supported topdown events regroup cases. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-7-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	0836aa6008	perf tests: Add topdown events counting and sampling tests Add counting and leader sampling tests to verify topdown events including raw format can be reordered correctly. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-6-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	387892723a	perf tests: Add leader sampling test in record tests Add leader sampling test to validate event counts are captured into record and the count value is consistent. Suggested-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-5-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	3b5edc0421	perf x86/topdown: Don't move topdown metric events in group When running the perf command below, an error is reported. perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 (cpu) size 168 config 0x400 (slots) sample_type IP\|TID\|TIME\|READ\|CPU\|PERIOD\|IDENTIFIER read_format ID\|GROUP\|LOST disabled 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 4 (cpu) size 168 config 0x8000 (topdown-retiring) { sample_period, sample_freq } 4000 sample_type IP\|TID\|TIME\|READ\|CPU\|PERIOD\|IDENTIFIER read_format ID\|GROUP\|LOST freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 sys_perf_event_open failed, error -22 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring). The error occurs because the events are regrouped: the topdown-retiring event is moved to immediately after the slots event and then has to do the sampling, but the Intel PMU driver doesn't support sampling topdown metrics events. Topdown metrics events only need to be in a group that has the slots event as leader; they do not need to come immediately after the slots event. Moving them there during regrouping is therefore overkill and causes the issue above. Thus don't move topdown metrics events forward if they are already in a group. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-4-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	1e53e9d178	perf x86/topdown: Correct leader selection with sample_read enabled Addresses an issue where, in the absence of a topdown metrics event within a sampling group, the slots event was incorrectly bypassed as the sampling leader when sample_read was enabled. perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1 In this case, the slots event should be sampled as leader but the branches event is sampled in fact like the verbose output shows. perf_event_attr: type 4 (cpu) size 168 config 0x400 (slots) sample_type IP\|TID\|TIME\|READ\|CPU\|IDENTIFIER read_format ID\|GROUP\|LOST disabled 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 168 config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS) { sample_period, sample_freq } 10000 sample_type IP\|TID\|TIME\|READ\|CPU\|IDENTIFIER read_format ID\|GROUP\|LOST sample_id_all 1 exclude_guest 1 The sample period of slots event instead of branches event is reset to 0. This fix ensures the slots event remains the leader under these conditions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-3-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:44 -07:00
Dapeng Mi	39820ced2a	perf x86/topdown: Complete topdown slots/metrics events check Checking whether an event is a topdown slots or topdown metrics event by comparing only the event name is incomplete, since the user may specify the event in RAW format, e.g. perf stat -e '{instructions,cpu/r400/,cpu/r8300/}' sleep 1 Performance counter stats for 'sleep 1': <not counted> instructions <not counted> cpu/r400/ <not supported> cpu/r8300/ 1.002917796 seconds time elapsed 0.002955000 seconds user 0.000000000 seconds sys The RAW format slots and topdown-be-bound events are not recognized, so the events are not regrouped, which eventually causes an error. Thus add two helpers arch_is_topdown_slots()/arch_is_topdown_metrics() to detect whether an event is topdown slots/metrics event by comparing the event config directly, and use these two helpers to replace the original event name comparisons. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20240913084712.13861-2-dapeng1.mi@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 15:23:43 -07:00
Ian Rogers	4d1b305dc8	perf evsel: Reduce a variables scope In __evsel__config_callchain avoid computing arch until code path that uses it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240918223116.127386-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 13:48:27 -07:00
Yicong Yang	f0cb9fa7a5	perf vender events arm64: Use "Topdown" as topdown metric group name HiSilicon HIP08 does support Topdown metrics but perf tool complains when trying to count Topdown metrics: [root@localhost tracing]# perf stat --topdown Topdown requested but the topdown metric groups aren't present. (See perf list the metric groups have names like TopdownL1) It's because tool's using "Topdown" as the metric group name[1] rather than "TopDown", so follow the convention. This is introduced by [2] which allows to use json metrics to support --topdown function. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-stat.c?h=v6.11-rc1#n1994 [2] commit `1647cd5b88` ("perf stat: Implement --topdown using json metrics") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: prime.zeng@hisilicon.com Cc: hejunhao3@huawei.com Cc: linuxarm@huawei.com Cc: shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240912063903.31460-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 13:42:16 -07:00
Arnaldo Carvalho de Melo	d164868879	perf beauty: Update copy of linux/socket.h with the kernel sources To pick the changes in: `8f0b3cc9a4` ("tcp: RX path for devmem TCP") That don't result in any changes in the tables generated from that header. But while updating I noticed we need to support the new MSG_SOCK_DEVMEM flag in the hard coded table for the msg flags table, add it. This silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Please see tools/include/uapi/README for details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZvrO_eT9e_41xrNv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-30 17:23:38 -03:00
Arnaldo Carvalho de Melo	c94cd9508b	perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources To pick up the change in: `a1fab3e69d` ("x86/irq: Fix comment on IRQ vector layout") That just adds some comments, so no changes in perf tooling, just silences this build warning: diff -u tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZvrKT7oQc1AOv6Vk@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-30 17:23:38 -03:00
Arnaldo Carvalho de Melo	58f969b7a8	tools include UAPI: Sync linux/fcntl.h copy with the kernel sources Picking the changes from: `4356d575ef` ("fhandle: expose u64 mount id to name_to_handle_at(2)") `b4fef22c2f` ("uapi: explain how per-syscall AT_* flags should be allocated") `820a185896` ("fcntl: add F_CREATED_QUERY") It just moves AT_REMOVEDIR around, and adds a bunch more AT_ for renameat2() and name_to_handle_at(). We need to improve this situation, as not all AT_ defines are applicable to all fs flags... This adds support for those new AT_ defines, addressing this build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/lkml/ZvrIKL3cREoRHIQd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-30 17:23:31 -03:00
Jiapeng Chong	9865f0a209	perf test: Use ARRAY_SIZE for array length Use of macro ARRAY_SIZE to calculate array size minimizes the redundant code and improves code reusability. ./tools/perf/tests/demangle-java-test.c:31:34-35: WARNING: Use ARRAY_SIZE. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11173 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20240929093045.10136-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-30 12:59:42 -07:00
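For readers unfamiliar with the idiom in the ARRAY_SIZE commit above, here is a minimal sketch. The macro is a simplified local definition written for illustration (the kernel and tools headers add a compile-time check that the argument really is an array), and `samples` and `count_samples()` are hypothetical names, not taken from demangle-java-test.c.

```c
#include <stddef.h>

/* Simplified definition for illustration only. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical test table. */
static const char *const samples[] = { "first", "second", "third" };

size_t count_samples(void)
{
	/* Same value as sizeof(samples) / sizeof(samples[0]), written once. */
	return ARRAY_SIZE(samples);
}
```

The count follows the initializer list automatically, so adding an entry to the table needs no other change.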
Arnaldo Carvalho de Melo	7ae76b32f9	tools include UAPI: Sync linux/sched.h copy with the kernel sources Picking the changes from: `f0e1a0643a` ("sched_ext: Implement BPF extensible scheduler class") The inclusion of the SCHED_EXT define doesn't cause any change in behaviour in tools/perf. This just silences this perf tools build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/lkml/ZvrDShNVXotZpiwk@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-30 12:38:01 -03:00
Arnaldo Carvalho de Melo	c850897b6c	tools include UAPI: Sync sound/asound.h copy with the kernel sources Picking the changes from: `37745918e0` ("ALSA: timer: Introduce virtual userspace-driven timers") Which entails no changes in the tooling side as it only introduces new SNDRV_TIMER_IOCTL_ ioctls, and the ones tracked by scripts in tools/perf/trace/beauty/ are only SNDRV_PCM_IOCTL_ and SNDRV_CTL_IOCTL_, we still need to support SNDRV_TIMER_IOCTL_ ones, but that probably will be one of the first for a BTF enumeration based approach :-) This silences this perf tools build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ivan Orlov <ivan.orlov0322@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/lkml/ZvrB-g_E7g2ArlYW@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-30 12:21:54 -03:00
Ian Rogers	424aafb61a	perf vdso: Missed put on 32-bit dsos If the dso type doesn't match then NULL is returned but the dso should be put first. Fixes: `f649ed80f3` ("perf dsos: Tidy reference counting and locking") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-27 15:38:52 -03:00
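The pattern behind the vdso fix above, dropping a reference before an early NULL return, can be sketched like this. It is an illustration under assumed names: `struct obj`, `obj_get()`, `obj_put()` and `find_typed()` are hypothetical and much simpler than perf's dso reference counting.

```c
#include <stddef.h>

/* Hypothetical, simplified reference-counted object; not perf's struct dso. */
struct obj {
	int refcnt;
	int type;
};

struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcnt++;
	return o;
}

void obj_put(struct obj *o)
{
	if (o)
		o->refcnt--;
}

/* Hand a reference to the caller only when the type matches. */
struct obj *find_typed(struct obj *candidate, int wanted_type)
{
	struct obj *o = obj_get(candidate);	/* the lookup takes a reference */

	if (o && o->type != wanted_type) {
		obj_put(o);	/* drop it before bailing out, or it leaks */
		return NULL;
	}
	return o;
}
```

Without the `obj_put()` on the mismatch path, every failed lookup would leave the count one too high.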
Thomas Richter	b38c49d829	perf/test: Speed up test case perf annotate basic tests perf test 70 takes a long time. One culprit is the output of the perf annotate command. Enabled by default are - demangling of symbol names - interleaving of source code with assembly code. Disable symbol demangling and abort the annotation after the first 250 lines. This speeds up the test case considerably, for example on s390: Output before: # time perf test 70 70: perf annotate basic tests : Ok ..... real 2m7.467s user 1m26.869s sys 0m34.086s # Output after: # time perf test 70 70: perf annotate basic tests : Ok real 0m3.341s user 0m1.606s sys 0m0.362s # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20240917085706.249691-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 23:42:46 -07:00
Thomas Falcon	4f23fc34cc	perf mem: Fix printing PERF_MEM_LVLNUM_{L2_MHB\|MSC} With commit `8ec9497d3e` ("tools/include: Sync uapi/linux/perf.h with the kernel sources"), 'perf mem report' gives an incorrect memory access string. ... 0.02% 1 3644 L5 hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480 ... This occurs because, if no entry exists in mem_lvlnum, perf_mem__lvl_scnprintf will default to 'L%d, lvl', which in this case for PERF_MEM_LVLNUM_L2_MHB is 0x05. Add entries for PERF_MEM_LVLNUM_L2_MHB and PERF_MEM_LVLNUM_MSC to mem_lvlnum, so that the correct strings are printed. ... 0.02% 1 3644 L2 MHB hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480 ... Fixes: `8ec9497d3e` ("tools/include: Sync uapi/linux/perf.h with the kernel sources") Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20240926144040.77897-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 23:37:22 -07:00
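The fallback behaviour described in the perf mem commit above can be sketched as a table lookup with a default. This is an assumption-laden illustration, not perf's perf_mem__lvl_scnprintf: `lvl_names`, `ARRAY_LEN` and `lvl_scnprintf()` are hypothetical names, and only the 0x05 value and the "L2 MHB" string come from the commit message.

```c
#include <stdio.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative one-entry table; indices without an initializer are NULL. */
static const char *const lvl_names[] = {
	[0x05] = "L2 MHB",
};

/* Format a level number, falling back to "L<n>" when the table has no entry. */
int lvl_scnprintf(char *buf, size_t sz, unsigned int lvl)
{
	if (lvl < ARRAY_LEN(lvl_names) && lvl_names[lvl])
		return snprintf(buf, sz, "%s", lvl_names[lvl]);
	return snprintf(buf, sz, "L%u", lvl);
}
```

Removing the `[0x05]` entry makes the same call fall through to the default and produce "L5", which is the misleading string shown in the report.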
Madadi Vineeth Reddy	6adeb277fe	perf sched replay: Remove unused parts of the code The sleep_sem semaphore and the specific_wait field (member of sched_atom) are initialized but not used anywhere in the code, so this patch removes them. The SCHED_EVENT_MIGRATION case in perf_sched__process_event() is currently not used and is also removed. Additionally, prev_state in add_sched_event_sleep() is marked with __maybe_unused and is not utilized anywhere in the function. This patch removes the parameter. If the task_state parameter was intended for future use, it can be reintroduced when needed. No functionality change intended. Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240917090100.42783-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 15:47:57 -07:00
James Clark	65d1182191	perf test: Add a test for default perf stat command Test that one cycles event is opened for each core PMU when "perf stat" is run without arguments. The event line can either be output as "pmu/cycles/" or just "cycles" if there is only one PMU. Include 2 spaces for padding in the one PMU case to avoid matching when the word cycles is included in metric descriptions. Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-8-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:12 -07:00
James Clark	14b6b269f4	perf test: Make stat test work on DT devices PMUs aren't listed in /sys/devices/ on DT devices, so change the search directory to /sys/bus/event_source/devices which works everywhere. Also add armv8_cortex_* as a known PMU type to search for to make the test run on more devices. Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-7-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:12 -07:00
Ian Rogers	d7d156fc5e	perf evsel: Remove pmu_name "evsel->pmu_name" is only ever assigned a strdup of "pmu->name", a strdup of "evsel->pmu_name" or NULL. As such, prefer to use "pmu->name" directly and even to directly compare PMUs than PMU names. For safety, add some additional NULL tests. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> [ Fix arm-spe.c usage of pmu_name and empty PMU name ] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-6-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:11 -07:00
Ian Rogers	e2216fac1e	perf evsel x86: Make evsel__has_perf_metrics work for legacy events Use PMU interface to better detect core PMU for legacy events. Look for slots event on core PMU if it is appropriate for the event. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-5-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:11 -07:00
Ian Rogers	d38461e977	perf stat: Remove evlist__add_default_attrs use strings add_default_attributes would add evsels from pre-created perf_event_attr structures; however, this needed fixing for hybrid, as the extended PMU type was necessary for each core PMU. The logic for this was in an arch specific x86 function and wasn't present for ARM, meaning that default events weren't being opened on all PMUs on ARM. Change the creation of the default events to use parse_events and strings as that will open the events on all PMUs. Rather than try to detect events on PMUs before parsing, parse the event but skip its output in stat-display. The previous order of hardware events was: cycles, stalled-cycles-frontend, stalled-cycles-backend, instructions. As instructions is a more fundamental concept the order is changed to: instructions, cycles, stalled-cycles-frontend, stalled-cycles-backend. Closes: https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/ Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> [Don't display unsupported default events except 'cycles'] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:11 -07:00
Ian Rogers	057f8bfc6f	perf stat: Uniquify event name improvements Without aggregation on Intel: ``` $ perf stat -e instructions,cycles ... ``` will use "cycles" for the name of the legacy cycles event but as "instructions" has a sysfs name it will add a "[cpu]" PMU suffix. This often breaks things as the space between the event and the PMU name looks like an extra column. The existing uniquify logic was also uniquifying in cases when all events are core and not with uncore events, it was not correctly handling modifiers, etc. Change the logic so that an initial pass that can disable uniquification is run. For individual counters, disable uniquification in more cases such as for consistency with legacy events or for libpfm4 events. Don't use the "[pmu]" style suffix in uniquification, always use "pmu/.../". Change how modifiers/terms are handled in the uniquification so that they look like parse-able events. This fixes "102: perf stat metrics (shadow stat) test:" that has been failing due to "instructions [cpu]" breaking its column/awk logic when values aren't aggregated. This started happening when instructions could match a sysfs rather than a legacy event, so the fixes tag reflects this. Fixes: `617824a7f0` ("perf parse-events: Prefer sysfs/JSON hardware events over legacy") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> [ Fix Intel TPEBS counting mode test ] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:11 -07:00
Ian Rogers	22a4db3c36	perf evsel: Add alternate_hw_config and use in evsel__match There are cases where we want to match events like instructions and cycles with legacy hardware values, in particular in stat-shadow's hard coded metrics. An evsel's name isn't a good point of reference as it gets altered, strstr would be too imprecise and re-parsing the event from its name is silly. Instead, hold the legacy hardware event name, determined during parsing, in the evsel for this matching case. Inline evsel__match2 that is only used in builtin-diff. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 13:26:11 -07:00
Ian Rogers	7e73ea4029	perf test: Ignore security failures in all PMU test Refactor code to have some more error diagnosis on traps, etc. and to do less work on each line. Add an ignore situation for security failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240925173013.12789-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-26 11:17:25 -07:00
Namhyung Kim	77b004f4c5	perf symbol: Do not fixup end address of labels When it loads symbols from an ELF file, it loads label symbols which have 0 size. Sometimes a label has the same address as other symbols and might shadow the original symbols because perf fixes up the size of the symbol. For example, in my system __do_softirq is shadowed and perf only accepts __softirqentry_text_start instead. But it should accept __do_softirq. $ readelf -sW vmlinux \| grep -e __do_softirq -e __softirqentry_text_start 105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq 111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start $ perf annotate --stdio __do_softirq Error: The perf.data data has no samples! $ perf annotate --stdio __softirqentry_text_start \| head Percent \| Source code & Disassembly of vmlinux for cycles (26 samples, percent: local period) --------------------------------------------------------------------------------------------------- : 0 0xffffffff82000000 <__softirqentry_text_start>: 0.00 : ffffffff82000000: nopl (%rax,%rax) 30.77 : ffffffff82000005: pushq %rbp 3.85 : ffffffff82000006: movq %rsp, %rbp 0.00 : ffffffff82000009: pushq %r15 3.85 : ffffffff8200000b: pushq %r14 3.85 : ffffffff8200000d: pushq %r13 0.00 : ffffffff8200000f: pushq %r12 We can ignore NOTYPE symbols in symbols__fixup_end() so that it can pick __do_softirq() in choose_best_symbol(). This should be fine since most symbols have either STT_FUNC or STT_OBJECT. Link: https://lore.kernel.org/r/20240912224208.3360116-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-25 22:37:25 -07:00
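The effect of skipping NOTYPE labels during end-address fixup can be sketched in Python. This is a simplified model under stated assumptions, not perf's C implementation: the `Sym` class, the `fixup_end` and `covering` helpers, the `skip_notype` switch and the `next_func` symbol are all hypothetical. Only the two symbol names, the start address and the 814-byte size come from the readelf output in the commit message.

```python
from dataclasses import dataclass

STT_NOTYPE, STT_FUNC = 0, 2  # ELF symbol type values


@dataclass
class Sym:
    name: str
    start: int
    end: int
    type: int


def fixup_end(symbols, skip_notype=True):
    """Extend zero-size symbols up to the next higher start address."""
    ordered = sorted(symbols, key=lambda s: s.start)
    for i, cur in enumerate(ordered):
        if cur.end != cur.start:
            continue  # already has a size
        if skip_notype and cur.type == STT_NOTYPE:
            continue  # leave labels at zero size
        nxt = next((s for s in ordered[i + 1:] if s.start > cur.start), None)
        if nxt is not None:
            cur.end = nxt.start
    return ordered


def make_symbols():
    base = 0xFFFFFFFF82000000
    return [
        Sym("__do_softirq", base, base + 814, STT_FUNC),
        Sym("__softirqentry_text_start", base, base, STT_NOTYPE),
        Sym("next_func", base + 0x400, base + 0x480, STT_FUNC),  # hypothetical
    ]


def covering(symbols, addr):
    """Names of symbols whose [start, end) range contains addr."""
    return [s.name for s in symbols if s.start <= addr < s.end]
```

With `skip_notype=False` the label grows to cover the same addresses as `__do_softirq`, which is the situation in which it can shadow the function. With `skip_notype=True` it stays at zero size and only the function covers those addresses.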
Xu Yang	235f0da327	perf vendor events arm64: imx95: add imx95_bandwidth_usage.lpddr4x metric Besides lpddr5, i.MX95 also supports lpddr4x. This adds a metric for lpddr4x. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Cc: shawnguo@kernel.org Cc: will@kernel.org Cc: james.clark@linaro.org Cc: mike.leach@linaro.org Cc: imx@lists.linux.dev Cc: john.g.garry@oracle.com Cc: kernel@pengutronix.de Cc: s.hauer@pengutronix.de Link: https://lore.kernel.org/r/20240924030812.3211029-1-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-25 16:09:22 -07:00
Levi Yun	b77f8c36ce	perf stat: Stop repeating when run_perf_stat() returns -1 Exit when run_perf_stat() returns an error to avoid continuously repeating the same error message. It's not expected that COUNTER_FATAL or internal errors are recoverable so there's no point in retrying. This fixes the following flood of error messages for permission issues, for example when perf_event_paranoid==3: perf stat -r 1044 -- false Error: Access to performance monitoring and observability operations is limited. ... Error: Access to performance monitoring and observability operations is limited. ... (repeated 1044 times). Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: nd@arm.com Cc: howardchu95@gmail.com Link: https://lore.kernel.org/r/20240925132022.2650180-3-yeoreum.yun@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-25 15:58:42 -07:00
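The control flow this commit describes, a repeat loop that gives up on the first fatal result instead of retrying, might look roughly like the Python sketch below. The `run_repeated` helper and its `run_once` callback are hypothetical stand-ins; the real change is in perf's C code, and only the -1 return value and the `-r 1044` repeat count are taken from the commit message.

```python
def run_repeated(run_once, repeat):
    """Call run_once() up to `repeat` times, stopping on the first -1."""
    status = 0
    runs = 0
    for _ in range(repeat):
        status = run_once()
        runs += 1
        if status == -1:
            break  # fatal error: retrying would only repeat the message
    return status, runs
```

Under this model, a run that always fails with -1 is attempted once rather than 1044 times, so the error message would appear a single time.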
Levi Yun	e880a70f80	perf stat: Close cork_fd when create_perf_stat_counter() failed When create_perf_stat_counter() fails, it doesn't close workload.cork_fd opened in evlist__prepare_workload(). This can cause a "too many open files" error while __run_perf_stat() repeats. Introduce evlist__cancel_workload() to close workload.cork_fd and wait for workload.child_pid to exit, cleaning up the child process when create_perf_stat_counter() fails. Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: nd@arm.com Cc: howardchu95@gmail.com Link: https://lore.kernel.org/r/20240925132022.2650180-2-yeoreum.yun@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-25 15:58:42 -07:00
Masum Reza	f115506d2c	perf evsel: display dmesg command instead of showing a hardcoded path In non-FHS compliant distros like NixOS, nothing resides in `/bin` and `/usr/bin`. Instead, the executable resides in `/nix/store` and is dynamically symlinked into `/run/current-system/sw/bin/`. With this patch, the `/bin` prefix is stripped from the dmesg command in the error message. Link: https://github.com/NixOS/nixpkgs/pull/258027 Signed-off-by: Masum Reza <masumrezarock100@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240922112619.149429-1-masumrezarock100@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 13:26:34 -07:00
James Clark	eb0a59e9e1	perf test: cs-etm: Test Coresight disassembly script Run a few samples through the disassembly script and check to see that at least one branch instruction is printed. Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-8-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:47:19 -07:00
James Clark	66dd3b539e	perf scripts python cs-etm: Add start and stop arguments Make it possible to only disassemble a range of timestamps or sample indexes. This will be used by the test to limit the runtime, but it's also useful for users. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-7-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:47:15 -07:00
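Limiting work to a range of timestamps or sample indexes comes down to a bounds check applied to each sample. A minimal Python sketch follows; the `in_range` helper, its parameter names and the choice of inclusive bounds are assumptions for illustration and are not taken from the script itself.

```python
def in_range(value, start=None, stop=None):
    """Return True if value lies within [start, stop]; None means unbounded."""
    if start is not None and value < start:
        return False
    if stop is not None and value > stop:
        return False
    return True


def select(samples, start=None, stop=None):
    """Keep only the (index, payload) pairs whose index is inside the range."""
    return [(i, s) for i, s in samples if in_range(i, start, stop)]
```

The same check works for timestamps and for sample indexes, since both are ordered numeric values.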
James Clark	8286cc55a9	perf scripts python cs-etm: Improve arguments Make vmlinux detection automatic and use Perf's default objdump when -d is specified. This will make it easier for a test to use the script without having to provide arguments. And similarly for users. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-6-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:47:11 -07:00
James Clark	7b371afc9b	perf scripts python cs-etm: Update to use argparse optparse is deprecated and less flexible than argparse so update it. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-5-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:47:07 -07:00
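For readers unfamiliar with the migration, an argparse-based option parser in Python's standard library looks roughly like the sketch below. The option names (`-v/--verbose`, `-k/--vmlinux`) and the `build_parser` helper are hypothetical placeholders and should not be read as the script's actual command line.

```python
import argparse


def build_parser():
    """Build a small parser using argparse instead of the deprecated optparse."""
    parser = argparse.ArgumentParser(
        description="hypothetical trace disassembly options")
    # Boolean flag: present means True, absent means False.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra output")
    # Optional value with a default of None when not supplied.
    parser.add_argument("-k", "--vmlinux", default=None,
                        help="path to a vmlinux image (hypothetical option)")
    return parser
```

Calling `build_parser().parse_args()` with no argument list reads `sys.argv`, and passing an explicit list makes the parser easy to exercise from a test.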
James Clark	9943581c64	perf scripting python: Add function to get a config value This can be used to get config values like which objdump Perf uses for disassembly. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:47:03 -07:00
James Clark	ba5ae78a5a	perf cs-etm: Use new OpenCSD consistency checks Previously when the incorrect binary was used for decode, Perf would silently continue to generate incorrect samples. With OpenCSD 1.5.4 we can enable consistency checks that do a best effort to detect a mismatch in the image. When one is detected a warning is printed and sample generation stops until the trace resynchronizes with a good part of the image. Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/ Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:46:46 -07:00
James Clark	5afd032961	perf cs-etm: Don't flush when packet_queue fills up cs_etm__flush(), like cs_etm__sample() is an operation that generates a sample and then swaps the current with the previous packet. Calling flush after processing the queues results in two swaps which corrupts the next sample. Therefore it wasn't appropriate to call flush here so remove it. Flushing is still done on a discontinuity to explicitly clear the last branch buffer, but when the packet_queue fills up before reaching a timestamp, that's not a discontinuity and the call to cs_etm__process_traceid_queue() already generated samples and drained the buffers correctly. This is visible by looking for a branch that has the same target as the previous branch and the following source is before the address of the last target, which is impossible as execution would have had to have gone backwards: ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94 (packet_queue fills here before a timestamp, resulting in a flush and branch target ffff80008011cadc is duplicated.) ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff80008011cadc update_sg_lb_stats+0x94 ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34 After removing the flush the correct branch target is used for the second sample, and ffff8000801117c4 is no longer before the previous address: ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94 ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff8000801117a0 cpu_util+0x0 ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34 Make sure that a final branch stack is output at the end of the trace by calling cs_etm__end_block(). This is already done for both the timeless decode paths. 
Fixes: `21fe8dc119` ("perf cs-etm: Add support for CPU-wide trace scenarios") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/ Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 11:46:24 -07:00
Ian Rogers	c940a66b3a	perf test: Be more tolerant of metricgroup failures Previously "set -e" meant any non-zero exit code from perf stat would cause a test failure. As a non-zero exit happens when there aren't sufficient permissions, check for this case and make the exit code 2/skip for it. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240502223115.2357499-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-09-24 10:46:24 -07:00
Namhyung Kim	5363c30678	perf symbol: Set binary_type of dso when loading For the kernel dso, it sets the binary type of dso when loading the symbol table. But it seems not to do that for user DSOs. Actually it sets the symtab type only. It's not clear why we want to maintain the two separately but it uses the binary type info before getting the disassembly. Let's use the symtab type as binary type too if it's not set. I think it's ok to set the binary type when it finds a symsrc whether or not it has actual symbols. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Alexander Monakov <amonakov@ispras.ru> Link: https://lore.kernel.org/r/20240426215139.1271039-1-namhyung@kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: LKML <linux-kernel@vger.kernel.org> Cc: <linux-perf-users@vger.kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-22 23:46:18 +02:00
Arnaldo Carvalho de Melo	1de5b5dcb8	perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space With that it uses the generic BTF based pretty printer: This one we need to think about, not being acquainted with this syscall, should we _traverse_ that list somehow? Would that be useful? root@number:~# perf trace -e set_robust_list sleep 1 0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) = root@number:~# strace prints the default integer args: root@number:~# strace -e set_robust_list sleep 1 set_robust_list(0x7efd99559a20, 24) = 0 +++ exited with 0 +++ root@number:~# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 17:25:45 -03:00
Arnaldo Carvalho de Melo	0c1019e346	perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space With that it uses the generic BTF based pretty printer: root@number:~# grep -w rseq /sys/kernel/tracing/events/syscalls/sys_enter_rseq/format field:struct rseq * rseq; offset:16; size:8; signed:0; print fmt: "rseq: 0x%08lx, rseq_len: 0x%08lx, flags: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->rseq)), ((unsigned long)(REC->rseq_len)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->sig)) root@number:~# Before: root@number:~# perf trace -e rseq 0.000 ( 0.017 ms): Isolated Web C/1195452 rseq(rseq: 0x7ff0ecfe6fe0, rseq_len: 32, sig: 1392848979) = 0 74.018 ( 0.006 ms): :1195453/1195453 rseq(rseq: 0x7f2af20fffe0, rseq_len: 32, sig: 1392848979) = 0 1817.220 ( 0.009 ms): Isolated Web C/1195454 rseq(rseq: 0x7f5c9ec7dfe0, rseq_len: 32, sig: 1392848979) = 0 2515.526 ( 0.034 ms): :1195455/1195455 rseq(rseq: 0x7f61503fffe0, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# After: root@number:~# perf trace -e rseq 0.000 ( 0.019 ms): Isolated Web C/1197258 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)4,.cpu_id = (__u32)4,.mm_cid = (__u32)5,}, rseq_len: 32, sig: 1392848979) = 0 1663.835 ( 0.019 ms): Isolated Web C/1197259 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)24,.cpu_id = (__u32)24,.mm_cid = (__u32)2,}, rseq_len: 32, sig: 1392848979) = 0 4750.444 ( 0.018 ms): Isolated Web C/1197260 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)8,.cpu_id = (__u32)8,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4994.132 ( 0.018 ms): Isolated Web C/1197261 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)10,.cpu_id = (__u32)10,.mm_cid = (__u32)1,}, rseq_len: 32, sig: 1392848979) = 0 4997.578 ( 0.011 ms): Isolated Web C/1197263 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)16,.cpu_id = (__u32)16,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4997.462 ( 0.014 ms): Isolated Web C/1197262 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)17,.cpu_id 
= (__u32)17,.mm_cid = (__u32)3,}, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# We'll probably need to come up with some way of using the BTF info to synthesize a test that then gets used and captures the 'perf trace' output to check if the arguments are the ones synthesized, randomly; for now, let's make do manually: root@number:~# cat ~acme/c/rseq.c #include <sys/syscall.h> /* Definition of SYS_* constants */ #include <linux/rseq.h> #include <errno.h> #include <string.h> #include <unistd.h> #include <stdint.h> #include <stdio.h> /* Provide own rseq stub because glibc doesn't */ __attribute__((weak)) int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig) { return syscall(SYS_rseq, rseq, rseq_len, flags, sig); } int main(int argc, char *argv[]) { struct rseq rseq = { .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }; int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf); printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno)); return err; } root@number:~# perf trace -e rseq ~acme/c/rseq sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) 0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument) root@number:~# Interesting, glibc seems to be using rseq here, as in addition to the totally fake one this test case uses, we have this one, around these other syscalls: 0.175 ( 0.001 ms): rseq/1201095 set_tid_address(tidptr: 0x7f6def759a10) = 1201095 (rseq) 0.177 ( 0.001 ms): rseq/1201095 set_robust_list(head: 0x7f6def759a20, len: 24) = 0 0.178 ( 0.001 ms): rseq/1201095 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.231 ( 0.005 ms): rseq/1201095 mprotect(start: 0x7f6def93f000, len: 16384, prot: READ) = 0 0.238 ( 0.003 ms): rseq/1201095 mprotect(start: 0x403000, len: 4096, prot: READ) = 0 0.244 ( 0.004 ms): rseq/1201095 mprotect(start: 0x7f6def99c000, len: 8192, prot: READ) Matches strace (well, not really as the strace in fedora:40 doesn't know about rseq, printing just integer values in hex): set_robust_list(0x7fbc6acc7a20, 24) = 0 rseq(0x7fbc6acc8060, 0x20, 0, 0x53053053) = 0 mprotect(0x7fbc6aead000, 16384, 
PROT_READ) = 0 mprotect(0x403000, 4096, PROT_READ) = 0 mprotect(0x7fbc6af0a000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=81921024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7fbc6aebd000, 81563) = 0 rseq(0x7fff15bb9920, 0x20, 0x181cd, 0xdeadbeaf) = -1 EINVAL (Invalid argument) fstat(1, {st_mode=S_IFCHR\|0620, st_rdev=makedev(0x88, 0x9), ...}) = 0 getrandom("\xd0\x34\x97\x17\x61\xc2\x2b\x10", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x18ff4000 brk(0x19015000) = 0x19015000 write(1, "sys_rseq({ .cpu_id_start = 12, ."..., 136sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) ) = 136 exit_group(-1) = ? +++ exited with 255 +++ root@number:~# And also the focus for the v6.13 should be to have a better, strace like BTF pretty printer as one of the outputs we can get from the libbpf BTF dumper. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 17:05:23 -03:00
Kan Liang	edf3ce0ed3	perf env: Find correct branch counter info on hybrid No event is printed in the "Branch Counter" column on hybrid machines. For example, $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter $ perf report --total-cycles # Branch counter abbr list: # cpu_core/branch-instructions/pp = A # cpu_core/branches/ = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter # ............... .............. ........... .......... .............. 44.54% 727.1K 0.00% 1 \|+ \|+ \| 36.31% 592.7K 0.00% 2 \|+ \|+ \| 17.83% 291.1K 0.00% 1 \|+ \|+ \| The branch counter information (br_cntr_width and br_cntr_nr) in the perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS is not available on hybrid machines. Without the width information, the number of occurrences of an event cannot be calculated. For a hybrid machine, the caps information should be retrieved from the PMU_CAPS, and stored in the perf_env->pmu_caps. Add a perf_env__find_br_cntr_info() to return the correct branch counter information from the corresponding fields. Committer notes: While testing I couldn't see those "Branch counter" columns enabled by pressing 'B' on the TUI, after reporting it to the list Kan explained the situation: <quote Kan Liang> For a hybrid client, the "Branch Counter" feature is only supported starting from the just released Lunar Lake. Perf falls back to only "ANY" on your Raptor Lake. The "The branch counter is not available" message is expected. 
Here is the 'perf evlist' result from my Lunar Lake machine, # perf evlist -v cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|READ\|PERIOD\|BRANCH_STACK\|IDENTIFIER, read_format: ID\|GROUP\|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY\|COUNTERS # </quote> Fixes: `6f9d8d1de2` ("perf script: Add branch counters") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 13:08:46 -03:00
Kan Liang	9953807c9e	perf evlist: Print hint for group An event group is a critical relationship. There is a -g option that can display the relationship. But it's hard for a user to know when this option should be applied. If there is an event group in the perf record, print a hint to suggest the user apply the -g to display the group information. With the patch, $ perf record -e "{cycles,instructions},instructions" sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ] $ $ perf evlist cycles instructions instructions # Tip: use 'perf evlist -g' to show group information $ perf evlist -g {cycles,instructions} instructions $ Committer testing: So for a perf.data file _with_ a group: root@number:~# perf evlist -g {cpu_core/branch-instructions/pp,cpu_core/branches/} dummy:u root@number:~# perf evlist cpu_core/branch-instructions/pp cpu_core/branches/ dummy:u # Tip: use 'perf evlist -g' to show group information root@number:~# Then for something _without_ a group, no hint: root@number:~# perf record ls <SNIP> [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ] root@number:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P dummy:u root@number:~# No suggestion, good. Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/ Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 13:08:45 -03:00
Sam James	eb9b9a6f5a	tools: Drop nonsensical -O6 -O6 is very much not-a-thing. Really, this should've been dropped entirely in `49b3cd306e` ("tools: Set the maximum optimization level according to the compiler being used") instead of just passing it for not-Clang. Just collapse it down to -O3, instead of "-O6 unless Clang, in which case -O3". GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast, which is a good thing, given -Ofast has specific (non-)requirements for code built using it. So, this does nothing except look a bit daft. Remove the silliness and also save a few lines in the Makefiles accordingly. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com> Signed-off-by: Sam James <sam@gentoo.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 13:08:36 -03:00
Ian Rogers	89c0a55e55	perf pmu: To info add event_type_desc All PMU events are assumed to be "Kernel PMU event", however, this isn't true for fake PMUs and won't be true with the addition of more software PMUs. Make the PMU's type description name configurable - largely for printing callbacks. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 11:29:20 -03:00
Ian Rogers	f08cc25843	perf evsel: Add accessor for tool_event Currently tool events use a dedicated variable within the evsel. Later changes will move this to the unused struct perf_event_attr config for these events. Add an accessor to allow the later change to be well typed and avoid changing all uses. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 11:28:27 -03:00
Ian Rogers	925320737a	perf pmus: Fake PMU clean up Rather than passing a fake PMU around, just pass that the fake PMU should be used - true when doing testing. Move the fake PMU into pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made up type number. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 11:27:42 -03:00
Ian Rogers	d3d5c1a00f	perf list: Avoid potential out of bounds memory read If a desc string is 0 length then -1 will be out of bounds, add a check. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 11:26:27 -03:00
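The guard described in the entry above can be sketched in a few lines of C. This is an illustrative stand-alone example, not the code from the patch: the helper name desc_ends_with_period is hypothetical, and testing for a trailing period is only an assumed reason for looking at the last character of a description string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether desc ends with '.'.
 * desc[len - 1] is only a valid read when len > 0, so the length is
 * checked first. With len == 0 the index would wrap around and read
 * outside the string.
 */
bool desc_ends_with_period(const char *desc)
{
	size_t len;

	assert(desc != NULL);
	len = strlen(desc);
	if (len == 0)
		return false;	/* nothing to inspect, avoid desc[-1] */
	return desc[len - 1] == '.';
}
```

An empty string simply yields false here; what the real caller should do for an empty description is a separate decision that this sketch does not try to reproduce.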
Andrew Kreimer	4ae354d73a	perf help: Fix a typo ("bellow") Fix a typo in comments. Reported-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 11:24:12 -03:00
Changbin Du	74298dd8ac	perf ftrace: Detect whether ftrace is enabled on system To make error messages more accurate, this change detects whether ftrace is enabled on system by checking trace file "set_ftrace_pid". Before: # perf ftrace failed to reset ftrace # After: # perf ftrace ftrace is not supported on this system # Committer testing: Doing it in an unprivileged toolbox container on Fedora 40: Before: acme@number:~/git/perf-tools-next$ toolbox enter perf ⬢[acme@toolbox perf-tools-next]$ sudo su - ⬢[root@toolbox ~]# ~acme/bin/perf ftrace failed to reset ftrace ⬢[root@toolbox ~]# After this patch: ⬢[root@toolbox ~]# ~acme/bin/perf ftrace ftrace is not supported on this system ⬢[root@toolbox ~]# Maybe we could check if we are in such a situation, inside an unprivileged container, and provide a HINT line? Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Changbin Du <changbin.du@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 09:35:35 -03:00
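A rough sketch of the kind of check the entry above describes: probe for a tracing control file before reporting an error. This is not perf's implementation. The function name tracing_file_exists is hypothetical, and the tracing directory is passed in by the caller because its mount point is an assumption (other entries in this log show paths under /sys/kernel/tracing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical probe: return true when <tracing_dir>/<name> exists.
 * A caller could pass "set_ftrace_pid" as name and print a
 * "not supported" message when this returns false.
 */
bool tracing_file_exists(const char *tracing_dir, const char *name)
{
	char path[4096];
	int n;

	assert(tracing_dir != NULL && name != NULL);
	n = snprintf(path, sizeof(path), "%s/%s", tracing_dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* formatting failed or path truncated */
	return access(path, F_OK) == 0;
}
```

Existence alone does not distinguish "ftrace not built in" from "tracefs not mounted or not visible in this container", which is the situation the committer's closing question is about.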
Arnaldo Carvalho de Melo	83420d5f58	perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex Thomas reported the vfs_getname perf tests failing on s/390. It seems it was just due to some extraneous '=' somehow getting into the regexp, so remove it. Now: root@x1:~# perf test getname 91: Add vfs_getname probe to get syscall args filenames : Ok 93: Use vfs_getname probe to get syscall args filenames : FAILED! 126: Check open filename arg using perf trace + vfs_getname : Ok root@x1:~# The second one remains a mystery, have to take some time to nail it down. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo	9327f0ecad	perf build: Require at least clang 16.0.6 to build BPF skeletons Howard reported problems using perf features that use BPF: perf $ clang -v Debian clang version 15.0.6 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /bin Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 perf $ ./perf trace -e write --max-events=1 libbpf: prog 'sys_enter_rename': BPF program load failed: Permission denied libbpf: prog 'sys_enter_rename': -- BEGIN PROG LOAD LOG -- 0: R1=ctx() R10=fp0 But it works with: perf $ clang -v Debian clang version 16.0.6 (15~deb12u1) Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /bin Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 perf $ ./perf trace -e write --max-events=1 0.000 ( 0.009 ms): gmain/1448 write(fd: 4, buf: \1\0\0\0\0\0\0\0, count: 8) = 8 (kworker/0:0-eve) perf $ So let's make that the required version. If you happen to have a slightly older version where this works, please report it so that we can adjust the minimum required version. Reported-by: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo	4c1af9bf97	perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace We need to decide where to copy syscall arg contents, if at the syscalls:sys_entry hook, meaning it is something that is coming from user to kernel space, or if it is a response, i.e. if it is something the _kernel_ is filling in and thus going to userspace. Since we have 'const' used in those syscalls, and unsure about this being consistent, doing: root@number:~# echo $(grep const /sys/kernel/tracing/events/syscalls/sys_enter_*/format \| grep struct \| cut -c47- \| cut -d'/' -f1) clock_nanosleep clock_settime epoll_pwait2 futex io_pgetevents landlock_create_ruleset listmount mq_getsetattr mq_notify mq_timedreceive mq_timedsend preadv2 preadv prlimit64 process_madvise process_vm_readv process_vm_readv process_vm_writev process_vm_writev pwritev2 pwritev readv rt_sigaction rt_sigtimedwait semtimedop statmount timerfd_settime timer_settime vmsplice writev root@number:~# Seems to indicate that we can use that for the ones that have the 'const' to mark it as coming from user space, do it. Most notable/frequent syscall that now gets BTF pretty printed in a system wide 'perf trace' session is: root@number:~# perf trace 21.160 ( ): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e1dfe964, op: WAIT_BITSET\|PRIVATE_FLAG, utime: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)50290,.tv_nsec = (long long int)810362837,}, val3: MATCH_ANY) ... 
21.166 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE\|PRIVATE_FLAG, val: 1) = 0 21.169 ( 0.001 ms): RemVidChild/6995 sendmsg(fd: 25<socket:[78915]>, msg: 0x7f49e9af9da0, flags: DONTWAIT) = 280 21.172 ( 0.289 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa58, op: WAIT_BITSET\|PRIVATE_FLAG\|CLOCK_REALTIME, val3: MATCH_ANY) = 0 21.463 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE\|PRIVATE_FLAG, val: 1) = 0 21.467 ( 0.001 ms): RemVidChild/6995 futex(uaddr: 0x7f49e28bb964, op: WAKE\|PRIVATE_FLAG, val: 1) = 1 21.160 ( 0.314 ms): MediaSu~isor #/1028597 ... [continued]: futex()) = 0 21.469 ( ): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa5c, op: WAIT_BITSET\|PRIVATE_FLAG\|CLOCK_REALTIME, val3: MATCH_ANY) ... 21.475 ( 0.000 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49d0223040, op: WAKE\|PRIVATE_FLAG, val: 1) = 0 21.478 ( 0.001 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e26ac964, op: WAKE\|PRIVATE_FLAG, val: 1) = 1 ^Croot@number:~# root@number:~# cat /sys/kernel/tracing/events/syscalls/sys_enter_futex/format name: sys_enter_futex ID: 454 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:int __syscall_nr; offset:8; size:4; signed:1; field:u32 uaddr; offset:16; size:8; signed:0; field:int op; offset:24; size:8; signed:0; field:u32 val; offset:32; size:8; signed:0; field:const struct __kernel_timespec * utime; offset:40; size:8; signed:0; field:u32 * uaddr2; offset:48; size:8; signed:0; field:u32 val3; offset:56; size:8; signed:0; print fmt: "uaddr: 0x%08lx, op: 0x%08lx, val: 0x%08lx, utime: 0x%08lx, uaddr2: 0x%08lx, val3: 0x%08lx", ((unsigned long)(REC->uaddr)), ((unsigned long)(REC->op)), ((unsigned long)(REC->val)), ((unsigned long)(REC->utime)), ((unsigned long)(REC->uaddr2)), ((unsigned long)(REC->val3)) root@number:~# 
Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 09:35:34 -03:00
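The heuristic in the entry above (a 'const' struct argument is assumed to be filled in by user space) can be mirrored on the text of a tracepoint format field, much like the grep pipeline in the commit message. This is only a sketch under that assumption: field_looks_from_user is a hypothetical name, and perf itself is not claimed to work on strings this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check on one "field:" declaration taken from a
 * sys_enter_<name>/format file, e.g.
 *   "const struct __kernel_timespec * utime"
 * It mirrors "grep const | grep struct": both keywords must appear.
 */
bool field_looks_from_user(const char *field_decl)
{
	assert(field_decl != NULL);
	return strstr(field_decl, "const ") != NULL &&
	       strstr(field_decl, "struct ") != NULL;
}
```

Applied to the futex format shown above, only the utime field matches; uaddr2, a plain u32 pointer, does not.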
Yang Li	e37b315c17	perf parse-events: Remove duplicated include in parse-events.c The header file parse-events.h is included twice in parse-events.c, so one of the inclusions can be removed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822 Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-11 09:35:27 -03:00
Ian Rogers	02b2705017	perf callchain: Allow symbols to be optional when resolving a callchain In uses like 'perf inject' it is not necessary to gather the symbol for each call chain location, the map for the sample IP is wanted so that build IDs and the like can be injected. Make gathering the symbol in the callchain_cursor optional. For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to 29.6MB by avoiding loading symbols. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
Ian Rogers	64eed019f3	perf inject: Lazy build-id mmap2 event insertion Add -B option that lazily inserts mmap2 events thereby dropping all mmap events without samples. This is similar to the behavior of -b where only build_id events are inserted when a dso is accessed in a sample. File size savings can be significant in system-wide mode, consider: $ perf record -g -a -o perf.data sleep 1 $ perf inject -B -i perf.data -o perf.new.data $ ls -al perf.data perf.new.data 5147049 perf.data 2248493 perf.new.data Give test coverage of the new option in pipe test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
Ian Rogers	d762ba020d	perf inject: Add new mmap2-buildid-all option Add an option that allows all mmap or mmap2 events to be rewritten as mmap2 events with build IDs. This is similar to the existing -b/--build-ids and --buildid-all options except instead of adding a build_id event an existing mmap/mmap2 event is used as a template and a new mmap2 event synthesized from it. As mmap2 events are typical this avoids the insertion of build_id events. Add test coverage to the pipe test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
Ian Rogers	ae39ba1655	perf inject: Fix build ID injection Build ID injection wasn't inserting a sample ID and aligning events to 64 bytes rather than 8. No sample ID means events are unordered and two different build_id events for the same path, as happens when a file is replaced, can't be differentiated. Add in sample ID insertion for the build_id events alongside some refactoring. The refactoring better aligns the function arguments for different use cases, such as synthesizing build_id events without needing to have a dso. The misc bits are explicitly passed as with callchains the maps/dsos may span user and kernel land, so using sample->cpumode isn't good enough. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
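To make the alignment part of the entry above concrete, here is a small sketch of rounding an event size up to a power-of-two boundary. It is not the helper used in the patch; align_up is a hypothetical name, and the sizes in the note below are made-up illustration values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round n up to the next multiple of a, where a is a power of two.
 * Adding a - 1 and masking off the low bits leaves values that are
 * already aligned unchanged.
 */
size_t align_up(size_t n, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);	/* power of two only */
	return (n + a - 1) & ~(a - 1);
}
```

With an assumed 13-byte payload, align_up(13, 8) gives 16 while align_up(13, 64) gives 64, which shows how much padding a 64-byte boundary can add to a small record compared with an 8-byte one.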
Namhyung Kim	02648783c2	perf annotate-data: Add pr_debug_scope() The pr_debug_scope() is to print more information about the scope DIE during the instruction tracking so that it can help finding relevant debug info and the source code like inlined functions more easily. $ perf --debug type-profile annotate --data-type ... ----------------------------------------------------------- find data type for 0(reg0, reg12) at set_task_cpu+0xdd CU for kernel/sched/core.c (die:0x1268dae) frame base: cfa=1 fbreg=7 scope: [3/3] (die:12b6d28) [inlined] set_task_rq <<<--- (here) bb: [9f - dd] var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0) var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
Namhyung Kim	c8b9358778	perf annotate: Treat 'call' instruction as stack operation I found some portion of mem-store events sampled on CALL instruction which has no memory access. But it actually saves a return address into stack. It should be considered as a stack operation like RET instruction. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:47 -03:00
James Clark	206dcfca1f	perf build: Autodetect minimum required llvm-dev version The new LLVM addr2line feature requires a minimum version of 13 to compile. Add a feature check for the version so that NO_LLVM=1 doesn't need to be explicitly added. Leave the existing llvm feature check intact because it's used by tools other than Perf. This fixes the following compilation error when the llvm-dev version doesn't match: util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso, const char, u64)': util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress' 178 \| addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0); Fixes: `c3f8644c21` ("perf report: Support LLVM for addr2line()") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bill Wendling <morbo@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Manu Bretelle <chantr4@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo	375f9262ac	perf trace: Mark the rlim arg in the prlimit64 and setrlimit syscalls as coming from user space With that it uses the generic BTF based pretty printer: root@number:~# perf trace -e prlimit64 0.000 ( 0.004 ms): :3417020/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fe3b0) = 0 0.126 ( 0.003 ms): Chroot Helper/3417022 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fdfd0) = 0 12.557 ( 0.005 ms): firefox/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1b80) = 0 26.640 ( 0.006 ms): MainThread/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1780) = 0 27.553 ( 0.002 ms): Web Content/3417020 prlimit64(resource: AS, old_rlim: 0x7ffe9ade1660) = 0 29.405 ( 0.003 ms): Web Content/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7ffe9ade0c80) = 0 30.471 ( 0.002 ms): Web Content/3417020 prlimit64(resource: RTTIME, old_rlim: 0x7ffe9ade1370) = 0 30.485 ( 0.001 ms): Web Content/3417020 prlimit64(resource: RTTIME, new_rlim: (struct rlimit64){.rlim_cur = (__u64)50000,.rlim_max = (__u64)200000,}) = 0 31.779 ( 0.001 ms): Web Content/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1670) = 0 ^Croot@number:~# Better than before, still needs improvements in the configurability of the libbpf BTF dumper to get it to the strace output standard. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuBQI-f8CGpuhIdH@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo	f3f16112c6	perf trace: Support collecting 'union's with the BPF augmenter And reuse the BTF based struct pretty printer, with that we can offer initial support for the 'bpf' syscall's second argument, a 'union bpf_attr' pointer. But this is not that satisfactory as the libbpf btf dumper will pretty print _all_ the union, we need to have a way to say that the first arg selects the type for the union member to be pretty printed, something like what pahole does translating the PERF_RECORD_ selector into a name, and using that name to find a matching struct. In the case of 'union bpf_attr' it would map PROG_LOAD to one of the union members, but unfortunately there is no such mapping: root@number:~# pahole bpf_attr union bpf_attr { struct { __u32 map_type; /* 0 4 */ __u32 key_size; /* 4 4 */ __u32 value_size; /* 8 4 */ __u32 max_entries; /* 12 4 */ __u32 map_flags; /* 16 4 */ __u32 inner_map_fd; /* 20 4 */ __u32 numa_node; /* 24 4 */ char map_name[16]; /* 28 16 */ __u32 map_ifindex; /* 44 4 */ __u32 btf_fd; /* 48 4 */ __u32 btf_key_type_id; /* 52 4 */ __u32 btf_value_type_id; /* 56 4 */ __u32 btf_vmlinux_value_type_id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ __u64 map_extra; /* 64 8 */ __s32 value_type_btf_obj_fd; /* 72 4 */ __s32 map_token_fd; /* 76 4 */ }; /* 0 80 */ struct { __u32 map_fd; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ __u64 key; /* 8 8 */ union { __u64 value; /* 16 8 */ __u64 next_key; /* 16 8 */ }; /* 16 8 */ __u64 flags; /* 24 8 */ }; /* 0 32 */ struct { __u64 in_batch; /* 0 8 */ __u64 out_batch; /* 8 8 */ __u64 keys; /* 16 8 */ __u64 values; /* 24 8 */ __u32 count; /* 32 4 */ __u32 map_fd; /* 36 4 */ __u64 elem_flags; /* 40 8 */ __u64 flags; /* 48 8 */ } batch; /* 0 56 */ struct { __u32 prog_type; /* 0 4 */ __u32 insn_cnt; /* 4 4 */ __u64 insns; /* 8 8 */ __u64 license; /* 16 8 */ __u32 log_level; /* 24 4 */ __u32 log_size; /* 28 4 */ __u64 log_buf; /* 32 8 */ __u32 kern_version; /* 40 4 */ __u32 prog_flags; /* 44 4 */ char prog_name[16]; /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ 
__u32 prog_ifindex; /* 64 4 */ __u32 expected_attach_type; /* 68 4 */ __u32 prog_btf_fd; /* 72 4 */ __u32 func_info_rec_size; /* 76 4 */ __u64 func_info; /* 80 8 */ __u32 func_info_cnt; /* 88 4 */ __u32 line_info_rec_size; /* 92 4 */ __u64 line_info; /* 96 8 */ __u32 line_info_cnt; /* 104 4 */ __u32 attach_btf_id; /* 108 4 */ union { __u32 attach_prog_fd; /* 112 4 */ __u32 attach_btf_obj_fd; /* 112 4 */ }; /* 112 4 */ __u32 core_relo_cnt; /* 116 4 */ __u64 fd_array; /* 120 8 */ /* --- cacheline 2 boundary (128 bytes) --- */ __u64 core_relos; /* 128 8 */ __u32 core_relo_rec_size; /* 136 4 */ __u32 log_true_size; /* 140 4 */ __s32 prog_token_fd; /* 144 4 */ }; /* 0 152 */ struct { __u64 pathname; /* 0 8 */ __u32 bpf_fd; /* 8 4 */ __u32 file_flags; /* 12 4 */ __s32 path_fd; /* 16 4 */ }; /* 0 24 */ struct { union { __u32 target_fd; /* 0 4 */ __u32 target_ifindex; /* 0 4 */ }; /* 0 4 */ __u32 attach_bpf_fd; /* 4 4 */ __u32 attach_type; /* 8 4 */ __u32 attach_flags; /* 12 4 */ __u32 replace_bpf_fd; /* 16 4 */ union { __u32 relative_fd; /* 20 4 */ __u32 relative_id; /* 20 4 */ }; /* 20 4 */ __u64 expected_revision; /* 24 8 */ }; /* 0 32 */ struct { __u32 prog_fd; /* 0 4 */ __u32 retval; /* 4 4 */ __u32 data_size_in; /* 8 4 */ __u32 data_size_out; /* 12 4 */ __u64 data_in; /* 16 8 */ __u64 data_out; /* 24 8 */ __u32 repeat; /* 32 4 */ __u32 duration; /* 36 4 */ __u32 ctx_size_in; /* 40 4 */ __u32 ctx_size_out; /* 44 4 */ __u64 ctx_in; /* 48 8 */ __u64 ctx_out; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __u32 flags; /* 64 4 */ __u32 cpu; /* 68 4 */ __u32 batch_size; /* 72 4 */ } test; /* 0 80 */ struct { union { __u32 start_id; /* 0 4 */ __u32 prog_id; /* 0 4 */ __u32 map_id; /* 0 4 */ __u32 btf_id; /* 0 4 */ __u32 link_id; /* 0 4 */ }; /* 0 4 */ __u32 next_id; /* 4 4 */ __u32 open_flags; /* 8 4 */ }; /* 0 12 */ struct { __u32 bpf_fd; /* 0 4 */ __u32 info_len; /* 4 4 */ __u64 info; /* 8 8 */ } info; /* 0 16 */ struct { union { __u32 target_fd; /* 0 4 */ __u32 target_ifindex; /* 0 4 */ }; /* 0 4 */ __u32 attach_type; /* 4 4 */ __u32 query_flags; /* 8 4 */ __u32 attach_flags; /* 12 4 */ __u64 prog_ids; /* 16 8 */ 
union { __u32 prog_cnt; /* 24 4 */ __u32 count; /* 24 4 */ }; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ __u64 prog_attach_flags; /* 32 8 */ __u64 link_ids; /* 40 8 */ __u64 link_attach_flags; /* 48 8 */ __u64 revision; /* 56 8 */ } query; /* 0 64 */ struct { __u64 name; /* 0 8 */ __u32 prog_fd; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ __u64 cookie; /* 16 8 */ } raw_tracepoint; /* 0 24 */ struct { __u64 btf; /* 0 8 */ __u64 btf_log_buf; /* 8 8 */ __u32 btf_size; /* 16 4 */ __u32 btf_log_size; /* 20 4 */ __u32 btf_log_level; /* 24 4 */ __u32 btf_log_true_size; /* 28 4 */ __u32 btf_flags; /* 32 4 */ __s32 btf_token_fd; /* 36 4 */ }; /* 0 40 */ struct { __u32 pid; /* 0 4 */ __u32 fd; /* 4 4 */ __u32 flags; /* 8 4 */ __u32 buf_len; /* 12 4 */ __u64 buf; /* 16 8 */ __u32 prog_id; /* 24 4 */ __u32 fd_type; /* 28 4 */ __u64 probe_offset; /* 32 8 */ __u64 probe_addr; /* 40 8 */ } task_fd_query; /* 0 48 */ struct { union { __u32 prog_fd; /* 0 4 */ __u32 map_fd; /* 0 4 */ }; /* 0 4 */ union { __u32 target_fd; /* 4 4 */ __u32 target_ifindex; /* 4 4 */ }; /* 4 4 */ __u32 attach_type; /* 8 4 */ __u32 flags; /* 12 4 */ union { __u32 target_btf_id; /* 16 4 */ struct { __u64 iter_info; /* 16 8 */ __u32 iter_info_len; /* 24 4 */ }; /* 16 16 */ struct { __u64 bpf_cookie; /* 16 8 */ } perf_event; /* 16 8 */ struct { __u32 flags; /* 16 4 */ __u32 cnt; /* 20 4 */ __u64 syms; /* 24 8 */ __u64 addrs; /* 32 8 */ __u64 cookies; /* 40 8 */ } kprobe_multi; /* 16 32 */ struct { __u32 target_btf_id; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ __u64 cookie; /* 24 8 */ } tracing; /* 16 16 */ struct { __u32 pf; /* 16 4 */ __u32 hooknum; /* 20 4 */ __s32 priority; /* 24 4 */ __u32 flags; /* 28 4 */ } netfilter; /* 16 16 */ struct { union { __u32 relative_fd; /* 16 4 */ __u32 relative_id; /* 16 4 */ }; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ __u64 expected_revision; /* 24 8 */ } tcx; /* 16 16 */ struct { __u64 path; /* 16 8 */ __u64 offsets; /* 24 8 */ __u64 ref_ctr_offsets; /* 32 8 */ __u64 cookies; /* 40 8 */ __u32 cnt; /* 48 4 */ __u32 flags; /* 52 4 */ __u32 pid; /* 56 4 */ } uprobe_multi; /* 16 48 */ struct { union { 
__u32 relative_fd; / 16 4 / __u32 relative_id; / 16 4 / }; / 16 4 / / XXX 4 bytes hole, try to pack / __u64 expected_revision; / 24 8 / } netkit; / 16 16 / }; / 16 48 / } link_create; / 0 64 / struct { __u32 link_fd; / 0 4 / union { __u32 new_prog_fd; / 4 4 / __u32 new_map_fd; / 4 4 / }; / 4 4 / __u32 flags; / 8 4 / union { __u32 old_prog_fd; / 12 4 / __u32 old_map_fd; / 12 4 / }; / 12 4 / } link_update; / 0 16 / struct { __u32 link_fd; / 0 4 / } link_detach; / 0 4 / struct { __u32 type; / 0 4 / } enable_stats; / 0 4 / struct { __u32 link_fd; / 0 4 / __u32 flags; / 4 4 / } iter_create; / 0 8 / struct { __u32 prog_fd; / 0 4 / __u32 map_fd; / 4 4 / __u32 flags; / 8 4 / } prog_bind_map; / 0 12 / struct { __u32 flags; / 0 4 / __u32 bpffs_fd; / 4 4 / } token_create; / 0 8 */ }; root@number:~# So this is one case where BTF gets us only that far, not getting all the way to automate the pretty printing of unions designed like 'union bpf_attr', we will need a custom pretty printer for this union, as using the libbpf union BTF dumper is way too verbose: root@number:~# perf trace --max-events 1 -e bpf bpftool map 0.000 ( 0.054 ms): bpftool/3409073 bpf(cmd: PROG_LOAD, uattr: (union bpf_attr){(struct){.map_type = (__u32)1,.key_size = (__u32)2,.value_size = (__u32)2755142048,.max_entries = (__u32)32764,.map_flags = (__u32)150263906,.inner_map_fd = (__u32)21920,},(struct){.map_fd = (__u32)1,.key = (__u64)140723063628192,(union){.value = (__u64)94145833392226,.next_key = (__u64)94145833392226,},},.batch = (struct){.in_batch = (__u64)8589934593,.out_batch = (__u64)140723063628192,.keys = (__u64)94145833392226,},(struct){.prog_type = (__u32)1,.insn_cnt = (__u32)2,.insns = (__u64)140723063628192,.license = (__u64)94145833392226,},(struct){.pathname = (__u64)8589934593,.bpf_fd = (__u32)2755142048,.file_flags = (__u32)32764,.path_fd = (__s32)150263906,},(struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_bpf_fd = (__u32)2,.attach_type = 
(__u32)2755142048,.attach_flags = (__u32)32764,.replace_bpf_fd = (__u32)150263906,(union){.relative_fd = (__u32)21920,.relative_id = (__u32)21920,},},.test = (struct){.prog_fd = (__u32)1,.retval = (__u32)2,.data_size_in = (__u32)2755142048,.data_size_out = (__u32)32764,.data_in = (__u64)94145833392226,},(struct){(union){.start_id = (__u32)1,.prog_id = (__u32)1,.map_id = (__u32)1,.btf_id = (__u32)1,.link_id = (__u32)1,},.next_id = (__u32)2,.open_flags = (__u32)2755142048,},.info = (struct){.bpf_fd = (__u32)1,.info_len = (__u32)2,.info = (__u64)140723063628192,},.query = (struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_type = (__u32)2,.query_flags = (__u32)2755142048,.attach_flags = (__u32)32764,.prog_ids = (__u64)94145833392226,},.raw_tracepoint = (struct){.name = (__u64)8589934593,.prog_fd = (__u32)2755142048,.cookie = (__u64)94145833392226,},(struct){.btf = (__u64)8589934593,.btf_log_buf = (__u64)140723063628192,.btf_size = (__u32)150263906,.btf_log_size = (__u32)21920,},.task_fd_query = (struct){.pid = (__u32)1,.fd = (__u32)2,.flags = (__u32)2755142048,.buf_len = (__u32)32764,.buf = (__u64)94145833392226,},.link_create = (struct){(union){.prog_fd = (__u32)1,.map_fd = (__u32)1,},(u) = 3 root@number:~# 2: prog_array name hid_jmp_table flags 0x0 key 4B value 4B max_entries 1024 memlock 8440B owner_prog_type tracing owner jited 13: hash_of_maps name cgroup_hash flags 0x0 key 8B value 4B max_entries 2048 memlock 167584B pids systemd(1) 960: array name libbpf_global flags 0x0 key 4B value 32B max_entries 1 memlock 280B 961: array name pid_iter.rodata flags 0x480 key 4B value 4B max_entries 1 memlock 8192B btf_id 1846 frozen pids bpftool(3409073) 962: array name libbpf_det_bind flags 0x0 key 4B value 32B max_entries 1 memlock 280B root@number:~# For simpler unions this may be better than not seeing any payload, so keep it there. 
Acked-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuBLat8cbadILNLA@x1 [ Removed needless parentheses in the if block leading to the trace__btf_scnprintf() call, as per Howard's review comments ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 17:31:51 -03:00
Howard Chu	3278024540	perf trace: Add --force-btf for debugging If --force-btf is enabled, prefer btf_dump general pretty printer to perf trace's customized pretty printers. Mostly for debug purposes. Committer testing: diff before/after shows we need several improvements to be able to compare the changes, first we need to cut off/disable mutable data such as pids and timestamps, then what is left are the buffer addresses passed from userspace, returned from kernel space, maybe we can ask 'perf trace' to go on making those reproducible. That would entail a Pointer Address Translation (PAT) like for networking, that would, for simple, reproducible if not for these details, workloads, that we would then use in our regression tests. Enough digression, this is one such diff: openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY\|CLOEXEC) = 3 -fstat(fd: 3, statbuf: 0x7fff01f212a0) = 0 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 2998 -read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 0 +fstat(fd: 3, statbuf: 0x7ffc163cf0e0) = 0 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 2998 +read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 0 close(fd: 3) = 0 openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) @@ -45,7 +45,7 @@ openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory) -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0 +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) = The problem more close to our hands is to make 
the libbpf BTF pretty printer have a mode that closely resembles what we're aiming for: strace output. Being able to run something with 'perf trace' and with 'strace' and get the exact same output should be of interest to anybody wanting to have strace and 'perf trace' regression tested against each other. That last part is 'perf trace''s shot at being as useful as strace... ;-) Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:52:27 -03:00
Howard Chu	a68fd6a6cd	perf trace: Collect augmented data using BPF Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads TRACE_AUG_MAX_BUF bytes of buffer maximum. Determine what type of argument and how many bytes to read from user space, using the value in the beauty_map. This is the relation of parameter type and its corresponding value in the beauty map, and how many bytes we read eventually: string: 1 -> size of string (till null) struct: size of struct -> size of struct buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF) After reading from user space, we output the augmented data using bpf_perf_event_output(). If the struct augmenter, augment_sys_enter() failed, we fall back to using bpf_tail_call(). I have to make the payload 6 times the size of augmented_arg, to pass the BPF verifier. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:52:20 -03:00
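The beauty_map relation listed in commit a68fd6a6cd above (string, struct, buffer) can be sketched in plain userspace C. This is only an illustration, not the BPF collector from tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c: the helper name aug_read_len(), the AUG_LEN_STRING marker and the value used here for TRACE_AUG_MAX_BUF are assumptions.

```c
/* Assumed value, for illustration only; the real one is defined in trace_augment.h. */
#define TRACE_AUG_MAX_BUF 32
#define NR_SYSCALL_ARGS   6

/* Hypothetical marker: the length is found by scanning for the NUL terminator. */
#define AUG_LEN_STRING (-1L)

/*
 * Hypothetical helper: given one beauty_map entry and the raw syscall
 * arguments, return how many bytes would be read from user space.
 *
 *   entry == 0   nothing to collect
 *   entry == 1   string, read until NUL (AUG_LEN_STRING)
 *   entry  > 1   struct, the entry is the struct size
 *   entry  < 0   buffer, -entry is the 1-based number of the paired length
 *                argument; the length is capped at TRACE_AUG_MAX_BUF
 */
static long aug_read_len(int entry, const unsigned long args[NR_SYSCALL_ARGS])
{
	if (entry == 0)
		return 0;
	if (entry == 1)
		return AUG_LEN_STRING;
	if (entry > 1)
		return entry;
	if (entry >= -NR_SYSCALL_ARGS) {
		unsigned long len = args[-entry - 1];

		return len > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : (long)len;
	}
	return 0; /* out-of-range entry: collect nothing */
}
```

With an entry of -4 and a fourth argument of 4096, this sketch returns TRACE_AUG_MAX_BUF, matching the "maximum: TRACE_AUG_MAX_BUF" note in the commit message.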
Howard Chu	b257fac12f	perf trace: Pretty print buffer data Define TRACE_AUG_MAX_BUF, the maximum buffer size we can augment, in trace_augment.h. BPF will include this header too. Print a buffer differently from a string: all control characters are printed as \digits (such as \0 for null, and \10 for newline, LF). Characters with a value bigger than 127 are also printed as digits instead of the character itself. Committer notes: Simplified the buffer scnprintf to avoid using multiple buffers as discussed in the patch review thread. We can't really set all 'buf' args to SCA_BUF as we're collecting so far just on the sys_enter path, so we would be printing the previous 'read' arg buffer contents, not what the kernel puts there. So instead of: static int syscall_fmt__cmp(const void *name, const void *fmtp) @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field - else if (strstr(field->type, "char *") && strstr(field->name, "buf")) - arg->scnprintf = SCA_BUF; Do: static const struct syscall_fmt syscall_fmts[] = { + { .name = "write", .errpid = true, + .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, .from_user = true, }, }, }, Signed-off-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:52:13 -03:00
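The buffer printing rule from commit b257fac12f above (printable characters as-is, everything else as a backslash followed by the decimal value) can be sketched as follows. This is a standalone illustration, not the SCA_BUF beautifier itself: the function name buf_escape() is hypothetical, and using isprint() in the default "C" locale to decide what counts as printable is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an escaped rendering of buf[0..len) into out.
 * Printable bytes are copied as-is; all other bytes (control characters and
 * values above 127) are written as '\' followed by the decimal byte value,
 * e.g. "\0" for NUL and "\10" for LF.
 * Returns the number of characters written, excluding the terminating NUL.
 */
static size_t buf_escape(const unsigned char *buf, size_t len,
			 char *out, size_t out_size)
{
	size_t printed = 0;
	size_t i;

	if (out_size == 0)
		return 0;
	out[0] = '\0';

	for (i = 0; i < len; i++) {
		size_t room = out_size - printed;
		int n;

		if (isprint(buf[i]))
			n = snprintf(out + printed, room, "%c", buf[i]);
		else
			n = snprintf(out + printed, room, "\\%d", buf[i]);

		if (n < 0 || (size_t)n >= room) {
			/* Not enough room: drop the partial escape and stop. */
			out[printed] = '\0';
			break;
		}
		printed += (size_t)n;
	}
	return printed;
}
```

For the five bytes 'a', LF, NUL, 'b', 0xff this yields the string a\10\0b\255.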
Howard Chu	cb32035214	perf trace: Pretty print struct data Change the arg->augmented.args to arg->augmented.args->value to skip the header for customized pretty printers, since we collect data in BPF using the general augment_sys_enter(), which always adds the header. Use btf_dump API to pretty print augmented struct pointer. Prefer existed pretty-printer than btf general pretty-printer. set compact = true and skip_names = true, so that no newline character and argument name are printed. Committer notes: Simplified the btf_dump_snprintf callback to avoid using multiple buffers, as discussed in the thread accessible via the Link tag below. Also made it do: dump_data_opts.skip_names = !arg->trace->show_arg_names; I.e. show the type and struct field names according to that tunable, we probably need another tunable just for this, but for now if the user wants to see syscall names in addition to its value, it makes sense to see the struct field names according to that tunable. Committer testing: The following have explicitely set beautifiers (SCA_FILENAME, SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we have been wiring up the "renameat2" ("renameat" until recently), so it doesn't use the introduced generic fallback (btf_struct_scnprintf(), see the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature rich beautifiers, that are not using BTF): root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654 0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com 0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT\|NOSIGNAL) = 97 18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, 
addr: { .family: NETLINK }, addr_len: 0xc) = 20 PING www.google.com (142.251.129.68) 56(84) bytes of data. 18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0 18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0 18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64 63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT\|NOSIGNAL) = 110 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_entersleep sleep 1.23456789 0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER\|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER\|COUNTERS, sample_regs_user: 
0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5 1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6 1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10 2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 
(syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11 Performance counter stats for 'sleep 1.23456789': 1,402,839 cpu_atom/instructions/ <not counted> cpu_core/instructions/ (0.00%) 11,066 cpu_atom/cache-misses/ <not counted> cpu_core/cache-misses/ (0.00%) 0 syscalls:sys_enter_nanosleep 1 syscalls:sys_enter_clock_nanosleep 1.236246714 seconds time elapsed 0.000000000 seconds user 0.001308000 seconds sys root@number:~# Now if we use it even for the ones we have a specific beautifier in tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all structs, by adding the following patch: @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall sc, char bf, size_t size, default_scnprintf = sc->arg_fmt[arg.idx].scnprintf; - if (default_scnprintf == NULL \|\| default_scnprintf == SCA_PTR) { + if (1 \|\| (default_scnprintf == NULL \|\| default_scnprintf == SCA_PTR)) { btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed, size - printed, val, field->type); if (btf_printed) { We get: root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com PING www.google.com (142.251.129.68) 56(84) bytes of data. 
0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT\|NOSIGNAL) = 97 0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20 0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0 0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64 64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms 44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0 44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT\|NOSIGNAL) = 110 root@number:~# Which looks sane, i.e.: 18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0 Becomes: 0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0 And. 
#define AF_UNIX 1 /* Unix domain sockets */ #define AF_LOCAL 1 /* POSIX name for AF_UNIX */ #define AF_INET 2 /* Internet IP Protocol */ <SNIP> #define AF_INET6 10 /* IP version 6 */ And 'D' == 68, so the preexisting sockaddr BPF collector is working with the new generic BTF pretty printer (btf_struct_scnprintf()), it's just that it doesn't know about 'struct sockaddr' besides what is in BTF, i.e. it's an array of bytes, not an IPv4 address that needs extra massaging. Ditto for the 'struct perf_event_attr' case: 1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 Becomes: 2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9 hex(17179869187) = 0x400000003, etc. read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING is enum perf_event_read_format { PERF_FORMAT_TOTAL_TIME_ENABLED = 1U << 0, PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1, and so on. We need to work with the libbpf btf dump api to get one output that matches the 'perf trace'/strace expectations/format, but having this in this current form is already an improvement to 'perf trace', so let's improve from what we have. 
Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:52:07 -03:00
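The gap between the two outputs compared in commit cb32035214 above (read_format shown as 3 by the generic BTF printer versus TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING by the dedicated beautifier) comes down to a bit-to-name table. A minimal sketch of that decoding follows; the function and table names are hypothetical, only the two bits quoted in the commit message are included, and any remaining bits are simply printed in hex.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_name {
	unsigned long long bit;
	const char *name;
};

/* Only the two values quoted in the commit message; the real enum has more. */
static const struct flag_name read_format_names[] = {
	{ 1ULL << 0, "TOTAL_TIME_ENABLED" },
	{ 1ULL << 1, "TOTAL_TIME_RUNNING" },
};

/* Advance 'printed' by an snprintf() result, never past the end of the buffer. */
static size_t advance(size_t printed, int n, size_t size)
{
	if (n < 0)
		return printed;
	printed += (size_t)n;
	return printed < size ? printed : size - 1;
}

/* Hypothetical helper: render val as NAME|NAME|0x<leftover bits>. */
static size_t read_format_snprintf(unsigned long long val, char *bf, size_t size)
{
	size_t printed = 0;
	size_t i;

	if (size == 0)
		return 0;
	bf[0] = '\0';

	for (i = 0; i < sizeof(read_format_names) / sizeof(read_format_names[0]); i++) {
		if (!(val & read_format_names[i].bit))
			continue;
		printed = advance(printed,
				  snprintf(bf + printed, size - printed, "%s%s",
					   printed ? "|" : "", read_format_names[i].name),
				  size);
		val &= ~read_format_names[i].bit;
	}

	if (val) /* bits this sketch has no name for */
		printed = advance(printed,
				  snprintf(bf + printed, size - printed, "%s%#llx",
					   printed ? "|" : "", val),
				  size);
	return printed;
}
```

Called with 3, the value the BTF dumper printed for .read_format, this produces TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING.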
Howard Chu	7f40306728	perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF Set up beauty_map, load it to BPF, in such format: if argument No.3 is a struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32; if argument No.3 is a string (of syscall number 114) beauty_map[114][2] = 1; if argument No.3 is a buffer, its size is indicated by argument No.4 (of syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this buffer size in BPF */ Committer notes: Moved syscall_arg_fmt__cache_btf_struct() from an ifdef HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on HAVE_BPF_SKEL and thus breaks the build when building with BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'. Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c as we're using it in this patch, otherwise we get this while trying to build at this point in the original patch series: builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’: builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’ 3725 \| int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter); \| We also have to take into account syscall_arg_fmt.from_user when telling the kernel what to copy in the sys_enter generic collector, we don't want to collect bogus data in buffers that will only be available to us at sys_exit time, i.e. after the kernel has filled it, so leave this for when we have such a sys_exit based collector. 
Committer testing: Not wired up yet, so all continues to work, using the existing BPF collector and userspace beautifiers that are augmentation aware: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename mv 123456 987654 0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com 0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT\|NOSIGNAL) = 97 0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0 0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0 0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data. 
) = 0 0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64 45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT\|NOSIGNAL) = 111 64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms root@number:~# Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:51:59 -03:00
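The producing side of the beauty_map format from commit 7f40306728 above can be sketched as a small userspace helper. This is an illustration under assumptions, not the code in trace__bpf_sys_enter_beauty_map(): struct arg_desc, enum arg_kind and beauty_entry() are hypothetical names, and only the three encodings and the from_user rule stated in the commit message are modelled.

```c
#include <stdbool.h>

enum arg_kind { ARG_PLAIN, ARG_STRING, ARG_STRUCT, ARG_BUFFER };

/* Hypothetical description of one syscall argument. */
struct arg_desc {
	enum arg_kind kind;
	int struct_size; /* ARG_STRUCT: size of the struct in bytes */
	int len_arg_nr;  /* ARG_BUFFER: 1-based number of the paired length argument */
	bool from_user;  /* data flows from user space to the kernel */
};

/*
 * Hypothetical helper: compute the beauty_map entry for one argument.
 * Arguments that are not marked from_user are skipped (entry 0), since
 * their contents are only meaningful at sys_exit time.
 */
static int beauty_entry(const struct arg_desc *d)
{
	if (!d->from_user)
		return 0;

	switch (d->kind) {
	case ARG_STRING:
		return 1;               /* string: read until NUL */
	case ARG_STRUCT:
		return d->struct_size;  /* struct: its size in bytes */
	case ARG_BUFFER:
		return -d->len_arg_nr;  /* buffer: -1 ~ -6, paired length argument */
	default:
		return 0;
	}
}
```

For the example in the commit message, a 32-byte struct as argument No.3 gives 32, a string gives 1, and a buffer whose size is in argument No.4 gives -4; each value would be stored at beauty_map[114][2].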
Arnaldo Carvalho de Melo	d92f490cba	perf trace: Mark bpf's attr as from_user This one has no specific pretty printer right now, so will be handled by the generic BTF based one later in this patch series. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-10 09:51:51 -03:00
Arnaldo Carvalho de Melo	c790f2bafb	perf trace: Introduce SCA_TIMESPEC_FROM_USER() to set .from_user = true Paving the way for the generic BPF BTF based syscall arg augmenter. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo	be14a71984	perf trace: Introduce SCA_SOCKADDR_FROM_USER() to set .from_user = true Paving the way for the generic BPF BTF based syscall arg augmenter. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo	690eda6508	perf trace: Introduce SCA_PERF_ATTR_FROM_USER() to set .from_user = true Paving the way for the generic BPF BTF based syscall arg augmenter. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo	2f2e439ba5	perf trace: Mark which syscall arguments go from user space to kernel space We need to know where to collect it in the BPF augmenters, if in the sys_enter hook or in the sys_exit hook. Start with the SCA_FILENAME one, that is just from user to kernel space. The alternative, better, but takes a bit more time than I have now, is to use the __user information that is already in the syscall args and encoded in BTF via a tag, do it later. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:23:03 -03:00
Arnaldo Carvalho de Melo	c90a88d33a	perf trace: Use a common encoding for augmented arguments, with size + error + payload We were using a more compact format, without explicitly encoding the size and possible error in the payload for an argument. To do it generically, at least as Howard Chu did in his GSoC activities, it is more convenient to use the same model that was being used for string arguments, passing { size, error, payload }. So use that for the non string syscall args we have so far: struct timespec struct perf_event_attr struct sockaddr (this one has even a variable size) With this in place we have the userspace pretty printers: perf_event_attr___scnprintf() syscall_arg__scnprintf_augmented_sockaddr() syscall_arg__scnprintf_augmented_timespec() Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c sending its generic payload and thus we'll use them instead of a generic libbpf btf_dump interface that doesn't know about the sockaddr mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it as a (useful) fallback that prints just basic types until we put in place a more sophisticated pretty printer infrastructure that associates synthesized enums to struct fields using the header scrapers we have in tools/perf/trace/beauty/, some of them in this list: $ ls tools/perf/trace/beauty/*.sh tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/kcmp_type.sh tools/perf/trace/beauty/perf_ioctl.sh tools/perf/trace/beauty/statx_mask.sh tools/perf/trace/beauty/clone.sh tools/perf/trace/beauty/kvm_ioctl.sh tools/perf/trace/beauty/pkey_alloc_access_rights.sh tools/perf/trace/beauty/sync_file_range.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/madvise_behavior.sh tools/perf/trace/beauty/prctl_option.sh tools/perf/trace/beauty/usbdevfs_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/mmap_flags.sh tools/perf/trace/beauty/rename_flags.sh 
tools/perf/trace/beauty/vhost_virtio_ioctl.sh tools/perf/trace/beauty/fs_at_flags.sh tools/perf/trace/beauty/mmap_prot.sh tools/perf/trace/beauty/sndrv_ctl_ioctl.sh tools/perf/trace/beauty/x86_arch_prctl.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/mount_flags.sh tools/perf/trace/beauty/sndrv_pcm_ioctl.sh tools/perf/trace/beauty/fsmount.sh tools/perf/trace/beauty/move_mount_flags.sh tools/perf/trace/beauty/sockaddr.sh tools/perf/trace/beauty/fspick.sh tools/perf/trace/beauty/mremap_flags.sh tools/perf/trace/beauty/socket.sh $ Testing it: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename mv 123456 987654 0.000 ( 0.031 ms): mv/1193096 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e nanosleep sleep 1.2345678901 0.000 (1234.654 ms): sleep/1192697 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567891 }, rmtp: 0x7ffe1ea80460) = 0 root@number:~# perf trace -e perf_event_open perf stat -e cpu-clock sleep 1 0.000 ( 0.011 ms): perf/1192701 perf_event_open(attr_uptr: { type: 1 (software), size: 136, config: 0 (PERF_COUNT_SW_CPU_CLOCK), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 1192702 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 Performance counter stats for 'sleep 1': 0.51 msec cpu-clock # 0.001 CPUs utilized 1.001242090 seconds time elapsed 0.000000000 seconds user 0.001010000 seconds sys root@number:~# perf trace -e connect* ping -c 1 bsky.app 0.000 ( 0.130 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 23.907 ( 0.006 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.20.108.158 }, addrlen: 16) = 0 23.915 PING bsky.app (3.20.108.158) 56(84) bytes of data. 
( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.917 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.12.170.30 }, addrlen: 16) = 0 23.921 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.923 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.217.70.179 }, addrlen: 16) = 0 23.925 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.927 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.132.20.46 }, addrlen: 16) = 0 23.930 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.931 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.142.89.165 }, addrlen: 16) = 0 23.934 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.935 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.119.147.159 }, addrlen: 16) = 0 23.938 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.940 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.22.38.164 }, addrlen: 16) = 0 23.942 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.944 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.13.14.133 }, addrlen: 16) = 0 23.956 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 3.20.108.158 }, addrlen: 16) = 0 ^C --- bsky.app ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim 
<namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:17:03 -03:00
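The { size, error, payload } model described in the commit above can be sketched in a few lines. This is an illustrative Python sketch only, not the layout used by augmented_raw_syscalls.bpf.c: the header field widths (two 32-bit little-endian integers) and the names `encode_augmented_arg` and `decode_augmented_arg` are assumptions made for the example.

```python
import struct

# Hypothetical header: payload size (unsigned) and error code (signed),
# both 32-bit little-endian. The real BPF-side layout may differ.
HEADER = struct.Struct("<Ii")


def encode_augmented_arg(payload, err=0):
    """Prefix a raw argument payload with its size and an error code."""
    return HEADER.pack(len(payload), err) + payload


def decode_augmented_arg(buf):
    """Return (size, err, payload), rejecting truncated buffers."""
    size, err = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated payload")
    return size, err, payload


# A timespec-like payload: tv_sec and tv_nsec as two signed 64-bit integers.
timespec = struct.pack("<qq", 1, 234567891)
size, err, payload = decode_augmented_arg(encode_augmented_arg(timespec))
tv_sec, tv_nsec = struct.unpack("<qq", payload)
```

Because the size travels with the payload, the same decoder handles fixed-size arguments such as the timespec-like record here and variable-size ones such as a sockaddr.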
Arnaldo Carvalho de Melo	c1632cc5ed	perf trace augmented_syscalls.bpf: Move the renameat augmenter to renameat2, temporarily While trying to shape Howard Chu's generic BPF augmenter transition into the codebase I got stuck with the renameat2 syscall. Until I noticed that the attempt at reusing augmenters was making it use the 'openat' syscall augmenter, that collects just one string syscall arg, for the 'renameat2' syscall, that takes two strings. So, for the moment, just to help in this transition period, since 'renameat2' is what is used these days in the 'mv' utility, just make the BPF collector be associated with the more widely used syscall, hopefully the transition to Howard's generic BPF augmenter will cure this, so get this out of the way for now! So now we still have that odd "reuse", but for something we're not testing so won't get in the way anymore: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 \|& grep renameat Reusing "openat" BPF sys_enter augmenter for "renameat" 0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-09 19:16:26 -03:00
Kan Liang	003265bb6f	perf mem: Fix the wrong reference in parse_record_events() A segmentation fault can be triggered when running 'perf mem record -e ldlat-loads' The commit `35b38a71c9` ("perf mem: Rework command option handling") moves the OPT_CALLBACK of event from __cmd_record() to cmd_mem(). When invoking the __cmd_record(), the 'mem' has been referenced (&). So the &mem passed into the parse_record_events() is a double reference (&&) of the original struct perf_mem mem. But in the cmd_mem(), the &mem is the single reference (&) of the original struct perf_mem mem. Fixes: `35b38a71c9` ("perf mem: Rework command option handling") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905170737.4070743-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-06 11:45:28 -03:00
Kan Liang	5ad7db2c3f	perf mem: Fix missed p-core mem events on ADL and RPL The p-core mem events are missed when launching 'perf mem record' on ADL and RPL. root@number:~# perf mem record sleep 1 Memory events are enabled on a subset of CPUs: 16-27 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.032 MB perf.data ] root@number:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u A variable 'record' in the 'struct perf_mem_event' indicates whether a mem event in a mem_events[] should be recorded. The current code only configures the variable for the first eligible PMU. It's good enough for a non-hybrid machine or a hybrid machine which has the same mem_events[]. However, if a different mem_events[] is used for different PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never gets a chance to be set. The mem_events[] of the second PMU are always ignored. 'perf mem' doesn't support the per-PMU configuration now. A per-PMU mem_events[] 'record' variable doesn't make sense. Make it global. That could also avoid searching for the per-PMU mem_events[] via perf_pmu__mem_events_ptr every time. Committer testing: root@number:~# perf evlist -g cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/} cpu_core/mem-stores/P dummy:u root@number:~# The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is not being added by 'perf evlist -g', to be checked. 
Fixes: `abbdd79b78` ("perf mem: Clean up perf_mem_events__name()") Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/ Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-06 11:45:17 -03:00
Kan Liang	6e05d28ff2	perf mem: Check mem_events for all eligible PMUs The current perf_pmu__mem_events_init() only checks the availability of the mem_events for the first eligible PMU. It works for non-hybrid machines and hybrid machines that have the same mem_events. However, it may cause issues if a hybrid machine has different mem_events on different PMUs, e.g., Alder Lake and Raptor Lake. A mem-loads-aux event is only required for the p-core. The mem_events on both e-core and p-core should be checked and marked. The issue was not found, because it's hidden by another bug, which only records the mem-events for the e-core. The wrong check for the p-core events didn't yell. Fixes: `abbdd79b78` ("perf mem: Clean up perf_mem_events__name()") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-06 11:45:07 -03:00
Andi Kleen	4bef6168c1	perf script python: Avoid buffer overflow in python PEBS register interface Running a script that processes PEBS records gives buffer overflows in valgrind. The problem is that the allocation of the register string doesn't include the terminating 0 byte. Fix this. I also replaced the very magic "28" with a more reasonable larger buffer that should fit all registers. There's no need to conserve memory here. ==2106591== Memcheck, a memory error detector ==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov ==2106591== ==2106591== Invalid write of size 1 ==2106591== at 0x713354: regs_map (trace-event-python.c:748) ==2106591== by 0x7134EB: set_regs_in_dict (trace-event-python.c:784) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event 
(trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid read of size 1 ==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502) ==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899) ==2106591== by 0x7134F7: set_regs_in_dict (trace-event-python.c:786) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event 
(session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid write of size 1 ==2106591== at 0x713354: regs_map (trace-event-python.c:748) ==2106591== by 0x713539: set_regs_in_dict (trace-event-python.c:789) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) 
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid read of size 1 ==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502) ==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899) ==2106591== by 0x713545: set_regs_in_dict (trace-event-python.c:791) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== 73056 total, 29 ignored Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim 
<namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905151058.2127122-2-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-06 11:44:58 -03:00
Ian Rogers	f2dbc77909	perf jevents: Ignore sys when determining a model directory Existing sys directories aren't placed under a model directory like skylake. Placing a sys directory there causes the `is_leaf_dir` test to fail and consequently no events or metrics are generated for the model. Ignore sys directories in this case and update the comments to reflect why. This change has no effect, but when testing with a sys directory for a model people have reported running into the no event/metric issue. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240904211705.915101-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-06 11:44:46 -03:00
Aditya Gupta	35439fe4e2	perf check: Fix inconsistencies in feature names Fix two inconsistencies in feature names as discussed in [1]: 1. Rename "dwarf-unwind-support" to "dwarf-unwind" 2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't look related, change the feature name to 'auxtrace' to match the macro name, as 'get_cpuid' string is not used anywhere to check the feature presence [1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/ Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 16:19:53 -03:00
Athira Rajeev	512fcf7d9d	perf tests probe_vfs_getname.sh: Update to use 'perf check feature' In probe_vfs_getname.sh, currently we use "perf record --dry-run" to check for libtraceevent and skip the test if perf is not built with libtraceevent. Change the check to use the "perf check feature" option. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 16:19:52 -03:00
Aditya Gupta	8a028502b4	perf tools test_task_analyzer.sh: Update to use 'perf check feature' Currently we use output of 'perf version --build-options', to check whether perf was built with libtraceevent support. Instead, use 'perf check feature libtraceevent' to check for libtraceevent support. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-5-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 16:19:33 -03:00
Aditya Gupta	6cdd7750de	perf version: Update --build-options to use 'supported_features' array Now that the feature list has been duplicated in a global 'supported_features' array, use that array instead of manually checking status of built-in features. This helps in being consistent with commands such as 'perf check feature', so commands can use the same array, and any new feature can be added at one place, in the 'supported_features' array Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-4-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 16:19:29 -03:00
Ian Rogers	9b2b9b66d5	perf jevents: Add cpuid to model lookup command When restricting jevents generated json lookup code with JEVENTS_MODEL a list of models must be provided. Some builds don't know model names but know cpuids. Add a command that can convert a cpuid to a model using mapfile.csv files. This can be used with JEVENTS_MODEL like: $ make JEVENTS_MODEL=`./pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' pmu-events/arch/` Committer testing: $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' tools/perf/pmu-events/arch/ tigerlake,amdzen5 $ perf stat -v sleep 1 \|& head -1 Using CPUID GenuineIntel-6-B7-1 $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-B7-1' tools/perf/pmu-events/arch/ alderlake $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 10:43:18 -03:00
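The cpuid-to-model conversion described above can be sketched as follows. This is not the code of pmu-events/models.py: the sample rows, the version placeholders, the matching rule (mapfile regex anchored at the start of the cpuid, followed by end of string or a `-stepping` suffix) and the name `models_for` are all assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical mapfile.csv excerpt: rows and version strings are illustrative.
SAMPLE_MAPFILE = """\
Family-model,Version,Filename,EventType
GenuineIntel-6-8[CD],v0.0,tigerlake,core
GenuineIntel-6-(97|9A|B7|BA|BF),v0.0,alderlake,core
"""


def models_for(cpuids, mapfile_text):
    """Map a comma-separated cpuid list to model names, in input order."""
    rows = list(csv.DictReader(io.StringIO(mapfile_text)))
    models = []
    for cpuid in cpuids.split(","):
        for row in rows:
            # Assumed rule: the regex must match from the start of the cpuid
            # and be followed by the end of the string or "-<stepping>".
            pattern = "(?:" + row["Family-model"] + ")(?:-|$)"
            if re.match(pattern, cpuid) and row["Filename"] not in models:
                models.append(row["Filename"])
    return models


result = ",".join(models_for("GenuineIntel-6-8D-1,GenuineIntel-6-B7-1", SAMPLE_MAPFILE))
```

With the sample rows, `result` is a comma-separated string of the kind that could be passed to JEVENTS_MODEL.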
Aditya Gupta	98ad0b7732	perf check: Introduce 'check' subcommand Currently the presence of a feature is checked with a combination of perf version --build-options and greps, such as: perf version --build-options \| grep " on .* HAVE_FEATURE" Instead of this, introduce a subcommand "perf check feature", with which scripts can test for presence of a feature, such as: perf check feature HAVE_FEATURE 'perf check feature' command is expected to have exit status of 0 if feature is built-in, and 1 if it's not built-in or if feature is not known. Multiple features can also be passed as a comma-separated list, in which case the exit status will be 0 only if all of the passed features are built-in. For example, with the command below, it will have exit status of 0 only if both libtraceevent and bpf are enabled, else 1 in all other cases perf check feature libtraceevent,bpf The arguments are case-insensitive. An array 'supported_features' has also been introduced that can be used by other commands like 'perf version --build-options', so that new features can be added in one place, with the array Committer testing: $ perf check feature libtraceevent,bpf libtraceevent: [ on ] # HAVE_LIBTRACEEVENT bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check feature libtraceevent libtraceevent: [ on ] # HAVE_LIBTRACEEVENT $ perf check feature bpf bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check -q feature bpf && echo "BPF support is present" BPF support is present $ perf check -q feature Bogus && echo "Bogus support is present" $ Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: 
https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 09:56:05 -03:00
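The exit-status rule for 'perf check feature' (0 only when every listed feature is built in, 1 for anything missing or unknown, names compared case-insensitively) can be modelled as below. This is a behavioural sketch, not perf's implementation: the `BUILT_IN` table and the function name `check_features` are hypothetical.

```python
# Hypothetical build configuration used only for this example.
BUILT_IN = {"libtraceevent": True, "bpf": True, "libunwind": False}


def check_features(arg, built_in=BUILT_IN):
    """Return the modelled exit status for a comma-separated feature list."""
    names = [name.strip().lower() for name in arg.split(",")]
    # Unknown names are treated like features that are not built in.
    return 0 if all(built_in.get(name, False) for name in names) else 1


status_both = check_features("libtraceevent,bpf")
status_bogus = check_features("Bogus")
```

A script can then branch on the status in the same way the committer's `perf check -q feature bpf && echo ...` test does.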
Aditya Gupta	1a5efc9e13	libsubcmd: Don't free the usage string Currently, commands which depend on 'parse_options_subcommand()' don't show the usage string, and instead show '(null)' $ ./perf sched Usage: (null) -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) 'parse_options_subcommand()' is generally expected to initialise the usage string, with information in the passed 'subcommands[]' array This behaviour was changed in: `230a7a71f9` ("libsubcmd: Fix parse-options memory leak") Where the generated usage string is deallocated, and usage[0] string is reassigned as NULL. As discussed in [1], free the allocated usage string in the main function itself, and don't reset usage string to NULL in parse_options_subcommand With this change, the behaviour is restored. $ ./perf sched Usage: perf sched [<options>] {record\|latency\|map\|replay\|script\|timehist} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) [1]: https://lore.kernel.org/linux-perf-users/htq5vhx6piet4nuq2mmhk7fs2bhfykv52dbppwxmo3s7du2odf@styd27tioc6e/ Fixes: `230a7a71f9` ("libsubcmd: Fix parse-options memory leak") Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904061836.55873-2-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 09:54:24 -03:00
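For reference, the usage line restored by this fix has the shape "command [<options>] {sub1\|sub2\|...}". The sketch below only illustrates how such a line can be composed from a subcommands list; it says nothing about the memory-ownership change itself, and the name `build_usage` is hypothetical rather than a libsubcmd function.

```python
def build_usage(command, subcommands):
    """Compose a usage line listing the accepted subcommands."""
    return command + " [<options>] {" + "|".join(subcommands) + "}"


usage = build_usage(
    "perf sched",
    ["record", "latency", "map", "replay", "script", "timehist"],
)
```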
Ian Rogers	fa6cc3f932	perf parse-events: Vary default_breakpoint_len on i386 and arm64 On arm64 the breakpoint length should be 4-bytes but 8-bytes is tolerated as perf passes that as sizeof(long). Just pass the correct value. On i386 the sizeof(long) check in the kernel needs to match the kernel's long size. Check using an environment (uname checks) whether 4 or 8 bytes needs to be passed. Cache the value in a static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 09:50:46 -03:00
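The selection logic described in the commit above can be summarised in a short sketch. It is a model under stated assumptions, not the perf code: the architecture strings, the parameter names and the rule that an i386 build on an x86_64 kernel passes 8 are assumptions drawn from the description.

```python
import ctypes


def default_breakpoint_len(build_arch, kernel_machine,
                           sizeof_long=ctypes.sizeof(ctypes.c_long)):
    """Pick a default breakpoint length in bytes (illustrative model)."""
    if build_arch == "arm64":
        # Breakpoints are 4 bytes even though sizeof(long) is 8.
        return 4
    if build_arch == "i386":
        # A 32-bit binary must match the running kernel's long size,
        # which is inferred here from the uname machine string.
        return 8 if kernel_machine == "x86_64" else 4
    # Everything else keeps the previous default of sizeof(long).
    return sizeof_long


arm64_len = default_breakpoint_len("arm64", "aarch64")
compat_len = default_breakpoint_len("i386", "x86_64")
```

The commit also mentions caching the computed value in a static; a Python equivalent would be to memoise the function, which is omitted here for brevity.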
Ian Rogers	70b27c756f	perf parse-events: Add default_breakpoint_len helper The default breakpoint length is "sizeof(long)" however this is incorrect on platforms like Aarch64 where sizeof(long) is 8 but the breakpoint length is 4. Add a helper function that can be used to determine the correct breakpoint length, in this change it just returns the existing default sizeof(long) value. Use the helper in the bp_account test so that, when modifying the event from a watchpoint to a breakpoint, the breakpoint length is appropriate for the architecture and not just sizeof(long). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-04 09:49:09 -03:00
Ian Rogers	f76e3525ac	perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event() Previously the cpu_list was a string and typically no cpu_list was passed to __add_event(). Wanting to make events have their cpus distinct from the PMU means that on more occasions we want to pass a cpu_list. If we're reading this from sysfs it is easier to read a perf_cpu_map than allocate and pass around strings that will later be parsed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 16:45:34 -03:00
Ian Rogers	beef8fb2af	perf pmu: Merge boolean sysfs event option parsing Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the same parsing except for the file suffix used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 16:42:22 -03:00
Yang Jihong	9b3a48bbe2	perf sched timehist: Add --prio option The --prio option is used to only show events for the given task priority(ies). The default is to show events for all priority tasks, which is consistent with the previous behavior. Testcase: # perf sched record nice -n 9 perf bench sched messaging -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 3.435 [sec] [ perf record: Woken up 270 times to write data ] [ perf record: Captured and wrote 618.688 MB perf.data (5729036 samples) ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --prio <prio> analyze events only for given task priority(ies) --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist --prio 140 Samples of sched_switch event do not have callchains. Invalid prio string # perf sched timehist --show-prio --prio 129 Samples of sched_switch event do not have callchains. 
time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.765421 [0002] sched-messaging[1229618] 129 0.000 0.000 0.029 2090450.765445 [0007] sched-messaging[1229616] 129 0.000 0.062 0.043 2090450.765448 [0014] sched-messaging[1229619] 129 0.000 0.000 0.032 2090450.765478 [0013] sched-messaging[1229617] 129 0.000 0.065 0.048 2090450.765503 [0014] sched-messaging[1229622] 129 0.000 0.000 0.017 2090450.765550 [0002] sched-messaging[1229624] 129 0.000 0.000 0.021 2090450.765562 [0007] sched-messaging[1229621] 129 0.000 0.071 0.028 2090450.765570 [0005] sched-messaging[1229620] 129 0.000 0.064 0.066 2090450.765583 [0001] sched-messaging[1229625] 129 0.000 0.001 0.031 2090450.765595 [0013] sched-messaging[1229623] 129 0.000 0.060 0.028 2090450.765637 [0014] sched-messaging[1229628] 129 0.000 0.000 0.019 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 <SNIP> # perf sched timehist --show-prio --prio 0,120-129 Samples of sched_switch event do not have callchains. 
time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.763231 [0000] perf[1229608] 120 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 120 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 120 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 120 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0 0.000 0.001 0.004 2090450.763459 [0004] perf[1229608] 120 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0 0.000 0.002 0.010 2090450.763496 [0005] perf[1229608] 120 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0 0.000 0.001 0.004 2090450.763613 [0006] perf[1229608] 120 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0 0.000 0.001 0.008 2090450.763652 [0007] perf[1229608] 120 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0 0.000 0.001 0.008 <SNIP> 2090450.765665 [0001] <idle> 120 0.031 0.031 0.081 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 2090450.765667 [0000] s1-perf[8235/7168] 120 0.008 0.000 0.004 2090450.765684 [0013] <idle> 120 0.028 0.028 0.088 2090450.765685 [0001] sched-messaging[1229630] 129 0.000 0.001 0.020 2090450.765688 [0000] <idle> 120 0.004 0.004 0.020 2090450.765689 [0002] <idle> 120 0.021 0.021 0.138 2090450.765691 [0005] sched-messaging[1229626] 129 0.000 0.085 0.029 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland 
<mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 15:45:59 -03:00
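The testcase above shows --prio accepting a comma-separated list of single values and ranges (0,120-129) and rejecting 140 with "Invalid prio string". A minimal sketch of such a list parser follows, assuming valid priorities are 0..139. The names sketch_parse_prio_list and SKETCH_MAX_PRIO are hypothetical, and this is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption for this sketch: valid task priorities are 0..139. */
#define SKETCH_MAX_PRIO 140

/*
 * Hypothetical parser: mark every priority named in a list such as
 * "0,120-129" in want[]. Returns 0 on success, -1 on a malformed or
 * out-of-range entry.
 */
int sketch_parse_prio_list(const char *str, bool want[SKETCH_MAX_PRIO])
{
	const char *p = str;
	char *end;
	long lo, hi, i;

	assert(str != NULL && want != NULL);
	memset(want, 0, SKETCH_MAX_PRIO * sizeof(want[0]));

	while (*p != '\0') {
		/* Parse the single value, or the start of a range. */
		lo = strtol(p, &end, 10);
		if (end == p)
			return -1;
		hi = lo;

		/* An optional "-N" turns the entry into a range. */
		if (*end == '-') {
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)
				return -1;
		}

		if (lo < 0 || hi < lo || hi >= SKETCH_MAX_PRIO)
			return -1;

		for (i = lo; i <= hi; i++)
			want[i] = true;

		/* Entries are separated by commas; anything else is an error. */
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return 0;
}
```

A caller would then skip any sample whose priority is not set in want[], which matches the filtering visible in the output above.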
Yang Jihong	3fcd740990	perf sched timehist: Add --show-prio option The --show-prio option is used to display the priority of task. It is disabled by default, which is consistent with original behavior. The display format is xxx (priority does not change during task running) or xxx->yyy (priority changes during task running) Testcase: # perf sched record nice -n 9 true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.497 MB perf.data ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist Samples of sched_switch event do not have callchains. 
time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 23952.006537 [0000] perf[534] 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0.000 0.014 0.056 23952.006899 [0001] perf[534] 0.000 0.000 0.000 23952.006947 [0001] migration/1[22] 0.000 0.015 0.047 23952.007138 [0002] perf[534] 0.000 0.000 0.000 <SNIP> # perf sched timehist --show-prio Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 23952.006537 [0000] perf[534] 120 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056 23952.006899 [0001] perf[534] 120 0.000 0.000 0.000 <SNIP> 23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314 <SNIP> 23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023 23952.053990 [0005] <idle> 120 0.023 0.023 0.152 23952.054137 [0006] <idle> 120 1.427 1.427 17.855 23952.054278 [0007] <idle> 120 0.506 0.506 1.650 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 15:45:34 -03:00
Yang Jihong	b93fb9cf45	perf sched timehist: Remove redundant BUG_ON in timehist_sched_change_event() The BUG_ON(thread__tid(thread) != 0) in timehist_sched_change_event() is redundant, remove it. No functional change. Fixes: `07235f84ec` ("perf sched timehist: Add -I/--idle-hist option") Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812132606.3126490-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 15:44:22 -03:00
Yang Jihong	575eec2180	perf sched timehist: Skip print non-idle task samples when only show idle events When only showing idle events, the runtime stats of non-idle tasks are not updated and their value is 0, so there is no need to print non-idle samples. Before: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.763235 [0000] migration/0[15] 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0.000 0.000 0.000 2090450.763741 [0009] migration/9[69] 0.000 0.000 0.000 2090450.763862 [0010] migration/10[75] 0.000 0.000 0.000 2090450.763894 [0011] migration/11[81] 0.000 0.000 0.000 2090450.764021 [0012] migration/12[87] 0.000 0.000 0.000 2090450.764056 [0013] migration/13[93] 0.000 0.000 0.000 2090450.764135 [0014] migration/14[99] 0.000 0.000 0.000 2090450.764163 [0015] migration/15[105] 0.000 0.000 0.000 2090450.764292 [0016] migration/16[111] 0.000 0.000 0.000 2090450.764371 [0017] migration/17[117] 0.000 0.000 0.000 2090450.764422 [0018] migration/18[123] 0.000 0.000 0.000 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764505 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764590 [0016] s1-agent[7179/7162] 0.000 0.000 0.000 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764639 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764668 [0017] s1-agent[7180/7162] 0.000 0.000 0.000 
2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764672 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764683 [0000] <idle> 0.003 0.003 0.010 After: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764683 [0000] <idle> 0.003 0.003 0.010 2090450.764688 [0016] <idle> 0.019 0.019 0.097 2090450.764694 [0000] <idle> 0.001 0.001 0.009 2090450.764706 [0000] <idle> 0.001 0.001 0.010 2090450.764725 [0002] <idle> 0.000 0.000 1.415 2090450.764728 [0000] <idle> 0.002 0.002 0.019 2090450.764823 [0000] <idle> 0.003 0.003 0.091 2090450.764838 [0019] <idle> 0.000 0.000 0.154 2090450.764865 [0002] <idle> 0.109 0.109 0.029 2090450.764866 [0000] <idle> 0.012 0.012 0.030 2090450.764880 [0002] <idle> 0.013 0.013 0.001 2090450.764880 [0000] <idle> 0.002 0.002 0.011 2090450.764896 [0000] <idle> 0.001 0.001 0.013 2090450.764903 [0019] <idle> 0.063 0.063 0.002 2090450.764908 [0019] <idle> 0.003 0.003 0.001 Fixes: `07235f84ec` ("perf sched timehist: Add -I/--idle-hist option") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: 
https://lore.kernel.org/r/20240812132606.3126490-1-yangjihong@bytedance.com Reviewed-and-tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 15:43:03 -03:00
Andi Kleen	bf0db8c759	perf script: Minimize "not reaching sample" for '-F +brstackinsn' In some situations 'perf script -F +brstackinsn' sees a lot of "not reaching sample" messages. This happens when the last LBR block before the sample contains a branch that is not in the LBR, and the instruction dumping stops. $ perf record -b emacs -Q --batch '()' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ] $ perf script -F +brstackinsn ... 00007f0ab2d171a4 insn: 41 0f 94 c0 00007f0ab2d171a8 insn: 83 fa 01 00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC 00007f0ab2d17180 insn: 45 84 c0 00007f0ab2d17183 insn: 74 28 ... not reaching sample ... $ perf script -F +brstackinsn \| grep -c reach 136 $ This is a problem for further analysis that wants to see the full code up to the sample. There are two common cases where the message is bogus: - The LBR only logs taken branches, but the branch might be a conditional branch that is not taken (that is the most common case actually) - The LBR sampling uses a filter ignoring some branches, but the perf script check checks for all branches. This patch fixes these two conditions, by only checking for conditional branches, as well as checking the perf_event_attr's branch filter attributes. For the test case above it fixes all the messages: $ ./perf script -F +brstackinsn \| grep -c reach 0 Note that there are still conditions when the message is hit -- sometimes there can be an unconditional branch that misses the LBR update before the sample -- but they are much rarer now. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 12:22:01 -03:00
Namhyung Kim	8b3b1bb3ea	perf record offcpu: Constify control data for BPF The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf record --off-cpu ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.807 MB perf.data (5645 samples) ] root@x1:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P offcpu-time dummy:u root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER, read_format: ID\|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CPU\|PERIOD\|IDENTIFIER, read_format: ID\|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|CALLCHAIN\|CPU\|PERIOD\|IDENTIFIER, read_format: ID\|LOST, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|CPU\|IDENTIFIER, read_format: ID\|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@x1:~# perf trace -e bpf --max-events 5 perf record --off-cpu 0.000 ( 0.015 ms): :2949124/2949124 bpf(cmd: 36, uattr: 0x7ffefc6dbe30, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.031 ( 0.115 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbb60, size: 148) = 14 0.159 ( 0.037 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbc20, size: 148) = 14 23.868 ( 0.144 ms): 
perf/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbad0, size: 148) = 14 24.027 ( 0.014 ms): perf/2949124 bpf(uattr: 0x7ffefc6dbc80, size: 80) = 14 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:54:47 -03:00
Namhyung Kim	4afdc00c37	perf lock contention: Constify control data for BPF The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf lock contention --use-bpf contended total wait max wait avg wait type caller 5 31.57 us 14.93 us 6.31 us mutex btrfs_delayed_update_inode+0x43 1 16.91 us 16.91 us 16.91 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 15.13 us 15.13 us 15.13 us spinlock btrfs_getattr+0xd1 1 6.65 us 6.65 us 6.65 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 4.34 us 4.34 us 4.34 us spinlock process_one_work+0x1a9 root@x1:~# root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf 0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16 0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16 26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148) = 16 26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80) = 16 87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40) = 16 87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40) = 16 88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40) = 16 88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148) = 17 88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40) = 16 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra 
<peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:53:15 -03:00
Namhyung Kim	066fd84087	perf kwork: Constify control data for BPF The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf kwork report --use-bpf Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name \| Cpu \| Total Runtime \| Count \| Max runtime \| Max runtime start \| Max runtime end \| -------------------------------------------------------------------------------------------------------------------------------- (w)intel_atomic_commit_work [ \| 0009 \| 18.680 ms \| 2 \| 18.553 ms \| 362410.681580 s \| 362410.700133 s \| (w)pm_runtime_work \| 0007 \| 13.300 ms \| 1 \| 13.300 ms \| 362410.254996 s \| 362410.268295 s \| (w)intel_atomic_commit_work [ \| 0009 \| 9.846 ms \| 2 \| 9.717 ms \| 362410.172352 s \| 362410.182069 s \| (w)acpi_ec_event_processor \| 0002 \| 8.106 ms \| 1 \| 8.106 ms \| 362410.463187 s \| 362410.471293 s \| (s)SCHED:7 \| 0000 \| 1.351 ms \| 106 \| 0.063 ms \| 362410.658017 s \| 362410.658080 s \| i915:157 \| 0008 \| 0.994 ms \| 13 \| 0.361 ms \| 362411.222125 s \| 362411.222486 s \| (s)SCHED:7 \| 0001 \| 0.703 ms \| 98 \| 0.047 ms \| 362410.245004 s \| 362410.245051 s \| (s)SCHED:7 \| 0005 \| 0.674 ms \| 42 \| 0.074 ms \| 362411.483039 s \| 362411.483113 s \| (s)NET_RX:3 \| 0001 \| 0.556 ms \| 10 \| 0.079 ms \| 362411.066388 s \| 362411.066467 s \| <SNIP> root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf 0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12 0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12 26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12 26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12 Starting trace, Hit <Ctrl+C> to stop and 
report root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:50:20 -03:00
Namhyung Kim	ac5a23b2f2	perf ftrace latency: Constify control data for BPF The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf ftrace latency --use-bpf -T schedule ^C# DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 0 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 1 \| \| 16 - 32 us \| 5 \| \| 32 - 64 us \| 2 \| \| 64 - 128 us \| 6 \| \| 128 - 256 us \| 7 \| \| 256 - 512 us \| 5 \| \| 512 - 1024 us \| 22 \| # \| 1 - 2 ms \| 36 \| ## \| 2 - 4 ms \| 68 \| ##### \| 4 - 8 ms \| 22 \| # \| 8 - 16 ms \| 91 \| ####### \| 16 - 32 ms \| 11 \| \| 32 - 64 ms \| 26 \| ## \| 64 - 128 ms \| 213 \| ################# \| 128 - 256 ms \| 19 \| # \| 256 - 512 ms \| 14 \| # \| 512 - 1024 ms \| 5 \| \| 1 - ... s \| 8 \| \| root@x1:~# root@x1:~# perf trace -e bpf perf ftrace latency --use-bpf -T schedule 0.000 ( 0.015 ms): perf/2944525 bpf(cmd: 36, uattr: 0x7ffe80de7b40, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.025 ( 0.102 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7870, size: 148) = 8 0.136 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7930, size: 148) = 8 0.174 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de77e0, size: 148) = 8 0.205 ( 0.010 ms): perf/2944525 bpf(uattr: 0x7ffe80de7990, size: 80) = 8 0.227 ( 0.011 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7810, size: 40) = 8 0.244 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.257 ( 0.006 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7660, size: 40) = 8 0.265 ( 0.058 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7730, size: 148) = 9 0.330 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78e0, size: 40) = 8 0.337 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8 0.343 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 
0x7ffe80de7880, size: 40) = 8 0.349 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40) = 8 0.355 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8 0.361 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40) = 8 0.367 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.373 ( 0.014 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7a00, size: 40) = 8 0.390 ( 0.358 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.763 ( 0.014 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.783 ( 0.011 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.798 ( 0.017 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.819 ( 0.003 ms): perf/2944525 bpf(uattr: 0x7ffe80de7700, size: 80) = 9 0.824 ( 0.047 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de76c0, size: 148) = 10 0.878 ( 0.008 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.891 ( 0.014 ms): perf/2944525 bpf(cmd: MAP_UPDATE_ELEM, uattr: 0x7ffe80de79e0, size: 32) = 0 0.910 ( 0.103 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 9 1.016 ( 0.143 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 10 3.777 ( 0.068 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7570, size: 148) = 12 3.848 ( 0.003 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de7550, size: 64) = -1 EBADF (Bad file descriptor) 3.859 ( 0.006 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 12 6.504 ( 0.010 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 14 ^C# DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 1 \| \| 4 - 8 us \| 3 \| \| 8 - 16 us \| 3 \| \| 16 - 32 us \| 11 \| \| 32 - 64 us \| 9 \| \| 64 - 128 us \| 17 \| \| 128 - 256 us \| 30 \| # \| 256 - 512 us \| 20 \| \| 512 - 1024 us \| 42 \| # \| 1 - 2 ms \| 151 \| ###### \| 2 - 4 ms \| 106 
\| #### \| 4 - 8 ms \| 18 \| \| 8 - 16 ms \| 149 \| ###### \| 16 - 32 ms \| 30 \| # \| 32 - 64 ms \| 17 \| \| 64 - 128 ms \| 360 \| ############### \| 128 - 256 ms \| 52 \| ## \| 256 - 512 ms \| 18 \| \| 512 - 1024 ms \| 28 \| # \| 1 - ... s \| 5 \| \| root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:47:02 -03:00
Namhyung Kim	76d3685400	perf stat: Constify control data for BPF The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1 Performance counter stats for 'sleep 1': 2,442,583 cpu_core/cycles/ 2,494,425 cpu_core/instructions/ 1.002687372 seconds time elapsed 0.001126000 seconds user 0.001166000 seconds sys root@x1:~# perf trace -e bpf --max-events 10 perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1 0.000 ( 0.019 ms): perf/2944119 bpf(cmd: OBJ_GET, uattr: 0x7fffdf5cdd40, size: 20) = 5 0.021 ( 0.002 ms): perf/2944119 bpf(cmd: OBJ_GET_INFO_BY_FD, uattr: 0x7fffdf5cdcd0, size: 16) = 0 0.030 ( 0.005 ms): perf/2944119 bpf(cmd: MAP_LOOKUP_ELEM, uattr: 0x7fffdf5ceda0, size: 32) = 0 0.037 ( 0.004 ms): perf/2944119 bpf(cmd: LINK_GET_FD_BY_ID, uattr: 0x7fffdf5ced80, size: 12) = -1 ENOENT (No such file or directory) 0.189 ( 0.004 ms): perf/2944119 bpf(cmd: 36, uattr: 0x7fffdf5cec10, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.201 ( 0.095 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5ce940, size: 148) = 10 0.305 ( 0.026 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5cea00, size: 148) = 10 0.347 ( 0.012 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce8e0, size: 40) = 10 0.364 ( 0.004 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce950, size: 40) = 10 0.376 ( 0.006 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce730, size: 40) = 10 root@x1:~# Performance counter stats for 'sleep 1': 271,221 cpu_core/cycles/ 139,150 cpu_core/instructions/ 1.002881677 seconds time elapsed 0.001318000 seconds user 0.001314000 seconds sys root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar 
<mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:43:16 -03:00
Ian Rogers	18f41f1ba5	perf test: Make watchpoint data 32-bits on i386 i386 only supports watchpoints up to size 4, 8 bytes causes extra counts and test failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:26:53 -03:00
Ian Rogers	91235380e5	perf test: Skip uprobe test if probe command isn't present The probe command is dependent on libelf. Skip the test if the required probe command isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:23:01 -03:00
Ian Rogers	38e2648a81	perf time-utils: Fix 32-bit nsec parsing The "time utils" test fails in 32-bit builds: ... parse_nsec_time("18446744073.709551615") Failed. ptime 4294967295709551615 expected 18446744073709551615 ... Switch strtoul to strtoull as an unsigned long in 32-bit build isn't 64-bits. Fixes: `c284d669a2` ("perf tools: Move parse_nsec_time to time-utils.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:21:55 -03:00
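The failing value in the commit above matches what `strtoul` does on a 32-bit build: `unsigned long` is 32 bits wide there, so the seconds part saturates at `ULONG_MAX` (4294967295). A minimal sketch of parsing "sec.nsec" with `strtoull` follows. The helper name `parse_nsec_time_sketch` and its error handling are assumptions for illustration, not perf's actual `parse_nsec_time()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not perf's parse_nsec_time()): parse "sec[.nsec]"
 * into nanoseconds. strtoull() returns unsigned long long, which is at
 * least 64 bits on every build, unlike strtoul()'s unsigned long.
 */
static int parse_nsec_time_sketch(const char *str, uint64_t *ptime)
{
	char *end;
	char frac[10];
	const char *p;
	size_t i, len;
	unsigned long long sec, nsec = 0;

	assert(str != NULL && ptime != NULL);

	sec = strtoull(str, &end, 10);
	if (*end != '.' && *end != '\0')
		return -1;

	if (*end == '.') {
		p = end + 1;
		len = strlen(p);
		if (len > 9)
			return -1;
		/* Copy the fractional digits and right-pad to 9 places. */
		for (i = 0; i < len; i++) {
			if (!isdigit((unsigned char)p[i]))
				return -1;
			frac[i] = p[i];
		}
		for (; i < 9; i++)
			frac[i] = '0';
		frac[9] = '\0';
		nsec = strtoull(frac, NULL, 10);
	}

	*ptime = sec * NSEC_PER_SEC + nsec;
	return 0;
}
```

Called with the string from the test failure, `"18446744073.709551615"`, this sketch yields 18446744073709551615, the expected value quoted in the commit message.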
Ian Rogers	6c99903e08	perf pmus: Fix name comparisons on 32-bit systems The hex PMU suffix may be 64-bit but the comparisons were "unsigned long" or 32-bit on 32-bit systems. This was causing the "PMU name comparison" test to fail in a 32-bit build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 11:21:09 -03:00
Steinar H. Gunderson	0488568178	perf annotate: LLVM-based disassembler Support using LLVM as a disassembler method, allowing helperless annotation in non-distro builds. (It is also much faster than using libbfd or bfd objdump on binaries with a lot of debug information.) This is nearly identical to the output of llvm-objdump; there are some very rare whitespace differences, some minor changes to demangling (since we use perf's regular demangling and not LLVM's own) and the occasional case where llvm-objdump makes a different choice when multiple symbols share the same address. It should work across all of LLVM's supported architectures, although I've only tested 64-bit x86, and finding the right triple from perf's idea of machine architecture can sometimes be a bit tricky. Ideally, we should have some way of finding the triplet just from the file itself. Committer notes: Address this on 32-bit systems by using PRIu64 from inttypes.h 3 17.58 almalinux:9-i386 : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC) util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso, const char, u64, u64)’: util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka +‘long long unsigned int’} [-Werror=format=] 150 \| snprintf(buf, sizeof(buf), "%s+0x%lx", \| ~~^ \| \| \| long unsigned int \| %llx 151 \| demangled ? demangled : sym_name, addr - base_addr); \| ~~~~~~~~~~~~~~~~ \| \| \| u64 {aka long long unsigned int} cc1plus: all warnings being treated as errors Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 10:39:20 -03:00
Steinar H. Gunderson	6eca7c5ac2	perf annotate: Split out read_symbol() The Capstone disassembler code has a useful code snippet to read the bytes for a given code symbol into memory. Split it out into its own function, so that the LLVM disassembler can use it in the next patch. Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-2-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 10:15:55 -03:00
Steinar H. Gunderson	c3f8644c21	perf report: Support LLVM for addr2line() In addition to the existing support for libbfd and calling out to an external addr2line command, add support for using libllvm directly. This is both faster than libbfd, and can be enabled in distro builds (the LLVM license has an explicit provision for GPLv2 compatibility). Thus, it is set as the primary choice if available. As an example, running 'perf report' on a medium-size profile with DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with libbfd, 153 seconds with external llvm-addr2line, and I got tired and aborted the test after waiting for 55 minutes with external bfd addr2line (which is the default for perf as compiled by distributions today). Evidently, for this case, the bfd addr2line process needs 18 seconds (on a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second timeout and gets killed during initialization, getting restarted anew every time. Having an in-process addr2line makes this much more robust. As future extensions, libllvm can be used in many other places where we currently use libbfd or other libraries: - Symbol enumeration (in particular, for PE binaries). - Demangling (including non-Itanium demangling, e.g. Microsoft or Rust). - Disassembling (perf annotate). However, these are much less pressing; most people don't profile PE binaries, and perf has non-bfd paths for ELF. The same with demangling; the default _cxa_demangle path works fine for most users, and while bfd objdump can be slow on large binaries, it is possible to use --objdump=llvm-objdump to get the speed benefits. (It appears LLVM-based demangling is very simple, should we want that.) Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not correctly detected using feature_check, and thus was not tested. 
Committer notes: Added the name and a __maybe_unused to address: 1 13.50 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC) util/srcline.c: In function 'dso__free_a2l': util/srcline.c:184:20: error: parameter name omitted void dso__free_a2l(struct dso ) ^~~~~~~~~~~~ make[3]: ** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2 Signed-off-by: Steinar H. Gunderson <sesse@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-03 10:15:16 -03:00
Arnaldo Carvalho de Melo	0e7eb23668	perf tools: Build x86 32-bit syscall table from arch/x86/entry/syscalls/syscall_32.tbl To remove one more use of the audit libs and address a problem reported with a recent change where a function isn't available when using the audit libs method, that should really go away, this being one step in that direction. The script used to generate the 64-bit syscall table was already parametrized to generate for both 64-bit and 32-bit, so just use it and wire the generated table to the syscalltbl.c routines. Reported-by: Jiri Slaby <jirislaby@kernel.org> Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Jiri Slaby <jirislaby@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/6fe63fa3-6c63-4b75-ac09-884d26f6fb95@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-09-02 11:13:40 -03:00
Yang Jihong	39c243411b	perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time If sched_in event for current task is not recorded, sched_in timestamp will be set to end_time of time window interest, causing an error in timestamp show. In this case, we choose to ignore this event. Test scenario: perf[1229608] does not record the first sched_in event, run time and sch delay are both 0 # perf sched timehist Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.763231 [0000] perf[1229608] 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0.000 0.001 0.004 Before: arbitrarily specify a time window of interest, timestamp will be set to an incorrect value # perf sched timehist --time 100,200 Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 200.000000 [0000] perf[1229608] 0.000 0.000 0.000 200.000000 [0001] perf[1229608] 0.000 0.000 0.000 200.000000 [0002] perf[1229608] 0.000 0.000 0.000 200.000000 [0003] perf[1229608] 0.000 0.000 0.000 200.000000 [0004] perf[1229608] 0.000 0.000 0.000 200.000000 [0005] perf[1229608] 0.000 0.000 0.000 200.000000 [0006] perf[1229608] 0.000 0.000 0.000 200.000000 [0007] perf[1229608] 0.000 0.000 0.000 After: # perf sched timehist --time 100,200 Samples of sched_switch event do not have callchains. 
time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- Fixes: `853b740711` ("perf sched timehist: Add option to specify time window of interest") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 10:31:57 -03:00
Namhyung Kim	74fd69a35c	perf lock contention: Fix spinlock and rwlock accounting The spinlock and rwlock use a single-element per-cpu array to track current locks for performance reasons. But this means the key is always available and it cannot simply account lock stats in the array because some of them are invalid. In fact, the contention_end() program in the BPF invalidates the entry by setting the 'lock' value to 0 instead of deleting the entry for the hashmap. So it should skip entries with the lock value of 0 in the account_end_timestamp(). Otherwise, it'd have spurious high contention on an idle machine: $ sudo perf lock con -ab -Y spinlock sleep 3 contended total wait max wait avg wait type caller 8 4.72 s 1.84 s 590.46 ms spinlock rcu_core+0xc7 8 1.87 s 1.87 s 233.48 ms spinlock process_one_work+0x1b5 2 1.87 s 1.87 s 933.92 ms spinlock worker_thread+0x1a2 3 1.81 s 1.81 s 603.93 ms spinlock tmigr_update_events+0x13c 2 1.72 s 1.72 s 861.98 ms spinlock tick_do_update_jiffies64+0x25 6 42.48 us 13.02 us 7.08 us spinlock futex_q_lock+0x2a 1 13.03 us 13.03 us 13.03 us spinlock futex_wake+0xce 1 11.61 us 11.61 us 11.61 us spinlock rcu_core+0xc7 I don't believe it has contention on a spinlock longer than 1 second. After this change, it only reports some small contentions. 
$ sudo perf lock con -ab -Y spinlock sleep 3 contended total wait max wait avg wait type caller 4 133.51 us 43.29 us 33.38 us spinlock tick_do_update_jiffies64+0x25 4 69.06 us 31.82 us 17.27 us spinlock process_one_work+0x1b5 2 50.66 us 25.77 us 25.33 us spinlock rcu_core+0xc7 1 28.45 us 28.45 us 28.45 us spinlock rcu_core+0xc7 1 24.77 us 24.77 us 24.77 us spinlock tmigr_update_events+0x13c 1 23.34 us 23.34 us 23.34 us spinlock raw_spin_rq_lock_nested+0x15 Fixes: `b5711042a1` ("perf lock contention: Use per-cpu array map for spinlocks") Reported-by: Xi Wang <xii@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: bpf@vger.kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 10:22:50 -03:00
Namhyung Kim	36cddd1056	perf lock contention: Do not fail EEXIST for update When it updates the lock stat for the first time, it needs to create an element in the BPF hash map. But if there's a concurrent thread waiting for the same lock (like for rwsem or rwlock), it might race with the thread and possibly fail to update with -EEXIST. In that case, it can lookup the map again and put the data there instead of failing. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 10:03:39 -03:00
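The retry described in the commit above can be sketched without BPF at all. The toy fixed-size map below stands in for the BPF hash map, and `simulate_race` stands in for a concurrent waiter that creates the element first. Every name here (`toy_map`, `toy_insert_noexist`, `account_wait`) is hypothetical and is not taken from the perf sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

/* Toy stand-ins for the BPF hash map and its per-lock stats. */
struct toy_stat { uint64_t key, count, total; int used; };
struct toy_map { struct toy_stat slots[TOY_SLOTS]; };

static struct toy_stat *toy_lookup(struct toy_map *m, uint64_t key)
{
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++)
		if (m->slots[i].used && m->slots[i].key == key)
			return &m->slots[i];
	return NULL;
}

/* Create-only insert: fails with -EEXIST if the key is already present. */
static int toy_insert_noexist(struct toy_map *m, uint64_t key, uint64_t dur)
{
	size_t i;

	if (toy_lookup(m, key))
		return -EEXIST;
	for (i = 0; i < TOY_SLOTS; i++) {
		if (!m->slots[i].used) {
			m->slots[i] = (struct toy_stat){ key, 1, dur, 1 };
			return 0;
		}
	}
	return -ENOMEM;
}

/* Account one wait; on -EEXIST, look up again and add to the winner's entry. */
static int account_wait(struct toy_map *m, uint64_t key, uint64_t dur,
			int simulate_race)
{
	struct toy_stat *st = toy_lookup(m, key);
	int err;

	assert(m != NULL);
	if (!st) {
		if (simulate_race)	/* another waiter wins the creation */
			toy_insert_noexist(m, key, 5);
		err = toy_insert_noexist(m, key, dur);
		if (err == 0)
			return 0;
		if (err != -EEXIST)
			return err;
		st = toy_lookup(m, key);
		if (!st)
			return -ENOENT;
	}
	st->count++;
	st->total += dur;
	return 0;
}
```

With `simulate_race` set, the create-only insert fails with `-EEXIST`, and the second lookup lets the sample be added to the existing entry instead of being dropped.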
Namhyung Kim	05a5dd1dfd	perf lock contention: Simplify spinlock check The LCB_F_SPIN bit is used for spinlock, rwlock and optimistic spinning in mutex. In get_tstamp_elem() it needs to check spinlock and rwlock only. As mutex sets the LCB_F_MUTEX, it can check those two bits and reduce the number of operations. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 09:57:47 -03:00
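A minimal sketch of the two-bit check that the commit above describes. The numeric flag values are assumptions for illustration, as are the `LCB_F_READ` and `LCB_F_WRITE` names and the helper `is_spin_or_rwlock`. Only `LCB_F_SPIN` and `LCB_F_MUTEX` are named in the commit message.

```c
#include <assert.h>	/* used only by the checks in the usage note */
#include <stdbool.h>

/* Flag values assumed for illustration; they may not match the kernel's. */
#define LCB_F_SPIN	(1U << 0)
#define LCB_F_READ	(1U << 1)
#define LCB_F_WRITE	(1U << 2)
#define LCB_F_MUTEX	(1U << 5)

/*
 * Hypothetical helper: true for spinlocks and rwlocks, false for a mutex
 * that is optimistically spinning (which sets both SPIN and MUTEX).
 * One mask and one compare replace separate per-type checks.
 */
static bool is_spin_or_rwlock(unsigned int flags)
{
	return (flags & (LCB_F_SPIN | LCB_F_MUTEX)) == LCB_F_SPIN;
}
```

Under these assumed values, `LCB_F_SPIN` and `LCB_F_SPIN | LCB_F_READ` pass the check, while `LCB_F_SPIN | LCB_F_MUTEX` and a plain `LCB_F_MUTEX` do not.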
Namhyung Kim	10d6c57c82	perf lock contention: Handle error in a single place It has some duplicated code to do the same job. Let's add a label and goto there to handle errors in a single place. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 09:57:19 -03:00
Ian Rogers	ccb9004656	perf test: Additional pipe tests with pipe output written to a file Additional pipe tests where piped files are written to disk. This means that spotting a file name of "-" isn't a sufficient "is pipe?" test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 09:24:27 -03:00
Ian Rogers	2d57c32b32	perf header: Remove repipe option No longer used by `perf inject` the repipe_fd is always -1 and repipe is always false. Remove the options and associated code knowing the constant values of the removed variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 09:24:27 -03:00
Ian Rogers	89d64e7273	perf inject: Overhaul handling of pipe files Previously inject->is_pipe was set if the input or output were a pipe. Determining the input was a pipe had to be done prior to starting the session and opening the file. This was done by comparing the input file name with '-' but it fails if the pipe file is written to disk. Opening a pipe file from disk will correctly set perf_data.is_pipe, but this is too late for 'perf inject' and results in a broken file. A workaround is 'cat pipe_perf\|perf inject -i - ...'. This change removes inject->is_pipe and changes the dependent conditions to use the is_pipe flag on the input (inject->session->data) and output files (inject->output). This ensures the is_pipe condition reflects things like the header being read. The change removes the use of perf file header repiping, that is writing the file header out while reading it in. The case of input pipe and output file cannot repipe as the attributes for the file are unknown. To resolve this, write the file header when writing to disk and as the attributes may be unknown, write them after the data. Update sessions repipe variable to be trace_event_repipe as those are the only events now impacted by it. Update __perf_session__new as the repipe_fd no longer needs passing. Fully removing repipe from session header reading will be done in a later change. Committer testing: root@number:~# perf record -e syscalls:sys_enter_sleep/max-stack=4/ -o - sleep 0.01 \| perf report -i - # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.050 MB - ] # # Total Lost Samples: 0 # # Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep' # Event count (approx.): 1 # # Overhead Command Shared Object Symbol # ........ ....... ............. ............................... # 100.00% sleep libc.so.6 [.] 
clock_nanosleep@GLIBC_2.2.5 \| ---__libc_start_main@@GLIBC_2.34 __libc_start_call_main 0x562fc2560a9f clock_nanosleep@GLIBC_2.2.5 # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # root@number:~# perf record -e syscalls:sys_enter_sleep/max-stack=4/ -o - sleep 0.01 > pipe.data [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.050 MB - ] root@number:~# perf report --stdio -i pipe.data # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep' # Event count (approx.): 1 # # Overhead Command Shared Object Symbol # ........ ....... ............. ............................... # 100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5 \| ---__libc_start_main@@GLIBC_2.34 __libc_start_call_main 0x55f775975a9f clock_nanosleep@GLIBC_2.2.5 # # (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...) # root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-30 09:23:51 -03:00
Ian Rogers	e9a7053da3	perf header: Allow attributes to be written after data With a file, to write data an offset needs to be known. Typically data follows the event attributes in a file. However, if processing a pipe the number of event attributes may not be known. It is convenient in that case to write the attributes after the data. Expand perf_session__do_write_header() to allow this when the data offset and size are known. This approach may be useful for more than just taking a pipe file to write into a data file, `perf inject --itrace` will reserve an additional 8kb for attributes, which would be unnecessary if the attributes were written after the data. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 16:17:43 -03:00
Ian Rogers	10df481fda	perf header: Fail read if header sections overlap Buggy perf.data files can have the attributes and data overlapping. For example, when processing pipe data the attributes aren't known and so file offset header calculations can consider them not present. Later this can cause the attributes to overwrite the data. This can be seen in: $ perf record -o - true > a.data [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.059 MB - ] $ perf inject -i a.data -o b.data $ perf report --stats -i b.data 0x68 [0]: failed to process type: 510379 [Invalid argument] Error: failed to process sample $ This change makes reading the corrupt file fail: $ perf report --stats -i b.data Perf file header corrupt: Attributes and data overlap incompatible file format (rerun with -v to learn more) $ Which is more informative. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 16:15:29 -03:00
Ian Rogers	d71bbe799c	perf header: Add kerneldoc to 'struct perf_file_header' Some of the values are a little strange so add documentation to resolve ambiguity. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 16:14:24 -03:00
Ian Rogers	d9c993100e	perf session: Document 'struct perf_session' and constify its 'auxtrace' member perf_session is a central data structure to the tool so let's comment it. The auxtrace callbacks are never modified in session so constify. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 16:13:26 -03:00
James Clark	022aa67b5a	perf: cs-etm: Print queue number in raw trace dump Now that we have overlapping trace IDs it's also useful to know what the queue number is to be able to distinguish the source of the trace so print it inline. Hide it behind the -v option because it might not be obvious to users what the queue number is. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-8-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 15:56:37 -03:00
James Clark	1506af6db8	perf: cs-etm: Support version 0.1 of HW_ID packets v0.1 HW_ID packets have a new field that describes which sink each CPU writes to. Use the sink ID to link trace ID maps to each other so that mappings are shared wherever the sink is shared. Also update the error message to show that overlapping IDs aren't an error in per-thread mode, just not supported. In the future we can use the CPU ID from the AUX records, or watch for changing sink IDs on HW_ID packets to use the correct decoders. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-7-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 15:56:13 -03:00
James Clark	940007cee5	perf: cs-etm: Only save valid trace IDs into files This isn't a bug because Perf always masks with CORESIGHT_TRACE_ID_VAL_MASK before using these values, but to avoid it looking like it could be, make an effort to not save bad values. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-6-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 15:55:52 -03:00
James Clark	19c3e4db38	perf: cs-etm: Create decoders based on the trace ID mappings Now that each queue has a unique set of trace ID mappings, use this list to create the decoders. In unformatted mode just add a single mapping so only one decoder is made. Previously each queue would have a decoder created for each traced CPU on the system but this won't work anymore because CPUs can have overlapping trace IDs. This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed any more. If mappings aren't added then decoders aren't created, rather than needing a flag to suppress creation. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-5-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 15:55:24 -03:00
James Clark	77c123f53e	perf: cs-etm: Move traceid_list to each queue The global list won't work for per-sink trace ID allocations, so put a list in each queue where the IDs will be unique to that queue. To keep the same behavior as before, for version 0 of the HW_ID packets, copy all the HW_ID mappings into all queues. This change doesn't affect the decoders, only trace ID lookups on the Perf side. The decoders are still created with global mappings which will be fixed in a later commit. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-4-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 15:54:40 -03:00
James Clark	57880a7966	perf: cs-etm: Allocate queues for all CPUs Make cs_etm__setup_queue() set up a queue even if it's empty, and pre-allocate queues based on the max CPU that was recorded. In per-CPU mode aux queues are indexed based on CPU ID even if all CPUs aren't recorded; sparse queue arrays aren't used. This will allow HW_IDs to be saved even if no aux data was received in that queue without having to call cs_etm__setup_queue() from two different places. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-3-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 12:34:55 -03:00
James Clark	b6aa0de9a5	perf cs-etm: Create decoders after both AUX and HW_ID search passes Both of these passes gather information about how to create the decoders. AUX records determine formatted/unformatted, and the HW_IDs determine the traceID/metadata mappings. Therefore it makes sense to cache the information and wait until both passes are over before creating the decoders, rather than creating them at the first HW_ID found. This will allow a simplification of the creation process where cs_etm_queue->traceid_list will be exclusively used to create the decoders, rather than the current two methods depending on whether the trace is formatted or not. Previously the sample CPU from the AUX record was used to initialize the decoder CPU, but actually sample CPU == AUX queue index in per-CPU mode, so saving the sample CPU isn't required. Similarly formatted/unformatted was used upfront to create the decoders, but now it's cached until later. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Tested-by: Leo Yan <leo.yan@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-2-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-29 12:33:02 -03:00
Namhyung Kim	d56a4d56a2	perf test: Add 'perf record cgroup' filtering test $ sudo ./perf test filtering -vv 96: perf record sample filtering (by BPF) tests: --- start --- test child forked, pid 2966908 Checking BPF-filter privilege Basic bpf-filter test Basic bpf-filter test [Success] Failing bpf-filter test Failing bpf-filter test [Success] Group bpf-filter test Group bpf-filter test [Success] Multiple bpf-filter test Multiple bpf-filter test [Success] Cgroup bpf-filter test Cgroup bpf-filter test [Success] ---- end(0) ---- 96: perf record sample filtering (by BPF) tests : Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:22:27 -03:00
Namhyung Kim	91e88437d5	perf bpf-filter: Support filtering on cgroups The new cgroup filter can take either of '==' or '!=' operator and a pathname for the target cgroup. $ perf record -a --all-cgroups -e cycles --filter 'cgroup == /abc/def' -- sleep 1 Users should have --all-cgroups option in the command line to enable cgroup filtering. Technically it doesn't need to have the option as it can get the current task's cgroup info directly from BPF. But I want to follow the convention for the other sample info. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:21:49 -03:00
Namhyung Kim	591156f25f	perf bpf-filter: Add build dependency to header files The flex and bison files need to be recompiled when one of these header files is changed. * util/bpf-filter.h * util/bpf_skel/sample-filter.h Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:21:24 -03:00
Namhyung Kim	9af2efee41	perf report: Fix segfault when 'sym' sort key is not used The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: `ac01c8c424` ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:20:38 -03:00
James Clark	6f87543c74	perf test trace_btf_enum: Fix shellcheck warning Shellcheck versions < v0.7.2 can't follow this path so add the helper to fix the following warning: In tests/shell/trace_btf_enum.sh line 13: . "$(dirname $0)"/lib/probe.sh ^--------------------------^ SC1090: Can't follow non-constant source. Use a directive to specify location. Fixes: `d66763fed3` ("perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'") Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240809095426.3065163-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:18:33 -03:00
Leo Yan	d5726f1c8d	perf auxtrace: Remove unused 'pmu' pointer from struct auxtrace_record The 'pmu' pointer in the auxtrace_record structure is no longer used now that multiple AUX events are supported, so remove it. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240806204130.720977-3-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:15:16 -03:00
Leo Yan	c87826ddce	perf auxtrace: Use evsel__is_aux_event() for checking AUX event Use evsel__is_aux_event() to decide if an event is an AUX event. This is a refactoring that replaces the PMU type comparison. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240806204130.720977-2-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:14:42 -03:00
Lucas Stach	aea4d46345	perf vendor events arm64: Move Yitian 710 DDR PMU into T-Head directory The Yitian 710 is not a Freescale/NXP design and thus should be located in a separate T-Head vendor directory. Reviewed-by: Jing Zhang <renyu.zj@linux.alibaba.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: kernel@pengutronix.de Cc: linux-arm-kernel@lists.infradead.org Cc: patchwork-lst@pengutronix.de Link: https://lore.kernel.org/r/20240701175735.485655-1-l.stach@pengutronix.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:12:20 -03:00
Kajol Jain	adf50a6e66	perf vendor events: Move PM_BR_MPRED_CMPL event for power10 platform Move PM_BR_MPRED_CMPL event from cache.json to frontend.json file for power10 platform Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20240827053206.538814-3-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:10:24 -03:00
Kajol Jain	0edee81971	perf vendor events power10: Move the JSON/events Move some of the JSON/events from others.json to more appropriate JSON files for power10 platform. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20240827053206.538814-2-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:10:20 -03:00
Kajol Jain	c5d50457a8	perf vendor events power10: Update JSON/events Update JSON/events for power10 platform with additional events. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20240827053206.538814-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:10:18 -03:00
Arnaldo Carvalho de Melo	7bedcbaefd	perf trace: Pass the richer 'struct syscall_arg' pointer to trace__btf_scnprintf() Since we'll need it later in the current patch series and we can get the syscall_arg_fmt from syscall_arg->fmt. Based-on-a-patch-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zsd8vqCrTh5h69rp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Howard Chu	8df1d8c6cb	perf trace: Fix perf trace -p <PID> 'perf trace -p <PID>' works on a syscall that is unaugmented, but doesn't work on a syscall that's augmented (when it calls perf_event_output() in BPF). Let's take open() as an example. open() is augmented in perf trace. Before: $ perf trace -e open -p 3792392 ? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory) ? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory) We can see there's no output. After: $ perf trace -e open -p 3792392 0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory) 1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory) Reason: bpf_perf_event_output() will fail when you specify a pid in 'perf trace' (EOPNOTSUPP). When using 'perf trace -p 114', before perf_event_open(), we'll have PID = 114, and CPU = -1. This is bad for bpf-output event, because the ring buffer won't accept output from BPF's perf_event_output(), making it fail. I'm still trying to find out why. If we open bpf-output for every cpu, instead of setting it to -1, like this: PID = <PID>, CPU = 0 PID = <PID>, CPU = 1 PID = <PID>, CPU = 2 PID = <PID>, CPU = 3 Everything works. You can test it with this script (open.c): #include <unistd.h> #include <sys/syscall.h> int main() { int i1 = 1, i2 = 2, i3 = 3, i4 = 4; char s1[] = "DINGZHEN", s2[] = "XUEBAO"; while (1) { syscall(SYS_open, s1, i1, i2); sleep(1); } return 0; } save, compile: make open perf trace: perf trace -e open <path-to-the-executable> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Howard Chu	4451dae469	perf evlist: Introduce method to find if there is a bpf-output event We'll use it in the next patch to decide how to set up the ring buffer. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Ian Rogers	8b48f8ba16	perf report: Name events in stats for pipe mode In stats mode PERF_RECORD_EVENT_UPDATE isn't being handled meaning the evsels aren't named when handling pipe mode output. Before: $ perf record -e inst_retired.any -a -o - sleep 0.1\|perf report --stats -i - ... Aggregated stats: TOTAL events: 23358 COMM events: 2608 (11.2%) EXIT events: 1 ( 0.0%) FORK events: 2607 (11.2%) SAMPLE events: 174 ( 0.7%) MMAP2 events: 17936 (76.8%) ATTR events: 2 ( 0.0%) FINISHED_ROUND events: 2 ( 0.0%) ID_INDEX events: 1 ( 0.0%) THREAD_MAP events: 1 ( 0.0%) CPU_MAP events: 1 ( 0.0%) EVENT_UPDATE events: 3 ( 0.0%) TIME_CONV events: 1 ( 0.0%) FEATURE events: 20 ( 0.1%) FINISHED_INIT events: 1 ( 0.0%) raw 0xc0 stats: SAMPLE events: 174 After: $ perf record -e inst_retired.any -a -o - sleep 0.1\|perf report --stats -i - ... Aggregated stats: TOTAL events: 23742 COMM events: 2620 (11.0%) EXIT events: 2 ( 0.0%) FORK events: 2619 (11.0%) SAMPLE events: 165 ( 0.7%) MMAP2 events: 18304 (77.1%) ATTR events: 2 ( 0.0%) FINISHED_ROUND events: 2 ( 0.0%) ID_INDEX events: 1 ( 0.0%) THREAD_MAP events: 1 ( 0.0%) CPU_MAP events: 1 ( 0.0%) EVENT_UPDATE events: 3 ( 0.0%) TIME_CONV events: 1 ( 0.0%) FEATURE events: 20 ( 0.1%) FINISHED_INIT events: 1 ( 0.0%) inst_retired.any stats: SAMPLE events: 165 This makes the pipe output match the regular output. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240827212757.1469340-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Michael Petlan	097fe67df1	perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-13-vmolnaro@redhat.com Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	e37cb2a6be	perf testsuite report: Add test case for perf report Add a new 'perf report' test case that acts as an entry element in 'perf test list'. Runs multiple subtests from directory "base_report", which can be expanded without further editing. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-12-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	61f8715183	perf testsuite report: Add test for perf-report basic functionality Test basic execution and some options of perf-report subcommand, like show-nr-samples, header, showcpuutilization, pid and symbol filtering. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-11-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	13d58a6672	perf testsuite: Add common output checking helper As a form of validation, it is a common practice to check the outputs of commands whether they contain expected patterns or match a certain regular expression. This output checking helper is designed to allow checking stderr output of perf commands for unexpected messages, while ignoring messages that are known to be harmless, e.g.: "Lowering default frequency rate to \d+\." "\d+ out of order events recorded." etc. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-10-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	c0964af816	perf testsuite probe: Add test for line semantics The perf-probe command uses specific semantics to describe probes. Test whether some patterns, both known-valid and known-invalid, are handled appropriately. This test is run as a part of perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-9-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	83b6815dbb	perf testsuite probe: Add test for invalid options Test if various incompatible options are correctly handled (rejected). It is run as a part of perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-8-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	adc1dd00db	perf testsuite probe: Add test for basic perf-probe options Test basic behavior of perf-probe subcommand. It is run as a part of perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-7-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	def5480d63	perf testsuite probe: Add test for blacklisted kprobes handling Test perf probe interface. Blacklisted functions should be rejected when there is an attempt to set a kprobe to them. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-6-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:21 -03:00
Veronika Molnarova	32ddd082dc	perf testsuite: Fix shellcheck warnings Shellcheck is becoming a standard when building perf to prevent any unnecessary mistakes. Fix shellcheck warnings in perf testsuite. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-5-vmolnaro@redhat.com Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Veronika Molnarova	a3a02a52bc	perf testsuite: Merge settings files for shell tests Merge perf testsuite setting files into common settings to reduce duplicates and prevent errors. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-4-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Michael Petlan	5a02447c81	perf tests shell: Skip base_* dirs in test script search The test scripts in base_* directories currently have their own drivers that run them. Before this patch, the shell test-suite generator causes them to run twice. Fix that by skipping them in the generator. A cleaner solution (for the future) will be to use the directory structure idea (introduced by Carsten Haitzler in `7391db6459` ("perf test: Refactor shell tests allowing subdirs")) to generate test entries with subtests, like: $ perf test list [...] 97: perf probe shell tests 97:1: perf probe basic functionality 97:2: perf probe tests with arguments 97:3: perf probe invalid options handling [...] There are already a lot of shell test scripts and many more are about to come, so there is a need for some hierarchy. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240702110849.31904-3-vmolnaro@redhat.com Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Arnaldo Carvalho de Melo	a68080e1a2	perf test vfs_getname: Look for alternative line where to collect the pathname The getname_flags() routine changed recently and thus the place where we were getting the pathname is not probeable anymore, albeit still present, so use the next line for that, before: root@number:/home/acme/git/perf-tools-next# perf test vfs_getname 91: Add vfs_getname probe to get syscall args filenames : FAILED! 93: Use vfs_getname probe to get syscall args filenames : FAILED! 126: Check open filename arg using perf trace + vfs_getname : FAILED! root@number:/home/acme/git/perf-tools-next# Now tests 91 and 126 are passing, some more investigation is needed for test 93, that continues to fail. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Namhyung Kim	150ca9ccc4	perf test: Update sample filtering tests with multiple events Add a Multiple bpf-filter test for two or more events with filters. It uses task-clock and page-faults events with different filter expressions and checks the perf script output. $ sudo ./perf test filtering -vv 96: perf record sample filtering (by BPF) tests: --- start --- test child forked, pid 2804025 Checking BPF-filter privilege Basic bpf-filter test Basic bpf-filter test [Success] Failing bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Failing bpf-filter test [Success] Group bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE Group bpf-filter test [Success] Multiple bpf-filter test Multiple bpf-filter test [Success] ---- end(0) ---- 96: perf record sample filtering (by BPF) tests : Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240820154504.128923-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Namhyung Kim	1a5474a779	perf tools: Print lost samples due to BPF filter Print the actual dropped sample count in the event stat. $ sudo perf record -o- -e cycles --filter 'period < 10000' \ -e instructions --filter 'ip > 0x8000000000000000' perf test -w noploop \| \ perf report --stat -i- [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.058 MB - ] Aggregated stats: TOTAL events: 469 MMAP events: 268 (57.1%) COMM events: 2 ( 0.4%) EXIT events: 1 ( 0.2%) SAMPLE events: 16 ( 3.4%) MMAP2 events: 22 ( 4.7%) LOST_SAMPLES events: 2 ( 0.4%) KSYMBOL events: 89 (19.0%) BPF_EVENT events: 39 ( 8.3%) ATTR events: 2 ( 0.4%) FINISHED_ROUND events: 1 ( 0.2%) ID_INDEX events: 1 ( 0.2%) THREAD_MAP events: 1 ( 0.2%) CPU_MAP events: 1 ( 0.2%) EVENT_UPDATE events: 2 ( 0.4%) TIME_CONV events: 1 ( 0.2%) FEATURE events: 20 ( 4.3%) FINISHED_INIT events: 1 ( 0.2%) cycles stats: SAMPLE events: 2 LOST_SAMPLES (BPF) events: 4010 instructions stats: SAMPLE events: 14 LOST_SAMPLES (BPF) events: 3990 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240820154504.128923-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Namhyung Kim	0fe2b18ddc	perf bpf-filter: Support multiple events properly So far it used tgid as a key to get the filter expressions in the pinned filters map for regular users, but it won't work well if the user has more than one filter at the same time. Let's add the event id to the key of the filter hash map so that it can identify the right filter expression in the BPF program. As the event can be inherited to child tasks, it should use the primary id which belongs to the parent (original) event. Since evsel opens the event for multiple CPUs and tasks, it needs to maintain a separate hash map for the event id. In user space, it keeps a list for the multiple evsels and releases the entries in both hash maps when it closes the event. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240820154504.128923-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Kan Liang	4f3affe0ab	perf hist: Don't set hpp_fmt_value for members in --no-group Perf crashes as below when applying --no-group: # perf record -e "{cache-misses,branches}" -b sleep 1 # perf report --stdio --no-group free(): invalid next size (fast) Aborted (core dumped) # In __hpp__fmt(), only 1 hpp_fmt_value is allocated for the current event when --no-group is applied. However, the current implementation tries to assign the hists from all members to the hpp_fmt_value, which exceeds the allocated memory. Fixes: `8f6071a3dc` ("perf hist: Simplify __hpp_fmt() using hpp_fmt_data") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240820183202.3174323-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-28 18:07:20 -03:00
Andi Kleen	f133c76409	perf test: Support external tests for separate objdir Extend the searching for the test files so that it works when running perf from a separate objdir, and also when the perf executable is symlinked. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240813213651.1057362-2-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-26 11:30:52 -03:00
Arnaldo Carvalho de Melo	00dc514612	perf python: Disable -Wno-cast-function-type-mismatch if present on clang The -Wcast-function-type-mismatch option was introduced in clang 19 and it's enabled by default. Since we use -Werror, and the python bindings do casts that are valid but trip this warning, disable it if present. Closes: https://lore.kernel.org/all/CA+icZUXoJ6BS3GMhJHV3aZWyb5Cz2haFneX0C5pUMUUhG-UVKQ@mail.gmail.com Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org # To allow building with the upcoming clang 19 Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 17:26:50 -03:00
Arnaldo Carvalho de Melo	b811623020	perf python: Allow checking for the existence of warning options in clang We'll need to check if a warning option introduced in clang 19 is available on the clang version being used, so cover the error message emitted when testing for a -W option. Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 14:15:55 -03:00
Namhyung Kim	1cfd01eb60	perf annotate-data: Copy back variable types after move In some cases, compilers don't set the location expression in DWARF precisely. For instance, it may assign a variable to a register after copying it from a different register. Then it should use the register for the new type but still uses the old register. This makes it hard to track the type information properly. This is an example I found in __tcp_transmit_skb(). The first argument (sk) of this function is a pointer to sock and there's a variable (tp) for tcp_sock. static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, gfp_t gfp_mask, u32 rcv_nxt) { ... struct tcp_sock *tp; BUG_ON(!skb \|\| !tcp_skb_pcount(skb)); tp = tcp_sk(sk); prior_wstamp = tp->tcp_wstamp_ns; tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); ... So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk. But it turned out to be the same value because tcp_sock embeds sock as the first member. The sk is located in reg5 (RDI) and tp is in reg3 (RBX). The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is 0x750. So you need to use RBX (reg3) to access the fields in the tcp_sock. But the code used RDI (reg5) as it has the same value. $ pahole --hex -C tcp_sock vmlinux \| grep -e 748 -e 750 u64 tcp_wstamp_ns; /* 0x748 0x8 */ u64 tcp_clock_cache; /* 0x750 0x8 */ And this is the disassembly of the part of the function. <__tcp_transmit_skb>: ... 44: mov %rdi, %rbx 47: mov 0x748(%rdi), %rsi 4e: mov 0x750(%rdi), %rax 55: cmp %rax, %rsi Because the compiler attached the debug info to RBX, perf only knows that RDI is a pointer to sock, and accessing those two fields resulted in an error because the offset is beyond the type size. 
----------------------------------------------------------- find data type for 0x748(reg5) at __tcp_transmit_skb+0x63 CU for net/ipv4/tcp_output.c (die:0x817f543) frame base: cfa=0 fbreg=6 scope: [1/1] (die:81aac3e) bb: [0 - 30] var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df) var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg1 type='int' size=0x4 (die:0x818059e) var [5] reg4 type='struct sk_buff' size=0x8 (die:0x8181360) var [5] reg5 type='struct sock' size=0x8 (die:0x8181a0c) <<<--- the first argument ('sk' at %RDI) mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6) mov [20] stack canary -> reg0 mov [29] reg0 -> -0x30(stack) stack canary bb: [36 - 3e] mov [36] reg4 -> reg15 type='struct sk_buff' size=0x8 (die:0x8181360) bb: [44 - 63] mov [44] reg5 -> reg3 type='struct sock' size=0x8 (die:0x8181a0c) <<<--- calling tcp_sk() var [47] reg3 type='struct tcp_sock' size=0x8 (die:0x819eead) <<<--- new variable ('tp' at %RBX) var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd) mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd) chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock) : offset bigger than size <<<--- access with old variable final result: offset bigger than size While it's a fault in the compiler, we could work around this issue by using the type of new variable when it's copied directly. So I've added copied_from field in the register state to track those direct register to register copies. After that new register gets a new type and the old register still has the same type, it'll update (copy it back) the type of the old register. For example, if we can update type of reg5 at __tcp_transmit_skb+0x47, we can find the target type of the instruction at 0x63 like below: ----------------------------------------------------------- find data type for 0x748(reg5) at __tcp_transmit_skb+0x63 ... 
bb: [44 - 63] mov [44] reg5 -> reg3 type='struct sock' size=0x8 (die:0x8181a0c) var [47] reg3 type='struct tcp_sock' size=0x8 (die:0x819eead) var [47] copyback reg5 type='struct tcp_sock' size=0x8 (die:0x819eead) <<<--- here mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd) mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd) mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd) chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good! <<<--- new type found by insn track: 0x748(reg5) type-offset=0x748 final result: type='struct tcp_sock' size=0xa98 (die:0x819eeb2) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 12:38:18 -03:00
Namhyung Kim	895891dad7	perf annotate-data: Update stack slot for the store When checking the match variable at the target instruction, it might not have any information if it's the first write to a stack slot. In this case it could spill a register value into the stack so the type info is in the source operand. But currently it's hard to get the operand from the checking function. Let's process the instruction and retry getting the type info from the stack if there's no information already. This is an example of __tcp_transmit_skb(). The instructions are <__tcp_transmit_skb>: 0: nopl 0x0(%rax, %rax, 1) 5: push %rbp 6: mov %rsp, %rbp 9: push %r15 b: push %r14 d: push %r13 f: push %r12 11: push %rbx 12: sub $0x98, %rsp 19: mov %r8d, -0xa8(%rbp) ... It cannot find any variable at -0xa8(%rbp) at this point. ----------------------------------------------------------- find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19 CU for net/ipv4/tcp_output.c (die:0x817f543) frame base: cfa=0 fbreg=6 scope: [1/1] (die:81aac3e) bb: [0 - 19] var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df) var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg1 type='int' size=0x4 (die:0x818059e) var [5] reg4 type='struct sk_buff' size=0x8 (die:0x8181360) var [5] reg5 type='struct sock' size=0x8 (die:0x8181a0c) chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : no type information no type information And it was able to find the type after processing the 'mov' instruction. 
----------------------------------------------------------- find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19 CU for net/ipv4/tcp_output.c (die:0x817f543) frame base: cfa=0 fbreg=6 scope: [1/1] (die:81aac3e) bb: [0 - 19] var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df) var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6) var [5] reg1 type='int' size=0x4 (die:0x818059e) var [5] reg4 type='struct sk_buff' size=0x8 (die:0x8181360) var [5] reg5 type='struct sock' size=0x8 (die:0x8181a0c) chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : retry <<<--- here mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6) chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : Good! found by insn track: -0xa8(reg6) type-offset=0 final result: type='unsigned int' size=0x4 (die:0x8180ed6) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821232628.353177-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 12:38:02 -03:00
Namhyung Kim	a0d57c6061	perf annotate-data: Update debug messages In check_matching_type(), it'd be easier to display the typename in question if it's available. For example, check out the line that starts with 'chk'. ----------------------------------------------------------- find data type for 0x10(reg0) at cpuacct_charge+0x13 CU for kernel/sched/build_utility.c (die:0x137ee0b) frame base: cfa=1 fbreg=7 scope: [3/3] (die:13d9632) bb: [c - 13] var [c] reg5 type='struct task_struct' size=0x8 (die:0x1381230) mov [c] 0xdf8(reg5) -> reg0 type='struct css_set' size=0x8 (die:0x1385c56) chk [13] reg0 offset=0x10 ok=1 kind=1 (struct css_set*) : Good! <<<--- here found by insn track: 0x10(reg0) type-offset=0x10 final result: type='struct css_set' size=0x250 (die:0x1385b0e) Another example: ----------------------------------------------------------- find data type for 0x8(reg0) at menu_select+0x279 CU for drivers/cpuidle/governors/menu.c (die:0x7b0fe79) frame base: cfa=1 fbreg=7 scope: [2/2] (die:7b11010) bb: [273 - 277] bb: [279 - 279] chk [279] reg0 offset=0x8 ok=0 kind=0 cfa : no type information scope: [1/2] (die:7b10cbc) bb: [0 - 64] ... mov [26a] imm=0xffffffff -> reg15 bb: [273 - 277] bb: [279 - 279] chk [279] reg0 offset=0x8 ok=1 kind=1 (long long unsigned int) : no/void pointer <<<--- here final result: no/void pointer Also change some places to print negative offsets properly. 
Before: ----------------------------------------------------------- find data type for 0xffffff40(reg6) at __tcp_transmit_skb+0x58 After: ----------------------------------------------------------- find data type for -0xc0(reg6) at __tcp_transmit_skb+0x58 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821232628.353177-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 12:37:46 -03:00
Namhyung Kim	a11b4222bb	perf dwarf-aux: Handle bitfield members from pointer access __die_find_member_offset_cb() failed to handle bitfield members, which don't have DW_AT_data_member_location. As when adding member types in __add_member_cb(), it should fall back to checking the bit offset when it resolves the member type for an offset. Fixes: `437683a994` ("perf dwarf-aux: Handle type transfer for memory access") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821232628.353177-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-22 12:32:18 -03:00
Namhyung Kim	fd45d52eae	perf annotate-data: Add 'typecln' sort key Sometimes it's useful to organize member fields by cache-line boundary. The 'typecln' sort key is short for type-cacheline and shows samples in each cacheline. The cacheline size is fixed to 64 for now, but it can read the actual size once it saves the value from sysfs. For example, you may want to know which cacheline in a target is hot or cold. The following shows members in the cfs_rq's first cache line. $ perf report -s type,typecln,typeoff -H ... - 2.67% struct cfs_rq + 1.23% struct cfs_rq: cache-line 2 + 0.57% struct cfs_rq: cache-line 4 + 0.46% struct cfs_rq: cache-line 6 - 0.41% struct cfs_rq: cache-line 0 0.39% struct cfs_rq +0x14 (h_nr_running) 0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost) ... Committer testing: # root@number:~# perf report -s type,typecln,typeoff -H --stdio # Total Lost Samples: 0 # # Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P' # Event count (approx.): 312251 # # Overhead Data Type / Data Type Cacheline / Data Type Offset # .............. .................................................. # <SNIP> 0.07% struct sigaction 0.05% struct sigaction: cache-line 1 0.02% struct sigaction +0x58 (sa_mask) 0.02% struct sigaction +0x78 (sa_mask) 0.03% struct sigaction: cache-line 0 0.02% struct sigaction +0x38 (sa_mask) 0.01% struct sigaction +0x8 (sa_mask) <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819233603.54941-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:48:43 -03:00
Namhyung Kim	7a5c217024	perf annotate-data: Show offset and size in hex It'd be better to have them in hex to check cacheline alignment. Percent offset size field 100.00 0 0x1c0 struct cfs_rq { 0.00 0 0x10 struct load_weight load { 0.00 0 0x8 long unsigned int weight; 0.00 0x8 0x4 u32 inv_weight; }; 0.00 0x10 0x4 unsigned int nr_running; 14.56 0x14 0x4 unsigned int h_nr_running; 0.00 0x18 0x4 unsigned int idle_nr_running; 0.00 0x1c 0x4 unsigned int idle_h_nr_running; ... Committer notes: Justification from Namhyung when asked about why it would be "better": Cache line sizes are power of 2 so it'd be natural to use hex and check whether an offset is in the same boundary. Also 'perf annotate' shows instruction offsets in hex. > > Maybe this should be selectable? I can add an option and/or a config if you want. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:48:39 -03:00
Yang Ruibin	ce66d7c703	perf bpf: Remove redundant check that map is NULL The check that map is NULL is already done in the bpf_map__fd(map) and returns an errno, which does not run further checks. In addition, even if the check for map is run, the return is a pointer, which is not consistent with the err_number returned by bpf_map__fd(map). Signed-off-by: Yang Ruibin <11162571@vivo.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: opensource.kernel@vivo.com Link: https://lore.kernel.org/r/20240821101500.4568-1-11162571@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:39:51 -03:00
Namhyung Kim	4d6d6e0f61	perf annotate-data: Fix percpu pointer check In check_matching_type(), it checks the type state of the register in the wrong order. When it's the percpu pointer, it should check the type for the pointer, but it checked the CFA bit first and concluded there was no type in the stack slot. This resulted in no type info. ----------------------------------------------------------- find data type for 0x28(reg1) at hrtimer_reprogram+0x88 CU for kernel/time/hrtimer.c (die:0x18f219f) frame base: cfa=1 fbreg=7 ... add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46) bb: [7a - 7e] bb: [80 - 86] (here) bb: [88 - 88] vvv chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information no type information Here, the instruction at 0x72 found reg1 has a (percpu) pointer and got the correct type. But when it checks the final result, it wrongly thought it was a stack variable because it checks the cfa bit first. After changing the order of state check: ----------------------------------------------------------- find data type for 0x28(reg1) at hrtimer_reprogram+0x88 CU for kernel/time/hrtimer.c (die:0x18f219f) frame base: cfa=1 fbreg=7 ... (here) vvvvvvvvvv chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good! found by insn track: 0x28(reg1) type-offset=0x28 final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821065408.285548-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:30:38 -03:00
Namhyung Kim	4a32a97268	perf annotate-data: Prefer struct/union over base type Sometimes a compound type can have a single field and the size is the same as the base type. But it's still preferred as struct or union could carry more information than the base type. Also put a slight priority on the typedef for the same reason. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821065408.285548-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:29:56 -03:00
Namhyung Kim	922ec313f0	perf annotate-data: Fix missing constant copy I found it failed to copy the immediate constant when it moves the register value. This could result in a wrong type inference since the address for the per-cpu variable would always be 0. Fixes: `eb9190afae` ("perf annotate-data: Handle ADD instructions") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240821065408.285548-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-21 11:27:18 -03:00
Ian Rogers	e25ebda78e	perf cap: Tidy up and improve capability testing Remove dependence on libcap. libcap is only used to query whether a capability is supported, which is just 1 capget system call. If the capget system call fails, fall back on root permission checking. Previously if libcap fails then the permission is assumed not present which may be pessimistic/wrong. Add a used_root out argument to perf_cap__capable to say whether the fall back root check was used. This allows the correct error message, "root" vs "users with the CAP_PERFMON or CAP_SYS_ADMIN capability", to be selected. Tidy uses of perf_cap__capable so that tests aren't repeated if capget isn't supported. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240806220614.831914-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-20 17:53:12 -03:00
Namhyung Kim	8b1042c425	perf annotate-data: Set bitfield member offset and size properly The bitfield members might not have DW_AT_data_member_location. Let's use DW_AT_data_bit_offset to set the member offset correctly. Also use DW_AT_bit_size for the name like in a C program. Before: Annotate type: 'struct sk_buff' (1 samples) Percent Offset Size Field - 100.00 0 232 struct sk_buff { + 0.00 0 24 union ; + 0.00 24 8 union ; + 0.00 32 8 union ; 0.00 40 48 char[] cb; + 0.00 88 16 union ; 0.00 104 8 long unsigned int _nfct; 100.00 112 4 unsigned int len; 0.00 116 4 unsigned int data_len; 0.00 120 2 __u16 mac_len; 0.00 122 2 __u16 hdr_len; 0.00 124 2 __u16 queue_mapping; 0.00 126 0 __u8[] __cloned_offset; 0.00 0 1 __u8 cloned; 0.00 0 1 __u8 nohdr; 0.00 0 1 __u8 fclone; 0.00 0 1 __u8 peeked; 0.00 0 1 __u8 head_frag; 0.00 0 1 __u8 pfmemalloc; 0.00 0 1 __u8 pp_recycle; 0.00 127 1 __u8 active_extensions; + 0.00 128 60 union ; 0.00 188 4 sk_buff_data_t tail; 0.00 192 4 sk_buff_data_t end; 0.00 200 8 unsigned char* head; After: Annotate type: 'struct sk_buff' (1 samples) Percent Offset Size Field - 100.00 0 232 struct sk_buff { + 0.00 0 24 union ; + 0.00 24 8 union ; + 0.00 32 8 union ; 0.00 40 48 char[] cb + 0.00 88 16 union ; 0.00 104 8 long unsigned int _nfct; 100.00 112 4 unsigned int len; 0.00 116 4 unsigned int data_len; 0.00 120 2 __u16 mac_len; 0.00 122 2 __u16 hdr_len; 0.00 124 2 __u16 queue_mapping; 0.00 126 0 __u8[] __cloned_offset; 0.00 126 1 __u8 cloned:1; 0.00 126 1 __u8 nohdr:1; 0.00 126 1 __u8 fclone:2; 0.00 126 1 __u8 peeked:1; 0.00 126 1 __u8 head_frag:1; 0.00 126 1 __u8 pfmemalloc:1; 0.00 126 1 __u8 pp_recycle:1; 0.00 127 1 __u8 active_extensions; + 0.00 128 60 union ; 0.00 188 4 sk_buff_data_t tail; 0.00 192 4 sk_buff_data_t end; 0.00 200 8 unsigned char* head; Committer notes: Collect some data: root@number:~# perf mem record -a --ldlat 5 -- ping -s 8193 -f 192.168.86.1 Memory events are enabled on a subset of CPUs: 16-27 PING 192.168.86.1 (192.168.86.1) 
8193(8221) bytes of data. .^C --- 192.168.86.1 ping statistics --- 13881 packets transmitted, 13880 received, 0.00720409% packet loss, time 8664ms rtt min/avg/max/mdev = 0.510/0.599/7.768/0.115 ms, ipg/ewma 0.624/0.593 ms [ perf record: Woken up 8 times to write data ] [ perf record: Captured and wrote 14.877 MB perf.data (46785 samples) ] root@number:~# root@number:~# perf evlist cpu_atom/mem-loads,ldlat=5/P cpu_atom/mem-stores/P dummy:u root@number:~# perf evlist -v cpu_atom/mem-loads,ldlat=5/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC\|WEIGHT_STRUCT, read_format: ID\|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x7 cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|PERIOD\|IDENTIFIER\|DATA_SRC\|WEIGHT_STRUCT, read_format: ID\|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP\|TID\|TIME\|ADDR\|CPU\|IDENTIFIER\|DATA_SRC\|WEIGHT_STRUCT, read_format: ID\|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@number:~# Ok, now lets see what changes from before this patch to after it: root@number:~# perf annotate --data-type > /tmp/before Apply the patch, build: root@number:~# perf annotate --data-type > /tmp/after The first hunk of the diff, for a glib data structure, in userspace, look at those bitfields: root@number:~# diff -u10 /tmp/before /tmp/after \| head -20 --- /tmp/before 2024-08-20 17:29:58.306765780 -0300 +++ /tmp/after 2024-08-20 17:33:13.210582596 -0300 @@ -163,22 +163,22 @@ Annotate type: 'GHashTable' in 
/usr/lib64/libglib-2.0.so.0.8000.3 (1 samples): ============================================================================ Percent offset size field 100.00 0 96 GHashTable { 0.00 0 8 gsize size; 0.00 8 4 gint mod; 100.00 12 4 guint mask; 0.00 16 4 guint nnodes; 0.00 20 4 guint noccupied; - 0.00 0 4 guint have_big_keys; - 0.00 0 4 guint have_big_values; + 0.00 24 1 guint have_big_keys:1; + 0.00 24 1 guint have_big_values:1; 0.00 32 8 gpointer keys; 0.00 40 8 guint* hashes; 0.00 48 8 gpointer values; root@number:~# As advertised :-) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240815223823.2402285-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-20 17:11:39 -03:00
Arnaldo Carvalho de Melo	6236ebe071	perf daemon: Fix the build on more 32-bit architectures The previous attempt fixed the build on debian:experimental-x-mipsel, but when building on a larger set of containers I noticed it broke the build on some other 32-bit architectures such as: 42 7.87 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) builtin-daemon.c: In function 'cmd_session_list': builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=] fprintf(out, "%c%" PRIu64, ^~~~~ builtin-daemon.c:694:13: csv_sep, (curr - daemon->start) / 60); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from builtin-daemon.c:3:0: /usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here # define PRIu64 __PRI64_PREFIX "u" So let's cast that time_t (32-bit/64-bit) to uint64_t to make sure it builds everywhere. Fixes: `4bbe600293` ("perf daemon: Fix the build on 32-bit architectures") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZsPmldtJ0D9Cua9_@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 21:44:30 -03:00
Namhyung Kim	5cc698bad7	perf test: Add cgroup sampling test Add it to the record.sh shell test to verify that it tracks cgroup information correctly. It records with the --all-cgroups option and checks that the data has PERF_RECORD_CGROUP and that the names are not "unknown". $ sudo ./perf test -vv 95 95: perf record tests: --- start --- test child forked, pid 2871922 169c90-169cd0 g test_loop perf does have symbol 'test_loop' Basic --per-thread mode test Basic --per-thread mode test [Success] Register capture test Register capture test [Success] Basic --system-wide mode test Basic --system-wide mode test [Success] Basic target workload test Basic target workload test [Success] Branch counter test branch counter feature not supported on all core PMUs (/sys/bus/event_source/devices/cpu) [Skipped] Cgroup sampling test Cgroup sampling test [Success] ---- end(0) ---- 95: perf record tests : Ok Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240818212948.2873156-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 16:32:32 -03:00
Namhyung Kim	3432bae89e	perf record: Fix sample cgroup & namespace tracking The recent change in 'struct perf_tool' constification broke the cgroup and/or namespace tracking by resetting tool fields. It should set the values after perf_tool__init(). Fixes: `cecb1cf154` ("perf record: Use perf_tool__init()") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240818212948.2873156-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 16:32:05 -03:00
Ian Rogers	05c4cfeba0	perf inject: Combine mmap and mmap2 handling The handling of mmap and mmap2 events is near identical. Add a common helper function and call that by the two event handling functions. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:57:15 -03:00
Ian Rogers	048a7a9363	perf inject: Combine different mmap and mmap2 functions There are repipe, build ID and JIT dump variants of the mmap and mmap2 repipe functions. The organization doesn't allow JIT dump to work with build ID injection and the structure is less than clear. Combine the function and enable the different behaviors based on ifs. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:54:50 -03:00
Ian Rogers	0ed4c8c311	perf inject: Combine build_ids and build_id_all into enum It is clearer to have a single enum that determines how build ids are injected, it also allows for future extension. Set the header build ID feature whether lazy or all are generated, previously only the lazy case would set it. Allow parsing of known build IDs for either the lazy or all cases. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:53:55 -03:00
Ian Rogers	a8656614eb	perf test: Expand pipe/inject test Test recording of call-graphs and injecting --build-all. Add/expand trap handler. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:53:26 -03:00
Ian Rogers	63c89dc5e1	perf evsel: Constify evsel__id_hdr_size() argument Allows evsel__id_hdr_size() to be used when the evsel is const. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:52:42 -03:00
Ian Rogers	e4bb4caa54	perf dso: Constify dso_id The passed dso_id is copied and so is never an out argument. Remove its mutability. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:52:13 -03:00
Ian Rogers	0847c193c3	perf jit: Constify filename argument Make it clearer the argument is just being used as a string. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:51:46 -03:00
Ian Rogers	a031073626	perf map: API clean up map__init() is only used internally so make it static. Assume memory is zero initialized, which will better support adding fields to struct map in the future and was already the case for map__new2. To reduce complexity, change set_priv and set_erange_warned to not take a value to assign as they always assign true. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:49:53 -03:00
Ian Rogers	2aebebb834	perf synthetic-events: Avoid unnecessary memset Make sure the memset of a synthesized event only zeros the necessary tracing data part of the event, as a full event can be over 4kb in size. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:46:17 -03:00
Xu Yang	2518e13275	perf python: Fix the build on 32-bit arm by including missing "util/sample.h" The 32-bit arm build system will complain: tools/perf/util/python.c:75:28: error: field ‘sample’ has incomplete type 75 \| struct perf_sample sample; However, the arm64 build system doesn't complain about this. The root cause is that arm64 defines "HAVE_KVM_STAT_SUPPORT := 1" in tools/perf/arch/arm64/Makefile, but the arm arch doesn't. As a result, on the arm64 build system kvm-stat.h includes other header files, notably "util/sample.h", on behalf of util/python.c. Include "util/sample.h" directly in "util/python.c" to avoid this build issue on the arm platform. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: imx@lists.linux.dev Link: https://lore.kernel.org/r/20240819023403.201324-1-xu.yang_2@nxp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 14:44:21 -03:00
Namhyung Kim	023aceecc7	perf annotate-data: Update type stat at the end of find_data_type_die() After trying all possibilities with DWARF and instruction tracking. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:55:26 -03:00
Namhyung Kim	ba8833703b	perf annotate-data: Check variables in every scope Sometimes it matches a variable in the inner scope but it fails because the actual access can be on a different type. Let's try variables in every scope and choose the best one using is_better_type(). I have an example with update_blocked_averages(), at first it found a variable (__mptr) but it's a void pointer. So it moved on to the upper scope and found another variable (cfs_rq). $ perf --debug type-profile annotate --data-type --stdio ... ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer variable location: base=reg14, offset=0x140 type='void' size=0x8 (die:0x12dd8f9) found "cfs_rq" (die: 0x1301721) in scope=3/4 (die: 0x130171c) type_offset=0x140 variable location: reg14 type='struct cfs_rq' size=0x1c0 (die:0x12e37e5) final type: type='struct cfs_rq' size=0x1c0 (die:0x12e37e5) IIUC the scope is like below: 1: update_blocked_averages 2: __update_blocked_fair 3: for_each_leaf_cfs_rq_safe 4: list_entry -> (container_of) The container_of is implemented like: #define container_of(ptr, type, member) ({ \ void *__mptr = (void *)(ptr); \ static_assert(__same_type(*(ptr), ((type *)0)->member) \|\| \ __same_type(*(ptr), void), \ "pointer type mismatch in container_of()"); \ ((type *)(__mptr - offsetof(type, member))); }) That's why we see the __mptr variable first but it failed since it has no type information. Then for_each_leaf_cfs_rq_safe() is defined as #define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \ list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \ leaf_cfs_rq_list) Note that the access was 0x140(r14). And the cfs_rq has leaf_cfs_rq_list at the 0x140. So it converts the list_head pointer to a pointer to struct cfs_rq here. 
$ pahole --hex -C cfs_rq vmlinux \| grep 140 struct cfs_rq struct list_head leaf_cfs_rq_list; /* 0x140 0x10 */ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:50:40 -03:00
Namhyung Kim	c663451f92	perf annotate-data: Add is_better_type() helper Sometimes more than one variable is located in the same register or stack slot, and one can overwrite the existing type information of another. I found this is not helpful in some cases, so it should update the type information from the variable only if it's better. But it's hard to know which one is better, so we need heuristics. :) As it deals with memory accesses, the location should have a pointer or something similar (like an array or a reference). So if it had an integer type and a variable is a pointer, we can take the variable's type to resolve the target of the access. If it has a pointer type and a variable at the same location has a different pointer type, it'll take the one with the bigger target type. This can be useful when the target type embeds a smaller type (like a list header or an RB-tree node) at the beginning so their locations are the same. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:49:22 -03:00
Namhyung Kim	98d1f1dc72	perf annotate-data: Add is_pointer_type() helper It treats pointers and arrays in the same way. Let's add the helper and use it when it checks if it needs a pointer. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:40:57 -03:00
Namhyung Kim	69e2c78425	perf annotate-data: Change return type of find_data_type_block() So that it can return enum variable_match_type to be propagated to the find_data_type_die(). Also update the debug message to show the result of the check_matching_type(). chk [dd] reg0 offset=0 ok=1 kind=1 : Good! or chk [177] reg4 offset=0x138 ok=0 kind=0 cfa : no type information Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:37:52 -03:00
Namhyung Kim	653185d808	perf annotate-data: Add variable_state_str() So that it can show a proper debug message in the right place. The check_variable() is used in other places which don't want to print the message. $ perf --debug type-profile annotate --data-type Before: ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 no pointer or no type <<<--- removed check variable "__mptr" failed (die: 0x13022f1) variable location: base=reg14, offset=0x140 type='void' size=0x8 (die:0x12dd8f9) After: ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer <<<--- here variable location: base=reg14, offset=0x140 type='void' size=0x8 (die:0x12dd8f9) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:37:18 -03:00
Namhyung Kim	976862f8ab	perf annotate-data: Add 'enum type_match_result' And let check_variable() return the enum value so that callers can know what the problem was. This will be used by a later patch to update the statistics correctly and print the error message in the right place. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:36:41 -03:00
Namhyung Kim	3ab0b8b238	perf annotate-data: Fix off-by-one in location range check The location list will have entries with half-open addressing like [start, end) which means it doesn't include the end address. So it should skip entries at the end address and match to the next entry. An example location list looks like this (from readelf -wo): 00237876 ffffffff8110d32b (base address) 0023787f v000000000000000 v000000000000002 views at 00237868 for: ffffffff8110d32b ffffffff8110d4eb (DW_OP_reg3 (rbx)) <<<--- 1 00237885 v000000000000002 v000000000000000 views at 0023786a for: ffffffff8110d4eb ffffffff8110d50b (DW_OP_reg14 (r14)) <<<--- 2 0023788c v000000000000000 v000000000000001 views at 0023786c for: ffffffff8110d50b ffffffff8110d7c4 (DW_OP_reg3 (rbx)) 00237893 v000000000000000 v000000000000000 views at 0023786e for: ffffffff8110d806 ffffffff8110d854 (DW_OP_reg3 (rbx)) 0023789a v000000000000000 v000000000000000 views at 00237870 for: ffffffff8110d876 ffffffff8110d88e (DW_OP_reg3 (rbx)) The first entry at 0023787f has [8110d32b, 8110d4eb) (omitting the ffffffff at the beginning), and the second one has [8110d4eb, 8110d50b). Fixes: `2bc3cf575a` ("perf annotate-data: Improve debug message with location info") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:35:56 -03:00
Namhyung Kim	e8bb03ed68	perf dwarf-aux: Check allowed location expressions when collecting variables It failed to call check_allowed_ops() in __die_collect_vars_cb(), so it could incorrectly take variables with complex location expressions. For example, I found a variable with this expression: 015d8df8 ffffffff81aacfb3 (base address) 015d8e01 v000000000000004 v000000000000000 views at 015d8df2 for: ffffffff81aacfb3 ffffffff81aacfd2 (DW_OP_fbreg: -176; DW_OP_deref; DW_OP_plus_uconst: 332; DW_OP_deref_size: 4; DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64; DW_OP_minus; DW_OP_stack_value) 015d8e14 v000000000000000 v000000000000000 views at 015d8df4 for: ffffffff81aacfd2 ffffffff81aacfd7 (DW_OP_reg3 (rbx)) 015d8e19 v000000000000000 v000000000000000 views at 015d8df6 for: ffffffff81aacfd7 ffffffff81aad020 (DW_OP_fbreg: -176; DW_OP_deref; DW_OP_plus_uconst: 332; DW_OP_deref_size: 4; DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64; DW_OP_minus; DW_OP_stack_value) 015d8e2c <End of list> It looks like '((int *)(-176(%rbp) + 332) >> 1) - 64' but the current code thought it's just -176(%rbp) and processed the variable incorrectly. It should reject such a complex expression if check_allowed_ops() doesn't like it. :) Fixes: `932dcc2c39` ("perf dwarf-aux: Add die_collect_vars()") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240816235840.2754937-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-19 11:34:07 -03:00
Arnaldo Carvalho de Melo	3bce87eb74	Merge remote-tracking branch 'torvalds/master' into perf-tools-next To pick up the latest perf-tools merge for 6.11, i.e. to have the current perf tools branch that is getting into 6.11 with the perf-tools-next that is geared towards 6.12. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-16 19:43:16 -03:00
Yicong Yang	2615639352	perf stat: Display iostat headers correctly Currently we only print metric headers for the metric leader in aggregation mode. As a result, the `perf iostat` header is not shown, since iostat is aggregated globally but doesn't have metric events: root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000 Performance counter stats for 'system wide': port 0000:00 0 0 0 0 0000:80 0 0 0 0 [...] Fix this by excluding iostat from the check for printing metric headers. Then we can see the headers: root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000 Performance counter stats for 'system wide': port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB) 0000:00 0 0 0 0 0000:80 0 0 0 0 [...] Fixes: `193a9e3020` ("perf stat: Don't display metric header for non-leader uncore events") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: linuxarm@huawei.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/20240802065800.48774-1-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-16 19:35:18 -03:00
Yang Jihong	6bdf5168b6	perf sched timehist: Fix missing free of session in perf_sched__timehist() When perf_time__parse_str() fails in perf_sched__timehist(), need to free session that was previously created, fix it. Fixes: `853b740711` ("perf sched timehist: Add option to specify time window of interest") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240806023533.1316348-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-16 19:31:15 -03:00
Matt Fleming	ac01c8c424	perf hist: Update hist symbol when updating maps AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with `5c24b67aae` ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until `c087e9480c` ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. 
Fixes: `c087e9480c` ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <yunzhao@cloudflare.com> Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com> Cc: Ian Rogers <irogers@google.com> Cc: kernel-team@cloudflare.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org # v5.13+ Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-15 11:50:13 -03:00
Veronika Molnarova	27ac597c0e	perf test record.sh: Raise limit of open file descriptors Subtest for system-wide record with '--threads=cpu' option fails due to a limit of open file descriptors on systems with 128 or more CPUs as the default limit is set to 1024. The number of open file descriptors should be slightly above nmb_events*nmb_cpus + nmb_cpus(for perf.data.n) + 4*nmb_cpus(for pipes), which equals 8*nmb_cpus. Therefore, temporarily raise the limit to 16*nmb_cpus for the test. Committer notes: Instead of disabling ShellCheck warnings all the uses of 'uname -n', i.e. those: In tests/shell/record.sh line 35: default_fd_limit=$(ulimit -Sn) ^-^ SC3045 (warning): In POSIX sh, ulimit -S is undefined. We can just switch from using '/bin/sh' to '/bin/bash' for this test, as bash _has_ 'ulimit -n', so ShellCheck will not emit that warning. There are dozens of 'perf test' shell tests that do just that, '/bin/bash' is a reasonable expectation for those tests. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Radostin Stoyanov <rstoyano@redhat.com> Link: https://lore.kernel.org/linux-perf-users/20240429085721.10122-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-14 12:55:48 -03:00
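The descriptor arithmetic in 27ac597c0e can be checked with a short worked example. This is only a sketch of the formula as stated in the commit message: the function names are hypothetical, and the event count of 3 is an inference from the stated total of 8 per CPU (events + 1 + 4 = 8), not something the message says explicitly.

```python
# Worked example of the file-descriptor estimate from the commit message.
# nmb_events = 3 is inferred from the "8 per CPU" total; it is an assumption.

DEFAULT_SOFT_LIMIT = 1024   # default limit quoted in the commit message


def required_fds(nmb_cpus, nmb_events=3):
    """Events per CPU, one perf.data.n file per CPU, four pipe ends per CPU."""
    return nmb_events * nmb_cpus + nmb_cpus + 4 * nmb_cpus


def raised_limit(nmb_cpus):
    """Temporary limit the test sets, per the commit message (16 per CPU)."""
    return 16 * nmb_cpus


for cpus in (64, 128, 256):
    need = required_fds(cpus)
    fits = need < DEFAULT_SOFT_LIMIT
    print(f"{cpus} CPUs: ~{need} fds, fits default: {fits}, raised to {raised_limit(cpus)}")
```

At 128 CPUs the estimate already reaches the default limit of 1024, which matches the commit's observation that the subtest fails on systems with 128 or more CPUs.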
Kan Liang	dab5b6cb0d	perf test: Add new test cases for the branch counter feature Enhance the test case for the branch counter feature. Now, the test verifies: - The new filter can be successfully applied on the supported platforms. - The counter value can be outputted via the perf report -D - The counter value and the abbr name can be outputted via the perf script (New) Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-10-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-14 10:20:40 -03:00
Kan Liang	6f9d8d1de2	perf script: Add branch counters It's useful to print the branch counter information for each jump in the brstackinsn when it's available. Add a new field 'brcntr' to display the branch counter information. By default, the abbreviation will be used to indicate the branch counter. In the verbose mode, the real event name is shown. $ perf script -F +brstackinsn,+brcntr # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC $ perf script -F +brstackinsn,+brcntr -v tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 7 cycles [14] 0.43 
IPC Originally-by: Tinghao Zhang <tinghao.zhang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-9-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-14 10:20:40 -03:00
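The relationship between the abbreviated and verbose `br_cntr` output in 6f9d8d1de2 can be sketched as a small decoder. This reflects only the output format shown in the commit message, not perf's implementation; `decode_br_cntr` and the `LEGEND` mapping are hypothetical names, and the legend is copied from the "Branch counter abbr list" in the example output.

```python
# Sketch: expand an abbreviated br_cntr string (e.g. "AA") into per-event
# counts, following the legend printed by 'perf script' in the commit message.

LEGEND = {"A": "branch-instructions:ppp", "B": "branch-misses"}


def decode_br_cntr(abbr, legend=LEGEND):
    """Return (counts per event, saturated flag) for an abbreviated field."""
    counts = {event: 0 for event in legend.values()}
    saturated = False
    for ch in abbr:
        if ch == "-":          # '-' means no event occurred
            continue
        if ch == "+":          # '+' means occurrences may have been lost
            saturated = True
            continue
        counts[legend[ch]] += 1
    return counts, saturated


counts, saturated = decode_br_cntr("AA")
print(counts, saturated)
```

Decoding "AA" gives two occurrences of branch-instructions:ppp and none of branch-misses, which lines up with the verbose output "branch-instructions:ppp 2 branch-misses 0" for the same instruction.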
Kan Liang	e6952dcec8	perf annotate: Display the branch counter histogram Display the branch counter histogram in the annotation view. Press 'B' to display the branch counter's abbreviation list as well. Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 \|A \|- \| ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 \|A \|- \| ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 \|AA \|- \| ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 \|A \|- \| ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 \|AA \|B \| ← ret Originally-by: Tinghao Zhang <tinghao.zhang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-14 10:20:40 -03:00


17720 Commits