Confidential Computing:
* Register a platform device when running in CCA realm mode to enable
automatic loading of dependent modules.
CPU Features:
* Update a bunch of system register definitions to pick up new field
encodings from the architectural documentation.
* Add hwcaps and selftests for the new (2024) dpISA extensions.
Documentation:
* Update EL3 (firmware) requirements for booting Linux on modern arm64
designs.
* Remove stale information about the kernel virtual memory map.
Miscellaneous:
* Minor cleanups and typo fixes.
Memory management:
* Fix vmemmap_check_pmd() to look at the PMD type bits.
* LPA2 (52-bit physical addressing) cleanups and minor fixes.
* Adjust physical address space depending upon whether or not LPA2 is
enabled.
Perf and PMUs:
* Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU.
* Extend AXI filtering support for the DDR PMU on NXP IMX SoCs.
* Fix Designware PCIe PMU event numbering.
* Add generic branch events for the Apple M1 CPU PMU.
* Add support for Marvell Odyssey DDR and LLC-TAD PMUs.
* Cleanups to the Hisilicon DDRC and Uncore PMU code.
* Advertise discard mode for the SPE PMU.
* Add the perf users mailing list to our MAINTAINERS entry.
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"We've got a little less than normal thanks to the holidays in
December, but there's the usual summary below. The highlight is
probably the 52-bit physical addressing (LPA2) clean-up from Ard.
Confidential Computing:
- Register a platform device when running in CCA realm mode to enable
automatic loading of dependent modules
CPU Features:
- Update a bunch of system register definitions to pick up new field
encodings from the architectural documentation
- Add hwcaps and selftests for the new (2024) dpISA extensions
Documentation:
- Update EL3 (firmware) requirements for booting Linux on modern
arm64 designs
- Remove stale information about the kernel virtual memory map
Miscellaneous:
- Minor cleanups and typo fixes
Memory management:
- Fix vmemmap_check_pmd() to look at the PMD type bits
- LPA2 (52-bit physical addressing) cleanups and minor fixes
- Adjust physical address space depending upon whether or not LPA2 is
enabled
Perf and PMUs:
- Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
- Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
- Fix Designware PCIe PMU event numbering
- Add generic branch events for the Apple M1 CPU PMU
- Add support for Marvell Odyssey DDR and LLC-TAD PMUs
- Cleanups to the Hisilicon DDRC and Uncore PMU code
- Advertise discard mode for the SPE PMU
- Add the perf users mailing list to our MAINTAINERS entry"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
Documentation: arm64: Remove stale and redundant virtual memory diagrams
perf docs: arm_spe: Document new discard mode
perf: arm_spe: Add format option for discard mode
MAINTAINERS: Add perf list for drivers/perf/
arm64: Remove duplicate included header
drivers/perf: apple_m1: Map generic branch events
arm64: rsi: Add automatic arm-cca-guest module loading
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
arm64/hwcap: Describe 2024 dpISA extensions to userspace
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
...
In some of the powerpc platforms, event group testcase fails as below:
# perf test -v 'Event groups'
69: Event groups :
--- start ---
test child forked, pid 9765
Using CPUID 0x00820200
Using hv_24x7 for uncore pmu event
0x0 0x0, 0x0 0x0, 0x0 0x0: Fail
0x0 0x0, 0x0 0x0, 0x1 0x3: Pass
The testcase creates various combinations of hw, sw and uncore
PMU events and verifies that group creation succeeds or fails as
expected. This tests one of the limitations in perf, where it doesn't
allow creating a group of events from different hw PMUs.
The testcase starts a leader event and opens two sibling events.
The combination that fails is three hardware events in a group:
"0x0 0x0, 0x0 0x0, 0x0 0x0: Fail"
Type zero and config zero translate to PERF_TYPE_HARDWARE
and PERF_COUNT_HW_CPU_CYCLES. There is an event constraint in powerpc
that events using the same counter cannot be programmed in a group.
Here there is one alternative event for cycles, hence one leader
and only one sibling event can go in as a group.
If all three events (leader and two sibling events) are hardware
events, use instructions as one of the sibling events. Since
PERF_COUNT_HW_INSTRUCTIONS is a generic hardware event and present
in all architectures, use this as the third event.
Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250110094620.94976-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The comparison function cmpworker() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:
Symmetry: If x < y, then y > x.
Transitivity: If x < y and y < z, then x < z.
In its current implementation, cmpworker() incorrectly returns 0 when
w1->tid < w2->tid, which breaks both symmetry and transitivity. This
violation causes undefined behavior, potentially leading to issues such
as memory corruption in glibc [1].
Fix the issue by returning -1 when w1->tid < w2->tid, ensuring
compliance with the C standard and preventing undefined behavior.
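A minimal sketch of a comparator that satisfies these requirements is shown below. The reduced `struct worker` and the `sort_workers()` helper are assumptions made for illustration; only the `tid` field and the name `cmpworker()` come from the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the benchmark's per-thread state. */
struct worker {
	int tid;
};

/* Three-way comparison by tid: returns -1, 0 or 1. */
static int cmpworker(const void *p1, const void *p2)
{
	const struct worker *w1 = p1;
	const struct worker *w2 = p2;

	assert(w1 && w2);
	if (w1->tid > w2->tid)
		return 1;
	if (w1->tid < w2->tid)
		return -1;
	return 0;
}

/* Hypothetical helper: sort an array of workers by ascending tid. */
static void sort_workers(struct worker *w, size_t n)
{
	qsort(w, n, sizeof(*w), cmpworker);
}
```

Returning only 0 or 1 would make cmpworker(a, b) and cmpworker(b, a) disagree about the ordering of the same pair, which is the asymmetry described above.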
Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 121dd9ea01 ("perf bench: Add epoll parallel epoll_wait benchmark")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
`lock_type_table` contains `name` and `str` which can be confusing.
Rename them to `flags_name` and `lock_name` and add descriptions to
enhance understanding.
Tested by building perf for x86.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Cc: nick.forrington@arm.com
Link: https://lore.kernel.org/r/20250116235838.2769691-3-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
percpu-rwsem was missing from the man page. For backward compatibility,
replace `pcpu-sem` with `percpu-rwsem` before parsing the lock name.
Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y
percpu-rwsem`
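The aliasing step can be sketched as follows. `normalize_lock_name()` is a hypothetical helper written for illustration, not the function used in perf; only the two spellings come from the description above.

```c
#include <assert.h>
#include <string.h>

/*
 * Map the legacy spelling to the current one before the name is
 * looked up. Hypothetical helper, not a perf function.
 */
static const char *normalize_lock_name(const char *name)
{
	assert(name);
	if (strcmp(name, "pcpu-sem") == 0)
		return "percpu-rwsem";
	return name;
}
```

Doing the substitution before parsing means the lookup code only ever has to know one spelling.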
Fixes: 4f701063bf ("perf lock contention: Show lock type with address")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Cc: nick.forrington@arm.com
Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
`parse_lock_type` can only add the first lock flag in `lock_type_table`
given input `str`. For example, for `-Y rwlock`, it only adds `rwlock:R`
into this perf session. Another example is `-Y mutex`, for which it only
adds the mutex without the `LCB_F_SPIN` flag. The patch fixes this issue
and makes sure both `rwlock:R` and `rwlock:W` will be added with
`-Y rwlock`, and so on.
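A rough sketch of the intended behavior, collecting every table entry that matches instead of stopping at the first one. The flag values, the table contents and `collect_lock_flags()` are assumptions for illustration and do not mirror the real `lock_type_table`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the real ones are defined in perf. */
enum { F_RWLOCK_R = 1, F_RWLOCK_W = 2, F_MUTEX = 4, F_MUTEX_SPIN = 8 };

struct lock_type_entry {
	unsigned int flags;
	const char *flags_name;	/* specific name, e.g. "rwlock:R" */
	const char *lock_name;	/* generic name, e.g. "rwlock" */
};

/* Hypothetical table with two entries per generic lock name. */
static const struct lock_type_entry lock_type_table[] = {
	{ F_RWLOCK_R,   "rwlock:R",   "rwlock" },
	{ F_RWLOCK_W,   "rwlock:W",   "rwlock" },
	{ F_MUTEX,      "mutex",      "mutex"  },
	{ F_MUTEX_SPIN, "mutex:spin", "mutex"  },
};

/*
 * Store the flags of every entry whose lock_name or flags_name equals
 * str, writing at most max values into out[]. Returns how many were
 * stored.
 */
static size_t collect_lock_flags(const char *str, unsigned int *out, size_t max)
{
	size_t nr = sizeof(lock_type_table) / sizeof(lock_type_table[0]);
	size_t n = 0;

	assert(str && out);
	for (size_t i = 0; i < nr; i++) {
		const struct lock_type_entry *e = &lock_type_table[i];

		if (strcmp(e->lock_name, str) && strcmp(e->flags_name, str))
			continue;
		if (n < max)
			out[n++] = e->flags;
	}
	return n;
}
```

With this shape a generic name such as "rwlock" yields both the read and write variants, while a specific name such as "rwlock:W" yields only that one.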
Testing:
$ ./perf lock con -ab -Y mutex,rwlock -- perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 9.313 [sec]
9.313976 usecs/op
107365 ops/sec
contended total wait max wait avg wait type caller
176 1.65 ms 19.43 us 9.38 us mutex pipe_read+0x57
34 180.14 us 10.93 us 5.30 us mutex pipe_write+0x50
7 77.48 us 16.09 us 11.07 us mutex do_epoll_wait+0x24d
7 74.70 us 13.50 us 10.67 us mutex do_epoll_wait+0x24d
3 35.97 us 14.44 us 11.99 us rwlock:W ep_done_scan+0x2d
3 35.00 us 12.23 us 11.66 us rwlock:W do_epoll_wait+0x255
2 15.88 us 11.96 us 7.94 us rwlock:W do_epoll_wait+0x47c
1 15.23 us 15.23 us 15.23 us rwlock:W do_epoll_wait+0x4d0
1 14.26 us 14.26 us 14.26 us rwlock:W ep_done_scan+0x2d
2 14.00 us 7.99 us 7.00 us mutex pipe_read+0x282
1 12.29 us 12.29 us 12.29 us rwlock:R ep_poll_callback+0x35
1 12.02 us 12.02 us 12.02 us rwlock:W do_epoll_ctl+0xb65
1 10.25 us 10.25 us 10.25 us rwlock:R ep_poll_callback+0x35
1 7.86 us 7.86 us 7.86 us mutex do_epoll_ctl+0x6c1
1 5.04 us 5.04 us 5.04 us mutex do_epoll_ctl+0x3d4
[namhyung: Add a comment and rename to 'mutex:spin' for consistency]
Fixes: d783ea8f62 ("perf lock contention: Simplify parse_lock_type()")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Cc: nick.forrington@arm.com
Link: https://lore.kernel.org/r/20250116235838.2769691-1-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf lock contention returns zero exit value even if the lock contention
BPF setup failed.
# ./perf lock con -b true
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux
libbpf: failed to find valid kernel BTF
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -ESRCH
libbpf: failed to load object 'lock_contention_bpf'
libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH
Failed to load lock-contention BPF skeleton
lock contention BPF setup failed
# echo $?
0
Fix this by saving the return code for lock_contention_prepare
so that command exits with proper return code. Similarly set the
return code properly for two other functions in builtin-lock, namely
setup_output_field() and select_key().
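The pattern behind the fix can be illustrated with a small sketch. `prepare()` and both `run_*()` functions are hypothetical stand-ins, not the code in builtin-lock; they only contrast dropping a callee's return value with keeping it.

```c
/* Hypothetical stand-in for a setup step that can fail. */
static int prepare(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Buggy shape: the failure is detected but the status stays 0. */
static int run_dropping_error(int should_fail)
{
	int err = 0;

	if (prepare(should_fail) < 0)
		goto out;	/* err was never updated */
	/* ... the real work would go here ... */
out:
	return err;
}

/* Fixed shape: keep the callee's return value so the caller sees it. */
static int run_keeping_error(int should_fail)
{
	int err;

	err = prepare(should_fail);
	if (err < 0)
		goto out;
	/* ... the real work would go here ... */
out:
	return err;
}
```

In the first variant a failed setup still produces a zero status, which matches the `echo $?` output shown above.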
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250110093730.93610-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
hpp__width_fn() rounds up the width to the length of the field name;
hpp__fmt() should do it too. Otherwise, the numbers may
end up unaligned if the field name is long.
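As an illustration only, the rounding can be sketched like this. `column_width()` and `format_value()` are hypothetical helpers, not the perf functions named above; the point is that the header and the values use the same computed width.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Column width: never narrower than the header text. */
static int column_width(const char *name, int width)
{
	int len;

	assert(name);
	len = (int)strlen(name);
	return width < len ? len : width;
}

/* Right-align a value using the same width as the header. */
static int format_value(char *buf, size_t size, const char *name,
			int width, double value)
{
	return snprintf(buf, size, "%*.2f", column_width(name, width), value);
}
```

If only the header used the rounded-up width, a long field name would push the header wider than the numbers printed beneath it.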
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250108065949.235718-1-dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When a filtered column is not present in the sort order, profiles become
arbitrarily broken. Filtered and non-filtered entries are collapsed
together, and the filtered-by field ends up with a random value (either
from a filtered or non-filtered entry). If we end up with a filtered
entry/value, then the whole collapsed entry will be filtered out and will
be missing from the profile. If we end up with a non-filtered entry/value,
then the overhead value will be wrongly inflated (it includes some subset
of filtered-out samples).
This leads to very confusing profiles. The problem is hard to notice,
and if noticed, hard to understand. If the filter is for a single value,
then it can be fixed by adding the corresponding field to the sort order
(provided the user understood the problem). But if the filter is for
multiple values, it's impossible to fix because there is no concept of
binary sorting based on a filter predicate (we want to group all
non-filtered values in one bucket, and all filtered values in another).
Examples of affected commands:
perf report --tid=123
perf report --sort overhead,symbol --comm=foo,bar
Fix this by considering filtered status as the highest priority
sort/collapse predicate.
As a side effect this effectively adds a new feature of showing profile
where several lines are combined based on arbitrary filtering predicate.
For example, showing symbols from binaries foo and bar combined together,
but not from other binaries; or showing combined overhead of several
particular threads.
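The idea of treating the filtered status as the highest-priority key can be sketched as a comparator. The reduced `struct entry` and `entry_cmp()` are assumptions for illustration, and placing non-filtered entries first is an arbitrary choice made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a histogram entry. */
struct entry {
	bool filtered;
	const char *symbol;
};

/*
 * Compare the filtered status first, then the regular sort key, so a
 * filtered and a non-filtered entry never compare equal and therefore
 * are never collapsed together.
 */
static int entry_cmp(const void *p1, const void *p2)
{
	const struct entry *a = p1;
	const struct entry *b = p2;

	assert(a && b);
	if (a->filtered != b->filtered)
		return a->filtered ? 1 : -1;
	return strcmp(a->symbol, b->symbol);
}
```

Two entries with the same symbol now compare equal only when they agree on the filtered status, which keeps the two buckets separate.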
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/359dc444ce94d20e59d3a9e360c36fbeac833a04.1736927981.git.dvyukov@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a little more detail on the output expectations for each verbose
level.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To detect flakes it is useful to run tests more than once. Add a
runs-per-test flag that will run each test multiple times. Example
output:
```
$ perf test -r 3 lbr -v
122: perf record LBR tests : Ok
122: perf record LBR tests : Ok
122: perf record LBR tests : Ok
```
Update the documentation for the runs-per-test option.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The parallel option was removed in commit 94d1a913bd ("perf test:
Make parallel testing the default"). Update the sequential
documentation to reflect it isn't the default except for "exclusive"
tests.
Fixes: 94d1a913bd ("perf test: Make parallel testing the default")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Follow the workload listing in using stdout rather than
stderr. Correct the numbering of sub-tests to be 1.1 rather than 1:1.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The relationship between subtests and test cases is somewhat
confusing, so let's do away with the notion of sub-tests and switch to
just working with some number of test cases. Add a
test_suite__for_each_test_case as in many cases, except the special
one test case situation, the iteration can just be on all test
cases. Switch variable names to be more intention revealing of what
their value is.
This work was motivated by discussion with Kan where it was noted the
code is becoming overly indented:
https://lore.kernel.org/lkml/20241109160219.49976-1-irogers@google.com/
Unifying more of the sub-test/no-sub-tests avoids one level of
indentation in a number of places.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250110045736.598281-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The variables to make builds silent/verbose live inside
tools/build/Makefile.build. Move those variables to the top-level
Makefile.perf to be generally available.
Committer testing:
See the SYSCALL lines, now they are consistent with the other
operations in other lines:
SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_32.h
SYSTBL /tmp/build/perf-tools-next/arch/x86/include/generated/asm/syscalls_64.h
GEN /tmp/build/perf-tools-next/common-cmds.h
GEN /tmp/build/perf-tools-next/arch/arm64/include/generated/asm/sysreg-defs.h
PERF_VERSION = 6.13.rc2.g3d94bb6ed1d0
GEN perf-archive
MKDIR /tmp/build/perf-tools-next/jvmti/
MKDIR /tmp/build/perf-tools-next/jvmti/
MKDIR /tmp/build/perf-tools-next/jvmti/
MKDIR /tmp/build/perf-tools-next/jvmti/
GEN perf-iostat
CC /tmp/build/perf-tools-next/jvmti/libjvmti.o
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20250114-perf_make_test-v1-1-decc1c517b11@rivosinc.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Allow setting a variable from some other tool. The "wallclock" patchset
needs this to let the user opt in to having that key in the sort order
for 'perf report'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/Z4akewi7UPXpagce@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 'perf test', a return value 2 represents that the test case was
skipped. Fix this value for perftool_testsuite test cases to
differentiate between skip and pass values.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250113182605.130719-3-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Properly name the test cases of perftool_testsuite instead of the
license being taken as the name for 'perf test'.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250113182605.130719-2-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test failed back and forth due to the call chain being heavily
impacted by the libc, which varies across different architectures and
distros.
The libc contains the symbols for "gaih_inet" and "getaddrinfo" in some
cases, but not always. Moreover, these symbols can be either normal
symbols or dynamic symbols, making it difficult to decide the call chain
entries because the symbols are inconsistent.
To fix the issue, this commit identifies three call chain entries that
are always present. These entries are matched by iterating through the
lines in the "perf script" result. The recording attribute max-stack is
set to 4 for the possible maximum call chain depth.
After:
# perf test -vF pton
--- start ---
Pattern: ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)
Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0)
Pattern: .*inet_pton\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib/aarch64-linux-gnu/libc-2.31.so|inlined\)$
Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0)
Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so)
Pattern: .*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$
Matching: ping 285058 [025] 1219802.466939: probe_libc:inet_pton: (ffffa14b7cf0)
Matching: ffffa14b7cf0 __GI___inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc-2.31.so)
Matching: ffffa1488040 getaddrinfo+0xe8 (/usr/lib/aarch64-linux-gnu/libc-2.31.so)
Matching: aaaab8672da4 [unknown] (/usr/bin/ping)
---- end ----
82: probe libc's inet_pton & backtrace it with ping : Ok
Closes: https://lore.kernel.org/linux-perf-users/1728978807-81116-1-git-send-email-renyu.zj@linux.alibaba.com/
Closes: https://lore.kernel.org/linux-perf-users/Z0X3AYUWkAgfPpWj@x1/T/#m57327e135b156047e37d214a0d453af6ae1e02be
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241202111958.553403-1-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Local variables were missing initialization and command line
processing didn't provide default values.
Fixes: 64eed019f3 ("perf inject: Lazy build-id mmap2 event insertion")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241211060831.806539-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename err to out to avoid confusion because buf is still supposed to be
freed in non error cases.
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241211085525.519458-3-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The cycles event will fall back to task-clock in the hybrid test when
running virtualized. Change the test to not fail for this.
Fixes: 65d1182191 ("perf test: Add a test for default perf stat command")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241212173354.9860-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the kernel is built without debuginfo, running 'perf record' with
--off-cpu results in a segfault as below:
./perf record --off-cpu -e dummy sleep 1
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux
libbpf: failed to find valid kernel BTF
Segmentation fault (core dumped)
The backtrace pointed to:
#0 0x00000000100fb17c in btf.type_cnt ()
#1 0x00000000100fc1a8 in btf_find_by_name_kind ()
#2 0x00000000100fc38c in btf.find_by_name_kind ()
#3 0x00000000102ee3ac in off_cpu_prepare ()
#4 0x000000001002f78c in cmd_record ()
#5 0x00000000100aee78 in run_builtin ()
#6 0x00000000100af3e4 in handle_internal_command ()
#7 0x000000001001004c in main ()
Code sequence is:
static void check_sched_switch_args(void)
{
struct btf *btf = btf__load_vmlinux_btf();
const struct btf_type *t1, *t2, *t3;
u32 type_id;
type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch",
BTF_KIND_TYPEDEF);
btf__load_vmlinux_btf() fails when CONFIG_DEBUG_INFO_BTF is not enabled.
Here btf__find_by_name_kind() calls btf__type_cnt() with a NULL btf value,
which results in a segfault.
To fix this, add a check to see if btf is not NULL before invoking
btf__find_by_name_kind().
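The guard can be sketched without libbpf as follows. `struct type_db`, `load_type_db()`, `type_count()` and `checked_type_count()` are hypothetical stand-ins for the real loader and lookup; they only show the shape of the NULL check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real type comes from libbpf. */
struct type_db {
	int nr_types;
};

/* Pretend loader: returns NULL when type information is unavailable. */
static struct type_db *load_type_db(int available)
{
	static struct type_db db = { .nr_types = 3 };

	return available ? &db : NULL;
}

/* Lookup that dereferences db, so callers must not pass NULL. */
static int type_count(const struct type_db *db)
{
	assert(db);
	return db->nr_types;
}

/* Guarded caller: skip the lookup when loading failed. */
static int checked_type_count(int available)
{
	const struct type_db *db = load_type_db(available);

	if (!db)
		return -1;
	return type_count(db);
}
```

Without the `if (!db)` check, the failed load would flow straight into a function that dereferences the pointer, which is the crash shown in the backtrace.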
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://lore.kernel.org/r/20241223135813.8175-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perftool-testsuite_probe fails in test_adding_kernel as below:
Regexp not found: "probe:inode_permission_11"
-- [ FAIL ] -- perf_probe :: test_adding_kernel :: force-adding probes ::
second probe adding (with force) (output regexp parsing)
event syntax error: 'probe:inode_permission_11'
\___ unknown tracepoint
Error: File /sys/kernel/tracing//events/probe/inode_permission_11
not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to
enable this feature?.
The test does the following:
1) Adds a probe point first using:
$CMD_PERF probe --add $TEST_PROBE
2) Then tries to add the same probe again without --force and expects it
to fail. Next tries to add the same probe again with --force. In this case,
perf probe succeeds and adds the probe with a suffix number. Example:
./perf probe --add inode_permission
Added new event:
probe:inode_permission (on inode_permission)
./perf probe --add inode_permission --force
Added new event:
probe:inode_permission_1 (on inode_permission)
./perf probe --add inode_permission --force
Added new event:
probe:inode_permission_2 (on inode_permission)
Each time, a suffix is added to the existing probe name.
To get the suffix number, the test case uses:
NO_OF_PROBES=`$CMD_PERF probe -l | wc -l`
This will work if there is no other probe existing in the system. If
there are any probes other than kernel probes or inode_permission
(example: any probe), "perf probe -l" will include the count for those
probes too.
For example, in the system where this failed, some probes had already
been added by default, so the count became 10:
./perf probe -l | wc -l
10
So to be specific for "inode_permission", restrict the probe count check
to that probe point alone using:
NO_OF_PROBES=`$CMD_PERF probe -l $TEST_PROBE| wc -l`
Similarly, while removing the probe using "probe --del *" (removing all
probes), the check uses:
../common/check_all_lines_matched.pl "Removed event: probe:$TEST_PROBE"
But if there are other probes in the system, the log will contain
references to the other existing probes too. Hence change the usage of
check_all_lines_matched.pl to check_all_patterns_found.pl. This will make
sure the expected string appears in the result.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250110094324.94604-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The standalone tarballs should include the license files - both the
COPYING declaration as well as the text of GPLv2.
Signed-off-by: Michel Lind <michel@michel-slm.name>
Link: https://lore.kernel.org/r/Z0Zcx0WRqtlUYpgw@hyperscale.parallels
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The brstack test runs quite slowly in software models. Part of the reason
is "xargs -n1" is quite inefficient in replacing spaces with newlines.
While that's not noticeable on normal machines, it is on software models.
Use "tr -s ' ' '\n'" instead which can do the same transformation, but is
much faster. For comparison on an M1 Macbook Pro:
$ time seq -s ' ' 10000 | xargs -n1 > /dev/null
real 0m2.729s
user 0m2.009s
sys 0m0.914s
$ time seq -s ' ' 10000 | tr -s ' ' '\n' | grep '.' > /dev/null
real 0m0.002s
user 0m0.001s
sys 0m0.001s
The "grep '.'" is also needed to remove any remaining blank lines.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241213231312.2640687-2-robh@kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
[robh: Drop changing loop iterations on arm64. Squash blank line fix and redo commit msg]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The mips syscall generation was still based on the old method. Delete
the Makefile since it is no longer needed with the new method of
generation.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: 619ffe6694 ("perf tools mips: Use generic syscall scripts")
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250110-perf_fix_mips-v1-1-4e661c3b710a@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a test that checks that there were no AUX or AUXTRACE events
recorded when discard mode is used.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Graham Woodward <graham.woodward@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108142904.401139-6-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The buffer will never be written to so don't bother allocating it. The
tracking event is also not required.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Graham Woodward <graham.woodward@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108142904.401139-5-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These won't be used in the next commit in discard mode, so put them in
their own functions. No functional changes intended.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Graham Woodward <graham.woodward@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108142904.401139-4-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The wrong help message may mislead users. This commit fixes it.
Fixes: 328ccdace8 ("perf report: Add --no-demangle option")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiachen Zhang <me@jcix.top>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250109152220.1869581-1-me@jcix.top
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Document the flag along with PMU events to hint what it's used for and
give an example with other useful options to get minimal output.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Needed to build tools/lib/bpf/ on various arches other than x86_64,
notably arm64 when using the perf tarballs generated by:
$ make help | grep perf-
perf-tar-src-pkg - Build the perf source tarball with no compression
perf-targz-src-pkg - Build the perf source tarball with gzip compression
perf-tarbz2-src-pkg - Build the perf source tarball with bz2 compression
perf-tarxz-src-pkg - Build the perf source tarball with xz compression
perf-tarzst-src-pkg - Build the perf source tarball with zst compression
$
Building with BPF support was opt-in in perf for a long time, and
testing it via the tarball main kernel Makefile targets in an
architecture other than x86_64 was an odd case.
I had noticed this at some point earlier this year while cross building
perf to some arches, including arm64, but it fell through the cracks,
see the Link tag below.
Fix it now by adding those arch/*/include/uapi/asm/bpf_perf_event.h
files to the MANIFEST file used in building the perf source tarball.
Tested with:
perfbuilder@number:~$ time dm debian:experimental-x-arm64
1 21.60 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 14.1.0-5) 14.1.0 flex 2.6.4
BUILD_TARBALL_HEAD=d31a974f6edc576f84c35be9526fec549a3b3520
$
$ git log --oneline -1 d31a974f6edc576f84c35be9526fec549a3b3520
d31a974f6edc576f (HEAD -> perf-tools-next) perf MANIFEST: Add arch/*/include/uapi/asm/bpf_perf_event.h to the perf tarball
$
That was previously failing:
perfbuilder@number:~$ grep debian:experimental-x-arm64 dm.log.old/summary
19 4.80 debian:experimental-x-arm64 : FAIL gcc version 14.1.0 (Debian 14.1.0-5)
$
perfbuilder@number:~$ grep -B6 'Error 1' dm.log.old/debian:experimental-x-arm64
In file included from /git/perf-6.12.0-rc6/tools/include/uapi/linux/bpf_perf_event.h:11,
from libbpf.c:36:
/git/perf-6.12.0-rc6/tools/include/uapi/asm/bpf_perf_event.h:2:10: fatal error: ../../arch/arm64/include/uapi/asm/bpf_perf_event.h: No such file or directory
2 | #include "../../arch/arm64/include/uapi/asm/bpf_perf_event.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [/git/perf-6.12.0-rc6/tools/build/Makefile.build:105: /tmp/build/perf/libbpf/staticobjs/libbpf.o] Error 1
perfbuilder@number:~$
Closes: https://lore.kernel.org/all/Z0UNRCRYKunbDYxP@hyperscale.parallels
Fixes: 9eea8fafe3 ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Reported-by: Michel Lind <michel@michel-slm.name>
Tested-by: Michel Lind <michel@michel-slm.name>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: 317c11923cf676437456e44a7f408d4ce589a9c0.camel@michel-slm.name
Link: https://lore.kernel.org/bpf/ZfyEgoG3JFiOs2Fs@x1/
Link: https://lore.kernel.org/r/Z0Yy5u42Q1hWoEzz@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add PMU events for FUJITSU-MONAKA.
Also update common-and-microarch.json and recommended.json.
FUJITSU-MONAKA Specification URL:
https://github.com/fujitsu/FUJITSU-MONAKA
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Akio Kakuno <fj3333bs@aa.jp.fujitsu.com>
Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241217065751.1448755-1-fj5100bi@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In machine__create_module(), it reads /proc/modules to get a list of
modules in the system. The file shows the start address (of text) and
the size of the module so it uses the info to reconstruct system memory
maps for symbol resolution.
But module memory consists of multiple segments and they can be
scattered. Currently perf tools assume they are contiguous, which
produces overlapping maps. This can confuse the tool when it looks up
the map containing a given address.
As we mostly care about the function symbols in the text segment, it can
fixup the size or end address of modules when there's an overlap. We
can use maps__fixup_end() which updates the end address using the start
address of the next map.
Ideally it should be able to track other segments (like data/rodata),
but that would require some changes in /proc/modules IMHO.
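For illustration only, a minimal Python sketch of the clamping idea
described above. It is not the perf implementation: ModuleMap,
fixup_ends() and find_map() are hypothetical names, and the addresses
and sizes are made up. It assumes each map starts out as
[start, start + size), as reported by /proc/modules.

```python
from dataclasses import dataclass


@dataclass
class ModuleMap:
    name: str
    start: int
    end: int  # exclusive; initially start + size from /proc/modules


def fixup_ends(maps):
    """Clamp each map's end to the start of the next map when they overlap."""
    ordered = sorted(maps, key=lambda m: m.start)
    for cur, nxt in zip(ordered, ordered[1:]):
        if cur.end > nxt.start:
            cur.end = nxt.start
    return ordered


def find_map(maps, addr):
    """Return the first map whose [start, end) range contains addr."""
    for m in maps:
        if m.start <= addr < m.end:
            return m
    return None


# Hypothetical modules: mod_a's reported size spills over mod_b's text.
maps = fixup_ends([
    ModuleMap("mod_a", 0x1000, 0x1000 + 0x3000),
    ModuleMap("mod_b", 0x2000, 0x2000 + 0x1000),
])
```

Without the fixup, a lookup of 0x2800 would hit mod_a first. After
the fixup it resolves to mod_b, because mod_a now ends at 0x2000.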
Reported-by: Blake Jones <blakejones@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20241218220453.203069-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there is more than one symbol at the same address, perf needs to
choose which one is better. choose_best_symbol() didn't check the type
of the symbols. It's possible to have labels inside other symbols, and
in that case it is better to pick the actual symbol over the labels.
To minimize the possible impact on other symbols, I only check NOTYPE
symbols specifically.
$ readelf -sW vmlinux | grep -e __do_softirq -e __softirqentry_text_start
105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq
111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start
Commit 77b004f4c5 tried to do the same by not giving a size to the
label symbols, but it seems there are some label-only symbols in asm
code. Let's restore the original code and choose the right symbol using
the type of the symbols.
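As a rough illustration of the NOTYPE check only, here is a small
Python sketch. Sym and prefer() are hypothetical names, and the real
choose_best_symbol() applies further heuristics that are not modelled
here. The two symbols are taken from the readelf output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    name: str
    addr: int
    size: int
    type: str  # ELF symbol type as printed by readelf: FUNC, NOTYPE, ...


def prefer(a, b):
    """Pick between two symbols at the same address.

    Only the NOTYPE rule is sketched: a typed symbol wins over a label.
    Otherwise the first argument is kept (an assumption of this sketch).
    """
    if a.type == "NOTYPE" and b.type != "NOTYPE":
        return b
    return a


func = Sym("__do_softirq", 0xFFFFFFFF82000000, 814, "FUNC")
label = Sym("__softirqentry_text_start", 0xFFFFFFFF82000000, 0, "NOTYPE")
```

With this rule the FUNC symbol is picked regardless of the order in
which the two symbols are compared.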
Fixes: 77b004f4c5 ("perf symbol: Do not fixup end address of labels")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lore.kernel.org/lkml/Z3b-DqBMnNb4ucEm@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
cxx_demangle_sym is weak in case demangle-cxx.c replaces the
definition in symbol-elf.c. When demangle-cxx.c is built
HAVE_CXA_DEMANGLE_SUPPORT is defined, as such the define can be used
to avoid a weak symbol.
As weak symbols are outside of the C standard their use can lead to
strange behaviors, in particular with LTO, as well as causing issues to
be hidden at link time.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119031754.1021858-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some compiler versions reported unaligned accesses in perf trace when
the undefined-behavior sanitizer is on. I found that it uses the raw
data in the sample directly, assuming it's properly aligned.
Unlike other sample fields, the raw data is not 8-byte aligned because
there's a size field (u32) before the actual data. So I added a static
buffer in syscall__augmented_args() and return it instead. This is not
ideal but should work well as perf trace is single-threaded.
A better approach would be aligning the raw data by adding 4 bytes of
padding before the augmented args, but I'm afraid it'd break backward
compatibility.
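The offset arithmetic can be sketched in Python as below. This is an
illustration under assumptions, not the perf code: the record layout is
hypothetical (16 bytes of other fields, then a u32 size, then the
payload), and raw_data_offset() and copy_aligned() are made-up helpers.
Python itself has no alignment requirement, so the sketch only shows
where the payload lands and the copy-out workaround. It copies into a
fresh buffer, whereas the patch uses a static one.

```python
import struct

ALIGN = 8  # alignment the reader of the payload expects


def raw_data_offset(size_field_offset):
    """The raw payload starts right after the u32 size field."""
    return size_field_offset + struct.calcsize("<I")


def copy_aligned(buf, offset, length):
    """Copy the payload out so that readers can start at offset 0."""
    return bytes(buf[offset:offset + length])


# Hypothetical record: the size field sits at an 8-byte aligned offset.
payload = struct.pack("<qq", 1, 100000001)  # a tv_sec/tv_nsec style pair
record = bytes(16) + struct.pack("<I", len(payload)) + payload

size_off = 16
data_off = raw_data_offset(size_off)  # 4 bytes past an aligned offset
(size,) = struct.unpack_from("<I", record, size_off)
aligned = copy_aligned(record, data_off, size)
tv_sec, tv_nsec = struct.unpack("<qq", aligned)
```

The payload offset is 4 modulo 8, which matches the misaligned address
ending in ...a4 in the sanitizer report in the committer testing.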
Committer testing:
To build with the undefined behaviour sanitizer:
$ make CC=clang EXTRA_CFLAGS=-fsanitize=undefined -C tools/perf
Checking if the resulting binary is instrumented:
root@number:~# nm ~/bin/perf | grep ubsan | wc -l
113
root@number:~# nm ~/bin/perf | grep ubsan | tail -5
000000000043d5b0 t _ZN7__ubsanL19UBsanOnDeadlySignalEiPvS0_
000000000043ce50 T _ZNK7__ubsan5Value12getSIntValueEv
000000000043cf40 T _ZNK7__ubsan5Value12getUIntValueEv
000000000043d140 T _ZNK7__ubsan5Value13getFloatValueEv
000000000043cfd0 T _ZNK7__ubsan5Value19getPositiveIntValueEv
root@number:~#
Now running something that will access timespec, as reported in the
Closes URL:
root@number:~# perf trace --max-events=1 -e *nano* sleep 1.1
trace/beauty/timespec.c:10:64: runtime error: member access within misaligned address 0x7fc583cfb2a4 for type 'struct augmented_arg', which requires 8 byte alignment
0x7fc583cfb2a4: note: pointer points here
99 99 11 00 10 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 e1 f5 05 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior trace/beauty/timespec.c:10:64
<SNIP>
As Namhyung said, we need to make the raw_data 64-bit aligned. We
probably need to add a PERF_SAMPLE_ALIGNED_RAW with a 64-bit raw_size
instead of the current u32 written in kernel/events/core.c,
perf_output_sample(), via perf_output_put(handle, raw->size), where
raw->size is a u32 and so the raw_data is always 64-bit unaligned...
After the patch:
root@number:~# perf trace -e *nano* sleep 1.1
0.000 (1100.064 ms): sleep/1984224 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 100000001 }, rmtp: 0x7fff5b3fe970) = 0
root@number:~#
Closes: https://lore.kernel.org/r/Z2STgyD1p456Qqhg@google.com
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250102201248.790841-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Probes are global and other probe tests are already exclusive. These
two tests can throw warnings when run at the same time so mark them as
exclusive too:
$ perf test -vvv 81 79
79: perftool-testsuite_probe:
--- start ---
test child forked, pid 46419
../common/init.sh: line 137: /sys/kernel/debug/tracing/uprobe_events: Device or resource busy
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20250107165933.292225-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is
no longer needed. With the removal of the flag, the related
GENERIC_SYSCALL_TABLE can also be removed.
libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT
was not defined, so libaudit is also no longer needed for any
architecture.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the generic scripts to generate headers from the syscall table
instead of the custom ones for s390.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-15-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the generic scripts to generate headers from the syscall table
instead of the custom ones for powerpc.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-14-7543b5293098@rivosinc.com
Link: https://lore.kernel.org/lkml/20250110100505.78d81450@canb.auug.org.au
[ Stephen Rothwell noticed on linux-next that the powerpc build for perf was broken and ...]
Link: https://lore.kernel.org/lkml/20250109-perf_powerpc_spu-v1-1-c097fc43737e@rivosinc.com
[ ... Charlie fixed it up and asked for it to be squashed to avoid breaking bisection. ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the generic scripts to generate headers from the syscall table for
mips.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-13-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
loongarch uses a syscall table, use that in perf instead of using unistd.h.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-12-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
arm64 uses a syscall table, use that in perf instead of using unistd.h.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-11-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parisc uses a syscall table, use that in perf instead of requiring
libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-10-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
alpha uses a syscall table, use that in perf instead of requiring
libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-9-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the generic scripts to generate headers from the syscall table for
both 32- and 64-bit x86.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-8-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
xtensa uses a syscall table, use that in perf instead of requiring
libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-7-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
sparc uses a syscall table, use that in perf instead of requiring
libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-6-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
sh uses a syscall table, use that in perf instead of requiring libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-5-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
arm uses a syscall table, use that in perf instead of requiring
libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-4-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
csky uses the generic syscall table, use that in perf instead of
requiring libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-3-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arc uses the generic syscall table, use that in perf instead of
requiring libaudit.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-2-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently each architecture in perf independently generates syscall
headers.
Adapt the work that has gone into unifying syscall header
implementations in the kernel to work with perf tools.
Introduce this framework with riscv at first. riscv previously relied on
libaudit, but with this change, perf tools for riscv no longer needs
this external dependency.
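As a rough illustration of what a shared table gives perf, the sketch
below maps a handful of syscall numbers to names and back. The numbers
are the asm-generic ones (56 openat, 57 close, 63 read, 64 write, 93
exit); the array and function names are hypothetical and are not the
ones generated for tools/perf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a generated syscall table, indexed by number. */
static const char *const generic_syscalls[] = {
	[56] = "openat",
	[57] = "close",
	[63] = "read",
	[64] = "write",
	[93] = "exit",
};

#define GENERIC_SYSCALLS_MAX \
	((int)(sizeof(generic_syscalls) / sizeof(generic_syscalls[0])))

/* Return the syscall name for id, or NULL when the table has no entry. */
static const char *generic_syscall_name(int id)
{
	if (id < 0 || id >= GENERIC_SYSCALLS_MAX)
		return NULL;
	return generic_syscalls[id];
}

/* Return the syscall number for name, or -1 when it is not in the table. */
static int generic_syscall_id(const char *name)
{
	for (int i = 0; i < GENERIC_SYSCALLS_MAX; i++) {
		if (generic_syscalls[i] && strcmp(generic_syscalls[i], name) == 0)
			return i;
	}
	return -1;
}
```

With a table like this built into the tool, name/number translation
needs no external library at run time.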
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-1-7543b5293098@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously cpu maps in the test weren't modified by calls to the cpu map
API, however, perf_cpu_map__merge was modified so the left hand argument
was updated.
In the test this meant the maps copy of the "two" map was put/deleted in
the merge, so when accessed via maps the pointer was stale and pointed
to the put/deleted memory.
To fix this add an extra layer of indirection to the maps array, so the
updated value of two is accessed.
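The effect of the extra indirection can be sketched with a toy
refcounted type. Everything below (struct toy_map, toy_map__merge(), the
way the toy merge just adds counts) is made up for illustration and is
not the libperf API; it only assumes a merge that puts the old left hand
object and stores a new one through the pointer it is given.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted cpu map; not the libperf structure. */
struct toy_map {
	int refcnt;
	int nr;
};

static struct toy_map *toy_map__new(int nr)
{
	struct toy_map *map = malloc(sizeof(*map));

	if (map) {
		map->refcnt = 1;
		map->nr = nr;
	}
	return map;
}

static void toy_map__put(struct toy_map *map)
{
	if (map && --map->refcnt == 0)
		free(map);
}

/*
 * Merge that updates its left hand argument: the old object is put
 * (here: freed) and *lhs is pointed at a newly allocated result.
 */
static int toy_map__merge(struct toy_map **lhs, const struct toy_map *rhs)
{
	struct toy_map *merged = toy_map__new((*lhs)->nr + rhs->nr);

	if (!merged)
		return -1;
	toy_map__put(*lhs);
	*lhs = merged;
	return 0;
}

/*
 * Merge "one" into "two", then read "two" back through an array that
 * holds the address of the variable rather than a copy of the pointer,
 * so the array sees the updated value. Returns the merged count or -1.
 */
static int merged_nr_via_indirection(void)
{
	struct toy_map *one = toy_map__new(1);
	struct toy_map *two = toy_map__new(2);
	struct toy_map **maps[] = { &one, &two };
	int nr = -1;

	if (one && two && toy_map__merge(&two, one) == 0)
		nr = (*maps[1])->nr;

	toy_map__put(one);
	toy_map__put(two);
	return nr;
}
```

Had maps been declared as an array of plain 'struct toy_map *' copies,
maps[1] would still point at the object freed inside the merge.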
Fixes: a9d2217556 ("libperf cpumap: Refactor perf_cpu_map__merge()")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108051511.1720369-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove unused symbol_conf.h include.
First, it's just unused. Second, it's problematic since this is a C++
file, and most perf headers don't compile as C++. So if any other
includes are added to symbol_conf.h, it may break the build.
Signed-off-by: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250108070248.237943-1-dvyukov@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Shellcheck versions < v0.7.2 can't follow this path so add the helper to
fix the following warning:
tests/shell/trace_btf_general.sh line 8:
. "$(dirname $0)"/lib/probe.sh
^--------------------------^ SC1090: Can't follow non-constant source.
Use a directive to specify location.
Fixes: 0255338d69 ("perf trace: Add tests for BTF general augmentation")
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250106164300.734202-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When adding support for refcount checking a cut'n'paste made this
function, which is just an accessor to a bool member of 'struct nsinfo',
return a pid_t. Fix it to return a bool.
Fixes: bcaf0a9785 ("perf namespaces: Add functions to access nsinfo")
Reported-by: Francesco Nigro <fnigro@redhat.com>
Reported-by: Ilan Green <igreen@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20241206204828.507527-6-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When running 'perf record' outside a container and the java agent inside
a container the jit_repipe_code_load() and friends will emit
PERF_RECORD_MMAP2 entries for the jitdump records and will check if we
need to fixup the pid/tid:
nspid = jr->load.pid;
pid = jr_entry_pid(jd, jr);
tid = jr_entry_tid(jd, jr);
The jr_entry_pid() function looks if we're in the same pidns:
static pid_t jr_entry_pid(struct jit_buf_desc *jd, union jr_entry *jr)
{
if (jd->nsi && nsinfo__in_pidns(jd->nsi))
return nsinfo__tgid(jd->nsi);
return jr->load.pid;
}
But the thread, populated from perf.data records, tries to figure out
whether it is in the same pidns by actually opening a procfs file on the
system where 'perf inject' is running (a bug that remains to be fixed).
If that is not possible it assumes the thread has terminated and its
namespace info can't be obtained, so it tolerates nsinfo__init()
failing, noting only that the namespace can't be entered, so don't even
try.
But we can kinda get at least that info (thread->nsinfo->in_pidns) from
the data in the perf.data file, namely the pid and tid in the
PERF_RECORD_MMAP2 for the jit-<PID>.dump file generated from the java
agent, if the PERF_RECORD_MMAP2->pid is the same as what is in the
jitdump file, then we're in the same namespace, otherwise we need to use
the PERF_RECORD_MMAP2->pid.
This all has to be revamped for this jitdump + running perf from
outside, as the meaning of in_pidns is being abused, the initialization
of nsinfo->pid with the value coming from the PERF_RECORD_MMAP2 data is
wrong as it is the pid _outside_ the container since perf was running
there.
The hack in this patch at least produces the expected result in this
scenario by following the assumptions in the current codebase for
finding maps and for generating the PERF_RECORD_MMAP2 for the ELF files
synthesized from the jitdump records in jit_repipe_code_load(), etc.
Reported-by: Francesco Nigro <fnigro@redhat.com>
Reported-by: Ilan Green <igreen@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20241206204828.507527-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we're processing a perf.data file we will, for every thread in that
file do a machine__findnew_thread(machine, pid, tid) that when that pid
is seen for the first time will create a 'struct thread' representing
it.
That in turn will call nsinfo__new() -> nsinfo__init() and there it will
assume we're running live, which is wrong and will need to be addressed
in a followup patch.
The nsinfo__new() assumes that if we can't access that thread it has
already finished and will ignore the -1 return from nsinfo__init(), just
taking notes to avoid trying to enter in that namespace, since it isn't
there anymore, a race.
When doing this from 'perf inject', tho, we can fill in parts of that
nsinfo from what we get from the PERF_RECORD_MMAP2 (pid, tid) and in the
jitdump file name, that has the form of jit-<PID>.dump.
So if the pid in the jitdump file name is not the one in the
PERF_RECORD_MMAP2, we can assume that it's the pid of the process
_inside_ the namespace, and that perf was running outside that namespace.
This will be done in the following patch.
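A minimal sketch of that comparison, assuming only the jit-<PID>.dump
naming described above. The helper names are hypothetical and this is
not the code from the following patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Extract <PID> from a path whose last component is "jit-<PID>.dump".
 * Returns the pid, or -1 if the name does not have that form.
 */
static pid_t jitdump_name_pid(const char *path)
{
	const char *base = strrchr(path, '/');
	int pid, consumed = 0;

	base = base ? base + 1 : path;
	if (sscanf(base, "jit-%d.dump%n", &pid, &consumed) != 1)
		return -1;
	/* %n is only stored if ".dump" matched; also reject trailing text. */
	if (consumed == 0 || base[consumed] != '\0' || pid < 0)
		return -1;
	return pid;
}

/*
 * Hypothetical helper: a pid mismatch between the file name and the
 * PERF_RECORD_MMAP2 suggests the agent ran in another pid namespace.
 */
static bool jitdump_written_in_other_pidns(const char *path, pid_t mmap2_pid)
{
	pid_t name_pid = jitdump_name_pid(path);

	return name_pid >= 0 && name_pid != mmap2_pid;
}
```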
Reported-by: Francesco Nigro <fnigro@redhat.com>
Reported-by: Ilan Green <igreen@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20241206204828.507527-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the java agent is running inside a container it will emit mmaps
with the format:
⬢ [acme@toolbox a]$ perf report -D | grep PERF_RECORD_MMAP | grep \.dump
0 0x15c400 [0x90]: PERF_RECORD_MMAP2 3308868/3308868: [0x7fb8de6cb000(0x1000) @ 0 08:14 3222905945 0]: r-xp /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jit-1.dump
⬢ [acme@toolbox a]$
Since perf is running from outside the container it sees the pid 3308868
in PERF_RECORD_MMAP2, while the agent saw the pid of the profiled app
inside the container, 1.
The previous validation was:
if (pid && pid2 != nsinfo__nstgid(nsi))
pid2 at this point is '1' (/jit-1.dump), so it considers this as a
malformed jitdump mmap and refuses to process it.
The test ends up as:
if (3308868 && 1 != 3308868)
which is true and the jitdump is not processed.
Since 1 in the container namespace is really 3308868 in the namespace
that perf is running in, consider this a valid mmap.
We need to make perf realize this and behave accordingly, for now
checking instead:
if (pid && pid2 && pid != nsinfo__nstgid(nsi))
Translating to:
if (3308868 && 1 && 3308868 != 3308868)
This makes the jitdump mmap be considered valid and processed.
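The two conditions can be put side by side as standalone predicates.
This is only a sketch: nstgid is a plain parameter standing in for the
value of nsinfo__nstgid(nsi), and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * pid:    pid from the PERF_RECORD_MMAP2 event
 * pid2:   pid parsed from the jit-<PID>.dump file name
 * nstgid: stand-in for the value returned by nsinfo__nstgid(nsi)
 * Both functions return true when the mmap is rejected as malformed.
 */
static bool old_check_rejects(pid_t pid, pid_t pid2, pid_t nstgid)
{
	return pid && pid2 != nstgid;
}

static bool new_check_rejects(pid_t pid, pid_t pid2, pid_t nstgid)
{
	return pid && pid2 && pid != nstgid;
}
```

Fed the values from the example above (pid 3308868, pid2 1, nstgid
3308868), the old predicate rejects the mmap and the new one accepts it.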
The jitdump is described in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jitdump-specification.txt
Now we end up with the expected flurry of MMAPs, one per jitted function
transformed into a little ELF file that should then be processable by
the other perf features, like code annotation:
[acme@toolbox a]$ echo $JITDUMPDIR
/tmp/.debug/jit
[acme@toolbox a]$
First use 'perf inject':
⬢ [acme@toolbox a]$ time perf inject -i perf.data -o acme-perf-injected.data -j
Then look at the PERF_RECORD_MMAP events in the result file, that went
thru the JIT map file:
⬢ [acme@toolbox a]$ ls -la /tmp/*.map
-rw-r--r--. 1 acme acme 2989559 Nov 27 16:11 /tmp/perf-3308868.map
[acme@toolbox a]$
It is a symbol table:
⬢ [acme@toolbox a]$ head /tmp/*.map
0x00007fb8bda5c1a0 0x00000000000000d0 java.lang.String java.lang.module.ModuleDescriptor.name()
0x00007fb8bda5c4a0 0x0000000000000178 int java.lang.StringLatin1.hashCode(byte[])
0x00007fb8bda5c9a0 0x00000000000000d0 java.lang.String org.springframework.boot.context.config.ConfigDataLocation.getValue()
0x00007fb8bda5cca0 0x00000000000000d0 java.lang.module.ModuleDescriptor java.lang.module.ModuleReference.descriptor()
0x00007fb8bda5cfa0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getKey()
0x00007fb8bda5d2a0 0x00000000000000d0 java.lang.Object java.util.KeyValueHolder.getValue()
0x00007fb8bda5d5a0 0x0000000000000218 boolean jdk.internal.misc.Unsafe.compareAndSetReference(java.lang.Object, long, java.lang.Object, java.lang.Object)
0x00007fb8bda5d9a0 0x00000000000001f0 boolean jdk.internal.misc.Unsafe.compareAndSetLong(java.lang.Object, long, long, long)
0x00007fb8bda5dda0 0x00000000000001f8 void java.lang.System.arraycopy(java.lang.Object, int, java.lang.Object, int, int)
0x00007fb8bda5e1a0 0x00000000000001e8 int java.lang.Object.hashCode()
⬢ [acme@toolbox a]$
As specified in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt
This was collected from inside the container, so came as
/tmp/perf-1.map.
To make perf, running outside the container, use it we need to copy it
to /tmp/perf-3308868.map.
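The name perf looks for follows the /tmp/perf-<pid>.map convention from
jit-interface.txt, so the destination of that copy can be computed from
the pid seen outside the container. A small sketch with a hypothetical
helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Format the perf map file name for pid into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int perf_map_path(char *buf, size_t size, pid_t pid)
{
	int n = snprintf(buf, size, "/tmp/perf-%d.map", (int)pid);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```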
This is more logic that has to be added to perf to work in this
scenario of running outside the container but processing things created
by the java agent running inside the container.
With all this in place we get to:
⬢ [acme@toolbox a]$ perf report -D -i acme-perf-injected.data | \
grep PERF_RECORD_MMAP > /tmp/acme-perf-injected.data.mmaps ; \
wc -l /tmp/acme-perf-injected.data.mmaps
44182 /tmp/acme-perf-injected.data.mmaps
⬢ [acme@toolbox a]$ tail /tmp/acme-perf-injected.data.mmaps
1030266786574466 0x7bc9e0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0ceb1c0(0x8d0) @ 0x80 00:2c 238715 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so
1030266795288774 0x7bca78 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cecc00(0x7e8) @ 0x80 00:2c 238716 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so
1030266895967339 0x7bcb10 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cee500(0x3328) @ 0x80 00:2c 238717 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43991.so
1030266915748306 0x7bcba8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0aae0a0(0x138) @ 0x80 00:2c 238718 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43992.so
1030267185851220 0x7bcc40 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cf61e0(0x3b50) @ 0x80 00:2c 238719 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43993.so
1030267231364524 0x7bccd8 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0cfea80(0x14a0) @ 0x80 00:2c 238720 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43994.so
1030267425498831 0x7bcd70 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c054b4a0(0x338) @ 0x80 00:2c 238721 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43995.so
1030267506147888 0x7bce08 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0a995c0(0x1e8) @ 0x80 00:2c 238722 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43996.so
1030268112586116 0x7bcea0 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02520(0x258) @ 0x80 00:2c 238723 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43997.so
1030269435398150 0x7bcf38 [0x98]: PERF_RECORD_MMAP2 1/78: [0x7fb8c0d02dc0(0x278) @ 0x80 00:2c 238724 1]: --xs /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43998.so
⬢ [acme@toolbox a]$
And if we look at those tiny ELF files generated by the jitdump code
used by 'perf inject' we see:
⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so
/tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=790591db95a77d644657dfe5058658b200000000, with debug_info, not stripped
⬢ [acme@toolbox a]$ file /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so
/tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=762f932acbee53a22638bf4c2b86780200000000, with debug_info, not stripped
⬢ [acme@toolbox a]$
⬢ [acme@toolbox a]$ ls -la /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so
-rw-r--r--. 1 acme acme 9432 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43989.so
-rw-r--r--. 1 acme acme 7504 Nov 29 10:56 /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so
⬢ [acme@toolbox a]$
And:
⬢ [acme@toolbox a]$ objdump -dS /tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so | head -20
/tmp/.debug/jit/java-jit-20241126.XXTxEIOn/jitted-1-43990.so: file format elf64-x86-64
Disassembly of section .text:
0000000000000080 <Lredacted/REDACTED/REDACTED/logging/RedactedRedacted;Redacted(Lredacted/REDACTED/REDACTED/redactedRedacted/Redacted;)V>:
80: 44 8b 56 08 mov 0x8(%rsi),%r10d
84: 49 c1 e2 03 shl $0x3,%r10
88: 49 3b c2 cmp %r10,%rax
8b: 0f 85 6f 15 83 fc jne fffffffffc831600 <Lredacted/REDACTED/REDACTED/redacted/RedactedRedactedRedacted;Redacted(Lredacted/Redacted/Redacted/redactedRedacted/Redacted;)V+0xfffffffffc831580>
91: 66 66 90 data16 xchg %ax,%ax
94: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
9b: 00
9c: 66 66 66 90 data16 data16 xchg %ax,%ax
a0: 89 84 24 00 c0 fe ff mov %eax,-0x14000(%rsp)
a7: 55 push %rbp
a8: 48 8b ec mov %rsp,%rbp
ab: 48 83 ec 40 sub $0x40,%rsp
af: 48 89 34 24 mov %rsi,(%rsp)
⬢ [acme@toolbox a]$
The thing now being investigated is why we can't annotate anything here,
maybe that JITDUMPDIR is getting in the way:
⬢ [acme@toolbox a]$ perf annotate --stdio2 -i acme-perf-injected.data 'java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int)'
Error:
Couldn't annotate java.lang.String com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int):
Internal error: Invalid -1 error code
⬢ [acme@toolbox a]$
In the tests I performed while merging this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d518ac7be6223811ab947897273b1bbef846180
It works, but then there was no JITDUMPDIR involved:
/home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4191.so
⬢ [acme@toolbox perf-tools-next]$ perf report --call-graph none --no-child -i perf-injected.data | grep jitted- | head
1.36% java jitted-3912413-54.so [.] Interpreter
0.30% C1 CompilerThre jitted-3912413-1.so [.] flush_icache_stub
0.18% java jitted-3912413-4184.so [.] org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean)
0.18% java jitted-3912413-4177.so [.] org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)
0.13% java jitted-3912413-3845.so [.] java.text.DecimalFormat.subformatNumber(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int)
0.11% java jitted-3912413-4191.so [.] org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode)
0.09% java jitted-3912413-2418.so [.] org.apache.fop.fo.XMLWhiteSpaceHandler.handleWhiteSpace()
0.08% Reference Handl jitted-3912413-54.so [.] Interpreter
0.08% java jitted-3912413-3326.so [.] org.apache.xmlgraphics.fonts.Glyphs.stringToGlyph(java.lang.String)
0.08% java jitted-3912413-3953.so [.] org.apache.fop.layoutmgr.BreakingAlgorithm.considerLegalBreak(org.apache.fop.layoutmgr.KnuthElement, int)
⬢ [acme@toolbox perf-tools-next]$
And then:
⬢ [acme@toolbox perf-tools-next]$ perf annotate --stdio2 -i perf-injected.data 'org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)' | head -20
Samples: 8 of event 'cpu_atom/cycles/Pu', 4000 Hz, Event count (approx.): 8112794, [percent: local period]
org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)() /home/acme/.debug/jit/java-jit-20241127.XXF1SRgN/jitted-3912413-4177.so
Percent 0x80 <org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)>:
nop
movl 0x8(%rsi),%r10d
cmpl 0x8(%rax),%r10d
→ jne 0
movl %eax,-0x14000(%rsp)
pushq %rbp
subq $0xb0,%rsp
nop
cmpl $0x3,0x20(%r15)
↓ jne 7037
2e: movl %ecx,0x28(%rsp)
movq %rdx,%rbp
movl 0x64(%rdx),%ebx
cmpb $0x0,0x38(%r15)
↓ jne 3a44
movq %rsi,0x30(%rsp)
48: movq 0x30(%rsp),%r10
⬢ [acme@toolbox perf-tools-next]$
No source code nor line numbers, that I saw in another build of perf for
RHEL9, for the same workload described in the cset above (a publicly
available java benchmark), so something to investigate on perf upstream
running on fedora, maybe some quirk with the jdk used when building perf
for RHEL 9 and for Fedora 40.
A related patch that should have made this all work is:
"perf inject jit: Add namespaces support"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67dec926931448d688efb5fe34f7b5a22470fc0a
But we still need to polish this some more, maybe there are differences
in the agent used in NodeJS with --perf-prof and the jvmti one we're
using.
Hopefully describing all the steps while we investigate this case will
help us improve perf support for profiling JITed environments running in
containers while profiling from inside and outside it.
Reported-by: Francesco Nigro <fnigro@redhat.com>
Reported-by: Ilan Green <igreen@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20241206204828.507527-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Depending on how vmlinux.lds is written, _etext might be the very first
data symbol instead of the very last text symbol.
Don't require it to be a text symbol, accept any symbol type.
Committer notes:
See the first Link for further discussion, but it all boils down to
this:
---
# grep -e _stext -e _etext -e _edata /proc/kallsyms
c0000000 T _stext
c08b8000 D _etext
So there is no _edata and _etext is not text
$ ppc-linux-objdump -x vmlinux | grep -e _stext -e _etext -e _edata
c0000000 g .head.text 00000000 _stext
c08b8000 g .rodata 00000000 _etext
c1378000 g .sbss 00000000 _edata
---
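To illustrate "accept any symbol type", here is a sketch of a name-only
match on a kallsyms-style line. The function is hypothetical and is not
the perf kallsyms parser; it just skips the type column instead of
comparing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "<hex address> <type> <name>" line, as printed by
 * /proc/kallsyms, and report whether it names the wanted symbol.
 * The type character is skipped (%*c) and never compared, so a
 * 'D' _etext is accepted just like a 'T' one.
 */
static bool kallsyms_line_matches(const char *line, const char *wanted,
				  unsigned long long *addr)
{
	unsigned long long start;
	char name[128];

	if (sscanf(line, "%llx %*c %127s", &start, name) != 2)
		return false;
	if (strcmp(name, wanted) != 0)
		return false;
	*addr = start;
	return true;
}
```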
Fixes: ed9adb2035 ("perf machine: Read also the end of the kernel")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/b3ee1994d95257cb7f2de037c5030ba7d1bed404.1736327613.git.christophe.leroy@csgroup.eu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit 659ad3492b ("perf maps: Switch from rbtree to lazily
sorted array for addresses"), perf no longer displays kernel symbols on
powerpc, although it still detects them as kernel addresses.
# Overhead Command Shared Object Symbol
# ........ .......... ............. ......................................
#
80.49% Coeur main [unknown] [k] 0xc005f0f8
3.91% Coeur main gau [.] engine_loop.constprop.0.isra.0
1.72% Coeur main [unknown] [k] 0xc005f11c
1.09% Coeur main [unknown] [k] 0xc01f82c8
0.44% Coeur main libc.so.6 [.] epoll_wait
0.38% Coeur main [unknown] [k] 0xc0011718
0.36% Coeur main [unknown] [k] 0xc01f45c0
This is because function maps__find_next_entry() now returns current
entry instead of next entry, leading to kernel map end address getting
mis-configured with its own start address instead of the start address
of the following map.
Fix it by really taking the next entry, and make sure that entry
follows the current one by ensuring the entries are sorted.
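The idea, reduced to a toy array, might look like the sketch below.
struct addr_map and both helpers are invented for the example and do not
mirror perf's struct maps; the point is only "sort first, then return
entry i + 1, not entry i".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy map with only the fields needed here; not perf's struct map. */
struct addr_map {
	unsigned long long start;
	unsigned long long end;
};

static int addr_map_cmp(const void *a, const void *b)
{
	const struct addr_map *ma = a, *mb = b;

	return (ma->start > mb->start) - (ma->start < mb->start);
}

/*
 * Sort the maps by start address and return the entry that follows the
 * one starting at 'start', or NULL if it is the last entry or is absent.
 */
static struct addr_map *addr_maps_find_next(struct addr_map *maps, size_t nr,
					    unsigned long long start)
{
	qsort(maps, nr, sizeof(*maps), addr_map_cmp);

	for (size_t i = 0; i + 1 < nr; i++) {
		if (maps[i].start == start)
			return &maps[i + 1];	/* next entry, not maps[i] */
	}
	return NULL;
}

/* Set the end of the map at 'start' to the start of the following map. */
static int addr_maps_fixup_end(struct addr_map *maps, size_t nr,
			       unsigned long long start)
{
	struct addr_map *next = addr_maps_find_next(maps, nr, start);

	if (!next)
		return -1;
	next[-1].end = next->start;	/* next[-1] is the current entry */
	return 0;
}
```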
Fixes: 659ad3492b ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/2ea4501209d5363bac71a6757fe91c0747558a42.1736329923.git.christophe.leroy@csgroup.eu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I found it failed on machines with limited memory because the 16MB
per-cpu buffer is too big. The option was added so as not to miss
tracing data. Thus we can limit the data size by reducing the function
call depth instead of increasing the buffer size to handle the whole
data.
As it used the same option in the test_ftrace_trace() and it was able
to find the sleep function, it should work with the profile subcommand.
Get rid of other grep commands which might be affected by the depth
change.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20250107224352.1128669-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The existing EBUSY strerror message is:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//).
"dmesg | grep -i perf" may provide additional information.
The dmesg won't be useful. What is more useful is knowing what
processes are potentially using the PMU, which some procfs scanning can
reveal. When parallel testing tests/shell/stat_all_pmu.sh this yields:
Testing intel_bts//
Error:
The PMU intel_bts counters are busy and in use by another process.
Possible processes:
2585882 perf list
2585902 perf list -j -o /tmp/__perf_test.list_output.json.KF9MY
2585904 perf list
2585911 perf record -e task-clock --filter period > 1 -o /dev/null --quiet true
2585912 perf list
2585915 perf list
2586042 /tmp/perf/perf record -asdg -e cpu-clock -o /tmp/perftool-testsuite_report.dIF/perf_report/perf.data -- sleep 2
2589078 perf record -g -e task-clock:u -o - perf test -w noploop
2589148 /tmp/perf/perf record --control=fifo:control,ack -e cpu-clock -m 1 sleep 10
2589379 perf --buildid-dir /tmp/perf.debug.Umx record --buildid-all -o /tmp/perf.data.YBm /tmp/perf.ex.MD5.ZQW
2589568 perf record -o /tmp/__perf_test.program.mtcZH/perf.data --branch-filter any,save_type,u -- perf test -w brstack
2589649 perf record --per-thread -o /tmp/__perf_test.perf.data.5d3dc perf test -w thloop
2589898 perf record -o /tmp/perf-test-script.BX2b27Dcnj/pp-perf.data --sample-cpu uname
Which gets a little closer to finding the issue.
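For readers curious what such procfs scanning can look like, here is a
rough sketch. It assumes that a perf event file descriptor reads back
from /proc/<pid>/fd/<n> as the link target "anon_inode:[perf_event]",
and it only lists pids holding any perf event fd; it does not try to
tell which PMU each event belongs to, and it is not the code added by
this patch. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: perf event fds read back as this symlink target. */
static bool is_perf_event_link(const char *target)
{
	return strcmp(target, "anon_inode:[perf_event]") == 0;
}

/* Return true if /proc/<pid>/fd holds at least one perf event fd. */
static bool pid_has_perf_event_fd(const char *pid)
{
	char path[PATH_MAX], target[256];
	struct dirent *ent;
	bool found = false;
	DIR *dir;

	snprintf(path, sizeof(path), "/proc/%s/fd", pid);
	dir = opendir(path);
	if (!dir)
		return false;	/* process gone or not ours to inspect */

	while (!found && (ent = readdir(dir)) != NULL) {
		ssize_t len;

		if (ent->d_name[0] == '.')
			continue;
		len = readlinkat(dirfd(dir), ent->d_name, target,
				 sizeof(target) - 1);
		if (len < 0)
			continue;
		target[len] = '\0';
		found = is_perf_event_link(target);
	}
	closedir(dir);
	return found;
}

/* Print the pids that hold perf event fds; returns how many were found. */
static int list_perf_event_users(FILE *out)
{
	DIR *proc = opendir("/proc");
	struct dirent *ent;
	int nr = 0;

	if (!proc)
		return -1;
	while ((ent = readdir(proc)) != NULL) {
		if (!isdigit((unsigned char)ent->d_name[0]))
			continue;
		if (pid_has_perf_event_fd(ent->d_name)) {
			fprintf(out, "%s\n", ent->d_name);
			nr++;
		}
	}
	closedir(proc);
	return nr;
}
```

Unprivileged callers will only see their own processes this way, since
other users' /proc/<pid>/fd directories cannot be opened.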
Committer testing:
root@number:~#
root@number:~# grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:~#
Before:
root@number:~# perf stat -e intel_bts// &
[1] 197954
root@number:~# perf test "perf all PMU test"
124: perf all PMU test : FAILED!
root@number:~# perf test -v "perf all PMU test" |& tail
Testing i915/vecs0-busy/
Testing i915/vecs0-sema/
Testing i915/vecs0-wait/
Testing intel_bts//
Unexpected signal in main
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//).
"dmesg | grep -i perf" may provide additional information.
---- end(-1) ----
124: perf all PMU test : FAILED!
root@number:~#
After:
root@number:~# perf stat -e intel_bts// &
[1] 200195
root@number:~# perf test "perf all PMU test"
123: perf all PMU test : FAILED!
root@number:~# perf test -v "perf all PMU test" |& tail
Testing i915/vecs0-wait/
Testing intel_bts//
Unexpected signal in main
Error:
The PMU intel_bts counters are busy and in use by another process.
Possible processes:
200195 perf stat -e intel_bts//
2319766 /root/bin/perf top --stdio
---- end(-1) ----
123: perf all PMU test : FAILED!
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ze Gao <zegao2021@gmail.com>
Change-Id: Ie1ed8688286c44e8f44a35e98fed8be3e2a344df
Link: https://lore.kernel.org/r/20241106003007.2112584-1-ctshao@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When running in the now default parallel mode this test has been
frequently failing, while when running exclusively, on a quiet system,
it passes.
Since its expectations were established when serial testing was the
norm, mark it as exclusive to get this kind of result:
root@x1:~# perf test 106
106: perf script task-analyzer tests : Ok
root@x1:~# set -o vi
root@x1:~# perf stat --null --repeat 10 perf test 106
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
106: perf script task-analyzer tests : Ok
Performance counter stats for 'perf test 106' (10 runs):
4.8872 +- 0.0179 seconds time elapsed ( +- 0.37% )
root@x1:~#
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After binutils commit e43d876 which was first included in binutils 2.41,
riscv no longer supports dumping in the middle of instructions.
Increase the objdump window by 2-bytes to ensure that any instruction
that sits on the boundary of the specified stop-address is not cut in
half.
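The arithmetic is small enough to sketch. The helper below is
hypothetical (name, parameters and the is_riscv flag are assumptions,
not the perf code); it only shows the stop address being pushed out by
2 bytes on riscv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the stop address to hand to objdump for a read of
 * [start, start + len). On riscv the window is widened by 2 bytes so
 * that an instruction straddling the boundary is still dumped whole.
 */
static uint64_t objdump_stop_address(uint64_t start, uint64_t len,
				     bool is_riscv)
{
	uint64_t stop = start + len;

	if (is_riscv)
		stop += 2;
	return stop;
}
```

Any extra bytes that objdump emits past the requested length would then
need to be ignored by the reader of its output.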
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241219-perf_fix_riscv_obj_reading-v3-1-a7d644dcfa50@rivosinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recently we got a case where a kernel sample wasn't being resolved due
to a bug that was not setting the end address on kernel functions
implemented in assembly (see Link: tag), and then those were not being
found by machine__resolve() -> map__find_symbol().
So we ended up with:
# perf top --stdio
PerfTop: 0 irqs/s kernel: 0% exact: 0% lost: 0/0 drop: 0/0 [cycles/P]
-----------------------------------------------------------------------
Warning:
A vmlinux file was not found.
Kernel samples will not be resolved.
^Z
[1]+ Stopped perf top --stdio
#
But all other kernel symbols were then being resolved.
So just fix up the logic to only print that warning when there are no
symbols in the kernel map.
Fixes: d88205db9c ("perf dso: Add dso__has_symbols() method")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/Z3buKhcCsZi3_aGb@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not all of these are "state" so separate them into two sections. Rename
and document to make all clearer.
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-6-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename 'prefix' to 'timestamp' because that's all it does, except in
iostat mode where it's slightly overloaded, but still includes a
timestamp. This reveals a problem with iostat and JSON mode so document
this.
Make it more explicit that these are printed in interval mode by
changing 'if (prefix)' to 'if (interval)' which reveals an unnecessary
'else if (... && !interval)' which can be removed.
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-5-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Despite the name, new_line_metric doesn't make a new line; it actually
does nothing. Change it to NULL to avoid confusion.
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-4-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We decided to hide NULL metric-units rather than showing it as "(null)"
when a dependent event for a metric doesn't exist. But on hybrid systems
if the process doesn't hit a PMU you get an empty string metric unit
instead. To make it consistent change all empty strings to NULL.
Note that metric-threshold is already hidden in this case without this
change.
Where a process only runs on cpu_core and never hits cpu_atom:
Before:
$ perf stat -j -- true
...
{"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00, "metric-value" : "0.000000", "metric-unit" : ""}
{"counter-value" : "6326.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 293786, "pcnt-running" : 100.00, "metric-value" : "3.553394", "metric-unit" : "of all branches", "metric-threshold" : "good"}
...
After:
...
{"counter-value" : "<not counted>", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 0, "pcnt-running" : 0.00}
{"counter-value" : "5778.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 282240, "pcnt-running" : 100.00, "metric-value" : "3.226797", "metric-unit" : "of all branches", "metric-threshold" : "good"}
...
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-3-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that printing metric-value and metric-unit is optional,
print_running_json() shouldn't add the comma in case it becomes
trailing.
Replace all manual JSON comma stuff with a json_out() function that uses
the existing os->first tracking and auto inserts a comma if it's needed.
Update the test to handle that two of the fields can be missing.
This fixes the following test failure on Cortex A57 where the branch
misses metric is missing a required event:
$ perf test -vvv "json output"
106: perf stat JSON output linter:
--- start ---
test child forked, pid 665682
Checking json output: no args Test failed for input:
{"counter-value" : "3112.000000", "unit" : "",
"event" : "armv8_pmuv3_1/branch-misses/",
"event-runtime" : 20699340, "pcnt-running" : 100.00, }
...
json.decoder.JSONDecodeError: Expecting property name enclosed in
double quotes: line 12 column 144 (char 2109)
---- end(-1) ----
106: perf stat JSON output linter : FAILED!
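The comma handling described above can be sketched in Python. This is an
illustration of the idea only, not the C code in the patch: the JsonLine
class and format_counter() are hypothetical names, and the 'first' attribute
merely mirrors the os->first tracking mentioned in the text.

```python
import json


class JsonLine:
    """Collects the fields of one JSON object, adding commas on demand."""

    def __init__(self):
        self.first = True  # no field emitted yet
        self.parts = []

    def out(self, key, value):
        # Emit the separator *before* every field except the first, so a
        # skipped optional field can never leave a trailing comma behind.
        if not self.first:
            self.parts.append(", ")
        self.first = False
        self.parts.append(f"{json.dumps(key)} : {json.dumps(value)}")

    def finish(self):
        return "{" + "".join(self.parts) + "}"


def format_counter(event, value, metric_value=None, metric_unit=None):
    line = JsonLine()
    line.out("counter-value", value)
    line.out("event", event)
    # Optional fields are simply skipped when they are absent.
    if metric_value is not None:
        line.out("metric-value", metric_value)
    if metric_unit is not None:
        line.out("metric-unit", metric_unit)
    return line.finish()


text = format_counter("armv8_pmuv3_1/branch-misses/", "3112.000000")
print(json.loads(text))  # parses cleanly even with the metric fields missing
```

Putting the separator in front of each field, rather than after it, is what
makes the optional fields safe to drop.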
Fixes: e1cc918b6c ("perf stat: Drop metric-unit if unit is NULL")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-2-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, we only have 'perf trace' augmentation tests for enum
arguments. This patch adds tests for more general syscall arguments,
such as struct pointers, strings, and buffers.
These tests utilize the 'perf config' system to configure the 'perf trace'
output, as suggested by Arnaldo Carvalho de Melo <acme@kernel.org>.
Committer testing:
root@number:~# perf test "BTF general"
109: perf trace BTF general tests : Ok
root@number:~# perf test -v "BTF general"
109: perf trace BTF general tests : Ok
root@number:~# perf test -vv "BTF general"
109: perf trace BTF general tests:
--- start ---
test child forked, pid 1410451
Checking if vmlinux BTF exists
Testing perf trace's string augmentation
Testing perf trace's buffer augmentation
Testing perf trace's struct augmentation
---- end(0) ----
109: perf trace BTF general tests : Ok
root@number:~#
It still fails sometimes, for instance when tested with:
root@number:~# perf stat --null -r 10 perf test "BTF general"
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : FAILED!
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : FAILED!
109: perf trace BTF general tests : Ok
109: perf trace BTF general tests : Ok
Performance counter stats for 'perf test BTF general' (10 runs):
2.148 +- 0.293 seconds time elapsed ( +- 13.63% )
root@number:~#
But we can go on from here and fix things up with followup patches.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241215190712.787847-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
is_executable_file() has been unused since 2022's commit
7391db6459 ("perf test: Refactor shell tests allowing subdirs")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241222215831.283248-1-linux@treblig.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An evsel idx may not be stable due to sorting, evlist removal,
etc. Avoid use of the idx where the evsel itself can be used to avoid
these problems. This removes one values array and the duplicated evsel
name strings.
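Why an index is fragile can be shown with a small Python sketch. The Evsel
class and both containers below are hypothetical stand-ins, not perf's data
structures:

```python
class Evsel:
    """Stand-in for an event selector; hashable by identity by default."""

    def __init__(self, name):
        self.name = name


cycles = Evsel("cycles")
instructions = Evsel("instructions")
evlist = [cycles, instructions]

# Index-keyed storage: slot 0 is meant to belong to 'cycles'.
values_by_idx = [0] * len(evlist)
values_by_idx[0] += 10

# Object-keyed storage: the evsel itself is the key.
values_by_evsel = {evsel: 0 for evsel in evlist}
values_by_evsel[cycles] += 10

# Reordering the list silently changes what index 0 refers to.
evlist.sort(key=lambda evsel: evsel.name, reverse=True)

print(evlist[0].name, values_by_idx[0])      # slot 0 now pairs with another event
print(cycles.name, values_by_evsel[cycles])  # still attributed to 'cycles'
```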
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241114230713.330701-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An evsel idx may not be stable due to sorting, evlist removal,
etc. Avoid use of the idx where the evsel itself can be used to avoid
these problems.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241114230713.330701-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the JSON input to jevents.py is broken it can be problematic to
work out which particular JSON file is broken. When processing files
catch exceptions that occur and re-raise them with path details
added.
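The general pattern looks like the sketch below. It is not the jevents.py
code: load_json_file() and the message wording are hypothetical, and the
broken input is made up for the demonstration.

```python
import json
import os
import tempfile


def load_json_file(path):
    """Parse one JSON file, naming the file if anything goes wrong."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except Exception as err:
        # Re-raise with the path attached; 'from err' keeps the original
        # exception reachable through __cause__.
        raise RuntimeError(f"error processing file: {path}") from err


# Demonstration with a deliberately broken file (trailing comma).
with tempfile.TemporaryDirectory() as tmp:
    broken = os.path.join(tmp, "broken.json")
    with open(broken, "w", encoding="utf-8") as f:
        f.write('{"EventName": "cycles", }')
    try:
        load_json_file(broken)
    except RuntimeError as err:
        message = str(err)
        cause = err.__cause__

print(message)
```

Chaining with 'from' means the traceback shows both the parser error and the
file that triggered it.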
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20241114172309.840241-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The bpf_get_kmem_cache() kfunc can return an address of the slab cache
(kmem_cache). As it has the name of the slab cache from the iterator,
we can use it to symbolize some dynamic kernel locks in a slab.
Before:
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
2 3.34 us 2.87 us 1.67 us ffff9d7800ad9600 (mutex)
2 2.16 us 1.93 us 1.08 us ffff9d7804b992d8 (mutex)
4 1.37 us 517 ns 343 ns ffff9d78036e6e00 (mutex)
1 1.27 us 1.27 us 1.27 us ffff9d7804b99378 (mutex)
2 845 ns 599 ns 422 ns ffffffff9e1c3620 delayed_uprobe_lock (mutex)
1 845 ns 845 ns 845 ns ffffffff9da0b280 jiffies_lock (spinlock)
2 377 ns 259 ns 188 ns ffffffff9e1cf840 pcpu_alloc_mutex (mutex)
1 305 ns 305 ns 305 ns ffffffff9e1b4cf8 tracepoint_srcu_srcu_usage (mutex)
1 295 ns 295 ns 295 ns ffffffff9e1c0940 pack_mutex (mutex)
1 232 ns 232 ns 232 ns ffff9d7804b7d8d8 (mutex)
1 180 ns 180 ns 180 ns ffffffff9e1b4c28 tracepoint_srcu_srcu_usage (mutex)
1 165 ns 165 ns 165 ns ffffffff9da8b3a0 text_mutex (mutex)
After:
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex)
1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex)
4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex)
2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex)
3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex)
3 421 ns 164 ns 140 ns ffffffffa3a8b3a0 text_mutex (mutex)
1 409 ns 409 ns 409 ns ffffffffa41b4cf8 tracepoint_srcu_srcu_usage (mutex)
2 362 ns 239 ns 181 ns ffffffffa41cf840 pcpu_alloc_mutex (mutex)
1 220 ns 220 ns 220 ns ffff9d5e82b534d8 &signal_cache (mutex)
1 215 ns 215 ns 215 ns ffffffffa41b4c28 tracepoint_srcu_srcu_usage (mutex)
Note that the name starts with an '&' sign for slab objects to indicate
that they are dynamic locks. It won't give the accurate lock or type
names but it's still useful. We may add type info to the slab cache later
to get the exact name of the lock in the type.
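The naming rule can be summarized in a few lines of Python. lock_symbol() is
a hypothetical helper written for illustration; it is not a function in perf:

```python
def lock_symbol(static_name, slab_cache_name):
    """Pick the text shown in the 'symbol' column for one lock address.

    A statically named lock keeps its own name. A lock living in a slab
    object falls back to the slab cache name with a leading '&' to mark
    it as dynamic. With neither available the column stays empty.
    """
    if static_name:
        return static_name
    if slab_cache_name:
        return "&" + slab_cache_name
    return ""


print(lock_symbol("text_mutex", None))
print(lock_symbol(None, "task_struct"))
```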
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recently the kernel got the kmem_cache iterator to traverse metadata of
slab objects. This can be used to symbolize dynamic locks in a slab.
The new slab_caches hash map will have the pointer of the kmem_cache as
a key and save the name and an id. The id will be saved in the flags
part of the lock.
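A rough Python model of that bookkeeping follows. The bit layout (TYPE_MASK,
SLAB_ID_SHIFT), the id numbering and every function name are assumptions made
for this sketch; the real values live in the perf sources and may differ.

```python
# Hypothetical layout: the low 16 bits carry the lock type, the bits above
# them carry the slab cache id. These constants are NOT taken from perf.
TYPE_MASK = 0x0000FFFF
SLAB_ID_SHIFT = 16

slab_caches = {}  # kmem_cache address -> (id, name)


def register_cache(addr, name):
    """Record a slab cache and hand out a small id for it."""
    slab_id = len(slab_caches) + 1  # assumption: 0 means "no slab cache"
    slab_caches[addr] = (slab_id, name)
    return slab_id


def pack_flags(lock_type, slab_id):
    """Store the id next to the type bits in a single flags word."""
    return (lock_type & TYPE_MASK) | (slab_id << SLAB_ID_SHIFT)


def unpack_flags(flags):
    """Use the mask to recover the type; the shift recovers the id."""
    return flags & TYPE_MASK, flags >> SLAB_ID_SHIFT


def cache_name(slab_id):
    """Look a cache name up by id, or None when the id is unknown."""
    for cid, name in slab_caches.values():
        if cid == slab_id:
            return name
    return None


sid = register_cache(0xFFFF9D5E841CA800, "kmalloc-cg-512")
flags = pack_flags(0x20, sid)
lock_type, found_id = unpack_flags(flags)
print(hex(lock_type), cache_name(found_id))
```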
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-3-namhyung@kernel.org
[ Added change from Namhyung addressing review from Alexei: ]
Link: https://lore.kernel.org/r/Z2dVdH3o5iF-KrWj@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a preparation for the later change. It'll use more bits in the
flags so let's rename the type part and use the mask to extract the
type.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Right now every time we need to figure out the type of an evsel for
output purposes we do a quick sequence of ifs, but there are new cases
where there is a need to do more complex iterations over multiple data
structures, so allow for caching this operation on a hole of 'struct
evsel'.
This should really be done on the evsel->priv area that 'perf script'
sets up, but more work is needed to make sure that it is allocated when
we need it; right now it is only used conditionally. Add some comments
so that we move this to that 'perf script' specific area when the
conditions are in place for that.
Acked-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/lkml/Z2XCi3PgstSrV0SE@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Correctly throw IndexError for out-of-bound accesses to evlist:
Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, '/tmp/perf/python')
>>> import perf
>>> x=perf.parse_events('cycles')
>>> print(x)
evlist([cycles])
>>> x[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: Index out of range
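The same contract, written as a pure Python class for comparison. This Evlist
is a hypothetical stand-in for the C binding, and rejecting negative indices
is an assumption of the sketch rather than documented behaviour:

```python
class Evlist:
    """Minimal sequence that reports bad indices the way Python expects."""

    def __init__(self, names):
        self._names = list(names)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, index):
        # Assumption: negative indices are rejected in this sketch.
        if not 0 <= index < len(self._names):
            raise IndexError("Index out of range")
        return self._names[index]


x = Evlist(["cycles"])
print(list(x))  # iteration ends when __getitem__ raises IndexError
try:
    x[2]
except IndexError as err:
    print("IndexError:", err)
```

Raising IndexError matters beyond the error message: without an __iter__
method, Python iterates such an object by calling __getitem__ with 0, 1, 2,
... and stops only when IndexError is raised.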
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-23-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This allows evsel to be shown in the REPL like:
Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, '/tmp/perf/python')
>>> import perf
>>> x=perf.parse_events('cycles,data_read')
>>> print(x)
evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/])
>>> x[0]
evsel(cycles)
>>> x[1]
evsel(uncore_imc_free_running_0/data_read/)
>>> x[2]
evsel(uncore_imc_free_running_1/data_read/)
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-22-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This allows the values in the evlist to be shown in the REPL like:
Python 3.11.9 (main, Jun 19 2024, 00:38:48) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0,'/tmp/perf/python')
>>> import perf
>>> perf.parse_events('cycles,data_read')
evlist([cycles,uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/])
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-21-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add basic parse_events function that takes a string and returns an
evlist. As the python evlist is embedded in a pyrf_evlist, and the
evsels are embedded in pyrf_evsels, copy the parsed data into those
structs and update evsel__clone to enable this.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-20-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the attr.c code moved to a shell test, there is no need to link
the test code into the python dso to avoid a missing reference to
test_attr__open. Drop the test code from the python library.
With the bench and test code removed from the python library on my x86
debian derived laptop the python library is reduced in size by 508,712
bytes or nearly 5%.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-19-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_kwork_add_work is declared in builtin-kwork, whereas much kwork
code is in util. To avoid needing to stub perf_kwork_add_work in
python.c, add a callback to struct perf_kwork and initialize it in
builtin-kwork to perf_kwork_add_work - this is the only struct
perf_kwork. This removes the need for the stub in python.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-18-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avoid `perf bench internals inject-build-id` referencing the
cmd_inject sub-command that requires perf-bench to backward reference
internals of builtins. Replace the reference to cmd_inject with a call
to main. To avoid python.c needing to link with something providing
main, drop the libperf-bench library from the python shared object.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avoid references from util code to builtin-lock that require python
stubs. Move the functions and related variables to
util/lock-contention.c. Add max_stack_depth parameter to
match_callstack_filter to avoid sharing a global variable.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-16-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move arch_syscalls__strerrno_function out of builtin-trace.c to env.c
so that there isn't a util to builtin function call. This allows the
python.c stub to be removed. Also, remove declaration/prototype from
env.h and make static to reduce scope. The include is moved inside
ifdefs to avoid "defined but unused" warnings.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-15-irogers@google.com
perf: perf python: Correctly throw IndexError
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 00a263902a ("perf intel-pt: Use shared x86 insn decoder")
removed the use of diff, so remove stale busybox comment.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
archinsn.c containing arch_fetch_insn was only enabled with
CONFIG_AUXTRACE, but this meant that a NO_AUXTRACE build on x86 would
use the empty weak version of arch_fetch_insn - weak symbols are a
frequent source of errors like this and are outside of the C
specification. Change it so that archinsn.c is always built on x86 and
make the weak symbol empty version of arch_fetch_insn a strong one
guarded by ifdefs.
arch_fetch_insn on x86 depends on insn_decode, a function that is
included into, and then built as part of, intel-pt-insn-decoder.c.
intel-pt-insn-decoder.c isn't built in a NO_AUXTRACE=1 build. Separate
the insn_decode function from intel-pt-insn-decoder.c by just directly
compiling the relevant file. Guard this compilation to be for either
always on x86 (because of the use in arch_fetch_insn) or when auxtrace
is enabled. Apply the CFLAGS overrides as necessary, reducing the amount
of code where warnings are disabled.
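The weak-to-strong change can be sketched roughly as below. The macro
HAVE_ARCH_INSN_FETCH and the function fetch_insn_len are hypothetical
stand-ins chosen for illustration, not the identifiers used in perf:

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins: HAVE_ARCH_INSN_FETCH and fetch_insn_len are
 * illustrative names only.
 */
#ifdef HAVE_ARCH_INSN_FETCH
/* The real implementation lives in an arch file that is always built. */
size_t fetch_insn_len(const unsigned char *buf, size_t len);
#else
/* Strong fallback selected explicitly at compile time, not a weak symbol. */
static inline size_t fetch_insn_len(const unsigned char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0; /* nothing decoded */
}
#endif
```

Because the choice is made by the preprocessor, a build that defines the
macro but forgets to compile the arch file fails at link time instead of
quietly using the empty version.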
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_sample__sprintf_flags is used in the python C code and so needs
to be in the util library rather than a builtin.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241119011644.971342-12-irogers@google.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add native_arch as a parameter to script_fetch_insn rather than
relying on the builtin-script value that won't be initialized for the
dlfilter and python Context use cases. Assume both of those cases are
running natively.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The script_spec code is referenced in util/trace-event-scripting but
the list was in builtin-script, accessed via a function that required
a stub function in python.c. Move all the logic to
trace-event-scripting, with lookup and foreach functions exposed for
builtin-script's benefit.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
stat_config is accessed by config.c via helper functions, but declared
in builtin-stat. Move to util/config.c so that stub functions aren't
needed in python.c which doesn't link against the builtin files.
To avoid name conflicts change builtin-script to use the same
stat_config as builtin-stat. Rename local variables in tests to avoid
shadow declaration warnings.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The only use of find_scripts is in browser/scripts.c but the
definition in builtin causes linking problems requiring a stub in
python.c. Move the function to allow the stub to be removed.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rewrite the directory iteration to use openat so that large character
arrays aren't needed. GCC warns about potential buffer overflows in
these arrays when the code exists in a single C file.
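As an illustration of the technique (not the perf code itself), nested
directories can be walked with openat() and fdopendir() so that no
"parent/child" path is ever assembled in a fixed-size buffer. The function
name count_nested_entries is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Count the non-hidden entries inside each subdirectory of "root".
 * Returns -1 if "root" cannot be opened.
 */
int count_nested_entries(const char *root)
{
	DIR *top = opendir(root);
	struct dirent *sub;
	int count = 0;

	if (!top)
		return -1;

	while ((sub = readdir(top)) != NULL) {
		struct dirent *ent;
		DIR *inner;
		int fd;

		if (!strcmp(sub->d_name, ".") || !strcmp(sub->d_name, ".."))
			continue;

		/* Open relative to the parent's fd; no path concatenation. */
		fd = openat(dirfd(top), sub->d_name, O_RDONLY | O_DIRECTORY);
		if (fd < 0)
			continue; /* not a directory, or not readable */

		inner = fdopendir(fd);
		if (!inner) {
			close(fd);
			continue;
		}
		while ((ent = readdir(inner)) != NULL) {
			if (ent->d_name[0] != '.')
				count++;
		}
		closedir(inner); /* also closes fd */
	}
	closedir(top);
	return count;
}
```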
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The util library code is used by the python module but doesn't have
access to the builtin files. Make a util/kvm-stat.c to match the
kvm-stat.h file that declares the functions and move the functions
there.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
scripting_max_stack is used in util code which is linked into the
python module. Move the variable declaration to
util/trace-event-scripting.c to avoid conditional compilation.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove unused #include of bpf-filter.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Opportunistically constify variables and parameters when possible.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Python 2 was deprecated 4 years ago; remove support and workarounds.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20241119011644.971342-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a simple sub-test to the "Miscellaneous Intel PT testing" test to
check pause / resume.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Document the use of aux-action config term and provide a simple example.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Improve format of config terms and section references.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Display "feature is not supported" error message if aux_start_paused,
aux_pause or aux_resume result in a perf_event_open() error.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add parsing for aux-action to accept "pause", "resume" or "start-paused"
values.
"start-paused" is valid only for AUX area events.
"pause" and "resume" are valid only for events grouped with an AUX area
event as the group leader. However, like with aux-output, the events
will be automatically grouped if they are not currently in a group, and
the AUX area event precedes the other events.
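The accepted values and the validity rules above can be sketched as
follows. The enum and function names are hypothetical; only the three
strings and the rules come from the description:

```c
#include <string.h>

/* Hypothetical names; perf's internal identifiers may differ. */
enum aux_action {
	AUX_ACTION_INVALID = -1,
	AUX_ACTION_PAUSE,
	AUX_ACTION_RESUME,
	AUX_ACTION_START_PAUSED,
};

/* Map the config term string to an action, or AUX_ACTION_INVALID. */
enum aux_action parse_aux_action(const char *str)
{
	if (!strcmp(str, "pause"))
		return AUX_ACTION_PAUSE;
	if (!strcmp(str, "resume"))
		return AUX_ACTION_RESUME;
	if (!strcmp(str, "start-paused"))
		return AUX_ACTION_START_PAUSED;
	return AUX_ACTION_INVALID;
}

/*
 * "start-paused" needs an AUX area event; "pause" and "resume" need an
 * AUX area event as the group leader.
 */
int aux_action_allowed(enum aux_action action, int is_aux_event,
		       int leader_is_aux_event)
{
	switch (action) {
	case AUX_ACTION_START_PAUSED:
		return is_aux_event;
	case AUX_ACTION_PAUSE:
	case AUX_ACTION_RESUME:
		return leader_is_aux_event;
	default:
		return 0;
	}
}
```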
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new common config term "aux-action" to use for configuring AUX area
trace pause / resume. The value is a string that will be parsed in a
subsequent patch.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'struct perf_event_attr' members to support pause and resume of AUX area
tracing.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241216070244.14450-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The zstd library will be automatically linked by detecting the feature
libzstd. There is no need to link it explicitly for static builds, so
remove the redundant linkage.
It is contradictory to detect the feature libelf-zstd while the build
configuration NO_LIBZSTD is set. Report an error to remind users
not to set NO_LIBZSTD.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241215221223.293205-3-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently print_overall_results prints the number of fails in the
summary, example from base_probe tests in testsuite_probe:
## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY ::
11 failures found
test_invalid_options contains multiple tests, 11 of which failed.
Sometimes such failures are due to a missing dependency in the build
or in the environment.
For example, perf probe -L requires DWARF to be enabled; otherwise
it fails as below:
./perf probe -L
Error: switch `L' is not available because NO_DWARF=1
"-L" is tested as one of the options in:
for opt in '-a' '-d' '-L' '-V'; do
<<perf probe test>>
print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "missing argument for $opt"
Here -a and -d don't require DWARF. Similarly, there are a few other
tests requiring DWARF.
To hint to the user that missing DWARF could be the cause, update
print_overall_results to print a comment string along with the summary
hinting at the possible cause. Update test_invalid_options.sh and
test_line_semantics.sh to pass the info about the DWARF requirement
since these tests fail when perf is built without DWARF.
Use the check for presence of DWARF with "perf check feature" and append
the hint message based on the result.
With the change:
## [ FAIL ] ## perf_probe :: test_invalid_options SUMMARY ::
11 failures found :: Some of the tests need DWARF to run
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241206135254.35727-1-atrajeev@linux.vnet.ibm.com
[ Minor edits changing "dwarf" to "DWARF" as it's an acronym ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf tools side uses the extended mask to display the platform
supported register names (with the -I? option) to the user and also sends
this mask to the kernel to capture the extended registers as part of each
sample. This mask value is decided based on the processor version (from
the PVR). Add the PVR value for power11 to enable capturing the extended
regs as part of each sample on power11.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241206135637.36166-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since these are so far considered part of the basic set of libraries to
be present when building perf, have them in
tools/build/features/test-all.c.
They were already in the FEATURE_TESTS_BASIC variable of
tools/build/Makefile.feature, meaning if test-all.c builds, those
features would be set as present, but then we were calling "again"
(well, they were not in test-all.c, so were not really being tested) for
it to be detected, fix this all up by not calling feature_check for
those features but instead have them in test-all.c to be tested together
with the set of basic expected libraries.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20241213195052.914914-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This test was failing when run with the default 'perf test' mode, which
is to run multiple regression tests in parallel.
Since it checks system_wide mode, set it to run in exclusive mode.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/lkml/Z1yPYqYYs_isO1PJ@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a variant of DEFINE_SUITE() that sets the ->exclusive bit for the
test so that the test is executed sequentially.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20241210093449.1662-10-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When expr_parse_ctx is allocated by expr_ctx_new(),
expr_scanner_ctx->is_test isn't initialized, so it holds a garbage value.
Depending on that value, this can affect what expr__parse() returns when
it parses a non-existent event literal.
Use calloc instead of malloc in expr_ctx_new() to fix this.
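A simplified sketch of why calloc() helps here. The struct layout and the
names parse_ctx, scanner_ctx and parse_ctx_new are hypothetical stand-ins
for the real types:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for the parser context types. */
struct scanner_ctx {
	bool is_test;
	int runtime;
};

struct parse_ctx {
	struct scanner_ctx sctx;
};

struct parse_ctx *parse_ctx_new(void)
{
	/*
	 * calloc() zero-fills the allocation, so sctx.is_test starts out
	 * false. malloc() would leave it indeterminate.
	 */
	struct parse_ctx *ctx = calloc(1, sizeof(*ctx));

	return ctx; /* NULL on allocation failure */
}
```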
Fixes: 3340a08354 ("perf pmu-events: Fix testing with JEVENTS_ARCH=all")
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241108143424.819126-1-yeoreum.yun@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The return value from the call to readlink() is ssize_t. However, the
return value is being assigned to a size_t variable 'len', so make
'len' an ssize_t.
./tools/perf/tests/tests-scripts.c:182:5-8: WARNING: Unsigned expression compared with zero: len < 0.
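For illustration, a small helper showing the signed type in use. The name
read_link_target is hypothetical; readlink() itself returns ssize_t and
does not NUL-terminate the buffer:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Read a symlink target into buf as a NUL-terminated string. */
int read_link_target(const char *path, char *buf, size_t size)
{
	ssize_t len; /* signed, so the error check below can be true */

	if (size == 0)
		return -1;

	len = readlink(path, buf, size - 1);
	if (len < 0)
		return -1;

	buf[len] = '\0'; /* readlink() does not terminate the string */
	return 0;
}
```

With an unsigned 'len', the "len < 0" branch is dead code and a failed
readlink() would be used as a huge index.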
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11909
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241115091527.128923-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While the value NULL+1 is never used, it triggers a ubsan warning.
Restructure and comment the loop to avoid this.
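The general shape of such a fix, shown on a hypothetical comma-separated
field counter rather than the actual loop. The problematic form advances
with "s = strchr(s, ',') + 1", which computes NULL + 1 once no comma is
left:

```c
#include <stddef.h>
#include <string.h>

/* Count comma-separated fields without ever forming NULL + 1. */
size_t count_fields(const char *s)
{
	size_t n = 0;

	while (s) {
		const char *comma = strchr(s, ',');

		n++;
		/* Only step past the comma when one was actually found. */
		s = comma ? comma + 1 : NULL;
	}
	return n;
}
```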
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241120065224.286813-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update using the new data [1] for these changes:
* Scale some metrics like dtlb_walk_ratio to percent so they display
better with Perf's 2 dp precision
* Description typos, grammar and clarifications
* Unnecessary metric formula brackets seem to have been removed in the
source but this is not a functional change
* New sve_all_percentage metric
The following command was used to generate this commit:
$ telemetry-solution/tools/perf_json_generator/generate.py \
tools/perf/ --telemetry-files \
telemetry-solution/data/pmu/cpu/neoverse/neoverse-v2.json:neoverse-n2-v2
[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v2.json
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241120143739.243728-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sample data is generally 64-bit aligned, but raw data starts with a
32-bit length field followed by the data itself.
perf_event__synthesize_sample() treats the sample data as a 64-bit
array, so it needs a trick to update the raw data properly.
But some compilers seem unhappy with this and the program dies
silently. I found the sample parsing test failed without any messages
on affected systems.
Let's update the code to use a 32-bit pointer directly and make sure the
result is 64-bit aligned again. No functional changes intended.
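A minimal sketch of the layout described above, not the perf code
itself: write_raw() and its parameters are hypothetical names, and the
buffer is assumed to be 64-bit aligned and large enough. It writes the
32-bit length through a 32-bit pointer, copies the payload, and rounds
the consumed size back up to a multiple of 8.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a raw record (32-bit size, then payload)
 * at buf and return the number of bytes consumed, rounded up so that
 * whatever follows stays 64-bit aligned.
 */
static size_t write_raw(void *buf, uint32_t raw_size, const void *raw_data)
{
	uint32_t *p = buf;                   /* 32-bit view of the buffer */
	size_t used;

	*p = raw_size;                       /* length field comes first */
	memcpy(p + 1, raw_data, raw_size);   /* payload follows immediately */

	used = sizeof(uint32_t) + raw_size;
	return (used + 7) & ~(size_t)7;      /* round up to a multiple of 8 */
}
```

With a 5-byte payload the record uses 9 bytes, so the caller advances
by 16.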
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241128010325.946897-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since the commit referenced in the Fixes: tag, err is returned
uninitialized due to the removal of "return 0". Initialize err to fix it.
This fixes the following intermittent test failure on release builds:
$ perf test "testsuite_probe"
...
-- [ FAIL ] -- perf_probe :: test_invalid_options :: mutually exclusive options :: -L foo -V bar (output regexp parsing)
Regexp not found: \"Error: switch .+ cannot be used with switch .+\"
...
Fixes: 080e47b2a2 ("perf probe: Introduce quotation marks support")
Tested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241211085525.519458-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If perf_env__insert_bpf_prog_info() returns false due to a duplicate bpf
prog info node insertion, the temporary info_node and info_linear memory
will leak. Add a check to ensure the memory is freed if the function
returns false.
Fixes: d56354dc49 ("perf tools: Save bpf_prog_info and BTF of new BPF programs")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241205084500.823660-4-quic_zhonhan@quicinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Function __perf_env__insert_bpf_prog_info() will return without inserting
bpf prog info node into perf env again due to a duplicate bpf prog info
node insertion, causing the temporary info_linear and info_node memory to
leak. Modify the return type of this function to bool and add a check to
ensure the memory is freed if the function returns false.
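The pattern can be sketched as follows. This is an illustration under
assumed names (struct env, struct info_node, env_insert() and
env_add_id() are all hypothetical and use a plain list rather than the
perf data structures): the insert helper reports a duplicate by
returning false, and the caller frees the temporary node in that case.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the perf structures. */
struct info_node {
	unsigned int id;
	struct info_node *next;
};

struct env {
	struct info_node *head;
};

/* Returns false, without taking ownership, when the id already exists. */
static bool env_insert(struct env *env, struct info_node *node)
{
	struct info_node *pos;

	for (pos = env->head; pos; pos = pos->next) {
		if (pos->id == node->id)
			return false;
	}
	node->next = env->head;
	env->head = node;
	return true;
}

/* Caller side: free the temporary node when the insertion is rejected. */
static bool env_add_id(struct env *env, unsigned int id)
{
	struct info_node *node = calloc(1, sizeof(*node));

	if (!node)
		return false;
	node->id = id;
	if (!env_insert(env, node)) {
		free(node);	/* duplicate: the env did not take ownership */
		return false;
	}
	return true;
}
```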
Fixes: 606f972b13 ("perf bpf: Save bpf_prog_info information as headers to perf.data")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241205084500.823660-3-quic_zhonhan@quicinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If __perf_env__insert_btf() returns false due to a duplicate btf node
insertion, the temporary node will leak. Add a check to ensure the memory
is freed if the function returns false.
Fixes: a70a112317 ("perf bpf: Save BTF information as headers to perf.data")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241205084500.823660-2-quic_zhonhan@quicinc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For big string offsets we output comments for what string the offset
is for. If the string contains a '*/' as seen in Intel Arrowlake event
descriptions, then this causes C parsing issues for the generated
pmu-events.c. Catch such '*/' values and escape to avoid this.
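The generator itself is a Python script, so the following is only a C
transliteration of the same idea, with a hypothetical function name.
It rewrites every '*/' in a string so the result can be placed inside a
C block comment without ending it early.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: copy src into dst, escaping each comment
 * terminator. dst must have room for 2 x strlen(src) + 1 bytes.
 */
static void escape_comment_end(const char *src, char *dst)
{
	while (*src) {
		if (src[0] == '*' && src[1] == '/') {
			memcpy(dst, "\\*\\/", 4);	/* backslash before each char */
			dst += 4;
			src += 2;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}
```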
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20241113165558.628856-1-irogers@google.com
[ Used return s.replace('*/', r'\*\/') based on failure followed by request by Ian ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf stat' output on aarch64 machines with topdown events wasn't
accounted for in the 'perf stat STD output linter' test case. Add the
topdown metric to the skip_metric list, as is done for topdown events
on other systems.
The Topdown events are also disabled on aarch64 KVM guests because the
value of caps/slots is set to 0 due to the part of the system register
being a stub.
This prevents the metric for the topdown events from being computed,
leaving the 'perf stat' topdown metric without any value at all.
Add the "TopdownL1" to the skip_metric list as well to handle this
possibility.
Before aarch64:
100: perf stat STD output linter:
--- start ---
test child forked, pid 403305
Checking STD output: no args Unknown event name in TopdownL1 # 4.3 percent of slots slots_lost_misspeculation_fraction
---- end(-1) ----
100: perf stat STD output linter : FAILED!
Before aarch64 KVM:
100: perf stat STD output linter:
--- start ---
test child forked, pid 404671
Checking STD output: no args Unknown event name in TopdownL1
---- end(-1) ----
100: perf stat STD output linter : FAILED!
After:
100: perf stat STD output linter:
--- start ---
test child forked, pid 404777
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread [Success]
Checking STD output: per node [Success]
Checking STD output: system wide no aggregation [Success]
Checking STD output: per core [Success]
Checking STD output: per cache instance [Success]
Checking STD output: per cluster [Success]
Checking STD output: per die [Success]
Checking STD output: per socket [Success]
---- end(0) ----
100: perf stat STD output linter : Ok
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241029144347.25651-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace unacceptable characters with '_' when generating event name from
the probing function name.
This applies only to non-C programs. For a C program, perf probe will
continue to remove suffixes.
Note that this language check depends on the debuginfo. So without
the debuginfo, perf probe will always replace unacceptable characters
with '_'.
For example:
$ ./perf probe -x cro3 -D \"cro3::cmd::servo::run_show\"
p:probe_cro3/cro3_cmd_servo_run_show /work/cro3/target/x86_64-unknown-linux-gnu/debug/cro3:0x197530
$ ./perf probe -x /work/go/example/outyet/main -D 'main.(*Server).poll'
p:probe_main/main_Server_poll /work/go/example/outyet/main:0x353040
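The name mangling seen in the two examples above can be sketched like
this. The function name is hypothetical, and collapsing a run of
unacceptable characters into a single '_' is an assumption inferred
from the example output, not taken from the perf probe source.

```c
#include <ctype.h>

/*
 * Hypothetical sketch: rewrite name in place so that it only contains
 * letters, digits and '_'. A run of other characters becomes one '_'.
 */
static void sanitize_event_name(char *name)
{
	char *out = name;
	const char *in;

	for (in = name; *in; in++) {
		if (isalnum((unsigned char)*in) || *in == '_')
			*out++ = *in;
		else if (out == name || out[-1] != '_')
			*out++ = '_';	/* first unacceptable char of a run */
	}
	*out = '\0';
}
```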
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173145728160.2747044.18089011235495186810.stgit@mhiramat.roam.corp.google.com
[ Removed some extra tabs in the new struct fields ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a max-latency option, as discussed. If the resulting
number of buckets would be more than 22, the setting is not observed
(for now, let's say).
By default or if 0 is passed, the value is automatically determined
based on the number of buckets, range and minimum, so that we fill all
available buffers (equivalent to the behaviour before this patch).
We now get something like this:
# perf ftrace latency --bucket-range=20 \
--min-latency 10 \
--max-latency=100 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 10 us | 1731 | ################ |
10 - 30 us | 1 | |
30 - 50 us | 0 | |
50 - 70 us | 0 | |
70 - 90 us | 0 | |
90 - 100 us | 0 | |
100 - ... us | 0 | |
Note that the maximum is observed even if it doesn't completely cover a
full range (the second to last range is 10us long to let the last start
at 100 sharp). This looks more sensible to me and eases the
computations, since we don't need to account for the range while
filling the buckets.
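The bucket selection implied by the output above can be sketched as
follows. bucket_index() is a hypothetical helper, not the code in the
patch, and it assumes the lower bound of each range is inclusive.
Index 0 collects everything below the minimum and the last index
collects everything at or above the maximum.

```c
/*
 * Hypothetical sketch: map a latency to a bucket index given the
 * minimum, maximum and per-bucket range, all in the same unit.
 */
static unsigned int bucket_index(unsigned long delta,
				 unsigned long min_latency,
				 unsigned long max_latency,
				 unsigned long bucket_range)
{
	/* regular buckets, rounded up so a shorter last one is kept */
	unsigned long nr = (max_latency - min_latency + bucket_range - 1) /
			   bucket_range;

	if (delta < min_latency)
		return 0;			/* lower outlier bucket */
	if (delta >= max_latency)
		return nr + 1;			/* upper outlier bucket */
	return 1 + (delta - min_latency) / bucket_range;
}
```

With min=10, max=100 and range=20 this gives five regular buckets, so
a 95us sample lands in index 5 ("90 - 100 us") and a 100us sample in
index 6 ("100 - ... us").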
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Things below and over will be in the first and last, outlier, buckets.
Without it:
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 200 ns | 0 | |
200 - 400 ns | 44 | |
400 - 600 ns | 291 | # |
600 - 800 ns | 506 | ## |
800 - 1000 ns | 148 | |
1.00 - 1.20 us | 581 | ## |
1.20 - 1.40 us | 2199 | ########## |
1.40 - 1.60 us | 1048 | #### |
1.60 - 1.80 us | 1448 | ###### |
1.80 - 2.00 us | 1091 | ##### |
2.00 - 2.20 us | 517 | ## |
2.20 - 2.40 us | 318 | # |
2.40 - 2.60 us | 370 | # |
2.60 - 2.80 us | 271 | # |
2.80 - 3.00 us | 150 | |
3.00 - 3.20 us | 85 | |
3.20 - 3.40 us | 48 | |
3.40 - 3.60 us | 40 | |
3.60 - 3.80 us | 22 | |
3.80 - 4.00 us | 13 | |
4.00 - 4.20 us | 14 | |
4.20 - ... us | 626 | ## |
#
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=20 --min-latency=1200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 1200 ns | 1243 | ##### |
1.20 - 1.22 us | 141 | |
1.22 - 1.24 us | 202 | |
1.24 - 1.26 us | 209 | |
1.26 - 1.28 us | 219 | |
1.28 - 1.30 us | 208 | |
1.30 - 1.32 us | 245 | # |
1.32 - 1.34 us | 246 | # |
1.34 - 1.36 us | 224 | # |
1.36 - 1.38 us | 219 | |
1.38 - 1.40 us | 206 | |
1.40 - 1.42 us | 190 | |
1.42 - 1.44 us | 190 | |
1.44 - 1.46 us | 146 | |
1.46 - 1.48 us | 140 | |
1.48 - 1.50 us | 125 | |
1.50 - 1.52 us | 115 | |
1.52 - 1.54 us | 102 | |
1.54 - 1.56 us | 87 | |
1.56 - 1.58 us | 90 | |
1.58 - 1.60 us | 85 | |
1.60 - ... us | 5487 | ######################## |
#
The second run above focuses on the latencies starting at 1.2us, with a
finer grained range of 20ns.
This is all on a live system, so statistically interesting, but not
narrowing down on the same numbers, so a 'perf ftrace latency record'
seems interesting to then use all on the same snapshot of latencies.
A --max-latency counterpart should come next, at first limiting the
max-latency to 20 * bucket-size, as we have a fixed buckets array with
20 + 2 entries (+ 2 for the outliers) and thus would need to make it
larger for higher latencies.
We also may need a way to ask for not considering the out of range
values (first and last buckets) when drawing the buckets bars.
Co-developed-by: Gabriele Monaco <gmonaco@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ftrace->use_nsec arg is being passed to both make_histogram() and
display_histogram(). Since another ftrace field will be passed to those
functions in a followup patch, make them look like other functions in
this codebase that receive the 'struct perf_ftrace' pointer.
No change in logic.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241112181214.1171244-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The temp directory is made and a known fake hwmon PMU created within
it. Prior to this fix the events were being incorrectly written to the
temp directory rather than the fake PMU directory. This didn't impact
the test as the directory fd matched the wrong location, but it
doesn't mirror what a hwmon PMU would actually look like.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The hwmon PMU test will make a temp directory, open the directory with
O_DIRECTORY, then fill it with contents. As the open happens before the
contents are filled in, the later fdopendir may reflect the initial
empty state, meaning no events are seen. Change to re-open the
directory, rather than dup the fd, so the latest contents are seen.
Minor tweaks/additions to debug messages.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The comparison function cmp_profile_data() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
When v1 and v2 are equal, the function incorrectly returns 1, breaking
symmetry and transitivity. This causes undefined behavior, which can
lead to memory corruption in certain versions of glibc [1].
Fix the issue by returning 0 when v1 and v2 are equal, ensuring
compliance with the C standard and preventing undefined behavior.
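A minimal sketch of a well-behaved comparator follows. struct profile
and its single field are hypothetical simplifications, not the real
profile data; the point is only the explicit equal case.

```c
#include <stdlib.h>

struct profile {
	unsigned long long time;	/* hypothetical sort key */
};

/* Descending order; equal keys must compare as 0 for qsort(). */
static int cmp_time(const void *a, const void *b)
{
	const struct profile *p1 = a;
	const struct profile *p2 = b;

	if (p1->time == p2->time)
		return 0;
	return p1->time > p2->time ? -1 : 1;
}

static void sort_profiles(struct profile *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_time);
}
```

Returning 1 for equal keys would make both cmp(x, y) and cmp(y, x)
claim "greater", which is the asymmetry described above.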
Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 0f223813ed ("perf ftrace: Add 'profile' command")
Fixes: 74ae366c37 ("perf ftrace profile: Add -s/--sort option")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: jserv@ccns.ncku.edu.tw
Cc: chuang@cs.nycu.edu.tw
Link: https://lore.kernel.org/r/20241209134226.1939163-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The temp directory is made and a known fake hwmon PMU created within
it. Prior to this fix the events were being incorrectly written to the
temp directory rather than the fake PMU directory. This didn't impact
the test as the directory fd matched the wrong location, but it
doesn't mirror what a hwmon PMU would actually look like.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241206042306.1055913-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The hwmon PMU test will make a temp directory, open the directory with
O_DIRECTORY, then fill it with contents. As the open happens before the
contents are filled in, the later fdopendir may reflect the initial
empty state, meaning no events are seen. Change to re-open the
directory, rather than dup the fd, so the latest contents are seen.
Minor tweaks/additions to debug messages.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241206042306.1055913-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tracepoint parsing required libtraceevent but no longer does. Remove
the Build logic and #ifdefs that caused the tests not to be run. Test
code that directly uses libtraceevent is still guarded.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Switch from reading the tracepoint format to reading the id directly for
the evsel config. This avoids the need to initialize libtraceevent,
plugins, etc. It is sufficient for many tracepoint commands to work
like:
$ perf stat -e sched:sched_switch true
To populate evsel->tp_format, do lazy initialization using libtraceevent
in the evsel__tp_format function (the sys and name are saved in
evsel__newtp_idx for this purpose).
A failure to read the id should be indicative of the format failing to
load, but if not, an error is reported in evsel__tp_format. This could
happen for a tracepoint with a format that fails to parse.
As tracepoints can be parsed without libtraceevent with this, remove the
associated #ifdefs in parse-events.c.
By only lazily parsing the tracepoint format information it is hoped
this will help improve the performance of code using tracepoints but not
the format information. It also cuts down on the build and ifdef logic.
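The lazy initialization can be pictured with the sketch below. All
names are hypothetical stand-ins (struct event for the evsel,
parse_format() for the libtraceevent-backed parser), and the nr_parses
counter exists only to make the laziness observable.

```c
#include <stdbool.h>
#include <stddef.h>

struct format {
	int nr_fields;			/* stand-in for the parsed format */
};

struct event {
	const char *sys;		/* saved at creation for the lazy load */
	const char *name;
	int id;				/* read eagerly, enough for counting */
	struct format *format;		/* filled in on first use */
	bool format_tried;
};

static int nr_parses;

/* Stand-in for the expensive parse; fails when sys or name is missing. */
static struct format *parse_format(const char *sys, const char *name)
{
	static struct format dummy = { .nr_fields = 3 };

	nr_parses++;
	return (sys && name) ? &dummy : NULL;
}

/* Accessor: parse at most once, and only when a caller asks. */
static struct format *event__format(struct event *ev)
{
	if (!ev->format_tried) {
		ev->format = parse_format(ev->sys, ev->name);
		ev->format_tried = true;
	}
	return ev->format;
}
```

Callers that only need the id never pay for the parse, and callers of
the accessor have to handle a NULL result.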
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an accessor function for tp_format. Rather than search and replace
every use, try to store the result in a variable and reuse it. Add
additional NULL checks when accessing/using the value. Make sure the
PTR_ERR is nulled out on the error path in evsel__newtp_idx.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
trace-event-info.c has no libtraceevent dependencies, always build it
and use it in builtin-record and perf_event_attr printing.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Capture that these functions don't mutate their input.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
get_core_id returns 0 on success and a negative errno value on error.
Currently the error can only be -1, but fixing this to be any errno
value breaks perf:
https://lore.kernel.org/lkml/Zzu4Sdebve-NXEMX@google.com/
To avoid this, make sure all error values are written as -1.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Falcon <thomas.falcon@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Zixian Cai <fzczx123@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241118225345.889810-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By introducing a tools/perf/util/btf.c to collect utilities not yet
available via libbpf, the first being a way to find a member by name
once we get the type_id for the struct.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove use of a FILE and switch to reading a string that is then
passed to perf_cpu_map__new().
Being able to remove perf_cpu_map__read() avoids duplicated parsing logic.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241206044035.1062032-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
libperf exposes MAX_NR_CPUS via tools/lib/perf/include/internal/cpumap.h
which is internal.
The preferred dependency should be the definition in tools/perf/perf.h.
Add the includes of perf.h so that MAX_NR_CPUS can be hidden in libperf.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241206044035.1062032-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
cpus_runtime will increase from 81960B to 163880B, and max_entries will
increase from 8192B to 16384B.
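The bitmap figures follow from one bit per CPU. As a quick check,
assuming the bitmap is stored in whole 64-bit words (bitmap_bytes() is
a hypothetical helper for this arithmetic only):

```c
#include <stddef.h>

/* Bytes needed for a bitmap of nbits bits, stored as 64-bit words. */
static size_t bitmap_bytes(size_t nbits)
{
	return ((nbits + 63) / 64) * 8;
}
```

bitmap_bytes(2048) is 256 and bitmap_bytes(4096) is 512, matching the
256B to 512B growth quoted above.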
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241206044035.1062032-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Decode SPE Data Source packets on AmpereOne. The field is IMPDEF.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Graham Woodward <graham.woodward@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Split Data Source Packet handling to prepare for adding support for
other implementations.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Graham Woodward <graham.woodward@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For the CPU map merging test, add an extra check for the reference
counter before releasing the last CPU map.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241107125308.41226-4-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add additional tests for CPU map merging to cover more cases.
These tests include different types of arguments, such as when one CPU
map is a subset of another, as well as cases with or without overlap
between the two maps.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241107125308.41226-3-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf_cpu_map__merge() function has two arguments, 'orig' and
'other'. The function definition might cause confusion as it could give
the impression that the CPU maps in the two arguments are copied into a
newly allocated structure, which is then returned as the result.
The purpose of the function is to merge the CPU map 'other' into the CPU
map 'orig'. This commit changes the 'orig' argument to a pointer to
pointer, so the new result will be updated into 'orig'.
The return value is changed to an int type, as an error number or 0 for
success.
Update callers and tests for the new function definition.
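The new calling convention can be modelled outside C. This is only a sketch of the semantics described above, not the libperf API: CpuMapRef stands in for the pointer-to-pointer argument, cpu_map_merge is a hypothetical name, and returning a negative errno on error is an assumption.

```python
import errno
from typing import List, Optional


class CpuMapRef:
    """Stand-in for a pointer to pointer: holds a reference to a CPU list
    that the callee is allowed to replace."""

    def __init__(self, cpus: Optional[List[int]] = None) -> None:
        self.cpus = cpus


def cpu_map_merge(orig: Optional[CpuMapRef], other: Optional[List[int]]) -> int:
    """Merge 'other' into 'orig' in place; return 0 on success or an
    error number (assumed here to be a negative errno)."""
    if orig is None:
        return -errno.EINVAL
    if not other:
        return 0  # nothing to merge, 'orig' is left unchanged
    # The result is stored back through 'orig' rather than returned.
    orig.cpus = sorted(set(orig.cpus or []) | set(other))
    return 0


ref = CpuMapRef([0, 1])
ret = cpu_map_merge(ref, [1, 2])  # overlapping maps
print(ret, ref.cpus)
```

The caller checks the integer result and keeps using the same reference, so there is no question about who owns a newly returned map.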
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241107125308.41226-2-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just a trivial typo: it should be 'can'. I did a spell check on the rest
of the file just in case; nothing more stood out.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously system RAM and persistent memory were matched against
hard-coded labels. Change this so that the label of the memory region is
read directly from /proc/iomem. This avoids frequent N/A samples.
Change the /proc/iomem reading, event processing and output so that
nested entries appear and their counts count toward their parent. As
labels may be repeated, include the memory ranges in the output to make
it clear why, for example, "System RAM" appears twice.
Before:
Event: mem_inst_retired.all_loads:P
Memory type count percentage
---------------------------------------- ---------- ----------
System RAM 9460 90.5%
N/A 998 9.5%
After:
Event: mem_inst_retired.all_loads:P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-105f7fffff : System RAM 36741 96.5
841400000-8416599ff : Kernel data 89 0.2
840800000-8412a6fff : Kernel rodata 60 0.2
841ebe000-8423fffff : Kernel bss 34 0.1
0-fff : Reserved 1345 3.5
100000-89dd9fff : System RAM 2 0.0
Before:
Event: mem_inst_retired.any:P
Memory type count percentage
---------------------------------------- ----------- -----------
System RAM 9460 90.5%
N/A 998 9.5%
After:
Event: mem_inst_retired.any:P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-105f7fffff : System RAM 9460 90.5
841400000-8416599ff : Kernel data 45 0.4
840800000-8412a6fff : Kernel rodata 19 0.2
841ebe000-8423fffff : Kernel bss 12 0.1
0-fff : Reserved 998 9.5
The code has been updated to Python 3 with type hints, resolving
issues reported by mypy and pylint. Tabs are swapped to spaces as
preferred in PEP8, because most lines of code were modified (of this
small file) and this makes pylint significantly less noisy.
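The nested accounting described above (a sample counts toward the innermost matching /proc/iomem entry and toward each of its parents) can be sketched as follows. This is not the code from mem-phys-addr.py: IomemEntry, parse_iomem and account are hypothetical names, and the sample text is a shortened stand-in for real /proc/iomem content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IomemEntry:
    begin: int
    end: int
    indent: int
    label: str
    parent: Optional["IomemEntry"] = None
    count: int = 0


def parse_iomem(text: str) -> List[IomemEntry]:
    """Parse 'begin-end : label' lines; indentation marks nesting."""
    entries: List[IomemEntry] = []
    stack: List[IomemEntry] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        rng, label = line.strip().split(" : ", 1)
        begin, end = (int(part, 16) for part in rng.split("-"))
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1].indent >= indent:
            stack.pop()
        entry = IomemEntry(begin, end, indent, label, stack[-1] if stack else None)
        entries.append(entry)
        stack.append(entry)
    return entries


def account(entries: List[IomemEntry], addr: int) -> bool:
    """Count addr against the innermost matching entry and its parents."""
    best = None
    for entry in entries:
        if entry.begin <= addr <= entry.end:
            if best is None or entry.indent > best.indent:
                best = entry
    if best is None:
        return False  # no region matched this address
    while best is not None:
        best.count += 1
        best = best.parent
    return True


SAMPLE = (
    "00000000-00000fff : Reserved\n"
    "100000000-105f7fffff : System RAM\n"
    "  841400000-8416599ff : Kernel data\n"
)
entries = parse_iomem(SAMPLE)
for addr in (0x10, 0x100000000, 0x841400010):
    account(entries, addr)
for entry in entries:
    print(f"{entry.begin:x}-{entry.end:x} : {entry.label} {entry.count}")
```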
Committer testing:
root@number:/tmp# grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:/tmp#
root@number:/tmp# perf script mem-phys-addr -a find /
/bin
/lib
/lib64
/sbin
Warning:
744 out of order events recorded.
Event: cpu_core/mem_inst_retired.all_loads/P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-8bfbfffff : System RAM 364561 76.5
621400000-6223a6fff : Kernel rodata 10474 2.2
622400000-62283d4bf : Kernel data 4828 1.0
623304000-6237fffff : Kernel bss 1063 0.2
620000000-6213fffff : Kernel code 98 0.0
0-fff : Reserved 111480 23.4
100000-2b0ca017 : System RAM 337 0.1
2fbad000-30d92fff : System RAM 44 0.0
2c79d000-2fbabfff : System RAM 30 0.0
30d94000-316d5fff : System RAM 16 0.0
2b131a58-2c71dfff : System RAM 7 0.0
root@number:/tmp#
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119180130.19160-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 13e17c9ff4 ("perf build: Make libunwind opt-in rather than
opt-out") libunwind is opt-in, so we shouldn't by default be testing for
its availability at build time in tools/build/features/test-all.c.
That test was designed to test the features we expect to be the most
common ones in most builds, so if we test build just that file, then we
assume the features there are present and will not test one by one.
Removing it from test-all.c gets rid of the first impediment for
test-all.c to build successfully:
$ cat /tmp/build/perf-tools-next/feature/test-all.make.output
In file included from test-all.c:62:
test-libunwind.c:2:10: fatal error: libunwind.h: No such file or directory
2 | #include <libunwind.h>
| ^~~~~~~~~~~~~
compilation terminated.
$
We then get to:
$ cat /tmp/build/perf-tools-next/feature/test-all.make.output
/usr/bin/ld: cannot find -lunwind-x86_64: No such file or directory
/usr/bin/ld: cannot find -lunwind: No such file or directory
collect2: error: ld returned 1 exit status
$
So make all the logic related to setting CFLAGS, LDFLAGS, etc. for
libunwind conditional on NO_LIBUNWIND=1, which is now the
default. Now we get a faster build:
$ cat /tmp/build/perf-tools-next/feature/test-all.make.output
$ ldd /tmp/build/perf-tools-next/feature/test-all.bin
linux-vdso.so.1 (0x00007fef04cde000)
libdw.so.1 => /lib64/libdw.so.1 (0x00007fef04a49000)
libpython3.12.so.1.0 => /lib64/libpython3.12.so.1.0 (0x00007fef04478000)
libm.so.6 => /lib64/libm.so.6 (0x00007fef04394000)
libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007fef0436c000)
libtracefs.so.1 => /lib64/libtracefs.so.1 (0x00007fef04345000)
libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007fef03e95000)
libz.so.1 => /lib64/libz.so.1 (0x00007fef03e72000)
libelf.so.1 => /lib64/libelf.so.1 (0x00007fef03e56000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fef03e48000)
libslang.so.2 => /lib64/libslang.so.2 (0x00007fef03b65000)
libperl.so.5.38 => /lib64/libperl.so.5.38 (0x00007fef037c6000)
libc.so.6 => /lib64/libc.so.6 (0x00007fef035d5000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fef035a0000)
libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fef034e1000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fef034cd000)
/lib64/ld-linux-x86-64.so.2 (0x00007fef04ce0000)
libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007fef03495000)
$
Fixes: 13e17c9ff4 ("perf build: Make libunwind opt-in rather than opt-out")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Z09zTztD8X8qIWCX@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kernel sometimes returns an error other than EOPNOTSUPP for an
invalid precise_ip, so the fallback cannot rely on the error code. Let's
move the fallback after the missing feature checks so that it can handle
EINVAL as well. This also aligns well with the existing behavior, which
blindly turns off precise_ip, but now we check the missing features
correctly.
Fixes: af954f76ee ("perf tools: Check fallback error and order")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Closes: https://lore.kernel.org/oe-lkp/202411301431.799e5531-lkp@intel.com
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/Z1DV0lN8qHSysX7f@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The fs_at_flags array should only have generic flags, but the recent
header sync brought new flags to fcntl.h and caused a build error. Let's
update the shell script to exclude flags specific to name_to_handle_at().
CC trace/beauty/fs_at_flags.o
In file included from trace/beauty/fs_at_flags.c:21:
tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: error: initialized field overwritten [-Werror=override-init]
13 | [ilog2(0x002) + 1] = "HANDLE_CONNECTABLE",
| ^~~~~~~~~~~~~~~~~~~~
tools/perf/trace/beauty/generated/fs_at_flags_array.c:13:30: note: (near initialization for ‘fs_at_flags[2]’)
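The error above arises because the generated array is indexed by ilog2(value) + 1, so two flags with the same value land in the same slot. A small sketch of that mechanism; HANDLE_CONNECTABLE and its value 0x002 come from the compiler output, while the other flag names are hypothetical placeholders, not real entries of the array:

```python
from typing import Dict, List, Tuple


def ilog2(value: int) -> int:
    """Integer log2 of a positive power-of-two flag value."""
    return value.bit_length() - 1


def find_collisions(flags: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Group flag names by array slot (ilog2(value) + 1) and return
    only the slots that are used more than once."""
    slots: Dict[int, List[str]] = {}
    for name, value in flags:
        slots.setdefault(ilog2(value) + 1, []).append(name)
    return {slot: names for slot, names in slots.items() if len(names) > 1}


flags = [
    ("GENERIC_FLAG", 0x002),        # hypothetical generic flag
    ("HANDLE_CONNECTABLE", 0x002),  # name and value from the error above
    ("OTHER_GENERIC_FLAG", 0x100),  # hypothetical, no collision
]
collisions = find_collisions(flags)
print(collisions)
```

Excluding the syscall-specific flags before generating the array removes the second initializer for the slot, which is what the script change achieves.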
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241203035349.1901262-12-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up the changes in this cset:
09d6775f50 riscv: Add support for userspace pointer masking
91e102e797 prctl: arch-agnostic prctl for shadow stack
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241203035349.1901262-11-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up the changes in this cset:
aefff51e1c statmount: retrieve security mount options
2f4d4503e9 statmount: add flag to retrieve unescaped options
44010543fc fs: add the ability for statmount() to report the sb_source
ed9d95f691 fs: add the ability for statmount() to report the fs_subtype
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20241203035349.1901262-10-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up the changes in this cset:
c374196b2b ("fs: name_to_handle_at() support for "explicit connectable" file handles")
95f567f81e ("fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag")
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20241203035349.1901262-9-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up the changes in this cset:
6140be90ec ("fs/xattr: add *at family syscalls")
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_32.tbl
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
The arm64 changes are not included as they require more changes in the
tools. They will be worked on in a later cycle.
Please see tools/include/uapi/README for further details.
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
CC: x86@kernel.org
CC: linux-mips@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-s390@vger.kernel.org
Link: https://lore.kernel.org/r/20241203035349.1901262-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This signal handler loops over all tests on Ctrl-C, but it's active
while the test list is being constructed. process.pid is 0, then -1,
then finally set to the child pid on fork. If Ctrl-C is received
during this window, a kill(-1, SIGINT) can be sent, which affects all
processes.
Make sure the child has forked first before forwarding the signal. This
can be reproduced with ctrl-C immediately after launching perf test
which terminates the ssh connection.
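The guard amounts to forwarding the signal only to pids that belong to a forked child. A minimal sketch of the idea, not the perf test code: forward_sigint is a hypothetical name, and the kill function is passed in so the example can run without signalling real processes (os.kill would be used in practice).

```python
import signal
from typing import Callable, Iterable, List, Tuple


def forward_sigint(pids: Iterable[int], kill: Callable[[int, int], None]) -> List[int]:
    """Send SIGINT to each child that has really been forked.

    A pid of 0 (not started) or -1 (about to fork) is skipped, because
    kill(0, ...) targets the caller's process group and kill(-1, ...)
    targets every process the caller is allowed to signal.
    """
    signalled = []
    for pid in pids:
        if pid <= 0:
            continue
        kill(pid, signal.SIGINT)
        signalled.append(pid)
    return signalled


# Record the calls instead of signalling anything.
sent: List[Tuple[int, int]] = []
forward_sigint([0, -1, 4242], lambda pid, sig: sent.append((pid, sig)))
print(sent)
```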
Fixes: 553d5efeb3 ("perf test: Add a signal handler to kill forked child processes")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241129151948.3199732-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The build-id events written at the end of the record session are broken
due to unexpected data. The write_buildid() writes the fixed length
event first and then variable length filename.
But a recent change made it write more data in the padding area
accidentally. So readers of the event see zero-filled data for the
next entry and treat it incorrectly. This resulted in wrong kernel
symbols because the kernel DSO loaded a random vmlinux image in the
path as it didn't have a valid build-id.
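To see why stray bytes in the padding area break readers, consider a simplified model of the record layout. The sketch below is an assumption-laden illustration, not the perf.data format: the header is reduced to type, misc and size fields, NAME_ALIGN is an assumed alignment, and pack_record and unpack_records are hypothetical helpers.

```python
import struct
from typing import Iterator, Tuple

NAME_ALIGN = 64  # assumption: alignment of the filename area
HEADER = struct.Struct("<IHH")  # simplified fixed part: type, misc, size


def align_up(n: int, align: int) -> int:
    return (n + align - 1) // align * align


def pack_record(rec_type: int, filename: bytes) -> bytes:
    """Fixed-length header followed by a NUL-terminated, zero-padded name."""
    name_len = align_up(len(filename) + 1, NAME_ALIGN)
    size = HEADER.size + name_len
    # Write exactly name_len bytes after the header: the name, then zeros.
    return HEADER.pack(rec_type, 0, size) + filename.ljust(name_len, b"\0")


def unpack_records(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Walk the records using the size stored in each header."""
    offset = 0
    while offset < len(buf):
        rec_type, _misc, size = HEADER.unpack_from(buf, offset)
        if size < HEADER.size:
            # Extra zero bytes after a record look like an empty header.
            raise ValueError(f"corrupt record at offset {offset}")
        name = buf[offset + HEADER.size:offset + size].rstrip(b"\0")
        yield rec_type, name
        offset += size


buf = pack_record(1, b"/usr/lib/a") + pack_record(1, b"vmlinux")
print(list(unpack_records(buf)))
```

If the writer emits more bytes than the size field announces, the reader's next header lands on zero-filled data, which this sketch reports as a corrupt record.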
Fixes: ae39ba1655 ("perf inject: Fix build ID injection")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/Z0aRFFW9xMh3mqKB@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf record
-----------
* Enable leader sampling for inherited task events. It was supported
only for system-wide events, but the kernel has supported such a
setup since v6.12.
This is to reduce the number of PMU interrupts. The samples of the
leader event will contain counts of other events and no samples will
be generated for the other member events.
$ perf record -e '{cycles,instructions}:S' ${MYPROG}
perf report
-----------
* Fix --branch-history option to display more branch-related information
like prediction, abort and cycles, which are available on Intel machines.
$ perf record -bg -- perf test -w brstack
$ perf report --branch-history
...
#
# Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage]
# ........ ........................ .............. .................... ......... ..... ...... ....................
#
8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - -
|
---xas_load xarray.h:171
|
|--5.68%--xas_load xarray.c:245 (cycles:1)
| xas_load xarray.c:242
| xas_load xarray.h:1260 (cycles:1)
| xas_descend xarray.c:146
| xas_load xarray.c:244 (cycles:2)
| xas_load xarray.c:245
| xas_descend xarray.c:218 (cycles:10)
...
perf stat
---------
* Add HWMON PMU support. The HWMON provides various system information
like CPU/GPU temperature, fan speed and so on. Expose them as PMU
events so that users can see the values using perf stat commands.
$ perf stat -e temp_cpu,fan1 true
Performance counter stats for 'true':
60.00 'C temp_cpu
0 rpm fan1
0.000745382 seconds time elapsed
0.000883000 seconds user
0.000000000 seconds sys
* Display metric threshold in JSON output. Some metrics define
thresholds to classify value ranges. The classification used to be
shown with a different color, which does not work for JSON. Add a
"metric-threshold" field to the JSON that can be one of "good",
"less good", "nearly bad" and "bad".
# perf stat -a -M TopdownL1 -j true
{"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}
{"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"}
{"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
...
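A consumer of this output can filter on the new field. A minimal sketch, assuming one JSON object per line as in the sample above; metrics_with_threshold is a hypothetical helper, and lines that are not strict JSON (such as "..." or objects with a trailing comma) are skipped rather than repaired:

```python
import json
from typing import Dict, Iterable, List


def metrics_with_threshold(lines: Iterable[str], wanted: str) -> List[Dict[str, str]]:
    """Return the parsed records whose "metric-threshold" equals wanted."""
    matches = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not strict JSON, ignore the line
        if isinstance(record, dict) and record.get("metric-threshold") == wanted:
            matches.append(record)
    return matches


sample = [
    '{"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}',
    '{"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}',
    "...",
]
bad = metrics_with_threshold(sample, "bad")
print([record["metric-unit"] for record in bad])
```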
perf sched
----------
* Add -P/--pre-migrations option for 'timehist' sub-command to track
time a task waited on a run-queue before migrating to a different CPU.
$ perf sched timehist -P
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000
585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000
585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000
585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000
585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000
585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000
585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000
585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000
585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001
585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000
...
Build
-----
* Make libunwind opt-in (LIBUNWIND=1) rather than opt-out. The perf
tools are generally built with libelf and libdw which has unwinder
functionality. The libunwind support predates it and there is no need
to have duplicate unwinders by default.
* Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify it's
using libdw for handling DWARF information.
Internals
---------
* Do not set exclude_guest bit in the perf_event_attr by default. This
was causing trouble with the AMD IBS PMU as it doesn't support the bit.
The bit will be set when it's needed later by the fallback logic.
Also update the missing feature detection logic to make sure it does
not clear supported bits unnecessarily.
* Run perf test in parallel by default and mark flaky tests "exclusive"
to run them serially at the end. Some test numbers are changed but
the test can complete in less than half the time.
JSON vendor events
------------------
* Add AMD Zen 5 events and metrics.
* Add i.MX91 and i.MX95 DDR metrics.
* Fix HiSilicon HIP08 Topdown metric name.
* Support compat events on PowerPC.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Enable leader sampling for inherited task events. It was supported
only for system-wide events, but the kernel has supported such
a setup since v6.12.
This is to reduce the number of PMU interrupts. The samples of the
leader event will contain counts of other events and no samples
will be generated for the other member events.
$ perf record -e '{cycles,instructions}:S' ${MYPROG}
perf report:
- Fix --branch-history option to display more branch-related
information like prediction, abort and cycles, which are available
on Intel machines.
$ perf record -bg -- perf test -w brstack
$ perf report --branch-history
...
#
# Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage]
# ........ ........................ .............. .................... ......... ..... ...... ....................
#
8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - -
|
---xas_load xarray.h:171
|
|--5.68%--xas_load xarray.c:245 (cycles:1)
| xas_load xarray.c:242
| xas_load xarray.h:1260 (cycles:1)
| xas_descend xarray.c:146
| xas_load xarray.c:244 (cycles:2)
| xas_load xarray.c:245
| xas_descend xarray.c:218 (cycles:10)
...
perf stat:
- Add HWMON PMU support.
The HWMON provides various system information like CPU/GPU
temperature, fan speed and so on. Expose them as PMU events so that
users can see the values using perf stat commands.
$ perf stat -e temp_cpu,fan1 true
Performance counter stats for 'true':
60.00 'C temp_cpu
0 rpm fan1
0.000745382 seconds time elapsed
0.000883000 seconds user
0.000000000 seconds sys
- Display metric threshold in JSON output.
Some metrics define thresholds to classify value ranges. The
classification used to be shown with a different color, which does
not work for JSON.
Add "metric-threshold" field to the JSON that can be one of "good",
"less good", "nearly bad" and "bad".
# perf stat -a -M TopdownL1 -j true
{"counter-value" : "18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" : 5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}
{"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"}
{"counter-value" : "3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
{"counter-value" : "1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" : 5552708, "pcnt-running" : 100.00, }
...
perf sched:
- Add -P/--pre-migrations option for 'timehist' sub-command to track
time a task waited on a run-queue before migrating to a different
CPU.
$ perf sched timehist -P
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000
585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000
585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000
585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000
585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000
585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000
585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000
585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000
585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001
585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000
...
Build:
- Make libunwind opt-in (LIBUNWIND=1) rather than opt-out.
The perf tools are generally built with libelf and libdw which has
unwinder functionality. The libunwind support predates it and there
is no need to have duplicate unwinders by default.
- Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify
it's using libdw for handling DWARF information.
Internals:
- Do not set exclude_guest bit in the perf_event_attr by default.
This was causing trouble with the AMD IBS PMU as it doesn't support
the bit. The bit will be set when it's needed later by the fallback
logic. Also update the missing feature detection logic to make sure
it does not clear supported bits unnecessarily.
- Run perf test in parallel by default and mark flaky tests
"exclusive" to run them serially at the end. Some test numbers are
changed but the test can complete in less than half the time.
JSON vendor events:
- Add AMD Zen 5 events and metrics.
- Add i.MX91 and i.MX95 DDR metrics.
- Fix HiSilicon HIP08 Topdown metric name.
- Support compat events on PowerPC"
* tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits)
perf tests: Fix hwmon parsing with PMU name test
perf hwmon_pmu: Ensure hwmon key union is zeroed before use
perf tests hwmon_pmu: Remove double evlist__delete()
perf/test: fix perf ftrace test on s390
perf bpf-filter: Return -ENOMEM directly when pfi allocation fails
perf test: Correct hwmon test PMU detection
perf: Remove unused del_perf_probe_events()
perf pmu: Move pmu_metrics_table__find and remove ARM override
perf jevents: Add map_for_cpu()
perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str
perf header: Avoid transitive PMU includes
perf arm64 header: Use cpu argument in get_cpuid
perf header: Refactor get_cpuid to take a CPU for ARM
perf header: Move is_cpu_online to numa bench
perf jevents: fix breakage when do perf stat on system metric
perf test: Add missing __exit calls in tool/hwmon tests
perf tests: Make leader sampling test work without branch event
perf util: Remove kernel version deadcode
perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved
perf test shell trace_exit_race: Show what went wrong in verbose mode
...
The hwmon "with PMU name" test incorrectly didn't pass "true". Fix this
and address an issue with hwmon_pmu__config_terms needing to load
events, where a load-bearing assert fired. Also fix a missing list
deletion when putting the hwmon test PMU, and lower some debug warnings
to make the hwmon PMU less spammy in verbose mode.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241121000955.536930-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In the error path when failing to parse events the evlist is being
deleted twice, keep the one after the out label.
Fixes: 531ee0fd48 ("perf test: Add hwmon "PMU" test")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/ZzzoJNNcJJVnPCCe@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On s390 the perf test case ftrace sometimes fails as follows:
# ./perf test ftrace
79: perf ftrace tests : FAILED!
#
The failure depends on the kernel .config file. Some configurations
always work fine, some do not. The ftrace profile test mostly fails,
because the ring buffer was not large enough, and some lines
(especially the interesting ones with nanosleep in them) were dropped.
To achieve success for all tested kernel configurations, enlarge
the buffer to store the traces completely without wrapping.
The default buffer size is not large enough for every kernel
configuration. Set the buffer size for the ftrace profile test to 16 MB.
Output after:
# ./perf test ftrace
79: perf ftrace tests : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: agordeev@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20241119064856.641446-1-tmricht@linux.ibm.com
Suggested-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The last use of del_perf_probe_events() was removed by commit 3d6dfae889
("perf parse-events: Remove BPF event support").
Remove it.
It was the last user of probe_file__del_events(), so remove that as well.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241022002940.302946-1-linux@treblig.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move pmu_metrics_table__find() to the jevents.py generated pmu-events.c
and remove indirection override for ARM.
The movement removes perf_pmu__find_metrics_table that exists to enable
the ARM override.
The ARM override isn't necessary as just the CPUID, not PMU, is used in
the metric table lookup.
On non-ARM the CPU argument is just ignored for the CPUID. For ARM, -1
is passed so that the CPUID for the first logical CPU is read.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The PMU is no longer part of the map finding process, and passing it
doesn't make sense for metrics as they lack a PMU.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On ARM the cpuid is dependent on the core type of the CPU in
question. The PMU was passed for the sake of the CPU map but this
means in places a temporary PMU is created just to pass a CPU
value. Just pass the CPU and fix up the callers.
As there are no longer PMU users in header.h, shuffle forward
declarations earlier to work around build failures.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently satisfied via header.h. Note, pmu.h includes parse-events.h.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the cpu to read the MIDR file requested. If the "any" value (-1) is
passed, then keep the behavior of returning the first MIDR file that can
be read.
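A minimal sketch of the selection logic described above. The helper names
(midr_path, pick_midr_cpu) are hypothetical and the sysfs path layout is an
assumption based on the arm64 sysfs ABI; this is not the code from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build the sysfs path of the MIDR file for one CPU.
 * The path layout is an assumption, not taken from the patch. */
static int midr_path(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len,
			"/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",
			cpu);
}

/* Hypothetical helper: decide which CPU's MIDR to report.
 * readable[i] is non-zero when the MIDR file of CPU i can be read.
 * cpu == -1 means "any": return the first readable CPU.
 * Returns the chosen CPU index, or -1 if nothing suitable is readable. */
static int pick_midr_cpu(int cpu, const int *readable, int nr_cpus)
{
	if (cpu >= 0)
		return (cpu < nr_cpus && readable[cpu]) ? cpu : -1;

	for (int i = 0; i < nr_cpus; i++) {
		if (readable[i])
			return i;
	}
	return -1;
}
```

With a specific CPU the caller gets that CPU's file or an error; with -1 the
old "first one that works" behavior is preserved.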
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ARM big.LITTLE has no notion of a constant CPUID for both core
types. To reflect this reality, change the get_cpuid function to also
pass in a possibly unused logical cpu.
If the dummy value (-1) is passed in then ARM can, as currently happens,
select the first logical CPU's "CPUID".
The changes to ARM getcpuid happen in a follow up change.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The helper function is only used in the NUMA benchmark as typically
online CPUs are determined through perf_cpu_map__new_online_cpus().
Reduce the scope of the function for now.
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When running perf stat on a sys metric, the perf tool currently outputs nothing:
$ perf stat -a -M imx95_ddr_read.all -I 1000
$
This command runs on an arm64 machine whose SoC has one DDR hardware PMU
in addition to one armv8_cortex_a55 PMU. Their maps look as follows:
const struct pmu_events_map pmu_events_map[] = {
{
	.arch = "arm64",
	.cpuid = "0x00000000410fd050",
	.event_table = {
		.pmus = pmu_events__arm_cortex_a55,
		.num_pmus = ARRAY_SIZE(pmu_events__arm_cortex_a55)
	},
	.metric_table = {
		.pmus = NULL,
		.num_pmus = 0
	}
},
static const struct pmu_sys_events pmu_sys_event_tables[] = {
{
	.event_table = {
		.pmus = pmu_events__freescale_imx95_sys,
		.num_pmus = ARRAY_SIZE(pmu_events__freescale_imx95_sys)
	},
	.metric_table = {
		.pmus = pmu_metrics__freescale_imx95_sys,
		.num_pmus = ARRAY_SIZE(pmu_metrics__freescale_imx95_sys)
	},
	.name = "pmu_events__freescale_imx95_sys",
},
Currently, pmu_metrics_table__find() returns NULL when perf stat is run
only on a sys metric. parse_groups() is then never called to parse the
sys metric_name, and the perf tool exits directly. This should be a
common problem.
To fix the issue, restore the logic from before commit f20c15d13f
("perf pmu-events: Remember the perf_events_map for a PMU") and return an
empty metric table rather than a NULL pointer.
This should be fine since the removed part only checked whether the table
matched the provided metric_name. Without that code, parse_groups()
still checks the validity of metric_name.
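A minimal sketch of why an empty table is safer than NULL here. All type
and function names below are hypothetical simplifications for illustration,
not the generated pmu-events.c code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified metric table: just a list of metric names. */
struct metrics_table_sketch {
	const char *const *names;
	size_t nr;
};

static bool table_has(const struct metrics_table_sketch *t, const char *name)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (!strcmp(t->names[i], name))
			return true;
	}
	return false;
}

/* Always hand back a usable table: an empty one when the CPU has no
 * metrics, so the caller never bails out early on a NULL pointer. */
static const struct metrics_table_sketch *
cpu_metrics_table(const struct metrics_table_sketch *cpu_table)
{
	static const struct metrics_table_sketch empty = { NULL, 0 };

	return cpu_table ? cpu_table : &empty;
}

/* Caller: look in the CPU table first, then in the system table. */
static bool metric_known(const struct metrics_table_sketch *cpu_table,
			 const struct metrics_table_sketch *sys_table,
			 const char *name)
{
	const struct metrics_table_sketch *t = cpu_metrics_table(cpu_table);

	return table_has(t, name) || table_has(sys_table, name);
}
```

Because the CPU lookup never yields NULL, the system table is still
consulted even when the CPU itself defines no metrics.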
Fixes: f20c15d13f ("perf pmu-events: Remember the perf_events_map for a PMU")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Ben Zong-You Xie <ben717@andestech.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241107162035.52206-2-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Address sanitizer flagged the missing parse_events_error__exit when
testing on ARM.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241115201258.509477-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arm a57 only has speculative branch events so this test fails there. The
test doesn't depend on branch instructions so change it to instructions
which is pretty much guaranteed to be everywhere. The
test_branch_counter() test above already tests for the existence of the
branches event and skips if it's not present.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241115161600.228994-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
fetch_kernel_version() has been unused since Ian's 2023 commit
3d6dfae889 ("perf parse-events: Remove BPF event support")
Remove it, and its helpers.
I noticed there are a bunch of kernel-version macros that are also
unused nearby.
Also remove them.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241116155850.113129-1-linux@treblig.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The purpose of this test is to check for races in the exit of 'perf
trace' that cause the last events to be missed. It was failing when the
COMM wasn't resolved, either because we missed some PERF_RECORD_COMM or
somehow raced on getting it from procfs.
Add --no-comm to the 'perf trace' command line so that we get a
consistent, pid only output, which allows the test to achieve its goal.
This is the output from
'perf trace --no-comm -e syscalls:sys_enter_exit_group':
0.000 21953 syscalls:sys_enter_exit_group()
0.000 21955 syscalls:sys_enter_exit_group()
0.000 21957 syscalls:sys_enter_exit_group()
0.000 21959 syscalls:sys_enter_exit_group()
0.000 21961 syscalls:sys_enter_exit_group()
0.000 21963 syscalls:sys_enter_exit_group()
0.000 21965 syscalls:sys_enter_exit_group()
0.000 21967 syscalls:sys_enter_exit_group()
0.000 21969 syscalls:sys_enter_exit_group()
0.000 21971 syscalls:sys_enter_exit_group()
Now it passes:
root@number:~# perf test "trace exit race"
110: perf trace exit race : Ok
root@number:~#
root@number:~# perf test -v "trace exit race"
110: perf trace exit race : Ok
root@number:~#
If we artificially make it run just 9 times instead of the 10 it runs,
i.e. by manually doing:
trace_shutdown_race() {
for _ in $(seq 9); do
that 9 is $iter, 10 in the patch, we get:
root@number:~# vim ~acme/libexec/perf-core/tests/shell/trace_exit_race.sh
root@number:~# perf test -v "trace exit race"
--- start ---
test child forked, pid 24629
Missing output, expected 10 but only got 9
---- end(-1) ----
110: perf trace exit race : FAILED!
root@number:~#
I.e. 9 'perf trace' calls produced the expected output, the inverse grep
didn't show anything, so the patch provided by Howard for the previous
patch kicks in and shows a more informative message.
Tested-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Peterson <benjamin@engflow.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If it fails we need to check what was the reason, what were the lines
that didn't match the expected format, so:
root@number:~# perf test -v "trace exit race"
--- start ---
test child forked, pid 2028724
Lines not matching the expected regexp: ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$':
0.000 :2028750/2028750 syscalls:sys_enter_exit_group()
---- end(-1) ----
110: perf trace exit race : FAILED!
root@number:~#
In this case we're not resolving the process COMM for some reason and
fall back to printing just the pid/tid; this will be fixed in a followup
patch.
Howard Chu spotted a problem with single quotes surrounding a regexp, that
made the test always fail, but since there were some failures when I
tested (COMM not being resolved in some of the results) the end inverse
grep would show some lines and thus I didn't notice the single quote
problem.
He also provided a patch to test if less than the number of expected
matches took place but all of them with the expected output, in which
case the inverse grep wouldn't show anything, confusing the tester.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Peterson <benjamin@engflow.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a test that checks that trace output is not lost to races. This is
accomplished by tracing the exit_group syscall of "true" multiple times and
checking for correct output.
Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241107232128.108981-3-benjamin@engflow.com
[ Addressed two ShellCheck warnings ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
syscall__scnprintf_args may not place anything in the output buffer
(e.g., because the arguments are all zero). If that happened in
trace__fprintf_sys_enter, its fprintf would receive an uninitialized
buffer leading to garbage output.
Fix the problem by passing the (possibly zero) bounds of the argument
buffer to the output fprintf.
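A minimal sketch of the idea, assuming the bound is passed through the
"%.*s" precision. The functions format_args and print_sys_enter are
hypothetical stand-ins, not the perf trace code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for an argument formatter: it may write nothing
 * at all into buf, in which case it returns 0. */
static size_t format_args(char *buf, size_t size, int nr_args)
{
	size_t printed = 0;

	for (int i = 0; i < nr_args && printed < size; i++) {
		int n = snprintf(buf + printed, size - printed, "%sarg%d",
				 i ? ", " : "", i);

		if (n < 0 || (size_t)n >= size - printed)
			break;
		printed += (size_t)n;
	}
	return printed;
}

/* Print "name(args)" using "%.*s" so that only the bytes the formatter
 * actually produced are read from buf, even when that count is zero. */
static int print_sys_enter(FILE *fp, const char *name, int nr_args)
{
	char buf[64];	/* deliberately left uninitialized */
	size_t printed = format_args(buf, sizeof(buf), nr_args);

	return fprintf(fp, "%s(%.*s)\n", name, (int)printed, buf);
}
```

With a plain "%s" the zero-argument case would read whatever happened to
be on the stack; with the precision given, nothing is read from the buffer.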
Fixes: a98392bb1e ("perf trace: Use beautifiers on syscalls:sys_enter_ handlers")
Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241107232128.108981-2-benjamin@engflow.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a perf trace event selector specifies a maximum number of events to output
(i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler,
disables the event selector after the maximum number of events has been
printed.
Furthermore, trace__event_handler checked if the event selector was
disabled before doing any work. This avoided exceeding the maximum
number of events to print if more events were in the buffer before the
selector was disabled.
However, the event selector can be disabled for reasons other than
exceeding the maximum number of events. In particular, when the traced
subprocess exits, the main loop disables all event selectors. This meant
the last events of a traced subprocess might be lost to the printing
handler's short-circuiting logic.
This nondeterministic problem could be seen by running the following many times:
$ perf trace -e syscalls:sys_enter_exit_group true
trace__event_handler should simply check for exceeding the maximum number of
events to print rather than the state of the event selector.
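A minimal sketch of the changed check. The struct and its field names are
hypothetical simplifications and do not mirror the real evsel layout.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-selector state. */
struct evsel_sketch {
	bool disabled;            /* may be set for other reasons, e.g. workload exit */
	unsigned long max_events; /* 0 means "no limit" */
	unsigned long nr_printed;
};

/* Returns true if the event was printed. The decision depends only on the
 * per-event limit, not on whether the selector has been disabled. */
static bool handle_event(struct evsel_sketch *evsel)
{
	if (evsel->max_events && evsel->nr_printed >= evsel->max_events)
		return false;

	evsel->nr_printed++;	/* stands in for actually printing the event */

	/* Once the limit is reached, stop the selector from producing more. */
	if (evsel->max_events && evsel->nr_printed == evsel->max_events)
		evsel->disabled = true;

	return true;
}
```

Events still sitting in the buffer when the selector is disabled at workload
exit are printed, while a selector that hit its limit still drops the excess.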
Fixes: a9c5e6c1e9 ("perf trace: Introduce per-event maximum number of events property")
Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
Tested-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241107232128.108981-1-benjamin@engflow.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In non-C languages, it is possible to have ':' in the function names.
It is possible to escape it with backslashes, but if there are too many
backslashes, it is annoying.
This introduces support for quotation marks (`"` or `'`).
For example, without quotes, we have to pass it as below:
$ perf probe -x cro3 -L "cro3\:\:cmd\:\:servo\:\:run_show"
<run_show@/work/cro3/src/cmd/servo.rs:0>
0 fn run_show(args: &ArgsShow) -> Result<()> {
1 let list = ServoList::discover()?;
2 let s = list.find_by_serial(&args.servo)?;
3 if args.json {
4 println!("{s}");
With quotes, we can write the function name more naturally, as below:
$ perf probe -x cro3 -L \"cro3::cmd::servo::run_show\"
<run_show@/work/cro3/src/cmd/servo.rs:0>
0 fn run_show(args: &ArgsShow) -> Result<()> {
1 let list = ServoList::discover()?;
2 let s = list.find_by_serial(&args.servo)?;
3 if args.json {
4 println!("{s}");
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173099116941.2431889.11609129616090100386.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
strpbrk_esq() and strdup_esq() are new variants of strpbrk() and
strdup() which handle escaped characters and quoted strings.
- strpbrk_esq() searches specified set of characters but ignores the
escaped characters and quoted strings.
e.g. strpbrk_esq("'quote\d' \queue quiz", "qd") returns "quiz".
- strdup_esq() duplicates string but removes backslash and quotes which
is used for quotation. It also keeps the string (including backslash)
in the quoted part.
e.g. strdup_esq("'quote\d' \queue quiz") returns "quote\d queue quiz".
The (single, double) quotes in the quoted part should be escaped by
backslash. In this case, strdup_esq() removes that backslash.
The same quotes must be paired. If you use double quotation, you need
to use the double quotation to close the quoted part.
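A minimal sketch of the semantics described above. The sketch_* functions
are hypothetical re-implementations written from this description, not the
code added by the patch. Returning NULL for an unterminated quote is an
assumption.

```c
#include <stdlib.h>
#include <string.h>

/* s[0] is an opening quote: return a pointer to the matching closing
 * quote, or NULL if there is none. A backslash inside the quoted part
 * protects the following character. */
static const char *quoted_end(const char *s)
{
	char q = *s++;

	for (; *s; s++) {
		if (*s == '\\' && s[1])
			s++;		/* skip the protected character */
		else if (*s == q)
			return s;
	}
	return NULL;
}

/* Like strpbrk(), but escaped characters and quoted parts never match. */
static const char *sketch_strpbrk_esq(const char *s, const char *stopset)
{
	for (; *s; s++) {
		if (*s == '\\' && s[1]) {
			s++;			/* escaped character: skip it */
		} else if (*s == '\'' || *s == '"') {
			const char *end = quoted_end(s);

			if (!end)
				return NULL;	/* unterminated quote (assumption) */
			s = end;		/* skip the whole quoted part */
		} else if (strchr(stopset, *s)) {
			return s;
		}
	}
	return NULL;
}

/* Like strdup(), but drops the quoting characters: quotes themselves,
 * backslashes outside quotes, and backslashes that escape a quote inside
 * a quoted part. Other backslashes inside quotes are kept. */
static char *sketch_strdup_esq(const char *s)
{
	char *out = malloc(strlen(s) + 1);
	char *p = out;
	char quote = 0;		/* currently open quote character, 0 if none */

	if (!out)
		return NULL;

	while (*s) {
		if (*s == '\\' && s[1]) {
			if (quote && s[1] != '\'' && s[1] != '"')
				*p++ = *s;	/* quoted part keeps the backslash */
			*p++ = s[1];
			s += 2;
		} else if (quote ? *s == quote : (*s == '\'' || *s == '"')) {
			quote = quote ? 0 : *s;	/* drop the quote itself */
			s++;
		} else {
			*p++ = *s++;
		}
	}
	*p = '\0';
	return out;
}
```

Feeding the example string from above through both functions gives "quiz"
and "quote\d queue quiz" respectively.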
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173099116045.2431889.15772916605719019533.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In Golang, function names contain a '.', and 'perf probe'
misinterprets such a name as a file name.
To mitigate this, introduce `function@*` so that the user can
explicitly specify that it is a function name.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173099115149.2431889.13682110856853358354.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use strpbrk_esc() and strdup_esc() to ignore escaped characters in the
--lines option. This has already been done for other options, but not
for --lines.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173099114272.2431889.4820591557298941207.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the --lines option, if perf-probe fails to find the specified line, it
warns "Debuginfo analysis failed." but this misleads the user into
thinking the debuginfo is broken.
Change this message to "Specified source line(LINESPEC) is not found." so
that the user can understand the error correctly.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/173099113381.2431889.16263147678401426107.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch makes a minor change that removes a redundant variable
assignment. The assignment before the for loop is duplicated by the
initialization within the loop header.
Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241111095209.276332-1-luoyifan@cmss.chinamobile.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are some typos in fprintf messages.
Fix them via codespell.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108134728.25515-1-algonell@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For a long time, the perf tools annotation code parsed the output
of binutils's objdump (or its reimplementations, like llvm's) and then
augmented it with samples, allowed navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, eliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
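A rough sketch of what "try the configured disassemblers in order" means,
under stated assumptions: try_disassemblers, struct disassembler and the
stub back ends are hypothetical names for illustration, not the functions
used in the perf source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical back end: returns 0 on success, non-zero if it cannot run. */
typedef int (*disasm_fn)(const char *symbol);

struct disassembler {
	const char *name;
	disasm_fn   run;
};

/* Stubs standing in for real back ends; in this sketch only objdump works. */
static int stub_llvm(const char *symbol)     { (void)symbol; return -1; }
static int stub_capstone(const char *symbol) { (void)symbol; return -1; }
static int stub_objdump(const char *symbol)  { (void)symbol; return 0; }

static const struct disassembler known[] = {
	{ "llvm",     stub_llvm },
	{ "capstone", stub_capstone },
	{ "objdump",  stub_objdump },
};

/*
 * Walk a comma-separated preference list such as "objdump,capstone" and
 * return the name of the first back end that succeeds, or NULL if none
 * does. Unknown names are skipped.
 */
static const char *try_disassemblers(const char *order, const char *symbol)
{
	const char *p = order;

	while (*p) {
		size_t len = strcspn(p, ",");
		size_t i;

		for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
			if (strlen(known[i].name) == len &&
			    !strncmp(known[i].name, p, len) &&
			    known[i].run(symbol) == 0)
				return known[i].name;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return NULL;
}
```

With these stubs, try_disassemblers("llvm,capstone,objdump", sym) falls
through to objdump, while a list of just "capstone" yields NULL.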
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various disassemblers should be
introduced, to test these codebases.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reduces the number of ifdefs in the main symbol__disassemble()
method and paves the way for allowing the user to configure the
disassemblers of preference.
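One common way to keep #ifdefs out of a caller, shown as a general sketch
and not as the code from this patch: provide a stub with the same
signature when a library is not built in. HAVE_EXAMPLE_LIB and both
function names are hypothetical.

```c
#include <errno.h>

#ifdef HAVE_EXAMPLE_LIB /* hypothetical feature macro */
int disassemble_with_example_lib(const char *filename);
#else
/* Stub used when the library is not built in: always reports failure. */
static int disassemble_with_example_lib(const char *filename)
{
	(void)filename;
	return -ENOTSUP;
}
#endif

/* The caller needs no #ifdef: it calls the method and checks the result. */
static int disassemble(const char *filename)
{
	int err = disassemble_with_example_lib(filename);

	if (err == 0)
		return 0;
	/* A real caller would try the next disassembly method here. */
	return err;
}
```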
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-3-acme@kernel.org
[ Applied fixes from Masami Hiramatsu and Aditya Bodkhe for when capstone devel files are not available ]
Link: https://lore.kernel.org/r/B78FB6DF-24E9-4A3C-91C9-535765EC0E2A@ibm.com
Link: https://lore.kernel.org/r/173145729034.2747044.453926054000880254.stgit@mhiramat.roam.corp.google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Handle the first disassemble method in perf, the parsing of objdump
output, just like the llvm and capstone ones.
This paves the way to allow the user to specify what disassemblers are
preferred and also, at some point, to allow building without the
objdump method.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
PERF_HAVE_DWARF_REGS was true when an architecture had a dwarf-regs.c
file. There are no more architecture dwarf-regs.c files; selection is
done using constants from the ELF file rather than conditional
compilation. Where PERF_HAVE_DWARF_REGS was the only variable in an
architecture's Makefile, remove that Makefile.
Add missing SPDX for RISC-V Makefile.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-21-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. As this is the
only file in arch/xtensa/util, clean up the Build files. Tidy up the
EM_NONE cases for xtensa in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-19-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. As this is the
only file in arch/sparc/util, clean up the Build files. Tidy up the
EM_NONE cases for sparc in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-18-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. As this is the
only file in arch/sh/util, clean up the Build files. Tidy up the
EM_NONE cases for sh in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-17-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. Tidy up the
EM_NONE cases for s390 in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-16-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, and
the register table is provided in a header file, the function can never
be called, so remove it as dead code. Tidy up the EM_NONE cases for
riscv in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-15-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Move arch/powerpc/util/dwarf-regs.c to util/dwarf-regs-powerpc.c and
compile in unconditionally. get_arch_regstr is redundant when EM_NONE
is treated as EM_HOST so remove and update dwarf-regs.c conditions.
Make get_powerpc_regs unconditionally available when libdw is.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-14-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. Tidy up the
EM_NONE cases for mips in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-13-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. Tidy up the
EM_NONE cases for loongarch in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Move arch/csky/util/dwarf-regs.c to util/dwarf-regs-csky.c and compile
in unconditionally. To avoid get_arch_regstr being duplicated, rename
to get_csky_regstr and add to get_dwarf_regstr switch.
Update #ifdefs to allow ABI V1 and V2 tables at the same
time. Determine the table from the ELF flags.
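Selecting the table from the ELF flags at run time could look roughly
like the sketch below. The EF_CSKY_* values are assumed to match the
C-SKY e_flags definitions in glibc's <elf.h>; csky_regstr_sketch and the
table contents are hypothetical placeholders, not the real register
names.

```c
#include <stddef.h>

/* Assumed to match the C-SKY e_flags values in glibc's <elf.h>. */
#ifndef EF_CSKY_ABIMASK
#define EF_CSKY_ABIMASK 0xF0000000u
#endif
#ifndef EF_CSKY_ABIV2
#define EF_CSKY_ABIV2   0x20000000u
#endif

/* Placeholder tables: made-up names, far shorter than the real ones. */
static const char * const abiv1_names[] = { "v1_r0", "v1_r1", "v1_r2" };
static const char * const abiv2_names[] = { "v2_r0", "v2_r1", "v2_r2" };

#define NAMES_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Pick the table from the ELF flags instead of via a build-time #ifdef. */
static const char *csky_regstr_sketch(unsigned int n, unsigned int flags)
{
	if ((flags & EF_CSKY_ABIMASK) == EF_CSKY_ABIV2)
		return n < NAMES_LEN(abiv2_names) ? abiv2_names[n] : NULL;

	return n < NAMES_LEN(abiv1_names) ? abiv1_names[n] : NULL;
}
```

Because both tables are always compiled in, a single binary can serve
either ABI variant.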
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. Tidy up the
EM_NONE cases for arm in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The file only provides the function get_arch_regstr. However, if its
only caller, get_dwarf_regstr, uses EM_HOST for the EM_NONE case, the
function can never be called, so remove it as dead code. Tidy up the
EM_NONE cases for arm64 in dwarf-regs.c.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Move arch/x86/util/dwarf-regs.c to util/dwarf-regs-x86.c and compile
in unconditionally. To avoid get_arch_regnum being duplicated, rename
to get_x86_regnum and add to get_dwarf_regnum switch.
For get_arch_regstr, this was unused on x86 unless the machine type
was EM_NONE. Map that case to EM_HOST and remove get_arch_regstr from
dwarf-regs-x86.c.
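A sketch of why mapping EM_NONE to EM_HOST makes a per-architecture
fallback unnecessary. This is not the perf implementation: regstr_sketch
and the two-entry tables are hypothetical, and EM_HOST is hard-wired to
EM_X86_64 here purely as an assumption (a real build would pick it per
host architecture).

```c
#include <stddef.h>

/* ELF machine numbers as in <elf.h>; defined here to stay standalone. */
#ifndef EM_NONE
#define EM_NONE    0
#endif
#ifndef EM_X86_64
#define EM_X86_64  62
#endif
#ifndef EM_AARCH64
#define EM_AARCH64 183
#endif
#ifndef EM_HOST
#define EM_HOST    EM_X86_64 /* assumption for this sketch only */
#endif

/* Shortened illustrative tables indexed by DWARF register number. */
static const char * const x86_64_names[]  = { "%rax", "%rdx" };
static const char * const aarch64_names[] = { "%x0", "%x1" };

static const char *regstr_sketch(unsigned int n, unsigned int machine)
{
	/* EM_NONE is rewritten up front, so no arch fallback is reachable. */
	if (machine == EM_NONE)
		machine = EM_HOST;

	switch (machine) {
	case EM_X86_64:
		return n < 2 ? x86_64_names[n] : NULL;
	case EM_AARCH64:
		return n < 2 ? aarch64_names[n] : NULL;
	default:
		return NULL;
	}
}
```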
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Pass a flags value as architectures like csky need the flags to
determine the ABI variant.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Rather than pass 0/EM_NONE, use the value computed in the disasm
struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if
it is passed to get_dwarf_regnum. Pass a flags value as
architectures like csky need the flags to determine the ABI variant.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently functions like get_dwarf_regnum only work with the host
architecture. Carry the ELF machine and flags in struct arch so that
disassembly can use them to support cross-platform disassembly.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Computed from the build architecture defines, EM_HOST and EF_HOST give
values that can be used in dwarf register lookup. Place in
dwarf-regs.h so the value can be shared. Move some dwarf-regs.c
constants used for EM_HOST to dwarf-regs.h. Add CSky constants that
may be missing.
In disasm.c add an include of dwarf-regs.h as the included
arch/*/annotate/instructions.c files make use of the constants and we
want the elf.h/dwarf-regs.h dependency to be explicit.
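A minimal sketch of how host values can be derived from build architecture defines is shown below. It covers only a few architectures, the exact macro chain in dwarf-regs.h may differ, and check_host_constants() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <elf.h>

/* C-SKY constants that an older <elf.h> may be missing (assumed values). */
#ifndef EM_CSKY
#define EM_CSKY 252
#endif
#ifndef EF_CSKY_ABIV1
#define EF_CSKY_ABIV1 0x10000000
#endif
#ifndef EF_CSKY_ABIV2
#define EF_CSKY_ABIV2 0x20000000
#endif

/* Pick the ELF machine from compiler-provided defines. */
#if defined(__x86_64__)
#define EM_HOST EM_X86_64
#elif defined(__i386__)
#define EM_HOST EM_386
#elif defined(__aarch64__)
#define EM_HOST EM_AARCH64
#elif defined(__arm__)
#define EM_HOST EM_ARM
#elif defined(__csky__)
#define EM_HOST EM_CSKY
#else
#define EM_HOST EM_NONE	/* unknown host in this sketch */
#endif

/* Only C-SKY needs non-zero flags here, to tell the ABI variants apart. */
#if defined(__csky__) && defined(__CSKYABIV2__)
#define EF_HOST EF_CSKY_ABIV2
#elif defined(__csky__)
#define EF_HOST EF_CSKY_ABIV1
#else
#define EF_HOST 0
#endif

/* Hypothetical sanity check a caller might run once at startup. */
static void check_host_constants(void)
{
	assert(EF_HOST == 0 || EM_HOST == EM_CSKY);
}
```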
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET was used for BPF prologue
support, which was removed in commit 3d6dfae889 ("perf parse-events:
Remove BPF event support"). The code is no longer used, so remove it.
Remove the offset from various dwarf-regs.c tables and the dependence
on ptrace.h. Rename structs starting pt_ as the ptrace derived offset is
now removed.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a few paragraphs on tool and hwmon events.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Based on a mix of the sysfs PMU test (for creating the reference
files) and the tool PMU test, test that parsing given hwmon events
with their aliases creates the expected config values.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241109003759.473460-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add the base PMU calls necessary for hwmon_pmu(s) to be
created/deleted and events found, listed, opened and read.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241109003759.473460-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a tool PMU for hwmon events but don't enable it.
The hwmon sysfs ABI is defined in
Documentation/hwmon/sysfs-interface.rst. Create a PMU that reads the
hwmon input and can be used in `perf stat` and metrics much as an
uncore PMU can.
For example, when enabled by a later patch, the following shows
reading the CPU temperature and 2 fan speeds alongside the uncore
frequency:
```
$ perf stat -e temp_cpu,fan1,hwmon_thinkpad/fan2/,tool/num_cpus_online/ -M UNCORE_FREQ -I 1000
1.001153138 52.00 'C temp_cpu
1.001153138 2,588 rpm fan1
1.001153138 2,482 rpm hwmon_thinkpad/fan2/
1.001153138 8 tool/num_cpus_online/
1.001153138 1,077,101,397 UNC_CLOCK.SOCKET # 1.08 UNCORE_FREQ
1.001153138 1,012,773,595 duration_time
...
```
The PMUs are named from /sys/class/hwmon/hwmon<num>/name and have an
alias of hwmon<num>.
Hwmon data is presented in multiple <type><number>_<item> files. An
event is identified by <type><number>, or by <type> followed by the
contents of the <type><number>_label file if it exists. The
<type><number>_input file gives the data read by perf.
When enabled by a later patch, in `perf list` the other hwmon <item>
files are used to give a richer description, for example:
```
hwmon:
temp1
[Temperature in unit acpitz named temp1. Unit: hwmon_acpitz]
in0
[Voltage in unit bat0 named in0. Unit: hwmon_bat0]
temp_core_0 OR temp2
[Temperature in unit coretemp named Core 0. crit=100'C,max=100'C crit_alarm=0'C. Unit:
hwmon_coretemp]
temp_core_1 OR temp3
[Temperature in unit coretemp named Core 1. crit=100'C,max=100'C crit_alarm=0'C. Unit:
hwmon_coretemp]
...
temp_package_id_0 OR temp1
[Temperature in unit coretemp named Package id 0. crit=100'C,max=100'C crit_alarm=0'C.
Unit: hwmon_coretemp]
temp1
[Temperature in unit iwlwifi_1 named temp1. Unit: hwmon_iwlwifi_1]
temp_composite OR temp1
[Temperature in unit nvme named Composite. alarm=0'C,crit=86.85'C,max=75.85'C,
min=-273.15'C. Unit: hwmon_nvme]
temp_sensor_1 OR temp2
[Temperature in unit nvme named Sensor 1. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme]
temp_sensor_2 OR temp3
[Temperature in unit nvme named Sensor 2. max=65261.8'C,min=-273.15'C. Unit: hwmon_nvme]
fan1
[Fan in unit thinkpad named fan1. Unit: hwmon_thinkpad]
fan2
[Fan in unit thinkpad named fan2. Unit: hwmon_thinkpad]
...
temp_cpu OR temp1
[Temperature in unit thinkpad named CPU. Unit: hwmon_thinkpad]
temp_gpu OR temp2
[Temperature in unit thinkpad named GPU. Unit: hwmon_thinkpad]
curr1
[Current in unit ucsi_source_psy_usbc000_0 named curr1. max=1.5A. Unit:
hwmon_ucsi_source_psy_usbc000_0]
in0
[Voltage in unit ucsi_source_psy_usbc000_0 named in0. max=5V,min=5V. Unit:
hwmon_ucsi_source_psy_usbc000_0]
```
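The label-based names in the listing above (for example temp_package_id_0 for a label of "Package id 0") suggest a simple normalization of the label text. The sketch below shows one way to do that. It is an assumption drawn from the example output rather than the patch's code, and hwmon_label_alias() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<type>_<label>" from a hwmon type name and the
 * raw contents of its label file. Returns 0 on success, -1 if out is too small.
 */
static int hwmon_label_alias(const char *type, const char *label,
			     char *out, size_t out_sz)
{
	size_t len, i;
	int n;

	assert(type && label && out);

	/* sysfs files usually end with a newline; drop trailing whitespace. */
	len = strlen(label);
	while (len > 0 && isspace((unsigned char)label[len - 1]))
		len--;

	n = snprintf(out, out_sz, "%s_%.*s", type, (int)len, label);
	if (n < 0 || (size_t)n >= out_sz)
		return -1;

	/* Lower-case letters and digits are kept; anything else becomes '_'. */
	for (i = 0; out[i]; i++) {
		unsigned char c = (unsigned char)out[i];

		out[i] = isalnum(c) ? (char)tolower(c) : '_';
	}
	return 0;
}
```

With this sketch, a "temp" type whose label file contains "CPU" would be exposed as temp_cpu, alongside its numbered name.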
As there may be multiple hwmon devices a range of PMU types are
reserved for their use and to identify the PMU as belonging to the
hwmon types.
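Reserving a range means a PMU can be classified as hwmon from its type number alone. A minimal sketch follows. The bounds and the names HWMON_PMU_TYPE_START, HWMON_PMU_TYPE_END, pmu_type_is_hwmon() and hwmon_pmu_type() are illustrative assumptions, not values taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bounds only; the real reserved range may differ. */
#define HWMON_PMU_TYPE_START 0xFFFF0000u
#define HWMON_PMU_TYPE_END   0xFFFFFFFDu

/* A PMU belongs to the hwmon family if its type falls inside the range. */
static bool pmu_type_is_hwmon(uint32_t type)
{
	return type >= HWMON_PMU_TYPE_START && type <= HWMON_PMU_TYPE_END;
}

/*
 * Map hwmon<num> to a PMU type. Returns 0, used as "no type" in this
 * sketch, when the device number does not fit in the reserved range.
 */
static uint32_t hwmon_pmu_type(uint32_t hwmon_num)
{
	uint32_t type;

	if (hwmon_num > HWMON_PMU_TYPE_END - HWMON_PMU_TYPE_START)
		return 0;
	type = HWMON_PMU_TYPE_START + hwmon_num;
	assert(pmu_type_is_hwmon(type));
	return type;
}
```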
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241109003759.473460-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Filename parsing maps a hwmon filename to its constituent enum/int
parts for the hwmon config value. Add a test case for the parsing.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[namhyung: add #include <linux/string.h> for strlcpy()]
Link: https://lore.kernel.org/r/20241109003759.473460-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
hwmon filenames have a specific encoding that will be used to give a
config value. The encoding is described in:
Documentation/hwmon/sysfs-interface.rst
Add a function to parse the filename into constituent enums/ints that
will then be amenable to config encoding.
Note, things are done this way to allow mapping names to config and
back without the use of hash/dynamic lookup tables.
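To make the idea concrete, here is a small sketch that splits a name such as temp1_input into a type enum, a number and an item enum using only static tables. The enums cover a handful of types and items, all identifiers are hypothetical, and the linear scan is a simplification rather than the patch's actual lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum hwmon_type { HWMON_TYPE_NONE, HWMON_TYPE_CURR, HWMON_TYPE_FAN,
		  HWMON_TYPE_IN, HWMON_TYPE_TEMP, HWMON_TYPE_MAX };
enum hwmon_item { HWMON_ITEM_NONE, HWMON_ITEM_INPUT, HWMON_ITEM_LABEL,
		  HWMON_ITEM_MAX_VAL, HWMON_ITEM_MIN_VAL, HWMON_ITEM_MAX };

/* Static name tables: enum -> string, so names map to values and back. */
static const char *const type_names[HWMON_TYPE_MAX] = {
	[HWMON_TYPE_CURR] = "curr", [HWMON_TYPE_FAN] = "fan",
	[HWMON_TYPE_IN] = "in", [HWMON_TYPE_TEMP] = "temp",
};
static const char *const item_names[HWMON_ITEM_MAX] = {
	[HWMON_ITEM_INPUT] = "input", [HWMON_ITEM_LABEL] = "label",
	[HWMON_ITEM_MAX_VAL] = "max", [HWMON_ITEM_MIN_VAL] = "min",
};

/* Parse "<type><number>_<item>"; returns false for any other shape. */
static bool parse_hwmon_filename(const char *name, enum hwmon_type *type,
				 int *number, enum hwmon_item *item)
{
	size_t t, i, len = 0;
	char *end;
	long num;

	assert(name && type && number && item);

	/* The type is the known prefix that is directly followed by a digit. */
	for (t = 1; t < HWMON_TYPE_MAX; t++) {
		len = strlen(type_names[t]);
		if (!strncmp(name, type_names[t], len) &&
		    isdigit((unsigned char)name[len]))
			break;
	}
	if (t == HWMON_TYPE_MAX)
		return false;

	num = strtol(name + len, &end, 10);
	if (*end != '_' || num < 0 || num > INT_MAX)
		return false;

	/* Everything after the underscore must be a known item name. */
	for (i = 1; i < HWMON_ITEM_MAX; i++) {
		if (!strcmp(end + 1, item_names[i]))
			break;
	}
	if (i == HWMON_ITEM_MAX)
		return false;

	*type = (enum hwmon_type)t;
	*number = (int)num;
	*item = (enum hwmon_item)i;
	return true;
}
```

Because both directions go through the same static arrays, turning the enums back into a file name is a matter of indexing type_names and item_names again.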
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[namhyung: add #include <linux/string.h> for strlcpy()]
Link: https://lore.kernel.org/r/20241109003759.473460-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Long metric names combined with units may exceed the metric_bf and
lead to truncation. Double metric_bf in size to avoid this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20241106004818.2174593-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since commit 0fe881f10c ("perf jevents: Autogenerate empty-pmu-events.c"),
the build generates two files. Add them to .gitignore:
tools/perf/pmu-events/empty-pmu-events.log
tools/perf/pmu-events/test-empty-pmu-events.c
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241106121254.2869-1-haiyuewa@163.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
event_format__print() last use was removed by 2017's
commit 894f3f1732 ("perf script: Use event_format__fprintf()")
evlist__find_tracepoint_by_id() last use was removed by 2012's
commit e60fc847ce ("perf evlist: Remove some unused methods")
evlist__set_tp_filter_pid() last use was removed by 2017's
commit dd1a50377c ("perf trace: Introduce filter_loop_pids()")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241106144826.91728-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
trace__fprintf_tp_fields may not print any tracepoint arguments, e.g. if the
argument values are all zero. Previously, this would result in a totally
uninitialized buffer being passed to fprintf, which could lead to garbage on the
console. Fix the problem by passing the number of initialized bytes to fprintf.
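The general technique is to bound the output with a "%.*s" precision, so fprintf never reads past the bytes that were actually written into the buffer. A minimal sketch, with print_fields() as a hypothetical stand-in for the real function:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Print only the first 'printed' bytes of bf. With a precision of 0 nothing
 * is read from bf at all, so an uninitialized buffer is harmless.
 */
static int print_fields(FILE *out, const char *bf, size_t printed)
{
	assert(out && bf);
	return fprintf(out, "%.*s", (int)printed, bf);
}
```

The buffer does not need to be NUL-terminated in this form, since the precision caps how many bytes are consumed.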
Fixes: f11b2803bb ("perf trace: Allow choosing how to augment the tracepoint arguments")
Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
Tested-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20241103204816.7834-1-benjamin@engflow.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since there are no longer any header include differences between
lib/list_sort.c and tools/lib/list_sort.c, update the expected diff in
check-header_ignore_hunks accordingly.
Link: https://lkml.kernel.org/r/20241012042828.471614-4-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During parallel testing, I've noticed some ftrace test failures. It
seems the regex pattern checks for 100 msec of nanosleep with an error
range of 10 msec. But sometimes it's affected by other processes, which
results in more time spent in the syscall.
The following output shows that it took more than 120 msec and failed.
Let's update the regex pattern so that it allows more drift.
perf ftrace profile test
# Total (us) Avg (us) Max (us) Count Function
121279.500 121279.500 121279.500 1 __x64_sys_clock_nanosleep
121278.400 121278.400 121278.400 1 common_nsleep
121277.800 121277.800 121277.800 1 hrtimer_nanosleep
121277.100 121277.100 121277.100 1 do_nanosleep
341760.289 56960.048 121273.400 6 schedule
176.200 25.171 31.616 7 scheduler_tick
0.923 0.923 0.923 1 native_smp_send_reschedule
345522.360 69104.472 345320.600 5 __x64_sys_execve
345486.585 69097.317 345312.700 5 do_execveat_common.isra.0
340730.300 340730.300 340730.300 1 bprm_execve
1.758 0.879 0.883 2 sched_mm_cid_before_execve
1.112 1.112 1.112 1 sched_mm_cid_after_execve
---- end(-1) ----
81: perf ftrace tests : FAILED!
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241102231702.2262258-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since the C test wrapper for attr.py was removed, there is no attr.o
object for CFLAGS_attr.o to apply to, so remove it.
Fixes: 3a447031f5 ("perf test: Remove C test wrapper for attr.py")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/ZyjbksKYnV22zmz-@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When building with a custom libtraceevent, the errors below occur:
$ make -C tools/perf NO_LIBPYTHON=1 PKG_CONFIG_PATH=<custom libtraceevent>
In file included from util/session.h:5,
from builtin-buildid-list.c:17:
util/trace-event.h:153:10: fatal error: traceevent/event-parse.h: No such file or directory
153 | #include <traceevent/event-parse.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
<snip similar errors of missing headers>
This is because the include path is missing from the cflags. Add it.
Fixes: 0f0e1f4456 ("perf build: Use pkg-config for feature check for libtrace{event,fs}")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Guilherme Amadio <amadio@gentoo.org>
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20241024133236.31016-1-yangyicong@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The cpu-list part of this testcase has proven itself to be unreliable.
Sometimes, we get "<not counted>" for system.slice when pinned to CPUs
0 and 1. In that case, the test fails.
Since we cannot guarantee that any system.slice load will run on an
arbitrary list of CPUs, other than the whole set of all CPUs, remove
the cpu-list subtest.
Fixes: a84260e314 ("perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test")
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: vmolnaro@redhat.com
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20241101102812.576425-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Having multiple unwinding libraries makes the perf code harder to
understand and we have unused/untested code paths.
Perf made BPF support an opt-out rather than opt-in feature. As libbpf
has a libelf dependency, elfutils that provides libelf will also
provide libdw. When libdw is present perf will use libdw unwinding
rather than libunwind unwinding even if libunwind support is compiled
in.
Rather than have libunwind built into perf and never used, explicitly
disable the support and make it opt-in.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20241028193619.247727-1-irogers@google.com
Closes: https://lore.kernel.org/linux-perf-users/CAP-5=fUXkp-d7gkzX4eF+nbjb2978dZsiHZ9abGHN=BN1qAcbg@mail.gmail.com/
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
bperf has a nice ability to share PMUs, but it still does not support
inheriting events across fork(), resulting in some deviations in its stat
results compared with perf.
perf stat result:
$ ./perf stat -e cycles,instructions -- ./perf test -w sqrtloop
Performance counter stats for './perf test -w sqrtloop':
2,316,038,116 cycles
2,859,350,725 instructions
1.009603637 seconds time elapsed
1.004196000 seconds user
0.003950000 seconds sys
bperf stat result:
$ ./perf stat --bpf-counters -e cycles,instructions -- \
./perf test -w sqrtloop
Performance counter stats for './perf test -w sqrtloop':
18,762,093 cycles
23,487,766 instructions
1.008913769 seconds time elapsed
1.003248000 seconds user
0.004069000 seconds sys
In order to support event inheritance, two new bpf programs are added
to monitor the fork and exit of tasks respectively. When a task is
created, add it to the filter map to enable counting, and reuse the
`accum_key` of its parent task to count together with the parent task.
When a task exits, remove it from the filter map to disable counting.
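The bookkeeping described above can be modelled in plain userspace C, as in the sketch below. This is not BPF code and not the patch's implementation: the fixed-size array stands in for the filter map, and all names other than accum_key are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 64

/* One slot of the (simulated) filter map: pid -> accumulation key. */
struct filter_entry {
	int pid;
	unsigned int accum_key;
	bool used;
};

static struct filter_entry filter_map[MAX_TASKS];

static struct filter_entry *filter_lookup(int pid)
{
	size_t i;

	for (i = 0; i < MAX_TASKS; i++) {
		if (filter_map[i].used && filter_map[i].pid == pid)
			return &filter_map[i];
	}
	return NULL;
}

/* Start counting pid under accum_key; fails when the map is full. */
static bool filter_add(int pid, unsigned int accum_key)
{
	size_t i;

	assert(pid > 0);
	for (i = 0; i < MAX_TASKS; i++) {
		if (!filter_map[i].used) {
			filter_map[i].pid = pid;
			filter_map[i].accum_key = accum_key;
			filter_map[i].used = true;
			return true;
		}
	}
	return false;
}

/* Fork hook: a child of a counted task is counted under the parent's key. */
static bool on_task_fork(int parent_pid, int child_pid)
{
	struct filter_entry *parent = filter_lookup(parent_pid);

	if (!parent)
		return false;	/* parent is not being counted, ignore */
	return filter_add(child_pid, parent->accum_key);
}

/* Exit hook: stop counting the task. */
static void on_task_exit(int pid)
{
	struct filter_entry *entry = filter_lookup(pid);

	if (entry)
		entry->used = false;
}
```

Sharing the parent's key is what makes the child's counts add up into the same totals, which matches the before and after numbers shown in this message.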
With this support:
$ ./perf stat --bpf-counters -e cycles,instructions -- \
./perf test -w sqrtloop
Performance counter stats for './perf test -w sqrtloop':
2,316,252,189 cycles
2,859,946,547 instructions
1.009422314 seconds time elapsed
1.003597000 seconds user
0.004270000 seconds sys
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Cc: song@kernel.org
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20241021110201.325617-2-wutengda@huaweicloud.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since the linked commit, we stopped interpreting data source if the
perf.data file doesn't have the new metadata version. This means that
perf c2c will show no samples in this case.
Keep the old behavior so old files can be opened, but also still show
the new warning that updating might improve the decoding.
Also re-write the warning to be more concise and specific to a user.
Fixes: ba5e7169e5 ("perf arm-spe: Use metadata to decide the data source feature")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Cc: Julio.Suarez@arm.com
Cc: Kiel.Friedt@arm.com
Cc: Ryan.Roberts@arm.com
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241029143734.291638-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
RISC-V does not currently support perf trace, since the system call
table is not generated.
Perform the copy/paste exercise, wiring up RISC-V system call table
generation.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20241024190353.46737-1-bjorn@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When perf is linked with libdebuginfod:
root@number:~# ldd ~/bin/perf | grep debuginfod
libdebuginfod.so.1 => /lib64/libdebuginfod.so.1 (0x00007ff5c3930000)
root@number:~# perf check feature debuginfod
debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT
root@number:~#
And we don't have a debuginfo package installed for the binary we're
trying to use (vmlinux in this case, as we didn't specify any using 'perf
probe -x'), it will use the build id for the running kernel:
root@number:~# perf buildid-list -k
38e927fd7799d50dbc4d99ec5e3f781b6105a6a9
root@number:~#
And communicate with a debuginfo server, be it configured in a
~/.perfconfig file, excerpt from the 'perf config' man page:
buildid-cache.*
buildid-cache.debuginfod=URLs Specify debuginfod URLs to be
used when retrieving perf.data binaries, it follows the same
syntax as the DEBUGINFOD_URLS variable, like:
buildid-cache.debuginfod=http://192.168.122.174:8002
Or via the DEBUGINFOD_URLS env var, as distros like fedora do by
default:
root@number:~# echo $DEBUGINFOD_URLS
https://debuginfod.fedoraproject.org/
root@number:~#
To pick and cache just what is needed, instead of requiring the manual
installation of the entire kernel-debuginfo package, which is really
large.
It will, in this example, use the following cache files, deleted
before/after this patch just to test the whole process:
root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c
root@number:~# rm -f /root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/debuginfo
Before this patch:
root@number:~# perf probe -L icmp_rcv
Failed to find source file path.
Error: Failed to show lines.
root@number:~#
This is because 'perf probe' was using just the relative file name, in
this case "net/ipv4/icmp.c", which is where the 'icmp_rcv' function is
located. If we instead pass the full path, as the
debuginfod_find_source() man page requires, it contacts the server,
finds the necessary files, caches them locally and all works:
root@number:~# perf probe -L icmp_rcv | head
<icmp_rcv@/root/.cache/debuginfod_client/38e927fd7799d50dbc4d99ec5e3f781b6105a6a9/source-a1414a5d-#usr#src#debug#kernel-6.11.4#linux-6.11.4-201.fc40.x86_64#net#ipv4#icmp.c:0>
0 int icmp_rcv(struct sk_buff *skb)
{
2 enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
struct rtable *rt = skb_rtable(skb);
struct net *net = dev_net(rt->dst.dev);
struct icmphdr *icmph;
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
8 struct sec_path *sp = skb_sec_path(skb);
root@number:~#
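The difference between the failing and working lookups comes down to the file name handed to the debuginfod client. The sketch below only builds that name and does not call libdebuginfod. full_source_path() is a hypothetical helper, and the directory used in the usage note is read off the cached file name above, taking '#' to stand for '/'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join a compilation directory and a source file name
 * into the absolute path a source lookup expects. Absolute names pass through
 * unchanged. Returns 0 on success, -1 if out is too small.
 */
static int full_source_path(const char *comp_dir, const char *file,
			    char *out, size_t out_sz)
{
	int n;

	assert(file && out);
	if (file[0] == '/' || !comp_dir || !comp_dir[0])
		n = snprintf(out, out_sz, "%s", file);
	else
		n = snprintf(out, out_sz, "%s/%s", comp_dir, file);

	return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```

For the example in this message, joining /usr/src/debug/kernel-6.11.4/linux-6.11.4-201.fc40.x86_64 with net/ipv4/icmp.c gives the name that ends up in the cache.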
Acked-by: Frank Ch. Eigler <fche@redhat.com>
Cc: Aaron Merey <amerey@redhat.com>
Cc: Francesco Nigro <fnigro@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/ZyACsIFUETsr7-09@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This allows a uniform test numbering even though two passes are used
to execute them.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If the `perf test` process is killed the child tests continue running
and may run indefinitely. Propagate SIGINT (ctrl-C) and SIGTERM (kill)
signals to the running child processes so that they terminate when the
parent is killed.
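A minimal sketch of the forwarding step, assuming the parent keeps an array of child PIDs, is given below. The function names are hypothetical and the real harness tracks its children differently. kill() is async-signal-safe, so a loop like this can run from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send sig to every live child; returns how many signals were delivered. */
static int forward_signal(const pid_t *pids, size_t n, int sig)
{
	size_t i;
	int sent = 0;

	assert(pids || n == 0);
	for (i = 0; i < n; i++) {
		if (pids[i] > 0 && kill(pids[i], sig) == 0)
			sent++;
	}
	return sent;
}

/* Reap the children and count those that were terminated by sig. */
static int count_killed_by(const pid_t *pids, size_t n, int sig)
{
	size_t i;
	int killed = 0;

	for (i = 0; i < n; i++) {
		int status;

		if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFSIGNALED(status) && WTERMSIG(status) == sig)
			killed++;
	}
	return killed;
}
```

In the parent, a SIGINT or SIGTERM handler would call forward_signal() with the same signal number it received, so children see the signal the user sent.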
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now that C tests can have the "exclusive" flag to run without other tests,
and shell tests can add "(exclusive)" to their description, run tests
in parallel by default. Tests which flake when run in parallel can be
marked exclusive to resolve the problem.
Non-scientifically, the reduction in `perf test` execution time is
from 8m35.890s to 3m55.115s on a Tigerlake laptop. So the tests
complete in less than half the time.
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In pass 1 run all tests that succeed when run in parallel. In pass 2
sequentially run all remaining tests that are flagged as
"exclusive". Sequential and dont_fork tests continue to run in pass 1.
Read the exclusive flag from the shell test descriptions, but remove
it from the display to avoid >100 characters. Add error handling to
finish tests if starting a later test fails. Mark the task-exit test as
exclusive due to issues reported by James Clark.
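The flag handling can be sketched as follows. The sketch assumes the "(exclusive)" tag sits at the end of a shell test description, which is an assumption rather than something this message states. strip_exclusive_tag() and test_pass() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * If desc ends with "(exclusive)", remove the tag (and the spaces before it)
 * in place and return true; otherwise leave desc untouched.
 */
static bool strip_exclusive_tag(char *desc)
{
	static const char tag[] = "(exclusive)";
	size_t tag_len = sizeof(tag) - 1;
	size_t len;

	assert(desc);
	len = strlen(desc);
	if (len < tag_len || strcmp(desc + len - tag_len, tag) != 0)
		return false;

	len -= tag_len;
	while (len > 0 && desc[len - 1] == ' ')
		len--;
	desc[len] = '\0';
	return true;
}

/* Pass 1 holds the parallel-safe tests, pass 2 the exclusive ones. */
static int test_pass(bool exclusive)
{
	return exclusive ? 2 : 1;
}
```

Stripping the tag before display keeps the listed description short while the returned flag still decides which pass the test runs in.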
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a signal handler around running a test. If a signal occurs during
the test a siglongjmp unwinds the stack and output is flushed. The
global run_test_jmp_buf is either unique per forked child or not
shared during sequential execution.
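The pattern is the usual sigsetjmp()/siglongjmp() pairing. The sketch below guards a single test function against SIGSEGV only. run_test_jmp_buf is the name used above, while the handler, run_test_guarded() and the two sample tests are hypothetical and simplified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf run_test_jmp_buf;

/* Jump back into run_test_guarded() instead of dying in the test. */
static void test_sig_handler(int sig)
{
	siglongjmp(run_test_jmp_buf, sig);
}

/* Run test(); returns its result, or -1 if it was interrupted by SIGSEGV. */
static int run_test_guarded(int (*test)(void))
{
	struct sigaction sa, old;
	int err;

	assert(test);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = test_sig_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	/* savemask=1 so the signal mask is restored after the jump. */
	if (sigsetjmp(run_test_jmp_buf, 1) == 0) {
		err = test();
	} else {
		fflush(NULL);	/* push out whatever the test printed */
		err = -1;
	}

	sigaction(SIGSEGV, &old, NULL);
	return err;
}

/* Sample tests for illustration. */
static int passing_test(void)
{
	return 0;
}

static int crashing_test(void)
{
	raise(SIGSEGV);
	return 0;
}
```

A single global jump buffer is enough here for the reason given above: each forked child has its own copy, and sequential runs never overlap.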
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Some shell tests compete for resources and so can't run alongside other
tests; tag such tests. The "(exclusive)" tag takes its name from
shared/exclusive locking: such tests run as if holding an exclusive lock.
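As a rough illustration of the tagging, the following Python sketch splits a
description into its display text and an exclusive flag. It assumes the tag
sits at the end of the description; the function name and the sample
descriptions are hypothetical and not taken from the perf sources.

```python
EXCLUSIVE_TAG = "(exclusive)"


def parse_description(description):
    """Return (display_text, is_exclusive) for a shell test description."""
    text = description.strip()
    if text.endswith(EXCLUSIVE_TAG):
        # Drop the tag so the displayed line stays short.
        return text[: -len(EXCLUSIVE_TAG)].rstrip(), True
    return text, False
```

For example, parse_description("some resource heavy test (exclusive)")
yields the description without the tag together with True.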
For ARM/coresight tests:
Suggested-by: James Clark <james.clark@linaro.org>
Additional failing tests:
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Python's json.tool will output the input json to stdout. Redirect to
/dev/null to avoid blocking on stdout writes.
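The same idea expressed as a small Python sketch rather than the shell test
itself: only the exit status of json.tool is of interest, so its output is
discarded. The helper name is an assumption for illustration.

```python
import subprocess
import sys


def json_output_is_valid(path):
    """Return True if json.tool parses the file; discard what it prints."""
    proc = subprocess.run(
        [sys.executable, "-m", "json.tool", path],
        stdout=subprocess.DEVNULL,  # the echoed JSON is not needed
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0
```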
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The variable duplicates sequential but is only used for command line
argument processing. Reduce scope to make the behavior clearer.
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Before polling or sleeping to wait for a test to complete, print out
": Running (<num> active)" where the number of active tests is
determined by iterating over the tests and seeing which return false
for check_if_command_finished. The line erasing and printing out only
occur if the number of running tests changes, to avoid the line
flickering excessively. The message tells the user that tests are
still running and, in parallel mode, how many of them have yet to
complete. If color mode is disabled then avoid displaying
the "Running" message as deleting the line isn't reliable.
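A minimal Python sketch of the redraw rule, under stated assumptions: the
escape sequence, the make_status_printer() name and the write callback are
illustrative and are not the ones used by perf.

```python
import sys

# Assumption: carriage return plus "erase line"; perf may use another sequence.
DELETE_LINE = "\r\033[2K"


def make_status_printer(write=sys.stderr.write, use_color=True):
    """Return update(active), which redraws only when the count changes."""
    last_active = None

    def update(active):
        nonlocal last_active
        # Without color mode the line cannot be erased reliably, so stay quiet.
        if not use_color or active == last_active:
            return False
        last_active = active
        write(f"{DELETE_LINE}: Running ({active} active)")
        return True

    return update
```

Calling update() with the same count repeatedly from a polling loop writes
nothing after the first call, which is what keeps the line from flickering.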
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241025192109.132482-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
As there are duplicated kernel headers in tools/include, libc can pick
up the wrong definitions. This was causing the wrong system call for
capget in perf.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: e25ebda78e ("perf cap: Tidy up and improve capability testing")
Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
7f053812da ("random: vDSO: minimize and simplify header includes")
That required adding a copy of include/vdso/unaligned.h and its checking
in tools/perf/check-headers.h.
Addressing this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/linux/unaligned.h include/linux/unaligned.h
Please see tools/include/uapi/README for further details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In symbol__disassemble_raw(), the created disasm_line should be
discarded before returning an error. When creating disasm_line fails,
break the loop and then release the created lines.
Fixes: 0b971e6bf1 ("perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: sesse@google.com
Cc: kjain@linux.ibm.com
Link: https://lore.kernel.org/r/20241019154157.282038-3-lihuafei1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
symbol__disassemble_capstone_powerpc() jumped to the 'err' label when it
failed in the loop that created disasm_line, and then used free()
directly to free disasm_line. Since the structure disasm_line contains
members that allocate memory dynamically, this can result in a memory
leak. In fact, we can simply break the loop when it fails in the middle
of the loop, and disasm_line__free() will then be called to properly
free the created line. Other error paths do not need to consider freeing
disasm_line.
Fixes: c5d60de181 ("perf annotate: Add support to use libcapstone in powerpc")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: sesse@google.com
Cc: kjain@linux.ibm.com
Link: https://lore.kernel.org/r/20241019154157.282038-2-lihuafei1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The structure disasm_line contains members that require dynamically
allocated memory and need to be freed correctly using
disasm_line__free().
This patch fixes the incorrect release in
symbol__disassemble_capstone().
Fixes: 6d17edc113 ("perf annotate: Use libcapstone to disassemble")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: sesse@google.com
Cc: kjain@linux.ibm.com
Link: https://lore.kernel.org/r/20241019154157.282038-1-lihuafei1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Noticed while building on a raspbian arm 32-bit system.
There was also this other case, fixed by adding a missing util/stat.h
with the prototypes:
/tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1396:6: error: no previous prototype for ‘perf_stat__set_no_csv_summary’ [-Werror=missing-prototypes]
1396 | void perf_stat__set_no_csv_summary(int set __maybe_unused)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1400:6: error: no previous prototype for ‘perf_stat__set_big_num’ [-Werror=missing-prototypes]
1400 | void perf_stat__set_big_num(int set __maybe_unused)
| ^~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
On other architectures this must be building due to some lucky indirect
inclusion of that header.
Fixes: 9dabf40034 ("perf python: Switch module to linking libraries from building source")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZxllAtpmEw5fg9oy@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test case test_adding_blacklisted ends in failure if the blacklisted
probe is of an assembler function with no DWARF available. At the same
time, probing the blacklisted function with ASM DWARF doesn't test the
blacklist itself as the failure is a result of the broken DWARF.
When the broken DWARF output is encountered, check if the probed
function was compiled by the assembler. If so, the broken DWARF message
is expected and does not report a perf issue, else report a failure. If
the ASM DWARF affected the probe, try the next probe on the blacklist.
If the first 5 probes are defective due to broken DWARF, skip the test
case.
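The decision logic described above, as a hedged Python sketch. The real test
is a shell script; try_probe(), compiled_by_assembler() and the outcome
strings are hypothetical helpers introduced only to show the control flow.

```python
MAX_PROBES = 5  # the limit named in the text above


def check_blacklist(probes, try_probe, compiled_by_assembler):
    """Return "pass", "fail" or "skip" for the blacklisted-probe test case.

    try_probe(name) is assumed to return "blacklisted" when the blacklist
    rejected the probe, "broken_dwarf" when the DWARF error appeared, and
    anything else for unexpected output.
    """
    for name in probes[:MAX_PROBES]:
        outcome = try_probe(name)
        if outcome == "blacklisted":
            return "pass"
        if outcome == "broken_dwarf" and compiled_by_assembler(name):
            # Expected for assembler functions: not a perf issue, try the next.
            continue
        return "fail"
    # Every probe tried was defective due to broken DWARF.
    return "skip"
```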
Fixes: def5480d63 ("perf testsuite probe: Add test for blacklisted kprobes handling")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241017161555.236769-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This fixes a build breakage on 32-bit arm, where the
syscalltbl__id_at_idx() function was missing.
Committer notes:
Generating a proper syscall table from a copy of
arch/arm/tools/syscall.tbl ends up being too big a patch for this rc
stage. I started doing it, but while testing noticed some other problems
with using BPF to collect pointer args on arm7 (32-bit). I will maybe
continue trying to make it work in the next cycle...
Fixes: 7a2fb5619c ("perf trace: Fix iteration of syscall ids in syscalltbl->entries")
Suggested-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: <jslaby@suse.cz>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/lkml/3a592835-a14f-40be-8961-c0cee7720a94@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This serves as a revert for this patch:
https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add some more checks to pass the verifier in more kernels.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com
[ Reduced the patch removing things that can be done later ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64) that, as enterprise
kernels go, has backports from modern kernels, the verifier complains
about the lack of a bounds check for the index into the array of syscall
arguments, on BPF bytecode generated by clang 17, with:
; } else if (size < 0 && size >= -6) { /* buffer */
116: (b7) r1 = -6
117: (2d) if r1 > r6 goto pc-30
R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
; index = -(size + 1);
118: (a7) r6 ^= -1
119: (67) r6 <<= 32
120: (77) r6 >>= 32
; aug_size = args->args[index];
121: (67) r6 <<= 3
122: (79) r1 = *(u64 *)(r10 -24)
123: (0f) r1 += r6
last_idx 123 first_idx 116
regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24)
regs=40 stack=0 before 121: (67) r6 <<= 3
regs=40 stack=0 before 120: (77) r6 >>= 32
regs=40 stack=0 before 119: (67) r6 <<= 32
regs=40 stack=0 before 118: (a7) r6 ^= -1
regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30
regs=42 stack=0 before 116: (b7) r1 = -6
R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value
parent didn't have regs=40 stack=0 marks
last_idx 110 first_idx 98
regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5
regs=42 stack=0 before 109: (b7) r1 = 1
regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7
regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9
R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value
parent already had regs=40 stack=0 marks
124: (79) r6 = *(u64 *)(r1 +16)
R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
R1 unbounded memory access, make sure to bounds check any such access
processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3
If we add this line, as used in other BPF programs, to cap that index:
index &= 7;
The generated BPF program is considered safe by that version of the BPF
verifier, allowing perf to collect the syscall args in one more kernel
using the BPF based pointer contents collector.
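The arithmetic behind the cap can be checked with a small Python sketch. It
only mirrors the index computation from the listing above; Python integers
are unbounded, so this is an illustration of the masking, not of the 64-bit
BPF register semantics or of the verifier itself.

```python
def capped_index(size):
    """Mirror 'index = -(size + 1); index &= 7;' from the BPF program."""
    index = -(size + 1)
    # The mask bounds the result to 0..7 whatever value size holds, which is
    # the kind of explicit bound the verifier message asks for.
    return index & 7
```

For the sizes accepted by the 'size < 0 && size >= -6' branch, the mask
leaves the index unchanged: sizes -1 through -6 map to indexes 0 through 5.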
With the above one-liner it works with that kernel:
[root@dell-per740-01 ~]# uname -a
Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890
0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }) = 0
[root@dell-per740-01 ~]#
As well as with the one in Fedora 40:
root@number:~# uname -a
Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux
root@number:~# perf trace -e *sleep* sleep 1.234567890
0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0
root@number:~#
Song Liu reported that this one-liner was being optimized out by clang
18, so I suggested, and he tested, adding a compiler barrier before
it. That made clang v18 keep it, and the verifier in the kernel in Song's
case (Meta's 5.12 based kernel) was also happy with the resulting
bytecode.
I'll investigate using virtme-ng[1] to have all the perf BPF based
functionality thoroughly tested over multiple kernels and clang
versions.
[1] https://kernel-recipes.org/en/2024/virtme-ng/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's a very simple test that just runs with the cycles:P and
instructions:P events.
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-10-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The fallback logic can add ":u" modifier if needed.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-9-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf_event_open() might fail for various reasons, so blindly
reducing the precise_ip level might not be the best way to deal with it.
It seems the kernel returns -EOPNOTSUPP when the PMU doesn't support the
given precise level. Let's retry only on that error code.
This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
user events with exclude_kernel=1 cannot make progress. Let's add
evsel__handle_error_quirks() to handle this case specially. I plan to work
on the kernel side to improve this situation but it'd still need some
special handling for IBS.
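A hedged Python sketch of the retry rule, leaving out the AMD IBS quirk. The
open_event callback stands in for the perf_event_open() call and is
hypothetical, as is the function name.

```python
import errno


def open_with_precise_fallback(open_event, precise_ip):
    """Lower precise_ip only when the failure is EOPNOTSUPP.

    open_event(precise_ip) is assumed to return a descriptor or raise
    OSError. Any other error is passed straight to the caller.
    """
    while True:
        try:
            return open_event(precise_ip), precise_ip
        except OSError as err:
            if err.errno != errno.EOPNOTSUPP or precise_ip == 0:
                raise
            precise_ip -= 1  # try the next lower precise level
```

An error such as EACCES is reported immediately instead of being masked by a
series of pointless retries at lower precise levels.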
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-8-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It can be called from non-x86 platforms, so let's move it to the general
util directory. Also add a new helper perf_env__is_x86_amd_cpu() so
that it can be called with an existing perf_env as well.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
evsel__detect_missing_features() checks whether the attributes of
the evsel are supported or not. But because it checks the attributes based
on the given evsel, it might miss something if the attr doesn't have the
bit, or give incorrect results if the event is special.
Also it maintains the order in which features were added to the kernel,
which means it can assume older features are supported once it
detects the current feature is working. To minimize the confusion and
to accurately check the kernel features, I think it's better to use a
software event and go through all the features at once.
Also make the function static since it's only used in evsel.c.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It seems perf sets the exclude_guest bit because of the Intel PEBS
implementation, which uses a virtual address. IIUC the kernel now disables
PEBS when it goes to guest mode regardless of this bit, so we don't
need to set it explicitly, at least for the other archs/vendors.
I found the commit 1342798cc1 set the exclude_guest for precise_ip
in the tool and the commit 20b279ddb3 added kernel side enforcement
which was reverted by commit a706d965dc later.
Actually it doesn't set the exclude_guest for the default event
(cycles:P) already.
$ grep -m1 vendor /proc/cpuinfo
vendor_id : GenuineIntel
$ perf record -e cycles:P true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
$ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
precise_ip: 3
But a lower 'p' modifier sets the bit for some reason.
$ perf record -e cycles:pp true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
$ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
precise_ip: 2
exclude_guest: 1
Actually AMD IBS suffers from this because it doesn't support excludes
and having this bit effectively disables new features in the current
implementation (due to the missing feature check).
$ grep -m1 vendor /proc/cpuinfo
vendor_id : AuthenticAMD
$ perf record -W -e cycles:p -vv true 2>&1 | grep switching
switching off PERF_FORMAT_LOST support
switching off weight struct support
switching off bpf_event
switching off ksymbol
switching off cloexec flag
switching off mmap2
switching off exclude_guest, exclude_host
By not setting exclude_guest, we can fix this inconsistency and the
troubles.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Since it doesn't set exclude_guest, there is no need to handle the
bit specially; simply show the modifier only if one of the host or guest
bits is set. Now the default event name might not have the :H modifier
anymore, so change the dlfilter test not to compare the ":" at the end.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The exclude_guest bit in the event attribute limits profiling to the
host environment. But I'm not sure why we want to set it by default,
because we don't care about it in most cases and I feel like it just
makes new PMU implementations complicated.
Of course it's useful for the perf kvm command, so I added the
exclude_GH_default variable to preserve the old behavior for perf kvm,
while other commands like perf record and stat won't set the exclude bit.
This is helpful for the AMD IBS case since having the exclude_guest bit set
will clear new feature bits due to the missing feature check logic.
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 0
$ perf record -W -e ibs_op// -vv true 2>&1 | grep switching
switching off PERF_FORMAT_LOST support
switching off weight struct support
switching off bpf_event
switching off ksymbol
switching off cloexec flag
switching off mmap2
switching off exclude_guest, exclude_host
Interestingly, I found it sets the exclude_bit if the "u" modifier is used.
I don't know why but it's neither intuitive nor consistent. Let's
remove the bit there too.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Commit 7b100989b4 ("perf evlist: Remove __evlist__add_default")
changed perf record to parse the "cycles:P" event instead of creating a
new cycles event. But it also changed the way modifiers are
handled, so the exclude_guest bit is no longer set by default.
It seems the Apple M1 PMU requires exclude_guest to be set and returns
EOPNOTSUPP if it is not. Let's add a fallback so that it can work with
default events.
Also update perf stat hybrid tests to handle possible u or H modifiers.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20241016062359.264929-2-namhyung@kernel.org
Fixes: 7b100989b4 ("perf evlist: Remove __evlist__add_default")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This wasn't documented so far. Mention that it is mostly used in the shell
regression tests.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/r/20241020021842.1752770-4-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Using it:
$ perf test -w noplop
No workload found: noplop
$
$ perf test -w
Error: switch `w' requires a value
Usage: perf test [<options>] [{list <test-name-fragment>|[<test-name-fragments>|<test-numbers>]}]
-w, --workload <work>
workload to run for testing, use '--list-workloads' to list the available ones.
$
$ perf test --list-workloads
noploop
thloop
leafloop
sqrtloop
brstack
datasym
landlock
$
Would be good at some point to have a description in 'struct test_workload'.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/r/20241020021842.1752770-3-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
For events that count data cache fills, some combinations of the unit
mask bits are useful for counting fills from local caches, DRAM or any
far sources. However, named events currently exist for PMCx044 (Any Data
Cache Fills) only. Add similar events for the following base events.
* PMCx043 (Demand Data Cache Fills)
* PMCx059 (Software Prefetch Data Cache Fills)
* PMCx05A (Hardware Prefetch Data Cache Fills)
While at it, remove "ls_any_fills_from_sys.all_dram_io" since it is a
duplicate of "ls_any_fills_from_sys.dram_io_all".
Event descriptions can be found in Section 2.1.16.5.2 "Load/Store (LS)
Events" of the Processor Programming Reference (PPR) for AMD Family 1Ah
Model 02h Revision C1 Processors document available at the link below.
Link: https://bugzilla.kernel.org/attachment.cgi?id=307010
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: ananth.narayan@amd.com
Cc: ravi.bangoria@amd.com
Cc: eranian@google.com
Link: https://lore.kernel.org/r/e036e3c9fb962c939fa06c855b68e532ee609e01.1729242778.git.sandipan.das@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Perf test case 84 'perf pipe recording and injection test'
sometimes fails on s390, especially on z/VM virtual machines.
This is caused by the very short run time of the workload
# perf test -w noploop
which runs for 1 second. Occasionally this is not long
enough and the perf report has no samples for symbol noploop.
Fix this by enlarging the runtime of the perf workload
to 3 seconds. This ensures the symbol noploop is always
present. Since only s390 is affected, make this
architecture dependent.
Output before:
Inject -b build-ids test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB - ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.277 MB - ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB - ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.160 MB
/tmp/perf.data.ELzRdq (4031 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB - ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB - ]
Inject -b build-ids test [Success]
Inject --buildid-all build-ids test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB - ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB - ]
Inject --buildid-all build-ids test [Failed - cannot find
noploop function in pipe #2]
Output after:
Successful execution more than 10 times in a loop.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Cc: agordeev@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Link: https://lore.kernel.org/r/20241018081732.1391060-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Like in the metricgroup tests, it should check the permission first and
then skip relevant failures accordingly.
Also it needs to try again with the system wide flag properly. On the
second round, check if the result has the metric name because other
failure cases are checked in the first round already.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241018204306.741972-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In Makefile.config, for unwinding, the name dwarf implies either
libunwind or libdw. Make it clearer that CONFIG_DWARF is really just
defined when libdw is present by renaming to CONFIG_LIBDW.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In Makefile.config, for unwinding, the name dwarf implies either
libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really
just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always
match HAVE_DWARF_SUPPORT remove the macros and use
HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF
then remove all ifs.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The test _ELFUTILS_PREREQ(0, 142) is false for elfutils before
2009-06-13, but that was 15 years ago, so meeting such an old version
is very unlikely. Add a test to
test-libdw.c and assume the libdw version is at least 0.142 to
simplify the build logic.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
dwarf_getcfi support in libdw is 15 years old. Make libdw imply
dwarf_getcfi support and simplify build logic.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
dwarf_getlocations support in libdw is more than 10 years old. Make
libdw imply dwarf_getlocations support and simplify build logic.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Support in libdw has been present for 10 years so let's simplify the
build logic with a single feature test.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Make it clearer that the dwarf feature test is actually testing
for libdw support.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Previously NO_DWARF_UNWIND was part of conditional compilation but it
is now unused, so remove it.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
NO_DWARF could mean more than NO_LIBDW support, in particular no
libunwind support. Rename to be more intention revealing.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Testing with a LIBDW_DIR showed that in Makefile.config the dwarf
feature tests need the LIBDW_DIR setting in the CFLAGS/LDFLAGS.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241017001354.56973-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now that the attr tests are shell tests, move the associated python and
configuration files. Update the installation build rules for the new
directories. Recycle the lib install rules for python files allowing
the explicit attr.py install line to be dropped.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241015000158.871828-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Remove the C wrapper now that a shell script wrapper exists. Move
perf_event_attr dumping functions to evsel.c and reduce the scope of
variables/defines. Use fprintf to avoid snprintf complexities in
WRITE_ASS.
Add __SANE_USERSPACE_TYPES__ to evsel.c to fix format flag issues on
PowerPC triggered by moving attr.c functions to evsel.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241015000158.871828-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The "Setup struct perf_event_attr" test in attr.c does a bunch of
directory finding to set up running a python test that in general is
more brittle than similar logic we have in shell tests. Add a shell
test that invokes and runs the tests in the python attr.py script.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241015000158.871828-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
An issue can be observed when probing a C++ demangled symbol with these steps:
# nm test_cpp_mangle | grep print_data
0000000000000c94 t _GLOBAL__sub_I__Z10print_datai
0000000000000afc T _Z10print_datai
0000000000000b38 T _Z10print_dataR5Point
# perf probe -x /home/niayan01/test_cpp_mangle -F --demangle
...
print_data(Point&)
print_data(int)
...
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)"
probe-definition(0): test=print_data(int)
symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(int) address found : afc
Matched function: print_data [2ccf]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xb38
...
When trying to probe the symbol "print_data(int)", the log shows:
Symbol print_data(int) address found : afc
The found address is 0xafc, which matches the nm output above.
However, when the event is written, the command uses offset
0xb38 (see the last log line), which is the wrong address.
dwarf_diename() returns the common function name; in the above case it
returns the string "print_data". As a result, the tool resolves the offset
based on the common name. This leads to probing the wrong symbol
"print_data(Point&)".
To fix the issue, use the die_get_linkage_name() function to retrieve
the distinct linkage name, which is the mangled name in the C++ case.
Based on this unique name, the tool can get the correct offset for
probing. According to the DWARF documentation, the linkage name may be
missing from the DIE; in that case, fall back to dwarf_diename().
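The fallback order can be illustrated with a small self-contained model. This is a sketch and not the perf implementation: struct sketch_die and sketch_die_lookup_name() are hypothetical stand-ins for a DWARF DIE and for the name lookup, used so that the example does not depend on libdw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two names a DWARF DIE may carry. */
struct sketch_die {
	const char *linkage_name; /* mangled name, NULL when the attribute is absent */
	const char *name;         /* common name, e.g. "print_data" */
};

/* Prefer the unique linkage name; fall back to the common name. */
const char *sketch_die_lookup_name(const struct sketch_die *die)
{
	assert(die != NULL);
	return die->linkage_name ? die->linkage_name : die->name;
}
```

With this order, the two print_data overloads resolve to _Z10print_datai and _Z10print_dataR5Point, while a DIE without a linkage name still resolves through its common name.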
After:
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)"
probe-definition(0): test=print_data(int)
symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(int) address found : afc
Matched function: print_data [2d06]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xafc
Added new event:
probe_test_cpp_mangle:test (on print_data(int) in /home/niayan01/test_cpp_mangle)
You can now use it in all perf tools, such as:
perf record -e probe_test_cpp_mangle:test -aR sleep 1
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test2=print_data(Point&)"
probe-definition(0): test2=print_data(Point&)
symbol:print_data(Point&) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(Point&) address found : b38
Matched function: print_data [2ccf]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Parsing probe_events: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0x0000000000000afc
Group:probe_test_cpp_mangle Event:test probe:p
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test2 /home/niayan01/test_cpp_mangle:0xb38
Added new event:
probe_test_cpp_mangle:test2 (on print_data(Point&) in /home/niayan01/test_cpp_mangle)
You can now use it in all perf tools, such as:
perf record -e probe_test_cpp_mangle:test2 -aR sleep 1
Fixes: fb1587d869 ("perf probe: List probes with line number and file name")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20241012141432.877894-1-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
These modes don't use the threshold, so don't compute it, saving time
and potentially reducing events.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Colors are meaningless in CSV and JSON output, so switch to a threshold
enum value that the standard output can convert to a color. Updating
the CSV and JSON output is left to later changes.
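A minimal sketch of the idea, assuming a four-value classification: the enum and function names below are hypothetical and not taken from perf, and the particular color chosen for each value is also an assumption. Only the human-readable printer needs the mapping to an ANSI escape sequence.

```c
/* Hypothetical classification; the names are not taken from perf. */
enum sketch_threshold {
	SKETCH_THRESHOLD_UNKNOWN,
	SKETCH_THRESHOLD_GOOD,
	SKETCH_THRESHOLD_NEARLY_BAD,
	SKETCH_THRESHOLD_BAD,
};

/* Convert the classification to an ANSI color for standard output. */
const char *sketch_threshold_color(enum sketch_threshold t)
{
	switch (t) {
	case SKETCH_THRESHOLD_GOOD:
		return "\033[32m"; /* green */
	case SKETCH_THRESHOLD_NEARLY_BAD:
		return "\033[33m"; /* yellow */
	case SKETCH_THRESHOLD_BAD:
		return "\033[31m"; /* red */
	default:
		return "";         /* unknown: no color */
	}
}
```

Machine-readable printers can then receive the enum value itself instead of an escape sequence that has no meaning in their format.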
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Avoid cases like:
```
$ perf stat -a -M topdownl1 -j -I 1000
...
{"interval" : 11.127757275, "counter-value" : "85715898.000000", "unit" : "", "event" : "IDQ.MITE_UOPS", "event-runtime" : 988376123, "pcnt-running" : 100.00, "metric-value" : "0.000000", "metric-unit" : "(null)"}
...
```
If there is no unit then drop the metric-value too.
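The rule can be sketched as a small helper that emits both metric fields or neither. The helper name sketch_json_metric() is hypothetical and this is not the stat-display code; the field names and spacing follow the record quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes the metric fields of one JSON record, or
 * nothing at all when there is no unit. Returns the number of characters
 * that were (or would have been) written.
 */
int sketch_json_metric(char *out, size_t size, const char *unit, double val)
{
	if (unit == NULL || strlen(unit) == 0) {
		/* No unit: drop metric-value together with metric-unit. */
		if (size > 0)
			out[0] = '\0';
		return 0;
	}
	return snprintf(out, size,
			"\"metric-value\" : \"%f\", \"metric-unit\" : \"%s\"",
			val, unit);
}
```

Checking the unit before formatting also avoids passing a NULL pointer to "%s", which is how the "(null)" string shown above ends up in the output.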
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Return earlier for an empty unit case. If snprintf of the fmt doesn't
produce digits between vals and ends, as happens with NaN, make the
value "none" as happens in print_metric_end.
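A simplified sketch of the NaN handling follows. It is not the perf code: the helper name and the fixed "%.2f" format are assumptions, and the digit check uses strpbrk() instead of scanning between two pointers into the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; "%.2f" is an assumed format, not perf's. */
void sketch_format_metric(char *out, size_t size, double val)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "%.2f", val);
	/* NaN prints as "nan" or "-nan", so no digit appears in buf. */
	if (strpbrk(buf, "0123456789") == NULL)
		snprintf(out, size, "none");
	else
		snprintf(out, size, "%s", buf);
}
```

An infinite value is formatted without digits as well, so in this sketch it is also reported as "none".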
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The print_metric parameter names were rearranged; fix them and add
comments in the stat-shadow callers to ensure they are correct.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add printf format checking to vararg printf routines in
color.h. Resolve build errors/bugs that are found through this
checking.
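For illustration, this is how format checking is attached to a vararg routine with GCC or Clang. The helper sketch_color_snprintf() is hypothetical and is not one of the color.h functions; only the format attribute itself is the technique being shown.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the color.h API. The attribute says that
 * argument 4 is a printf format string and that the variadic arguments
 * checked against it start at argument 5.
 */
int sketch_color_snprintf(char *buf, size_t size, const char *color,
			  const char *fmt, ...)
	__attribute__((format(printf, 4, 5)));

int sketch_color_snprintf(char *buf, size_t size, const char *color,
			  const char *fmt, ...)
{
	char text[128];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(text, sizeof(text), fmt, ap);
	va_end(ap);
	/* Wrap the formatted text in the color and the ANSI reset sequence. */
	return snprintf(buf, size, "%s%s\033[0m", color, text);
}
```

With the attribute in place, a call that passes, say, a long for a "%d" conversion is diagnosed at compile time under -Wformat, which is the kind of bug this checking is meant to surface.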
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241017175356.783793-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add missing dwarf_cfi_end to free memory associated with probe_finder
cfi_eh which is allocated and owned via a call to
dwarf_getcfi_elf. Confusingly cfi_dbg shouldn't be freed as its memory
is owned by the passed in debuginfo struct. Add comments to highlight
this.
This addresses leak sanitizer issues seen in:
tools/perf/tests/shell/test_uprobe_from_different_cu.sh
Fixes: 270bde1e76 ("perf probe: Search both .eh_frame and .debug_frame sections for probe location")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241016235622.52166-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The insn argument passed to cs_disasm needs freeing. To support
keep an accurate count for the free, add an additional free_count variable.
Fixes: c5d60de181 ("perf annotate: Add support to use libcapstone in powerpc")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20241016235622.52166-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf list picks the events supported for specific platform
from pmu-events/arch/powerpc/<platform>. Example power10 events
are in pmu-events/arch/powerpc/power10, power9 events are part
of pmu-events/arch/powerpc/power9. The decision of which
platform to pick is determined based on PVR value in powerpc.
The PVR value is matched from pmu-events/arch/powerpc/mapfile.csv
Example:
Format:
PVR,Version,JSON/file/pathname,Type
0x004[bcd][[:xdigit:]]{4},1,power8,core
0x0066[[:xdigit:]]{4},1,power8,core
0x004e[[:xdigit:]]{4},1,power9,core
0x0080[[:xdigit:]]{4},1,power10,core
0x0082[[:xdigit:]]{4},1,power10,core
The code gets the PVR from system using get_cpuid_str function
in arch/powerpc/util/headers.c ( from SPRN_PVR ) and compares
with value from mapfile.csv
In compat mode, say when a partition is booted in power9 mode on a
power10 system, this picks the wrong events, because the PVR points
to power10 whereas the events should come from the power9
folder. To support generic events, add a new folder
pmu-events/arch/powerpc/compat to contain the ISA architected events
which are supported in compat mode. Also return 0x00ffffff as the PVR
when booted in compat mode. Based on this PVR value, the json lookup
will pick events from pmu-events/arch/powerpc/compat
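The lookup described above can be sketched as a regex match of the PVR string against the mapfile rows. This is only an illustration: the struct and function names are hypothetical, the PVR is assumed to be formatted as "0x" followed by eight hex digits, and the compat row is an assumed entry rather than a quote from mapfile.csv.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One mapfile-style row: PVR regex and the event directory it selects. */
struct pvr_map_sketch {
	const char *pvr_regex;
	const char *event_dir;
};

static const struct pvr_map_sketch pvr_map[] = {
	{ "0x004[bcd][[:xdigit:]]{4}", "power8" },
	{ "0x0066[[:xdigit:]]{4}",     "power8" },
	{ "0x004e[[:xdigit:]]{4}",     "power9" },
	{ "0x0080[[:xdigit:]]{4}",     "power10" },
	{ "0x0082[[:xdigit:]]{4}",     "power10" },
	{ "0x00ffffff",                "compat" },	/* assumed compat entry */
};

/* Return the directory whose regex matches the whole PVR string, or NULL. */
static const char *pvr_to_event_dir(const char *pvr)
{
	size_t i;

	for (i = 0; i < sizeof(pvr_map) / sizeof(pvr_map[0]); i++) {
		regex_t re;
		regmatch_t m;
		int matched;

		if (regcomp(&re, pvr_map[i].pvr_regex, REG_EXTENDED))
			continue;
		/* Accept only a match that covers the entire string. */
		matched = !regexec(&re, pvr, 1, &m, 0) &&
			  m.rm_so == 0 && (size_t)m.rm_eo == strlen(pvr);
		regfree(&re);
		if (matched)
			return pvr_map[i].event_dir;
	}
	return NULL;
}
```

In this sketch a partition that reports 0x00ffffff resolves to the compat directory, while a native power10 PVR such as 0x00800200 still resolves to power10.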
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241010145107.51211-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf list picks the events supported for specific platform
from pmu-events/arch/powerpc/<platform>. Example power10 events
are in pmu-events/arch/powerpc/power10, power9 events are part
of pmu-events/arch/powerpc/power9. The decision of which
platform to pick is determined based on PVR value in powerpc.
The PVR value is matched from pmu-events/arch/powerpc/mapfile.csv
Example:
Format:
PVR,Version,JSON/file/pathname,Type
0x004[bcd][[:xdigit:]]{4},1,power8,core
0x0066[[:xdigit:]]{4},1,power8,core
0x004e[[:xdigit:]]{4},1,power9,core
0x0080[[:xdigit:]]{4},1,power10,core
0x0082[[:xdigit:]]{4},1,power10,core
The code gets the PVR from system using get_cpuid_str function
in arch/powerpc/util/headers.c ( from SPRN_PVR ) and compares
with value from mapfile.csv
In compat mode, say when a partition is booted in power9
mode on a power10 system, add an entry to pick the
ISA architected events from "pmu-events/arch/powerpc/compat".
Add a json file generic-events.json which will contain these
events, which are supported in compat mode.
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241010145107.51211-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
During the rework of the dso structure in patch ee756ef749 an
increment was forgotten for the symtab_type in case the data for
the kernel module are compressed. This affects the probing of the
kernel modules, which fails if the data are not already cached.
Increment the value of the symtab_type to its compressed variant so the
data could be recovered successfully.
Fixes: ee756ef749 ("perf dso: Add reference count checking and accessor functions")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20241010144836.16424-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If a symbol name is longer than the maximum event length (64 bytes), the
perf tool reports error:
# perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)"
snprintf() failed: -7; the event name nbase='this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long
Error: Failed to add events.
The log does not mention that the symbol name and the event name can be
set separately, which is the recommended way to add a probe for a long
symbol.
This commit refines the log with a reminder of the event syntax.
After:
# perf probe -x test_cpp_mangle --add "this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)"
snprintf() failed: -7; the event name 'this_is_a_very_very_long_print_data_abcdefghijklmnopqrstuvwxyz(int)' is too long
Hint: Set a shorter event with syntax "EVENT=PROBEDEF"
EVENT: Event name (max length: 64 bytes).
Error: Failed to add events.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20241012204725.928794-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In the kernel, the probe group string length is limited up to
MAX_EVENT_NAME_LEN (including the NULL terminator).
Check for this limitation and report an error if it is exceeded.
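A minimal sketch of such a check follows. The function name and the choice of -E2BIG are hypothetical, and the value 64 is assumed to mirror the kernel definition.

```c
#include <errno.h>
#include <string.h>

/* Assumed to mirror the kernel definition. */
#define MAX_EVENT_NAME_LEN 64

/*
 * Hypothetical validator: the limit includes the terminating NUL, so at
 * most MAX_EVENT_NAME_LEN - 1 visible characters are accepted.
 */
static int check_group_name_len(const char *group)
{
	if (strlen(group) >= MAX_EVENT_NAME_LEN)
		return -E2BIG;
	return 0;
}
```

A 63-character group name passes this check, while a 64-character one is rejected because no room is left for the terminator.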
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20241012204725.928794-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The MAX_EVENT_NAME_LEN macro has been defined in the kernel. Use the
same definition in the tool for better readability.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20241012204725.928794-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On my system, perf list is very slow to print all the events. I think
there's a performance issue in SDT and uprobes event listing. I noticed
this issue while running perf test on x86, where it takes a long time to
check some CoreSight event which should be skipped quickly.
Anyway, some tests use perf list to check whether the required event is
available before running the test. The perf list command can take an
argument to specify an event class or a (glob) pattern. But the glob
pattern only suppresses output for unmatched events after all events
have been checked.
In this case, specifying the event class is better as it reduces the
number of events checked and avoids buggy subsystems entirely.
No functional changes intended.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20241016065654.269994-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When adding an explicit beautifier for the 'write' syscall when the BPF
based buffer collector was introduced there was a cut'n'paste error that
carried the syscall_fmt->errpid setting from a nearby syscall (waitid)
that returns a pid.
So the write return was being suppressed by the return pretty printer,
remove that field, reverting it back to the default return handler, that
prints positive numbers as-is and interprets negative values as errnos.
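The behaviour of such a default return handler can be sketched as below. This is a hypothetical formatter, not the perf trace implementation, and the exact output layout is an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: non-negative values are printed as-is, negative
 * values are treated as -errno and printed with their description.
 */
static int format_syscall_ret(char *buf, size_t size, long ret)
{
	if (ret < 0)
		return snprintf(buf, size, "-1 (%s)", strerror((int)-ret));
	return snprintf(buf, size, "%ld", ret);
}
```

For a write that returns the number of bytes written, this sketch prints the count unchanged, which is the behaviour the fix restores.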
I actually introduced the problem while making Howard's original patch
work just with the 'write' syscall, as we couldn't just look for any
buffers, the ones that are filled in by the kernel couldn't use the same
sys_enter BPF collector.
Fixes: b257fac12f ("perf trace: Pretty print buffer data")
Reported-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/lkml/bcf50648-3c7e-4513-8717-0d14492c53b9@linaro.org
Link: https://lore.kernel.org/all/Zt8jTfzDYgBPvFCd@x1/#t
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leverage the existing function perf_pmu__name_from_config() to check if
an event is a topdown metrics event. perf_pmu__name_from_config() goes
through the defined formats and figures out the config of pre-defined
topdown events.
This avoids figuring out the config of pre-defined topdown events with
hard-coded format strings "event=" and "umask=" and provides more
flexibility.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241011110207.1032235-2-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The commit "3b5edc0421e2 (perf x86/topdown: Don't move topdown metric
events in group)" modifies the topdown metrics comparator to move topdown
metrics events which are not in the same group as the previous event. But
it only modifies the 2nd comparator, which makes the comparators
asymmetric.
Thus modify the 1st topdown metrics comparator to make the two
comparators symmetric, and refine the comments as well.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241011110207.1032235-1-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add Cortex-A720, Cortex-A725, Cortex-X1C, Cortex-X3 and Cortex-X925 into
the common data source encoding list. For each of these CPUs, its
technical reference manual defines the data source packet as the common
encoding format.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-8-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add Neoverse-V2 MIDR to the common data source encoding range list.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The 'midr' field is replaced by the MIDR values stored in metadata (per
CPU). Remove the 'midr' field as it is no longer used.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use the info in the metadata to decide if the data source feature is
supported. The CPU MIDR must be in the CPU list for the common data
source encoding.
Metadata version 1 doesn't include MIDR info. In this case, as there
is no info to base the decision on, print a warning reminding users to
upgrade the tool and return false.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Introduce the arm_spe__is_homogeneous() function, which checks if
Arm SPE is homogeneous across all CPUs.
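One way such a check could look is sketched below, assuming homogeneity is judged by comparing the per-CPU MIDR values. The struct layout and function name are hypothetical and do not reflect the real metadata format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU metadata record; only the MIDR is modelled. */
struct spe_cpu_meta_sketch {
	uint64_t midr;
};

/* True when every CPU reports the same MIDR as the first one. */
static bool spe_is_homogeneous_sketch(const struct spe_cpu_meta_sketch *meta,
				      size_t nr_cpu)
{
	size_t i;

	for (i = 1; i < nr_cpu; i++) {
		if (meta[i].midr != meta[0].midr)
			return false;
	}
	return true;
}
```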
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The Neoverse CPUs follow the common data source encoding, and other
CPU variants can share the same format.
Rename the CPU list and data source definitions as common data source
names. This change prepares for appending more CPU variants.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The arm_spe__synth_data_source_generic() function is invoked when the
tool detects that CPUs do not support data source packets and falls back
to synthesizing only the memory level.
Rename it to arm_spe__synth_memory_level() for better reflecting its
purpose.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
As Ian Rogers <irogers@google.com> pointed out, intel-cqm.c is neither
used nor built. It was deleted in the following commit:
commit b24413180f ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license")
However, it resurfaced soon after in the following commit:
commit 5c9295bfe6 ("perf tests: Remove Intel CQM perf test")
It should be deleted once and for all.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Matt Fleming <mfleming@cloudflare.com>
Link: https://lore.kernel.org/r/20241011055700.4142694-1-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
It should not clear the inherit bit simply because the kernel doesn't
support the sample read with it. IOW the inherit bit should be kept
when the sample read is not requested for the event.
Fixes: 90035d3cd8 ("tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events")
Acked-by: Ben Gainey <ben.gainey@arm.com>
Link: https://lore.kernel.org/r/20241009062250.730192-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pre-migration wait time is the time that a task unnecessarily spends
on the runqueue of a CPU but doesn't get switched-in there. In terms
of tracepoints, it is the time between sched:sched_wakeup and
sched:sched_migrate_task.
Let's say a task woke up on CPU2, then it got migrated to CPU4 and
then it's switched-in to CPU4. So, here pre-migration wait time is
time that it was waiting on runqueue of CPU2 after it is woken up.
The general pattern for pre-migration to occur is:
sched:sched_wakeup
sched:sched_migrate_task
sched:sched_switch
The sched:sched_waking event is used to capture the wakeup time,
as it aligns with the existing code and only introduces a negligible
time difference.
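The definition above amounts to a timestamp difference, which can be sketched as follows. The struct and function names are hypothetical, timestamps are assumed to be in nanoseconds, and a migrate timestamp of 0 is used here to mean that no migration happened before switch-in.

```c
#include <stdint.h>

/* Timestamps for one wakeup -> migrate -> switch-in sequence. */
struct premig_sample_sketch {
	uint64_t wakeup_ns;	/* from sched:sched_waking */
	uint64_t migrate_ns;	/* from sched:sched_migrate_task, 0 if none */
};

/* Time spent on the first CPU's runqueue before being migrated away. */
static uint64_t pre_migration_wait_ns(const struct premig_sample_sketch *s)
{
	if (!s->migrate_ns || s->migrate_ns < s->wakeup_ns)
		return 0;
	return s->migrate_ns - s->wakeup_ns;
}
```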
Pre-migrations are generally not useful and they increase migrations.
This metric would be helpful in testing patches mainly related to wakeup
and load-balancer code paths, as better wakeup logic would choose an
optimal CPU where the task would be switched in, thereby reducing
pre-migrations.
The sample output(s) when -P or --pre-migrations is used:
=================
           time    cpu task name                      wait time sch delay  run time pre-mig time
                       [tid/pid]                         (msec)    (msec)    (msec)    (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
   38456.720806 [0001] schbench[28634/28574]              4.917     4.768     1.004     0.000
   38456.720810 [0001] rcu_preempt[18]                    3.919     0.003     0.004     0.000
   38456.721800 [0006] schbench[28779/28574]             23.465    23.465     1.999     0.000
   38456.722800 [0002] schbench[28773/28574]             60.371    60.237     3.955    60.197
   38456.722806 [0001] schbench[28634/28574]              0.004     0.004     1.996     0.000
   38456.722811 [0001] rcu_preempt[18]                    1.996     0.005     0.005     0.000
   38456.723800 [0000] schbench[28833/28574]              4.000     4.000     3.999     0.000
   38456.723800 [0004] schbench[28762/28574]             42.951    42.839     3.999    39.867
   38456.723802 [0007] schbench[28812/28574]             43.947    43.817     3.999    40.866
   38456.723804 [0001] schbench[28587/28574]              7.935     7.822     0.993     0.000
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The hashmap__for_each_entry[_safe] macros access 'map' as if it's a
pointer. But they do so without parentheses, so passing a static hash map
with an ampersand (like &slab_hash below) caused compiler errors due
to unmatched types.
In file included from util/bpf_lock_contention.c:5:
util/bpf_lock_contention.c: In function ‘exit_slab_cache_iter’:
linux/tools/perf/util/hashmap.h:169:32: error: invalid type argument of ‘->’ (have ‘struct hashmap’)
169 | for (bkt = 0; bkt < map->cap; bkt++) \
| ^~
util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’
105 | hashmap__for_each_entry(&slab_hash, cur, bkt)
| ^~~~~~~~~~~~~~~~~~~~~~~
/home/namhyung/project/linux/tools/perf/util/hashmap.h:170:31: error: invalid type argument of ‘->’ (have ‘struct hashmap’)
170 | for (cur = map->buckets[bkt]; cur; cur = cur->next)
| ^~
util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’
105 | hashmap__for_each_entry(&slab_hash, cur, bkt)
| ^~~~~~~~~~~~~~~~~~~~~~~
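The precedence problem behind these errors can be reproduced with a much smaller macro. The sketch below uses a hypothetical struct and macro names rather than the real hashmap.h definitions.

```c
#include <stddef.h>

/* Minimal stand-in for a hash map; only the field the macro touches. */
struct map_sketch {
	size_t cap;
};

/*
 * Broken form, kept as a comment: with an argument of &m it expands to
 * &m->cap, which parses as &(m->cap) and fails because m is not a pointer.
 *
 *   #define MAP_CAP_UNSAFE(map) map->cap
 */

/* Fixed form: parenthesising the argument yields (&m)->cap. */
#define MAP_CAP(map) ((map)->cap)

static size_t static_map_cap(void)
{
	static struct map_sketch m = { .cap = 8 };

	return MAP_CAP(&m);
}
```

Wrapping every use of a macro parameter in parentheses keeps the expansion correct regardless of the operators in the caller's expression.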
Cc: bpf@vger.kernel.org
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241009202009.884884-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmcMPK0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGw5kH/0GukMc4uUytezog
7UxIxa0G1zvwJwAhJpNCZR90e8GQ7YCvJFUOxjX3bVqjxZlCjEJ3YWC3fZNdx8YS
fOjbuZlGiTmyKg91aVYlby5l23N+r2u6gCDBdPfJD0japiIbayBKjrL+hdEicmf3
w6qToMY20mdvRQ6SUd+Y9nrc//TONru4EhabqRU2Sf1sDzQd1qj4WPtDLSKp3YG9
hpFR7YeJaSYDjwRz1vF8tEnQVJ4I2Df3lXJZYsoSsqiQhQ1Lasp4a09ppVPysj6x
oQCza6xeR3jwKib23pZIbNAF4xPMdN1OMOiYELkgHo7YGc6kxniXqSVSrP3LAvkA
b92bQpc=
=T5hJ
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc3' into perf-tools-next
To get the fixes in the current perf-tools tree.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
util/tool_pmu.c: In function 'evsel__tool_pmu_read':
util/tool_pmu.c:419:55: error: passing argument 2 of 'tool_pmu__read_event' from incompatible pointer type [-Werror=incompatible-pointer-types]
419 | if (!tool_pmu__read_event(ev, &val)) {
| ^~~~
| |
| long unsigned int *
util/tool_pmu.c:335:56: note: expected 'u64 *' {aka 'long long unsigned int *'} but argument is of type 'long unsigned int *'
335 | bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result)
| ~~~~~^~~~~~
Link: https://lore.kernel.org/r/Zw1XIGML32VaxE0t@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The testcase for tool_pmu failed in powerpc as below:
./perf test -v "Parsing without PMU name"
8: Tool PMU :
8.1: Parsing without PMU name : FAILED!
This happens when parse_events results in either a skip or a failure
of an event, because the code invokes evlist__delete(evlist)
and then does "goto out".
ret = parse_events(evlist, str, &err);
if (ret) {
evlist__delete(evlist);
But evlist__delete is also called in the "out" section.
out:
evlist__delete(evlist);
return ret;
Hence remove the duplicate evlist__delete from the first path
in the testcase.
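The resulting single-cleanup pattern can be sketched as below. All names are hypothetical stand-ins for the evlist code, and the counter exists only so the sketch can be checked.

```c
#include <stdlib.h>

static int nr_deletes;	/* counts cleanup calls for checking the sketch */

static void list_delete_sketch(int *list)
{
	nr_deletes++;
	free(list);
}

/* parse_ret stands in for the parse_events() result; 0 means success. */
static int run_test_sketch(int parse_ret)
{
	int *list = malloc(sizeof(*list));
	int ret = -1;

	if (!list)
		return ret;

	ret = parse_ret;
	if (ret)
		goto out;	/* no delete here: "out" owns the cleanup */

	*list = 1;		/* the rest of the test would go here */
out:
	list_delete_sketch(list);
	return ret;
}
```

Each call frees the list exactly once on both the success and the failure path, which avoids the double delete described above.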
With the change:
# ./perf test -v "Parsing without PMU name"
8: Tool PMU :
8.1: Parsing without PMU name : Ok
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241013170732.71339-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf fails to compile on systems with GCC version 11
as below:
In file included from /usr/include/string.h:519,
from /home/athir/perf-tools-next/tools/include/linux/bitmap.h:5,
from /home/athir/perf-tools-next/tools/perf/util/pmu.h:5,
from /home/athir/perf-tools-next/tools/perf/util/evsel.h:14,
from /home/athir/perf-tools-next/tools/perf/util/evlist.h:14,
from tests/tool_pmu.c:3:
In function ‘strncpy’,
inlined from ‘do_test’ at tests/tool_pmu.c:25:3:
/usr/include/bits/string_fortified.h:95:10: error: ‘__builtin_strncpy’ specified bound 128 equals destination size [-Werror=stringop-truncation]
95 | return __builtin___strncpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
96 | __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~
The compile error is from the strncpy reference in do_test:
strncpy(str, tool_pmu__event_to_str(ev), sizeof(str));
This behaviour is not observed with GCC version 8, but is observed
with GCC version 11. This is a message from gcc for detecting
truncation while using strncpy. Use snprintf instead of strncpy
here to be safe.
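A minimal sketch of the replacement pattern follows. The helper name is hypothetical, and the real test simply formats into its local buffer.

```c
#include <stdio.h>

/*
 * Copy src into a fixed-size buffer. Unlike strncpy(dst, src, size),
 * snprintf() always NUL-terminates when size > 0, and its return value
 * tells the caller whether the string was truncated.
 */
static int copy_name_sketch(char *dst, size_t size, const char *src)
{
	int len = snprintf(dst, size, "%s", src);

	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

With a 4-byte buffer, copying "abc" succeeds, while copying "abcdef" reports truncation and still leaves a terminated "abc" in the buffer.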
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241013173742.71882-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The original commit message:
"
Use current sort mechanism but the real .se_cmp() just returns 0 so
that new columns "Predicted", "Abort" and "Cycles" are created in display
but actually these keys are not the sort keys.
For example:
Overhead Source:Line Symbol Shared Object Predicted Abort Cycles
........ ............ ........ ............. ......... ..... ......
38.25% div.c:45 [.] main div 97.6% 0 3
"
Update missed commit from series "perf report: Show branch flags/cycles
in --branch-history callgraph view" to apply to current repository so that
new columns described above are visible.
Link to original series:
https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ensure parsing with and without PMU creates events with the expected
config values. This ensures the tool.json doesn't get out of sync with
tool_pmu_event enum.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use the regular PMU approaches with tool json events to reduce the
amount of special tool_pmu code - tool_pmu__config_terms and
tool_pmu__for_each_event_cb are removed. Some functions remain, like
tool_pmu__str_to_event, as conveniences to metricgroups. Add
tool_pmu__skip_event/tool_pmu__num_skip_events to handle the case that
tool json events shouldn't appear on certain architectures. This isn't
done in jevents.py due to complexity in the empty-pmu-events.c and
when all vendor json is built into the tool.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Introduce the notion of a common architecture/model that can be used
to find event tables for common PMUs like the tool PMU. By having tool
events be json standard PMU attribute configuration, descriptions,
etc. can be used and these routines are already optimized for things
like binary searching.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add the expr literals like "#smt_on" as tool events, this allows stat
events to give the values. On my laptop with hyperthreading enabled:
```
$ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true
Performance counter stats for 'true':
0 has_pmem
8 num_cores
16 num_cpus
16 num_cpus_online
1 num_dies
1 num_packages
1 smt_on
2,496,000,000 system_tsc_freq
0.001113637 seconds time elapsed
0.001218000 seconds user
0.000000000 seconds sys
```
And with hyperthreading disabled:
```
$ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true
Performance counter stats for 'true':
0 has_pmem
8 num_cores
16 num_cpus
8 num_cpus_online
1 num_dies
1 num_packages
0 smt_on
2,496,000,000 system_tsc_freq
0.000802115 seconds time elapsed
0.000000000 seconds user
0.000806000 seconds sys
```
As zero matters for these values, in stat-display
should_skip_zero_counter only skip the zero value if it is not the
first aggregation index.
The tool event implementations are used in expr but not evaluated as
events for simplicity. Also core_wide isn't made a tool event as it
requires command line parameters.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now the events are associated with the tool PMU, rename the functions
to reflect this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To better reflect the events listed are from the tool PMU. Rename the
enum values from PERF_TOOL_* to TOOL_PMU__EVENT_*.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Rather than treat tool events as a special kind of event, create a
tool only PMU where the events/aliases match the existing
duration_time, user_time and system_time events. Remove special
parsing and printing support for the tool events, but add function
calls for when PMU functions are called on a tool_pmu.
Move the tool PMU code in evsel into tool_pmu.c to better encapsulate
the tool event behavior in that file.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Expose config_term_name as parse_events__term_type_str so that PMUs not
in pmu.c may access it.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Hard coded terms like "config=10" are skipped by perf_pmu__config
assuming they were already applied to a perf_event_attr by parse
event's config_attr function. When doing a reverse number to name
lookup in perf_pmu__name_from_config, as the hardcoded terms aren't
applied the config value is incorrect leading to misses or false
matches. Fix this by adding a parameter to have perf_pmu__config apply
hardcoded terms too (not just in parse event's config_term_common).
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use ifs rather than ?: to avoid a large compound statement.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
color_fwrite_lines() was added by 2009's commit
8fc0321f1a ("perf_counter tools: Add color terminal output support")
but has never been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241009003938.254936-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
add_perf_probe_events has been unused since 2015's commit
b02137cc65 ("perf probe: Move print logic into cmd_probe()")
which confusingly now uses perf_add_probe_events.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240929010659.430208-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces.
- Fix static build compilation error when libdw isn't installed or is too old.
- Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT.
- Add missing refcount put on 32-bit DSOs.
- Fix disassembly of user space binaries by setting the binary_type of DSO when
loading.
- Update headers with the kernel sources, including asound.h, sched.h, fcntl,
msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZwU2dgAKCRCyPKLppCJ+
J8uaAQDEbp0lMf1S/Y6vOGbnP6mGQCewQsXtIpSA4gcRMWlCCgD+O6ZxbnBCHOzn
nQfBmbT62qUGuUA38Mg7pCyRXBd8FgU=
=s4JZ
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU
traces
- Fix static build compilation error when libdw isn't installed or is
too old
- Add missing include when building with
!HAVE_DWARF_GETLOCATIONS_SUPPORT
- Add missing refcount put on 32-bit DSOs
- Fix disassembly of user space binaries by setting the binary_type of
DSO when loading
- Update headers with the kernel sources, including asound.h, sched.h,
fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's
cputype.h
* tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace
perf build: Fix build feature-dwarf_getlocations fail for old libdw
perf build: Fix static compilation error when libdw is not installed
perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT
tools headers arm64: Sync arm64's cputype.h with the kernel sources
perf tools: Cope with differences for lib/list_sort.c copy from the kernel
tools check_headers.sh: Add check variant that excludes some hunks
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers UAPI: Sync the linux/in.h with the kernel sources
perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools include UAPI: Sync linux/fcntl.h copy with the kernel sources
tools include UAPI: Sync linux/sched.h copy with the kernel sources
tools include UAPI: Sync sound/asound.h copy with the kernel sources
perf vdso: Missed put on 32-bit dsos
perf symbol: Set binary_type of dso when loading
With the patch 0b6c5371c0 "Add missing topdown metrics events" eight
topdown metric events with numbers ranging from 0x8000 to 0x8700 were
added to the test since they were added as 'perf stat' default events.
Later the patch 951efb9976 "Update no event/metric expectations" kept
only 4 of those events (0x8000-0x8300).
Currently, the topdown events with numbers 0x8400 to 0x8700 are missing
from the list of expected events resulting in a failure. Add back the
missing topdown events.
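As a rough illustration of which event numbers are affected, the sketch
below enumerates the range. It assumes the eight events are evenly spaced
by 0x100, which is inferred from "eight events from 0x8000 to 0x8700" and
is not stated explicitly above; the helper name is hypothetical.

```python
# Assumption: the eight topdown metric events are spaced 0x100 apart,
# inferred from "eight events ranging from 0x8000 to 0x8700".
ALL_TOPDOWN = list(range(0x8000, 0x8700 + 1, 0x100))
KEPT = list(range(0x8000, 0x8300 + 1, 0x100))


def missing_events(all_events, kept):
    """Return the events that are expected but no longer listed."""
    kept_set = set(kept)
    return [event for event in all_events if event not in kept_set]


MISSING = missing_events(ALL_TOPDOWN, KEPT)
print([hex(event) for event in MISSING])
```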
Fixes: 951efb9976 ("perf test attr: Update no event/metric expectations")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: mpetlan@redhat.com
Link: https://lore.kernel.org/r/20240311081611.7835-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This commit dumps metadata with version 2. It dumps metadata for header
and per CPU data respectively in the arm_spe_print_info() function to
support metadata version 2 format.
After:
0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :13
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :1
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :2
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :3
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :4
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :5
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :6
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :7
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
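The dump above can be cross-checked with a little arithmetic. This is a
sketch only: it assumes every field is a u64, that the header has the four
fields listed (matching "Header size :4"), and that each CPU block has
three fixed fields plus "Num of params" parameters. The function name is
hypothetical.

```python
U64 = 8                    # assumed size of every metadata field, in bytes
HEADER_FIELDS = 4          # version, header size, PMU type v2, CPU number
PER_CPU_FIXED_FIELDS = 3   # magic, CPU #, number of params


def metadata_size(nr_cpus, nr_params):
    """Size in bytes of a header followed by nr_cpus per-CPU blocks."""
    per_cpu = PER_CPU_FIXED_FIELDS + nr_params
    return (HEADER_FIELDS + nr_cpus * per_cpu) * U64


SIZE = metadata_size(nr_cpus=8, nr_params=3)
print(hex(SIZE))
```

Under these assumptions the metadata is 16 bytes smaller than the 0x1b0
record size shown in the dump, which would be consistent with a small
record header preceding it. That split is a reading of the numbers, not
something the commit message states.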
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Support metadata version 2 while remaining backward compatible with the
version 1 format.
Metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field.
Because version 1 is fixed at two u64 fields, checking the metadata
size distinguishes version 1 from version 2 (and from any later
versions). For version 2, read out the CPU number and retrieve the
metadata info for every CPU.
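The size check can be sketched as follows. This is not the perf code: it
assumes little-endian u64 fields and that the version is the first field
of the newer header, and the function name is hypothetical.

```python
import struct

V1_SIZE = 2 * 8  # version 1 is fixed at two u64 fields


def metadata_version(raw):
    """Return 1 for the fixed-size legacy layout, else the stored version.

    Assumption: newer layouts store the version in their first u64.
    """
    if len(raw) == V1_SIZE:
        return 1
    (version,) = struct.unpack_from("<Q", raw, 0)
    return version


legacy = struct.pack("<QQ", 13, 0)
newer = struct.pack("<QQQQ", 2, 4, 13, 8)
print(metadata_version(legacy), metadata_version(newer))
```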
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Save the Arm SPE information on a per-CPU basis. This approach is easier
in the decoding phase for retrieving metadata based on the CPU number of
every Arm SPE record.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The metadata is designed to contain a header and per CPU information.
The arm_spe_find_cpus() function is introduced to identify how many CPUs
support ARM SPE. The metadata size is then calculated from the number of
CPUs.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The first version's metadata header structure doesn't include a field to
indicate a header version, which is not friendly for extension.
Define the metadata version 2 format with a new header structure and
extend per CPU's metadata. In the meantime, the old metadata header will
still be supported for backward compatibility.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
There is a difference between the SYNOPSIS section of the help message
and the man page (tools/perf/Documentation/perf-list.txt) for the perf
list command. After checking, we found that the help message reflected
the latest specifications. Therefore, revise the SYNOPSIS section of
the man page to match the help message.
Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Liang
Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Test "Setup struct perf_event_attr" consists of multiple test cases that
can affect the max sample rate value for perf events. Some test cases
check this value as it should not be lowered under the set minimum for
the given test. Currently, it is possible for the test cases to affect
each other as the previous tests can lower the sample rate, leading to
a possible failure of some of the future test cases as the value is not
restored at any point.
# 10: Setup struct perf_event_attr:
--- start ---
test child forked, pid 104220
Using CPUID 0x00000000413fd0c1
running './tests/attr/test-record-C0'
Current sample rate: 10000
running './tests/attr/test-record-basic'
Current sample rate: 900
running './tests/attr/test-record-branch-any'
Current sample rate: 600
running './tests/attr/test-record-dummy-C0'
Current sample rate: 600
expected sample_period=4000, got 600
FAILED './tests/attr/test-record-dummy-C0' - match failure
Restore the max sample rate value for perf events to a reasonable value
before each test case if its value was lowered too much to ensure the
same conditions for each test case.
# 10: Setup struct perf_event_attr:
--- start ---
test child forked, pid 107222
Using CPUID 0x00000000413fd0c1
running './tests/attr/test-record-C0'
Current sample rate: 10000
running './tests/attr/test-record-basic'
Current sample rate: 800
running './tests/attr/test-record-branch-any'
Current sample rate: 700
unsupp './tests/attr/test-record-branch-any'
running './tests/attr/test-record-branch-filter-any'
Current sample rate: 10000
running './tests/attr/test-record-count'
Current sample rate: 10000
running './tests/attr/test-record-data'
Current sample rate: 600
running './tests/attr/test-record-dummy-C0'
Current sample rate: 800
running './tests/attr/test-record-freq'
Current sample rate: 10000
...
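The restore step can be sketched as below. The file path is passed in so
the helper can be tried on an ordinary file. The threshold and the
restored value are illustrative assumptions, not the values the test
uses, and the function name is hypothetical.

```python
def restore_sample_rate(path, minimum, restore_to):
    """Rewrite the rate stored in 'path' if it has dropped below 'minimum'.

    Returns the rate that is in effect after the call.
    """
    with open(path) as handle:
        current = int(handle.read().strip())
    if current >= minimum:
        return current
    with open(path, "w") as handle:
        handle.write(str(restore_to))
    return restore_to
```

Calling this before each test case means a rate lowered by an earlier
case no longer leaks into the next one.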
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241003125136.15918-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Command perf test 86 fails on s390:
# perf test -F 86
ping 868299 [007] 28248.013596: probe_libc:inet_pton_1: (3ff95948020)
3ff95948020 inet_pton+0x0 (inlined)
3ff9595e6e7 text_to_binary_address+0x1007 (inlined)
3ff9595e6e7 gaih_inet+0x1007 (inlined)
FAIL: expected backtrace entry \
"main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
got "3ff9595e6e7 gaih_inet+0x1007 (inlined)"
86: probe libc's inet_pton & backtrace it with ping : FAILED!
#
The root cause is a new stack layout: two functions have been added,
as seen below.
# perf script | tac | grep -m1 '^ping' -B9 | tac
ping 866856 [007] 25979.494921: probe_libc:inet_pton: (3ff8ec48020)
3ff8ec48020 inet_pton+0x0 (inlined)
new --> 3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined)
new --> 3ff8ec5e6e7 gaih_inet+0x1007 (inlined)
3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6)
2aa3fe04bf5 main+0xff5 (/usr/bin/ping)
3ff8eb34a5b __libc_start_call_main+0x8b (/usr/lib64/libc.so.6)
3ff8eb34b5d __libc_start_main@GLIBC_2.2+0xad (inlined)
2aa3fe06a1f [unknown] (/usr/bin/ping)
#
The new functions in the call chain are:
- text_to_binary_address()
- gaih_inet().
Both functions are inlined and do not show up in the output
of the nm command:
# nm -a /usr/lib64/libc.so.6 | \
grep -E '(text_to_binary_address|gaih_inet)$'
#
These two functions cannot be added conditionally, depending on their
existence in the C library.
Add text_to_binary_address() and gaih_inet() to the list of
expected functions in a compatible way and extend the regular
expression. On s390 the backtrace can now be:
Before After
probe_libc:inet_pton probe_libc:inet_pton
inet_pton inet_pton
getaddrinfo getaddrinfo | text_to_binary_address
main main | gaih_inet
Output after:
# perf test -F 86
86: probe libc's inet_pton & backtrace it with ping : Ok
#
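The alternation shown in the Before/After table can be sketched like this.
The patterns are simplified stand-ins for the ones in the shell test, the
leading probe line is left out, and all names are hypothetical.

```python
import re

ADDR = r"^[0-9a-f]+\s+"
OFFSET = r"\+0x[0-9a-f]+\s+\(.*\)$"

# Each position accepts either the old or the new function name.
EXPECTED = [
    ADDR + r"inet_pton" + OFFSET,
    ADDR + r"(getaddrinfo|text_to_binary_address)" + OFFSET,
    ADDR + r"(main|gaih_inet)" + OFFSET,
]


def backtrace_ok(lines):
    """Check the first backtrace entries against the expected patterns."""
    if len(lines) < len(EXPECTED):
        return False
    return all(re.match(pattern, line.strip())
               for pattern, line in zip(EXPECTED, lines))


old_layout = [
    "3ff8ec48020 inet_pton+0x0 (inlined)",
    "3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6)",
    "2aa3fe04bf5 main+0xff5 (/usr/bin/ping)",
]
new_layout = [
    "3ff8ec48020 inet_pton+0x0 (inlined)",
    "3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined)",
    "3ff8ec5e6e7 gaih_inet+0x1007 (inlined)",
]
print(backtrace_ok(old_layout), backtrace_ok(new_layout))
```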
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: agordeev@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20241001124224.3370306-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The "perf record" tool will now default to this new mode if the user
specifies a sampling group when not in system-wide mode, and when
"--no-inherit" is not specified.
This change updates evsel to allow the combination of inherit
and PERF_SAMPLE_READ.
A fallback is implemented for kernel versions where this feature is not
supported.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Sample period calculation in deliver_sample_value is updated to
calculate the per-thread period delta for events that are inherit +
PERF_SAMPLE_READ. When the sampling event has this configuration, the
read_format.id is used with the tid from the sample to lookup the
storage of the previously accumulated counter total before calculating
the delta. All existing valid configurations where read_format.value
represents some global value continue to use just the read_format.id to
locate the storage of the previously accumulated total.
perf_sample_id is modified to support tracking per-thread
values, along with the existing global per-id values. In the
per-thread case, values are stored in a hash by tid within the
perf_sample_id, and are dynamically allocated as the number is not known
ahead of time.
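A minimal model of the two lookup modes, assuming a plain dictionary in
place of the hash and a hypothetical class name. It only shows how the
previous total is chosen before the delta is computed.

```python
class SampleId:
    """Tracks the previously accumulated total for one read_format.id."""

    def __init__(self, per_thread):
        self.per_thread = per_thread
        self.global_total = 0
        self.totals_by_tid = {}  # filled lazily, thread count is unknown

    def period_delta(self, tid, value):
        """Return the period since the last sample and record 'value'."""
        if self.per_thread:
            previous = self.totals_by_tid.get(tid, 0)
            self.totals_by_tid[tid] = value
        else:
            previous = self.global_total
            self.global_total = value
        return value - previous


inherited = SampleId(per_thread=True)
print(inherited.period_delta(1, 100),
      inherited.period_delta(2, 40),
      inherited.period_delta(1, 150))
```

With per-thread tracking, the second sample from tid 1 is compared with
that thread's own earlier total rather than with the value last seen from
tid 2.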
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
These error paths occur without sufficient permissions. Fix the memory
leaks to make leak sanitizer happier.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The "perf all PMU test" fails on a Coffee Lake machine.
The failure is caused by the below change in the commit e2641db83f
("perf vendor events: Add/update skylake events/metrics").
+ {
+ "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
+ "Counter": "FIXED",
+ "EventCode": "0xff",
+ "EventName": "UNC_CLOCK.SOCKET",
+ "PerPkg": "1",
+ "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
+ "Unit": "cbox_0"
}
The other cbox events have the unit name "CBOX", while the fixed counter
has a unit name "cbox_0". So the events_table will maintain separate
entries for cbox and cbox_0.
perf_pmus__print_pmu_events() calculates the total number of events,
allocates an aliases buffer, stores all the events into the buffer,
sorts them, and prints all the aliases one by one.
The problem is that the calculated total number of events doesn't match
the stored events in the aliases buffer.
The perf_pmu__num_events() is used to calculate the number of events. It
invokes the pmu_events_table__num_events() to go through the entire
events_table to find all events. Because of the
pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
the events for cbox and cbox_0 are all counted.
When storing events into the aliases buffer,
perf_pmu__for_each_event() only processes the events for cbox.
Since a bigger buffer was allocated, the last entries are all 0.
When printing all the aliases, null is output, which triggers the
failure.
The mismatch was introduced by commit e3edd6cf63 ("perf
pmu-events: Reduce processed events by passing PMU").
pmu_events_table__for_each_event() stops immediately once a pmu is set.
But for uncore, and especially in this case, that approach is wrong and
doesn't match what perf does in perf_pmu__num_events().
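The mismatch can be reproduced with a toy model. Everything here is a
simplification: the table contents apart from unc_clock.socket are
placeholders, and the suffix stripping is a stand-in for what
pmu_uncore_alias_match() is described as doing.

```python
import re

# Placeholder event table with separate entries for "cbox" and "cbox_0".
TABLES = [
    ("cbox", ["cbox_event_a", "cbox_event_b"]),
    ("cbox_0", ["unc_clock.socket"]),
]


def base_name(pmu):
    """Drop a trailing numeric suffix, e.g. 'cbox_0' -> 'cbox'."""
    return re.sub(r"_\d+$", "", pmu)


def count_events(pmu):
    """Counting walks every table whose name matches, suffix ignored."""
    return sum(len(events) for name, events in TABLES
               if base_name(name) == base_name(pmu))


def collect_events(pmu):
    """Collection stops at the first matching table."""
    for name, events in TABLES:
        if base_name(name) == base_name(pmu):
            return list(events)
    return []


buffer = [None] * count_events("cbox_0")
for index, event in enumerate(collect_events("cbox_0")):
    buffer[index] = event
print(buffer)
```

The buffer is sized for three events but only two are stored, so the last
slot stays empty, which corresponds to the null entry that breaks the
test.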
With the patch,
$ perf list pmu | grep -A 1 clock.socket
unc_clock.socket
[This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_cbox_0
$ perf test "perf all PMU test"
107: perf all PMU test : Ok
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/all/202407101021.2c8baddb-oliver.sang@intel.com/
Fixes: e3edd6cf63 ("perf pmu-events: Reduce processed events by passing PMU")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241001021431.814811-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
If one builds perf with DEBUG=1, captures data on multiple CPUs and
finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
aborts the program. This happens because there are empty queues with
format set.
This patch changes the condition to abort only if a queue is not empty
and if the format is unset.
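The new condition, written out as a small predicate. The function name is
hypothetical and this is only a restatement of the sentence above, not
the code from cs-etm.

```python
def should_abort(queue_is_empty, format_is_set):
    """Abort only when a queue holds data but its format was never set."""
    return (not queue_is_empty) and (not format_is_set)


# The reported case: an empty queue whose format is set must not abort.
print(should_abort(queue_is_empty=True, format_is_set=True))
```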
$ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
$ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
$ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
Aborted (core dumped)
Fixes: 57880a7966 ("perf: cs-etm: Allocate queues for all CPUs")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For libdw versions below 0.177, libdl.a must be linked in addition to
libebl.a during static compilation; otherwise the
feature-dwarf_getlocations compilation fails.
Before:
$ make LDFLAGS=-static
BUILD: Doing 'make -j20' parallel build
<SNIP>
Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
<SNIP>
$ cat ../build/feature/test-dwarf_getlocations.make.output
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend':
(.text+0x20): undefined reference to `dlclose'
collect2: error: ld returned 1 exit status
After:
$ make LDFLAGS=-static
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
$ ./perf probe
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [<options>] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
<SNIP>
Fixes: 536661da6e ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If libdw is not installed in the build environment, the output of
'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be
empty and the shell test to fail with the following error:
/bin/sh: 1: test: -lt: unexpected operator
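The guard amounts to checking for an empty version string before doing
any numeric comparison. The sketch below models that decision in Python
rather than in make or shell. The function names are hypothetical, and
the 0.177 threshold is the one used in the cases listed in this message.

```python
def parse_version(text):
    """Return (major, minor), or None when pkg-config printed nothing."""
    text = text.strip()
    if not text:
        return None
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def extra_static_libs(version_text):
    """Extra archives to link for a static build with an old libdw."""
    version = parse_version(version_text)
    if version is None:
        return []  # libdw missing: skip the comparison entirely
    if version < (0, 177):
        return ["libebl.a", "libdl.a"]
    return []


for text in ("", "0.176", "0.186"):
    print(repr(text), extra_static_libs(text))
```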
Before:
$ pkg-config --modversion libdw
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
/bin/sh: 1: test: -lt: unexpected operator
After:
1. libdw is not installed:
$ pkg-config --modversion libdw
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR
2. libdw version is lower than 0.177
$ pkg-config --modversion libdw
0.176
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
INSTALL libsubcmd_headers
INSTALL libapi_headers
INSTALL libperf_headers
INSTALL libsymbol_headers
INSTALL libbpf_headers
LINK perf
3. libdw version is higher than 0.177
$ pkg-config --modversion libdw
0.186
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
CC util/bpf-utils.o
CC util/pfm.o
LD util/perf-util-in.o
LD perf-util-in.o
AR libperf-util.a
LINK perf
Fixes: 536661da6e ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The linked fixes commit added an #include "dwarf-aux.h" to disasm.h
which gets picked up in a lot of places. Without
HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include
errno.h to fix the following build error:
In file included from util/disasm.h:8,
from util/annotate.h:16,
from builtin-top.c:23:
util/dwarf-aux.h: In function 'die_get_var_range':
util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function)
183 | return -ENOTSUP;
| ^~~~~~~
Fixes: 782959ac24 ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking")
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With 6d74e1e371 ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we need to use the newly added hunk-based
exceptions when comparing the copy we carry in tools/lib/ to the
original file. Do it by adding the hunks that we know will be the
expected diff.
If at some point the original file is updated in other parts, then we
should flag and check the file for update.
Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With 6d74e1e371 ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we end up with a multi-line variation in the
merge_final() implementation, one that the simple line based exceptions
we had so far can't cope with.
Thus this check has been failing:
Warning: Kernel ABI header differences:
diff -u tools/lib/list_sort.c lib/list_sort.c
So add a new check routine that uses grep -vf to exclude some hunks that
we store in the tools/perf/check-header_ignore_hunks/ directory.
This first patch is just the new check routine, the next one will use it
to check lib/list_sort.c.
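The idea behind the new routine can be sketched as: diff the two files,
drop the changed lines that are known and expected, and warn only if
something is left. This model uses substring matching where grep -f
would use regular expressions, and the function name is hypothetical.

```python
import difflib


def unexpected_diff(copy_lines, orig_lines, ignore_patterns):
    """Return changed lines that are not covered by an ignore pattern."""
    diff = list(difflib.unified_diff(copy_lines, orig_lines,
                                     lineterm="", n=0))
    body = diff[2:]  # skip the '---' and '+++' file header lines
    changed = [line for line in body if line.startswith(("+", "-"))]
    return [line for line in changed
            if not any(pattern in line for pattern in ignore_patterns)]


copy = ["a", "b", "c"]
orig = ["a", "cond_resched();", "b", "c"]
print(unexpected_diff(copy, orig, ["cond_resched"]))
```

If the original file later changes somewhere else, the extra lines are
not covered by the stored hunks and show up in the result.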
Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add more test cases to cover all supported topdown events regroup cases.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-7-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add counting and leader sampling tests to verify topdown events including
raw format can be reordered correctly.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-6-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add leader sampling test to validate event counts are captured into
record and the count value is consistent.
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-5-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When running the perf command below, an error is reported.
perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x8000 (topdown-retiring)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
sys_perf_event_open failed, error -22
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
event (topdown-retiring).
The error occurs because the events are regrouped: the topdown-retiring
event is moved to directly after the slots event, so topdown-retiring
has to do the sampling, but the Intel PMU driver doesn't support
sampling topdown metrics events.
A topdown metrics event only needs to be in a group that has the slots
event as leader. It doesn't have to directly follow the slots event.
Moving topdown metrics events to directly after the slots event during
regrouping is therefore overkill, and it causes the above issue.
Thus don't move topdown metrics events forward if they are already in a
group.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-4-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Addresses an issue where, in the absence of a topdown metrics event
within a sampling group, the slots event was incorrectly bypassed as
the sampling leader when sample_read was enabled.
perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
In this case, the slots event should be sampled as leader, but the
branches event is in fact the one sampled, as the verbose output shows.
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 168
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
{ sample_period, sample_freq } 10000
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
sample_id_all 1
exclude_guest 1
The sample period of the slots event, instead of that of the branches
event, is reset to 0.
This fix ensures the slots event remains the leader under these
conditions.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-3-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Checking whether an event is a topdown slots or topdown metrics event by
only comparing the event name is incomplete, since the user may specify
the event in RAW format, e.g.
perf stat -e '{instructions,cpu/r400/,cpu/r8300/}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> instructions
<not counted> cpu/r400/
<not supported> cpu/r8300/
1.002917796 seconds time elapsed
0.002955000 seconds user
0.000000000 seconds sys
The RAW format slots and topdown-be-bound events are not recognized, so
the events are not regrouped, which eventually causes an error.
Thus add two helpers arch_is_topdown_slots()/arch_is_topdown_metrics()
to detect whether an event is topdown slots/metrics event by comparing
the event config directly, and use these two helpers to replace the
original event name comparisons.
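A config-based check can be sketched as follows. The constants are taken
from values that appear elsewhere in this log (0x400 for slots, 0x8000 to
0x8700 for the metrics events); treating them as fixed ranges is an
assumption, and the real helpers may determine them differently. The
function names are hypothetical.

```python
TOPDOWN_SLOTS = 0x400
TOPDOWN_METRICS_FIRST = 0x8000
TOPDOWN_METRICS_LAST = 0x8700


def raw_config(event):
    """Parse a RAW event such as 'r8300' into its config value."""
    if not event.startswith("r"):
        raise ValueError("not a RAW event: " + event)
    return int(event[1:], 16)


def is_topdown_slots(config):
    return config == TOPDOWN_SLOTS


def is_topdown_metrics(config):
    return TOPDOWN_METRICS_FIRST <= config <= TOPDOWN_METRICS_LAST


for event in ("r400", "r8300"):
    config = raw_config(event)
    print(event, is_topdown_slots(config), is_topdown_metrics(config))
```

Because the comparison is on the config value, an event given as
cpu/r8300/ is classified the same way as its named form.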
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-2-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In __evsel__config_callchain avoid computing arch until the code path
that uses it.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20240918223116.127386-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick the changes in:
8f0b3cc9a4 ("tcp: RX path for devmem TCP")
That doesn't result in any changes in the tables generated from that
header.
But while updating I noticed we need to support the new MSG_SOCK_DEVMEM
flag in the hard coded table for the msg flags table, add it.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Please see tools/include/uapi/README for details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrO_eT9e_41xrNv@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the change in:
a1fab3e69d ("x86/irq: Fix comment on IRQ vector layout")
That just adds some comments, so no changes in perf tooling, just
silences this build warning:
diff -u tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZvrKT7oQc1AOv6Vk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Picking the changes from:
4356d575ef ("fhandle: expose u64 mount id to name_to_handle_at(2)")
b4fef22c2f ("uapi: explain how per-syscall AT_* flags should be allocated")
820a185896 ("fcntl: add F_CREATED_QUERY")
It just moves AT_REMOVEDIR around, and adds a bunch more AT_ for
renameat2() and name_to_handle_at(). We need to improve this situation,
as not all AT_ defines are applicable to all fs flags...
This adds support for those new AT_ defines, addressing this build
warning:
diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrIKL3cREoRHIQd@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Picking the changes from:
f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
The inclusion of the SCHED_EXT define doesn't cause any change in
behaviour in tools/perf.
This just silences this perf tools build warning:
diff -u tools/perf/trace/beauty/include/uapi/linux/sched.h include/uapi/linux/sched.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrDShNVXotZpiwk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Picking the changes from:
37745918e0 ("ALSA: timer: Introduce virtual userspace-driven timers")
Which entails no changes in the tooling side as it only introduces new
SNDRV_TIMER_IOCTL_ ioctls, and the ones tracked by scripts in
tools/perf/trace/beauty/ are only SNDRV_PCM_IOCTL_ and SNDRV_CTL_IOCTL_,
we still need to support SNDRV_TIMER_IOCTL_ ones, but that probably will
be one of the first for a BTF enumeration based approach :-)
This silences this perf tools build warning:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/lkml/ZvrB-g_E7g2ArlYW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the dso type doesn't match then NULL is returned but the dso should
be put first.
Fixes: f649ed80f3 ("perf dsos: Tidy reference counting and locking")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test 70 takes a long time. One culprit is the output of command
perf annotate. Enabled by default are:
- demangling of symbol names
- interleaving of source code with assembly code.
Disable demangling of symbols and abort the annotation
after the first 250 lines.
This speeds up the test case considerably, for example
on s390:
Output before:
# time perf test 70
70: perf annotate basic tests : Ok
.....
real 2m7.467s
user 1m26.869s
sys 0m34.086s
#
Output after:
# time perf test 70
70: perf annotate basic tests : Ok
real 0m3.341s
user 0m1.606s
sys 0m0.362s
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240917085706.249691-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
With commit 8ec9497d3e ("tools/include: Sync uapi/linux/perf.h
with the kernel sources"), 'perf mem report' gives an incorrect memory
access string.
...
0.02% 1 3644 L5 hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
This occurs because, if no entry exists in mem_lvlnum, perf_mem__lvl_scnprintf
will default to 'L%d, lvl', which in this case for PERF_MEM_LVLNUM_L2_MHB is 0x05.
Add entries for PERF_MEM_LVLNUM_L2_MHB and PERF_MEM_LVLNUM_MSC to mem_lvlnum,
so that the correct strings are printed.
...
0.02% 1 3644 L2 MHB hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
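The fallback behaviour described above can be modelled with a small
lookup. This is a sketch in Python, not the perf C code: lvl_to_str() is
a hypothetical name, 0x05 for L2 MHB comes from the text, and the value
0x06 for MSC is an assumption.

```python
# Hypothetical model of the level-number table and its "L%d" fallback.
MEM_LVLNUM = {
    0x05: "L2 MHB",   # PERF_MEM_LVLNUM_L2_MHB (value from the text above)
    0x06: "MSC",      # PERF_MEM_LVLNUM_MSC (value assumed)
}


def lvl_to_str(lvlnum, table=MEM_LVLNUM):
    """Return the table entry, or fall back to "L<number>" when missing."""
    name = table.get(lvlnum)
    if name is None:
        return "L%d" % lvlnum
    return name
```

With an empty table, level 0x05 prints as "L5", which is the misleading
string seen before this fix.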
Fixes: 8ec9497d3e ("tools/include: Sync uapi/linux/perf.h with the kernel sources")
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20240926144040.77897-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The sleep_sem semaphore and the specific_wait field (member of sched_atom)
are initialized but not used anywhere in the code, so this patch removes
them.
The SCHED_EVENT_MIGRATION case in perf_sched__process_event() is currently
not used and is also removed.
Additionally, prev_state in add_sched_event_sleep() is marked with
__maybe_unused and is not utilized anywhere in the function. This patch
removes the parameter.
If the task_state parameter was intended for future use, it can be
reintroduced when needed.
No functionality change intended.
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240917090100.42783-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Test that one cycles event is opened for each core PMU when "perf stat"
is run without arguments.
The event line can either be output as "pmu/cycles/" or just "cycles" if
there is only one PMU. Include 2 spaces for padding in the one PMU case
to avoid matching when the word cycles is included in metric
descriptions.
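The matching rule can be sketched like this. It is an illustration in
Python rather than the shell test itself, the exact pattern is an
assumption, and the sample output lines are made up for the example.

```python
import re

# Assumed pattern: "<pmu>/cycles/", or a bare "cycles" padded by two
# spaces on each side so that metric text such as "stalled cycles per
# insn" does not match.
CYCLES_LINE = re.compile(r"/cycles/|  cycles  ")


def count_cycles_events(stat_output):
    """Count the output lines that report a cycles event."""
    return sum(1 for line in stat_output.splitlines()
               if CYCLES_LINE.search(line))
```

The count is then expected to equal the number of core PMUs.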
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-8-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
PMUs aren't listed in /sys/devices/ on DT devices, so change the search
directory to /sys/bus/event_source/devices which works everywhere. Also
add armv8_cortex_* as a known PMU type to search for to make the test
run on more devices.
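A minimal sketch of the lookup, in Python rather than the shell used by
the test. find_core_pmus() is a hypothetical helper, and apart from
armv8_cortex_* the name patterns are assumptions.

```python
import fnmatch
import os

# Only "armv8_cortex_*" is named in the text; the others are assumed.
KNOWN_CORE_PMUS = ("cpu_core", "cpu_atom", "armv8_pmuv3_*", "armv8_cortex_*")


def find_core_pmus(devices_dir="/sys/bus/event_source/devices"):
    """List known core PMUs under the event_source bus directory."""
    try:
        names = os.listdir(devices_dir)
    except OSError:
        return []       # directory missing: report no PMUs
    return sorted(name for name in names
                  if any(fnmatch.fnmatch(name, pat)
                         for pat in KNOWN_CORE_PMUS))
```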
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-7-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
"evsel->pmu_name" is only ever assigned a strdup of "pmu->name", a
strdup of "evsel->pmu_name" or NULL. As such, prefer to use
"pmu->name" directly and even to compare PMUs directly rather than PMU
names. For safety, add some additional NULL tests.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix arm-spe.c usage of pmu_name and empty PMU name ]
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-6-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use PMU interface to better detect core PMU for legacy events. Look
for slots event on core PMU if it is appropriate for the event.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-5-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
add_default_attributes would add evsels by having pre-created
perf_event_attr, however, this needed fixing for hybrid as the
extended PMU type was necessary for each core PMU. The logic for this
was in an arch specific x86 function and wasn't present for ARM,
meaning that default events weren't being opened on all PMUs on
ARM. Change the creation of the default events to use parse_events and
strings as that will open the events on all PMUs.
Rather than try to detect events on PMUs before parsing, parse the
event but skip its output in stat-display.
The previous order of hardware events was: cycles,
stalled-cycles-frontend, stalled-cycles-backend, instructions. As
instructions is a more fundamental concept the order is changed to:
instructions, cycles, stalled-cycles-frontend, stalled-cycles-backend.
Closes: https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[Don't display unsupported default events except 'cycles']
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Without aggregation on Intel:
```
$ perf stat -e instructions,cycles ...
```
This will use "cycles" for the name of the legacy cycles event but, as
"instructions" has a sysfs name, it will add a "[cpu]" PMU suffix. This
often breaks things as the space between the event and the PMU name
looks like an extra column. The existing uniquify logic was also
uniquifying in cases when all events are core and not with uncore
events, it was not correctly handling modifiers, etc.
Change the logic so that an initial pass that can disable
uniquification is run. For individual counters, disable uniquification
in more cases such as for consistency with legacy events or for
libpfm4 events. Don't use the "[pmu]" style suffix in uniquification,
always use "pmu/.../". Change how modifiers/terms are handled in the
uniquification so that they look like parse-able events.
This fixes "102: perf stat metrics (shadow stat) test:" that has been
failing due to "instructions [cpu]" breaking its column/awk logic when
values aren't aggregated. This started happening when instructions
could match a sysfs rather than a legacy event, so the fixes tag
reflects this.
Fixes: 617824a7f0 ("perf parse-events: Prefer sysfs/JSON hardware events over legacy")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix Intel TPEBS counting mode test ]
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
There are cases where we want to match events like instructions and
cycles with legacy hardware values, in particular in stat-shadow's
hard coded metrics. An evsel's name isn't a good point of reference as
it gets altered, strstr would be too imprecise and re-parsing the
event from its name is silly. Instead, hold the legacy hardware event
name, determined during parsing, in the evsel for this matching case.
Inline evsel__match2 that is only used in builtin-diff.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Refactor code to have some more error diagnosis on traps, etc. and to
do less work on each line. Add an ignore situation for security failures.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240925173013.12789-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When loading symbols from an ELF file, perf also loads label symbols,
which have a size of 0. Sometimes a label has the same address as another
symbol and might shadow the original symbol because perf fixes up the
size of the label.
For example, on my system __do_softirq is shadowed and only
__softirqentry_text_start is accepted. But __do_softirq should be
accepted.
$ readelf -sW vmlinux | grep -e __do_softirq -e __softirqentry_text_start
105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq
111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start
$ perf annotate --stdio __do_softirq
Error:
The perf.data data has no samples!
$ perf annotate --stdio __softirqentry_text_start | head
Percent | Source code & Disassembly of vmlinux for cycles (26 samples, percent: local period)
---------------------------------------------------------------------------------------------------
: 0 0xffffffff82000000 <__softirqentry_text_start>:
0.00 : ffffffff82000000: nopl (%rax,%rax)
30.77 : ffffffff82000005: pushq %rbp
3.85 : ffffffff82000006: movq %rsp, %rbp
0.00 : ffffffff82000009: pushq %r15
3.85 : ffffffff8200000b: pushq %r14
3.85 : ffffffff8200000d: pushq %r13
0.00 : ffffffff8200000f: pushq %r12
We can ignore NOTYPE symbols in the symbols__fixup_end() so that it can
pick the __do_softirq() in choose_best_symbol(). This should be fine
since most symbols have either STT_FUNC or STT_OBJECT.
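The effect of skipping NOTYPE symbols during the fixup can be sketched as
below. This is a simplified model in Python, not the perf code:
fixup_end() and pick_best() only stand in for symbols__fixup_end() and
choose_best_symbol(), and pick_best() models just the preference for a
symbol that has a size.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str
    start: int
    end: int      # equal to start for a zero-size symbol
    kind: str     # "FUNC", "OBJECT" or "NOTYPE"


def fixup_end(syms, ignore_notype=True):
    """Give zero-size symbols an end taken from the next higher start."""
    syms = sorted(syms, key=lambda s: s.start)
    for i, cur in enumerate(syms):
        if cur.end != cur.start:
            continue                  # already has a size
        if ignore_notype and cur.kind == "NOTYPE":
            continue                  # leave labels at size 0
        for nxt in syms[i + 1:]:
            if nxt.start > cur.start:
                cur.end = nxt.start
                break
    return syms


def pick_best(syms, addr):
    """Simplified picker: prefer a symbol at addr that has a size."""
    cands = [s for s in syms if s.start == addr]
    sized = [s for s in cands if s.end > s.start]
    chosen = sized or cands
    return chosen[0] if chosen else None
```

When the label keeps its size of 0, the picker can still tell it apart
from the real function. Once the label has been given a size, that
distinction is lost.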
Link: https://lore.kernel.org/r/20240912224208.3360116-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Exit when run_perf_stat() returns an error to avoid continuously
repeating the same error message. It's not expected that COUNTER_FATAL
or internal errors are recoverable so there's no point in retrying.
This fixes the following flood of error messages for permission issues,
for example when perf_event_paranoid==3:
perf stat -r 1044 -- false
Error:
Access to performance monitoring and observability operations is limited.
...
Error:
Access to performance monitoring and observability operations is limited.
...
(repeating for 1044 times).
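The change in control flow amounts to leaving the repeat loop at the
first failure. A minimal sketch in Python, where run_repeated() and
run_once() are hypothetical names rather than perf functions:

```python
def run_repeated(run_once, repeat):
    """Call run_once() up to `repeat` times, stopping at the first error.

    Returns (status, attempts) so the caller can see how many runs
    actually happened.
    """
    for attempt in range(1, repeat + 1):
        status = run_once()
        if status != 0:
            return status, attempt    # not recoverable, do not retry
    return 0, repeat
```

With a run that always fails, the error is reported once instead of once
per requested repetition.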
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: nd@arm.com
Cc: howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240925132022.2650180-3-yeoreum.yun@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When create_perf_stat_counter() fails, the workload.cork_fd opened in
evlist__prepare_workload() is not closed. This can lead to a "too many
open files" error while __run_perf_stat() repeats.
Introduce evlist__cancel_workload() to close workload.cork_fd and
wait for workload.child_pid to exit, cleaning up the child process
when create_perf_stat_counter() fails.
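The cork pipe pattern and its cancellation can be sketched as follows.
This is a Python model for Linux, not the perf C code: the function names
are hypothetical, and the exit status 1 for a cancelled child is a
convention of this sketch only.

```python
import os


def prepare_workload():
    """Fork a child that waits on a pipe before starting its work."""
    read_fd, cork_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(cork_fd)
        go = os.read(read_fd, 1)      # blocks until uncorked or cancelled
        os._exit(0 if go else 1)      # EOF means the parent cancelled
    os.close(read_fd)
    return pid, cork_fd


def cancel_workload(pid, cork_fd):
    """Close the cork so the child sees EOF, then reap the child."""
    os.close(cork_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Closing the write end releases the file descriptor, and waiting on the
child avoids leaving a zombie behind on each repetition.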
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: nd@arm.com
Cc: howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240925132022.2650180-2-yeoreum.yun@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In non-FHS compliant distros like NixOS, nothing resides in `/bin`
and `/usr/bin`. Instead, executables reside in `/nix/store` and are
dynamically symlinked into `/run/current-system/sw/bin/`.
With this patch, the `/bin` prefix is stripped from the dmesg command in
the error message.
Link: https://github.com/NixOS/nixpkgs/pull/258027
Signed-off-by: Masum Reza <masumrezarock100@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20240922112619.149429-1-masumrezarock100@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Run a few samples through the disassembly script and check to see that
at least one branch instruction is printed.
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-8-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Make it possible to only disassemble a range of timestamps or sample
indexes. This will be used by the test to limit the runtime, but it's
also useful for users.
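Range limiting of this kind can be sketched as below. The helper names
and parameters are hypothetical, and treating both bounds as inclusive is
an assumption of the sketch, not something the script is known to do.

```python
def in_window(value, start=None, stop=None):
    """True when value lies in [start, stop]; a missing bound is open."""
    if start is not None and value < start:
        return False
    if stop is not None and value > stop:
        return False
    return True


def select_samples(timestamps, start_sample=None, stop_sample=None,
                   start_time=None, stop_time=None):
    """Yield (index, timestamp) pairs inside both optional windows."""
    for index, timestamp in enumerate(timestamps):
        if (in_window(index, start_sample, stop_sample)
                and in_window(timestamp, start_time, stop_time)):
            yield index, timestamp
```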
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-7-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Make vmlinux detection automatic and use Perf's default objdump
when -d is specified. This will make it easier for a test to use the
script without having to provide arguments. And similarly for users.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-6-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
optparse is deprecated and less flexible than argparse so update it.
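One example of the extra flexibility is an option with an optional value,
which argparse supports through nargs="?" and const. The sketch below is
illustrative only: the option names and the "objdump" default are
assumptions, not necessarily those of the script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Disassemble trace samples")
    # A bare "-d" selects a default tool, while "-d PATH" overrides it.
    # optparse has no direct equivalent of nargs="?" with const.
    parser.add_argument("-d", "--objdump", nargs="?", const="objdump",
                        help="show disassembly, optionally naming the "
                             "objdump binary to use")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debugging output")
    return parser
```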
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-5-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This can be used to get config values like which objdump Perf uses for
disassembly.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Previously when the incorrect binary was used for decode, Perf would
silently continue to generate incorrect samples. With OpenCSD 1.5.4 we
can enable consistency checks that do a best effort to detect a mismatch
in the image. When one is detected a warning is printed and sample
generation stops until the trace resynchronizes with a good part of the
image.
Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
cs_etm__flush(), like cs_etm__sample() is an operation that generates a
sample and then swaps the current with the previous packet. Calling
flush after processing the queues results in two swaps which corrupts
the next sample. Therefore it wasn't appropriate to call flush here so
remove it.
Flushing is still done on a discontinuity to explicitly clear the last
branch buffer, but when the packet_queue fills up before reaching a
timestamp, that's not a discontinuity and the call to
cs_etm__process_traceid_queue() already generated samples and drained
the buffers correctly.
This is visible by looking for a branch that has the same target as the
previous branch and the following source is before the address of the
last target, which is impossible as execution would have had to have
gone backwards:
ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
(packet_queue fills here before a timestamp, resulting in a flush and
branch target ffff80008011cadc is duplicated.)
ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff80008011cadc update_sg_lb_stats+0x94
ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
After removing the flush the correct branch target is used for the
second sample, and ffff8000801117c4 is no longer before the previous
address:
ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff8000801117a0 cpu_util+0x0
ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
Make sure that a final branch stack is output at the end of the trace
by calling cs_etm__end_block(). This is already done for both the
timeless decode paths.
Fixes: 21fe8dc119 ("perf cs-etm: Add support for CPU-wide trace scenarios")
Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Previously "set -e" meant any non-zero exit code from perf stat would
cause a test failure. As a non-zero exit happens when there aren't
sufficient permissions, check for this case and make the exit code
2/skip for it.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240502223115.2357499-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
For the kernel dso, the binary type of the dso is set when loading the
symbol table. But that does not seem to happen for user DSOs; only the
symtab type is set. It's not clear why we want to maintain
the two separately, but the binary type info is used before getting
the disassembly.
Let's use the symtab type as binary type too if it's not set. I think
it's ok to set the binary type when it finds a symsrc whether or not
it has actual symbols.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Alexander Monakov <amonakov@ispras.ru>
Link: https://lore.kernel.org/r/20240426215139.1271039-1-namhyung@kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: <linux-perf-users@vger.kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With that it uses the generic BTF based pretty printer:
This one we need to think about, not being acquainted with this syscall,
should we _traverse_ that list somehow? Would that be useful?
root@number:~# perf trace -e set_robust_list sleep 1
0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) =
root@number:~#
strace prints the default integer args:
root@number:~# strace -e set_robust_list sleep 1
set_robust_list(0x7efd99559a20, 24) = 0
+++ exited with 0 +++
root@number:~#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
No event is printed in the "Branch Counter" column on hybrid machines.
For example,
$ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter
$ perf report --total-cycles
# Branch counter abbr list:
# cpu_core/branch-instructions/pp = A
# cpu_core/branches/ = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter
# ............... .............. ........... .......... ..............
44.54% 727.1K 0.00% 1 |+ |+ |
36.31% 592.7K 0.00% 2 |+ |+ |
17.83% 291.1K 0.00% 1 |+ |+ |
The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.
For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.
Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.
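To illustrate why the width matters, here is a minimal sketch of pulling
one counter's occurrences out of a packed value. It assumes each counter
occupies 'width' consecutive bits and that an all-ones field means the
counter saturated (the '+' in the report); both the layout and the
function name are assumptions, not the perf implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the occurrence count of counter 'idx' from a packed value in
 * which each counter occupies 'width' bits. Sets *saturated when the
 * field holds its maximum value, i.e. occurrences may have been lost.
 */
static unsigned int br_cntr_sketch__get(uint64_t packed, unsigned int width,
					unsigned int idx, int *saturated)
{
	uint64_t mask, val;

	assert(saturated != 0);
	assert(width > 0 && width < 64);
	assert(width * (idx + 1) <= 64);

	mask = (1ULL << width) - 1;
	val = (packed >> (width * idx)) & mask;
	*saturated = (val == mask);
	return (unsigned int)val;
}
```

Without a valid width (as when the caps are missing from perf_env), the
mask and shift are undefined, which matches the empty column shown above.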
Committer notes:
While testing I couldn't see those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:
<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.
The "The branch counter is not available" message is expected.
Here is the 'perf evlist' result from my Lunar Lake machine,
# perf evlist -v
cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS
#
</quote>
Fixes: 6f9d8d1de2 ("perf script: Add branch counters")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An event group is a critical relationship. There is a -g option that can
display the relationship. But it's hard for a user to know when this
option should be applied.
If there is an event group in the perf record, print a hint to suggest
the user apply the -g to display the group information.
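The decision of when to print the hint can be sketched as below, assuming
a group leader carries a member count greater than one. The struct and
function names are hypothetical; only the tip text is taken from this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal event descriptor. */
struct evsel_hint_sketch {
	const char *name;
	int nr_members;		/* > 1 only on the leader of a group */
};

/* Return true if any event in the list leads a group. */
static bool evlist_sketch__has_group(const struct evsel_hint_sketch *evsels,
				     size_t nr)
{
	assert(evsels != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (evsels[i].nr_members > 1)
			return true;
	}
	return false;
}

/* Print the tip only when it is useful. */
static void evlist_sketch__print_tip(const struct evsel_hint_sketch *evsels,
				     size_t nr)
{
	if (evlist_sketch__has_group(evsels, nr))
		printf("# Tip: use 'perf evlist -g' to show group information\n");
}
```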
With the patch,
$ perf record -e "{cycles,instructions},instructions" sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ]
$
$ perf evlist
cycles
instructions
instructions
# Tip: use 'perf evlist -g' to show group information
$ perf evlist -g
{cycles,instructions}
instructions
$
Committer testing:
So for a perf.data file _with_ a group:
root@number:~# perf evlist -g
{cpu_core/branch-instructions/pp,cpu_core/branches/}
dummy:u
root@number:~# perf evlist
cpu_core/branch-instructions/pp
cpu_core/branches/
dummy:u
# Tip: use 'perf evlist -g' to show group information
root@number:~#
Then for something _without_ a group, no hint:
root@number:~# perf record ls
<SNIP>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
root@number:~# perf evlist
cpu_atom/cycles/P
cpu_core/cycles/P
dummy:u
root@number:~#
No suggestion, good.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/
Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-O6 is very much not-a-thing. Really, this should've been dropped
entirely in 49b3cd306e ("tools: Set the maximum optimization level
according to the compiler being used") instead of just passing it for
not-Clang.
Just collapse it down to -O3, instead of "-O6 unless Clang, in which case
-O3".
GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast,
which is a good thing, given -Ofast has specific (non-)requirements for
code built using it. So, this does nothing except look a bit daft.
Remove the silliness and also save a few lines in the Makefiles accordingly.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All PMU events are assumed to be "Kernel PMU event", however, this
isn't true for fake PMUs and won't be true with the addition of more
software PMUs. Make the PMU's type description name configurable -
largely for printing callbacks.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.
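The accessor idea can be sketched as follows. The enum, struct and
function names are hypothetical stand-ins rather than the real evsel
definitions; the point is only that callers stop touching the field
directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of tool event kinds. */
enum tool_event_sketch {
	TOOL_EVENT_SKETCH__NONE = 0,
	TOOL_EVENT_SKETCH__DURATION_TIME,
	TOOL_EVENT_SKETCH__USER_TIME,
};

struct tool_evsel_sketch {
	/* Today: a dedicated field. A later change could move this elsewhere. */
	enum tool_event_sketch tool_event;
};

/* Callers use the accessor, so the storage can move without touching them. */
static inline enum tool_event_sketch
tool_evsel_sketch__tool_event(const struct tool_evsel_sketch *evsel)
{
	assert(evsel != NULL);
	return evsel->tool_event;
}
```

Because the return type is the enum rather than a raw integer, a later
move of the storage only needs the accessor body to convert back to it.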
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than passing a fake PMU around, just pass that the fake PMU
should be used - true when doing testing. Move the fake PMU into
pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so
there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made
up type number.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a desc string is 0 length then -1 will be out of bounds, add a
check.
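A minimal sketch of the guarded access, assuming the code inspects the
last character of the description (here, whether it is a period, which is
an assumption since the message does not say what is tested). The
function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'desc' ends with '.', never reading desc[-1] when empty. */
static bool desc_sketch__ends_with_period(const char *desc)
{
	size_t len;

	assert(desc != NULL);
	len = strlen(desc);
	/* The len > 0 test is the added check: it short-circuits the index. */
	return len > 0 && desc[len - 1] == '.';
}
```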
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix a typo in comments.
Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To make error messages more accurate, this change detects whether ftrace is
enabled on the system by checking the trace file "set_ftrace_pid".
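The check can be sketched roughly as below. This is not the perf code: it
assumes the caller already knows the tracefs mount point (commonly
/sys/kernel/tracing) and passes it in, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if "<tracing_dir>/set_ftrace_pid" exists. The caller
 * supplies the tracefs mount point; this sketch does not discover it.
 */
static bool ftrace_sketch__is_supported(const char *tracing_dir)
{
	char path[4096];
	int n;

	assert(tracing_dir != NULL);
	n = snprintf(path, sizeof(path), "%s/set_ftrace_pid", tracing_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path would not fit, treat as unsupported */
	return access(path, F_OK) == 0;
}
```

A caller would print "ftrace is not supported on this system" and bail
out early when this returns false, instead of failing later on reset.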
Before:
# perf ftrace
failed to reset ftrace
#
After:
# perf ftrace
ftrace is not supported on this system
#
Committer testing:
Doing it in an unprivileged toolbox container on Fedora 40:
Before:
acme@number:~/git/perf-tools-next$ toolbox enter perf
⬢[acme@toolbox perf-tools-next]$ sudo su -
⬢[root@toolbox ~]# ~acme/bin/perf ftrace
failed to reset ftrace
⬢[root@toolbox ~]#
After this patch:
⬢[root@toolbox ~]# ~acme/bin/perf ftrace
ftrace is not supported on this system
⬢[root@toolbox ~]#
Maybe we could check if we are in such a situation, inside an
unprivileged container, and provide a HINT line?
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas reported the vfs_getname perf tests failing on s/390, it seems it
was just due to some extraneous '=' somehow getting into the regexp, remove
it, now:
root@x1:~# perf test getname
91: Add vfs_getname probe to get syscall args filenames : Ok
93: Use vfs_getname probe to get syscall args filenames : FAILED!
126: Check open filename arg using perf trace + vfs_getname : Ok
root@x1:~#
Second one remains a mystery, have to take some time to nail it down.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Howard reported problems using perf features that use BPF:
perf $ clang -v
Debian clang version 15.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /bin
Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
perf $ ./perf trace -e write --max-events=1
libbpf: prog 'sys_enter_rename': BPF program load failed: Permission denied
libbpf: prog 'sys_enter_rename': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
But it works with:
perf $ clang -v
Debian clang version 16.0.6 (15~deb12u1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /bin
Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
perf $ ./perf trace -e write --max-events=1
0.000 ( 0.009 ms): gmain/1448 write(fd: 4, buf: \1\0\0\0\0\0\0\0, count: 8) = 8 (kworker/0:0-eve)
perf $
So let's make that the required version, if you happen to have a slightly
older version where this works, please report so that we can adjust the
minimum required version.
Reported-by: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The header file parse-events.h is included twice in parse-events.c,
so one of the inclusions can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822
Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In uses like 'perf inject' it is not necessary to gather the symbol for
each call chain location, the map for the sample IP is wanted so that
build IDs and the like can be injected. Make gathering the symbol in the
callchain_cursor optional.
For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to
29.6MB by avoiding loading symbols.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add -B option that lazily inserts mmap2 events thereby dropping all
mmap events without samples. This is similar to the behavior of -b
where only build_id events are inserted when a dso is accessed in a
sample.
File size savings can be significant in system-wide mode, consider:
$ perf record -g -a -o perf.data sleep 1
$ perf inject -B -i perf.data -o perf.new.data
$ ls -al perf.data perf.new.data
5147049 perf.data
2248493 perf.new.data
Give test coverage of the new option in pipe test.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an option that allows all mmap or mmap2 events to be rewritten as
mmap2 events with build IDs.
This is similar to the existing -b/--build-ids and --buildid-all options
except instead of adding a build_id event an existing mmap/mmap2 event
is used as a template and a new mmap2 event synthesized from it.
As mmap2 events are typical this avoids the insertion of build_id
events.
Add test coverage to the pipe test.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build ID injection wasn't inserting a sample ID and was aligning events to
64 bytes rather than 8. No sample ID means events are unordered and two
different build_id events for the same path, as happens when a file is
replaced, can't be differentiated.
Add in sample ID insertion for the build_id events alongside some
refactoring. The refactoring better aligns the function arguments for
different use cases, such as synthesizing build_id events without
needing to have a dso. The misc bits are explicitly passed as with
callchains the maps/dsos may span user and kernel land, so using
sample->cpumode isn't good enough.
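For reference, rounding an event size up to the 8 byte alignment
mentioned above can be sketched like this. The helper name is
hypothetical and this is not the perf macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'len' up to the next multiple of 8. */
static size_t event_sketch__align8(size_t len)
{
	assert(len <= SIZE_MAX - 7);	/* avoid wrap-around in the addition */
	return (len + 7) & ~(size_t)7;
}
```

With 8 byte alignment a 13 byte payload is padded to 16 bytes, whereas
padding to 64 would spend 51 bytes on the same payload.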
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.
$ perf --debug type-profile annotate --data-type
...
-----------------------------------------------------------
find data type for 0(reg0, reg12) at set_task_cpu+0xdd
CU for kernel/sched/core.c (die:0x1268dae)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:12b6d28) [inlined] set_task_rq <<<--- (here)
bb: [9f - dd]
var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I found that some portion of mem-store events were sampled on CALL
instructions, which appear to have no memory access. But a CALL actually
saves a return address onto the stack. It should be considered a stack
operation, like the RET instruction.
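A minimal sketch of such a classification, assuming instructions are
identified by their x86 mnemonic and matched by prefix so that suffixed
forms like "callq" and "retq" are covered. The function name is
hypothetical and only the two instructions named here are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true for mnemonics that implicitly access the stack. */
static bool insn_sketch__is_stack_op(const char *mnemonic)
{
	assert(mnemonic != NULL);
	/* CALL pushes the return address, RET pops it. */
	return !strncmp(mnemonic, "call", 4) ||
	       !strncmp(mnemonic, "ret", 3);
}
```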
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new LLVM addr2line feature requires a minimum version of 13 to
compile. Add a feature check for the version so that NO_LLVM=1 doesn't
need to be explicitly added. Leave the existing llvm feature check
intact because it's used by tools other than Perf.
This fixes the following compilation error when the llvm-dev version
doesn't match:
util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso*, const char*, u64)':
util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress'
178 | addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0);
Fixes: c3f8644c21 ("perf report: Support LLVM for addr2line()")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If --force-btf is enabled, prefer btf_dump general pretty printer to
perf trace's customized pretty printers.
Mostly for debug purposes.
Committer testing:
diff before/after shows we need several improvements to be able to
compare the changes, first we need to cut off/disable mutable data such
as pids and timestamps, then what is left are the buffer addresses
passed from userspace, returned from kernel space, maybe we can ask
'perf trace' to go on making those reproducible.
That would entail a Pointer Address Translation (PAT), like for
networking, so that simple workloads, reproducible if not for these
details, could then be used in our regression tests.
Enough digression, this is one such diff:
openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3
-fstat(fd: 3, statbuf: 0x7fff01f212a0) = 0
-read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 2998
-read(fd: 3, buf: 0x5596bab2d630, count: 4096) = 0
+fstat(fd: 3, statbuf: 0x7ffc163cf0e0) = 0
+read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 2998
+read(fd: 3, buf: 0x55b4e0631630, count: 4096) = 0
close(fd: 3) = 0
openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
@@ -45,7 +45,7 @@
openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
-{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0
+(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) =
The problem closer at hand is to make the libbpf BTF pretty printer
have a mode that closely resembles what we're aiming for: strace output.
Being able to run something with 'perf trace' and with 'strace' and get
the exact same output should be of interest to anybody wanting to have
strace and 'perf trace' regression tested against each other.
That last part is 'perf trace's shot at being something as useful as
strace... ;-)
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.
Determine what type of argument and how many bytes to read from user space, using the
value in the beauty_map. This is the relation of parameter type and its corresponding
value in the beauty map, and how many bytes we read eventually:
string: 1 -> size of string (till null)
struct: size of struct -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
After reading from user space, we output the augmented data using
bpf_perf_event_output().
If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().
I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.
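The mapping from beauty_map value to read size can be sketched in plain C
as below. This is not the BPF program: the names are hypothetical, the
maximum is a placeholder for the real TRACE_AUG_MAX_BUF, the paired len
index is assumed to be 1-based, and the string length is passed in by the
caller instead of being discovered while copying.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_AUG_MAX_BUF_SKETCH 32	/* placeholder, not the real value */
#define MAX_SYSCALL_ARGS_SKETCH 6

/*
 * Decide how many bytes to copy for one argument.
 *   beauty == 1: string, the length (up to the NUL) comes from str_len
 *   beauty  > 1: struct, beauty is the struct size
 *   beauty  < 0: buffer, -beauty is the 1-based index of the paired len
 * Returns 0 when the argument is not augmented.
 */
static size_t beauty_sketch__read_size(int beauty, const unsigned long *args,
				       size_t str_len)
{
	size_t len;

	assert(args != NULL);
	if (beauty == 1)
		return str_len;
	if (beauty > 1)
		return (size_t)beauty;
	if (beauty < 0 && beauty >= -MAX_SYSCALL_ARGS_SKETCH) {
		len = (size_t)args[-beauty - 1];	/* 1-based to 0-based */
		if (len > TRACE_AUG_MAX_BUF_SKETCH)
			len = TRACE_AUG_MAX_BUF_SKETCH;	/* cap the copy */
		return len;
	}
	return 0;
}
```

For a write(fd, buf, count) style call, the buf argument would carry -3,
so the copy size is whatever count holds, capped at the maximum.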
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.
Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).
For characters with a value greater than 127, we print the digits
instead of the character itself as well.
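The escaping rule can be sketched as follows, assuming decimal digits (so
LF becomes \10) and treating DEL (127) as a control character. The
function name and signature are hypothetical, this is not the perf
scnprintf helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format 'len' bytes of 'buf' into 'out'. Printable ASCII is copied as
 * is, everything else is written as a backslash plus its decimal value.
 * Returns the number of characters written, excluding the NUL.
 */
static size_t buf_sketch__scnprintf(const unsigned char *buf, size_t len,
				    char *out, size_t out_size)
{
	size_t printed = 0;

	assert(buf != NULL && out != NULL && out_size > 0);
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		size_t room = out_size - printed;
		int n;

		if (buf[i] < 32 || buf[i] >= 127)
			n = snprintf(out + printed, room, "\\%d", buf[i]);
		else
			n = snprintf(out + printed, room, "%c", buf[i]);
		if (n < 0 || (size_t)n >= room) {
			out[printed] = '\0';	/* drop a partial escape */
			break;
		}
		printed += (size_t)n;
	}
	return printed;
}
```

An 8 byte buffer holding the value 1 in little-endian order comes out as
\1\0\0\0\0\0\0\0 under this rule.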
Committer notes:
Simplified the buffer scnprintf to avoid using multiple buffers as
discussed in the patch review thread.
We can't really set all 'buf' args to SCA_BUF as we're collecting so far
just on the sys_enter path, so we would be printing the previous 'read'
arg buffer contents, not what the kernel puts there.
So instead of:
static int syscall_fmt__cmp(const void *name, const void *fmtp)
@@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
- else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
- arg->scnprintf = SCA_BUF;
Do:
static const struct syscall_fmt syscall_fmts[] = {
+ { .name = "write", .errpid = true,
+ .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, .from_user = true, }, }, },
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Set up beauty_map, load it to BPF, in such format:
if argument No.3 is a struct of size 32 bytes (of syscall number 114)
beauty_map[114][2] = 32;
if argument No.3 is a string (of syscall number 114)
beauty_map[114][2] = 1;
if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114)
beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this buffer size in BPF */
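The encoding above can be sketched from the userspace side as follows. This is a Python illustration only: beauty_entry() and build_beauty_map() are hypothetical helpers, the dictionary based input format is made up for the example, and leaving unused slots at 0 is an assumption.

```python
MAX_ARGS = 6  # a syscall takes at most six arguments


def beauty_entry(kind, size=None, len_arg_no=None):
    """Encode one argument description as a beauty map value."""
    if kind == "struct":
        return size            # struct: its size in bytes
    if kind == "string":
        return 1               # string: the marker value 1
    if kind == "buffer":
        if not 1 <= len_arg_no <= MAX_ARGS:
            raise ValueError("paired length must be argument No.1..6")
        return -len_arg_no     # buffer: negated number of the length argument
    raise ValueError("unknown argument kind: %r" % kind)


def build_beauty_map(syscalls):
    """syscalls: {syscall_nr: {arg_no (1-based): description dict}}."""
    beauty_map = {}
    for nr, args in syscalls.items():
        row = [0] * MAX_ARGS   # assumption: 0 means "do not augment"
        for arg_no, desc in args.items():
            row[arg_no - 1] = beauty_entry(**desc)
        beauty_map[nr] = row
    return beauty_map
```

Describing argument No.3 of syscall 114 as a buffer paired with argument No.4 yields beauty_map[114][2] == -4, matching the format listed above.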
Committer notes:
Moved syscall_arg_fmt__cache_btf_struct() from an ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.
Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:
builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
3725 | int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
|
We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.
Committer testing:
Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:
root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0
0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0
0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data.
) = 0
0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64
45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111
64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms
root@number:~#
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This one has no specific pretty printer right now, so will be handled by
the generic BTF based one later in this patch series.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paving the way for the generic BPF BTF based syscall arg augmenter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to know where to collect it in the BPF augmenters, if in the
sys_enter hook or in the sys_exit hook.
Start with the SCA_FILENAME one, that is just from user to kernel space.
A better alternative, which takes a bit more time than I have now, is
to use the __user information that is already in the syscall args and
encoded in BTF via a tag; do that later.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While trying to shape Howard Chu's generic BPF augmenter transition into
the codebase I got stuck with the renameat2 syscall.
Until I noticed that the attempt at reusing augmenters was making it
use the 'openat' syscall augmenter, that collects just one string syscall
arg, for the 'renameat2' syscall, that takes two strings.
So, for the moment, just to help in this transition period, since
'renameat2' is what is used these days in the 'mv' utility, just make
the BPF collector be associated with the more widely used syscall,
hopefully the transition to Howard's generic BPF augmenter will cure
this, so get this out of the way for now!
So now we still have that odd "reuse", but for something we're not
testing so won't get in the way anymore:
root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 |& grep renameat
Reusing "openat" BPF sys_enter augmenter for "renameat"
0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
root@number:~#
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A segmentation fault can be triggered when running
'perf mem record -e ldlat-loads'
The commit 35b38a71c9 ("perf mem: Rework command option handling")
moves the OPT_CALLBACK of event from __cmd_record() to cmd_mem().
When invoking the __cmd_record(), the 'mem' has been referenced (&).
So the &mem passed into the parse_record_events() is a double reference
(&&) of the original struct perf_mem mem.
But in the cmd_mem(), the &mem is the single reference (&) of the
original struct perf_mem mem.
Fixes: 35b38a71c9 ("perf mem: Rework command option handling")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The p-core mem events are missed when launching 'perf mem record' on ADL
and RPL.
root@number:~# perf mem record sleep 1
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB perf.data ]
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
A variable 'record' in the 'struct perf_mem_event' indicates
whether a mem event in a mem_events[] should be recorded. The current
code only configures the variable for the first eligible PMU.
It's good enough for a non-hybrid machine or a hybrid machine which has
the same mem_events[].
However, if a different mem_events[] is used for different PMUs on a
hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never
gets a chance to be set.
The mem_events[] of the second PMU are always ignored.
'perf mem' doesn't support the per-PMU configuration now. A per-PMU
mem_events[] 'record' variable doesn't make sense. Make it global.
That could also avoid searching for the per-PMU mem_events[] via
perf_pmu__mem_events_ptr every time.
Committer testing:
root@number:~# perf evlist -g
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
cpu_core/mem-stores/P
dummy:u
root@number:~#
The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
not being added by 'perf evlist -g', to be checked.
Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.
However, it may bring issues if a hybrid machine has different
mem_events on different PMUs, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.
The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.
Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Running a script that processes PEBS records gives buffer overflows
in valgrind.
The problem is that the allocation of the register string doesn't
include the terminating 0 byte. Fix this.
I also replaced the very magic "28" with a more reasonable larger buffer
that should fit all registers. There's no need to conserve memory here.
==2106591== Memcheck, a memory error detector
==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov
==2106591==
==2106591== Invalid write of size 1
==2106591== at 0x713354: regs_map (trace-event-python.c:748)
==2106591== by 0x7134EB: set_regs_in_dict (trace-event-python.c:784)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid read of size 1
==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502)
==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
==2106591== by 0x7134F7: set_regs_in_dict (trace-event-python.c:786)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid write of size 1
==2106591== at 0x713354: regs_map (trace-event-python.c:748)
==2106591== by 0x713539: set_regs_in_dict (trace-event-python.c:789)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
==2106591== Invalid read of size 1
==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502)
==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
==2106591== by 0x713545: set_regs_in_dict (trace-event-python.c:791)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442)
==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499)
==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531)
==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549)
==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534)
==2106591== by 0x6296D0: machines__deliver_event (session.c:1573)
==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655)
==2106591== by 0x625830: ordered_events__deliver_event (session.c:193)
==2106591== by 0x630B23: do_flush (ordered-events.c:245)
==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
==2106591==
73056 total, 29 ignored
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905151058.2127122-2-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Existing sys directories aren't placed under a model directory like
skylake.
Placing a sys directory there causes the `is_leaf_dir` test to fail and
consequently no events or metrics are generated for the model.
Ignore sys directories in this case and update the comments to
reflect why.
This change has no effect, but when testing with a sys directory for a
model people have reported running into the no event/metric issue.
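The leaf test described above could look roughly like the following sketch. The function name is_leaf_dir_ignoring_sys() and its exact shape are assumptions for illustration, not necessarily what the jevents script contains.

```python
import os


def is_leaf_dir_ignoring_sys(path):
    """Return True if path has no subdirectories other than one named 'sys'."""
    with os.scandir(path) as entries:
        # a 'sys' subdirectory does not stop the model directory being a leaf
        return all(not entry.is_dir() or entry.name == "sys" for entry in entries)
```

A model directory that holds only JSON files plus a sys directory is still treated as a leaf, so its events and metrics would be generated.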
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240904211705.915101-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix two inconsistencies in feature names as discussed in [1]:
1. Rename "dwarf-unwind-support" to "dwarf-unwind"
2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't
look related, change the feature name to 'auxtrace' to match the
macro name, as 'get_cpuid' string is not used anywhere to check the
feature presence
[1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In probe_vfs_getname.sh, currently we use "perf record --dry-run"
to check for libtraceevent and skip the test if perf is not
built with libtraceevent. Change the check to use the "perf check feature"
option.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we use output of 'perf version --build-options', to check
whether perf was built with libtraceevent support.
Instead, use 'perf check feature libtraceevent' to check for
libtraceevent support.
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-5-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that the feature list has been duplicated in a global
'supported_features' array, use that array instead of manually checking
status of built-in features.
This helps in being consistent with commands such as 'perf check feature',
so commands can use the same array, and any new feature can be added at
one place, in the 'supported_features' array
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904190132.415212-4-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When restricting jevents generated json lookup code with JEVENTS_MODEL
a list of models must be provided. Some builds don't know model names
but know cpuids. Add a command that can convert a cpuid to a model
using mapfile.csv files. This can be used with JEVENTS_MODEL like:
$ make JEVENTS_MODEL=`./pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' pmu-events/arch/`
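A minimal sketch of the cpuid to model lookup, under stated assumptions: the mapfile rows below are made up sample data in an assumed "regex,version,model directory,type" layout (only the 8D to tigerlake and B7 to alderlake pairings come from the testing shown in this message), cpuids_to_models() is a hypothetical name, and retrying the match without the trailing stepping field is an assumption about how a cpuid should be compared.

```python
import csv
import io
import re

# Sample data for illustration only, not the contents of a real mapfile.csv.
SAMPLE_MAPFILE = """\
Family-model,Version,Filename,EventType
GenuineIntel-6-(97|9A|B7|BA|BF),v1,alderlake,core
GenuineIntel-6-8[CD],v1,tigerlake,core
"""


def cpuids_to_models(mapfile_text, cpuids):
    """Map a comma-separated cpuid list to a comma-separated model list."""
    # keep rows with at least regex, version and model; drop the header row
    rows = [r for r in csv.reader(io.StringIO(mapfile_text)) if len(r) >= 3][1:]
    models = []
    for cpuid in cpuids.split(","):
        without_stepping = cpuid.rsplit("-", 1)[0]
        for pattern, _version, model, *_ in rows:
            # assumption: the map entry may or may not include the stepping
            if re.fullmatch(pattern, cpuid) or re.fullmatch(pattern, without_stepping):
                if model not in models:
                    models.append(model)
    return ",".join(models)
```

Note that real mapfile entries may use POSIX character classes such as [[:xdigit:]], which Python's re module does not understand, so a real tool would need to translate those first.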
Committer testing:
$ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' tools/perf/pmu-events/arch/
tigerlake,amdzen5
$ perf stat -v sleep 1 |& head -1
Using CPUID GenuineIntel-6-B7-1
$ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-B7-1' tools/perf/pmu-events/arch/
alderlake
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the presence of a feature is checked with a combination of
perf version --build-options and greps, such as:
perf version --build-options | grep " on .* HAVE_FEATURE"
Instead of this, introduce a subcommand "perf check feature", with which
scripts can test for presence of a feature, such as:
perf check feature HAVE_FEATURE
'perf check feature' command is expected to have exit status of 0 if
feature is built-in, and 1 if it's not built-in or if feature is not known.
Multiple features can also be passed as a comma-separated list, in which
case the exit status will be 0 only if all of the passed features are
built-in. For example, with the command below, it will have exit status of 0
only if both libtraceevent and bpf are enabled, else 1 in all other cases:
perf check feature libtraceevent,bpf
The arguments are case-insensitive.
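The exit status rule can be modelled as in the sketch below, which follows the libtraceevent,bpf example: 0 only when every listed feature is built-in, 1 otherwise. check_feature_exit_status() and the dictionary of features are hypothetical, and the sketch only handles short feature names.

```python
def check_feature_exit_status(requested, built_in):
    """Return the exit status for a comma-separated feature list.

    requested: e.g. "libtraceevent,bpf" (case-insensitive)
    built_in:  {feature name: True if compiled in}
    """
    known = {name.lower(): enabled for name, enabled in built_in.items()}
    for name in requested.split(","):
        # unknown features count the same as features that are not built-in
        if not known.get(name.strip().lower(), False):
            return 1
    return 0
```

A test script can then skip itself whenever the status is non-zero, without having to parse any build option output.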
An array 'supported_features' has also been introduced that can be used by
other commands like 'perf version --build-options', so that new features
can be added in one place, with the array
Committer testing:
$ perf check feature libtraceevent,bpf
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
$ perf check feature libtraceevent
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
$ perf check feature bpf
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
$ perf check -q feature bpf && echo "BPF support is present"
BPF support is present
$ perf check -q feature Bogus && echo "Bogus support is present"
$
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, commands which depend on 'parse_options_subcommand()' don't
show the usage string, and instead show '(null)'
$ ./perf sched
Usage: (null)
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
'parse_options_subcommand()' is generally expected to initialise the usage
string, with information in the passed 'subcommands[]' array
This behaviour was changed in:
230a7a71f9 ("libsubcmd: Fix parse-options memory leak")
Where the generated usage string is deallocated, and usage[0] string is
reassigned as NULL.
As discussed in [1], free the allocated usage string in the main
function itself, and don't reset usage string to NULL in
parse_options_subcommand
With this change, the behaviour is restored.
$ ./perf sched
Usage: perf sched [<options>] {record|latency|map|replay|script|timehist}
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
[1]: https://lore.kernel.org/linux-perf-users/htq5vhx6piet4nuq2mmhk7fs2bhfykv52dbppwxmo3s7du2odf@styd27tioc6e/
Fixes: 230a7a71f9 ("libsubcmd: Fix parse-options memory leak")
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240904061836.55873-2-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On arm64 the breakpoint length should be 4-bytes but 8-bytes is
tolerated as perf passes that as sizeof(long). Just pass the correct
value.
On i386 the sizeof(long) check in the kernel needs to match the
kernel's long size. Check using an environment (uname checks) whether
4 or 8 bytes needs to be passed. Cache the value in a static.
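The selection logic described above can be sketched as follows. This is a Python model of the idea only: breakpoint_len() and default_breakpoint_len() are hypothetical names, the architecture strings are assumptions, and taking a uname machine of "x86_64" to mean a 64-bit kernel is an assumption as well.

```python
import functools
import os
import struct


def breakpoint_len(build_arch, kernel_machine, sizeof_long):
    """Pick the breakpoint length for a build architecture and running kernel."""
    if build_arch == "aarch64":
        return 4                      # arm64 breakpoints are 4 bytes long
    if build_arch == "i386":
        # a 32-bit binary must match the long size of the running kernel
        return 8 if kernel_machine == "x86_64" else sizeof_long
    return sizeof_long                # default: sizeof(long)


@functools.lru_cache(maxsize=None)
def default_breakpoint_len(build_arch):
    """Cached variant using the running system, like caching in a static."""
    return breakpoint_len(build_arch, os.uname().machine, struct.calcsize("l"))
```

Keeping the decision in a pure function makes it easy to check the i386 on 64-bit kernel case without needing such a machine.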
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240904050606.752788-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default breakpoint length is "sizeof(long)" however this is
incorrect on platforms like Aarch64 where sizeof(long) is 8 but the
breakpoint length is 4. Add a helper function that can be used to
determine the correct breakpoint length, in this change it just
returns the existing default sizeof(long) value.
Use the helper in the bp_account test so that, when modifying the
event from a watchpoint to a breakpoint, the breakpoint length is
appropriate for the architecture and not just sizeof(long).
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240904050606.752788-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously the cpu_list was a string and typically no cpu_list was
passed to __add_event().
Wanting to make events have their cpus distinct from the PMU means that
on more occasions we want to pass a cpu_list.
If we're reading this from sysfs it is easier to read a perf_cpu_map
than allocate and pass around strings that will later be parsed.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Gautham Shenoy <gautham.shenoy@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the
same parsing except for the file suffix used.
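A rough sketch of the idea, not the actual perf code: when two parsers
differ only in the file suffix they look for, the suffix can become a
parameter of one shared helper. The helper name and the "events/" path
layout below are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical shared helper: build "events/<event>.<suffix>" into buf.
 * Returns 0 on success, -1 if the path would not fit.
 */
static int event_suffix_path(char *buf, size_t len,
			     const char *event, const char *suffix)
{
	int n = snprintf(buf, len, "events/%s.%s", event, suffix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* The two former entry points reduce to thin wrappers. */
static int per_pkg_path(char *buf, size_t len, const char *event)
{
	return event_suffix_path(buf, len, event, "per-pkg");
}

static int snapshot_path(char *buf, size_t len, const char *event)
{
	return event_suffix_path(buf, len, event, "snapshot");
}
```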
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Gautham Shenoy <gautham.shenoy@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --show-prio option is used to display the priority of a task.
It is disabled by default, which is consistent with the original behavior.
The display format is xxx (priority does not change while the task runs)
or xxx->yyy (priority changes while the task runs).
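The display rule can be sketched as follows. This is an illustration
only; the function name and signature are hypothetical and not taken
from builtin-sched.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: print a single value when the priority did
 * not change during the run, and "old->new" when it did.
 */
static int format_prio(char *buf, size_t len, int prev_prio, int cur_prio)
{
	if (prev_prio == cur_prio)
		return snprintf(buf, len, "%d", cur_prio);

	return snprintf(buf, len, "%d->%d", prev_prio, cur_prio);
}
```

With the values from the test case in this message, a task that stays
at 120 is shown as "120" and the niced task is shown as "120->129".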
Testcase:
# perf sched record nice -n 9 true
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 0.497 MB perf.data ]
# perf sched timehist -h
Usage: perf sched timehist [<options>]
-C, --cpu <cpu> list of cpus to profile
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-g, --call-graph Display call chains if present (default on)
-I, --idle-hist Show idle events only
-i, --input <file> input file name
-k, --vmlinux <file> vmlinux pathname
-M, --migrations Show migration events
-n, --next Show next task
-p, --pid <pid[,pid...]>
analyze events only for given process id(s)
-s, --summary Show only syscall summary with statistics
-S, --with-summary Show all syscalls and summary with statistics
-t, --tid <tid[,tid...]>
analyze events only for given thread id(s)
-V, --cpu-visual Add CPU visual
-v, --verbose be more verbose (show symbol address, etc)
-w, --wakeups Show wakeup events
--kallsyms <file>
kallsyms pathname
--max-stack <n> Maximum number of functions to display backtrace.
--show-prio Show task priority
--state Show task state when sched-out
--symfs <directory>
Look for files with symbols relative to this directory
--time <str> Time span for analysis (start,stop)
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
23952.006537 [0000] perf[534] 0.000 0.000 0.000
23952.006593 [0000] migration/0[19] 0.000 0.014 0.056
23952.006899 [0001] perf[534] 0.000 0.000 0.000
23952.006947 [0001] migration/1[22] 0.000 0.015 0.047
23952.007138 [0002] perf[534] 0.000 0.000 0.000
<SNIP>
# perf sched timehist --show-prio
Samples of sched_switch event do not have callchains.
time cpu task name prio wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ -------- --------- --------- ---------
23952.006537 [0000] perf[534] 120 0.000 0.000 0.000
23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056
23952.006899 [0001] perf[534] 120 0.000 0.000 0.000
<SNIP>
23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314
<SNIP>
23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023
23952.053990 [0005] <idle> 120 0.023 0.023 0.152
23952.054137 [0006] <idle> 120 1.427 1.427 17.855
23952.054278 [0007] <idle> 120 0.506 0.506 1.650
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BUG_ON(thread__tid(thread) != 0) in timehist_sched_change_event() is
redundant, remove it.
No functional change.
Fixes: 07235f84ec ("perf sched timehist: Add -I/--idle-hist option")
Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812132606.3126490-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some situations 'perf script -F +brstackinsn' sees a lot of "not
reaching sample" messages.
This happens when the last LBR block before the sample contains a branch
that is not in the LBR, and the instruction dumping stops.
$ perf record -b emacs -Q --batch '()'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ]
$ perf script -F +brstackinsn
...
00007f0ab2d171a4 insn: 41 0f 94 c0
00007f0ab2d171a8 insn: 83 fa 01
00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC
00007f0ab2d17180 insn: 45 84 c0
00007f0ab2d17183 insn: 74 28
... not reaching sample ...
$ perf script -F +brstackinsn | grep -c reach
136
$
This is a problem for further analysis that wants to see the full code
up to the sample.
There are two common cases where the message is bogus:
- The LBR only logs taken branches, but the branch might be a
conditional branch that is not taken (that is the most common case
actually)
- The LBR sampling uses a filter ignoring some branches, but the perf
script check checks for all branches.
This patch fixes these two conditions, by only checking for conditional
branches, as well as checking the perf_event_attr's branch filter
attributes.
For the test case above it fixes all the messages:
$ ./perf script -F +brstackinsn | grep -c reach
0
Note that there are still conditions when the message is hit --
sometimes there can be an unconditional branch that misses the LBR update
before the sample -- but they are much rarer now.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The control knobs set before loading BPF programs should be declared as
'const volatile' so that they can be optimized by the BPF core.
Committer testing:
root@x1:~# perf kwork report --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
--------------------------------------------------------------------------------------------------------------------------------
(w)intel_atomic_commit_work [ | 0009 | 18.680 ms | 2 | 18.553 ms | 362410.681580 s | 362410.700133 s |
(w)pm_runtime_work | 0007 | 13.300 ms | 1 | 13.300 ms | 362410.254996 s | 362410.268295 s |
(w)intel_atomic_commit_work [ | 0009 | 9.846 ms | 2 | 9.717 ms | 362410.172352 s | 362410.182069 s |
(w)acpi_ec_event_processor | 0002 | 8.106 ms | 1 | 8.106 ms | 362410.463187 s | 362410.471293 s |
(s)SCHED:7 | 0000 | 1.351 ms | 106 | 0.063 ms | 362410.658017 s | 362410.658080 s |
i915:157 | 0008 | 0.994 ms | 13 | 0.361 ms | 362411.222125 s | 362411.222486 s |
(s)SCHED:7 | 0001 | 0.703 ms | 98 | 0.047 ms | 362410.245004 s | 362410.245051 s |
(s)SCHED:7 | 0005 | 0.674 ms | 42 | 0.074 ms | 362411.483039 s | 362411.483113 s |
(s)NET_RX:3 | 0001 | 0.556 ms | 10 | 0.079 ms | 362411.066388 s | 362411.066467 s |
<SNIP>
root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf
0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported)
0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12
0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12
26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12
26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12
Starting trace, Hit <Ctrl+C> to stop and report
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
i386 only supports watchpoints up to size 4; using 8 bytes causes extra
counts and test failures.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The probe command is dependent on libelf. Skip the test if the
required probe command isn't present.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "time utils" test fails in 32-bit builds:
...
parse_nsec_time("18446744073.709551615")
Failed. ptime 4294967295709551615 expected 18446744073709551615
...
Switch strtoul to strtoull, as an unsigned long in a 32-bit build isn't
64 bits wide.
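A simplified sketch of the parsing, assuming a hypothetical function
name and omitting details of the real parse_nsec_time(). The point is
that both fields go through strtoull(), so the seconds value is not
clamped to 32 bits where unsigned long is 32 bits wide.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000ULL

/* Parse "sec[.nsec]" into nanoseconds; returns 0 on success, -1 on error. */
static int parse_nsec_time_sketch(const char *str, uint64_t *ptime)
{
	char nsec_buf[10];
	char *end;
	uint64_t sec, nsec = 0;
	size_t len, i;

	/* strtoull() yields unsigned long long, at least 64 bits everywhere. */
	sec = strtoull(str, &end, 10);
	if (*end != '.' && *end != '\0')
		return -1;

	if (*end == '.') {
		end++;
		len = strlen(end);
		if (len > 9)
			return -1;

		/* Right-pad the fraction with zeroes to nanosecond precision. */
		memcpy(nsec_buf, end, len);
		for (i = len; i < 9; i++)
			nsec_buf[i] = '0';
		nsec_buf[9] = '\0';

		nsec = strtoull(nsec_buf, &end, 10);
		if (*end != '\0')
			return -1;
	}

	*ptime = sec * NSEC_PER_SEC + nsec;
	return 0;
}
```

With strtoul() on a 32-bit build the seconds field saturates at
4294967295, which is where the 4294967295709551615 in the failure
above comes from.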
Fixes: c284d669a2 ("perf tools: Move parse_nsec_time to time-utils.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The hex PMU suffix may be 64-bit but the comparisons used "unsigned
long", which is 32-bit on 32-bit systems. This was causing the "PMU name
comparison" test to fail in a 32-bit build.
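To illustrate the failure mode, here is a small sketch that compares
two hex suffixes as 64-bit values. The function name is hypothetical
and the real comparison code in perf handles more than this.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Compare two hexadecimal suffix strings numerically.
 * Returns <0, 0 or >0 like strcmp(). Using uint64_t and strtoull()
 * keeps all 64 bits even where unsigned long is 32 bits wide.
 */
static int hex_suffix_cmp(const char *a, const char *b)
{
	uint64_t va = strtoull(a, NULL, 16);
	uint64_t vb = strtoull(b, NULL, 16);

	return (va > vb) - (va < vb);
}
```

With a 32-bit unsigned long and strtoul(), suffixes such as
"ffffffff00000001" and "ffffffff00000002" would both saturate to the
same value and compare as equal.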
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support using LLVM as a disassembler method, allowing helperless
annotation in non-distro builds. (It is also much faster than
using libbfd or bfd objdump on binaries with a lot of debug
information.)
This is nearly identical to the output of llvm-objdump; there are
some very rare whitespace differences, some minor changes to demangling
(since we use perf's regular demangling and not LLVM's own) and
the occasional case where llvm-objdump makes a different choice
when multiple symbols share the same address.
It should work across all of LLVM's supported architectures, although
I've only tested 64-bit x86, and finding the right triple from perf's
idea of machine architecture can sometimes be a bit tricky. Ideally, we
should have some way of finding the triplet just from the file itself.
Committer notes:
Use PRIu64 from inttypes.h to address this build failure on 32-bit systems:
3 17.58 almalinux:9-i386 : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso*, const char*, u64, u64)’:
util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka
+‘long long unsigned int’} [-Werror=format=]
150 | snprintf(buf, sizeof(buf), "%s+0x%lx",
| ~~^
| |
| long unsigned int
| %llx
151 | demangled ? demangled : sym_name, addr - base_addr);
| ~~~~~~~~~~~~~~~~
| |
| u64 {aka long long unsigned int}
cc1plus: all warnings being treated as errors
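For reference, a portable way to print a 64-bit offset uses the
inttypes.h format macros. The sketch below uses PRIx64, the
hexadecimal counterpart of the '%lx' in the diagnostic above. The
function name is hypothetical and uint64_t stands in for perf's u64.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format "<symbol>+0x<offset>" without assuming the width of long. */
static int format_sym_offset(char *buf, size_t len,
			     const char *sym, uint64_t offset)
{
	return snprintf(buf, len, "%s+0x%" PRIx64, sym, offset);
}
```

The macro expands to the right length modifier for the platform, so
the same source builds cleanly with -Werror=format on 32-bit and
64-bit targets.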
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Capstone disassembler code has a useful code snippet to read the
bytes for a given code symbol into memory. Split it out into its own
function, so that the LLVM disassembler can use it in the next patch.
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-2-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In addition to the existing support for libbfd and calling out to
an external addr2line command, add support for using libllvm directly.
This is both faster than libbfd, and can be enabled in distro builds
(the LLVM license has an explicit provision for GPLv2 compatibility).
Thus, it is set as the primary choice if available.
As an example, running 'perf report' on a medium-size profile with
DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with
libbfd, 153 seconds with external llvm-addr2line, and I got tired and
aborted the test after waiting for 55 minutes with external bfd
addr2line (which is the default for perf as compiled by distributions
today).
Evidently, for this case, the bfd addr2line process needs 18 seconds (on
a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second
timeout and gets killed during initialization, getting restarted anew
every time. Having an in-process addr2line makes this much more robust.
As future extensions, libllvm can be used in many other places where
we currently use libbfd or other libraries:
- Symbol enumeration (in particular, for PE binaries).
- Demangling (including non-Itanium demangling, e.g. Microsoft
or Rust).
- Disassembling (perf annotate).
However, these are much less pressing; most people don't profile PE
binaries, and perf has non-bfd paths for ELF. The same with demangling;
the default _cxa_demangle path works fine for most users, and while bfd
objdump can be slow on large binaries, it is possible to use
--objdump=llvm-objdump to get the speed benefits. (It appears
LLVM-based demangling is very simple, should we want that.)
Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not
correctly detected using feature_check, and thus was not tested.
Committer notes:
Added the name and a __maybe_unused to address:
1 13.50 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC)
util/srcline.c: In function 'dso__free_a2l':
util/srcline.c:184:20: error: parameter name omitted
void dso__free_a2l(struct dso *)
^~~~~~~~~~~~
make[3]: *** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove one more use of the audit libs. This also addresses a problem
reported with a recent change, where a function isn't available when
using the audit libs method. That method should really go away, and
this is one step in that direction.
The script used to generate the 64-bit syscall table was already
parametrized to generate for both 64-bit and 32-bit, so just use it and
wire the generated table to the syscalltbl.c routines.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/6fe63fa3-6c63-4b75-ac09-884d26f6fb95@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the sched_in event for the current task is not recorded, the sched_in
timestamp is set to the end_time of the time window of interest, causing
an incorrect timestamp to be shown. In this case, we choose to ignore
this event.
Test scenario:
perf[1229608] does not record the first sched_in event, run time and sch delay are both 0
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
2090450.763231 [0000] perf[1229608] 0.000 0.000 0.000
2090450.763235 [0000] migration/0[15] 0.000 0.001 0.003
2090450.763263 [0001] perf[1229608] 0.000 0.000 0.000
2090450.763268 [0001] migration/1[21] 0.000 0.001 0.004
2090450.763302 [0002] perf[1229608] 0.000 0.000 0.000
2090450.763309 [0002] migration/2[27] 0.000 0.001 0.007
2090450.763338 [0003] perf[1229608] 0.000 0.000 0.000
2090450.763343 [0003] migration/3[33] 0.000 0.001 0.004
Before:
arbitrarily specify a time window of interest, timestamp will be set to an incorrect value
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
200.000000 [0000] perf[1229608] 0.000 0.000 0.000
200.000000 [0001] perf[1229608] 0.000 0.000 0.000
200.000000 [0002] perf[1229608] 0.000 0.000 0.000
200.000000 [0003] perf[1229608] 0.000 0.000 0.000
200.000000 [0004] perf[1229608] 0.000 0.000 0.000
200.000000 [0005] perf[1229608] 0.000 0.000 0.000
200.000000 [0006] perf[1229608] 0.000 0.000 0.000
200.000000 [0007] perf[1229608] 0.000 0.000 0.000
After:
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The spinlock and rwlock use a single-element per-cpu array to track
current locks for performance reasons. But this means the key is
always available and it cannot simply account lock stats in the array
because some of them are invalid.
In fact, the contention_end() program in the BPF invalidates the entry
by setting the 'lock' value to 0 instead of deleting the entry as it
does for the hashmap. So it should skip entries with the lock value of
0 in the account_end_timestamp().
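The skip can be sketched in plain C as below. The struct layout and
function name are assumptions for illustration; the real code walks a
BPF per-cpu array from user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of a per-cpu tracking slot (illustrative only). */
struct tstamp_slot {
	uint64_t timestamp;	/* when the wait started */
	uint64_t lock;		/* lock address, 0 means "slot not in use" */
};

/*
 * Sum the pending wait time at end_ts over all per-cpu slots.
 * Slots with lock == 0 were invalidated and carry a stale timestamp,
 * so they must not be accounted.
 */
static uint64_t account_pending(const struct tstamp_slot *slots, size_t nr,
				uint64_t end_ts, size_t *nr_counted)
{
	uint64_t total = 0;
	size_t i, counted = 0;

	for (i = 0; i < nr; i++) {
		if (slots[i].lock == 0)
			continue;

		total += end_ts - slots[i].timestamp;
		counted++;
	}

	if (nr_counted)
		*nr_counted = counted;
	return total;
}
```

Without the check, a slot whose last contention ended long ago would
contribute end_ts minus an old timestamp, which matches the spurious
multi-second waits in the output below.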
Otherwise, it'd have spurious high contention on an idle machine:
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
8 4.72 s 1.84 s 590.46 ms spinlock rcu_core+0xc7
8 1.87 s 1.87 s 233.48 ms spinlock process_one_work+0x1b5
2 1.87 s 1.87 s 933.92 ms spinlock worker_thread+0x1a2
3 1.81 s 1.81 s 603.93 ms spinlock tmigr_update_events+0x13c
2 1.72 s 1.72 s 861.98 ms spinlock tick_do_update_jiffies64+0x25
6 42.48 us 13.02 us 7.08 us spinlock futex_q_lock+0x2a
1 13.03 us 13.03 us 13.03 us spinlock futex_wake+0xce
1 11.61 us 11.61 us 11.61 us spinlock rcu_core+0xc7
I don't believe it has contention on a spinlock longer than 1 second.
After this change, it only reports some small contentions.
$ sudo perf lock con -ab -Y spinlock sleep 3
contended total wait max wait avg wait type caller
4 133.51 us 43.29 us 33.38 us spinlock tick_do_update_jiffies64+0x25
4 69.06 us 31.82 us 17.27 us spinlock process_one_work+0x1b5
2 50.66 us 25.77 us 25.33 us spinlock rcu_core+0xc7
1 28.45 us 28.45 us 28.45 us spinlock rcu_core+0xc7
1 24.77 us 24.77 us 24.77 us spinlock tmigr_update_events+0x13c
1 23.34 us 23.34 us 23.34 us spinlock raw_spin_rq_lock_nested+0x15
Fixes: b5711042a1 ("perf lock contention: Use per-cpu array map for spinlocks")
Reported-by: Xi Wang <xii@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When it updates the lock stat for the first time, it needs to create an
element in the BPF hash map.
But if there's a concurrent thread waiting for the same lock (like for
rwsem or rwlock), it might race with the thread and possibly fail to
update with -EEXIST.
In that case, it can look up the map again and put the data there instead
of failing.
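The retry pattern, sketched with a toy single-threaded map instead of
a BPF hash map. All names here are hypothetical; the real BPF program
would also need atomic updates, which this toy omits.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

struct toy_entry {
	uint64_t key;
	uint64_t total_time;
	uint32_t count;
	int used;
};

struct toy_map {
	struct toy_entry slots[TOY_SLOTS];
};

static struct toy_entry *toy_lookup(struct toy_map *m, uint64_t key)
{
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++)
		if (m->slots[i].used && m->slots[i].key == key)
			return &m->slots[i];
	return NULL;
}

/* Create-only insert: fails with -EEXIST when the key is already there. */
static int toy_insert_noexist(struct toy_map *m, uint64_t key, uint64_t t)
{
	size_t i;

	if (toy_lookup(m, key))
		return -EEXIST;

	for (i = 0; i < TOY_SLOTS; i++) {
		if (!m->slots[i].used) {
			m->slots[i].key = key;
			m->slots[i].total_time = t;
			m->slots[i].count = 1;
			m->slots[i].used = 1;
			return 0;
		}
	}
	return -ENOMEM;
}

/* First-time update: on -EEXIST, look up again and merge instead of failing. */
static int record_first_wait(struct toy_map *m, uint64_t key, uint64_t t)
{
	struct toy_entry *e;
	int err = toy_insert_noexist(m, key, t);

	if (err != -EEXIST)
		return err;

	e = toy_lookup(m, key);
	if (!e)
		return -ENOENT;

	e->total_time += t;
	e->count++;
	return 0;
}
```

Calling record_first_wait() twice for the same key plays the role of
the thread that lost the race: the second call sees -EEXIST and folds
its data into the existing element.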
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The LCB_F_SPIN bit is used for spinlock, rwlock and optimistic spinning
in mutex. In get_tstamp_elem() it needs to check spinlock and rwlock
only. As mutex sets the LCB_F_MUTEX, it can check those two bits and
reduce the number of operations.
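A sketch of the combined check. The flag names come from this message,
but the bit positions below are assumptions chosen for the example and
the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed for illustration. */
#define LCB_F_SPIN	(1U << 0)
#define LCB_F_MUTEX	(1U << 5)

/*
 * True for spinlock and rwlock contention only: LCB_F_SPIN must be
 * set and LCB_F_MUTEX must be clear. A single mask-and-compare
 * covers both conditions.
 */
static bool is_spin_or_rwlock(uint32_t flags)
{
	return (flags & (LCB_F_SPIN | LCB_F_MUTEX)) == LCB_F_SPIN;
}
```

Optimistic spinning in a mutex sets both bits, so it fails the
comparison without a separate test for each lock type.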
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is some duplicated code doing the same job. Add a label and
goto it to handle errors in a single place.
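The general pattern looks like the sketch below. It is a generic
example with hypothetical names, not the code touched by this change.

```c
#include <stddef.h>
#include <stdlib.h>

struct two_bufs {
	char *a;
	char *b;
};

/* Returns 0 on success, -1 on failure with nothing left allocated. */
static int two_bufs_init(struct two_bufs *t, size_t na, size_t nb)
{
	t->a = NULL;
	t->b = NULL;

	t->a = malloc(na);
	if (!t->a)
		goto err;

	/* An invalid size is an error even though 'a' is already allocated. */
	if (nb == 0)
		goto err;

	t->b = malloc(nb);
	if (!t->b)
		goto err;

	return 0;

err:
	/* Single cleanup path; free(NULL) is a no-op. */
	free(t->b);
	free(t->a);
	t->a = NULL;
	t->b = NULL;
	return -1;
}
```

Every failure jumps to the same label, so adding a new step later
needs one more "goto err" and no copy of the cleanup code.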
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add pipe tests where piped files are written to disk. This
means that spotting a file name of "-" isn't a sufficient "is pipe?"
test.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As they are no longer used by `perf inject`, the repipe_fd is always -1
and repipe is always false. Remove the options and associated code,
knowing the constant values of the removed variables.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously inject->is_pipe was set if the input or output were a
pipe. Determining the input was a pipe had to be done prior to
starting the session and opening the file. This was done by comparing
the input file name with '-' but it fails if the pipe file is written
to disk.
Opening a pipe file from disk will correctly set perf_data.is_pipe, but
this is too late for 'perf inject' and results in a broken file. A
workaround is 'cat pipe_perf|perf inject -i - ...'.
This change removes inject->is_pipe and changes the dependent
conditions to use the is_pipe flag on the input
(inject->session->data) and output files (inject->output). This
ensures the is_pipe condition reflects things like the header being
read.
The change removes the use of perf file header repiping, that is
writing the file header out while reading it in. The case of input
pipe and output file cannot repipe as the attributes for the file are
unknown. To resolve this, write the file header when writing to disk
and as the attributes may be unknown, write them after the data.
Update the session's repipe variable to be trace_event_repipe as those are
the only events now impacted by it. Update __perf_session__new as the
repipe_fd no longer needs passing. Fully removing repipe from session
header reading will be done in a later change.
Committer testing:
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 | perf report -i -
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x562fc2560a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
#
root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 > pipe.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB - ]
root@number:~# perf report --stdio -i pipe.data
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. ...............................
#
100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5
|
---__libc_start_main@@GLIBC_2.34
__libc_start_call_main
0x55f775975a9f
clock_nanosleep@GLIBC_2.2.5
#
# (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...)
#
root@number:~#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With a file, to write data an offset needs to be known. Typically data
follows the event attributes in a file.
However, if processing a pipe the number of event attributes may not be
known.
It is convenient in that case to write the attributes after the data.
Expand perf_session__do_write_header() to allow this when the data
offset and size are known.
This approach may be useful for more than just taking a pipe file to
write into a data file. For example, `perf inject --itrace` will reserve
an additional 8kb for attributes, which would be unnecessary if the
attributes were written after the data.
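As an illustration only, the layout arithmetic can be sketched as below. The names file_layout and layout_attrs_after_data(), and the fixed per-attribute entry size, are assumptions made for this sketch and are not perf's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of where each section lands in the file. */
struct file_layout {
	uint64_t data_offset;
	uint64_t data_size;
	uint64_t attr_offset;
	uint64_t attr_size;
};

/*
 * Place the data directly after the header and the attributes after the
 * data. The data offset does not depend on the number of attributes, so
 * data can be written before the attribute count is known.
 */
static struct file_layout layout_attrs_after_data(uint64_t header_size,
						  uint64_t data_size,
						  uint64_t nr_attrs,
						  uint64_t attr_entry_size)
{
	struct file_layout l;

	/* Guard against overflow in this simplified arithmetic. */
	assert(data_size <= UINT64_MAX - header_size);

	l.data_offset = header_size;
	l.data_size = data_size;
	l.attr_offset = header_size + data_size;
	l.attr_size = nr_attrs * attr_entry_size;
	return l;
}
```

With a 104-byte header, 1000 bytes of data and two 136-byte attribute entries, the sketch places the data at offset 104 and the attributes at offset 1104. No space has to be reserved for attributes ahead of the data.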
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Buggy perf.data files can have the attributes and data
overlapping.
For example, when processing pipe data the attributes aren't known and
so file offset header calculations can consider them not present.
Later this can cause the attributes to overwrite the data. This can be
seen in:
$ perf record -o - true > a.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.059 MB - ]
$ perf inject -i a.data -o b.data
$ perf report --stats -i b.data
0x68 [0]: failed to process type: 510379 [Invalid argument]
Error:
failed to process sample
$
This change makes reading the corrupt file fail:
$ perf report --stats -i b.data
Perf file header corrupt: Attributes and data overlap
incompatible file format (rerun with -v to learn more)
$
Which is more informative.
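The kind of check involved can be sketched as a generic range-overlap test. This is only a sketch: struct file_section and sections_overlap() are hypothetical names, and the real check in perf may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one file section: [offset, offset + size). */
struct file_section {
	uint64_t offset;
	uint64_t size;
};

/*
 * Return true when the two half-open byte ranges share at least one byte.
 * Empty sections never overlap anything. The comparison uses differences
 * of offsets so that offset + size cannot overflow.
 */
static bool sections_overlap(const struct file_section *a,
			     const struct file_section *b)
{
	assert(a != NULL && b != NULL);

	if (a->size == 0 || b->size == 0)
		return false;
	if (a->offset <= b->offset)
		return b->offset - a->offset < a->size;
	return a->offset - b->offset < b->size;
}
```

For example, attributes at offset 104 with size 136 overlap data at offset 168 with size 64, while data starting at offset 240 would not overlap them.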
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some of the values are a little strange so add documentation to
resolve ambiguity.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_session is a central data structure to the tool so let's comment
it. The auxtrace callbacks are never modified in session so constify.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240829150154.37929-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that we have overlapping trace IDs it's also useful to know what the
queue number is to be able to distinguish the source of the trace so
print it inline. Hide it behind the -v option because it might not be
obvious to users what the queue number is.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-8-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
v0.1 HW_ID packets have a new field that describes which sink each CPU
writes to. Use the sink ID to link trace ID maps to each other so that
mappings are shared wherever the sink is shared.
Also update the error message to show that overlapping IDs aren't an
error in per-thread mode, just not supported. In the future we can
use the CPU ID from the AUX records, or watch for changing sink IDs on
HW_ID packets to use the correct decoders.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-7-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This isn't a bug because Perf always masks with
CORESIGHT_TRACE_ID_VAL_MASK before using these values, but to avoid it
looking like it could be, make an effort to not save bad values.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-6-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that each queue has a unique set of trace ID mappings, use this
list to create the decoders. In unformatted mode just add a single
mapping so only one decoder is made.
Previously each queue would have a decoder created for each traced CPU
on the system but this won't work anymore because CPUs can have
overlapping trace IDs.
This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed
any more. If mappings aren't added then decoders aren't created, rather
than needing a flag to suppress creation.
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-5-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The global list won't work for per-sink trace ID allocations, so put a
list in each queue where the IDs will be unique to that queue.
To keep the same behavior as before, for version 0 of the HW_ID packets,
copy all the HW_ID mappings into all queues.
This change doesn't affect the decoders, only trace ID lookups on the
Perf side. The decoders are still created with global mappings which
will be fixed in a later commit.
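A minimal sketch of the per-queue idea follows. It assumes 7-bit trace IDs and uses hypothetical names (queue_ids, add_hw_id(), the copy_to_all flag); the real cs-etm code keeps its mappings in different data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TRACE_IDS 128	/* assumption: trace IDs fit in 7 bits */
#define NO_CPU (-1)

/* Hypothetical per-queue table mapping a trace ID to the CPU that owns it. */
struct queue_ids {
	int cpu_of_id[NR_TRACE_IDS];
};

static void queue_ids_init(struct queue_ids *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < NR_TRACE_IDS; i++)
		q->cpu_of_id[i] = NO_CPU;
}

/*
 * Record that trace_id belongs to cpu. When copy_to_all is true (the
 * version 0 behavior described above) the mapping goes into every queue;
 * otherwise it only goes into the queue at queue_idx.
 */
static void add_hw_id(struct queue_ids *queues, size_t nr_queues,
		      size_t queue_idx, bool copy_to_all,
		      unsigned int trace_id, int cpu)
{
	assert(queues != NULL && queue_idx < nr_queues);
	assert(trace_id < NR_TRACE_IDS);

	if (!copy_to_all) {
		queues[queue_idx].cpu_of_id[trace_id] = cpu;
		return;
	}
	for (size_t i = 0; i < nr_queues; i++)
		queues[i].cpu_of_id[trace_id] = cpu;
}
```

Because each queue owns its table, the same trace ID can resolve to a different CPU in each queue, which a single global list cannot express.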
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-4-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make cs_etm__setup_queue() set up a queue even if it's empty, and
pre-allocate queues based on the max CPU that was recorded. In per-CPU
mode aux queues are indexed by CPU ID even if not all CPUs are
recorded; sparse queue arrays aren't used.
This will allow HW_IDs to be saved even if no aux data was received in
that queue without having to call cs_etm__setup_queue() from two
different places.
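The pre-allocation can be pictured with the following sketch. The aux_queue struct and alloc_queues() are hypothetical and much simpler than the real auxtrace queue code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, minimal queue: one per CPU, possibly without any data. */
struct aux_queue {
	int cpu;
	bool has_data;
};

/*
 * Allocate max_cpu + 1 queues so that the array index equals the CPU ID.
 * Queues for CPUs that recorded nothing still exist, with has_data left
 * false. Returns NULL on allocation failure; the caller frees the array.
 */
static struct aux_queue *alloc_queues(int max_cpu, size_t *nr_queues)
{
	size_t n;
	struct aux_queue *queues;

	assert(max_cpu >= 0 && nr_queues != NULL);

	n = (size_t)max_cpu + 1;
	queues = calloc(n, sizeof(*queues));
	if (queues == NULL) {
		*nr_queues = 0;
		return NULL;
	}
	for (size_t i = 0; i < n; i++)
		queues[i].cpu = (int)i;
	*nr_queues = n;
	return queues;
}
```

With a dense array like this, per-CPU information such as HW_IDs has a queue to attach to even when that CPU produced no aux data.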
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-3-james.clark@linaro.org
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Both of these passes gather information about how to create the
decoders. AUX records determine formatted/unformatted, and the HW_IDs
determine the traceID/metadata mappings.
Therefore it makes sense to cache the information and wait until both
passes are over before creating the decoders, rather than creating them
at the first HW_ID found.
This will allow a simplification of the creation process where
cs_etm_queue->traceid_list will be exclusively used to create the decoders,
rather than the current two methods depending on whether the trace is
formatted or not.
Previously the sample CPU from the AUX record was used to initialize
the decoder CPU, but actually sample CPU == AUX queue index in per-CPU
mode, so saving the sample CPU isn't required.
Similarly formatted/unformatted was used upfront to create the decoders,
but now it's cached until later.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240722101202.26915-2-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ sudo ./perf test filtering -vv
96: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 2966908
Checking BPF-filter privilege
Basic bpf-filter test
Basic bpf-filter test [Success]
Failing bpf-filter test
Failing bpf-filter test [Success]
Group bpf-filter test
Group bpf-filter test [Success]
Multiple bpf-filter test
Multiple bpf-filter test [Success]
Cgroup bpf-filter test
Cgroup bpf-filter test [Success]
---- end(0) ----
96: perf record sample filtering (by BPF) tests : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new cgroup filter can take either of '==' or '!=' operator and a
pathname for the target cgroup.
$ perf record -a --all-cgroups -e cycles --filter 'cgroup == /abc/def' -- sleep 1
Users should pass the --all-cgroups option on the command line to enable
cgroup filtering. Technically it doesn't need to have the option as
it can get the current task's cgroup info directly from BPF. But I want
to follow the convention for the other sample info.
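As a rough sketch of the comparison such a filter performs, assuming the target pathname has already been resolved to a numeric cgroup ID. The enum, struct and function names here are hypothetical and do not come from the perf sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two operators a cgroup filter term accepts. */
enum cgroup_filter_op {
	CGROUP_FILTER_EQ,	/* '==' */
	CGROUP_FILTER_NEQ,	/* '!=' */
};

/* Hypothetical filter term: an operator plus the resolved target cgroup. */
struct cgroup_filter {
	enum cgroup_filter_op op;
	uint64_t target_id;	/* ID of the cgroup named in the filter */
};

/* Return true when a sample from sample_cgroup_id passes the filter. */
static bool cgroup_filter_keeps(const struct cgroup_filter *f,
				uint64_t sample_cgroup_id)
{
	bool same;

	assert(f != NULL);

	same = sample_cgroup_id == f->target_id;
	return f->op == CGROUP_FILTER_EQ ? same : !same;
}
```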
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The flex and bison files need to be recompiled when one of these header
files is changed.
* util/bpf-filter.h
* util/bpf_skel/sample-filter.h
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.
So if neither the 'dso' nor the 'sym' sort key is used, the map/symbols in
the hist entry can be garbage, and they shouldn't be accessed
unconditionally.
I got a segfault when I wanted to see cgroup profiles.
$ sudo perf record -a --all-cgroups --synth=cgroup true
$ sudo perf report -s cgroup
Program received signal SIGSEGV, Segmentation fault.
0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
48 return RC_CHK_ACCESS(map)->dso;
(gdb) bt
#0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
#1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
#2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
#3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
at util/hist.c:644
#4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
at util/hist.c:1260
#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
machine=0x5555560388e8) at builtin-report.c:334
#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
at util/session.c:780
#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
file_path=0x555556038ff0 "perf.data") at util/session.c:1406
As you can see, entry->ms.map was NULL even though he->ms.map has a
value. This is because the 'sym' sort key is not given, so nothing can be
assumed about whether he->ms.sym and entry->ms.sym are the same. I only
checked the 'sym' sort key here as it implies 'dso' behavior (so maps are
the same).
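The shape of the guard can be sketched as follows. The types are simplified stand-ins and map_needs_update() is a hypothetical helper, so this is not the actual code in util/hist.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real perf types. */
struct map {
	int id;
};

struct map_symbol {
	const struct map *map;	/* only meaningful with a relevant sort key */
};

/*
 * Decide whether an existing entry's map should be replaced by the new
 * entry's map. The map fields are consulted only when the 'sym' sort key
 * is in use; otherwise they may hold garbage and are ignored.
 */
static bool map_needs_update(bool has_sym_sort_key,
			     const struct map_symbol *he_ms,
			     const struct map_symbol *entry_ms)
{
	assert(he_ms != NULL && entry_ms != NULL);

	if (!has_sym_sort_key)
		return false;
	return he_ms->map != entry_ms->map;
}
```

When sorting only by cgroup, the first argument is false and the NULL map is never dereferenced.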
Fixes: ac01c8c424 ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Shellcheck versions < v0.7.2 can't follow this path so add the helper to
fix the following warning:
In tests/shell/trace_btf_enum.sh line 13:
. "$(dirname $0)"/lib/probe.sh
^--------------------------^ SC1090: Can't follow non-constant source.
Use a directive to specify location.
Fixes: d66763fed3 ("perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'")
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809095426.3065163-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'pmu' pointer in the auxtrace_record structure is no longer used now
that multiple AUX events are supported, so remove it.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240806204130.720977-3-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use evsel__is_aux_event() to decide if an event is an AUX event. This is
a refactoring that replaces comparing the PMU type.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240806204130.720977-2-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Yitian 710 is not a Freescale/NXP design and thus should
be located in a separate T-Head vendor directory.
Reviewed-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: kernel@pengutronix.de
Cc: linux-arm-kernel@lists.infradead.org
Cc: patchwork-lst@pengutronix.de
Link: https://lore.kernel.org/r/20240701175735.485655-1-l.stach@pengutronix.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move some of the JSON/events from others.json to more appropriate JSON
files for power10 platform.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20240827053206.538814-2-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we'll need it later in the current patch series and we can get the
syscall_arg_fmt from syscall_arg->fmt.
Based-on-a-patch-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zsd8vqCrTh5h69rp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf trace -p <PID>' works on a syscall that is unaugmented, but doesn't
work on a syscall that's augmented (when it calls perf_event_output() in
BPF).
Let's take open() as an example. open() is augmented in perf trace.
Before:
$ perf trace -e open -p 3792392
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
We can see there's no output.
After:
$ perf trace -e open -p 3792392
0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
Reason:
bpf_perf_event_output() will fail when you specify a pid in 'perf trace' (EOPNOTSUPP).
When using 'perf trace -p 114', before perf_event_open(), we'll have PID
= 114, and CPU = -1.
This is bad for bpf-output event, because the ring buffer won't accept
output from BPF's perf_event_output(), making it fail. I'm still trying
to find out why.
If we open bpf-output for every cpu, instead of setting it to -1, like
this:
PID = <PID>, CPU = 0
PID = <PID>, CPU = 1
PID = <PID>, CPU = 2
PID = <PID>, CPU = 3
Everything works.
You can test it with this script (open.c):
#include <unistd.h>
#include <sys/syscall.h>
int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";
	while (1) {
		syscall(SYS_open, s1, i1, i2);
		sleep(1);
	}
	return 0;
}
save, compile:
make open
perf trace:
perf trace -e open <path-to-the-executable>
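The per-CPU opening described above amounts to expanding one (PID, CPU = -1) target into one target per CPU. A minimal sketch, with a hypothetical event_target struct and per_cpu_targets() helper that only build the list and do not call perf_event_open():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (pid, cpu) pair describing one event to open. */
struct event_target {
	int pid;
	int cpu;
};

/*
 * Fill out[] with one target per CPU for the given pid, instead of a
 * single target with cpu == -1. Returns the number of entries written,
 * which is at most out_len.
 */
static size_t per_cpu_targets(int pid, int nr_cpus,
			      struct event_target *out, size_t out_len)
{
	size_t n = 0;

	assert(nr_cpus >= 0);
	assert(out != NULL || out_len == 0);

	for (int cpu = 0; cpu < nr_cpus && n < out_len; cpu++) {
		out[n].pid = pid;
		out[n].cpu = cpu;
		n++;
	}
	return n;
}
```

For PID 114 on a four-CPU machine this yields the pairs (114, 0) through (114, 3), matching the list shown earlier in this message.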
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll use it in the next patch, to decide how to set up the ring
buffer.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new 'perf report' test case that acts as an entry element in 'perf
test list'.
Runs multiple subtests from directory "base_report", which can be
expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-12-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test basic execution and some options of perf-report subcommand, like
show-nr-samples, header, showcpuutilization, pid and symbol filtering.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-11-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As a form of validation, it is common practice to check whether the
outputs of commands contain expected patterns or match a certain
regular expression.
This output checking helper is designed to allow checking stderr output
of perf commands for unexpected messages, while ignoring messages that
are known to be harmless, e.g.:
"Lowering default frequency rate to \d+\."
"\d+ out of order events recorded."
etc.
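The idea can be sketched in C with POSIX regular expressions. This is an illustration under assumptions: line_is_harmless() and the pattern table are hypothetical, the helper added by the patch may be written in another language, and POSIX ERE has no \d so [0-9] is used instead.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Patterns for messages that are known to be harmless (POSIX ERE syntax). */
static const char *const harmless_patterns[] = {
	"Lowering default frequency rate to [0-9]+\\.",
	"[0-9]+ out of order events recorded\\.",
};

/*
 * Return true when the line matches one of the harmless patterns, so a
 * caller can ignore it and report only unexpected stderr lines.
 */
static bool line_is_harmless(const char *line)
{
	size_t nr = sizeof(harmless_patterns) / sizeof(harmless_patterns[0]);

	assert(line != NULL);

	for (size_t i = 0; i < nr; i++) {
		regex_t re;
		bool match;

		/* Skip a pattern that fails to compile rather than abort. */
		if (regcomp(&re, harmless_patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
			continue;
		match = regexec(&re, line, 0, NULL, 0) == 0;
		regfree(&re);
		if (match)
			return true;
	}
	return false;
}
```

A checker built on this would read stderr line by line and fail the test on the first line for which line_is_harmless() returns false.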
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-10-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf-probe command uses specific semantics to describe probes.
Test whether some patterns that are known to be valid or invalid are
handled appropriately.
This test is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-9-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test if various incompatible options are correctly handled (rejected).
It is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-8-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test basic behavior of perf-probe subcommand. It is run as a part of
perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-7-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test perf probe interface. Blacklisted functions should be rejected
when there is an attempt to set a kprobe on them.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-6-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Shellcheck is becoming a standard when building perf to prevent
any unnecessary mistakes. Fix shellcheck warnings in perf testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-5-vmolnaro@redhat.com
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test scripts in base_* directories currently have their own drivers
that run them. Before this patch, the shell test-suite generator causes
them to run twice. Fix that by skipping them in the generator.
A cleaner solution (for future) will be to use the directory structure
idea (introduced by Carsten Haitzler in 7391db6459 ("perf test:
Refactor shell tests allowing subdirs")) to generate test entries with
subtests, like:
$ perf test list
[...]
97: perf probe shell tests
97:1: perf probe basic functionality
97:2: perf probe tests with arguments
97:3: perf probe invalid options handling
[...]
There are already a lot of shell test scripts and many more are about to
come, so there is a need for some hierarchy.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-3-vmolnaro@redhat.com
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The getname_flags() routine changed recently and thus the place where we
were getting the pathname is not probeable anymore, albeit still
present, so use the next line for that, before:
root@number:/home/acme/git/perf-tools-next# perf test vfs_getname
91: Add vfs_getname probe to get syscall args filenames : FAILED!
93: Use vfs_getname probe to get syscall args filenames : FAILED!
126: Check open filename arg using perf trace + vfs_getname : FAILED!
root@number:/home/acme/git/perf-tools-next#
Now tests 91 and 126 are passing; some more investigation is needed for
test 93, which continues to fail.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a 'Multiple bpf-filter' test for two or more events with filters.
It uses task-clock and page-faults events with different filter
expressions and checks the perf script output:
$ sudo ./perf test filtering -vv
96: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 2804025
Checking BPF-filter privilege
Basic bpf-filter test
Basic bpf-filter test [Success]
Failing bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Failing bpf-filter test [Success]
Group bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE
Group bpf-filter test [Success]
Multiple bpf-filter test
Multiple bpf-filter test [Success]
---- end(0) ----
96: perf record sample filtering (by BPF) tests : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far it used tgid as a key to get the filter expressions in the
pinned filters map for regular users, but that won't work well if the
user has more than one filter at the same time. Let's add the event id
to the key of the filter hash map so that it can identify the right
filter expression in the BPF program.
As the event can be inherited to child tasks, it should use the primary
id which belongs to the parent (original) event. Since evsel opens the
event for multiple CPUs and tasks, it needs to maintain a separate hash
map for the event id.
In user space, it keeps a list for the multiple evsels and releases
the entries in both hash maps when it closes the event.
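As a rough user-space model of the two hash maps, consider the Python
sketch below. It is not the BPF or perf code; the class and method names
are hypothetical, the filter strings are arbitrary placeholders, and
treating the first id of an evsel as the primary id is an assumption made
for illustration only.

```python
class PinnedFilters:
    """Toy model of the two hash maps described above (names hypothetical)."""

    def __init__(self):
        self.filters = {}     # (tgid, primary event id) -> filter expression
        self.primary_id = {}  # any opened event id -> primary event id

    def add_event(self, tgid, event_ids, expr):
        # An evsel opens one event per CPU/task. Assumption: the first id
        # in the list stands in for the primary (parent) event id.
        primary = event_ids[0]
        for eid in event_ids:
            self.primary_id[eid] = primary
        self.filters[(tgid, primary)] = expr

    def lookup(self, tgid, event_id):
        # Translate the event id to its primary id, then find the filter.
        primary = self.primary_id.get(event_id)
        return self.filters.get((tgid, primary))

    def close_event(self, tgid, event_ids):
        # Release the entries in both maps when the event is closed.
        primary = self.primary_id.get(event_ids[0])
        for eid in event_ids:
            self.primary_id.pop(eid, None)
        self.filters.pop((tgid, primary), None)
```

With the event id in the key, two filters owned by the same tgid no
longer collide.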
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf crashes as below when applying --no-group
# perf record -e "{cache-misses,branches}" -b sleep 1
# perf report --stdio --no-group
free(): invalid next size (fast)
Aborted (core dumped)
#
In the __hpp__fmt(), only 1 hpp_fmt_value is allocated for the current
event when --no-group is applied.
However, the current implementation tries to assign the hists from all
members to the hpp_fmt_value, which exceeds the allocated memory.
Fixes: 8f6071a3dc ("perf hist: Simplify __hpp_fmt() using hpp_fmt_data")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240820183202.3174323-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extend the searching for the test files so that it works when running
perf from a separate objdir, and also when the perf executable is
symlinked.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240813213651.1057362-2-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll need to check if a warning option introduced in clang 19 is
available on the clang version being used, so cover the error message
emitted when testing for a -W option.
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some cases, compilers don't set the location expression in DWARF
precisely. For instance, the compiler may assign a variable to a register
after copying it from a different register. The code should then use the
new register for the new type, but it still uses the old one. This makes
it hard to track the type information properly.
This is an example I found in __tcp_transmit_skb(). The first argument
(sk) of this function is a pointer to sock and there's a variable (tp)
for tcp_sock.
static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
{
...
struct tcp_sock *tp;
BUG_ON(!skb || !tcp_skb_pcount(skb));
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
...
So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk.
But it turned out to be the same value because tcp_sock embeds sock as
the first member. The sk is located in reg5 (RDI) and tp is in reg3
(RBX). The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is
0x750. So you need to use RBX (reg3) to access the fields in the
tcp_sock. But the code used RDI (reg5) as it has the same value.
$ pahole --hex -C tcp_sock vmlinux | grep -e 748 -e 750
u64 tcp_wstamp_ns; /* 0x748 0x8 */
u64 tcp_clock_cache; /* 0x750 0x8 */
And this is the disassembly of the part of the function.
<__tcp_transmit_skb>:
...
44: mov %rdi, %rbx
47: mov 0x748(%rdi), %rsi
4e: mov 0x750(%rdi), %rax
55: cmp %rax, %rsi
Because the compiler attached the debug info to RBX, perf only knows that
RDI is a pointer to sock, and accessing those two fields resulted in an
error due to the offset being beyond the type size.
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 30]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- the first argument ('sk' at %RDI)
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
mov [20] stack canary -> reg0
mov [29] reg0 -> -0x30(stack) stack canary
bb: [36 - 3e]
mov [36] reg4 -> reg15 type='struct sk_buff*' size=0x8 (die:0x8181360)
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c) <<<--- calling tcp_sk()
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- new variable ('tp' at %RBX)
var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock*) : offset bigger than size <<<--- access with old variable
final result: offset bigger than size
While it's a fault in the compiler, we could work around this issue by
using the type of the new variable when it's copied directly. So I've
added a copied_from field in the register state to track those direct
register-to-register copies. If the new register then gets a new type
while the old register still has the same type, it'll update the type of
the old register (copy it back).
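The copy-back rule can be modelled in a few lines of Python. This is a
sketch under assumptions, not the perf implementation: only the
copied_from field name comes from the text above, while RegState, mov_reg
and set_var_type are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RegState:
    type_name: Optional[str] = None
    copied_from: Optional[int] = None  # source of a direct reg-to-reg mov


def mov_reg(regs: Dict[int, RegState], src: int, dst: int) -> None:
    """Model 'mov %src, %dst': dst inherits the type and remembers src."""
    regs[dst] = RegState(type_name=regs[src].type_name, copied_from=src)


def set_var_type(regs: Dict[int, RegState], reg: int, new_type: str) -> None:
    """Model a DWARF variable with type `new_type` appearing in `reg`."""
    state = regs.setdefault(reg, RegState())
    old_type = state.type_name
    state.type_name = new_type
    src = state.copied_from
    # Copy the new type back only if the source register still holds the
    # type it had when the copy was made.
    if src is not None and src in regs and regs[src].type_name == old_type:
        regs[src].type_name = new_type
```

Replaying the example, reg5 starts as 'struct sock*', is copied to reg3,
and reg3 then becomes 'struct tcp_sock*', which is copied back to reg5.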
For example, if we can update type of reg5 at __tcp_transmit_skb+0x47,
we can find the target type of the instruction at 0x63 like below:
-----------------------------------------------------------
find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
...
bb: [44 - 63]
mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)
var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)
var [47] copyback reg5 type='struct tcp_sock*' size=0x8 (die:0x819eead) <<<--- here
mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd)
mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good! <<<--- new type
found by insn track: 0x748(reg5) type-offset=0x748
final result: type='struct tcp_sock' size=0xa98 (die:0x819eeb2)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When checking the match variable at the target instruction, it might not
have any information if it's the first write to a stack slot. In this
case it could spill a register value into the stack, so the type info is
in the source operand.
But currently it's hard to get the operand from the checking function.
Let's process the instruction and retry to get the type info from the
stack if there's no information already.
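A minimal Python sketch of the "process the instruction, then retry" idea
follows. It is an illustration only; find_stack_type and the (source
register, stack offset) tuple used to model a spill are hypothetical and
do not mirror the perf data structures.

```python
def find_stack_type(stack_types, regs, insn, offset):
    """Look up the type at stack `offset`; on a miss, apply `insn` and retry.

    `stack_types` maps stack offsets to type names, `regs` maps register
    numbers to type names, and `insn` is a (src_reg, dst_stack_offset)
    pair modelling a register spill.
    """
    found = stack_types.get(offset)
    if found is not None:
        return found
    src_reg, dst_offset = insn
    if src_reg in regs:
        # The spill carries the type of the source operand into the slot.
        stack_types[dst_offset] = regs[src_reg]
    # Retry once now that the instruction has been processed.
    return stack_types.get(offset)
```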
This is an example of __tcp_transmit_skb(). The instructions are:
<__tcp_transmit_skb>:
0: nopl 0x0(%rax, %rax, 1)
5: push %rbp
6: mov %rsp, %rbp
9: push %r15
b: push %r14
d: push %r13
f: push %r12
11: push %rbx
12: sub $0x98, %rsp
19: mov %r8d, -0xa8(%rbp)
...
It cannot find any variable at -0xa8(%rbp) at this point.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : no type information
no type information
And it was able to find the type after processing the 'mov' instruction.
-----------------------------------------------------------
find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
CU for net/ipv4/tcp_output.c (die:0x817f543)
frame base: cfa=0 fbreg=6
scope: [1/1] (die:81aac3e)
bb: [0 - 19]
var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
var [5] reg1 type='int' size=0x4 (die:0x818059e)
var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : retry <<<--- here
mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : Good!
found by insn track: -0xa8(reg6) type-offset=0
final result: type='unsigned int' size=0x4 (die:0x8180ed6)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it'd be easier to display the typename in
question if it's available.
For example, check out the line that starts with 'chk'.
-----------------------------------------------------------
find data type for 0x10(reg0) at cpuacct_charge+0x13
CU for kernel/sched/build_utility.c (die:0x137ee0b)
frame base: cfa=1 fbreg=7
scope: [3/3] (die:13d9632)
bb: [c - 13]
var [c] reg5 type='struct task_struct*' size=0x8 (die:0x1381230)
mov [c] 0xdf8(reg5) -> reg0 type='struct css_set*' size=0x8 (die:0x1385c56)
chk [13] reg0 offset=0x10 ok=1 kind=1 (struct css_set*) : Good! <<<--- here
found by insn track: 0x10(reg0) type-offset=0x10
final result: type='struct css_set' size=0x250 (die:0x1385b0e)
Another example:
-----------------------------------------------------------
find data type for 0x8(reg0) at menu_select+0x279
CU for drivers/cpuidle/governors/menu.c (die:0x7b0fe79)
frame base: cfa=1 fbreg=7
scope: [2/2] (die:7b11010)
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=0 kind=0 cfa : no type information
scope: [1/2] (die:7b10cbc)
bb: [0 - 64]
...
mov [26a] imm=0xffffffff -> reg15
bb: [273 - 277]
bb: [279 - 279]
chk [279] reg0 offset=0x8 ok=1 kind=1 (long long unsigned int) : no/void pointer <<<--- here
final result: no/void pointer
Also change some places to print negative offsets properly.
Before:
-----------------------------------------------------------
find data type for 0xffffff40(reg6) at __tcp_transmit_skb+0x58
After:
-----------------------------------------------------------
find data type for -0xc0(reg6) at __tcp_transmit_skb+0x58
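The conversion behind the Before/After pair can be sketched in Python as
below. The helper name format_offset is hypothetical, and the 32-bit
width is an assumption inferred from the 0xffffff40 value in the "Before"
line.

```python
def format_offset(value, bits=32):
    """Format `value` as signed hex, reading it as a `bits`-wide
    two's-complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits  # reinterpret the high half as negative
    return f"{value:#x}"
```

For instance, format_offset(0xffffff40) yields "-0xc0", while small
positive offsets such as 0x748 are printed unchanged.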
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The __die_find_member_offset_cb() failed to handle bitfield members
which don't have DW_AT_data_member_location. As is done when adding member
types in __add_member_cb(), it should fall back to checking the bit offset
when it resolves the member type for an offset.
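The fallback can be illustrated with a small Python sketch. It works on a
plain dict of attribute values rather than real DWARF DIEs, the function
name member_byte_offset is hypothetical, and using DW_AT_data_bit_offset
divided by 8 as the bit-offset fallback is an assumption about what "the
bit offset" refers to.

```python
def member_byte_offset(attrs):
    """Return a member's byte offset from a dict of DWARF attributes.

    Bitfield members may lack DW_AT_data_member_location, so fall back to
    DW_AT_data_bit_offset and convert bits to bytes.
    """
    if "DW_AT_data_member_location" in attrs:
        return attrs["DW_AT_data_member_location"]
    if "DW_AT_data_bit_offset" in attrs:
        return attrs["DW_AT_data_bit_offset"] // 8
    return None  # no usable location information
```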
Fixes: 437683a994 ("perf dwarf-aux: Handle type transfer for memory access")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it's useful to organize member fields on cache-line boundaries.
The 'typecln' sort key is short for type-cacheline and shows samples
in each cacheline. The cacheline size is fixed to 64 for now, but it
can read the actual size once it saves the value from sysfs.
For example, you may want to know which cacheline in a target is hot or
cold. The following shows members in the cfs_rq's first cache line.
$ perf report -s type,typecln,typeoff -H
...
- 2.67% struct cfs_rq
+ 1.23% struct cfs_rq: cache-line 2
+ 0.57% struct cfs_rq: cache-line 4
+ 0.46% struct cfs_rq: cache-line 6
- 0.41% struct cfs_rq: cache-line 0
0.39% struct cfs_rq +0x14 (h_nr_running)
0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
...
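The grouping shown above amounts to an integer division of the member
offset by the cacheline size. A minimal Python sketch, with the
hypothetical helper name cacheline_of and the fixed size of 64 mentioned
in the text:

```python
CACHELINE_SIZE = 64  # fixed for now, per the commit message


def cacheline_of(offset, size=CACHELINE_SIZE):
    """Return the index of the cacheline that contains `offset`."""
    return offset // size
```

Both +0x14 (h_nr_running) and +0x38 (tasks_timeline.rb_leftmost) land in
cache-line 0, matching the output above.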
Committer testing:
# root@number:~# perf report -s type,typecln,typeoff -H --stdio
# Total Lost Samples: 0
#
# Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P'
# Event count (approx.): 312251
#
# Overhead Data Type / Data Type Cacheline / Data Type Offset
# .............. ..................................................
#
<SNIP>
0.07% struct sigaction
0.05% struct sigaction: cache-line 1
0.02% struct sigaction +0x58 (sa_mask)
0.02% struct sigaction +0x78 (sa_mask)
0.03% struct sigaction: cache-line 0
0.02% struct sigaction +0x38 (sa_mask)
0.01% struct sigaction +0x8 (sa_mask)
<SNIP>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'd be better to have the offset and size in hex to check cacheline
alignment.
Percent offset size field
100.00 0 0x1c0 struct cfs_rq {
0.00 0 0x10 struct load_weight load {
0.00 0 0x8 long unsigned int weight;
0.00 0x8 0x4 u32 inv_weight;
};
0.00 0x10 0x4 unsigned int nr_running;
14.56 0x14 0x4 unsigned int h_nr_running;
0.00 0x18 0x4 unsigned int idle_nr_running;
0.00 0x1c 0x4 unsigned int idle_h_nr_running;
...
Committer notes:
Justification from Namhyung when asked about why it would be "better":
Cache line sizes are power of 2 so it'd be natural to use hex and
check whether an offset is in the same boundary. Also 'perf annotate'
shows instruction offsets in hex.
>
> Maybe this should be selectable?
I can add an option and/or a config if you want.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The NULL check for map is already done in bpf_map__fd(map), which
returns an errno without running further checks.
In addition, even if the check for map were run, it would return a
pointer, which is not consistent with the error number returned by
bpf_map__fd(map).
Signed-off-by: Yang Ruibin <11162571@vivo.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: opensource.kernel@vivo.com
Link: https://lore.kernel.org/r/20240821101500.4568-1-11162571@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_matching_type(), it checks the type state of the register in
the wrong order. When it's a percpu pointer, it should check the type for
the pointer, but it checked the CFA bit first and concluded there was no
type in the stack slot. This resulted in no type info.
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
...
add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
bb: [7a - 7e]
bb: [80 - 86] (here)
bb: [88 - 88] vvv
chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information
no type information
Here, the instruction at 0x72 found that reg1 has a (percpu) pointer and
got the correct type. But when it checks the final result, it wrongly
thought it was a stack variable because it checks the cfa bit first.
After changing the order of state check:
-----------------------------------------------------------
find data type for 0x28(reg1) at hrtimer_reprogram+0x88
CU for kernel/time/hrtimer.c (die:0x18f219f)
frame base: cfa=1 fbreg=7
... (here)
vvvvvvvvvv
chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good!
found by insn track: 0x28(reg1) type-offset=0x28
final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
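The effect of the ordering can be shown with a toy Python classifier.
This is purely illustrative and does not mirror check_matching_type();
the function name classify and the 'kind' and 'is_cfa' fields are
hypothetical.

```python
def classify(state):
    """Return how a register state should be interpreted.

    `state` is a dict with a `kind` string and a boolean `is_cfa` flag;
    both field names are hypothetical.
    """
    # Check the tracked pointer kind first ...
    if state.get("kind") == "percpu_ptr":
        return "percpu ptr"
    # ... and only then fall back to the CFA (stack) interpretation.
    if state.get("is_cfa"):
        return "cfa"
    return "unknown"
```

With the two tests swapped, a state carrying both a percpu pointer and
the CFA flag would be treated as a stack access, which is the failure
described above.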
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes a compound type can have a single field and the size is the
same as the base type. But it's still preferred as struct or union
could carry more information than the base type.
Also put a slight priority on the typedef for the same reason.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I found it failed to copy the immediate constant when it moves the
register value. This could result in a wrong type inference since the
address of the per-cpu variable would always be 0.
Fixes: eb9190afae ("perf annotate-data: Handle ADD instructions")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove dependence on libcap. libcap is only used to query whether a
capability is supported, which is just 1 capget system call.
If the capget system call fails, fall back on root permission
checking. Previously, if libcap failed, the permission was assumed
not present, which may be pessimistic/wrong.
Add a used_root out argument to perf_cap__capable to say whether the
fall back root check was used. This allows the correct error message,
"root" vs "users with the CAP_PERFMON or CAP_SYS_ADMIN capability", to
be selected.
Tidy uses of perf_cap__capable so that tests aren't repeated if capget
isn't supported.
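The fallback logic can be sketched in Python as follows. This is not the
perf_cap__capable() implementation; the function capable and its capget
parameter (a callable that returns the set of effective capability names
or raises OSError) are hypothetical stand-ins for the real system call.

```python
import os


def capable(cap, capget=None, geteuid=None):
    """Return (allowed, used_root).

    `capget` is a hypothetical callable returning the set of effective
    capabilities, or raising OSError if the query fails. `used_root`
    reports whether the root fallback decided the result.
    """
    if capget is not None:
        try:
            return cap in capget(), False
        except OSError:
            pass  # capability query failed, fall through to root check
    if geteuid is None:
        geteuid = os.geteuid
    return geteuid() == 0, True
```

A caller could use the second value to choose between the "root" and the
"users with the CAP_PERFMON or CAP_SYS_ADMIN capability" wording.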
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240806220614.831914-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous attempt fixed the build on debian:experimental-x-mipsel,
but when building on a larger set of containers I noticed it broke the
build on some other 32-bit architectures such as:
42 7.87 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
builtin-daemon.c: In function 'cmd_session_list':
builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=]
fprintf(out, "%c%" PRIu64,
^~~~~
builtin-daemon.c:694:13:
csv_sep, (curr - daemon->start) / 60);
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from builtin-daemon.c:3:0:
/usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here
# define PRIu64 __PRI64_PREFIX "u"
So let's cast that time_t (32-bit/64-bit) to uint64_t to make sure it
builds everywhere.
Fixes: 4bbe600293 ("perf daemon: Fix the build on 32-bit architectures")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZsPmldtJ0D9Cua9_@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add it to the record.sh shell test to verify if it tracks cgroup
information correctly. It records with the --all-cgroups option and
checks that the output has PERF_RECORD_CGROUP and that the names are
not "unknown".
$ sudo ./perf test -vv 95
95: perf record tests:
--- start ---
test child forked, pid 2871922
169c90-169cd0 g test_loop
perf does have symbol 'test_loop'
Basic --per-thread mode test
Basic --per-thread mode test [Success]
Register capture test
Register capture test [Success]
Basic --system-wide mode test
Basic --system-wide mode test [Success]
Basic target workload test
Basic target workload test [Success]
Branch counter test
branch counter feature not supported on all core PMUs (/sys/bus/event_source/devices/cpu) [Skipped]
Cgroup sampling test
Cgroup sampling test [Success]
---- end(0) ----
95: perf record tests : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240818212948.2873156-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The recent change in 'struct perf_tool' constification broke the cgroup
and/or namespace tracking by resetting tool fields. It should set the
values after perf_tool__init().
Fixes: cecb1cf154 ("perf record: Use perf_tool__init()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240818212948.2873156-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The handling of mmap and mmap2 events is near identical. Add a common
helper function and call that by the two event handling functions.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are repipe, build ID and JIT dump variants of the mmap and mmap2
repipe functions. The organization doesn't allow JIT dump to work with
build ID injection and the structure is less than clear. Combine the
functions and enable the different behaviors based on ifs.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is clearer to have a single enum that determines how build IDs are
injected; it also allows for future extension.
Set the header build ID feature whether lazy or all are generated,
previously only the lazy case would set it.
Allow parsing of known build IDs for either the lazy or all cases.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test recording of call-graphs and injecting --build-all. Add/expand
trap handler.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allows evsel__id_hdr_size() to be used when the evsel is const.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The passed dso_id is copied and so is never an out argument. Remove
its mutability.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make it clearer the argument is just being used as a string.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
map__init() is only used internally so make it static. Assume memory is
zero initialized, which will better support adding fields to struct
map in the future and was already the case for map__new2.
To reduce complexity, change set_priv and set_erange_warned to not take
a value to assign as they always assign true.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make sure the memset of a synthesized event only zeros the necessary
tracing data part of the event, as a full event can be over 4kb in
size.
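A minimal sketch of the idea, not the actual perf code: the union and
struct names below (demo_event, demo_tracing_data, demo_mmap) are
hypothetical stand-ins for the real event layout. Only the small member
that is about to be filled in gets zeroed, rather than the whole union:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layouts, loosely modelled on a header plus payload. */
struct demo_header {
	unsigned int type;
	unsigned short misc;
	unsigned short size;
};

struct demo_tracing_data {
	struct demo_header header;
	unsigned int size;
};

struct demo_mmap {
	struct demo_header header;
	char filename[4096];	/* makes the full union larger than 4kb */
};

union demo_event {
	struct demo_header header;
	struct demo_tracing_data tracing_data;
	struct demo_mmap mmap;
};

/* Zero only the tracing_data member; returns the number of bytes cleared. */
size_t demo_clear_tracing_data(union demo_event *ev)
{
	assert(ev != NULL);

	memset(ev, 0, sizeof(ev->tracing_data));
	return sizeof(ev->tracing_data);
}
```

With this layout the memset touches a dozen bytes or so instead of more
than 4kb for sizeof(union demo_event).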
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 32-bit arm build system will complain:
tools/perf/util/python.c:75:28: error: field ‘sample’ has incomplete type
75 | struct perf_sample sample;
However, the arm64 build does not complain about this.
The root cause is that arm64 defines "HAVE_KVM_STAT_SUPPORT := 1" in
tools/perf/arch/arm64/Makefile, but the arm arch does not. On arm64 this
makes kvm-stat.h include other header files, notably "util/sample.h",
on behalf of util/python.c.
Directly include "util/sample.h" in "util/python.c" to avoid this build
issue on arm platforms.
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: imx@lists.linux.dev
Link: https://lore.kernel.org/r/20240819023403.201324-1-xu.yang_2@nxp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After trying all possibilities with DWARF and instruction tracking.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes it matches a variable in the inner scope but it fails because
the actual access can be on a different type. Let's try variables in
every scope and choose the best one using is_better_type().
I have an example with update_blocked_averages(). At first it found a
variable (__mptr) but it's a void pointer. So it moved on to the upper
scope and found another variable (cfs_rq).
$ perf --debug type-profile annotate --data-type --stdio
...
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
found "cfs_rq" (die: 0x1301721) in scope=3/4 (die: 0x130171c) type_offset=0x140
variable location: reg14
type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
final type: type='struct cfs_rq' size=0x1c0 (die:0x12e37e5)
IIUC the scope is like below:
1: update_blocked_averages
2: __update_blocked_fair
3: for_each_leaf_cfs_rq_safe
4: list_entry -> (container_of)
The container_of is implemented like:
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
static_assert(__same_type(*(ptr), ((type *)0)->member) || \
__same_type(*(ptr), void), \
"pointer type mismatch in container_of()"); \
((type *)(__mptr - offsetof(type, member))); })
That's why we see the __mptr variable first but it failed since it has
no type information.
Then for_each_leaf_cfs_rq_safe() is defined as
#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \
leaf_cfs_rq_list)
Note that the access was 0x140(r14), and cfs_rq has
leaf_cfs_rq_list at offset 0x140. So it converts the list_head pointer to
a pointer to struct cfs_rq here.
$ pahole --hex -C cfs_rq vmlinux | grep 140
struct cfs_rq struct list_head leaf_cfs_rq_list; /* 0x140 0x10 */
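The pointer conversion described above can be sketched in user space.
This is an illustration only: demo_cfs_rq is a hypothetical stand-in
whose padding is chosen so that the list member lands at offset 0x140,
and demo_container_of() is a simplified, portable variant of the macro
without the type check:

```c
#include <assert.h>
#include <stddef.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Stand-in for struct cfs_rq; the padding only fixes the member offset. */
struct demo_cfs_rq {
	char other_fields[0x140];
	struct demo_list_head leaf_cfs_rq_list;
};

/* Simplified container_of(): subtract the member offset, no type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Convert a pointer to the embedded list node back to its container. */
struct demo_cfs_rq *demo_entry_to_cfs_rq(struct demo_list_head *entry)
{
	assert(entry != NULL);

	return demo_container_of(entry, struct demo_cfs_rq, leaf_cfs_rq_list);
}
```

An access at 0x140 from the container pointer and an access at offset 0
from the list node pointer refer to the same memory, which is why the
outer-scope variable is the more useful match.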
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes more than one variable is located in the same register or
stack slot, or a later variable can overwrite the existing information.
I found this is not helpful in some cases, so it needs to update the type
information from the variable only if it's better.
But it's hard to know which one is better, so we need heuristics. :)
As it deals with memory accesses, the location should have a pointer or
something similar (like array or reference). So if it had an integer
type and a variable is a pointer, we can take the variable's type to
resolve the target of the access.
If it has a pointer type and a variable with the same location has a
different pointer type, it'll take the one with the bigger target type. This
can be useful when the target type embeds a smaller type (like a list
header or RB-tree node) at the beginning so their locations are the same.
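A rough sketch of the two heuristics, under simplifying assumptions:
types are reduced to a kind plus the size of the pointed-to type, and
the names (demo_type, demo_is_better_type) are hypothetical. Treating a
non-pointer candidate as never better is an assumption of this sketch,
not something stated above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_kind { DEMO_INT, DEMO_POINTER };

struct demo_type {
	enum demo_kind kind;
	size_t target_size;	/* size of the pointed-to type, 0 if not a pointer */
};

/* Return true if 'candidate' should replace 'current' for this location. */
bool demo_is_better_type(const struct demo_type *current,
			 const struct demo_type *candidate)
{
	assert(current != NULL && candidate != NULL);

	/* Assumption: a non-pointer cannot describe the target of an access. */
	if (candidate->kind != DEMO_POINTER)
		return false;
	/* A pointer beats an integer. */
	if (current->kind != DEMO_POINTER)
		return true;
	/* Both are pointers: prefer the bigger target type. */
	return candidate->target_size > current->target_size;
}
```

For example, a pointer to a 0x1c0-byte struct would win over a pointer
to a 0x10-byte list node embedded at its beginning.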
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It treats pointers and arrays in the same way. Let's add the helper and
use it when it checks if it needs a pointer.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can return enum variable_match_type to be propagated to the
find_data_type_die(). Also update the debug message to show the result
of the check_matching_type().
chk [dd] reg0 offset=0 ok=1 kind=1 : Good!
or
chk [177] reg4 offset=0x138 ok=0 kind=0 cfa : no type information
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can show a proper debug message in the right place. The
check_variable() is used in other places which don't want to print the
message.
$ perf --debug type-profile annotate --data-type
Before:
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
no pointer or no type <<<--- removed
check variable "__mptr" failed (die: 0x13022f1)
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
After:
-----------------------------------------------------------
find data type for 0x140(reg14) at update_blocked_averages+0x2db
CU for kernel/sched/fair.c (die:0x12dd892)
frame base: cfa=1 fbreg=7
found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer <<<--- here
variable location: base=reg14, offset=0x140
type='void*' size=0x8 (die:0x12dd8f9)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
And let check_variable() return the enum value so that callers can know
what the problem was. This will be used by a later patch to update
the statistics correctly and print the error message in the right place.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The location list will have entries with half-open addressing like
[start, end) which means it doesn't include the end address. So it
should skip entries at the end address and match to the next entry.
An example location list looks like this (from readelf -wo):
00237876 ffffffff8110d32b (base address)
0023787f v000000000000000 v000000000000002 views at 00237868 for:
ffffffff8110d32b ffffffff8110d4eb (DW_OP_reg3 (rbx)) <<<--- 1
00237885 v000000000000002 v000000000000000 views at 0023786a for:
ffffffff8110d4eb ffffffff8110d50b (DW_OP_reg14 (r14)) <<<--- 2
0023788c v000000000000000 v000000000000001 views at 0023786c for:
ffffffff8110d50b ffffffff8110d7c4 (DW_OP_reg3 (rbx))
00237893 v000000000000000 v000000000000000 views at 0023786e for:
ffffffff8110d806 ffffffff8110d854 (DW_OP_reg3 (rbx))
0023789a v000000000000000 v000000000000000 views at 00237870 for:
ffffffff8110d876 ffffffff8110d88e (DW_OP_reg3 (rbx))
The first entry at 0023787f has [8110d32b, 8110d4eb) (omitting the
ffffffff at the beginning), and the second one has [8110d4eb, 8110d50b).
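The half-open matching can be sketched as below. The struct and function
names are hypothetical and the lookup is a plain linear scan, so this is
only an illustration of the boundary handling, not the dwarf-aux code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_loc_entry {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int reg;	/* register number holding the variable */
};

/* Return the index of the entry covering addr, or -1 if none does. */
int demo_find_loc_entry(const struct demo_loc_entry *list, size_t n,
			uint64_t addr)
{
	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* [start, end): the end address belongs to the next entry */
		if (addr >= list[i].start && addr < list[i].end)
			return (int)i;
	}
	return -1;
}
```

With the first two entries above, address ffffffff8110d4eb is skipped by
entry 1 (rbx) and matches entry 2 (r14).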
Fixes: 2bc3cf575a ("perf annotate-data: Improve debug message with location info")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The call to check_allowed_ops() was missing in __die_collect_vars_cb(), so
it could incorrectly accept variables with complex location expressions.
For example, I found a variable with this expression:
015d8df8 ffffffff81aacfb3 (base address)
015d8e01 v000000000000004 v000000000000000 views at 015d8df2 for:
ffffffff81aacfb3 ffffffff81aacfd2 (DW_OP_fbreg: -176; DW_OP_deref;
DW_OP_plus_uconst: 332; DW_OP_deref_size: 4;
DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64;
DW_OP_minus; DW_OP_stack_value)
015d8e14 v000000000000000 v000000000000000 views at 015d8df4 for:
ffffffff81aacfd2 ffffffff81aacfd7 (DW_OP_reg3 (rbx))
015d8e19 v000000000000000 v000000000000000 views at 015d8df6 for:
ffffffff81aacfd7 ffffffff81aad020 (DW_OP_fbreg: -176; DW_OP_deref;
DW_OP_plus_uconst: 332; DW_OP_deref_size: 4;
DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64;
DW_OP_minus; DW_OP_stack_value)
015d8e2c <End of list>
It looks like '((int *)(-176(%rbp) + 332) >> 1) - 64' but the current
code thought it's just -176(%rbp) and processed the variable incorrectly.
It should reject such a complex expression if check_allowed_ops()
doesn't like it. :)
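To illustrate the kind of check involved, here is a sketch with a
hypothetical allow-list. The opcode values come from the DWARF standard,
but which operations are accepted, the function name
demo_ops_are_simple(), and the flattening of an expression into a bare
opcode array (operands omitted) are all assumptions of this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opcode values as assigned by the DWARF standard. */
enum {
	DW_OP_deref		= 0x06,
	DW_OP_plus_uconst	= 0x23,
	DW_OP_reg3		= 0x53,
	DW_OP_fbreg		= 0x91,
	DW_OP_piece		= 0x93,
	DW_OP_deref_size	= 0x94,
	DW_OP_stack_value	= 0x9f,
};

/*
 * Hypothetical allow-list: the first operation names the base location and
 * is not checked here. Every following operation must be one of a few that
 * this sketch tolerates; anything else marks the expression as too complex.
 */
bool demo_ops_are_simple(const unsigned char *ops, size_t nops)
{
	assert(ops != NULL || nops == 0);

	for (size_t i = 1; i < nops; i++) {
		switch (ops[i]) {
		case DW_OP_deref:
		case DW_OP_deref_size:
		case DW_OP_piece:
		case DW_OP_stack_value:
			break;
		default:
			return false;
		}
	}
	return true;
}
```

Under this allow-list the long expression above is rejected at
DW_OP_plus_uconst, while the single DW_OP_reg3 entry is accepted.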
Fixes: 932dcc2c39 ("perf dwarf-aux: Add die_collect_vars()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240816235840.2754937-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the latest perf-tools merge for 6.11, i.e. to have the
current perf tools branch that is getting into 6.11 with the
perf-tools-next that is geared towards 6.12.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we only print metric headers for the metric leader in
aggregation mode. This means the `perf iostat` header is not shown,
since it is aggregated globally but doesn't have metric events:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000
Performance counter stats for 'system wide':
port
0000:00 0 0 0 0
0000:80 0 0 0 0
[...]
Fix this by excluding the iostat in the check of printing metric
headers. Then we can see the headers:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000
Performance counter stats for 'system wide':
port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB)
0000:00 0 0 0 0
0000:80 0 0 0 0
[...]
Fixes: 193a9e3020 ("perf stat: Don't display metric header for non-leader uncore events")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: linuxarm@huawei.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Link: https://lore.kernel.org/r/20240802065800.48774-1-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf_time__parse_str() fails in perf_sched__timehist(), the
previously created session needs to be freed. Fix it.
Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240806023533.1316348-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.
==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
READ of size 1 at 0x60b00c48844b thread T193
#0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
#1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
#2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
#3 0x5650d804568f in __hists__add_entry util/hist.c:754
#4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
#5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
#6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
#7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
#8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
#9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
#10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
#11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
#12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
#13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().
While this bug was probably introduced with 5c24b67aae ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e9480c ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.
Fixes: c087e9480c ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <yunzhao@cloudflare.com>
Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com>
Cc: Ian Rogers <irogers@google.com>
Cc: kernel-team@cloudflare.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org # v5.13+
Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Subtest for system-wide record with '--threads=cpu' option fails due
to a limit of open file descriptors on systems with 128 or more CPUs
as the default limit is set to 1024.
The number of open file descriptors should be slightly above
nmb_events*nmb_cpus + nmb_cpus(for perf.data.n) + 4*nmb_cpus(for pipes),
which equals 8*nmb_cpus. Therefore, temporarily raise the limit to
16*nmb_cpus for the test.
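The arithmetic can be written out as a small sketch. The helper names
are hypothetical; note that the stated total of 8*nmb_cpus implies
nmb_events is 3, which is an inference from the formula rather than
something spelled out above:

```c
#include <assert.h>
#include <limits.h>

/* Estimate: events on each CPU, one perf.data.n per CPU, 4 per CPU for pipes. */
unsigned long demo_fds_needed(unsigned long nmb_events, unsigned long nmb_cpus)
{
	return nmb_events * nmb_cpus + nmb_cpus + 4 * nmb_cpus;
}

/* Temporary limit used for the test: twice the 8*nmb_cpus estimate. */
unsigned long demo_raised_limit(unsigned long nmb_cpus)
{
	assert(nmb_cpus <= ULONG_MAX / 16);	/* guard against overflow */

	return 16 * nmb_cpus;
}
```

With 3 events and 128 CPUs the estimate is already 1024, the default
limit, before counting any other descriptors the process holds; the
raised limit for that machine is 2048.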
Committer notes:
Instead of disabling the ShellCheck warnings for all the uses of 'ulimit -n',
i.e. those:
In tests/shell/record.sh line 35:
default_fd_limit=$(ulimit -Sn)
^-^ SC3045 (warning): In POSIX sh, ulimit -S is undefined.
We can just switch from using '/bin/sh' to '/bin/bash' for this test, as
bash _has_ 'ulimit -n', so ShellCheck will not emit that warning.
There are dozens of 'perf test' shell tests that do just that,
'/bin/bash' is a reasonable expectation for those tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/linux-perf-users/20240429085721.10122-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enhance the test case for the branch counter feature.
Now, the test verifies:
- The new filter can be successfully applied on the supported platforms.
- The counter value can be outputted via the perf report -D
- The counter value and the abbr name can be outputted via the
perf script (New)
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-10-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There could be several branch counter events. If the perf tool outputs the
result in the format "event name + a number", the line could be very
long and hard to read.
An abbreviation is introduced to replace the full event name in the
display. The abbreviation starts from 'A' to 'Z9', which can support
up to 286 events. The same abbreviation will be assigned if the same
events are found in the evlist. The next patch will utilize the
abbreviation name to show the branch counter events in the output.
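The count of 286 follows from 26 single letters plus 26 * 10
letter-digit pairs. A sketch of one possible numbering is shown below;
the function name and the assumed order (A..Z, then A0..Z0, A1..Z1 and
so on up to Z9) are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ABBR_MAX	286	/* 26 letters + 26 * 10 letter-digit pairs */

/*
 * Write the abbreviation for the 0-based event number idx into name, which
 * must hold at least 3 bytes. Returns 0 on success, -1 if idx is out of range.
 */
int demo_abbr_name(unsigned int idx, char *name)
{
	unsigned int round = idx / 26;

	assert(name != NULL);

	if (idx >= DEMO_ABBR_MAX)
		return -1;

	name[0] = (char)('A' + idx % 26);
	if (round == 0) {
		name[1] = '\0';
	} else {
		/* rounds 1..10 map to the digits '0'..'9' */
		name[1] = (char)('0' + round - 1);
		name[2] = '\0';
	}
	return 0;
}
```

Index 0 gives "A", index 26 gives "A0", and the last valid index, 285,
gives "Z9".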
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-6-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When annotating a basic block, it's useful to display the occurrences
of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters
are automatically appended after all the cycle-related annotation.
Account the branch counters as well when accounting the cycles in
hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the
accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64
space, because the saturation of a branch counter is small; e.g., for
Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate whether a branch
counter has ever saturated. That can be used to indicate a potential
event loss because of the saturation.
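A sketch of how packed counters could be accumulated and flagged. The
2-bit field width is an assumption derived from the saturation value of
3, and the names (demo_account_br_cntr, DEMO_BR_CNTR_WIDTH) are
hypothetical rather than taken from the perf sources:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BR_CNTR_WIDTH	2	/* assumed bits per counter */
#define DEMO_BR_CNTR_MAX	((1u << DEMO_BR_CNTR_WIDTH) - 1)	/* 3 */

/*
 * Add the nr counters packed into 'packed' to the per-counter totals and
 * set saturated[i] whenever counter i reads as the saturation value.
 */
void demo_account_br_cntr(uint64_t packed, unsigned int nr,
			  uint64_t *totals, bool *saturated)
{
	assert(nr * DEMO_BR_CNTR_WIDTH <= 64);

	for (unsigned int i = 0; i < nr; i++) {
		unsigned int shift = i * DEMO_BR_CNTR_WIDTH;
		unsigned int val = (unsigned int)((packed >> shift) & DEMO_BR_CNTR_MAX);

		totals[i] += val;
		if (val == DEMO_BR_CNTR_MAX)
			saturated[i] = true;	/* the real count may be higher */
	}
}
```

Once a counter has been flagged, its accumulated total should be read
as a lower bound.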
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-5-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. The kernel
only dumps the number of occurrences into a record. The perf tool has
to map the number to the corresponding event.
Add evlist__update_br_cntr() to go through the evlist to pick the
events that are configured to be logged. Assign a logical idx to track
them, and add the total number of the events in the leader event.
The total number will be used to allocate the space to save the branch
counters for a block. The logical idx will be used to locate the
corresponding event quickly in the following patches.
It only needs to iterate the evlist once. The
evsel__has_branch_counters() is also optimized.
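The single pass described above could look roughly like the sketch below.
It uses a plain array instead of the evlist and assumes the leader is the
first element; 'struct sketch_evsel' and sketch_update_br_cntr() are
hypothetical names, not the perf data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an event selector. */
struct sketch_evsel {
	bool logged;       /* configured for branch counter logging */
	int  br_cntr_idx;  /* logical index, -1 when not logged */
	int  br_cntr_nr;   /* total logged events, stored in the leader */
};

/*
 * Walk the events once: give every logged event the next logical index
 * and record the total in the leader (assumed here to be events[0]).
 */
static void sketch_update_br_cntr(struct sketch_evsel *events, size_t nr)
{
	int idx = 0;

	assert(events != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		events[i].br_cntr_idx = events[i].logged ? idx++ : -1;

	if (nr > 0)
		events[0].br_cntr_nr = idx;
}
```

With the total cached in the leader, a "has branch counters" style query
can be answered from one field instead of another walk over the list.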
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A false overflow warning is triggered if a sample doesn't have any LBRs
recorded and the branch counters feature is enabled.
The current code does OVERFLOW_CHECK_u64() at the very beginning when
reading the information of branch counters. It assumes that there is at
least one LBR in the PEBS record. But it is valid for zero LBRs to be
recorded, especially under a high context switch rate.
Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be
good enough to check the overflow when reading the information of the
branch counters.
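To illustrate why a size-based check is sufficient, the sketch below
validates an array of 'nr' counter words against the space left in a
record. It does not reproduce the OVERFLOW_CHECK macros; the function
name and the word-based layout are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout: 'nr' u64 counter words start 'offset' words into a
 * record of 'total' words. Returns false when they would overrun it.
 */
static bool sketch_counters_fit(size_t total, size_t offset, uint64_t nr)
{
	assert(offset <= total);

	/*
	 * Requiring "at least one word remains" up front would wrongly
	 * reject nr == 0 when the array sits at the very end of the
	 * record. Checking the real size accepts an empty array and
	 * still catches genuine overruns.
	 */
	return nr <= total - offset;
}
```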
Fixes: 9fbb4b0230 ("perf tools: Add branch counter knob")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Help to better identify the location of test failures by dumping the
failing test in the trap handler.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240813040613.882075-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
lock__parse() calls disasm_line__parse() passing
&ops->locked.ins.name that will use strdup() to populate it.
Ensure ops->locked.ins.name is freed in lock__delete().
Found with address/leak sanitizer.
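The ownership rule behind this fix can be sketched as a parse/delete
pair. 'struct sketch_ops' and both functions are simplified, hypothetical
stand-ins and not the annotate code itself.

```c
/* strdup() is POSIX; request it explicitly for strict C modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the nested operands. */
struct sketch_ops {
	char *locked_name;  /* owned copy made by the parse step */
};

/* Parse step: duplicates the name, so the struct now owns heap memory. */
static int sketch_lock_parse(struct sketch_ops *ops, const char *name)
{
	assert(ops != NULL && name != NULL);
	ops->locked_name = strdup(name);
	return ops->locked_name ? 0 : -1;
}

/* Delete step: must release everything the parse step allocated. */
static void sketch_lock_delete(struct sketch_ops *ops)
{
	assert(ops != NULL);
	free(ops->locked_name);
	ops->locked_name = NULL;  /* guards against a later double free */
}
```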
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240813040613.882075-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The DSO build id is injected when the DSO is first encountered, but the
first-encounter check only looks at sample->ip, not the entire
callchain.
Use the callchain logic to ensure all build ids are inserted.
Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20240812224119.744968-1-irogers@google.com
[ Split from a larger patch that introduced the API and use it ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a for_each callback style API to callchain with
sample__for_each_callchain_node().
Possibly in the future such an API can avoid the overhead of
constructing the call chain list.
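The general shape of a for_each callback API is sketched below over a
plain array of addresses. The callback type and both function names are
hypothetical; the signature of sample__for_each_callchain_node() is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: return non-zero to stop the walk early. */
typedef int (*sketch_node_cb)(uint64_t ip, void *data);

/* Call 'cb' for every entry; stop at the first non-zero return. */
static int sketch_for_each_node(const uint64_t *chain, size_t nr,
				sketch_node_cb cb, void *data)
{
	int ret = 0;

	assert(cb != NULL);
	assert(chain != NULL || nr == 0);

	for (size_t i = 0; ret == 0 && i < nr; i++)
		ret = cb(chain[i], data);
	return ret;
}

/* Example callback: count the visited nodes. */
static int sketch_count_cb(uint64_t ip, void *data)
{
	(void)ip;
	(*(size_t *)data)++;
	return 0;
}
```

Because the caller only sees one node at a time, the iterator is free to
produce nodes lazily instead of building a list first.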
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20240812224119.744968-1-irogers@google.com
[ Split from a larger patch that introduced the API and use it ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Intel TPEBS sampling mode is supported through perf record. The counting mode
code uses perf record to capture the retire_latency value and uses it in the
metric calculation. This test checks the counting mode code on Intel platforms.
Committer testing:
root@x1:~# perf test tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# set -o vi
root@x1:~# perf test tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# perf test -v tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# perf test -vvv tpebs
123: test Intel TPEBS counting mode:
--- start ---
test child forked, pid 16603
Testing without --record-tpebs
Testing with --record-tpebs
---- end(0) ----
123: test Intel TPEBS counting mode : Ok
root@x1:~#
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-9-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
TPEBS (Timed PEBS (Precise Event-Based Sampling)) is a new Intel PMU
feature, available from the Granite Rapids microarchitecture.
It will be used in new TMA (Top-Down Microarchitecture Analysis)
releases.
Add related introduction to documents while adding new code to support
it in 'perf stat'.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add MTL metric JSON file for TMA4.8. Some of the metrics' formulas use TPEBS
retire_latency in MTL.
This also includes the latest E-Core TMA3.6 changes.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-6-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a retire_latency value is used in a metric formula, evsel forks
a 'perf record' process with the "-e" and "-W" options. 'perf record'
collects the required retire_latency values in parallel while 'perf stat'
is collecting counting values.
When 'perf stat' stops counting, evsel stops 'perf record' by sending a
SIGTERM signal to the 'perf record' process. The sampled data is then
processed to get the retire latency value. Another thread is required
to synchronize between 'perf stat' and 'perf record' when data is
passed through a pipe.
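The process handling described above follows the usual fork, pipe and
SIGTERM pattern, sketched below. The child here is only a placeholder
that writes one byte and waits to be signalled; a real caller would exec
the recorder instead. Both function names are hypothetical and the
synchronizing thread is left out.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a helper that writes one byte to a pipe, then sleeps until it
 * is signalled. Returns the child's pid and stores the read end of the
 * pipe in *rfd, or returns -1 on failure.
 */
static pid_t sketch_start_helper(int *rfd)
{
	int fds[2];
	pid_t pid;

	assert(rfd != NULL);
	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		char byte = 'x';

		close(fds[0]);
		if (write(fds[1], &byte, 1) != 1)
			_exit(1);
		for (;;)
			pause();  /* default SIGTERM action ends the child */
	}
	close(fds[1]);
	*rfd = fds[0];
	return pid;
}

/* Send SIGTERM, reap the child, return the terminating signal or -1. */
static int sketch_stop_helper(pid_t pid)
{
	int status;

	if (kill(pid, SIGTERM) < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Reaping the child with waitpid() after the signal avoids leaving a
zombie behind once counting has finished.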
The retire_latency evsel is not opened for 'perf stat' so that no
counter is wasted on it. This commit includes code suggested by Namhyung
to adjust the read size for groups that include retire_latency evsels.
In the current :R parsing implementation, the parser recognizes events
with the retire_latency modifier and inserts them into the evlist like a
normal event. Ideally, we need to avoid counting these events.
In this commit, when a retire_latency evsel is read, the retire latency
value processed from the sampled data is set as its count value.
This sampled retire latency value is used for the metric calculation
and the final event count printout. No special metric calculation or
event printout code is required for retire_latency events.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-4-weilin.wang@intel.com
[ Squashed the 3rd and 4th commit in the series to keep it building patch by patch ]
[ Constified the 'struct perf_tool' pointer in process_sample_event() ]
[ Use perf_tool__init(&tool, false) to address a segfault I reported and Ian/Weilin diagnosed ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When in PIPE mode, allow using an fd that was dynamically opened and
assigned to data->file.fd instead of STDIN_FILENO or STDOUT_FILENO.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-3-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Retirement latency is a separate sampled count used on newer Intel
CPUs.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-2-weilin.wang@intel.com
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make tool const now that all uses are const and
perf_tool__fill_defaults() won't be used. The aim is to better capture
that sessions don't mutate tools.
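The link between eager initialization and constness can be sketched as
follows. 'struct sketch_tool' and the functions around it are simplified,
hypothetical stand-ins for 'struct perf_tool' and its users, with a
single callback instead of the full set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_tool;
typedef int (*sketch_sample_fn)(const struct sketch_tool *tool, int value);

/* Hypothetical, simplified stand-in for 'struct perf_tool'. */
struct sketch_tool {
	sketch_sample_fn sample;
	bool ordered_events;
};

/* Default callback: ignore the event. */
static int sketch_sample_stub(const struct sketch_tool *tool, int value)
{
	(void)tool;
	(void)value;
	return 0;
}

/* Eager init: every callback is valid once this returns. */
static void sketch_tool_init(struct sketch_tool *tool, bool ordered_events)
{
	assert(tool != NULL);
	tool->sample = sketch_sample_stub;
	tool->ordered_events = ordered_events;
}

/*
 * Nothing is filled in lazily, so the consumer never writes to the
 * tool and can take it as a pointer to const.
 */
static int sketch_session_deliver(const struct sketch_tool *tool, int value)
{
	return tool->sample(tool, value);
}

/* Example override installed by a caller after sketch_tool_init(). */
static int sketch_sample_double(const struct sketch_tool *tool, int value)
{
	(void)tool;
	return value * 2;
}
```

A caller initializes the tool first and then overrides only the
callbacks it cares about, which keeps all writes on the caller's side.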
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-28-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that all tools are fully initialized prior to use,
perf_tool__fill_defaults() has no use, so remove it.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-27-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-26-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ensure the tool is initialized, avoiding the lazy initialization
pattern, so that more uses of struct perf_tool can be made const.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-25-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-24-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-23-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-22-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-21-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-20-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-19-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-18-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-16-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and do not rely on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reduce scope of build_id__mark_dso_hit_ops() to the scope of function
perf_session__list_build_ids, its only use, and use perf_tool__init()
for the default values. Move perf_event__exit_del_thread() to event.[ch]
so it can be used in builtin-buildid-list.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reduce the scope of the tool from global/static to just that of the
cmd_kmem function where the session is scoped. Use perf_tool__init()
to initialize default values.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an init function that behaves like perf_tool__fill_defaults() but
assumes that no values have been initialized.
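As an illustration only, the difference between the two styles can be
sketched with a hypothetical 'struct demo_tool' (all demo_* names are
invented for this sketch and are not the perf API). The fill-defaults
style patches only callbacks that are still NULL, while the init style
assumes nothing is set and writes every callback, so the caller can
override what it needs and then hand out a const pointer:

```c
#include <stddef.h>

/* Hypothetical stand-in for 'struct perf_tool': a struct of callbacks. */
struct demo_tool {
	int (*sample)(const struct demo_tool *tool, int value);
	int (*comm)(const struct demo_tool *tool, int value);
};

/* Default stub used when a tool does not handle an event. */
static int demo_stub(const struct demo_tool *tool, int value)
{
	(void)tool;
	(void)value;
	return 0;
}

/* fill_defaults style: only patch callbacks that are still NULL. */
static void demo_tool__fill_defaults(struct demo_tool *tool)
{
	if (tool->sample == NULL)
		tool->sample = demo_stub;
	if (tool->comm == NULL)
		tool->comm = demo_stub;
}

/* init style: assume nothing is initialized and set every callback. */
static void demo_tool__init(struct demo_tool *tool)
{
	tool->sample = demo_stub;
	tool->comm = demo_stub;
}

/* A handler the caller installs after init. */
static int demo_count_sample(const struct demo_tool *tool, int value)
{
	(void)tool;
	return value + 1;
}

/* Init, override one callback, then use the tool through a const pointer. */
static int demo_run(int value)
{
	struct demo_tool tool;
	const struct demo_tool *ro;

	demo_tool__init(&tool);
	tool.sample = demo_count_sample;
	ro = &tool;
	return ro->sample(ro, value) + ro->comm(ro, value);
}
```

With the init style, nothing needs to write to the struct once the
caller has finished setting it up, which is what lets later users take
it as const.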
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The aim here is to eventually make perf_tool__fill_defaults() an init
function so that the tools struct is more const.
Create a tool.c to go along with tool.h. Move perf_tool__fill_defaults()
out of session.c into tool.c along with the default stub values. Add
perf_tool__compressed_is_stub() for a test in
perf_session__process_user_event().
perf_session__process_compressed_event() is only used as a default
initialized handler, so migrate it into tool.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to lower the possibilities of what could
happen with a tool.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
struct s390_cpumsf_synth was likely cargo culted from other auxtrace
examples. It has no users, so remove.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add perf_session__deliver_synth_attr_event that synthesizes a
perf_record_header_attr event with one id. Remove use of
perf_event__synthesize_attr that necessitates the use of the dummy
tool in order to pass the session.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like 'perf report', use 'e' or 'E' key to toggle folding the current
entry so that it can control displaying child entries.
Note I didn't add the 'c' and 'C' key to collapse the entry because it's
also handled with the 'e'/'E' since it toggles the state.
Committer testing:
Do some 'perf mem record' for some workload of the whole system, using
the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide
profiling, etc) and then:
# perf annotate --skip-empty --data-type=pthread_mutex_t
That, by default, will start as --tui, then press 'E' to see the whole
struct unfolded, etc.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812194447.2049187-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like in the hists browser, it should support folding the current entry
so that it can hide unwanted details in some data structures.
The folded entries will be displayed with the '+' sign, while unfolded
entries will have the '-' sign.
Entries that have no children will not show any signs.
Annotate type: 'struct socket' (1 samples)
Percent Offset Size Field
- 100.00 0 128 struct socket { ◆
0.00 0 4 socket_state state; ▒
0.00 4 2 short int type; ▒
0.00 8 8 long unsigned int flags; ▒
0.00 16 8 struct file* file; ▒
100.00 24 8 struct sock* sk; ▒
0.00 32 8 struct proto_ops* ops; ▒
- 0.00 64 64 struct socket_wq wq { ▒
- 0.00 64 24 wait_queue_head_t wait { ▒
+ 0.00 64 4 spinlock_t lock; ▒
- 0.00 72 16 struct list_head head { ▒
0.00 72 8 struct list_head* next; ▒
0.00 80 8 struct list_head* prev; ▒
}; ▒
}; ▒
0.00 88 8 struct fasync_struct* fasync_list; ▒
0.00 96 8 long unsigned int flags; ▒
+ 0.00 104 16 struct callback_head rcu; ▒
}; ▒
}; ▒
This just adds the display logic for folding; the actual folding action
will be implemented in the next patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812194447.2049187-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cache home agent (CHA) events were setting the low rather than high
config1 bits. SNR was using CLX CHA events, however its CHA is similar
to ICX so remove the events.
Incorporate the updates in:
https://github.com/intel/perfmon/pull/215
https://github.com/intel/perfmon/pull/216
Fixes: 4cc4994244 ("perf vendor events: Update cascadelakex events/metrics")
Closes: https://lore.kernel.org/linux-perf-users/CAPhsuW4nem9XZP+b=sJJ7kqXG-cafz0djZf51HsgjCiwkGBA+A@mail.gmail.com/
Reported-by: Song Liu <song@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Co-authored-by: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240811042004.421869-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The bpf_get_stackid() helper returns a signed type so that callers can
check whether it failed to get a stacktrace. But the code saved the
result in a u32 and then checked whether the value was negative, which
can never be true.
376 if (needs_callstack) {
377 pelem->stack_id = bpf_get_stackid(ctx, &stacks,
378 BPF_F_FAST_STACK_CMP | stack_skip);
--> 379 if (pelem->stack_id < 0)
./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin()
warn: unsigned 'pelem->stack_id' is never less than zero.
Let's change the type to s32 instead.
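A minimal user-space sketch of why the type matters, assuming a
hypothetical demo_get_stackid() that returns a negative value on failure
(it is not the BPF helper, and the values are made up):

```c
#include <stdint.h>

/* Hypothetical helper: negative error code on failure, an id otherwise. */
static long demo_get_stackid(int fail)
{
	return fail ? -14 : 7;
}

/* Buggy pattern: an unsigned value can never compare below zero. */
static int demo_failed_u32(int fail)
{
	uint32_t stack_id = (uint32_t)demo_get_stackid(fail);

	return stack_id < 0;	/* always false */
}

/* Fixed pattern: a signed value keeps the error visible. */
static int demo_failed_s32(int fail)
{
	int32_t stack_id = (int32_t)demo_get_stackid(fail);

	return stack_id < 0;
}
```

Some compilers flag the unsigned comparison as always false when extra
warnings are enabled, which is the same class of diagnostic as the one
quoted above.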
Fixes: 6d499a6b3d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812172533.2015291-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In get_member_overhead(), k is updated when there is an entry in the
histogram. But the entry->hists array is allocated with the number of
evsels in the group, so k should be reset each time it iterates the
events using for_each_group_evsel(); otherwise it would crash due to a
buffer overflow.
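The general pattern can be sketched as follows. This is not the perf
code: the demo_* names and the fixed event count are assumptions made
for illustration. The index into a fixed-size array restarts at zero on
every pass over the events, so it never walks past the end:

```c
#include <stddef.h>

enum { DEMO_NR_EVENTS = 2 };

/*
 * Accumulate per-event counts for several members into 'totals', which has
 * exactly DEMO_NR_EVENTS slots. Returns the largest index written.
 */
static size_t demo_accumulate(const int counts[][DEMO_NR_EVENTS],
			      size_t nr_members, int totals[DEMO_NR_EVENTS])
{
	size_t max_k = 0;

	for (size_t m = 0; m < nr_members; m++) {
		size_t k = 0;	/* reset for every pass over the events */

		for (size_t e = 0; e < DEMO_NR_EVENTS; e++) {
			totals[k] += counts[m][e];
			if (k > max_k)
				max_k = k;
			k++;
		}
	}
	return max_k;
}
```

If k were declared outside the outer loop and never reset, the second
member would already index totals[2] and beyond.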
Fixes: cb1898f58e ("perf annotate-data: Support --skip-empty option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240810191502.1947959-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The current description of the AUX trace buffer size is misleading. When
a user specifies the option '-m,512M', the value is a size in bytes
(512MiB), not a count of 512M pages (512M x 4KiB with a 4KiB page size).
Make the documentation clear that the normal buffer and the AUX tracing
buffer share the same semantics. Sync the documents for consistent
text.
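The arithmetic behind the two readings can be sketched as below,
assuming a 4KiB page size; the helper names are hypothetical:

```c
#include <stdint.h>

/* Number of pages needed to hold 'bytes' bytes, rounded up. */
static uint64_t demo_bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}

/* Size in bytes of 'pages' pages. */
static uint64_t demo_pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	return pages * page_size;
}
```

Read as bytes, 512M is 512MiB, which is 131072 pages of 4KiB. Read as a
page count, 512M pages of 4KiB would be 2TiB, which is clearly not what
the option means.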
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Similarly to other subcommands (like report, top), it would be handy to
provide a path for the addr2line command.
Signed-off-by: Martin Liska <martin.liska@hey.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of explicitly initializing just the .name and .alias_name,
use struct member named initialization of just the non-null .name field;
the compiler will initialize all the other non-explicitly initialized
fields to NULL.
This makes the code more robust, avoiding the error recently fixed when
the .alias_name was used and contained a random value.
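A small sketch of the language rule being relied on, using a
hypothetical 'struct demo_cmd' rather than the struct touched by the
patch: members that are not named in a designated initializer are
zero-initialized.

```c
#include <stddef.h>

/* Hypothetical struct with a name, an optional alias and some flags. */
struct demo_cmd {
	const char *name;
	const char *alias_name;
	int flags;
};

static struct demo_cmd demo_make_cmd(const char *name)
{
	/* Only .name is named; alias_name becomes NULL and flags becomes 0. */
	struct demo_cmd cmd = { .name = name };

	return cmd;
}
```

Fields added to the struct later get the same zero default without any
change to the initializer.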
Reviewed-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/lkml/e26941f9-f86c-4f2e-b812-20c49fb2c0d3@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Noticed with:
1 6.22 debian:experimental-x-mipsel : FAIL gcc version 13.2.0 (Debian 13.2.0-25)
builtin-daemon.c: In function 'cmd_session_list':
builtin-daemon.c:691:35: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=]
Use inttypes.h's PRIu64 to deal with that.
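One portable way to do this is sketched below. The cast to uint64_t and
the helper name are assumptions for illustration, not necessarily what
the patch does:

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/*
 * Format a time_t without depending on its width: convert it to a
 * fixed-width type and use the matching inttypes.h format macro.
 */
static int demo_format_time(char *buf, size_t len, time_t t)
{
	return snprintf(buf, len, "%" PRIu64, (uint64_t)t);
}
```

The format string and the argument type now agree on every target,
whether time_t is 'long' or 'long long'.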
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/ZplvH21aQ8pzmza_@x1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Create a source symlink to the original source in the objdir.
This is similar to what the main kernel build script does.
Committer testing:
⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
<SNIP>
⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source
lrwxrwxrwx. 1 acme acme 41 Aug 9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf
⬢[acme@toolbox perf-tools-next]$
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240807231823.898979-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In that case we have a set of placeholder functions, one of them uses a
'Dwarf_Addr' type that is not present as it is defined in the missing
DWARF libraries, so provide a placeholder typedef for that as well.
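The idea can be sketched as below. DEMO_HAVE_DWARF and
demo_get_text_offset() are hypothetical names for this sketch, and the
choice of 'void' for the placeholder type is an assumption:

```c
/*
 * DEMO_HAVE_DWARF is a hypothetical feature macro used only in this sketch.
 * When it is defined, the real type comes from the DWARF library headers.
 */
#ifdef DEMO_HAVE_DWARF
#include <elfutils/libdw.h>
#else
/* Placeholder type (an assumption): only ever used through a pointer. */
typedef void Dwarf_Addr;

/* Placeholder function: reports failure because DWARF support is absent. */
static inline int demo_get_text_offset(Dwarf_Addr *offs)
{
	(void)offs;
	return -1;
}
#endif
```

Because the placeholder function only takes a pointer to the type, any
typedef that allows forming a pointer is enough to make the prototype
compile.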
The build error before this patch:
In file included from util/annotate.c:28:
util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
44 | Dwarf_Addr *offs __maybe_unused,
| ^~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
make[6]: *** Waiting for unfinished jobs....
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in 'ins_lat', while the cache latency
is stored in 'weight'.
This patch reports the 'ins_lat' field for Python scripting.
Committer testing:
On a Raptor Lake Refresh Intel machine (14th gen):
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:~# perf mem record -a sleep 5
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 85 times to write data ]
[ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
root@number:~# perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~#
Now generate a python script to then dump the dictionary that now needs
to have that 'ins_lat' field:
root@number:~# perf script --gen python
generated Python script: perf-script.py
root@number:~# vim perf-script.py
root@number:~# perf script -s perf-script.py | head -40
in trace_begin
in trace_end
root@number:~# vim perf-script.py
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Running on a:
root@x1:~# grep 'model name' -m1 /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
root@x1:~#
It skips all the tests with:
root@x1:~# perf test -vvvv LBR
97: perf record LBR tests:
--- start ---
test child forked, pid 2033388
Skip: only x86 CPUs support LBR
---- end(-2) ----
97: perf record LBR tests : Skip
root@x1:~#
This is because the test checks for the /sys/devices/cpu/caps/branches
file, which isn't present on this system; instead we have:
root@x1:~# ls -la /sys/devices/cpu*/caps/branches
-r--r--r--. 1 root root 4096 Aug 8 11:22 /sys/devices/cpu_atom/caps/branches
-r--r--r--. 1 root root 4096 Aug 8 11:21 /sys/devices/cpu_core/caps/branches
root@x1:~#
If we also check for one of those,
/sys/devices/cpu_core/caps/branches, the tests are no longer skipped on
these x86 Intel Hybrid systems, and all of them run and pass:
root@x1:~# perf test -vvvv LBR
97: perf record LBR tests:
--- start ---
test child forked, pid 2034956
LBR callgraph
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ]
LBR callgraph [Success]
LBR any branch test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ]
LBR any branch test: 8071 samples
LBR any branch test [Success]
LBR any call test
[ perf record: Woken up 23 times to write data ]
[ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ]
LBR any call test: 8092 samples
LBR any call test [Success]
LBR any ret test
[ perf record: Woken up 24 times to write data ]
[ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ]
LBR any ret test: 8093 samples
LBR any ret test [Success]
LBR any indirect call test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ]
LBR any indirect call test: 8067 samples
LBR any indirect call test [Success]
LBR any indirect jump test
[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ]
LBR any indirect jump test: 8061 samples
LBR any indirect jump test [Success]
LBR direct calls test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ]
LBR direct calls test: 8076 samples
LBR direct calls test [Success]
LBR any indirect user call test
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ]
LBR any indirect user call test: 8079 samples
LBR any indirect user call test [Success]
LBR system wide any branch test
[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ]
LBR system wide any branch test: 9209 samples
LBR system wide any branch test [Success]
LBR system wide any call test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ]
LBR system wide any call test: 9333 samples
LBR system wide any call test [Success]
LBR parallel any branch test
LBR parallel any call test
LBR parallel any ret test
LBR parallel any indirect call test
LBR parallel any indirect jump test
LBR parallel direct calls test
LBR parallel system wide any branch test
LBR parallel any indirect user call test
LBR parallel system wide any call test
[ perf record: Woken up 9 times to write data ]
[ perf record: Woken up 51 times to write data ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Woken up 5 times to write data ]
[ perf record: Woken up 559 times to write data ]
[ perf record: Woken up 14 times to write data ]
[ perf record: Woken up 17 times to write data ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ]
[ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ]
[ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ]
[ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ]
[ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ]
[ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ]
[ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ]
LBR parallel any indirect user call test: 1909 samples
[ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ]
LBR parallel any indirect user call test [Success]
[ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ]
LBR parallel any indirect jump test: 4311 samples
LBR parallel any indirect jump test [Success]
LBR parallel direct calls test: 3033 samples
LBR parallel direct calls test [Success]
LBR parallel any indirect call test: 1742 samples
LBR parallel any indirect call test [Success]
LBR parallel any call test: 4827 samples
LBR parallel any call test [Success]
LBR parallel any branch test: 6130 samples
LBR parallel any branch test [Success]
LBR parallel system wide any branch test: 29662 samples
LBR parallel any ret test: 3971 samples
LBR parallel any ret test [Success]
LBR parallel system wide any branch test [Success]
LBR parallel system wide any call test: 29576 samples
LBR parallel system wide any call test [Success]
---- end(0) ----
97: perf record LBR tests : Ok
root@x1:~#
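The check described above can be sketched as follows. This is a minimal
Python illustration, not the shell test itself: the helper name
lbr_caps_present() and the sysfs_root parameter are made up for the
example, and only the two paths named in this message are probed.

```python
import os

# PMU directories to probe: the single "cpu" PMU on non-hybrid x86 and
# "cpu_core" on Intel hybrid systems, as named in the message above.
CANDIDATE_PMUS = ("cpu", "cpu_core")


def lbr_caps_present(sysfs_root="/sys/devices"):
    """Return True if any candidate PMU exposes a caps/branches file."""
    return any(
        os.path.exists(os.path.join(sysfs_root, pmu, "caps", "branches"))
        for pmu in CANDIDATE_PMUS
    )
```

Taking the root directory as a parameter keeps the helper testable
against a scratch directory instead of the live sysfs.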
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adds coverage for LBR operations and LBR callgraph.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.
Do similar for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() with
list_splice_init(), so the whole list is moved and nodes aren't leaked.
The memory leaks can be reproduced with a leak sanitizer build by
running perf report as follows:
```
$ perf record -e cycles --call-graph lbr perf test -w thloop
$ perf report --stitch-lbr
```
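To illustrate why the choice between list_replace_init() and
list_splice_init() matters, here is a small Python model of the kernel's
circular doubly linked list. It is only a sketch of the list semantics:
the class and helper names are invented for the example, and which perf
list plays the role of source or destination is not shown here.

```python
class ListHead:
    """Circular doubly linked node, modelled on the kernel's struct list_head."""

    def __init__(self, name=None):
        self.name = name
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Link 'new' just before 'head', i.e. at the end of the list."""
    last = head.prev
    new.prev, new.next = last, head
    last.next = new
    head.prev = new


def list_replace_init(old, new):
    """'new' takes over old's neighbours; whatever 'new' headed is dropped."""
    new.next = old.next
    new.next.prev = new
    new.prev = old.prev
    new.prev.next = new
    old.next = old.prev = old


def list_splice_init(src, head):
    """Move every entry of 'src' to the front of 'head', keeping head's own."""
    if src.next is src:
        return
    first, last, at = src.next, src.prev, head.next
    first.prev = head
    head.next = first
    last.next = at
    at.prev = last
    src.next = src.prev = src


def names(head):
    """Collect entry names by walking forward from the head."""
    out, node = [], head.next
    while node is not head:
        out.append(node.name)
        node = node.next
    return out
```

With a destination that already holds entries, the replace variant makes
those entries unreachable from the destination head, while the splice
variant keeps them. In C, unreachable nodes are leaked memory.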
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Fixes: ff165628d7 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <irogers@google.com>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard
support") adds a test case "PMU cmdline match" that covers PMU name
wildcard support provided by function perf_pmu__match().
The test covers a wide range of supported PMU name matching
combinations but overlooks that when perf_pmu__match() cannot match the
PMU name against the wildcard, it tries to match the PMU's alias
instead. The test never sets up the alias, causing the test case to fail
when run with subprocesses or to segfault if run as a single process.
./perf test -vv 9
9: Sysfs PMU tests :
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
9.6: PMU cmdline match : FAILED!
./perf test -F 9
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
Segmentation fault (core dumped)
Initialize the PMU alias to null for all tests of perf_pmu__match()
as this functionality is not being tested and the alias matching works
exactly the same as the matching of the PMU name.
./perf test -F 9
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
9.6: PMU cmdline match : Ok
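The fallback order described above can be modelled in a few lines of
Python. This is a simplified sketch, not the perf implementation:
pmu_matches() is a made-up name, plain fnmatch globbing stands in for
the real matching rules, and None plays the role of a null alias.

```python
from fnmatch import fnmatchcase


def pmu_matches(name, alias, pattern):
    """Try the PMU name first, then fall back to the alias if one is set."""
    if fnmatchcase(name, pattern):
        return True
    # A null alias must be skipped explicitly. In C, an uninitialised
    # alias pointer would be dereferenced at this point.
    return alias is not None and fnmatchcase(alias, pattern)
```

Passing None for the alias corresponds to what the fixed test does: the
name match alone decides the result.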
Fixes: 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard support")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 'perf ftrace profile sleep 0.1' we know that there will be a specific
kernel function that takes a bit more than 0.1 seconds and runs exactly
once, so we can add a check for that and validate more than just the
presence of some functions in the profile.
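A sketch of such a check in Python, assuming the profile has already
been parsed into (total_us, count, function) tuples. The tuple layout,
the helper name and the function name used in the usage note are all
hypothetical; the actual test is not reproduced here.

```python
def ran_once_for_at_least(rows, func, min_total_us):
    """Check that 'func' appears once, ran once, and took long enough.

    rows: iterable of (total_us, count, function_name) tuples.
    """
    hits = [(total, count) for total, count, name in rows if name == func]
    if len(hits) != 1:
        return False
    total_us, count = hits[0]
    return count == 1 and total_us >= min_total_us
```

For a 0.1 second sleep the threshold would be 100000 microseconds, e.g.
ran_once_for_at_least(rows, "example_sleep_fn", 100_000).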
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
die_get_typename() resolves typedefs to get to the original type. But
sometimes the original type is an unnamed struct, which makes the
output confusing and hard to read.
This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed structs was reduced. Also u32, u64 and size_t were split
from the base types.
--- b 2024-08-01 17:02:34.307809952 -0700
+++ a 2024-08-07 14:17:05.245853999 -0700
- 2.40% long unsigned int
+ 2.26% long unsigned int
- 1.56% unsigned int
+ 1.27% unsigned int
- 0.98% struct
- 0.79% long long unsigned int
+ 0.58% long long unsigned int
+ 0.36% struct
+ 0.27% atomic64_t
+ 0.22% u32
+ 0.21% u64
+ 0.19% atomic_t
+ 0.13% size_t
- 0.08% struct seqcount_spinlock
+ 0.08% seqcount_spinlock_t
+ 0.08% sigset_t
+ 0.08% __poll_t
Let's use the typedef name directly and the resolved type to get the
size of the type.
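The idea can be shown with a toy type table in Python. This is only an
illustration of "name from the typedef, size from the resolved type":
the table layout and helper names are invented here and do not mirror
the DWARF handling in perf.

```python
# Toy type table: id -> (kind, name, size, target). Typedefs carry no size
# of their own and point at another entry through 'target'.
TYPES = {
    "anon_struct": ("struct", None, 8, None),
    "atomic64_t": ("typedef", "atomic64_t", None, "anon_struct"),
}


def resolve(type_id):
    """Follow typedef links until a non-typedef entry is reached."""
    while TYPES[type_id][0] == "typedef":
        type_id = TYPES[type_id][3]
    return type_id


def display_name_and_size(type_id):
    """Label with the typedef name if there is one; size from the real type."""
    kind, name, _, _ = TYPES[type_id]
    rkind, rname, size, _ = TYPES[resolve(type_id)]
    if kind == "typedef":
        label = name
    else:
        label = f"{rkind} {rname}" if rname else rkind
    return label, size
```

Here the typedef is reported as "atomic64_t" with the size of the struct
behind it, while the unnamed struct on its own is still shown as a bare
"struct".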
Committer testing:
root@x1:~# diff -u before after | head -30
--- before 2024-08-08 09:35:13.917325041 -0300
+++ after 2024-08-08 09:37:35.312257905 -0300
@@ -10,25 +10,27 @@
# ........ .........
#
79.40% (unknown)
- 2.28% union
1.96% (stack operation)
- 1.24% struct
+ 1.87% pthread_mutex_t
0.99% u32[]
- 0.92% unsigned int
0.77% struct task_struct
+ 0.75% U32
0.75% struct pcpu_hot
0.63% struct qspinlock
+ 0.61% atomic_t
0.59% struct list_head
- 0.58% int
0.53% struct cfs_rq
0.51% BYTE*
- 0.48% unsigned char
+ 0.48% BYTE
0.48% long unsigned int
0.46% struct rq
0.41% struct worker
0.41% struct memcg_vmstats_percpu
+ 0.41% pthread_cond_t
0.37% _Bool
+ 0.36% int
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
find_data_type() creates and deletes a debug info structure whenever it
tries to find the data type for a sample. This is inefficient, as it
most likely accesses the same binary again and again.
Let's add a single-entry cache for the debug info structure of the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.
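A single-entry cache of this kind can be sketched in Python as below.
The class name and the open_fn/close_fn callbacks are assumptions made
for the example; they stand in for whatever creates and deletes the
debug info in perf.

```python
class LastDsoCache:
    """Single-entry cache: remembers the debug info of the last DSO only."""

    def __init__(self, open_fn, close_fn):
        self._open = open_fn
        self._close = close_fn
        self._dso = None
        self._dbg = None
        self._valid = False

    def get(self, dso):
        """Return debug info for 'dso', reopening only when the DSO changes."""
        if not self._valid or dso != self._dso:
            self.clear()
            self._dbg = self._open(dso)
            self._dso = dso
            self._valid = True
        return self._dbg

    def clear(self):
        """Release the cached entry, if any."""
        if self._valid:
            self._close(self._dbg)
        self._dso, self._dbg, self._valid = None, None, False
```

Consecutive samples from the same DSO then cost one open instead of one
per sample, which is where a speed-up of this kind would come from.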
Committer testing:
root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements, creating memory leaks. This can be seen with:
```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
#0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x564d3400d10b in map__get util/map.h:186
#2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
#3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
#4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
#5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
#8 0x564d3403ba52 in do_flush util/ordered-events.c:245
#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
#11 0x564d34032480 in perf_session__process_event util/session.c:1877
#12 0x564d340336ad in reader__read_event util/session.c:2399
#13 0x564d34033fdc in reader__process_events util/session.c:2448
#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
#15 0x564d34033fdc in perf_session__process_events util/session.c:2661
#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
#21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```
Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.
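The get/put imbalance behind such a leak can be modelled in Python.
This is a toy model under stated assumptions: Map, map_get(), map_put()
and process_branch() are illustrative names, not the perf functions,
and a non-zero count at the end stands in for leaked memory.

```python
class Map:
    """Stand-in for a reference-counted object."""

    def __init__(self):
        self.refcnt = 1  # the creator's reference


def map_get(m):
    """Take an extra reference and return the object."""
    m.refcnt += 1
    return m


def map_put(m):
    """Drop one reference."""
    m.refcnt -= 1


def process_branch(from_map, to_map, put_refs):
    """Hold references for one branch entry, optionally releasing them."""
    held = (map_get(from_map), map_get(to_map))
    # ... the entry would be processed here ...
    if put_refs:
        for m in held:
            map_put(m)
```

Every call with put_refs=False leaves both counts one higher than
before, so the objects can never reach zero and be freed.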
Committer testing:
$ sudo dnf install libasan
$ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up changes from:
0f9ca80fa4 fs: Add initial atomic write support info to statx
f9af549d1f fs: export mount options via statmount()
0a3deb1185 fs: Allow listmount() in foreign mount namespace
09b31295f8 fs: export the mount ns id via statmount
d04bccd8c1 listmount: allow listing in reverse order
bfc69fd05e fs/procfs: add build ID fetching to PROCMAP_QUERY API
ed5d583a88 fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
This should be used to beautify FS syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h
diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h
diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
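The warning above comes from comparing each copy under tools/ with the
kernel original. A rough Python sketch of that comparison, assuming a
plain byte-for-byte check (the real build check may apply extra rules,
and drifted_headers() is a name made up here):

```python
import filecmp


def drifted_headers(pairs):
    """Return the (tools_copy, kernel_original) pairs whose contents differ."""
    return [
        (copy, orig)
        for copy, orig in pairs
        if not filecmp.cmp(copy, orig, shallow=False)
    ]
```

It would be called with path pairs such as
("tools/include/uapi/linux/stat.h", "include/uapi/linux/stat.h").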
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up changes from:
d25a92ccae net/smc: Introduce IPPROTO_SMC
060f4ba6e4 io_uring/net: move charging socket out of zc io_uring
bb6aaf7366 net: Split a __sys_listen helper for io_uring
dc2e779794 net: Split a __sys_bind helper for io_uring
This should be used to beautify socket syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
And arch syscall tables to pick up changes from:
b1e31c134a powerpc: restore some missing spu syscalls
d3882564a7 syscalls: fix compat_sys_io_pgetevents_time64 usage
54233a4254 uretprobe: change syscall number, again
63ded11097 uprobe: Change uretprobe syscall scope and number
9142be9e64 x86/syscall: Mark exit[_group] syscall handlers __noreturn
9aae1baa1c x86, arm: Add missing license tag to syscall tables files
5c28424e9a syscalls: Fix to add sys_uretprobe to syscall.tbl
190fec72df uprobe: Wire up uretprobe system call
This should be used to beautify syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up changes from:
f05c1ffc27 ALSA: pcm: reinvent the stream synchronization ID API
This should be used to beautify sound syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: linux-sound@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To pick up a patch that, despite being for the tools/perf/ directory,
went through a different tree and ended up breaking some recent tests
introduced in the perf-tools-next tree to validate duplicate events in
the JSON performance event files.
Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Duplicate event names break invariants in 'perf list'. Assert that an
event name isn't duplicated so that broken JSON won't build.
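A minimal sketch of such an assertion in Python, assuming the events
are dictionaries loaded from the JSON files with the name stored under
"EventName". The helper name is made up and names are compared exactly;
this is not the code added by the patch.

```python
from collections import Counter


def assert_unique_event_names(events):
    """Raise AssertionError if any event name occurs more than once."""
    counts = Counter(e["EventName"] for e in events if "EventName" in e)
    dups = sorted(name for name, n in counts.items() if n > 1)
    assert not dups, f"duplicate event names: {dups}"
```

Running the check at build time turns a broken JSON file into a build
failure rather than a runtime surprise in 'perf list'.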
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
OP_SPEC is repeated twice in the file which will break invariants in
'perf list' as discussed in this thread:
https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Switch from $? (all the prerequisites that are newer than the target)
to $^ (all the prerequisites) as touching jevents.py will mean that
empty-pmu-events.c won't be passed to the diff command breaking the
build.
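The difference between the two automatic variables can be modelled in
Python. This is a simplified sketch: modification times are plain
numbers, the helper names are invented, and the file names and times in
the usage note are only illustrative.

```python
def newer_prereqs(target_mtime, prereqs):
    """Model of $?: names of prerequisites newer than the target.

    prereqs: list of (name, mtime) pairs in rule order.
    """
    return [name for name, mtime in prereqs if mtime > target_mtime]


def all_prereqs(prereqs):
    """Model of $^: names of all prerequisites, in rule order."""
    return [name for name, _ in prereqs]
```

With prerequisites [("empty-pmu-events.c", 50), ("jevents.py", 200)] and
a target last built at time 100, the $? model yields only jevents.py,
so a diff command built from it would be missing one of its operands.
The $^ model always yields both names.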
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Building with JEVENTS_ARCH=all builds all CPU types and allows things
like assertions to check the validity of the input JSON.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a preparation to support skipping empty events.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
function consistently so that we can make sure it has similar output
in different modes. But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.
Let's use 8 and adjust the print width in __annotation_line__write()
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We want to use it in different places, so make sure it is set properly
in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
data_nr holds the number of entries in al->data[], so it should be used
when iterating the array. notes->src->nr_events should have the same
value, but it is more natural to use al->data_nr.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a common options section and move some items to that section. Also
add descriptions of the new options to the report options.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* A fix to avoid dropping some of the internal pseudo-extensions, which
breaks *envcfg dependency parsing.
* The kernel entry address is now aligned in purgatory, which avoids a
misaligned load that can lead to a crash on systems that don't support
misaligned accesses early in boot.
* The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
perf JSON configurations, one of them has been updated to
FW_SFENCE_VMA_ASID_SENT.
* The starfive cache driver is now restricted to 64-bit systems, as it
isn't 32-bit clean.
* A fix to avoid aliasing legacy-mode perf counters with software
perf counters.
* VM_FAULT_SIGSEGV is now handled in the page fault code.
* A fix for stalls during CPU hotplug due to IPIs being disabled.
* A fix for memblock bounds checking. This manifests as a crash on
systems with discontinuous memory maps that have regions that don't
fit in the linear map.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmas/qwTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiWp7EACDcorcihBG8uSsX//GKJPjkiGIbZkT
MIMN3yqIzJuSftxpvgVxpyq2MFKYy7BK/75sK+4VoQpoCJEtdxbdh0JUqck/Nrgj
Kn0hxWy7RO6Rp9ggf9dTdca64Tdxh32Eegpum3E46zuhYQBMcNze4z4NsOXs6ems
254ww8+v7V5R7FGsxm1PG4Hs3soxZ9FPdWE69ndxmjr9N5FFkchk5YbV8AgKYtSJ
sfu5Q+68zh58GVZhn0usug0fHNgVzdvwy3PIBDGD58hqIDAs9WlF80MiW3sESTIe
PrJcAFBU4tHp+8h+OMaKw2xfybrZpNmqobx7dED34PJu0R4+Uvz7MUKMMPUJeB+q
7UOZokjF2Hvd5VsAeTc1PisvzVsWkWpkzJqZmdaTr2m8J4m5z7/nby+ZcXmoOlVz
JiMDgrkM4KIziq++9bYbBfcxsS9dMsvNtEQAHByL/zdVfAFTvWUMUmAgg27C3K9Z
QbHfbpxqQ/pEu4CsRUIx4GnkEKnWPLuGovnYboGmC3BCDwQkkV8H0tcEhJtWMKte
6h+vvKBX2POS4l8467ElmcTRv5Cfpi/dmhZrC9SHHQhNF5OiHHM2CmSEOKS1bUPj
e4+k/QGmVQOAJGRRPkpD+DFMhHT/jhvbYV4kDXr/h9AKJQ2eWRGMSOMaPJ/X311N
R5W1yiJilIhXuQ==
=K52W
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid dropping some of the internal pseudo-extensions, which
breaks *envcfg dependency parsing
- The kernel entry address is now aligned in purgatory, which avoids a
misaligned load that can lead to a crash on systems that don't support
misaligned accesses early in boot
- The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
perf JSON configurations, one of them has been updated to
FW_SFENCE_VMA_ASID_SENT
- The starfive cache driver is now restricted to 64-bit systems, as it
isn't 32-bit clean
- A fix to avoid aliasing legacy-mode perf counters with software
perf counters
- VM_FAULT_SIGSEGV is now handled in the page fault code
- A fix for stalls during CPU hotplug due to IPIs being disabled
- A fix for memblock bounds checking. This manifests as a crash on
systems with discontinuous memory maps that have regions that don't
fit in the linear map
* tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix linear mapping checks for non-contiguous memory regions
RISC-V: Enable the IPI before workqueue_online_cpu()
riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
perf: riscv: Fix selecting counters in legacy mode
cache: StarFive: Require a 64-bit system
perf arch events: Fix duplicate RISC-V SBI firmware event name
riscv/purgatory: align riscv_kernel_entry
riscv: cpufeature: Do not drop Linux-internal extensions
This is just a shortcut to have 'type' in the sort key and use a more
compact output format like below.
$ perf mem report -T
...
#
# Overhead       Samples  Memory access                            Snoop         TLB access              Data Type
# ........  ............  .......................................  ............  ......................  .........
#
    14.84%            22  L1 hit                                   None          L1 or L2 hit            (unknown)
     7.68%             8  LFB/MAB hit                              None          L1 or L2 hit            (unknown)
     7.17%             3  RAM hit                                  Hit           L2 miss                 (unknown)
     6.29%            12  L1 hit                                   None          L1 or L2 hit            (stack operation)
     4.85%             5  RAM hit                                  Hit           L1 or L2 hit            (unknown)
     3.97%             5  LFB/MAB hit                              None          L1 or L2 hit            struct psi_group_cpu
     3.18%             3  LFB/MAB hit                              None          L1 or L2 hit            (stack operation)
     2.58%             3  L1 hit                                   None          L1 or L2 hit            unsigned int
     2.36%             2  L1 hit                                   None          L1 or L2 hit            struct
     2.31%             2  L1 hit                                   None          L1 or L2 hit            struct psi_group_cpu
...
Users can also use their own sort keys; the -T option makes sure the
'type' sort key is added at the end.
$ perf mem report -T -s mem
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some sort keys are meaningful only in a specific mode - like branch
stack and memory (data-src). Add the mode to skip unnecessary ones.
This will be used for 'perf mem report' later.
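One way to picture the mode tagging, as a sketch only: each sort key carries a bitmask of the modes in which it is meaningful, and listing code skips keys whose mask does not match. The enum, struct and table below are hypothetical and chosen for illustration; they are not the definitions used in the perf sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits; the real names in perf may differ. */
enum sort_mode_sketch {
	MODE_NORMAL = 1 << 0,
	MODE_BRANCH = 1 << 1,
	MODE_MEMORY = 1 << 2,
	MODE_ALL    = MODE_NORMAL | MODE_BRANCH | MODE_MEMORY,
};

struct sort_key_sketch {
	const char *name;
	int modes;          /* bitmask of modes where the key applies */
};

/* Illustrative table, not the full list of keys. */
static const struct sort_key_sketch sort_keys[] = {
	{ "comm",        MODE_ALL    },
	{ "symbol_from", MODE_BRANCH },
	{ "mem",         MODE_MEMORY },
	{ "snoop",       MODE_MEMORY },
};

/* Collect up to max key names valid for the given mode; returns the count. */
static size_t keys_for_mode(int mode, const char **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < sizeof(sort_keys) / sizeof(sort_keys[0]); i++) {
		if (!(sort_keys[i].modes & mode))
			continue;   /* key is meaningless in this mode */
		if (n < max)
			out[n++] = sort_keys[i].name;
	}
	return n;
}
```

With such a table, a memory-mode report would list only the generic and memory keys and leave out the branch-stack ones.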
While at it, change the prefix for the -F/--fields option to remove
the duplicate part.
Before:
$ perf report -F
Error: switch `F' requires a value
Usage: perf report [<options>]
-F, --fields <key[,keys...]>
output field(s): overhead period sample overhead overhead_sys
overhead_us overhead_guest_sys overhead_guest_us overhead_children
sample period weight1 weight2 weight3 ins_lat retire_lat
...
After:
$ perf report -F
Error: switch `F' requires a value
Usage: perf report [<options>]
-F, --fields <key[,keys...]>
output field(s): overhead overhead_sys overhead_us
overhead_guest_sys overhead_guest_us overhead_children
sample period weight1 weight2 weight3 ins_lat retire_lat
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Split the common options from the ones for record or report. Otherwise -U in
the record options cannot be used because it clashes with the one in the common
(or report) options. Also rename report_events() to __cmd_report() to
follow the convention and to be in sync with the record part.
Also set the flag PARSE_OPT_STOP_AT_NON_OPTION for the common option so
that it can show the help message in the subcommand like below:
$ perf mem record -h
Usage: perf mem record [<options>] [<command>]
or: perf mem record [<options>] -- <command> [<options>]
-C, --cpu <cpu> list of cpus to profile
-e, --event <event> event selector. use 'perf mem record -e list' to list available events
-f, --force don't complain, do it
-K, --all-kernel collect only kernel level data
-p, --phys-data Record/Report sample physical addresses
-t, --type <type> memory operations(load,store) Default load,store
-U, --all-user collect only user level data
-v, --verbose be more verbose (show counter open errors, etc)
--data-page-size Record/Report sample data address page size
--ldlat <n> mem-loads latency
Cc: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The get_sort_order() returns either a new string (from strdup) or NULL
but it never gets freed.
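The ownership rule can be sketched as below. make_sort_order() is a hypothetical stand-in that follows the same contract (heap string or NULL); it uses malloc() and memcpy() instead of strdup() only to keep the sketch strictly ISO C, and the key string is illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function that returns either a newly
 * allocated string or NULL when no custom order is needed.
 */
static char *make_sort_order(int want_custom)
{
	static const char order[] = "mem,type";   /* illustrative keys */
	char *copy;

	if (!want_custom)
		return NULL;
	copy = malloc(sizeof(order));
	if (copy)
		memcpy(copy, order, sizeof(order));
	return copy;
}

/* Use the sort order, then release it; returns the length that was used. */
static size_t use_sort_order(int want_custom)
{
	char *order = make_sort_order(want_custom);
	size_t len = order ? strlen(order) : 0;

	assert(len < 64);   /* sanity bound for this sketch */
	free(order);        /* free(NULL) is a no-op, so both cases are covered */
	return len;
}
```

Because free() accepts NULL, the caller does not need a separate branch for the case where no string was generated.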
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 2e7f545096 ("perf mem: Factor out a function to generate sort order")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-3-namhyung@kernel.org
[ Added Fixes tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'struct mem_info' is created by iter_prepare_mem_entry() at the
beginning and destroyed by iter_finish_mem_entry() at the end.
So if it's used in a new hist_entry, it should be cloned.
Simplify (hopefully) the logic by adding some helper functions and by
not holding the refcount in the temporary entry.
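A rough sketch of the lifetime rule, assuming a simplified refcounted object (struct mi_sketch and its helpers are hypothetical, not the real 'struct mem_info' API): the iterator owns the temporary object, and anything that outlives the iteration takes its own clone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the per-sample info. */
struct mi_sketch {
	int refcnt;
	unsigned long long data_addr;
};

static struct mi_sketch *mi_new(unsigned long long addr)
{
	struct mi_sketch *mi = calloc(1, sizeof(*mi));

	if (mi) {
		mi->refcnt = 1;
		mi->data_addr = addr;
	}
	return mi;
}

/* A clone is an independent object with its own reference count. */
static struct mi_sketch *mi_clone(const struct mi_sketch *src)
{
	assert(src != NULL);
	return mi_new(src->data_addr);
}

/* Drop one reference; returns 1 if the object was freed. */
static int mi_put(struct mi_sketch *mi)
{
	if (!mi)
		return 0;
	if (--mi->refcnt > 0)
		return 0;
	free(mi);
	return 1;
}
```

In this model the "prepare" step would call mi_new(), a newly created long-lived entry would keep the result of mi_clone(), and the "finish" step would call mi_put() on the temporary without affecting the clone.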
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf code was compiled one way for the binary and another for the
python module, the PYTHON_PERF ifdef was used to remove some code from
the python module.
Since switching to building the perf code as a series of libraries, with
the same libraries being used for the python module, the ifdefs became
unused as PYTHON_PERF is never defined. As such remove the ifdefs.
Fixes: 9dabf40034 ("perf python: Switch module to linking libraries from building source")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731230005.12295-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
empty-pmu-events.c exists so that builds may occur without python
being installed on a system. Manually updating empty-pmu-events.c to
be in sync with jevents.py is a pain, let's use jevents.py to generate
empty-pmu-events.c.
1) change jevents.py so that an arch and model of none cause
generation of a pmu-events.c without any json. Add a SPDX and
autogenerated warning to the start of the file.
2) change Build so that if a generated pmu-events.c for arch none and
model none doesn't match empty-pmu-events.c the build fails with a
cat of the differences. Update Makefile.perf to clean up the files
used for this.
3) update empty-pmu-events.c to match the output of jevents.py with
arch and model of none.
Committer notes:
The first paragraph is confusing, so I asked and Ian further clarified:
---
The requirement for python hasn't changed.
Case 1: no python or NO_JEVENTS=1
Build happens using empty-pmu-events.c that is checked in, no python
is required.
Case 2: python
pmu-events.c is created by jevents.py (requiring python) and then built.
This change adds a step where the empty-pmu-events.c is created using
jevents.py and that file is diffed against the checked in version.
This stops the checked in empty-pmu-events.c diverging if changes are
made to jevents.py. If the diff causes the build to fail then you just
copy the diff empty-pmu-events.c over the checked in one.
---
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Sang <oliver.sang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philip Li <philip.li@intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240730191744.3097329-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a clash of the libbpf and capstone libraries, that ends up
with:
In file included from /usr/include/capstone/capstone.h:325,
from util/disasm.c:1513:
/usr/include/capstone/bpf.h:94:14: error: ‘bpf_insn’ defined as wrong kind of tag
94 | typedef enum bpf_insn {
So far we're just trying to avoid this by not having both headers
included in the same .c or .h file. Do it one more time by moving the
BPF disassembly routines from util/disasm.c to util/disasm_bpf.c.
This is only being hit when building with BUILD_NONDISTRO=1, i.e.
building with binutils-devel, that isn't in the default build due to
a licencing clash. We need to reimplement what is now isolated in
util/disasm_bpf.c using some other library to have the BPF annotation
feature, which is now only available with BUILD_NONDISTRO=1.
Fixes: 6d17edc113 ("perf annotate: Use libcapstone to disassemble")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZqpUSKPxMwaQKORr@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now it can run the BPF filtering test as a normal user if the BPF
objects are pinned by 'sudo perf record --setup-filter pin'. Let's
update the test case to verify the behavior. It'll skip the test if the
filter check fails for a normal user, but it shows a message on how to
set up the filters.
First, run the test as a normal user and it fails.
$ perf test -vv filtering
95: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 425677
Checking BPF-filter privilege
try 'sudo perf record --setup-filter pin' first. <<<--- here
bpf-filter test [Skipped permission]
---- end(-2) ----
95: perf record sample filtering (by BPF) tests : Skip
According to the message, run the perf record command to pin the BPF
objects.
$ sudo perf record --setup-filter pin
And re-run the test as a normal user.
$ perf test -vv filtering
95: perf record sample filtering (by BPF) tests:
--- start ---
test child forked, pid 424486
Checking BPF-filter privilege
Basic bpf-filter test
Basic bpf-filter test [Success]
Failing bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Failing bpf-filter test [Success]
Group bpf-filter test
Error: task-clock event does not have PERF_SAMPLE_CPU
Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE
Group bpf-filter test [Success]
---- end(0) ----
95: perf record sample filtering (by BPF) tests : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow BPF filters for unprivileged users, the BPF objects need to be
pinned to BPF-fs first. Let's add a new option to pin and unpin the
objects easily. I'm not sure 'perf record' is the right place to do this
but I don't have a better idea right now.
$ sudo perf record --setup-filter pin
The above command would pin the BPF program and maps for the filter when the
system has BPF-fs (usually at /sys/fs/bpf/). To unpin the objects,
users can run the following command (as root).
$ sudo perf record --setup-filter unpin
Committer testing:
root@number:~# perf record --setup-filter pin
root@number:~# ls -la /sys/fs/bpf/perf_filter/
total 0
drwxr-xr-x. 2 root root 0 Jul 31 10:43 .
drwxr-xr-t. 3 root root 0 Jul 31 10:43 ..
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 dropped
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 filters
-rwxrwxrwx. 1 root root 0 Jul 31 10:43 perf_sample_filter
-rw-rw-rw-. 1 root root 0 Jul 31 10:43 pid_hash
-rw-------. 1 root root 0 Jul 31 10:43 sample_f_rodata
root@number:~# ls -la /sys/fs/bpf/perf_filter/perf_sample_filter
-rwxrwxrwx. 1 root root 0 Jul 31 10:43 /sys/fs/bpf/perf_filter/perf_sample_filter
root@number:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The evlist is allocated at the beginning of cmd_record(). Also, freeing
thread masks should be paired with record__init_thread_masks(), which is
called right before __cmd_record().
Let's change the order of these functions to release the resources
correctly in case of errors. This is maybe fine as the process exits,
but it might be a problem if it manages some system-wide resources that
live longer than the process.
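The ordering can be sketched as follows, assuming hypothetical res_alloc()/res_free() helpers in place of the real evlist and thread-mask routines. Resources are released in the reverse order of acquisition, and an early failure only releases what has already been set up.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;   /* counts outstanding resources in this sketch */

static void *res_alloc(void)
{
	void *p = malloc(1);

	if (p)
		live_allocs++;
	return p;
}

static void res_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	free(p);
	live_allocs--;
}

/* fail_at_masks simulates an error while setting up the second resource. */
static int run_record_sketch(int fail_at_masks)
{
	void *evlist, *masks;
	int ret = -1;

	evlist = res_alloc();                 /* acquired first */
	if (!evlist)
		return -1;

	masks = fail_at_masks ? NULL : res_alloc();   /* acquired second */
	if (!masks)
		goto out;                     /* masks were never set up */

	ret = 0;                              /* the main work would run here */

	res_free(masks);                      /* released first */
out:
	res_free(evlist);                     /* released last */
	return ret;
}
```

Both the success path and the failure path leave live_allocs at zero, which is the property the reordering is meant to restore.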
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As the BPF filter is shared between other processes, it should have its
own counter for each invocation. Add a new array map (lost_count) to
save the count using the same index as the filter. It should clear the
count before running the filter.
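As an illustration in plain C rather than BPF, the per-invocation counter could look like the sketch below. The array name, slot count and helper names are assumptions made for this example; a real BPF array map would be accessed through map helpers instead.

```c
#include <assert.h>

#define NR_FILTER_SLOTS 4   /* hypothetical number of filter indexes */

/* One counter per filter index, mirroring an array map keyed by index. */
static unsigned long lost_count_sketch[NR_FILTER_SLOTS];

/* Reset the counter before an invocation starts using this index. */
static void filter_slot_start(int idx)
{
	assert(idx >= 0 && idx < NR_FILTER_SLOTS);
	lost_count_sketch[idx] = 0;
}

/* Account one dropped sample against this index only. */
static void filter_slot_drop(int idx)
{
	assert(idx >= 0 && idx < NR_FILTER_SLOTS);
	lost_count_sketch[idx]++;
}

static unsigned long filter_slot_lost(int idx)
{
	assert(idx >= 0 && idx < NR_FILTER_SLOTS);
	return lost_count_sketch[idx];
}
```

Clearing on start matters because an index may be reused by a later invocation, which would otherwise inherit a stale count.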
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
And use the pinned objects for unprivileged users to profile their own
tasks. The BPF objects need to be pinned in the BPF-fs by root first
and it'll be handled in the later patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the target is a list of tasks, it can use a shared hash map for
filter expressions. The key of the filter map is an integer index like
in an array. A separate pid_hash map is added to get the index for the
filter map using the tgid.
System-wide mode, including per-cpu or per-user targets, is handled
by the single-entry map as before.
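The two-level lookup for task targets can be modelled in plain C as below. This is a sketch under stated assumptions: the table sizes, the open-addressing hash and all names are hypothetical, and real BPF maps would be used through map helpers rather than static arrays.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 4   /* hypothetical capacity of the shared filter map */
#define PID_SLOTS   8   /* hypothetical capacity of the tgid hash */

struct filter_sketch {
	int threshold;      /* placeholder for a filter expression */
};

struct pid_slot {
	int used;
	int tgid;
	int idx;            /* index into filters[] */
};

static struct filter_sketch filters[MAX_FILTERS];
static struct pid_slot pid_hash[PID_SLOTS];

/* Map a tgid to a filter index; returns 0 on success, -1 if the table is full. */
static int pid_hash_insert(int tgid, int idx)
{
	assert(idx >= 0 && idx < MAX_FILTERS);
	for (unsigned int n = 0; n < PID_SLOTS; n++) {
		struct pid_slot *s = &pid_hash[((unsigned int)tgid + n) % PID_SLOTS];

		if (!s->used || s->tgid == tgid) {
			s->used = 1;
			s->tgid = tgid;
			s->idx = idx;
			return 0;
		}
	}
	return -1;
}

/* First look up the index by tgid, then use it to pick the filter. */
static const struct filter_sketch *filter_for_tgid(int tgid)
{
	for (unsigned int n = 0; n < PID_SLOTS; n++) {
		const struct pid_slot *s = &pid_hash[((unsigned int)tgid + n) % PID_SLOTS];

		if (!s->used)
			return NULL;            /* no filter for this task */
		if (s->tgid == tgid)
			return &filters[s->idx];
	}
	return NULL;
}
```

Several tasks belonging to one invocation can point at the same index, so the filter expressions are stored once while the hash stays small.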
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>