mirror_ubuntu-kernels/tools/perf
Song Liu 7fac83aaf2 perf stat: Introduce 'bperf' to share hardware PMCs with BPF
The perf tool uses performance monitoring counters (PMCs) to monitor
system performance. The PMCs are limited hardware resources. For
example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.

Modern data center systems use these PMCs in many different ways: system
level monitoring, (maybe nested) container level monitoring, per process
monitoring, profiling (in sample mode), etc. In some cases, there are
more active perf_events than available hardware PMCs. To allow all
perf_events to have a chance to run, it is necessary to do expensive
time multiplexing of events.

On the other hand, many monitoring tools count the common metrics
(cycles, instructions). It is a waste to have multiple tools create
multiple perf_events of "cycles" and occupy multiple PMCs.

bperf tries to reduce such wastes by allowing multiple perf_events of
"cycles" or "instructions" (at different scopes) to share PMUs. Instead
of having each perf-stat session to read its own perf_events, bperf uses
BPF programs to read the perf_events and aggregate readings to BPF maps.
Then, the perf-stat session(s) reads the values from these BPF maps.

Please refer to the comment before the definition of bperf_ops for the
description of bperf architecture.

bperf is off by default. To enable it, pass --bpf-counters option to
perf-stat. bperf uses a BPF hashmap to share information about BPF
programs and maps used by bperf. This map is pinned to bpffs. The
default path is /sys/fs/bpf/perf_attr_map. The user could change the
path with option --bpf-attr-map.

Committer testing:

  # dmesg|grep "Performance Events" -A5
  [    0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
  [    0.225280] ... version:                0
  [    0.225280] ... bit width:              48
  [    0.225281] ... generic registers:      6
  [    0.225281] ... value mask:             0000ffffffffffff
  [    0.225281] ... max period:             00007fffffffffff
  #
  #  for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done
  [1] 2436231
  [2] 2436232
  [3] 2436233
  [4] 2436234
  [5] 2436235
  [6] 2436236
  # perf stat -a -e cycles,instructions sleep 0.1

   Performance counter stats for 'system wide':

         310,326,987      cycles                                                        (41.87%)
         236,143,290      instructions              #    0.76  insn per cycle           (41.87%)

         0.100800885 seconds time elapsed

  #

We can see that the counters were enabled for this workload 41.87% of
the time.

Now with --bpf-counters:

  #  for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done
  [1] 2436514
  [2] 2436515
  [3] 2436516
  [4] 2436517
  [5] 2436518
  [6] 2436519
  [7] 2436520
  [8] 2436521
  [9] 2436522
  [10] 2436523
  [11] 2436524
  [12] 2436525
  [13] 2436526
  [14] 2436527
  [15] 2436528
  [16] 2436529
  [17] 2436530
  [18] 2436531
  [19] 2436532
  [20] 2436533
  [21] 2436534
  [22] 2436535
  [23] 2436536
  [24] 2436537
  [25] 2436538
  [26] 2436539
  [27] 2436540
  [28] 2436541
  [29] 2436542
  [30] 2436543
  [31] 2436544
  [32] 2436545
  #
  # ls -la /sys/fs/bpf/perf_attr_map
  -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map
  # bpftool map | grep bperf | wc -l
  64
  #

  # bpftool map | tail
  1265: percpu_array  name accum_readings  flags 0x0
  	key 4B  value 24B  max_entries 1  memlock 4096B
  1266: hash  name filter  flags 0x0
  	key 4B  value 4B  max_entries 1  memlock 4096B
  1267: array  name bperf_fo.bss  flags 0x400
  	key 4B  value 8B  max_entries 1  memlock 4096B
  	btf_id 996
  	pids perf(2436545)
  1268: percpu_array  name accum_readings  flags 0x0
  	key 4B  value 24B  max_entries 1  memlock 4096B
  1269: hash  name filter  flags 0x0
  	key 4B  value 4B  max_entries 1  memlock 4096B
  1270: array  name bperf_fo.bss  flags 0x400
  	key 4B  value 8B  max_entries 1  memlock 4096B
  	btf_id 997
  	pids perf(2436541)
  1285: array  name pid_iter.rodata  flags 0x480
  	key 4B  value 4B  max_entries 1  memlock 4096B
  	btf_id 1017  frozen
  	pids bpftool(2437504)
  1286: array  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 4096B
  #
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00
  80 fd 2a d1 4d 00 00 00
  value (CPU 22):
  7e d5 64 4d 00 00 00 00  a4 8a 2e ee 4d 00 00 00
  a4 8a 2e ee 4d 00 00 00
  value (CPU 23):
  a7 78 3e 06 01 00 00 00  b2 34 94 f6 4d 00 00 00
  b2 34 94 f6 4d 00 00 00
  Found 1 element
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00
  20 c6 fc 83 4e 00 00 00
  value (CPU 22):
  9c b4 d2 4d 00 00 00 00  3e 0c df 89 4e 00 00 00
  3e 0c df 89 4e 00 00 00
  value (CPU 23):
  18 43 66 06 01 00 00 00  5b 69 ed 83 4e 00 00 00
  5b 69 ed 83 4e 00 00 00
  Found 1 element
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  f2 6e db ca 00 00 00 00  92 67 4c ba 4e 00 00 00
  92 67 4c ba 4e 00 00 00
  value (CPU 22):
  dc 8e e1 4d 00 00 00 00  d9 32 7a c5 4e 00 00 00
  d9 32 7a c5 4e 00 00 00
  value (CPU 23):
  bd 2b 73 06 01 00 00 00  7c 73 87 bf 4e 00 00 00
  7c 73 87 bf 4e 00 00 00
  Found 1 element
  #

  # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1

   Performance counter stats for 'system wide':

       119,410,122      cycles
       152,105,479      instructions              #    1.27  insn per cycle

       0.101395093 seconds time elapsed

  #

See? We had the counters enabled all the time.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23 17:46:44 -03:00
..
arch perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
bench perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
Documentation perf stat: Introduce 'bperf' to share hardware PMCs with BPF 2021-03-23 17:46:44 -03:00
examples/bpf perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
include/bpf perf bpf: Remove bpf/ subdir from bpf.h headers used to build bpf events 2020-02-18 10:13:28 -03:00
jvmti perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
pmu-events perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
python tweewide: Fix most Shebang lines 2020-12-08 23:30:04 +09:00
scripts perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
tests perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
trace perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
ui perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
util perf stat: Introduce 'bperf' to share hardware PMCs with BPF 2021-03-23 17:46:44 -03:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
Build perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
builtin-annotate.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-bench.c perf bench: Add build-id injection benchmark 2020-10-13 10:59:42 -03:00
builtin-buildid-cache.c perf buildid-cache: Add --debuginfod option to specify a server to fetch debug files 2020-12-28 12:20:39 -03:00
builtin-buildid-list.c perf buildid-list: Add support for mmap2's buildid events 2020-12-28 12:23:09 -03:00
builtin-c2c.c perf c2c: Support data block and addr block 2021-02-08 16:25:00 -03:00
builtin-config.c perf tools: Remove util.h from where it is not needed 2019-09-20 09:19:20 -03:00
builtin-daemon.c perf daemon: Fix compile error with Asan 2021-03-06 16:54:31 -03:00
builtin-data.c perf data: Add support to store time of day in CTF data conversion 2020-08-06 09:43:37 -03:00
builtin-diff.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-evlist.c perf evlist: Support pipe mode display 2020-12-17 14:36:17 -03:00
builtin-ftrace.c perf evlist: Use the right prefix for 'struct evlist' create maps methods 2020-11-30 14:56:52 -03:00
builtin-help.c perf debug: Remove needless include directives from debug.h 2019-08-31 19:10:19 -03:00
builtin-inject.c perf inject jit: Add namespaces support 2021-02-03 13:10:44 -03:00
builtin-kallsyms.c perf dsos: Move the dsos struct and its methods to separate source files 2019-08-31 22:24:10 -03:00
builtin-kmem.c perf evlist: Use the right prefix for 'struct evlist' 'find' methods 2020-11-30 09:48:07 -03:00
builtin-kvm.c perf evlist: Use the right prefix for 'struct evlist' mmap pages parsing method 2020-11-30 15:15:30 -03:00
builtin-list.c perf list: Remove dead code in argument check 2020-09-09 11:12:10 -03:00
builtin-lock.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-mem.c perf tools: Support data block and addr block 2021-02-08 16:25:00 -03:00
builtin-probe.c perf probe: Do not show the skipped events 2020-05-28 10:03:24 -03:00
builtin-record.c perf auxtrace: Automatically group aux-output events 2021-02-18 16:11:19 -03:00
builtin-report.c perf report: Create option to disable raw event ordering 2021-03-03 12:59:27 -03:00
builtin-sched.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-script.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-stat.c perf stat: Introduce 'bperf' to share hardware PMCs with BPF 2021-03-23 17:46:44 -03:00
builtin-timechart.c perf tools: Replace zero-length array with flexible-array 2020-05-28 10:03:27 -03:00
builtin-top.c perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
builtin-trace.c perf trace: Fix race in signal handling 2021-03-06 16:54:32 -03:00
builtin-version.c perf version: Add a feature for libpfm4 2020-11-04 09:42:40 -03:00
builtin.h perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
check-headers.sh perf tools: Generate mips syscalls_n64.c syscall table 2021-03-01 14:49:28 -03:00
command-list.txt perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
CREDITS
design.txt perf tools: Support CAP_PERFMON capability 2020-04-16 12:19:08 -03:00
Makefile tools: Let O= makes handle a relative path with -C option 2020-03-06 17:08:28 -03:00
Makefile.config perf tools: Generate mips syscalls_n64.c syscall table 2021-03-01 14:49:28 -03:00
Makefile.perf perf stat: Introduce 'bperf' to share hardware PMCs with BPF 2021-03-23 17:46:44 -03:00
MANIFEST libperf: Move to tools/lib/perf 2020-01-06 11:46:09 -03:00
perf-archive.sh perf archive: Fix filtering of empty build-ids 2021-03-06 16:54:31 -03:00
perf-completion.sh
perf-read-vdso.c
perf-sys.h perf tests: Call test_attr__open() directly 2020-09-10 11:55:37 -03:00
perf-with-kcore.sh
perf.c perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
perf.h perf time-utils: Adopt rdclock() from perf.h 2019-08-29 17:38:32 -03:00