Mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git (synced 2026-01-08 18:24:39 +00:00)
It would be useful to support sorting all blocks by the percentage of
sampled cycles per block. This helps to concentrate on the globally
hottest blocks.
This patch implements a new option, "--total-cycles", which sorts all
blocks by 'Sampled Cycles%'. 'Sampled Cycles%' is the percentage:
percent = aggregated sampled cycles of the block / total sampled cycles
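The ordering described above can be sketched in C. This is a minimal illustration, not the code in block-info.c. The `struct block` type and the `block_percent()`, `cmp_cycles_desc()` and `sort_blocks_by_total_cycles()` helpers are hypothetical. The sketch also assumes that each block's sampled cycles have already been aggregated. It sums the cycles of all blocks, sorts the blocks hottest first, and derives 'Sampled Cycles%' from the shared total.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-block record, not a perf type: cycles are assumed to be
 * already aggregated over all samples that hit this block range. */
struct block {
	const char *range;          /* e.g. "[div.c:42 -> div.c:39]" */
	unsigned long long cycles;  /* aggregated sampled cycles of the block */
};

/* Sampled Cycles% = block sampled cycles / total sampled cycles * 100 */
static double block_percent(const struct block *b, unsigned long long total)
{
	return total ? 100.0 * (double)b->cycles / (double)total : 0.0;
}

/* qsort comparator, descending: the globally hottest block sorts first. */
static int cmp_cycles_desc(const void *a, const void *b)
{
	const struct block *x = a, *y = b;

	if (x->cycles != y->cycles)
		return x->cycles < y->cycles ? 1 : -1;
	return 0;
}

/* Hypothetical helper: sort all blocks by Sampled Cycles% and return the
 * total used as the denominator. Every block shares that total, so sorting
 * by raw cycles yields the same order as sorting by percent. */
static unsigned long long sort_blocks_by_total_cycles(struct block *blocks, size_t n)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		total += blocks[i].cycles;
	qsort(blocks, n, sizeof(*blocks), cmp_cycles_desc);
	return total;
}
```

Take three made-up blocks with 400, 2800 and 800 cycles. Their total is 4000, and the sorted order is 2800 (70%), 800 (20%), 400 (10%). Comparing raw cycle counts instead of percentages is a deliberate shortcut: every block is divided by the same total, so both orderings agree.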
Note that this patch only supports "--stdio" mode.
For example:
# perf record -b ./div
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                             Shared Object
# ...............  ..............  ...........  ..........  ................................................  .................
#
           26.04%            2.8M        0.40%          18  [div.c:42 -> div.c:39]                            div
           15.17%            1.2M        0.16%           7  [random_r.c:357 -> random_r.c:380]                libc-2.27.so
            5.11%          402.0K        0.04%           2  [div.c:27 -> div.c:28]                            div
            4.87%          381.6K        0.04%           2  [random.c:288 -> random.c:291]                    libc-2.27.so
            4.53%          381.0K        0.04%           2  [div.c:40 -> div.c:40]                            div
            3.85%          300.9K        0.02%           1  [div.c:22 -> div.c:25]                            div
            3.08%          241.1K        0.02%           1  [rand.c:26 -> rand.c:27]                          libc-2.27.so
            3.06%          240.0K        0.02%           1  [random.c:291 -> random.c:291]                    libc-2.27.so
            2.78%          215.7K        0.02%           1  [random.c:298 -> random.c:298]                    libc-2.27.so
            2.52%          198.3K        0.02%           1  [random.c:293 -> random.c:293]                    libc-2.27.so
            2.36%          184.8K        0.02%           1  [rand.c:28 -> rand.c:28]                          libc-2.27.so
            2.33%          180.5K        0.02%           1  [random.c:295 -> random.c:295]                    libc-2.27.so
            2.28%          176.7K        0.02%           1  [random.c:295 -> random.c:295]                    libc-2.27.so
            2.20%          168.8K        0.02%           1  [rand@plt+0 -> rand@plt+0]                        div
            1.98%          158.2K        0.02%           1  [random_r.c:388 -> random_r.c:388]                libc-2.27.so
            1.57%          123.3K        0.02%           1  [div.c:42 -> div.c:44]                            div
            1.44%          116.0K        0.42%          19  [random_r.c:357 -> random_r.c:394]                libc-2.27.so
            0.25%          182.5K        0.02%           1  [random_r.c:388 -> random_r.c:391]                libc-2.27.so
            0.00%              48        1.07%          48  [x86_pmu_enable+284 -> x86_pmu_enable+298]        [kernel.kallsyms]
            0.00%              74        1.64%          74  [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]             [kernel.kallsyms]
            0.00%              73        1.62%          73  [vm_mmap+0 -> vm_mmap+48]                         [kernel.kallsyms]
            0.00%              63        0.69%          31  [up_write+0 -> up_write+34]                       [kernel.kallsyms]
            0.00%              13        0.29%          13  [setup_arg_pages+396 -> setup_arg_pages+413]      [kernel.kallsyms]
            0.00%               3        0.07%           3  [setup_arg_pages+418 -> setup_arg_pages+450]      [kernel.kallsyms]
            0.00%             616        6.84%         308  [security_mmap_file+0 -> security_mmap_file+72]   [kernel.kallsyms]
            0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
            0.00%               4        0.02%           1  [sched_clock+0 -> sched_clock+4]                  [kernel.kallsyms]
            0.00%               4        0.02%           1  [sched_clock+9 -> sched_clock+12]                 [kernel.kallsyms]
            0.00%               1        0.02%           1  [rcu_nmi_exit+0 -> rcu_nmi_exit+9]                [kernel.kallsyms]
Committer testing:
This should provide material for hours of endless joy, both from looking
for suspicious things in the implementation of this patch, such as the
top one:
# Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                                                   Shared Object
            2.17%            1.7M        0.08%         607  [compiler.h:199 -> common.c:221]                                        [kernel.vmlinux]
As well as from things that look legit:
# Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                                                   Shared Object
            0.16%          123.0K        0.60%        4.7K  [nospec-branch.h:265 -> nospec-branch.h:278]                            [kernel.vmlinux]
:-)
A very short, system-wide, taken-branches session:
# perf record -h -b
 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -b, --branch-any      sample any taken branches
#
# perf record -b
^C[ perf record: Woken up 595 times to write data ]
[ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
#
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
#
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                                                   Shared Object
# ...............  ..............  ...........  ..........  ......................................................................  ....................
#
            2.17%            1.7M        0.08%         607  [compiler.h:199 -> common.c:221]                                        [kernel.vmlinux]
            1.75%            1.3M        8.34%       65.5K  [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]    libc-2.29.so
            0.72%          544.5K        0.03%         230  [entry_64.S:657 -> entry_64.S:662]                                      [kernel.vmlinux]
            0.56%          541.8K        0.09%         672  [compiler.h:199 -> common.c:300]                                        [kernel.vmlinux]
            0.39%          293.2K        0.01%         104  [list_debug.c:43 -> list_debug.c:61]                                    [kernel.vmlinux]
            0.36%          278.6K        0.03%         272  [entry_64.S:1289 -> entry_64.S:1308]                                    [kernel.vmlinux]
            0.30%          260.8K        0.07%         564  [clear_page_64.S:47 -> clear_page_64.S:50]                              [kernel.vmlinux]
            0.28%          215.3K        0.05%         369  [traps.c:623 -> traps.c:628]                                            [kernel.vmlinux]
            0.23%          178.1K        0.04%         278  [entry_64.S:271 -> entry_64.S:275]                                      [kernel.vmlinux]
            0.20%          152.6K        0.09%         706  [paravirt.c:177 -> paravirt.c:179]                                      [kernel.vmlinux]
            0.20%          155.8K        0.05%         373  [entry_64.S:153 -> entry_64.S:175]                                      [kernel.vmlinux]
            0.18%          136.6K        0.03%         222  [msr.h:105 -> msr.h:166]                                                [kernel.vmlinux]
            0.16%          123.0K        0.60%        4.7K  [nospec-branch.h:265 -> nospec-branch.h:278]                            [kernel.vmlinux]
            0.16%          118.3K        0.01%          44  [entry_64.S:632 -> entry_64.S:657]                                      [kernel.vmlinux]
            0.14%          104.5K        0.00%          28  [rwsem.c:1541 -> rwsem.c:1544]                                          [kernel.vmlinux]
            0.13%           99.2K        0.01%          53  [spinlock.c:150 -> spinlock.c:152]                                      [kernel.vmlinux]
            0.13%           95.5K        0.00%          35  [swap.c:456 -> swap.c:471]                                              [kernel.vmlinux]
            0.12%           96.2K        0.05%         407  [copy_user_64.S:175 -> copy_user_64.S:209]                              [kernel.vmlinux]
            0.11%           85.9K        0.00%          31  [swap.c:400 -> page-flags.h:188]                                        [kernel.vmlinux]
            0.10%           73.0K        0.01%          52  [paravirt.h:763 -> list.h:131]                                          [kernel.vmlinux]
            0.07%           56.2K        0.03%         214  [filemap.c:1524 -> filemap.c:1557]                                      [kernel.vmlinux]
            0.07%           54.2K        0.02%         145  [memory.c:1032 -> memory.c:1049]                                        [kernel.vmlinux]
            0.07%           50.3K        0.00%          39  [mmzone.c:49 -> mmzone.c:69]                                            [kernel.vmlinux]
            0.06%           48.3K        0.01%          40  [paravirt.h:768 -> page_alloc.c:3304]                                   [kernel.vmlinux]
            0.06%           46.7K        0.02%         155  [memory.c:1032 -> memory.c:1056]                                        [kernel.vmlinux]
            0.06%           46.9K        0.01%         103  [swap.c:867 -> swap.c:902]                                              [kernel.vmlinux]
            0.06%           47.8K        0.00%          34  [entry_64.S:1201 -> entry_64.S:1202]                                    [kernel.vmlinux]
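The `perf evlist -v` line in the session above lists the event attributes that `perf record -b` requested. The sketch below is a rough illustration of how those fields map onto the kernel ABI. It fills a `struct perf_event_attr` from `<linux/perf_event.h>` with the same values. The helper name `fill_branch_any_attr()` is hypothetical, and this is not how perf builds its attributes. The sketch makes these assumptions:

- The `cycles` event is `PERF_TYPE_HARDWARE`/`PERF_COUNT_HW_CPU_CYCLES`.
- The `ksymbol` and `bpf_event` flags are omitted to avoid depending on newer headers.
- `size` comes from the local header, not the 112 bytes printed above.

It only fills the structure and never calls perf_event_open(2).

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper, not part of perf: mirrors the 'perf evlist -v'
 * fields shown above for 'perf record -b'. It only fills the struct. */
static void fill_branch_any_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);              /* local header size, not 112 */
	attr->type = PERF_TYPE_HARDWARE;         /* assumed for 'cycles' */
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->freq = 1;                          /* freq: 1 */
	attr->sample_freq = 4000;                /* sample_freq: 4000 */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
			    PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD |
			    PERF_SAMPLE_BRANCH_STACK;
	attr->read_format = PERF_FORMAT_ID;      /* read_format: ID */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_ANY; /* branch_sample_type: ANY */
	attr->disabled = 1;
	attr->inherit = 1;
	attr->mmap = 1;
	attr->comm = 1;
	attr->task = 1;
	attr->precise_ip = 3;
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;
	attr->mmap2 = 1;
	attr->comm_exec = 1;
	/* ksymbol and bpf_event are left at 0 in this sketch. */
}
```

Passing such an attribute to perf_event_open(2) asks the kernel to attach a branch stack (PERF_SAMPLE_BRANCH_STACK) to every sample. This matches the `-b` option used in both sessions in this message.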
-----------------------------------------------------------
v7:
---
Use use_browser in report__browse_block_hists to support
stdio and, potentially, TUI mode.
v6:
---
Create report__browse_block_hists in block-info.c (the code is
moved from builtin-report.c). It is called from
perf_evlist__tty_browse_hists.
v5:
---
1. Move all block functions to block-info.c
2. Move the code that sets ms in the block hist_entry to
another patch.
v4:
---
1. Use the new option '--total-cycles' to replace
'-s total_cycles' from v3.
2. Move block info collection out of block info
printing.
v3:
---
1. Use the common function block_info__process_sym to
process the blocks per symbol.
2. Remove the nasty hack for skipping the calculation
of column length.
3. Some minor cleanup.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>