Commit Graph

4 Commits

Author SHA1 Message Date
Sandipan Das
8d40f74ebf perf vendor events amd: Fix large metrics
There are cases where a metric requires more events than the number of
available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
data fabric counters but the "nps1_die_to_dram" metric has eight events.

By default, the constituent events are placed in a group and since the
events cannot be scheduled at the same time, the metric is not computed.
The "all metrics" test also fails because of this.

Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
the user to run perf with "--metric-no-group".

E.g.

  $ sudo perf test -v 101

Before:

  101: perf all metrics test                                           :
  --- start ---
  test child forked, pid 37131
  Testing branch_misprediction_ratio
  Testing all_remote_links_outbound
  Testing nps1_die_to_dram
  Metric 'nps1_die_to_dram' not printed in:
  Error:
  Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
  Testing macro_ops_dispatched
  Testing all_l2_cache_accesses
  Testing all_l2_cache_hits
  Testing all_l2_cache_misses
  Testing ic_fetch_miss_ratio
  Testing l2_cache_accesses_from_l2_hwpf
  Testing l2_cache_misses_from_l2_hwpf
  Testing op_cache_fetch_miss_ratio
  Testing l3_read_miss_latency
  Testing l1_itlb_misses
  test child finished with -1
  ---- end ----
  perf all metrics test: FAILED!

After:

  101: perf all metrics test                                           :
  --- start ---
  test child forked, pid 43766
  Testing branch_misprediction_ratio
  Testing all_remote_links_outbound
  Testing nps1_die_to_dram
  Testing macro_ops_dispatched
  Testing all_l2_cache_accesses
  Testing all_l2_cache_hits
  Testing all_l2_cache_misses
  Testing ic_fetch_miss_ratio
  Testing l2_cache_accesses_from_l2_hwpf
  Testing l2_cache_misses_from_l2_hwpf
  Testing op_cache_fetch_miss_ratio
  Testing l3_read_miss_latency
  Testing l1_itlb_misses
  test child finished with 0
  ---- end ----
  perf all metrics test: Ok

Reported-by: Ayush Jain <ayush.jain3@amd.com>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-07-11 11:50:39 -03:00
Smita Koralahalli
e5f2b4e1b8 perf vendor events amd: Use 0x%02x format for event code and umask
Use 0x%02x format for all event codes and umasks as this helps in tracking
changes of automatically generated event tables.

Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-4-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:38 -03:00
Smita Koralahalli
86c2bc3da7 perf vendor events amd: Fix broken L2 Cache Hits from L2 HWPF metric
Commit 08ed77e414 ("perf vendor events amd: Add recommended events")
added the hits event "L2 Cache Hits from L2 HWPF" with the same metric
expression as the accesses event "L2 Cache Accesses from L2 HWPF":

$ perf list --details
...
  l2_cache_accesses_from_l2_hwpf
     [L2 Cache Accesses from L2 HWPF]
     [l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
  l2_cache_hits_from_l2_hwpf
     [L2 Cache Hits from L2 HWPF]
     [l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
...

This was wrong and led to counting hits the same as accesses. Section
2.1.15.2 "Performance Measurement" of "PPR for AMD Family 17h Model 31h
B0 - 55803 Rev 0.54 - Sep 12, 2019", documents the hits event with
EventCode 0x70 which is the same as l2_pf_hit_l2.

Fix this, and massage the description for l2_pf_hit_l2 as the hits event
is now the duplicate of l2_pf_hit_l2. AMD recommends using the recommended
event over other events if the duplicate exists and maintain both for
consistency. Hence, l2_cache_hits_from_l2_hwpf should override
l2_pf_hit_l2.

Before:

 # perf stat -M l2_cache_accesses_from_l2_hwpf,l2_cache_hits_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             1,436      l2_pf_miss_l2_l3          # 11114.00 l2_cache_accesses_from_l2_hwpf
                                                  # 11114.00 l2_cache_hits_from_l2_hwpf
             4,482      l2_pf_hit_l2
             5,196      l2_pf_miss_l2_hit_l3

       1.001765339 seconds time elapsed

After:

 # perf stat -M l2_cache_accesses_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             1,477      l2_pf_miss_l2_l3          # 10442.00 l2_cache_accesses_from_l2_hwpf
             3,978      l2_pf_hit_l2
             4,987      l2_pf_miss_l2_hit_l3

       1.001491186 seconds time elapsed

 # perf stat -e l2_cache_hits_from_l2_hwpf sleep 1

 Performance counter stats for 'sleep 1':

             3,983      l2_cache_hits_from_l2_hwpf

       1.001329970 seconds time elapsed

Note the difference in performance counter values for the accesses
versus the hits after the fix, and the hits event now counting the same
as l2_pf_hit_l2.

Fixes: 08ed77e414 ("perf vendor events amd: Add recommended events")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> # On a 3900X
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-08 14:24:38 -03:00
Kim Phillips
08ed77e414 perf vendor events amd: Add recommended events
Add support for events listed in Section 2.1.15.2 "Performance
Measurement" of "PPR for AMD Family 17h Model 31h B0 - 55803
Rev 0.54 - Sep 12, 2019".

perf now supports these new events (-e):

  all_dc_accesses
  all_tlbs_flushed
  l1_dtlb_misses
  l2_cache_accesses_from_dc_misses
  l2_cache_accesses_from_ic_misses
  l2_cache_hits_from_dc_misses
  l2_cache_hits_from_ic_misses
  l2_cache_misses_from_dc_misses
  l2_cache_misses_from_ic_miss
  l2_dtlb_misses
  l2_itlb_misses
  sse_avx_stalls
  uops_dispatched
  uops_retired
  l3_accesses
  l3_misses

and these metrics (-M):

  branch_misprediction_ratio
  all_l2_cache_accesses
  all_l2_cache_hits
  all_l2_cache_misses
  ic_fetch_miss_ratio
  l2_cache_accesses_from_l2_hwpf
  l2_cache_hits_from_l2_hwpf
  l2_cache_misses_from_l2_hwpf
  l3_read_miss_latency
  l1_itlb_misses
  all_remote_links_outbound
  nps1_die_to_dram

The nps1_die_to_dram event may need perf stat's --metric-no-group
switch if the number of available data fabric counters is less
than the number it uses (8).

Committer testing:

On a AMD Ryzen 3900x system:

Before:

  # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$"
  #

After:

  # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$" | grep -v "^recommended:$"
  all_dc_accesses
       [All L1 Data Cache Accesses]
  all_tlbs_flushed
       [All TLBs Flushed]
  l1_dtlb_misses
       [L1 DTLB Misses]
  l2_cache_accesses_from_dc_misses
       [L2 Cache Accesses from L1 Data Cache Misses (including prefetch)]
  l2_cache_accesses_from_ic_misses
       [L2 Cache Accesses from L1 Instruction Cache Misses (including
        prefetch)]
  l2_cache_hits_from_dc_misses
       [L2 Cache Hits from L1 Data Cache Misses]
  l2_cache_hits_from_ic_misses
       [L2 Cache Hits from L1 Instruction Cache Misses]
  l2_cache_misses_from_dc_misses
       [L2 Cache Misses from L1 Data Cache Misses]
  l2_cache_misses_from_ic_miss
       [L2 Cache Misses from L1 Instruction Cache Misses]
  l2_dtlb_misses
       [L2 DTLB Misses & Data page walks]
  l2_itlb_misses
       [L2 ITLB Misses & Instruction page walks]
  sse_avx_stalls
       [Mixed SSE/AVX Stalls]
  uops_dispatched
       [Micro-ops Dispatched]
  uops_retired
       [Micro-ops Retired]
  l3_accesses
       [L3 Accesses. Unit: amd_l3]
  l3_misses
       [L3 Misses (includes Chg2X). Unit: amd_l3]
  #

  # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses,l2_cache_hits_from_dc_misses,l2_cache_hits_from_ic_misses,l2_cache_misses_from_dc_misses,l2_cache_misses_from_ic_miss,l2_dtlb_misses,l2_itlb_misses,sse_avx_stalls,uops_dispatched,uops_retired,l3_accesses,l3_misses sleep 2

   Performance counter stats for 'system wide':

       433,439,949      all_dc_accesses                                               (35.66%)
               443      all_tlbs_flushed                                              (35.66%)
         2,985,885      l1_dtlb_misses                                                (35.66%)
        18,318,019      l2_cache_accesses_from_dc_misses                                     (35.68%)
        50,114,810      l2_cache_accesses_from_ic_misses                                     (35.72%)
        12,423,978      l2_cache_hits_from_dc_misses                                     (35.74%)
        40,703,103      l2_cache_hits_from_ic_misses                                     (35.74%)
         6,698,673      l2_cache_misses_from_dc_misses                                     (35.74%)
        12,090,892      l2_cache_misses_from_ic_miss                                     (35.74%)
           614,267      l2_dtlb_misses                                                (35.74%)
           216,036      l2_itlb_misses                                                (35.74%)
            11,977      sse_avx_stalls                                                (35.74%)
       999,276,223      uops_dispatched                                               (35.73%)
     1,075,311,620      uops_retired                                                  (35.69%)
         1,420,763      l3_accesses
           540,164      l3_misses

       2.002344121 seconds time elapsed

  # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses sleep 2

   Performance counter stats for 'system wide':

       175,943,104      all_dc_accesses
               310      all_tlbs_flushed
         2,280,359      l1_dtlb_misses
        11,700,151      l2_cache_accesses_from_dc_misses
        25,414,963      l2_cache_accesses_from_ic_misses

       2.001957818 seconds time elapsed

  #

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Link: http://lore.kernel.org/lkml/20200901220944.277505-3-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04 16:32:22 -03:00