Commit Graph

1516 Commits

Author SHA1 Message Date
Jean Delvare
2e71e8bc6f perf/x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init
Fix the following warning:
  CC [M]  arch/x86/events/amd/uncore.o
arch/x86/events/amd/uncore.c: In function ‘amd_uncore_umc_ctx_init’:
arch/x86/events/amd/uncore.c:951:52: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 8 [-Wformat-truncation=]
    snprintf(pmu->name, sizeof(pmu->name), "amd_umc_%d", index);
                                                    ^~
arch/x86/events/amd/uncore.c:951:43: note: directive argument in the range [0, 2147483647]
    snprintf(pmu->name, sizeof(pmu->name), "amd_umc_%d", index);
                                           ^~~~~~~~~~~~
arch/x86/events/amd/uncore.c:951:4: note: ‘snprintf’ output between 10 and 19 bytes into a destination of size 16
    snprintf(pmu->name, sizeof(pmu->name), "amd_umc_%d", index);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As far as I can see, there can't be more than UNCORE_GROUP_MAX (256)
groups and each group can't have more than 255 PMU, so the number
printed by this %d can't exceed 65279, that's only 5 digits and would
fit into the buffer. So it's a false positive warning. But we can
make the compiler happy by declaring index as a 16-bit number.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20241105095253.18f34b4d@endymion.delvare
2024-11-11 11:49:47 +01:00
Adrian Hunter
0d5eb14c1e perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
Events with aux actions or aux sampling expect the PMI to coincide with the
event, which does not happen for large PEBS, so do not enable large PEBS in
that case.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20241022155920.17511-5-adrian.hunter@intel.com
2024-11-05 12:55:44 +01:00
Adrian Hunter
08c7454ceb perf/x86/intel/pt: Add support for pause / resume
Prevent tracing to start if aux_paused.

Implement support for PERF_EF_PAUSE / PERF_EF_RESUME. When aux_paused, stop
tracing. When not aux_paused, only start tracing if it isn't currently
meant to be stopped.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20241022155920.17511-4-adrian.hunter@intel.com
2024-11-05 12:55:44 +01:00
Adrian Hunter
5b590160d2 perf/x86/intel/pt: Fix buffer full but size is 0 case
If the trace data buffer becomes full, a truncated flag [T] is reported
in PERF_RECORD_AUX.  In some cases, the size reported is 0, even though
data must have been added to make the buffer full.

That happens when the buffer fills up from empty to full before the
Intel PT driver has updated the buffer position.  Then the driver
calculates the new buffer position before calculating the data size.
If the old and new positions are the same, the data size is reported
as 0, even though it is really the whole buffer size.

Fix by detecting when the buffer position is wrapped, and adjust the
data size calculation accordingly.

Example

  Use a very small buffer size (8K) and observe the size of truncated [T]
  data. Before the fix, it is possible to see records of 0 size.

  Before:

    $ perf record -m,8K -e intel_pt// uname
    Linux
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.105 MB perf.data ]
    $ perf script -D --no-itrace | grep AUX | grep -F '[T]'
    Warning:
    AUX data lost 2 times out of 3!

    5 19462712368111 0x19710 [0x40]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T]
    5 19462712700046 0x19ba8 [0x40]: PERF_RECORD_AUX offset: 0x170 size: 0xe90 flags: 0x1 [T]

 After:

    $ perf record -m,8K -e intel_pt// uname
    Linux
    [ perf record: Woken up 3 times to write data ]
    [ perf record: Captured and wrote 0.040 MB perf.data ]
    $ perf script -D --no-itrace | grep AUX | grep -F '[T]'
    Warning:
    AUX data lost 2 times out of 3!

    1 113720802995 0x4948 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x2000 flags: 0x1 [T]
    1 113720979812 0x6b10 [0x40]: PERF_RECORD_AUX offset: 0x2000 size: 0x2000 flags: 0x1 [T]

Fixes: 52ca9ced3f ("perf/x86/intel/pt: Add Intel PT PMU driver")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20241022155920.17511-2-adrian.hunter@intel.com
2024-11-05 12:55:43 +01:00
Kan Liang
9e9af8bbb5 perf/x86/rapl: Clean up cpumask and hotplug
The rapl pmu is die scope, which is supported by the generic perf_event
subsystem now.

Set the scope for the rapl PMU and remove all the cpumask and hotplug
codes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241010142604.770192-2-kan.liang@linux.intel.com
2024-10-30 22:42:19 +01:00
Kan Liang
9b99d65c0b perf/x86/rapl: Move the pmu allocation out of CPU hotplug
There are extra codes in the CPU hotplug function to allocate rapl pmus.
The generic PMU hotplug support is hard to be applied.

As long as the rapl pmus can be allocated upfront for each die/socket,
the code doesn't need to be implemented in the CPU hotplug function.
Move the code to the init_rapl_pmus(), and allocate a PMU for each
possible die/socket.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Link: https://lore.kernel.org/r/20241010142604.770192-1-kan.liang@linux.intel.com
2024-10-30 22:42:18 +01:00
Breno Leitao
de20037e1b perf/x86/amd: Warn only on new bits set
Warning at every leaking bits can cause a flood of message, triggering
various stall-warning mechanisms to fire, including CSD locks, which
makes the machine to be unusable.

Track the bits that are being leaked, and only warn when a new bit is
set.

That said, this patch will help with the following issues:

1) It will tell us which bits are being set, so, it is easy to
   communicate it back to vendor, and to do a root-cause analyzes.

2) It avoid the machine to be unusable, because, worst case
   scenario, the user gets less than 60 WARNs (one per unhandled bit).

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
2024-10-07 09:28:46 +02:00
Dapeng Mi
d3fe6f0a43 perf/x86/intel: Add PMU support for ArrowLake-H
ArrowLake-H contains 3 different uarchs, LionCove, Skymont and Crestmont.
It is different with previous hybrid processors which only contains two
kinds of uarchs.

This patch adds PMU support for ArrowLake-H processor, adds ARL-H
specific events which supports the 3 kinds of uarchs, such as
td_retiring_arl_h, and extends some existed format attributes like
offcore_rsp to make them be available to support ARL-H as well. Althrough
these format attributes like offcore_rsp have been extended to support
ARL-H, they can still support the regular hybrid platforms with 2 kinds
of uarchs since the helper hybrid_format_is_visible() would filter PMU
types and only show the format attribute for available PMUs.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-5-dapeng1.mi@linux.intel.com
2024-10-07 09:28:43 +02:00
Dapeng Mi
9f4a39757c perf/x86/intel: Support hybrid PMU with multiple atom uarchs
The upcoming ARL-H hybrid processor contains 2 different atom uarchs
which have different PMU capabilities. To distinguish these atom uarchs,
CPUID.1AH.EAX[23:0] defines a native model ID which can be used to
uniquely identify the uarch of the core by combining with core type.

Thus a 3rd hybrid pmu type "hybrid_tiny" is defined to mark the 2nd
atom uarch. The helper find_hybrid_pmu_for_cpu() would compare the
hybrid pmu type and dynamically read core native id from cpu to identify
the corresponding hybrid pmu structure.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-4-dapeng1.mi@linux.intel.com
2024-10-07 09:28:43 +02:00
Dapeng Mi
79390db9eb perf/x86: Refine hybrid_pmu_type defination
Use macros instead of magic number to define hybrid_pmu_type and remove
X86_HYBRID_NUM_PMUS since it's never used.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-2-dapeng1.mi@linux.intel.com
2024-10-07 09:28:43 +02:00
Linus Torvalds
9f0c253ddd Performance events changes for v6.12:
- Implement per-PMU context rescheduling to significantly improve single-PMU
    performance, and related cleanups/fixes. (by Peter Zijlstra and Namhyung Kim)
 
  - Fix ancient bug resulting in a lot of events being dropped erroneously
    at higher sampling frequencies. (by Luo Gengkun)
 
  - uprobes enhancements:
 
      - Implement RCU-protected hot path optimizations for better performance:
 
          "For baseline vs SRCU, peak througput increased from 3.7 M/s (million uprobe
           triggerings per second) up to about 8 M/s. For uretprobes it's a bit more
           modest with bump from 2.4 M/s to 5 M/s.
 
           For SRCU vs RCU Tasks Trace, peak throughput for uprobes increases further from
           8 M/s to 10.3 M/s (+28%!), and for uretprobes from 5.3 M/s to 5.8 M/s (+11%),
           as we have more work to do on uretprobes side.
 
           Even single-thread (no contention) performance is slightly better: 3.276 M/s to
           3.396 M/s (+3.5%) for uprobes, and 2.055 M/s to 2.174 M/s (+5.8%)
           for uretprobes."
 
           (by Andrii Nakryiko et al)
 
      - Document mmap_lock, don't abuse get_user_pages_remote(). (by Oleg Nesterov)
 
      - Cleanups & fixes to prepare for future work:
 
         - Remove uprobe_register_refctr()
 	- Simplify error handling for alloc_uprobe()
         - Make uprobe_register() return struct uprobe *
         - Fold __uprobe_unregister() into uprobe_unregister()
         - Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
         - BPF: Fix use-after-free in bpf_uprobe_multi_link_attach()
 
           (by Oleg Nesterov)
 
  - New feature & ABI extension: allow events to use PERF_SAMPLE READ with
    inheritance, enabling sample based profiling of a group of counters over
    a hierarchy of processes or threads.  (by Ben Gainey)
 
  - Intel uncore & power events updates:
 
       - Add Arrow Lake and Lunar Lake support
       - Add PERF_EV_CAP_READ_SCOPE
       - Clean up and enhance cpumask and hotplug support
 
         (by Kan Liang)
 
       - Add LNL uncore iMC freerunning support
       - Use D0:F0 as a default device
 
         (by Zhenyu Wang)
 
  - Intel PT: fix AUX snapshot handling race. (by Adrian Hunter)
 
  - Misc fixes and cleanups. (by James Clark, Jiri Olsa, Oleg Nesterov and Peter Zijlstra)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmbqxEwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iusw/43UAcAZVof6Qs+j6bVAxSabF66fFfE9Wh
 jc+F4yZ2MGl9x6a1f392+CPcTdVsYp6G2QtRGMipD+trmi/lhDhmRrhxxD1KWIwP
 zVGSBx9CSFl0UpCXdGiVrGzT5xpIpJ4qqW2XUVr32n8SxTT5X/vM5ySm6KUXsIrD
 2/KXwucT9a7grkl3pvy/A/FUHxaF7oAMJjcIPSvLBveQjQSHUrZoCZdHsRGT9rjS
 HjzxG6gDy97172z5XV1ej3HJOfFlFTQ1RcoxNqdLfiZ6n3hD4hfmtsXWB5zTzRjT
 xHaCOmWLhEp5v+fK2+RCFiWUbDBsmW/mecZdrjGb3C1RIDWQhLCXXc95XtrobTvk
 BkW9QEC/XRB+vU6Ssdv3ugN7yRWxih0BsLU5sy4nlzmwoYt9qOy8fgjRvSBKHr5K
 Mu1RIFu+KXq++sa7+ZJjUMY70PHQCp2m4AHprG/Y98t93CQMhDXzGVpPzWyQuW/V
 lqYFjd/CAoCIVGF4Jxq7sqOdZ1emDN+P0WSnnFWssJ0ZJFvxN9ZDPH2AaMk4lwo7
 NFW6u3+0Vx9P0m/H6xRQj00Iye2JLMqJNCIA8QtjnB7L6upgVvcIPjgcG58fpV1o
 xfJekOR1A7T2aQUDlX5t9Cu36ZUImDRmwHj2m1p84s5AANlbD7/fOmffR1Hn9uFj
 wCTqSpi8Hg==
 =E3s3
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:

 - Implement per-PMU context rescheduling to significantly improve
   single-PMU performance, and related cleanups/fixes (Peter Zijlstra
   and Namhyung Kim)

 - Fix ancient bug resulting in a lot of events being dropped
   erroneously at higher sampling frequencies (Luo Gengkun)

 - uprobes enhancements:

     - Implement RCU-protected hot path optimizations for better
       performance:

         "For baseline vs SRCU, peak througput increased from 3.7 M/s
          (million uprobe triggerings per second) up to about 8 M/s. For
          uretprobes it's a bit more modest with bump from 2.4 M/s to
          5 M/s.

          For SRCU vs RCU Tasks Trace, peak throughput for uprobes
          increases further from 8 M/s to 10.3 M/s (+28%!), and for
          uretprobes from 5.3 M/s to 5.8 M/s (+11%), as we have more
          work to do on uretprobes side.

          Even single-thread (no contention) performance is slightly
          better: 3.276 M/s to 3.396 M/s (+3.5%) for uprobes, and 2.055
          M/s to 2.174 M/s (+5.8%) for uretprobes."

          (Andrii Nakryiko et al)

     - Document mmap_lock, don't abuse get_user_pages_remote() (Oleg
       Nesterov)

     - Cleanups & fixes to prepare for future work:
        - Remove uprobe_register_refctr()
	- Simplify error handling for alloc_uprobe()
        - Make uprobe_register() return struct uprobe *
        - Fold __uprobe_unregister() into uprobe_unregister()
        - Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
        - BPF: Fix use-after-free in bpf_uprobe_multi_link_attach()
          (Oleg Nesterov)

 - New feature & ABI extension: allow events to use PERF_SAMPLE READ
   with inheritance, enabling sample based profiling of a group of
   counters over a hierarchy of processes or threads (Ben Gainey)

 - Intel uncore & power events updates:

      - Add Arrow Lake and Lunar Lake support
      - Add PERF_EV_CAP_READ_SCOPE
      - Clean up and enhance cpumask and hotplug support
        (Kan Liang)

      - Add LNL uncore iMC freerunning support
      - Use D0:F0 as a default device
        (Zhenyu Wang)

 - Intel PT: fix AUX snapshot handling race (Adrian Hunter)

 - Misc fixes and cleanups (James Clark, Jiri Olsa, Oleg Nesterov and
   Peter Zijlstra)

* tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  iommu/vt-d: Clean up cpumask and hotplug for perfmon
  perf/x86/intel/cstate: Clean up cpumask and hotplug
  perf: Add PERF_EV_CAP_READ_SCOPE
  perf: Generic hotplug support for a PMU with a scope
  uprobes: perform lockless SRCU-protected uprobes_tree lookup
  rbtree: provide rb_find_rcu() / rb_find_add_rcu()
  perf/uprobe: split uprobe_unregister()
  uprobes: travers uprobe's consumer list locklessly under SRCU protection
  uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
  uprobes: protected uprobe lifetime with SRCU
  uprobes: revamp uprobe refcounting and lifetime management
  bpf: Fix use-after-free in bpf_uprobe_multi_link_attach()
  perf/core: Fix small negative period being ignored
  perf: Really fix event_function_call() locking
  perf: Optimize __pmu_ctx_sched_out()
  perf: Add context time freeze
  perf: Fix event_function_call() locking
  perf: Extract a few helpers
  perf: Optimize context reschedule for single PMU cases
  ...
2024-09-18 15:03:58 +02:00
Ingo Molnar
5e645f3113 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-10 12:28:05 +02:00
Kan Liang
ef493f4b12 perf/x86/intel: Allow to setup LBR for counting event for BPF
The BPF subsystem may capture LBR data on a counting event. However, the
current implementation assumes that LBR can/should only be used with
sampling events.

For instance, retsnoop tool ([0]) makes an extensive use of this
functionality and sets up perf event as follows:

	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL;

To limit the LBR for a sampling event is to avoid unnecessary branch
stack setup for a counting event in the sample read. Because LBR is only
read in the sampling event's overflow.

Although in most cases LBR is used in sampling, there is no HW limit to
bind LBR to the sampling mode. Allow an LBR setup for a counting event
unless in the sample read mode.

Fixes: 85846b2707 ("perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag")
Closes: https://lore.kernel.org/lkml/20240905180055.1221620-1-andrii@kernel.org/
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240909155848.326640-1-kan.liang@linux.intel.com
2024-09-10 12:02:23 +02:00
Kan Liang
08155c7f2a perf/x86/intel/cstate: Clean up cpumask and hotplug
There are three cstate PMUs with different scopes, core, die and module.
The scopes are supported by the generic perf_event subsystem now.

Set the scope for each PMU and remove all the cpumask and hotplug codes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-4-kan.liang@linux.intel.com
2024-09-10 11:44:13 +02:00
Dhananjay Ugwekar
8d72eba1cf perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
After commit:

  63edbaa48a ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")

... on AMD processors that support extended CPUID leaf 0x80000026, the
topology_die_cpumask() and topology_logical_die_id() macros no longer
return the package cpumask and package ID, instead they return the CCD
(Core Complex Die) mask and ID respectively.

This leads to the energy-pkg event scope to be modified to CCD instead of package.

So, change the PMU scope for AMD and Hygon back to package.

On a 12 CCD 1 Package AMD Zen4 Genoa machine:

  Before:

  $ cat /sys/devices/power/cpumask
  0,8,16,24,32,40,48,56,64,72,80,88.

The expected cpumask here is supposed to be just "0", as it is a package
scope event, only one CPU will be collecting the event for all the CPUs in
the package.

  After:

  $ cat /sys/devices/power/cpumask
  0

[ mingo: Cleaned up the changelog ]

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240904100934.3260-1-Dhananjay.Ugwekar@amd.com
2024-09-05 12:02:14 +02:00
Ingo Molnar
95c13662b6 Merge branch 'perf/urgent' into perf/core, to pick up fixes
This also refreshes the -rc1 based branch to -rc5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-05 11:17:43 +02:00
Kan Liang
25dfc9e357 perf/x86/intel: Limit the period on Haswell
Running the ltp test cve-2015-3290 concurrently reports the following
warnings.

perfevents: irq loop stuck!
  WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
  intel_pmu_handle_irq+0x285/0x370
  Call Trace:
   <NMI>
   ? __warn+0xa4/0x220
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? report_bug+0x3e/0xa0
   ? handle_bug+0x3c/0x70
   ? exc_invalid_op+0x18/0x50
   ? asm_exc_invalid_op+0x1a/0x20
   ? irq_work_claim+0x1e/0x40
   ? intel_pmu_handle_irq+0x285/0x370
   perf_event_nmi_handler+0x3d/0x60
   nmi_handle+0x104/0x330

Thanks to Thomas Gleixner's analysis, the issue is caused by the low
initial period (1) of the frequency estimation algorithm, which triggers
the defects of the HW, specifically erratum HSW11 and HSW143. (For the
details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)

The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL
event, but the initial period in the freq mode is 1. The erratum is the
same as the BDM11, which has been supported in the kernel. A minimum
period of 128 is enforced as well on HSW.

HSW143 is regarding that the fixed counter 1 may overcount 32 with the
Hyper-Threading is enabled. However, based on the test, the hardware
has more issues than it tells. Besides the fixed counter 1, the message
'interrupt took too long' can be observed on any counter which was armed
with a period < 32 and two events expired in the same NMI. A minimum
period of 32 is enforced for the rest of the events.
The recommended workaround code of the HSW143 is not implemented.
Because it only addresses the issue for the fixed counter. It brings
extra overhead through extra MSR writing. No related overcounting issue
has been reported so far.

Fixes: 3a632cb229 ("perf/x86/intel: Add simple Haswell PMU support")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
2024-08-25 16:49:05 +02:00
James Clark
ea1992f36b perf/x86/intel/bts: Fix comment about default perf_event_paranoid setting
The default paranoid setting was updated in commit 0161028b7c
("perf/core: Change the default paranoia level to 2") so this comment is
no longer true.

Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240802105256.335961-1-james.clark@linaro.org
2024-08-05 16:54:47 +02:00
Zhenyu Wang
aaad0e2aa5 perf/x86/intel/uncore: Use D0:F0 as a default device
Some uncore PMON registers are located in the MMIO space of the Host
Bridge and DRAM Controller device, which is located at D0:F0 for
Tiger Lake and later client generation.

Use D0:F0 as a default device. So it doesn't need to keep adding the
complete Device ID list for each generation anymore.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240731141353.759643-5-kan.liang@linux.intel.com
2024-08-05 16:54:46 +02:00
Zhenyu Wang
9ac57c456f perf/x86/intel/uncore: Add LNL uncore iMC freerunning support
LNL uncore imc freerunning counters keep same as previous HW.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240731141353.759643-4-kan.liang@linux.intel.com
2024-08-05 16:54:46 +02:00
Kan Liang
9bd7dfe3a5 perf/x86/intel/uncore: Add Lunar Lake support
The uncore subsystem for Lunar Lake is similar to the previous
Meteor Lake. The uncore PerfMon registers are located at both
MSR and MMIO space.

The ARB and iMC are kept. There is no difference from the Meteor Lake.
Move the global control initialization to the first box of the CBOX.

The sNCU is moved to the MMIO space.

The HBO is newly added and only be accessed from the MMIO space.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731141353.759643-3-kan.liang@linux.intel.com
2024-08-05 16:54:46 +02:00
Kan Liang
efb0c9c0b9 perf/x86/intel/uncore: Factor out common MMIO init and ops functions
Some uncore PMON registers are located in the MMIO space. For the client
machine, the MMIO space is usually located at D0:F0 but in a different
BAR. For example, some uncore PMON registers are located in the SAF BAR,
not the MCHBAR in the Lunar Lake.

The current __uncore_imc_init_box() hard code the BAR information.
Factor out the uncore_get_box_mmio_addr() which uses the BAR information
as a parameter.
The only change is the error output message. The hardcode name 'MCHBAR'
is replaced by the offset of a BAR.

Add a new macro, MMIO_UNCORE_COMMON_OPS(), since the MMIO ops functions
are usually the same among different generations.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731141353.759643-2-kan.liang@linux.intel.com
2024-08-05 16:54:46 +02:00
Kan Liang
e0f49f15f6 perf/x86/intel/uncore: Add Arrow Lake support
>From the perspective of the uncore PMU, the Arrow Lake is the same as
the previous Meteor Lake. The only difference is the event list, which
will be supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731141353.759643-1-kan.liang@linux.intel.com
2024-08-05 16:54:45 +02:00
Andrii Nakryiko
cfa7f3d2c5 perf,x86: avoid missing caller address in stack traces captured in uprobe
When tracing user functions with uprobe functionality, it's common to
install the probe (e.g., a BPF program) at the first instruction of the
function. This is often going to be `push %rbp` instruction in function
preamble, which means that within that function frame pointer hasn't
been established yet. This leads to consistently missing an actual
caller of the traced function, because perf_callchain_user() only
records current IP (capturing traced function) and then following frame
pointer chain (which would be caller's frame, containing the address of
caller's caller).

So when we have target_1 -> target_2 -> target_3 call chain and we are
tracing an entry to target_3, captured stack trace will report
target_1 -> target_3 call chain, which is wrong and confusing.

This patch proposes a x86-64-specific heuristic to detect `push %rbp`
(`push %ebp` on 32-bit architecture) instruction being traced. Given
entire kernel implementation of user space stack trace capturing works
under assumption that user space code was compiled with frame pointer
register (%rbp/%ebp) preservation, it seems pretty reasonable to use
this instruction as a strong indicator that this is the entry to the
function. In that case, return address is still pointed to by %rsp/%esp,
so we fetch it and add to stack trace before proceeding to unwind the
rest using frame pointer-based logic.

We also check for `endbr64` (for 64-bit modes) as another common pattern
for function entry, as suggested by Josh Poimboeuf. Even if we get this
wrong sometimes for uprobes attached not at the function entry, it's OK
because stack trace will still be overall meaningful, just with one
extra bogus entry. If we don't detect this, we end up with guaranteed to
be missing caller function entry in the stack trace, which is worse
overall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240729175223.23914-1-andrii@kernel.org
2024-08-02 11:30:30 +02:00
Li Huafei
f73cefa3b7 perf/x86: Fix smp_processor_id()-in-preemptible warnings
The following bug was triggered on a system built with
CONFIG_DEBUG_PREEMPT=y:

 # echo p > /proc/sysrq-trigger

 BUG: using smp_processor_id() in preemptible [00000000] code: sh/117
 caller is perf_event_print_debug+0x1a/0x4c0
 CPU: 3 UID: 0 PID: 117 Comm: sh Not tainted 6.11.0-rc1 #109
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x60
  check_preemption_disabled+0xc8/0xd0
  perf_event_print_debug+0x1a/0x4c0
  __handle_sysrq+0x140/0x180
  write_sysrq_trigger+0x61/0x70
  proc_reg_write+0x4e/0x70
  vfs_write+0xd0/0x430
  ? handle_mm_fault+0xc8/0x240
  ksys_write+0x9c/0xd0
  do_syscall_64+0x96/0x190
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

This is because the commit d4b294bf84 ("perf/x86: Hybrid PMU support
for counters") took smp_processor_id() outside the irq critical section.
If a preemption occurs in perf_event_print_debug() and the task is
migrated to another cpu, we may get incorrect pmu debug information.
Move smp_processor_id() back inside the irq critical section to fix this
issue.

Fixes: d4b294bf84 ("perf/x86: Hybrid PMU support for counters")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240729220928.325449-1-lihuafei1@huawei.com
2024-07-31 12:57:39 +02:00
Peter Zijlstra
52c3fb1a0f perf/x86: Add hw_perf_event::aux_config
Start a new section for AUX PMUs in hw_perf_event.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2024-07-29 12:16:24 +02:00
Adrian Hunter
d92792a4b2 perf/x86/intel/pt: Fix sampling synchronization
pt_event_snapshot_aux() uses pt->handle_nmi to determine if tracing
needs to be stopped, however tracing can still be going because
pt->handle_nmi is set to zero before tracing is stopped in pt_event_stop,
whereas pt_event_snapshot_aux() requires that tracing must be stopped in
order to copy a sample of trace from the buffer.

Instead call pt_config_stop() always, which anyway checks config for
RTIT_CTL_TRACEEN and does nothing if it is already clear.

Note pt_event_snapshot_aux() can continue to use pt->handle_nmi to
determine if the trace needs to be restarted afterwards.

Fixes: 25e8920b30 ("perf/x86/intel/pt: Add sampling support")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240715160712.127117-2-adrian.hunter@intel.com
2024-07-29 12:16:24 +02:00
Zhenyu Wang
b1d0e15c87 perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
Package C2 residency counter is also available on Sierra Forest.
So add it support in srf_cstates.

Fixes: 3877d55a0d ("perf/x86/intel/cstate: Add Sierra Forest support")
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20240717031609.74513-1-zhenyuw@linux.intel.com
2024-07-29 12:16:22 +02:00
Linus Torvalds
576a997c63 Performance events changes for v6.11:
- Intel PT support enhancements & fixes
  - Fix leaked SIGTRAP events
  - Improve and fix the Intel uncore driver
  - Add support for Intel HBM and CXL uncore counters
  - Add Intel Lake and Arrow Lake support
  - AMD uncore driver fixes
  - Make SIGTRAP and __perf_pending_irq() work on RT
  - Micro-optimizations
  - Misc cleanups and fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmaWjncRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iZyg//TSafjCK4N9fyXrPdPqf8L7ntX5uYf0rd
 uVZpEo/+VGvuFhznHnZIV2DLetvuwYZcUWszCqQMYfokGGi6WI1/k4MeZkSpN5QE
 p5mFk6gW3cmpHT9bECg7mKQH+w7Qna/b6mnA0HYTFxPGmQKdQDl1/S+ZsgWedxpC
 4V3re7/FzenFVS45DwSMPi9s7uZzZhVhTSgb4XLy+0Da4S0iRULItBa8HT8HmqE5
 v5aQlw3mmwKPUWvyPMi3Sw6RRWK3C+n5ZxWswSYoLSM3dsp1ZD+YYqtOv2GqAx8v
 JoL0SOnGnNCfxGHh0kz5D2hztDvq61Enotih2gz7HxvdWh2DasNp4yS1USGQhu5h
 VJnKNA0TfOUaYqWFVj0EgRVhDX79lMwSHTkR1DZd4vM2GDigHeRPh0zGSn2w/koV
 oCRxFfBoktHBnX0Te1NE2BhojbuKp25vTGK6GriVcHt/RNpuz6hTxsjdJzHCAlVX
 M349l0EpUJafvfaIN9zF22uw22J8P9y9JYqI6ebkUIKiuoT9LuafVYhQupSE9H4u
 IqlozPCTNw6eAQcUo03gkl3n+SY/DZH6eU2ycKgEp3r7TDGYbJPwxY1BgOHbwi4U
 lySM07leso2accSVAz7GDMI3ejj6Sx64asWS1FSwbajDflouaIK2jtey+1IOdXfv
 hHY65tomV8U=
 =gguT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:

 - Intel PT support enhancements & fixes

 - Fix leaked SIGTRAP events

 - Improve and fix the Intel uncore driver

 - Add support for Intel HBM and CXL uncore counters

 - Add Intel Lake and Arrow Lake support

 - AMD uncore driver fixes

 - Make SIGTRAP and __perf_pending_irq() work on RT

 - Micro-optimizations

 - Misc cleanups and fixes

* tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  perf/x86/intel: Add a distinct name for Granite Rapids
  perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake
  perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated
  perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR
  perf: Split __perf_pending_irq() out of perf_pending_irq()
  perf: Don't disable preemption in perf_pending_task().
  perf: Move swevent_htable::recursion into task_struct.
  perf: Shrink the size of the recursion counter.
  perf: Enqueue SIGTRAP always via task_work.
  task_work: Add TWA_NMI_CURRENT as an additional notify mode.
  perf: Move irq_work_queue() where the event is prepared.
  perf: Fix event leak upon exec and file release
  perf: Fix event leak upon exit
  task_work: Introduce task_work_cancel() again
  task_work: s/task_work_cancel()/task_work_cancel_func()/
  perf/x86/amd/uncore: Fix DF and UMC domain identification
  perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable
  perf/x86/intel: Support Perfmon MSRs aliasing
  perf/x86/intel: Support PERFEVTSEL extension
  perf/x86: Add config_mask to represent EVENTSEL bitmask
  ...
2024-07-16 17:13:31 -07:00
Linus Torvalds
151647ab58 Locking changes for v6.11:
- Jump label fixes, including a perf events fix that originally
    manifested as jump label failures, but was a serialization bug
    at the usage site.
 
  - Mark down_write*() helpers as __always_inline, to improve
    WCHAN debuggability.
 
  - Misc cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmaVEV8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iWbQ/+KBa29udVx0qiX/1R/+4EBNKzFDyvhobu
 9XuAUZJZ9g46PFqnXlnA/VhGff52M2I+1scva11OPnb2NnOhAN1yUFQJaZh0HM+y
 GTsaLQEdJfNKv1Z0i05AnYE7BKFEnoksWInsOg6Or5KoML5RS6vIyLWsBQpCjelM
 ZxAYNHVpI5qeeUT7dqy9bkBxUXgOpc5X5+qZmmBnBDTGvhsPviV88gL1Ol2tjmGi
 hhALK9RdLMltrNEejDc9sBcLw/qRA5QOUDDxSoBykSOBaljJB+3BcTCuXdUJT24Y
 xpufbESHyl7HAmdvRU+n4ypo4kuKSIcyF1+V6LcMmNi0jSsyUG7x+SuhJetQHd2z
 kLsYD18JbHHBVz7/047DnVTtyTcifsDpJi4MqEYYrUhMw1ns/goEiLRgSEYdB/FT
 eCjdZxbCzkKzmV42/gqkM/mv8/pWkhvAqZ1/LNkHSOR/JSa5X3NeR0qrj18OKSrS
 Du+ycGDqk1PqVcqiC1heBayIU/QoY/sEkdktTtbAKSujKCJ+tYhPtqMki6JaOh5I
 VPp9ekcbs+Vvuu1qNDcueoe3TqxO2wfchKMNcSD8+I68EZtxHWZN3iHtTOlqGYVr
 uZuG0V2WSgGvUi7k0UIPpnnmZ2+XejA++VnjVitXM3Z+xf7MlrPNy/4fzUbkgTQn
 T8/Tt7SytQE=
 =T1KT
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Jump label fixes, including a perf events fix that originally
   manifested as jump label failures, but was a serialization bug at the
   usage site

 - Mark down_write*() helpers as __always_inline, to improve WCHAN
   debuggability

 - Misc cleanups and fixes

* tag 'locking-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Add __always_inline annotation to __down_write_common() and inlined callers
  jump_label: Simplify and clarify static_key_fast_inc_cpus_locked()
  jump_label: Clarify condition in static_key_fast_inc_not_disabled()
  jump_label: Fix concurrency issues in static_key_slow_dec()
  perf/x86: Serialize set_attr_rdpmc()
  cleanup: Standardize the header guard define's name
2024-07-16 16:42:37 -07:00
Linus Torvalds
d679783188 - Flip the logic to add feature names to /proc/cpuinfo to having to
explicitly specify the flag if there's a valid reason to show it in
   /proc/cpuinfo
 
 - Switch a bunch of Intel x86 model checking code to the new CPU model
   defines
 
 - Fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaVZ+EACgkQEsHwGGHe
 VUqTgA//aJez6C5SmuqIofqgimr+8JGNThf4vFB3O9tN0ony3IR8IRieF+sOZFXE
 WVyN7KOhPs2XvNzVAaJpzWUcg/E2bXzVrOKfx3uFiyNiBttKLVot7Hl640wqWGoG
 eTViTpQ6IALY7lEI6vFNXz+4Ja5PWmHxWdBkvP9ehSvqNxHivTWL4HQ11pcCWQEA
 i+V37PbOHsnH7ZprJtaV0ihtjFblk9/R4qoZuT3SObhG0QDJK4Q7yYUelxXMUUgD
 Yo3nXluQl6Vc5dD2ULYkTlhzMxoZUMURty897vYSsZz49ZXsS6fsvd+BheSQVOv1
 hzaqqFYijdIpPI1zwgAPM+e6S/EAafpNVcEkjhHGZIJehwXm3teoSlX5tK2NPGoe
 PLYrwPWAzagdS3dWvrvBYT3Bu7pygieDSyPFfVP2XQsElHsWhYvBtxeH/uUwm+v4
 xjtXaJUj9eznChPaDZhCl8ioh9szUKHsh2NJ5ND7qpxPCFpz1Xj9ZmbIYTjHEgjG
 IT8dFfykKdyh5htJWw/P8LbexpEMTmu/LDrDXt+tFsDLBKIkeLiP3h8+yDR+vJ7K
 OGBjY2ciSi9Wy9ynunCOCNHNBdia1qc3AJWSg/2YP4NW+RzRLe6cIs+Ih4s1N5lx
 ADvw+TA9CAKo1KASyOVYAxq7h4xlsyH6jbCC3ZW3P/a+Bs8smqM=
 =SEED
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu model updates from Borislav Petkov:

 - Flip the logic to add feature names to /proc/cpuinfo to having to
   explicitly specify the flag if there's a valid reason to show it in
   /proc/cpuinfo

 - Switch a bunch of Intel x86 model checking code to the new CPU model
   defines

 - Fixes and cleanups

* tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines
  x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
  x86/CPU/AMD: Always inline amd_clear_divider()
  x86/mce/inject: Add missing MODULE_DESCRIPTION() line
  perf/x86/rapl: Switch to new Intel CPU model defines
  x86/boot: Switch to new Intel CPU model defines
  x86/cpu: Switch to new Intel CPU model defines
  perf/x86/intel: Switch to new Intel CPU model defines
  x86/virt/tdx: Switch to new Intel CPU model defines
  x86/PCI: Switch to new Intel CPU model defines
  x86/cpu/intel: Switch to new Intel CPU model defines
  x86/platform/intel-mid: Switch to new Intel CPU model defines
  x86/pconfig: Remove unused MKTME pconfig code
  x86/cpu: Remove useless work in detect_tme_early()
2024-07-15 20:25:16 -07:00
Kan Liang
fa0c1c9d28 perf/x86/intel: Add a distinct name for Granite Rapids
Currently, the Sapphire Rapids and Granite Rapids share the same PMU
name, sapphire_rapids. Because from the kernel’s perspective, GNR is
similar to SPR. The only key difference is that they support different
extra MSRs. The code path and the PMU name are shared.

However, from end users' perspective, they are quite different. Besides
the extra MSRs, GNR has a newer PEBS format, supports Retire Latency,
supports new CPUID enumeration architecture, doesn't required the
load-latency AUX event, has additional TMA Level 1 Architectural Events,
etc. The differences can be enumerated by CPUID or the PERF_CAPABILITIES
MSR. They weren't reflected in the model-specific kernel setup.
But it is worth to have a distinct PMU name for GNR.

Fixes: a6742cb90b ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL")
Suggested-by: Ahmad Yasin <ahmad.yasin@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240708193336.1192217-3-kan.liang@linux.intel.com
2024-07-09 13:26:39 +02:00
Kan Liang
e5f32ad56b perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake
A non-0 retire latency can be observed on a Raptorlake which doesn't
support the retire latency feature.
By design, the retire latency shares the PERF_SAMPLE_WEIGHT_STRUCT
sample type with other types of latency. That could avoid adding too
many different sample types to support all kinds of latency. For the
machine which doesn't support some kind of latency, 0 should be
returned.

Perf doesn’t clear/init all the fields of a sample data for the sake
of performance. It expects the later perf_{prepare,output}_sample() to
update the uninitialized field. However, the current implementation
doesn't touch the field of the retire latency if the feature is not
supported. The memory garbage is dumped into the perf data.

Clear the retire latency if the feature is not supported.

Fixes: c87a31093c ("perf/x86: Support Retire Latency")
Reported-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240708193336.1192217-4-kan.liang@linux.intel.com
2024-07-09 13:26:39 +02:00
Kan Liang
556a7c039a perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated
The below error is observed on Ice Lake VM.

$ perf stat
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (slots).
/bin/dmesg | grep -i perf may provide additional information.

In a virtualization env, the Topdown metrics and the slots event haven't
been supported yet. The guest CPUID doesn't enumerate them. However, the
current kernel unconditionally exposes the slots event and the Topdown
metrics events to sysfs, which misleads the perf tool and triggers the
error.

Hide the perf-metrics topdown events and the slots event if the
perf-metrics feature is not enumerated.

The big core of a hybrid platform can also supports the perf-metrics
feature. Fix the hybrid platform as well.

Closes: https://lore.kernel.org/lkml/CAM9d7cj8z+ryyzUHR+P1Dcpot2jjW+Qcc4CPQpfafTXN=LEU0Q@mail.gmail.com/
Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Link: https://lkml.kernel.org/r/20240708193336.1192217-2-kan.liang@linux.intel.com
2024-07-09 13:26:38 +02:00
Kan Liang
a5a6ff3d63 perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR
The perf stat errors out with UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL
event.

 $perf stat -e uncore_cha_55/event=0x35,umask=0x10c0008101/ -a -- ls
    event syntax error: '..0x35,umask=0x10c0008101/'
                                      \___ Bad event or PMU

The definition of the CHA umask is config:8-15,32-55, which is 32bit.
However, the umask of the event is bigger than 32bit.
This is an error in the original uncore spec.

Add a new umask_ext5 for the new CHA umask range.

Fixes: 949b11381f ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support")
Closes: https://lore.kernel.org/linux-perf-users/alpine.LRH.2.20.2401300733310.11354@Diego/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240708185524.1185505-1-kan.liang@linux.intel.com
2024-07-09 13:26:38 +02:00
Sandipan Das
57e11990f4 perf/x86/amd/uncore: Fix DF and UMC domain identification
For uncore PMUs, a single context is shared across all CPUs in a domain.
The domain can be a CCX, like in the case of the L3 PMU, or a socket,
like in the case of DF and UMC PMUs. This information is available via
the PMU's cpumask.

For contexts shared across a socket, the domain is currently determined
from topology_die_id() which is incorrect after the introduction of
commit 63edbaa48a ("x86/cpu/topology: Add support for the AMD
0x80000026 leaf") as it now returns a CCX identifier on Zen 4 and later
systems which support CPUID leaf 0x80000026.

Use topology_logical_package_id() instead as it always returns a socket
identifier irrespective of the availability of CPUID leaf 0x80000026.

Fixes: 63edbaa48a ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240626074942.1044818-1-sandipan.das@amd.com
2024-07-04 16:00:41 +02:00
Sandipan Das
f997e208b6 perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable
X86_FEATURE_PERFCTR_NB and X86_FEATURE_PERFCTR_LLC are derived from
CPUID leaf 0x80000001 ECX bits 24 and 28 respectively and denote the
availability of DF and L3 counters. When these bits are not set, the
corresponding PMUs have no counters and hence, should not be registered.

Fixes: 07888daa05 ("perf/x86/amd/uncore: Move discovery and registration")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240626074404.1044230-1-sandipan.das@amd.com
2024-07-04 16:00:41 +02:00
Kan Liang
149fd4712b perf/x86/intel: Support Perfmon MSRs aliasing
The architectural performance monitoring V6 supports a new range of
counters' MSRs in the 19xxH address range. They include all the GP
counter MSRs, the GP control MSRs, and the fixed counter MSRs.

The step between each sibling counter is 4. Add intel_pmu_addr_offset()
to calculate the correct offset.

Add fixedctr in struct x86_pmu to store the address of the fixed counter
0. It can be used to calculate the rest of the fixed counters.

The MSR address of the fixed counter control is not changed.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
2024-07-04 16:00:40 +02:00
Kan Liang
dce0c74d2d perf/x86/intel: Support PERFEVTSEL extension
Two new fields (the unit mask2, and the equal flag) are added in the
IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX.

Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout
of the PERFEVTSEL.
Expose the new formats into sysfs if they are available. The umask
extension reuses the same format attr name "umask" as the previous
umask. Add umask2_show to determine/display the correct format
for the current machine.

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com
2024-07-04 16:00:40 +02:00
Kan Liang
e8fb5d6e76 perf/x86: Add config_mask to represent EVENTSEL bitmask
Different vendors may support different fields in EVENTSEL MSR, such as
Intel would introduce new fields umask2 and eq bits in EVENTSEL MSR
since Perfmon version 6. However, a fixed mask X86_RAW_EVENT_MASK is
used to filter the attr.config.

Introduce a new config_mask to record the real supported EVENTSEL
bitmask.
Only apply it to the existing code now. No functional change.

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com
2024-07-04 16:00:39 +02:00
Kan Liang
608f6976c3 perf/x86/intel: Support new data source for Lunar Lake
A new PEBS data source format is introduced for the p-core of Lunar
Lake. The data source field is extended to 8 bits with new encodings.

A new layout is introduced into the union intel_x86_pebs_dse.
Introduce the lnl_latency_data() to parse the new format.
Enlarge the pebs_data_source[] accordingly to include new encodings.

Only the mem load and the mem store events can generate the data source.
Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and
INTEL_HYBRID_STLAT_CONSTRAINT to mark them.

Add two new bits for the new cache-related data src, L2_MHB and MSC.
The L2_MHB is short for L2 Miss Handling Buffer, which is similar to
LFB (Line Fill Buffer), but to track the L2 Cache misses.
The MSC stands for the memory-side cache.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com
2024-07-04 16:00:38 +02:00
Kan Liang
090262439f perf/x86/intel: Rename model-specific pebs_latency_data functions
The model-specific pebs_latency_data functions of ADL and MTL use the
"small" as a postfix to indicate the e-core. The postfix is too generic
for a model-specific function. It cannot provide useful information that
can directly map it to a specific uarch, which can facilitate the
development and maintenance.
Use the abbr of the uarch to rename the model-specific functions.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com
2024-07-04 16:00:38 +02:00
Kan Liang
a932aa0e86 perf/x86: Add Lunar Lake and Arrow Lake support
From PMU's perspective, Lunar Lake and Arrow Lake are similar to the
previous generation Meteor Lake. Both are hybrid platforms, with e-core
and p-core.

The key differences include:
- The e-core supports 3 new fixed counters
- The p-core supports an updated PEBS Data Source format
- More GP counters (Updated event constraint table)
- New Architectural performance monitoring V6
  (New Perfmon MSRs aliasing, umask2, eq).
- New PEBS format V6 (Counters Snapshotting group)
- New RDPMC metrics clear mode

The legacy features, the 3 new fixed counters and updated event
constraint table are enabled in this patch.

The new PEBS data source format, the architectural performance
monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode
are supported in the following patches.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com
2024-07-04 16:00:37 +02:00
Kan Liang
722e42e45c perf/x86: Support counter mask
The current perf assumes that both GP and fixed counters are contiguous.
But it's not guaranteed on newer Intel platforms or in a virtualization
environment.

Use the counter mask to replace the number of counters for both GP and
the fixed counters. For the other ARCHs or old platforms which don't
support a counter mask, using GENMASK_ULL(num_counter - 1, 0) to
replace. There is no functional change for them.

The interface to KVM is not changed. The number of counters still be
passed to KVM. It can be updated later separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
2024-07-04 16:00:36 +02:00
Kan Liang
a23eb2fc1d perf/x86/intel: Support the PEBS event mask
The current perf assumes that the counters that support PEBS are
contiguous. But it's not guaranteed with the new leaf 0x23 introduced.
The counters are enumerated with a counter mask. There may be holes in
the counter mask for future platforms or in a virtualization
environment.

Store the PEBS event mask rather than the maximum number of PEBS
counters in the x86 PMU structures.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com
2024-07-04 16:00:36 +02:00
Zhang Rui
26579860fb perf/x86/intel/cstate: Add Lunarlake support
Compared with previous client platforms, PC8 is removed from Lunarlake.
It supports CC1/CC6/CC7 and PC2/PC3/PC6/PC10 residency counters.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240628031758.43103-4-rui.zhang@intel.com
2024-07-04 16:00:35 +02:00
Zhang Rui
a31000753d perf/x86/intel/cstate: Add Arrowlake support
Like Alderlake, Arrowlake supports CC1/CC6/CC7 and PC2/PC3/PC6/PC8/PC10.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240628031758.43103-3-rui.zhang@intel.com
2024-07-04 16:00:34 +02:00
Zhang Rui
2c3aedd9db perf/x86/intel/cstate: Fix Alderlake/Raptorlake/Meteorlake
For Alderlake, the spec changes after the patch submitted and PC7/PC9
are removed.

Raptorlake and Meteorlake, which copy the Alderlake cstate PMU, also
don't have PC7/PC9.

Remove PC7/PC9 support for Alderlake/Raptorlake/Meteorlake.

Fixes: d0ca946bcf ("perf/x86/cstate: Add Alder Lake CPU support")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240628031758.43103-2-rui.zhang@intel.com
2024-07-04 16:00:34 +02:00
Peter Zijlstra
0c8ea05e9b Merge branch 'tip/x86/cpu'
The Lunarlake patches rely on the new VFM stuff.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2024-07-04 16:00:24 +02:00
Adrian Hunter
3520b251dc perf/x86/intel/pt: Fix pt_topa_entry_for_page() address calculation
Currently, perf allocates an array of page pointers which is limited in
size by MAX_PAGE_ORDER. That in turn limits the maximum Intel PT buffer
size to 2GiB. Should that limitation be lifted, the Intel PT driver can
support larger sizes, except for one calculation in
pt_topa_entry_for_page(), which is limited to 32-bits.

Fix pt_topa_entry_for_page() address calculation by adding a cast.

Fixes: 39152ee51b ("perf/x86/intel/pt: Get rid of reverse lookup table for ToPA")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240624201101.60186-4-adrian.hunter@intel.com
2024-07-04 16:00:21 +02:00
Adrian Hunter
ad97196379 perf/x86/intel/pt: Fix a topa_entry base address calculation
topa_entry->base is a bit-field. Bit-fields are not promoted to a 64-bit
type, even if the underlying type is 64-bit, and so, if necessary, must
be cast to a larger type when calculations are done.

Fix a topa_entry->base address calculation by adding a cast.

Without the cast, the address was limited to 36-bits i.e. 64GiB.

The address calculation is used on systems that do not support Multiple
Entry ToPA (only Broadwell), and affects physical addresses on or above
64GiB. Instead of writing to the correct address, the address comprising
the first 36 bits would be written to.

Intel PT snapshot and sampling modes are not affected.

Fixes: 52ca9ced3f ("perf/x86/intel/pt: Add Intel PT PMU driver")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624201101.60186-3-adrian.hunter@intel.com
2024-07-04 16:00:21 +02:00
Marco Cavenati
5638bd722a perf/x86/intel/pt: Fix topa_entry base length
topa_entry->base needs to store a pfn.  It obviously needs to be
large enough to store the largest possible x86 pfn which is
MAXPHYADDR-PAGE_SIZE (52-12).  So it is 4 bits too small.

Increase the size of topa_entry->base from 36 bits to 40 bits.

Note, systems where physical addresses can be 256TiB or more are affected.

[ Adrian: Amend commit message as suggested by Dave Hansen ]

Fixes: 52ca9ced3f ("perf/x86/intel/pt: Add Intel PT PMU driver")
Signed-off-by: Marco Cavenati <cavenati.marco@gmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624201101.60186-2-adrian.hunter@intel.com
2024-07-04 16:00:20 +02:00
Kan Liang
f8a86a9bb5 perf/x86/intel/uncore: Support HBM and CXL PMON counters
Unknown uncore PMON types can be found in both SPR and EMR with HBM or
CXL.

 $ls /sys/devices/ | grep type
 uncore_type_12_16
 uncore_type_12_18
 uncore_type_12_2
 uncore_type_12_4
 uncore_type_12_6
 uncore_type_12_8
 uncore_type_13_17
 uncore_type_13_19
 uncore_type_13_3
 uncore_type_13_5
 uncore_type_13_7
 uncore_type_13_9

The unknown PMON types are HBM and CXL PMON. Except for the name, the
other information regarding the HBM and CXL PMON counters can be
retrieved via the discovery table. Add them into the uncores tables for
SPR and EMR.

The event config registers for all CXL related units are 8-byte apart.
Add SPR_UNCORE_MMIO_OFFS8_COMMON_FORMAT to specially handle it.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-9-kan.liang@linux.intel.com
2024-06-17 17:57:59 +02:00
Kan Liang
15a4bd5185 perf/x86/uncore: Cleanup unused unit structure
The unit control and ID information are retrieved from the unit control
RB tree. No one uses the old structure anymore. Remove them.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-8-kan.liang@linux.intel.com
2024-06-17 17:57:59 +02:00
Kan Liang
f76a842044 perf/x86/uncore: Apply the unit control RB tree to PCI uncore units
The unit control RB tree has the unit control and unit ID information
for all the PCI units. Use them to replace the box_ctls/pci_offsets to
get an accurate unit control address for PCI uncore units.

The UPI/M3UPI units in the discovery table are ignored. Please see the
commit 65248a9a9e ("perf/x86/uncore: Add a quirk for UPI on SPR").
Manually allocate a unit control RB tree for UPI/M3UPI.
Add cleanup_extra_boxes to release such manual allocation.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-7-kan.liang@linux.intel.com
2024-06-17 17:57:58 +02:00
Kan Liang
b1d9ea2e1c perf/x86/uncore: Apply the unit control RB tree to MSR uncore units
The unit control RB tree has the unit control and unit ID information
for all the MSR units. Use them to replace the box_ctl and
uncore_msr_box_ctl() to get an accurate unit control address for MSR
uncore units.

Add intel_generic_uncore_assign_hw_event(), which utilizes the accurate
unit control address from the unit control RB tree to calculate the
config_base and event_base.

The unit id related information should be retrieved from the unit
control RB tree as well.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-6-kan.liang@linux.intel.com
2024-06-17 17:57:57 +02:00
Kan Liang
80580dae65 perf/x86/uncore: Apply the unit control RB tree to MMIO uncore units
The unit control RB tree has the unit control and unit ID information
for all the units. Use it to replace the box_ctls/mmio_offsets to get
an accurate unit control address for MMIO uncore units.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-5-kan.liang@linux.intel.com
2024-06-17 17:57:57 +02:00
Kan Liang
585463fee6 perf/x86/uncore: Retrieve the unit ID from the unit control RB tree
The box_ids only save the unit ID for the first die. If a unit, e.g., a
CXL unit, doesn't exist in the first die. The unit ID cannot be
retrieved.

The unit control RB tree also stores the unit ID information.
Retrieve the unit ID from the unit control RB tree

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-4-kan.liang@linux.intel.com
2024-06-17 17:57:56 +02:00
Kan Liang
c74443d92f perf/x86/uncore: Support per PMU cpumask
The cpumask of some uncore units, e.g., CXL uncore units, may be wrong
under some configurations. Perf may access an uncore counter of a
non-existent uncore unit.

The uncore driver assumes that all uncore units are symmetric among
dies. A global cpumask is shared among all uncore PMUs. However, some
CXL uncore units may only be available on some dies.

A per PMU cpumask is introduced to track the CPU mask of this PMU.
The driver searches the unit control RB tree to check whether the PMU is
available on a given die, and updates the per PMU cpumask accordingly.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20240614134631.1092359-3-kan.liang@linux.intel.com
2024-06-17 17:57:56 +02:00
Kan Liang
0007f39325 perf/x86/uncore: Save the unit control address of all units
The unit control address of some CXL units may be wrongly calculated
under some configuration on a EMR machine.

The current implementation only saves the unit control address of the
units from the first die, and the first unit of the rest of dies. Perf
assumed that the units from the other dies have the same offset as the
first die. So the unit control address of the rest of the units can be
calculated. However, the assumption is wrong, especially for the CXL
units.

Introduce an RB tree for each uncore type to save the unit control
address and three kinds of ID information (unit ID, PMU ID, and die ID)
for all units.
The unit ID is a physical ID of a unit.
The PMU ID is a logical ID assigned to a unit. The logical IDs start
from 0 and must be contiguous. The physical ID and the logical ID are
1:1 mapping. The units with the same physical ID in different dies share
the same PMU.
The die ID indicates which die a unit belongs to.

The RB tree can be searched by two different keys (unit ID or PMU ID +
die ID). During the RB tree setup, the unit ID is used as a key to look
up the RB tree. The perf can create/assign a proper PMU ID to the unit.
Later, after the RB tree is setup, PMU ID + die ID is used as a key to
look up the RB tree to fill the cpumask of a PMU. It's used more
frequently, so PMU ID + die ID is compared in the unit_less().
The uncore_find_unit() has to be O(N). But the RB tree setup only occurs
once during the driver load time. It should be acceptable.

Compared with the current implementation, more space is required to save
the information of all units. The extra size should be acceptable.
For example, on EMR, there are 221 units at most. For a 2-socket machine,
the extra space is ~6KB at most.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240614134631.1092359-2-kan.liang@linux.intel.com
2024-06-17 17:57:55 +02:00
Thomas Gleixner
bb9bb45f74 perf/x86: Serialize set_attr_rdpmc()
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():

CPU0                           CPU1

Assume: x86_pmu.attr_rdpmc == 0

if (val != x86_pmu.attr_rdpmc) {
  if (val == 0)
    ...
  else if (x86_pmu.attr_rdpmc == 0)
    static_branch_dec(&rdpmc_never_available_key);

				if (val != x86_pmu.attr_rdpmc) {
				   if (val == 0)
				      ...
				   else if (x86_pmu.attr_rdpmc == 0)
     FAIL, due to imbalance --->      static_branch_dec(&rdpmc_never_available_key);

The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().

Prevent this by serializing set_attr_rdpmc() locally.

Fixes: a66734297f ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
2024-06-11 11:25:22 +02:00
Jeff Johnson
dc8e5dfb52 perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
Fix the 'make W=1 C=1' warnings:

  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-uncore.o
  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-cstate.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-intel-v1-1-8252194ed20a@quicinc.com
2024-05-31 11:41:15 +02:00
Jeff Johnson
0a44078f2b perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
Fix the warning from 'make C=1 W=1':

  WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/rapl.o

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-v1-1-e45ffa8af99f@quicinc.com
2024-05-31 11:41:04 +02:00
Tony Luck
8e887536b8 perf/x86/rapl: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240520224620.9480-44-tony.luck%40intel.com
2024-05-28 10:59:03 -07:00
Tony Luck
744866f5c0 x86/cpu: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Update INTEL_CPU_DESC() to work with vendor/family/model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240520224620.9480-34-tony.luck%40intel.com
2024-05-28 10:59:03 -07:00
Tony Luck
d142df13f3 perf/x86/intel: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240520224620.9480-32-tony.luck%40intel.com
2024-05-28 10:59:02 -07:00
Linus Torvalds
d90be6e4aa Driver core changes for 6.10-rc1
Here is the small set of driver core and kernfs changes for 6.10-rc1.
 
 Nothing major here at all, just a small set of changes for some driver
 core apis, and minor fixups.  Included in here are:
   - sysfs_bin_attr_simple_read() helper added and used
   - device_show_string() helper added and used
 All usages of these were acked by the various maintainers.  Also in here
 are:
   - kernfs minor cleanup
   - removed unused functions
   - typo fix in documentation
   - pay attention to sysfs_create_link() failures in module.c finally.
 
 All of these have been in linux-next for a very long time with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G
 ox8PXMxuFTaUEdT/69FQ
 =2sEo
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the small set of driver core and kernfs changes for 6.10-rc1.

  Nothing major here at all, just a small set of changes for some driver
  core apis, and minor fixups. Included in here are:

   - sysfs_bin_attr_simple_read() helper added and used

   - device_show_string() helper added and used

  All usages of these were acked by the various maintainers. Also in
  here are:

   - kernfs minor cleanup

   - removed unused functions

   - typo fix in documentation

   - pay attention to sysfs_create_link() failures in module.c finally

  All of these have been in linux-next for a very long time with no
  reported problems"

* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  device property: Fix a typo in the description of device_get_child_node_count()
  kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
  scsi: Use device_show_string() helper for sysfs attributes
  platform/x86: Use device_show_string() helper for sysfs attributes
  perf: Use device_show_string() helper for sysfs attributes
  IB/qib: Use device_show_string() helper for sysfs attributes
  hwmon: Use device_show_string() helper for sysfs attributes
  driver core: Add device_show_string() helper for sysfs attributes
  treewide: Use sysfs_bin_attr_simple_read() helper
  sysfs: Add sysfs_bin_attr_simple_read() helper
  module: don't ignore sysfs_create_link() failures
  driver core: Remove unused platform_notify, platform_notify_remove
2024-05-22 12:13:40 -07:00
Uros Bizjak
cd84351c8c perf/x86/amd: Use try_cmpxchg() in events/amd/{un,}core.c
Replace this pattern in events/amd/{un,}core.c:

    cmpxchg(*ptr, old, new) == old

... with the simpler and faster:

    try_cmpxchg(*ptr, &old, new)

The x86 CMPXCHG instruction returns success in the ZF flag, so this change
saves a compare after the CMPXCHG.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240425101708.5025-1-ubizjak@gmail.com
2024-05-18 11:15:13 +02:00
Ingo Molnar
9d351132ed perf/x86/cstate: Remove unused 'struct perf_cstate_msr'
Use of this structure was removed in:

  8f2a28c585 ("perf/x86/cstate: Use new probe function")

Remove the now stale type as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-08 09:53:23 +02:00
Lukas Wunner
b91b73a438 perf: Use device_show_string() helper for sysfs attributes
Deduplicate sysfs ->show() callbacks which expose a string at a static
memory location.  Use the newly introduced device_show_string() helper
in the driver core instead by declaring those sysfs attributes with
DEVICE_STRING_ATTR_RO().

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/3a297850312b4ecb62d6872121de04496900f502.1713608122.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-04 17:37:03 +02:00
Dhananjay Ugwekar
626c5acf39 perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idx
AMD CPUs have the scope of RAPL energy-pkg event as package, whereas
Intel Cascade Lake CPUs have the scope as die.

To account for the difference in the energy-pkg event scope between AMD
and Intel CPUs, give more generic and semantically correct names to the
maxdie and dieid variables.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240502095115.177713-2-Dhananjay.Ugwekar@amd.com
2024-05-02 13:32:21 +02:00
Ingo Molnar
10ed2b1181 Merge branch 'x86/cpu' into perf/core, to pick up dependent commits
We are going to fix perf-events fallout of changes in tip:x86/cpu,
so merge in that branch first.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-02 13:31:29 +02:00
Ingo Molnar
ad112b3a75 Linux 6.9-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYutdweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGG5oH/3Ggwz5N+gdEK3np
 qxUfpgJTWo+fJ6xhRLGy84TnEZ9s9UnK0Su6UuVyOb0F/2Y8hesJ6iwB16yQFKNe
 Nore/VvuBZ+utshz5N20yNyPugNOP74GGbyOm+d+iJwIKnmSE8jSjWyMwNFHJCZM
 BfoBxZrpwU/YD/0PD1KkI44jhPX1H/EcEmtNiklLnuYvJydTWiRFeku+CSgcOiRz
 6fIFrcZMREZrAytMQSwteBAvI3vWblC0S39ZgJmtZt+oi+s1ksIUNG8Mm5uMAiyF
 LcGep2tWV06x9uB9XVvrk0qco/kOaUgYuQrHQwNu9LzsaUdYGKqBBMk41SoNXFkB
 hBXN26U=
 =N2E2
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-02 13:12:31 +02:00
Tony Luck
add69475de perf/x86/msr: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20240424181503.41614-1-tony.luck%40intel.com
2024-04-29 10:31:04 +02:00
Tony Luck
9828a1cff4 perf/x86/intel/uncore: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

  [ bp: Squash *three* uncore patches into one. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20240424181501.41557-1-tony.luck%40intel.com
2024-04-29 10:30:39 +02:00
Tony Luck
a7011b852a perf/x86/intel/pt: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240424181500.41538-1-tony.luck%40intel.com
2024-04-25 09:04:32 -07:00
Tony Luck
0011a51d73 perf/x86/lbr: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240424181500.41519-1-tony.luck%40intel.com
2024-04-25 09:04:32 -07:00
Tony Luck
5ee800945a perf/x86/intel/cstate: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240424181459.41500-1-tony.luck%40intel.com
2024-04-25 09:04:32 -07:00
Linus Torvalds
817772266d * Clean up SVM's enter/exit assembly code so that it can be compiled
without OBJECT_FILES_NON_STANDARD.  This fixes a warning
   "Unpatched return thunk in use. This should not happen!" when running
   KVM selftests.
 
 * Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure where KVM
   would allow userspace to refresh the cache with a bogus GPA.  The bug has
   existed for quite some time, but was exposed by a new sanity check added in
   6.9 (to ensure a cache is either GPA-based or HVA-based).
 
 * Drop an unused param from gfn_to_pfn_cache_invalidate_start() that got left
   behind during a 6.9 cleanup.
 
 * Fix a math goof in x86's hugepage logic for KVM_SET_MEMORY_ATTRIBUTES that
   results in an array overflow (detected by KASAN).
 
 * Fix a bug where KVM incorrectly clears root_role.direct when userspace sets
   guest CPUID.
 
 * Fix a dirty logging bug in the where KVM fails to write-protect SPTEs used
   by a nested guest, if KVM is using Page-Modification Logging and the nested
   hypervisor is NOT using EPT.
 
 x86 PMU:
 
 * Drop support for virtualizing adaptive PEBS, as KVM's implementation is
   architecturally broken without an obvious/easy path forward, and because
   exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
   host kernel addresses to the guest.
 
 * Set the enable bits for general purpose counters in PERF_GLOBAL_CTRL at
   RESET time, as done by both Intel and AMD processors.
 
 * Disable LBR virtualization on CPUs that don't support LBR callstacks, as
   KVM unconditionally uses PERF_SAMPLE_BRANCH_CALL_STACK when creating the
   perf event, and would fail on such CPUs.
 
 Tests:
 
 * Fix a flaw in the max_guest_memory selftest that results in it exhausting
   the supply of ucall structures when run with more than 256 vCPUs.
 
 * Mark KVM_MEM_READONLY as supported for RISC-V in set_memory_region_test.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmYjdqcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPNRAgAh1AdKBAWnq9bFN2Np1kSAcRAk3bs
 REDq/0iD1T9TvIwEmE1lHaRuqvCSO15WW+DKvbs7TS8zA0DyY7X/x8sIIy5YzZ5C
 bQ+JXiqk55OAj0sPskBpCvE5qEreuU8qAit57+8OseKWs57EICvJjrfsRnHlmIub
 pgGas3I42LjIgsuZRr2kjv+GrvaiikW+wWK6sq3CvPzTtHV196d26AK5l4NOoLkY
 0FTbBIYUSJ7wxs92xuTed5mZ7JFZdsa5DVMXF5MRZ9W6g2vZCLbqCNRddRhSAsl0
 gKmqZkuPTB7AnGQbJ2h/aKFT0ydsguzqbbKq62sK7ft5f1CUlbp9luDC9w==
 =99rq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "This is a bit on the large side, mostly due to two changes:

   - Changes to disable some broken PMU virtualization (see below for
     details under "x86 PMU")

   - Clean up SVM's enter/exit assembly code so that it can be compiled
     without OBJECT_FILES_NON_STANDARD. This fixes a warning "Unpatched
     return thunk in use. This should not happen!" when running KVM
     selftests.

  Everything else is small bugfixes and selftest changes:

   - Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure
     where KVM would allow userspace to refresh the cache with a bogus
     GPA. The bug has existed for quite some time, but was exposed by a
     new sanity check added in 6.9 (to ensure a cache is either
     GPA-based or HVA-based).

   - Drop an unused param from gfn_to_pfn_cache_invalidate_start() that
     got left behind during a 6.9 cleanup.

   - Fix a math goof in x86's hugepage logic for
     KVM_SET_MEMORY_ATTRIBUTES that results in an array overflow
     (detected by KASAN).

   - Fix a bug where KVM incorrectly clears root_role.direct when
     userspace sets guest CPUID.

   - Fix a dirty logging bug in the where KVM fails to write-protect
     SPTEs used by a nested guest, if KVM is using Page-Modification
     Logging and the nested hypervisor is NOT using EPT.

  x86 PMU:

   - Drop support for virtualizing adaptive PEBS, as KVM's
     implementation is architecturally broken without an obvious/easy
     path forward, and because exposing adaptive PEBS can leak host LBRs
     to the guest, i.e. can leak host kernel addresses to the guest.

   - Set the enable bits for general purpose counters in
     PERF_GLOBAL_CTRL at RESET time, as done by both Intel and AMD
     processors.

   - Disable LBR virtualization on CPUs that don't support LBR
     callstacks, as KVM unconditionally uses
     PERF_SAMPLE_BRANCH_CALL_STACK when creating the perf event, and
     would fail on such CPUs.

  Tests:

   - Fix a flaw in the max_guest_memory selftest that results in it
     exhausting the supply of ucall structures when run with more than
     256 vCPUs.

   - Mark KVM_MEM_READONLY as supported for RISC-V in
     set_memory_region_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
  KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start()
  KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test
  KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protecting
  KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}()
  KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status
  KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update
  KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacks
  perf/x86/intel: Expose existence of callback support to KVM
  KVM: VMX: Snapshot LBR capabilities during module initialization
  KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms
  KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
  KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD
  KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run()
  KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area
  KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save area
  KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted
  KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run()
  KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV
  KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding
  KVM: SVM: Remove a useless zeroing of allocated memory
  ...
2024-04-20 11:10:51 -07:00
Ingo Molnar
d0331aa978 Merge branch 'linus' into perf/core, to pick up perf/urgent fixes
Pick up perf/urgent fixes that are upstream already, but not
yet in the perf/core development branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-14 22:25:18 +02:00
Sean Christopherson
0d0b608650 perf/x86/intel: Expose existence of callback support to KVM
Add a "has_callstack" field to the x86_pmu_lbr structure used to pass
information to KVM, and set it accordingly in x86_perf_get_lbr().  KVM
will use has_callstack to avoid trying to create perf LBR events with
PERF_SAMPLE_BRANCH_CALL_STACK on CPUs that don't support callstacks.

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20240307011344.835640-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:47 -07:00
Zhang Rui
acf68d98ca perf/x86/rapl: Add support for Intel Lunar Lake
Lunar Lake RAPL support is the same as previous Sky Lake.
Add Lunar Lake model for RAPL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240410124554.448987-2-rui.zhang@intel.com
2024-04-10 14:48:18 +02:00
Zhang Rui
fb70fe74be perf/x86/rapl: Add support for Intel Arrow Lake
Arrow Lake RAPL support is the same as previous Sky Lake.
Add Arrow Lake model for RAPL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240410124554.448987-1-rui.zhang@intel.com
2024-04-10 14:48:18 +02:00
Namhyung Kim
dec8ced871 perf/x86: Fix out of range data
On x86 each struct cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del().  This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling.  Then it would return out of range
data which doesn't make sense.

The following code can reproduce the problem.

  $ cat repro.c
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <linux/perf_event.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>

  struct perf_event_attr attr = {
  	.type = PERF_TYPE_HARDWARE,
  	.config = PERF_COUNT_HW_CPU_CYCLES,
  	.disabled = 1,
  };

  void *worker(void *arg)
  {
  	int cpu = (long)arg;
  	int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
  	int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
  	void *p;

  	do {
  		ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
  		p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
  		ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);

  		ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
  		munmap(p, 4096);
  		ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
  	} while (1);

  	return NULL;
  }

  int main(void)
  {
  	int i;
  	int n = sysconf(_SC_NPROCESSORS_ONLN);
  	pthread_t *th = calloc(n, sizeof(*th));

  	for (i = 0; i < n; i++)
  		pthread_create(&th[i], NULL, worker, (void *)(long)i);
  	for (i = 0; i < n; i++)
  		pthread_join(th[i], NULL);

  	free(th);
  	return 0;
  }

And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.

  $ gcc -o repro repro.c -pthread
  $ ./repro &
  $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
       1.001028462 CPU6   196,719,295,683,763      cycles                           # 194290.996 GHz                       (71.54%)
       1.001028462 CPU3   396,077,485,787,730      branch-misses                    # 15804359784.80% of all branches      (71.07%)
       1.001028462 CPU17  197,608,350,727,877      branch-misses                    # 14594186554.56% of all branches      (71.22%)
       2.020064073 CPU4   198,372,472,612,140      cycles                           # 194681.113 GHz                       (70.95%)
       2.020064073 CPU6   199,419,277,896,696      cycles                           # 195720.007 GHz                       (70.57%)
       2.020064073 CPU20  198,147,174,025,639      cycles                           # 194474.654 GHz                       (71.03%)
       2.020064073 CPU20  198,421,240,580,145      stalled-cycles-frontend          #  100.14% frontend cycles idle        (70.93%)
       3.037443155 CPU4   197,382,689,923,416      cycles                           # 194043.065 GHz                       (71.30%)
       3.037443155 CPU20  196,324,797,879,414      cycles                           # 193003.773 GHz                       (71.69%)
       3.037443155 CPU5   197,679,956,608,205      stalled-cycles-backend           # 1315606428.66% backend cycles idle   (71.19%)
       3.037443155 CPU5   198,571,860,474,851      instructions                     # 13215422.58  insn per cycle

It should move the contents in the cpuc->assign as well.

Fixes: 5471eea5d3 ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
2024-04-10 06:12:01 +02:00
Kan Liang
312be9fc22 perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
The MSR_PEBS_DATA_CFG MSR register is used to configure which data groups
should be generated into a PEBS record, and it's shared among all counters.

If there are different configurations among counters, perf combines all the
configurations.

The first perf command as below requires a complete PEBS record
(including memory info, GPRs, XMMs, and LBRs). The second perf command
only requires a basic group. However, after the second perf command is
running, the MSR_PEBS_DATA_CFG register is cleared. Only a basic group is
generated in a PEBS record, which is wrong. The required information
for the first perf command is missed.

 $ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp &
 $ sleep 5
 $ perf record  --per-thread  -c 1  -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000

The first PEBS event is a system-wide PEBS event. The second PEBS event
is a per-thread event. When the thread is scheduled out, the
intel_pmu_pebs_del() function is invoked to update the PEBS state.
Since the system-wide event is still available, the cpuc->n_pebs is 1.
The cpuc->pebs_data_cfg is cleared. The data configuration for the
system-wide PEBS event is lost.

The (cpuc->n_pebs == 1) check was introduced in commit:

  b6a32f023f ("perf/x86: Fix PEBS threshold initialization")

At that time, it indeed didn't hurt whether the state was updated
during the removal, because only the threshold is updated.

The calculation of the threshold takes the last PEBS event into
account.

However, since commit:

  b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")

we delay the threshold update, and clear the PEBS data config, which triggers
the bug.

The PEBS data config update scope should not be shrunk during removal.

[ mingo: Improved the changelog & comments. ]

Fixes: b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240401133320.703971-1-kan.liang@linux.intel.com
2024-04-03 10:19:20 +02:00
Andrii Nakryiko
9794563d4d perf/x86/amd: Don't reject non-sampling events with configured LBR
Now that it's possible to capture LBR on AMD CPU from BPF at arbitrary
point, there is no reason to artificially limit this feature to just
sampling events. So corresponding check is removed. AFAIU, there is no
correctness implications of doing this (and it was possible to bypass
this check by just setting perf_event's sample_period to 1 anyways, so
it doesn't guard all that much).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-5-andrii@kernel.org
2024-04-03 09:14:26 +02:00
Andrii Nakryiko
a4d18112e5 perf/x86/amd: Support capturing LBR from software events
Upstream commit c22ac2a3d4 ("perf: Enable branch record for software
events") added ability to capture LBR (Last Branch Records) on Intel CPUs
from inside BPF program at pretty much any arbitrary point. This is
extremely useful capability that allows to figure out otherwise
hard to debug problems, because LBR is now available based on some
application-defined conditions, not just hardware-supported events.

'retsnoop' is one such tool that takes a huge advantage of this
functionality and has proved to be an extremely useful tool in
practice:

  https://github.com/anakryiko/retsnoop

Now, AMD Zen4 CPUs got support for similar LBR functionality, but
necessary wiring inside the kernel is not yet setup. This patch seeks to
rectify this and follows a similar approach to the original patch
for Intel CPUs. We implement an AMD-specific callback set to be called
through perf_snapshot_branch_stack static call.

Previous preparatory patches ensured that amd_pmu_core_disable_all() and
__amd_pmu_lbr_disable() will be completely inlined and will have no
branches, so LBR snapshot contamination will be minimized.

This was tested on AMD Bergamo CPU and worked well when utilized from
the aforementioned retsnoop tool.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-4-andrii@kernel.org
2024-04-03 09:14:26 +02:00
Andrii Nakryiko
1eddf187e5 perf/x86/amd: Avoid taking branches before disabling LBR
In the following patches we will enable LBR capture on AMD CPUs at
arbitrary point in time, which means that LBR recording won't be frozen
by hardware automatically as part of hardware overflow event. So we need
to take care to minimize amount of branches and function calls/returns
on the path to freezing LBR, minimizing LBR snapshot altering as much as
possible.

As such, split out LBR disabling logic from the sanity checking logic
inside amd_pmu_lbr_disable_all(). This will ensure that no branches are
taken before LBR is frozen in the functionality added in the next patch.
Use __always_inline to also eliminate any possible function calls.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-3-andrii@kernel.org
2024-04-03 09:14:26 +02:00
Andrii Nakryiko
0dbf66fa7e perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlined
In the following patches we will enable LBR capture on AMD CPUs at
arbitrary point in time, which means that LBR recording won't be frozen
by hardware automatically as part of hardware overflow event. So we need
to take care to minimize amount of branches and function calls/returns
on the path to freezing LBR, minimizing LBR snapshot altering as much as
possible.

amd_pmu_core_disable_all() is one of the functions on this path, and is
already marked as __always_inline. But it calls amd_pmu_set_global_ctl()
which is marked as just inline.  So to guarantee no function call will
be generated thoughout mark amd_pmu_set_global_ctl() as __always_inline
as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20240402022118.1046049-2-andrii@kernel.org
2024-04-03 09:14:26 +02:00
Ingo Molnar
9b4e528557 Linux 6.9-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYJ1nceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGzPkH/jQ9hL1zS3XS+wJw
 GthWkLl70quCueWkLaOYdhfPwJpRDVIrr7A533q1Hxp35nWx6bEJpmvBHys8Dm/x
 5OOf595voL7UhdY+93k0IRyJBSMApBgW83AO6vqndpld95TrOxItdvlnpuXiqLxn
 5S4vf5LSWE7B2k0iWq5Uv6UnBwQ19F9SB6d0ujLHYBp5VcwDuctIPrD+2+XYKBBf
 /PWZtUgn4FeL6Ez1MJGs2qGSK8vxalqozqsrIr/LAQayhwGA+OM7Q/hqVQmjbek7
 QdGJPRMsngbq+QxbMzKA5q57tM7RKnrZgnOi3hC/FWpHNXPPHdeBfnPqpRw+ovJA
 qLvEBI0=
 =2ioy
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-rc2' into perf/core, to pick up dependent commits

Pick up fixes that followup patches are going to depend on.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-03 09:13:09 +02:00
Sandipan Das
68cdf1e6e8 perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
Add the "ref-cycles" event for AMD processors based on Zen 4 and later
microarchitectures. The backing event is based on PMCx120 which counts
cycles not in halt state in P0 frequency (same as MPERF).

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/089155f19f7c7e65aeb1caa727a882e2ca9b8b04.1711352180.git.sandipan.das@amd.com
2024-03-26 09:04:21 +01:00
Sandipan Das
c7b2edd837 perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
AMD processors based on Zen 2 and later microarchitectures do not
support PMCx087 (instruction pipe stalls) which is used as the backing
event for "stalled-cycles-frontend" and "stalled-cycles-backend".

Use PMCx0A9 (cycles where micro-op queue is empty) instead to count
frontend stalls and remove the entry for backend stalls since there
is no direct replacement.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 3fe3331bb2 ("perf/x86/amd: Add event map for AMD Family 17h")
Link: https://lore.kernel.org/r/03d7fc8fa2a28f9be732116009025bdec1b3ec97.1711352180.git.sandipan.das@amd.com
2024-03-26 09:03:40 +01:00
Sandipan Das
598c2fafc0 perf/x86/amd/lbr: Use freeze based on availability
Currently, the LBR code assumes that LBR Freeze is supported on all processors
when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX]
bit 1 is set. This is incorrect as the availability of the feature is
additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set,
which may not be set for all Zen 4 processors.

Define a new feature bit for LBR and PMC freeze and set the freeze enable bit
(FLBRI) in DebugCtl (MSR 0x1d9) conditionally.

It should still be possible to use LBR without freeze for profile-guided
optimization of user programs by using an user-only branch filter during
profiling. When the user-only filter is enabled, branches are no longer
recorded after the transition to CPL 0 upon PMI arrival. When branch
entries are read in the PMI handler, the branch stack does not change.

E.g.

  $ perf record -j any,u -e ex_ret_brn_tkn ./workload

Since the feature bit is visible under flags in /proc/cpuinfo, it can be
used to determine the feasibility of use-cases which require LBR Freeze
to be supported by the hardware such as profile-guided optimization of
kernels.

Fixes: ca5b7c0d96 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.1711091584.git.sandipan.das@amd.com
2024-03-25 11:16:55 +01:00
Erick Archer
dfbc411e0a perf/x86/rapl: Prefer struct_size() over open coded arithmetic
This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows:

  https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
  https://github.com/KSPP/linux/issues/160

As the "rapl_pmus" variable is a pointer to "struct rapl_pmus" and
this structure ends in a flexible array:

  struct rapl_pmus {
	[...]
	struct rapl_pmu *pmus[] __counted_by(maxdie);
  };

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the calculation "size + count * size" in
the kzalloc() function.

This way, the code is more readable and safer.

Signed-off-by: Erick Archer <erick.archer@gmx.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240317164442.6729-1-erick.archer@gmx.com
2024-03-21 20:58:43 +01:00
Sandipan Das
ad8c91282c perf/x86/amd/core: Avoid register reset when CPU is dead
When bringing a CPU online, some of the PMC and LBR related registers
are reset. The same is done when a CPU is taken offline although that
is unnecessary. This currently happens in the "cpu_dead" callback which
is also incorrect as the callback runs on a control CPU instead of the
one that is being taken offline. This also affects hibernation and
suspend to RAM on some platforms as reported in the link below.

Fixes: 21d59e3e2c ("perf/x86/amd/core: Detect PerfMonV2 support")
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/550a026764342cf7e5812680e3e2b91fe662b5ac.1706526029.git.sandipan.das@amd.com
2024-03-13 11:01:30 +01:00
Sandipan Das
29297ffffb perf/x86/amd/lbr: Discard erroneous branch entries
The Revision Guide for AMD Family 19h Model 10-1Fh processors declares
Erratum 1452 which states that non-branch entries may erroneously be
recorded in the Last Branch Record (LBR) stack with the valid and
spec bits set.

Such entries can be recognized by inspecting bit 61 of the corresponding
LastBranchStackToIp register. This bit is currently reserved but if found
to be set, the associated branch entry should be discarded.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://bugzilla.kernel.org/attachment.cgi?id=305518
Link: https://lore.kernel.org/r/3ad2aa305f7396d41a40e3f054f740d464b16b7f.1706526029.git.sandipan.das@amd.com
2024-03-13 11:01:30 +01:00
Linus Torvalds
fcc196579a Misc cleanups, including a large series from Thomas Gleixner to
cure Sparse warnings.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXvAFQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hkDRAAwASVCQ88kiGqNQtHibXlK54mAFGsc0xv
 T8OPds15DUzoLg/y8lw0X0DHly6MdGXVmygybejNIw2BN4lhLjQ7f4Ria7rv7LDy
 FcI1jfvysEMyYRFHGRefb/GBFzuEfKoROwf+QylGmKz0ZK674gNMngsI9pwOBdbe
 wElq3IkHoNuTUfH9QA4BvqGam1n122nvVTop3g0PMHWzx9ky8hd/BEUjXFZhfINL
 zZk3fwUbER2QYbhHt+BN2GRbdf2BrKvqTkXpKxyXTdnpiqAo0CzBGKerZ62H82qG
 n737Nib1lrsfM5yDHySnau02aamRXaGvCJUd6gpac1ZmNpZMWhEOT/0Tr/Nj5ztF
 lUAvKqMZn/CwwQky1/XxD0LHegnve0G+syqQt/7x7o1ELdiwTzOWMCx016UeodzB
 yyHf3Xx9J8nt3snlrlZBaGEfegg9ePLu5Vir7iXjg3vrloUW8A+GZM62NVxF4HVV
 QWF80BfWf8zbLQ/OS1382t1shaioIe5pEXzIjcnyVIZCiiP2/5kP2O6P4XVbwVlo
 Ca5eEt8U1rtsLUZaCzI2ZRTQf/8SLMQWyaV+ZmkVwcVdFoARC31EgdE5wYYoZOf6
 7Vl+rXd+rZCuTWk0ZgznCZEm75aaqukaQCBa2V8hIVociLFVzhg/Tjedv7s0CspA
 hNfxdN1LDZc=
 =0eJ7
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups, including a large series from Thomas Gleixner to cure
  sparse warnings"

* tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Drop unused declaration of proc_nmi_enabled()
  x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables
  x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation
  x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current
  x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address()
  x86/percpu: Cure per CPU madness on UP
  smp: Consolidate smp_prepare_boot_cpu()
  x86/msr: Add missing __percpu annotations
  x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>
  perf/x86/amd/uncore: Fix __percpu annotation
  x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP)
  x86/apm_32: Remove dead function apm_get_battery_status()
  x86/insn-eval: Fix function param name in get_eff_addr_sib()
2024-03-11 19:37:56 -07:00
Thomas Gleixner
154fcf3a78 x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>
To clean up the per CPU insanity of UP which causes sparse to be rightfully
unhappy and prevents the usage of the generic per CPU accessors on cpu_info
it is necessary to include <linux/percpu.h> into <asm/msr.h>.

Including <linux/percpu.h> into <asm/msr.h> is impossible because it ends
up in header dependency hell. The problem is that <asm/processor.h>
includes <asm/msr.h>. The inclusion of <linux/percpu.h> results in a
compile fail where the compiler cannot longer handle an include in
<asm/cpufeature.h> which references boot_cpu_data which is
defined in <asm/processor.h>.

The only reason why <asm/msr.h> is included in <asm/processor.h> are the
set/get_debugctlmsr() inlines. They are defined there because <asm/processor.h>
is such a nice dump ground for everything. In fact they belong obviously
into <asm/debugreg.h>.

Move them to <asm/debugreg.h> and fix up the resulting damage which is just
exposing the reliance on random include chains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240304005104.454678686@linutronix.de
2024-03-04 12:01:39 +01:00
Thomas Gleixner
9eae297d5d perf/x86/amd/uncore: Fix __percpu annotation
The __percpu annotation in struct amd_uncore is confusing Sparse:

  uncore.c:649:10: sparse: warning: incorrect type in initializer (different address spaces)
  uncore.c:649:10: sparse:    expected void const [noderef] __percpu *__vpp_verify
  uncore.c:649:10: sparse:    got union amd_uncore_info *

The reason is that the __percpu annotation sits between the '*'
dereferencing operator and the member name.

Move it before the dereferencing operator to cure this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240304005104.394845326@linutronix.de
2024-03-04 11:58:36 +01:00
Thomas Gleixner
89b0f15f40 x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
Now that __num_cores_per_package and __num_threads_per_package are
available, cpuinfo::x86_max_cores and the related math all over the place
can be replaced with the ready to consume data.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.176147806@linutronix.de
2024-02-16 15:51:32 +01:00
Thomas Gleixner
bd745d1c41 x86/cpu/topology: Rename topology_max_die_per_package()
The plural of die is dies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
2024-02-15 22:07:45 +01:00
Thomas Gleixner
7e3ec62867 x86/cpu/amd: Provide a separate accessor for Node ID
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de
2024-02-15 22:07:37 +01:00
Linus Torvalds
aac4de465a Performance events changes for v6.8 are:
- Add branch stack counters ABI extension to better capture
    the growing amount of information the PMU exposes via
    branch stack sampling. There's matching tooling support.
 
  - Fix race when creating the nr_addr_filters sysfs file
 
  - Add Intel Sierra Forest and Grand Ridge intel/cstate
    PMU support.
 
  - Add Intel Granite Rapids, Sierra Forest and Grand Ridge
    uncore PMU support.
 
  - Misc cleanups & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb4lURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jlnQ/+NSzrPQ9hEiS5a1iMMxdwC6IoXCmeFVsv
 s5NsGaVC7FEgjm3oCfvQlP63HolMO9R7TNLZsgINzOda5IHtE7WUcgBK7gbZr+NT
 WabdTyFrdmUr+Br0rLrEe0bxDSQU7r41ptqKE5HZRM9/3SbLhWgaXSJbfFAG2JV0
 xboZ/2qzb7Puch6VTWv1YhuIpr1Pi817As4SOo7JR4V8jBB2bh2eZ7XBN1z23aw2
 xuglbYml5gs4dOaFTqkRLWyn2PmrZ9wYKcdp63FVUscZ4LxvSw749BxEcNpTbxLp
 PT6uXIKw9PnStNfscfrsk6fDocVJzqrOK71blgiOKbmhWTE0UimEpFf1Hd3ooewg
 hFp3hmkE5Bc2MTUnwivkBxj96fz5rXH+3+Cue/5NsvDNlhlkswIIxzDw8M1G4rOI
 KQMDUYFOhQPa3Hi1lSp2SgHI5AcYHudepr/Z3QMxD3iLs+Wo2cmDcp8d2VrMLfb7
 GHSITG592iYcZPYsJosxby8CSFaUPxIl9l3AODQwWuEjd4PcOYa6iB2HbEa/mC3R
 wXcs8mFIMAaH/HRYUlqUDA5pOqN5chb13iDtS4JqJqBKyWgdrDLCVxoZSQvB64+I
 bldyy1e5oQSVVwJ42WLkUK3Eld2x75ki1JLZFwMgYuOgQv3jfu2VNenUWJ5ig0La
 dPpHP8PwOoc=
 =2O/5
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:

 - Add branch stack counters ABI extension to better capture the growing
   amount of information the PMU exposes via branch stack sampling.
   There's matching tooling support.

 - Fix race when creating the nr_addr_filters sysfs file

 - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support

 - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
   support

 - Misc cleanups & fixes

* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Factor out topology_gidnid_map()
  perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
  perf/x86/amd: Reject branch stack for IBS events
  perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
  perf/x86/intel/uncore: Support IIO free-running counters on GNR
  perf/x86/intel/uncore: Support Granite Rapids
  perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
  perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
  perf: Fix the nr_addr_filters fix
  perf/x86/intel/cstate: Add Grand Ridge support
  perf/x86/intel/cstate: Add Sierra Forest support
  x86/smp: Export symbol cpu_clustergroup_mask()
  perf/x86/intel/cstate: Cleanup duplicate attr_groups
  perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
  perf/x86/intel: Support branch counters logging
  perf/x86/intel: Reorganize attrs and is_visible
  perf: Add branch_sample_call_stack
  perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
  perf: Add branch stack counters
2024-01-08 19:37:20 -08:00
Linus Torvalds
b51cc5d028 x86/cleanups changes for v6.8:
- A micro-optimization got misplaced as a cleanup:
     - Micro-optimize the asm code in secondary_startup_64_no_verify()
 
  - Change global variables to local
  - Add missing kernel-doc function parameter descriptions
  - Remove unused parameter from a macro
  - Remove obsolete Kconfig entry
  - Fix comments
  - Fix typos, mostly scripted, manually reviewed
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb2i8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iFIQ//RjqKWmEBfv0UVCNgtRgkUKOvYVkfhC1R
 FykHWbSE+/oDODS7B+gbWqzl9Fq2Oxx9re4KZuMfnojE96KZ6H1flQn7z3UVRUrf
 pfMx13E+uyf7qbVZktqH38lUS4s/AHdX2PKCiXlU/0hIkiBdjbAl3ylyqMv7ytIL
 Fi2N9iYJN+eLlMkc3A5IK83xNiU8rb0gO6Uywn3nUbqadY/YX2gDpND5kfzRIneR
 lTKy4rX3+E65qYB2Ly1wDr7e0Q0rgaTzPctx6twFrxQXK+MsHiartJhM5juND/tU
 DEjSW9ISOHlitKEJI/zbdrvJlr5AKDNy2zHYmQQuqY6+YHRamCKqwIjLIPkKj52g
 lAbosNwvp/o8W3zUHgUfVZR5hVxN863zV2qa/ehoQ3b/9kNjQC8actILjYEgIVu9
 av1sd+nETbjCUABIF9H9uAoRbgc+wQs2nupJZrjvginFz8+WVhgaBdJDMYCNAmjc
 fNMjGtRS7YXiIMj09ZAXFThVW302FdbTgggDh/qlQlDOXFu5HRbyuWR+USr4/jkP
 qs2G6m/BHDs9HxDRo/no+ccSrUBV5phfhZbO7qwjTf2NJJvPHW+cxGpT00zU2v8A
 lgfVI7SDkxwbyi1gacJ054GqEhsWuEdi40ikqxjhL8Oq4xwwsey/PiaIxjkDQx92
 Gj3XUSDnGEs=
 =kUav
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:

 - Change global variables to local

 - Add missing kernel-doc function parameter descriptions

 - Remove unused parameter from a macro

 - Remove obsolete Kconfig entry

 - Fix comments

 - Fix typos, mostly scripted, manually reviewed

and a micro-optimization got misplaced as a cleanup:

 - Micro-optimize the asm code in secondary_startup_64_no_verify()

* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86: Fix typos
  x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
  x86/docs: Remove reference to syscall trampoline in PTI
  x86/Kconfig: Remove obsolete config X86_32_SMP
  x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
  x86/mtrr: Document missing function parameters in kernel-doc
  x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-08 17:23:32 -08:00
Paolo Bonzini
9710794640 KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
When commit c59a1f106f ("KVM: x86/pmu: Add IA32_PEBS_ENABLE
MSR emulation for extended PEBS") switched the initialization of
cpuc->guest_switch_msrs to use compound literals, it screwed up
the boolean logic:

+	u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
...
-	arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
+               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),

Before the patch, the value of arr[0].guest would have been intel_ctrl &
~cpuc->intel_ctrl_host_mask & ~pebs_mask.  The intent is to always treat
PEBS events as host-only because, while the guest runs, there is no way
to tell the processor about the virtual address where to put PEBS records
intended for the host.

Unfortunately, the new expression can be expanded to

	(intel_ctrl & ~cpuc->intel_ctrl_host_mask) | (intel_ctrl & ~pebs_mask)

which makes no sense; it includes any bit that isn't *both* marked as
exclude_guest and using PEBS.  So, reinstate the old logic.  Another
way to write it could be "intel_ctrl & ~(cpuc->intel_ctrl_host_mask |
pebs_mask)", presumably the intention of the author of the faulty.
However, I personally find the repeated application of A AND NOT B to
be a bit more readable.

This shows up as guest failures when running concurrent long-running
perf workloads on the host, and was reported to happen with rcutorture.
All guests on a given host would die simultaneously with something like an
instruction fault or a segmentation violation.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Analyzed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c59a1f106f ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-04 16:31:27 +01:00
Bjorn Helgaas
54aa699e80 arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86".  Only touches comments,
no code changes.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2024-01-03 11:46:22 +01:00
Alexander Antonov
fdd041028f perf/x86/intel/uncore: Factor out topology_gidnid_map()
The same code is used for retrieving package ID procedure from GIDNIDMAP
register. Factor out topology_gidnid_map() to avoid code duplication.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20231127185246.2371939-3-alexander.antonov@linux.intel.com
2023-11-30 14:29:53 +01:00
Alexander Antonov
1692cf434b perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
Get logical socket id instead of physical id in discover_upi_topology()
to avoid out-of-bound access on 'upi = &type->topology[nid][idx];' line
that leads to NULL pointer dereference in upi_fill_topology()

Fixes: f680b6e606 ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
Reported-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://lore.kernel.org/r/20231127185246.2371939-2-alexander.antonov@linux.intel.com
2023-11-30 14:29:52 +01:00
Namhyung Kim
0f9e0d7928 perf/x86/amd: Reject branch stack for IBS events
The AMD IBS PMU doesn't handle branch stacks, so it should not accept
events with brstack.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231130062246.290-1-ravi.bangoria@amd.com
2023-11-30 09:34:40 +01:00
Kan Liang
cb4a6ccf35 perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
The same as Granite Rapids, the Sierra Forest and Grand Ridge also
supports the discovery table feature and the same type of the uncore
units. The difference of the available units and counters can be
retrieved from the discovery table automatically.
Just add the CPU model ID.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-5-kan.liang@linux.intel.com
2023-11-24 20:25:03 +01:00
Kan Liang
388d76175b perf/x86/intel/uncore: Support IIO free-running counters on GNR
The free-running counters for IIO uncore blocks on Granite Rapids are
similar to Sapphire Rapids. The key difference is the offset of the
registers. The number of the IIO uncore blocks can also be retrieved
from the discovery table.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-4-kan.liang@linux.intel.com
2023-11-24 20:25:02 +01:00
Kan Liang
632c4bf6d0 perf/x86/intel/uncore: Support Granite Rapids
The same as Sapphire Rapids, Granite Rapids also supports the discovery
table feature. All the basic uncore PMON information can be retrieved
from the discovery table which resides in the BIOS.

There are 4 new units are added on Granite Rapids, b2cmi, b2cxl, ubox,
and mdf_sbo. The layout of the counters is exactly the same as the
generic uncore counters. Only add a name for the new units. All the
details can be retrieved from the discovery table.
The description of the new units can be found at
https://www.intel.com/content/www/us/en/secure/content-details/772943/content-details.html

The other units, e.g., cha, iio, irp, pcu, and imc, are the same as
Sapphire Rapids.

Ignore the upi and b2upi units in the discovery table, which are broken
for now.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-3-kan.liang@linux.intel.com
2023-11-24 20:25:01 +01:00
Kan Liang
b560e0cd88 perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
The current perf doesn't save the complete address of an uncore unit.
The complete address of each unit is calculated by the base address +
offset. The type of the base address is u64, while the type of offset is
unsigned.
In the old platforms (without the discovery table method), the base
address and offset are hard coded in the driver. Perf can always use the
lowest address as the base address. Everything works well.

In the new platforms (starting from SPR), the discovery table provides
a complete address for all uncore units. To follow the current
framework/codes, when parsing the discovery table, the complete address
of the first box is stored as a base address. The offset of the
following units is calculated by the complete address of the unit minus
the base address (the address of the first unit). On GNR, the latter
units may have a lower address compared to the first unit. So the offset
is a negative value. The upper 32 bits are lost when casting a negative
u64 to an unsigned type.

Use u64 to replace unsigned for the uncore offsets array to correct the
above case. There is no functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-2-kan.liang@linux.intel.com
2023-11-24 20:25:01 +01:00
Kan Liang
cf35791476 perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
Factor out SPR_UNCORE_MMIO_COMMON_FORMAT which can be reused by
Granite Rapids in the following patch.

Granite Rapids have more uncore units than Sapphire Rapids. Add new
parameters to support adjustable uncore units.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-1-kan.liang@linux.intel.com
2023-11-24 20:25:00 +01:00
Dapeng Mi
e8df9d9f42 perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:

  sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1

  Opening: cpu/cycles/:HG
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    config                           0xa00000000
    disabled                         1
  ------------------------------------------------------------
  sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -16

   Performance counter stats for 'sleep 1':

       <not counted>      cpu_atom/instructions/

It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.

Further investigation shows the issue was introduced by commit:

  97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.

Testing this fix on Intel hybrid platforms, the observed issues
disappear.

Fixes: 97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
2023-11-21 13:44:36 +01:00
Kan Liang
bbb968696d perf/x86/intel/cstate: Add Grand Ridge support
The same as the Sierra Forest, the Grand Ridge supports core C1/C6 and
module C6. But it doesn't support pkg C6 residency counter.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-4-kan.liang@linux.intel.com
2023-11-17 10:54:53 +01:00
Kan Liang
3877d55a0d perf/x86/intel/cstate: Add Sierra Forest support
A new module C6 Residency Counter is introduced in the Sierra Forest.
The scope of the new counter is module (A cluster of cores shared L2
cache). Create a brand new cstate_module PMU to profile the new counter.
The only differences between the new cstate_module PMU and the existing
cstate PMU are the scope and events.

Regarding the choice of the new cstate_module PMU name, the current
naming rule of a cstate PMU is "cstate_" + the scope of the PMU. The
scope of the PMU is the cores shared L2. On SRF, Intel calls it
"module", while the internal Linux sched code calls it "cluster". The
"cstate_module" is used as the new PMU name, because
- The Cstate PMU driver is a Intel specific driver. It doesn't impact
  other ARCHs. The name makes it consistent with the documentation.
- The "cluster" mainly be used by the scheduler developer, while the
  user of cstate PMU is more likely a researcher reading HW docs and
  optimizing power.
- In the Intel's SDM, the "cluster" has a different meaning/scope for
  topology. Using it will mislead the end users.

Besides the module C6, the core C1/C6 and pkg C6 residency counters are
supported in the Sierra Forest as well.

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-3-kan.liang@linux.intel.com
2023-11-17 10:54:53 +01:00
Kan Liang
243218ca93 perf/x86/intel/cstate: Cleanup duplicate attr_groups
The events of the cstate_core and cstate_pkg PMU have the same format.
They both need to create a "events" group (with empty attrs). The
attr_groups can be shared.

Remove the dedicated attr_groups for each cstate PMU. Use the shared
cstate_attr_groups to replace.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-1-kan.liang@linux.intel.com
2023-11-17 10:54:52 +01:00
Peter Zijlstra
5d2d4a9f60 Merge branch 'tip/perf/urgent'
Avoid conflicts, base on fixes.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2023-11-15 10:15:40 +01:00
Linus Torvalds
eb55307e67 X86 core code updates:
- Limit the hardcoded topology quirk for Hygon CPUs to those which have a
     model ID less than 4. The newer models have the topology CPUID leaf 0xB
     correctly implemented and are not affected.
 
   - Make SMT control more robust against enumeration failures
 
     SMT control was added to allow controlling SMT at boottime or
     runtime. The primary purpose was to provide a simple mechanism to
     disable SMT in the light of speculation attack vectors.
 
     It turned out that the code is sensible to enumeration failures and
     worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
     which means the primary thread mask is not set up correctly. By chance
     a XEN/PV boot ends up with smp_num_siblings == 2, which makes the
     hotplug control stay at its default value "enabled". So the mask is
     never evaluated.
 
     The ongoing rework of the topology evaluation caused XEN/PV to end up
     with smp_num_siblings == 1, which sets the SMT control to "not
     supported" and the empty primary thread mask causes the hotplug core to
     deny the bringup of the APS.
 
     Make the decision logic more robust and take 'not supported' and 'not
     implemented' into account for the decision whether a CPU should be
     booted or not.
 
   - Fake primary thread mask for XEN/PV
 
     Pretend that all XEN/PV vCPUs are primary threads, which makes the
     usage of the primary thread mask valid on XEN/PV. That is consistent
     with because all of the topology information on XEN/PV is fake or even
     non-existent.
 
   - Encapsulate topology information in cpuinfo_x86
 
     Move the randomly scattered topology data into a separate data
     structure for readability and as a preparatory step for the topology
     evaluation overhaul.
 
   - Consolidate APIC ID data type to u32
 
     It's fixed width hardware data and not randomly u16, int, unsigned long
     or whatever developers decided to use.
 
   - Cure the abuse of cpuinfo for persisting logical IDs.
 
     Per CPU cpuinfo is used to persist the logical package and die
     IDs. That's really not the right place simply because cpuinfo is
     subject to be reinitialized when a CPU goes through an offline/online
     cycle.
 
     Use separate per CPU data for the persisting to enable the further
     topology management rework. It will be removed once the new topology
     management is in place.
 
   - Provide a debug interface for inspecting topology information
 
     Useful in general and extremly helpful for validating the topology
     management rework in terms of correctness or "bug" compatibility.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+yX0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoROUD/4vlvKEcpm9rbI5DzLcaq4DFHKbyEZF
 cQtzuOSM/9vTc9DHnuoNNLl9TWSYxiVYnejf3E21evfsqspYlzbTH8bId9XBCUid
 6B68AJW842M2erNuwj0b0HwF1z++zpDmBDyhGOty/KQhoM8pYOHMvntAmbzJbuso
 Dgx6BLVFcboTy6RwlfRa0EE8f9W5V+JbmG/VBDpdyCInal7VrudoVFZmWQnPIft7
 zwOJpAoehkp8OKq7geKDf79yWxu9a1sNPd62HtaVEvfHwehHqE6OaMLss1us+0vT
 SJ/D6gmRQBOwcXaZL0wL1dG7Km9Et4AisOvzhXGvTa5b2D5oljVoqJ7V7FTf5g3u
 y3aqWbeUJzERUbeJt1HoGVAKyA4GtZOvg+TNIysf6F1Z4khl9alfa9jiqjj4g1au
 zgItq/ZMBEBmJ7X4FxQUEUVBG2CDsEidyNBDRcimWQUDfBakV/iCs0suD8uu8ZOD
 K5jMx8Hi2+xFx7r1YqsfsyMBYOf/zUZw65RbNe+kI992JbJ9nhcODbnbo5MlAsyv
 vcqlK5FwXgZ4YAC8dZHU/tyTiqAW7oaOSkqKwTP5gcyNEqsjQHV//q6v+uqtjfYn
 1C4oUsRHT2vJiV9ktNJTA4GQHIYF4geGgpG8Ih2SjXsSzdGtUd3DtX1iq0YiLEOk
 eHhYsnniqsYB5g==
 =xrz8
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
2023-10-30 17:37:47 -10:00
Linus Torvalds
bceb7accb7 Performance events changes for v6.7 are:
- Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
  - Simplify & clean up the uncore management code
  - Fall back from RDPMC to RDMSR on certain uncore PMUs
  - Improve per-package and cstate event reading
  - Extend the Intel ref-cycles event to GP counters
  - Fix Intel MTL event constraints
  - Improve the Intel hybrid CPU handling code
  - Micro-optimize the RAPL code
  - Optimize perf_cgroup_switch()
  - Improve large AUX area error handling
 
  - Misc fixes and cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU89YsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iQqQ/9EF9mG4te5By4qN+B7jADCmE71xG5ViKz
 sp4Thl86SHxhwFuiHn8dMUixrp+qbcemi5yTbQ9TF8cKl4s3Ju2CihU8jaauUp0a
 iS5W0IliMqLD1pxQoXAPLuPVInVYgrNOCbR4l6l7D6ervh5Z6PVEf7SVeAP3L5wo
 QV/V3NKkrYeNQL+FoKhCH8Vhxw0HxUmKJO7UhW6yuCt7BAok9Es18h3OVnn+7es4
 BB7VI/JvdmXf2ioKhTPnDXJjC+vh5vnwiBoTcdQ2W9ADhWUvfL4ozxOXT6z7oC3A
 nwBOdXf8w8Rqnqqd8hduop1QUrusMxlEVgOMCk27qHx97uWgPceZWdoxDXGHBiRK
 fqJAwXERf9wp5/M57NDlPwyf/43Hocdx2CdLkQBpfD78/k/sB5hW0KxnzY0FUI9x
 jBRQyWD05IDJATBaMHz+VbrexS+Itvjp2QvSiSm9zislYD4zA9fQ3lAgFhEpcUbA
 ZA/nN4t+CbiGEAsJEuBPlvSC1ahUwVP/0nz3PFlVWFDqAx0mXgVNKBe083A9yh7I
 dVisVY6KPAVDzyOc1LqzU8WFXNFnIkIIaLrb6fRHJVEM8MDfpLPS/a+7AHdRcDP4
 yq6fjVVjyP7e9lSQLYBUP3/3uiVnWQj92l6V6CrcgDMX5rDOb0VN+BQrmhPR6fWY
 WEim6WZrZj4=
 =OLsv
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:
 - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
 - Simplify & clean up the uncore management code
 - Fall back from RDPMC to RDMSR on certain uncore PMUs
 - Improve per-package and cstate event reading
 - Extend the Intel ref-cycles event to GP counters
 - Fix Intel MTL event constraints
 - Improve the Intel hybrid CPU handling code
 - Micro-optimize the RAPL code
 - Optimize perf_cgroup_switch()
 - Improve large AUX area error handling
 - Misc fixes and cleanups

* tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
  perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
  x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
  perf: Optimize perf_cgroup_switch()
  perf/x86/amd/uncore: Add memory controller support
  perf/x86/amd/uncore: Add group exclusivity
  perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
  perf/x86/amd/uncore: Move discovery and registration
  perf/x86/amd/uncore: Refactor uncore management
  perf/core: Allow reading package events from perf_event_read_local
  perf/x86/cstate: Allow reading the package statistics from local CPU
  perf/x86/intel/pt: Fix kernel-doc comments
  perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
  perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability
  perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
  perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
  perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
  perf/core: Bail out early if the request AUX area is out of bound
  perf/x86/intel: Extend the ref-cycles event to GP counters
  perf/x86/intel: Fix broken fixed event constraints extension
  ...
2023-10-30 13:44:35 -10:00
Kan Liang
3374491619 perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is displayed by the order
of counters.

The design proposed in this patch requires that the events which are
logged must be in a group with the event that has LBR. If there are
more than one LBR group, the counters logging information only from the
current group (overflowed) are stored for the perf tool, otherwise the
perf tool cannot know which and when other groups are scheduled
especially when multiplexing is triggered. The user can ensure it uses
the maximum number of counters that support LBR info (4 by now) by
making the group large enough.

The HW only logs events by the order of counters. The order may be
different from the order of enabling which the perf tool can understand.
When parsing the information of each branch entry, convert the counter
order to the enabled order, and store the enabled order in the extension
space.

Unconditionally reset LBRs for an LBR event group when it's deleted. The
logged counter information is only valid for the current LBR group. If
another LBR group is scheduled later, the information from the stale
LBRs would be otherwise wrongly interpreted.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)

Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch
stack setup.

Expose the maximum number of supported counters and the width of the
counters into the sysfs. The perf tool can use the information to parse
the logged counters in each branch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2023-10-27 15:05:11 +02:00
Kan Liang
318c498591 perf/x86/intel: Reorganize attrs and is_visible
Some attrs and is_visible implementations are rather far away from one
another which makes the whole thing hard to interpret.

There are only two attribute groups which have both .attrs and
.is_visible, group_default and group_caps_lbr. Move them together.

No functional changes.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-4-kan.liang@linux.intel.com
2023-10-27 15:05:10 +02:00
Kan Liang
1f2376cd03 perf: Add branch_sample_call_stack
Add a helper function to check call stack sample type.

The later patch will invoke the function in several places.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-3-kan.liang@linux.intel.com
2023-10-27 15:05:09 +02:00
Kan Liang
85846b2707 perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
Currently, branch_sample_type !=0 is used to check whether a branch
stack setup is required. But it doesn't check the sample type,
unnecessary branch stack setup may be done for a counting event. E.g.,
perf record -e "{branch-instructions,branch-misses}:S" -j any
Also, the event only with the new PERF_SAMPLE_BRANCH_COUNTERS branch
sample type may not require a branch stack setup either.

Add a new flag NEEDS_BRANCH_STACK to indicate whether the event requires
a branch stack setup. Replace the needs_branch_stack() by checking the
new flag.

The counting event check is implemented here. The later patch will take
the new PERF_SAMPLE_BRANCH_COUNTERS into account.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-2-kan.liang@linux.intel.com
2023-10-27 15:05:09 +02:00
Kan Liang
571d91dcad perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a
u64 space. With more and more information added, the space is running
out. For example, the information of occurrences of events will be added
for each branch.

Two places were suggested to append the counters.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. It changes the
existing struct perf_branch_entry. The later ARCH specific
implementation has to be really careful to consistently pick
the right struct.
The other place is right after the entire struct perf_branch_stack.
The disadvantage is that the pointer of the extra space has to be
recorded. The common interface perf_sample_save_brstack() has to be
updated.

The latter is much straightforward, and should be easily understood and
maintained. It is implemented in the patch.

Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate
the event which is recorded in the branch info.

The "u64 counters" may store the occurrences of several events. The
information regarding the number of events/counters and the width of
each counter should be exposed via sysfs as a reference for the perf
tool. Define the branch_counter_nr and branch_counter_width ABI here.
The support will be implemented later in the Intel-specific patch.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2023-10-27 15:05:08 +02:00
Sandipan Das
744940f192 perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
Pass through the appropriate error code when the registration of hotplug
callbacks fail during initialization, instead of returning a blanket -ENODEV.

[ mingo: Updated the changelog. ]

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231016060743.332051-1-sandipan.das@amd.com
2023-10-16 11:42:01 +02:00
Dan Carpenter
7543365739 perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
Some of the error paths in this function return don't initialize the
error code.  Return -ENODEV by default.

Fixes: d6389d3ccc ("perf/x86/amd/uncore: Refactor uncore management")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/cec62eba-c4b8-4cb7-9671-58894dd4b974@moroto.mountain
2023-10-13 09:32:50 +02:00
Thomas Gleixner
6e29032340 x86/cpu: Move cpu_l[l2]c_id into topology info
The topology IDs which identify the LLC and L2 domains clearly belong to
the per CPU topology information.

Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU
data and the related exports.

This also paves the way to do proper topology evaluation during early boot
because it removes the only per CPU dependency for that.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.803864641@linutronix.de
2023-10-10 14:38:18 +02:00
Thomas Gleixner
22dc963162 x86/cpu: Move logical package and die IDs into topology info
Yet another topology related data pair. Rename logical_proc_id to
logical_pkg_id so it fits the common naming conventions.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.745139505@linutronix.de
2023-10-10 14:38:18 +02:00
Sandipan Das
25e5684782 perf/x86/amd/uncore: Add memory controller support
Unified Memory Controller (UMC) events were introduced with Zen 4 as a
part of the Performance Monitoring Version 2 (PerfMonV2) enhancements.
An event is specified using the EventSelect bits and the RdWrMask bits
can be used for additional filtering of read and write requests.

As of now, a maximum of 12 channels of DDR5 are available on each socket
and each channel is controlled by a dedicated UMC. Each UMC, in turn,
has its own set of performance monitoring counters.

Since the MSR address space for the UMC PERF_CTL and PERF_CTR registers
are reused across sockets, uncore groups are created on the basis of
socket IDs. Hence, group exclusivity is mandatory while opening events
so that events for an UMC can only be opened on CPUs which are on the
same socket as the corresponding memory channel.

For each socket, the total number of available UMC counters and active
memory channels are determined from CPUID leaf 0x80000022 EBX and ECX
respectively. Usually, on Zen 4, each UMC has four counters.

MSR assignments are determined on the basis of active UMCs. E.g. if
UMCs 1, 4 and 9 are active for a given socket, then

  * UMC 1 gets MSRs 0xc0010800 to 0xc0010807 as PERF_CTLs and PERF_CTRs
  * UMC 4 gets MSRs 0xc0010808 to 0xc001080f as PERF_CTLs and PERF_CTRs
  * UMC 9 gets MSRs 0xc0010810 to 0xc0010817 as PERF_CTLs and PERF_CTRs

If there are sockets without any online CPUs when the amd_uncore driver
is loaded, UMCs for such sockets will not be discoverable since the
mechanism relies on executing the CPUID instruction on an online CPU
from the socket.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b25f391205c22733493abec1ed850b71784edc5f.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:25 +02:00
Sandipan Das
83a43c6221 perf/x86/amd/uncore: Add group exclusivity
In some cases, it may be necessary to restrict opening PMU events to a
subset of CPUs. E.g. Unified Memory Controller (UMC) PMUs are specific
to each active memory channel and the MSR address space for the PERF_CTL
and PERF_CTR registers is reused on each socket. Thus, opening events
for a specific UMC PMU should be restricted to CPUs belonging to the
same socket as that of the UMC. The "cpumask" of the PMU should also
reflect this accordingly.

Uncore PMUs which require this can use the new group attribute in struct
amd_uncore_pmu to set a valid group ID during the scan() phase. Later,
during init(), an uncore context for a CPU will be unavailable if the
group ID does not match.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/937d6d71010a48ea4e069f4904b3116a5f99ecdf.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:24 +02:00
Sandipan Das
7ef0343855 perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
Not all uncore PMUs may support the use of the RDPMC instruction for
reading counters. In such cases, read the count from the corresponding
PERF_CTR register using the RDMSR instruction.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/e9d994e32a3fcb39fa59fcf43ab4260d11aba097.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:24 +02:00
Sandipan Das
07888daa05 perf/x86/amd/uncore: Move discovery and registration
Uncore PMUs have traditionally been registered in the module init path.
This is fine for the existing DF and L3 PMUs since the CPUID information
does not vary across CPUs but not for the memory controller (UMC) PMUs
since information like active memory channels can vary for each socket
depending on how the DIMMs have been physically populated.

To overcome this, the discovery of PMU information using CPUID is moved
to the startup of UNCORE_STARTING. This cannot be done in the startup of
UNCORE_PREP since the hotplug callback does not run on the CPU that is
being brought online.

Previously, the startup of UNCORE_PREP was used for allocating uncore
contexts following which, the startup of UNCORE_STARTING was used to
find and reuse an existing sibling context, if possible. Any unused
contexts were added to a list for reclaimation later during the startup
of UNCORE_ONLINE.

Since all required CPUID info is now available only after the startup of
UNCORE_STARTING has completed, context allocation has been moved to the
startup of UNCORE_ONLINE. Before allocating contexts, the first CPU that
comes online has to take up the additional responsibility of registering
the PMUs. This is a one-time process though. Since sibling discovery now
happens prior to deciding whether a new context is required, there is no
longer a need to track and free up unused contexts.

The teardown of UNCORE_ONLINE and UNCORE_PREP functionally remain the
same.

Overall, the flow of control described above is achieved using the
following handlers for managing uncore PMUs. It is mandatory to define
them for each type of uncore PMU.

  * scan() runs during startup of UNCORE_STARTING and collects PMU info
    using CPUID.

  * init() runs during startup of UNCORE_ONLINE, registers PMUs and sets
    up uncore contexts.

  * move() runs during teardown of UNCORE_ONLINE and migrates uncore
    contexts to a shared sibling, if possible.

  * free() runs during teardown of UNCORE_PREP and frees up uncore
    contexts.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/e6c447e48872fcab8452e0dd81b1c9cb09f39eb4.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:23 +02:00
Sandipan Das
d6389d3ccc perf/x86/amd/uncore: Refactor uncore management
Since struct amd_uncore is used to manage per-cpu contexts, rename it to
amd_uncore_ctx in order to better reflect its purpose. Add a new struct
amd_uncore_pmu to encapsulate all attributes which are shared by per-cpu
contexts for a corresponding PMU. These include the number of counters,
active mask, MSR and RDPMC base addresses, etc. Since the struct pmu is
now embedded, the corresponding amd_uncore_pmu for a given event can be
found by simply using container_of().

Finally, move all PMU-specific code to separate functions. While the
original event management functions continue to provide the base
functionality, all PMU-specific quirks and customizations are applied in
separate functions.

The motivation is to simplify the management of uncore PMUs.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/24b38c49a5dae65d8c96e5d75a2b96ae97aaa651.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:23 +02:00
Tero Kristo
05276d4831 perf/x86/cstate: Allow reading the package statistics from local CPU
The MSR registers for reading the package residency counters are
available on every CPU of the package. To avoid doing unnecessary SMP
calls to read the values for these from the various CPUs inside a
package, allow reading them from any CPU of the package.

Suggested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230912124432.3616761-2-tero.kristo@linux.intel.com
2023-10-09 16:12:22 +02:00
Lucy Mielke
38cd5b6a87 perf/x86/intel/pt: Fix kernel-doc comments
Some parameters or return codes were either wrong or missing,
update them.

Signed-off-by: Lucy Mielke <lucymielke@icloud.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/ZSOjQW3e2nJR4bAo@fedora.fritz.box
2023-10-09 12:26:22 +02:00
JP Kobryn
e53899771a perf/x86/lbr: Filter vsyscall addresses
We found that a panic can occur when a vsyscall is made while LBR sampling
is active. If the vsyscall is interrupted (NMI) for perf sampling, this
call sequence can occur (most recent at top):

    __insn_get_emulate_prefix()
    insn_get_emulate_prefix()
    insn_get_prefixes()
    insn_get_opcode()
    decode_branch_type()
    get_branch_type()
    intel_pmu_lbr_filter()
    intel_pmu_handle_irq()
    perf_event_nmi_handler()

Within __insn_get_emulate_prefix() at frame 0, a macro is called:

    peek_nbyte_next(insn_byte_t, insn, i)

Within this macro, this dereference occurs:

    (insn)->next_byte

Inspecting registers at this point, the value of the next_byte field is the
address of the vsyscall made, for example the location of the vsyscall
version of gettimeofday() at 0xffffffffff600000. The access to an address
in the vsyscall region will trigger an oops due to an unhandled page fault.

To fix the bug, filtering for vsyscalls can be done when
determining the branch type. This patch will return
a "none" branch if a kernel address if found to lie in the
vsyscall region.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
2023-10-08 12:25:18 +02:00
Kees Cook
a56d5551e1 perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS=y (for
array indexing) and CONFIG_FORTIFY_SOURCE=y (for strcpy/memcpy-family
functions).

Found with Coccinelle:

  https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]

Add __counted_by for 'struct rapl_pmus'.

No change in functionality intended.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20231006201754.work.473-kees@kernel.org
2023-10-08 12:18:17 +02:00
David Reaver
618e77d774 perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
Change 0 to NULL when initializing the test field of perf_msr structs to
avoid the following sparse warnings:

  make C=2 arch/x86/events/rapl.o

  CHECK   arch/x86/events/rapl.c
  ...
  arch/x86/events/rapl.c:540:59: warning: Using plain integer as NULL pointer
  arch/x86/events/rapl.c:542:59: warning: Using plain integer as NULL pointer
  arch/x86/events/rapl.c:543:59: warning: Using plain integer as NULL pointer
  arch/x86/events/rapl.c:544:59: warning: Using plain integer as NULL pointer

Signed-off-by: David Reaver <me@davidreaver.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230801155651.108076-1-me@davidreaver.com
2023-10-03 21:25:56 +02:00
Uros Bizjak
bcc6ec3d95 perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
Use local64_try_cmpxchg() instead of local64_cmpxchg(*ptr, old, new) == old.

X86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after CMPXCHG (and related move instruction in front of CMPXCHG).

Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG
fails. There is no need to re-read the value in the loop.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230807145134.3176-2-ubizjak@gmail.com
2023-10-03 21:13:45 +02:00
Uros Bizjak
1ce19bf90b perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
According to the following commit:

   f5fe24ef17 ("lockref: stop doing cpu_relax in the cmpxchg loop")

   "On the x86-64 architecture even a failing cmpxchg grants exclusive
    access to the cacheline, making it preferable to retry the failed op
    immediately instead of stalling with the pause instruction."

Based on the above observation, remove cpu_relax() from the
local64_cmpxchg() loop of rapl_event_update().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230807145134.3176-1-ubizjak@gmail.com
2023-10-03 21:13:23 +02:00
Ingo Molnar
de80193308 Linux 6.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmUZ4WEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGnIYH/07zef2U1nlqI+ro
 HRL2GlWGIs9yE70Oax+A3eYUYsjJIPu0yiDhFHUgOV3VyAALo44ZX/WNwKCGsI3e
 zhuOeItyyVcLGZXVC/jxSu0uveyfEiEYIWRYGyQ6Sna8Ksdk/qwhNgQNotdWdQG5
 7xt8z32couglu0uOkxcGqjTxmbjO6WSM5qi7Ts+xLsgrcS5cRuNhAg/vezp9bfeL
 1IUieCih4RJFgar/6LPOiB8uoVXEBonVbtlTRRqYdnqcsSIC+ACR9ZFk/+X88b5z
 S+Ta5VTcOAPu+2M/lSGe+PlUECvoBNK0SIYnaVCP2paPmDxfDXOFvSy/qJE87/7L
 9BeonFw=
 =8FTr
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc4' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-03 09:32:25 +02:00
Breno Leitao
599522d9d2 perf/x86/amd: Do not WARN() on every IRQ
Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
handler, as shown below, several times while perf runs. A simple
`perf top` run is enough to render the system unusable:

  WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0

This happens because the Performance Counter Global Status Register
(PerfCntGlobalStatus) has one or more bits set which are considered
reserved according to the "AMD64 Architecture Programmer’s Manual,
Volume 2: System Programming, 24593":

  https://www.amd.com/system/files/TechDocs/24593.pdf

To make this less intrusive, warn just once if any reserved bit is set
and prompt the user to update the microcode. Also sanitize the value to
what the code is handling, so that the overflow events continue to be
handled for the number of counters that are known to be sane.

Going forward, the following microcode patch levels are recommended
for Zen 4 processors in order to avoid such issues with reserved bits:

  Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
  Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
  Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
  Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212

Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
the linux-firmware tree has binaries that meet the minimum required
patch levels.

  [ sandipan: - add message to prompt users to update microcode
              - rework commit message and call out required microcode levels ]

Fixes: 7685665c39 ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/3540f985652f41041e54ee82aa53e7dbd55739ae.1694696888.git.sandipan.das@amd.com/
2023-09-25 11:30:31 +02:00
Sandipan Das
23d2626b84 perf/x86/amd/core: Fix overflow reset on hotplug
Kernels older than v5.19 do not support PerfMonV2 and the PMI handler
does not clear the overflow bits of the PerfCntrGlobalStatus register.
Because of this, loading a recent kernel using kexec from an older
kernel can result in inconsistent register states on Zen 4 systems.

The PMI handler of the new kernel gets confused and shows a warning when
an overflow occurs because some of the overflow bits are set even if the
corresponding counters are inactive. These are remnants from overflows
that were handled by the older kernel.

During CPU hotplug, the PerfCntrGlobalCtl and PerfCntrGlobalStatus
registers should always be cleared for PerfMonV2-capable processors.
However, a condition used for NB event constaints applicable only to
older processors currently prevents this from happening. Move the reset
sequence to an appropriate place and also clear the LBR Freeze bit.

Fixes: 21d59e3e2c ("perf/x86/amd/core: Detect PerfMonV2 support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/882a87511af40792ba69bb0e9026f19a2e71e8a3.1694696888.git.sandipan.das@amd.com
2023-09-22 12:05:14 +02:00
Kan Liang
ffbe4ab0be perf/x86/intel: Extend the ref-cycles event to GP counters
The current ref-cycles event is only available on the fixed counter 2.
Starting from the GLC and GRT core, the architectural UnHalted Reference
Cycles event (0x013c) which is available on general-purpose counters
can collect the exact same events as the fixed counter 2.

Update the mapping of ref-cycles to 0x013c. So the ref-cycles can be
available on both fixed counter 2 and general-purpose counters.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230911144139.2354015-1-kan.liang@linux.intel.com
2023-09-12 08:22:24 +02:00
Kan Liang
950ecdc672 perf/x86/intel: Fix broken fixed event constraints extension
Unnecessary multiplexing is triggered when running an "instructions"
event on an MTL.

perf stat -e cpu_core/instructions/,cpu_core/instructions/ -a sleep 1

 Performance counter stats for 'system wide':

       115,489,000      cpu_core/instructions/                (50.02%)
       127,433,777      cpu_core/instructions/                (49.98%)

       1.002294504 seconds time elapsed

Linux architectural perf events, e.g., cycles and instructions, usually
have dedicated fixed counters. These events also have equivalent events
which can be used in the general-purpose counters. The counters are
precious. In the intel_pmu_check_event_constraints(), perf check/extend
the event constraints of these events. So these events can utilize both
fixed counters and general-purpose counters.

The following cleanup commit:

  97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

forgot adding the intel_pmu_check_event_constraints() into update_pmu_cap().
The architectural perf events cannot utilize the general-purpose counters.

The code to check and update the counters, event constraints and
extra_regs is the same among hybrid systems. Move
intel_pmu_check_hybrid_pmus() to init_hybrid_pmu(), and
emove the duplicate check in update_pmu_cap().

Fixes: 97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230911135128.2322833-1-kan.liang@linux.intel.com
2023-09-12 08:22:24 +02:00
Kan Liang
6f7f984fa8 perf/x86/uncore: Correct the number of CHAs on EMR
Starting from SPR, the basic uncore PMON information is retrieved from
the discovery table (resides in an MMIO space populated by BIOS). It is
called the discovery method. The existing value of the type->num_boxes
is from the discovery table.

On some SPR variants, there is a firmware bug that makes the value from the
discovery table incorrect. We use the value from the
SPR_MSR_UNC_CBO_CONFIG MSR to replace the one from the discovery table:

   38776cc45e ("perf/x86/uncore: Correct the number of CHAs on SPR")

Unfortunately, the SPR_MSR_UNC_CBO_CONFIG isn't available for the EMR
XCC (Always returns 0), but the above firmware bug doesn't impact the
EMR XCC.

Don't let the value from the MSR replace the existing value from the
discovery table.

Fixes: 38776cc45e ("perf/x86/uncore: Correct the number of CHAs on SPR")
Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20230905134248.496114-1-kan.liang@linux.intel.com
2023-09-05 21:50:21 +02:00
Kan Liang
97588df87b perf/x86/intel: Add common intel_pmu_init_hybrid()
The current hybrid initialization codes aren't well organized and are
hard to read.

Factor out intel_pmu_init_hybrid() to do a common setup for each
hybrid PMU. The PMU-specific capability will be updated later via either
hard code (ADL) or CPUID hybrid enumeration (MTL).

Splitting the ADL and MTL initialization codes, since they have
different uarches. The hard code PMU capabilities are not required for
MTL either. They can be enumerated by the new leaf 0x23 and
IA32_PERF_CAPABILITIES MSR.

The hybrid enumeration of the IA32_PERF_CAPABILITIES MSR is broken on
MTL. Using the default value.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-7-kan.liang@linux.intel.com
2023-08-29 20:59:23 +02:00
Kan Liang
b0560bfd4b perf/x86/intel: Clean up the hybrid CPU type handling code
There is a fairly long list of grievances about the current code. The
main beefs:

   1. hybrid_big_small assumes that the *HARDWARE* (CPUID) provided
      core types are a bitmap. They are not. If Intel happened to
      make a core type of 0xff, hilarity would ensue.
   2. adl_get_hybrid_cpu_type() utterly inscrutable.  There are
      precisely zero comments and zero changelog about what it is
      attempting to do.

According to Kan, the adl_get_hybrid_cpu_type() is there because some
Alder Lake (ADL) CPUs can do some silly things. Some ADL models are
*supposed* to be hybrid CPUs with big and little cores, but there are
some SKUs that only have big cores. CPUID(0x1a) on those CPUs does
not say that the CPUs are big cores. It apparently just returns 0x0.
It confuses perf because it expects to see either 0x40 (Core) or
0x20 (Atom).

The perf workaround for this is to watch for a CPU core saying it is
type 0x0. If that happens on an Alder Lake, it calls
x86_pmu.get_hybrid_cpu_type() and just assumes that the core is a
Core (0x40) CPU.

To fix up the mess, separate out the CPU types and the 'pmu' types.
This allows 'hybrid_pmu_type' bitmaps without worrying that some
future CPU type will set multiple bits.

Since the types are now separate, add a function to glue them back
together again. Actual comment on the situation in the glue
function (find_hybrid_pmu_for_cpu()).

Also, give ->get_hybrid_cpu_type() a real return type and make it
clear that it is overriding the *CPU* type, not the PMU type.

Rename cpu_type to pmu_type in the struct x86_hybrid_pmu to reflect the
change.

Originally-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-6-kan.liang@linux.intel.com
2023-08-29 20:59:23 +02:00
Kan Liang
299a5fc8e7 perf/x86/intel: Apply the common initialization code for ADL
Use the intel_pmu_init_glc() and intel_pmu_init_grt() to replace the
duplicate code for ADL.

The current code already checks the PERF_X86_EVENT_TOPDOWN flag before
invoking the Topdown metrics functions. (The PERF_X86_EVENT_TOPDOWN flag
is to indicate the Topdown metric feature, which is only available for
the p-core.) Drop the unnecessary adl_set_topdown_event_period() and
adl_update_topdown_event().

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-5-kan.liang@linux.intel.com
2023-08-29 20:59:23 +02:00
Kan Liang
d87d221f85 perf/x86/intel: Factor out the initialization code for ADL e-core
From PMU's perspective, the ADL e-core and newer SRF/GRR have a similar
uarch. Most of the initialization code can be shared.

Factor out intel_pmu_init_grt() for the common initialization code.
The common part of the ADL e-core will be replaced by the later patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-4-kan.liang@linux.intel.com
2023-08-29 20:59:22 +02:00
Kan Liang
0ba0c03528 perf/x86/intel: Factor out the initialization code for SPR
The SPR and ADL p-core have a similar uarch. Most of the initialization
code can be shared.

Factor out intel_pmu_init_glc() for the common initialization code.
The common part of the ADL p-core will be replaced by the later patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-3-kan.liang@linux.intel.com
2023-08-29 20:59:22 +02:00
Kan Liang
d4b5694c75 perf/x86/intel: Use the common uarch name for the shared functions
From PMU's perspective, the SPR/GNR server has a similar uarch to the
ADL/MTL client p-core. Many functions are shared. However, the shared
function name uses the abbreviation of the server product code name,
rather than the common uarch code name.

Rename these internal shared functions by the common uarch name.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829125806.3016082-2-kan.liang@linux.intel.com
2023-08-29 20:59:22 +02:00
Linus Torvalds
1a7c611546 Perf events changes for v6.6:
- AMD IBS improvements
 - Intel PMU driver updates
 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
 - Micro-optimize software events and the ring-buffer code
 - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtBscRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hHoQ/+IBQ8Xi/rcdd40n8OqEB/VBWVuSjNT3uN
 3pHHcTl2Pio9CxBeat42NekNijlRILCKJrZ3Lt3JWBmWyWv5l3KFabelj+lDF2xa
 TVCjTnQNe1+HvrODYnF4ECIs5vaoMVjcJ9jg8+VDgAcOQr1nZs4m5TVAd6TLqPpV
 urBEQVULkkzk7ZRhfrugKhw+wrpWFefgGCx0RV8ijZB7TLMHc2wE+Q/sTxKdKceL
 wNaJaDgV33pZh0aImwR9pKUE532hF1FiBdLuehkh61PZa1L82jzAX1xjw2s1hSa4
 eIWemPHJIYfivRlENbJsDWc4N8gk6ijVHwrxGcr4Axu+NN+zPtQ3ddhaGMAyKdTo
 qUKXH3MZSMIl++jI5Fkc6xM+XLvY1rML62epSzMwu/cc7Z5MeyWdQcri0N9YFuO7
 wUUNnFpU00lwQBLbyyUQ3Zi8E0QV7NuPW4axTkmntiIjMpLagaEvVSf6nf8qLpbE
 WTT16s707t19hUZNazNZ7ONmhly4ALbHFQEH65J2KoYn99fYqy9z68Hwk+xnmykw
 bc3qvfhpw0MImQQ+DqHiBwb4n4UuvY2WlkkZI3FfNeSG63DaM2mZikfpElpXYjn6
 9iOIXvx21Wiq/n0cbLhidI2q/ZzFCzYLCk6ikZ320wb+rhvd7EoSlZil6QSzn3pH
 Qdk+NEZgWQY=
 =ZT6+
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Ilpo Järvinen
2c65477f14 perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
If err == 0, pcibios_err_to_errno(err) returns 0 so the ?: construct
can be removed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230824132832.78705-15-ilpo.jarvinen@linux.intel.com
2023-08-24 21:25:24 +02:00
Kan Liang
a430021faa perf/x86/intel: Add Crestmont PMU
The Grand Ridge and Sierra Forest are successors to Snow Ridge. They
both have Crestmont core. From the core PMU's perspective, they are
similar to the e-core of MTL. The only difference is the LBR event
logging feature, which will be implemented in the following patches.

Create a non-hybrid PMU setup for Grand Ridge and Sierra Forest.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20230522113040.2329924-1-kan.liang@linux.intel.com
2023-08-09 21:51:07 +02:00
Peter Zijlstra
882cdb06b6 x86/cpu: Fix Gracemont uarch
Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-09 21:51:06 +02:00
James Clark
4b36873b4a perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
Since commit bd27568117 ("perf: Rewrite core context handling") the
relationship between perf_event_context and PMUs has changed so that
the error scenario that PERF_PMU_CAP_HETEROGENEOUS_CPUS originally
silenced no longer exists.

Remove the capability to avoid confusion that it actually influences
any perf core behavior. This change should be a no-op.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230724134500.970496-3-james.clark@arm.com
2023-07-26 12:28:46 +02:00
Namhyung Kim
8bfc20baa9 perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
limitations. Mainly:

1. mem_lvl_num doesn't allow setting multiple sources whereas old API
   allows it. Setting multiple data sources is useful because IBS on
   pre-zen4 uarch doesn't provide fine granular DataSrc details (there
   is only one such DataSrc(2h) though).
2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
   c2c (c2c_decode_stats()) does not use mem_lvl_num at all.

1st one can be handled using ANY_CACHE with HOPS_0. 2nd is purely perf
tool specific issue and should be fixed separately.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230725150206.184-4-ravi.bangoria@amd.com
2023-07-26 12:28:45 +02:00
Uros Bizjak
4c1c9dea20 perf/x86: Use local64_try_cmpxchg
Use local64_try_cmpxchg instead of local64_cmpxchg (*ptr, old, new) == old.
x86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after cmpxchg (and related move instruction in front of cmpxchg).

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.

No functional change intended.

Cc. "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230706141720.2672-1-ubizjak@gmail.com
2023-07-10 09:52:35 +02:00
Ravi Bangoria
7c2128235e perf/amd: Prevent grouping of IBS events
IBS PMUs can have only one event active at any point in time. Restrict
grouping of multiple IBS events.

Reported-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230620091603.269-1-ravi.bangoria@amd.com
2023-07-10 09:52:34 +02:00
Namhyung Kim
27c68c216e perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
On SPR, the load latency event needs an auxiliary event in the same
group to work properly.  There's a check in intel_pmu_hw_config()
for this to iterate sibling events and find a mem-loads-aux event.

The for_each_sibling_event() has a lockdep assert to make sure if it
disabled hardirq or hold leader->ctx->mutex.  This works well if the
given event has a separate leader event since perf_try_init_event()
grabs the leader->ctx->mutex to protect the sibling list.  But it can
cause a problem when the event itself is a leader since the event is
not initialized yet and there's no ctx for the event.

Actually I got a lockdep warning when I run the below command on SPR,
but I guess it could be a NULL pointer dereference.

  $ perf record -d -e cpu/mem-loads/uP true

The code path to the warning is:

  sys_perf_event_open()
    perf_event_alloc()
      perf_init_event()
        perf_try_init_event()
          x86_pmu_event_init()
            hsw_hw_config()
              intel_pmu_hw_config()
                for_each_sibling_event()
                  lockdep_assert_event_ctx()

We don't need for_each_sibling_event() when it's a standalone event.
Let's return the error code directly.

Fixes: f3c0eba287 ("perf: Add a few assertions")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230704181516.3293665-1-namhyung@kernel.org
2023-07-10 09:52:20 +02:00
Linus Torvalds
a193cc7506 Perf events changes for v6.5:
- Rework & fix the event forwarding logic by extending the
   core interface. This fixes AMD PMU events that have to
   be forwarded from the core PMU to the IBS PMU.
 
 - Add self-tests to test AMD IBS invocation via core PMU events
 
 - Clean up Intel FixCntrCtl MSR encoding & handling
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSayC0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jlWxAAqUPtfst1p6H5sSyCBPYo5Y/Rh0SyyqJj
 w0YZ8p2nbB/+EmIN3WS1uYhx1/AalTP254q2BgVF4DlDFQA1MlJCmSNJ9HhtzOgt
 mbpNKzy50cQCR/iH+s3ldcFsLGhSG07j6w8xeb6BGiABm2JoiZeg6iVU76zRe5A1
 iPnjC7qoqjKH+sq8pu32fBClMjzf05/LGMd0MqFuYfl5950xRW61olstjo93XWgK
 O5z+5wm5H3MhJ2mzU6x+0C/xurIEQ0zRf6AqLbFp41BbJJJORgTCK746flghiqd5
 DiADc7oj9eOqL1X9jFPHgE07T/6QPrMC8BoH64pOcM3PoZ6Iq3zTkUHxAw3qK5j+
 kqduxzlVaFLFnf7R/vxUvjMg1PM+qP3pqgCrT+NFUdqsdLgSPxRzt5pAM6aAUwmU
 1lhuapESH44RUFZGWrfOwzQE5q/FDmUc2yGyGW2aYDmwkclNjVpnvHEJrQMugI3M
 M3/y9a+ErcPDUJfHcodutBDGw9l7VhsxJFMt4ydOTkNbEfZLbi2TzNapui6SKFja
 G2efrB/HhrV9nE+21Wfa3uxoKMuJ/UPiGrVr2qyGOnShQpK7sdyGDshO1s6TTPye
 OoVf9I0LhewMPap52SU/KDP7GJVPW1BhL/C7w6OSnXxlS5k4lOji7z4Dj2hqXHib
 19Jm7BhqZwE=
 =xn05
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:

 - Rework & fix the event forwarding logic by extending the core
   interface.

   This fixes AMD PMU events that have to be forwarded from the
   core PMU to the IBS PMU.

 - Add self-tests to test AMD IBS invocation via core PMU events

 - Clean up Intel FixCntrCtl MSR encoding & handling

* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Re-instate the linear PMU search
  perf/x86/intel: Define bit macros for FixCntrCtl MSR
  perf test: Add selftest to test IBS invocation via core pmu events
  perf/core: Remove pmu linear searching code
  perf/ibs: Fix interface via core pmu events
  perf/core: Rework forwarding of {task|cpu}-clock events
2023-06-27 14:43:02 -07:00
Kan Liang
a6742cb90b perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL
When counting a FRONTEND event, the MSR_PEBS_FRONTEND is not correctly
set on GNR and MTL p-core.

The umask value for the FRONTEND events is changed on GNR and MTL. The
new umask is missing in the extra_regs[] table.

Add a dedicated intel_gnr_extra_regs[] for GNR and MTL p-core.

Fixes: bc4000fdb0 ("perf/x86/intel: Add Granite Rapids")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230615173242.3726364-1-kan.liang@linux.intel.com
2023-06-16 16:46:33 +02:00
Kan Liang
38776cc45e perf/x86/uncore: Correct the number of CHAs on SPR
The number of CHAs from the discovery table on some SPR variants is
incorrect, because of a firmware issue. An accurate number can be read
from the MSR UNC_CBO_CONFIG.

Fixes: 949b11381f ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230508140206.283708-1-kan.liang@linux.intel.com
2023-05-24 22:19:41 +02:00
Like Xu
3c845304d2 perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
After commit b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing
PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not
supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause
the VMX hardware MSR switching mechanism to save/restore invalid values
for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest.
Fix it by using the active host value from cpuc->active_pebs_data_cfg.

Fixes: b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
2023-05-23 10:01:13 +02:00
Dapeng Mi
10d95a317e perf/x86/intel: Define bit macros for FixCntrCtl MSR
Define bit macros for FixCntrCtl MSR and replace the bit hardcoding
with these bit macros. This would make code be more human-readable.

Perf commands 'perf stat -e "instructions,cycles,ref-cycles"' and
'perf record -e "instructions,cycles,ref-cycles"' pass.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230504072128.3653470-1-dapeng1.mi@linux.intel.com
2023-05-08 10:58:32 +02:00
Ravi Bangoria
2fad201fe3 perf/ibs: Fix interface via core pmu events
Although, IBS pmus can be invoked via their own interface, indirect
IBS invocation via core pmu events is also supported with fixed set
of events: cpu-cycles:p, r076:p (same as cpu-cycles:p) and r0C1:p
(micro-ops) for user convenience.

This indirect IBS invocation is broken since commit 66d258c5b0
("perf/core: Optimize perf_init_event()"), which added RAW pmu under
'pmu_idr' list and thus if event_init() fails with RAW pmu, it started
returning error instead of trying other pmus.

Forward precise events from core pmu to IBS by overwriting 'type' and
'config' in the kernel copy of perf_event_attr. Overwriting will cause
perf_init_event() to retry with updated 'type' and 'config', which will
automatically forward event to IBS pmu.

Without patch:
  $ sudo ./perf record -C 0 -e r076:p -- sleep 1
  Error:
  The r076:p event is not supported.

With patch:
  $ sudo ./perf record -C 0 -e r076:p -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.341 MB perf.data (37 samples) ]

Fixes: 66d258c5b0 ("perf/core: Optimize perf_init_event()")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230504110003.2548-3-ravi.bangoria@amd.com
2023-05-08 10:58:30 +02:00
Kan Liang
b752ea0c28 perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
Several similar kernel warnings can be triggered,

  [56605.607840] CPU0 PEBS record size 0, expected 32, config 0 cpuc->record_size=208

when the below commands are running in parallel for a while on SPR.

  while true;
  do
	perf record --no-buildid -a --intr-regs=AX  \
		    -e cpu/event=0xd0,umask=0x81/pp \
		    -c 10003 -o /dev/null ./triad;
  done &

  while true;
  do
	perf record -o /tmp/out -W -d \
		    -e '{ld_blocks.store_forward:period=1000000, \
                         MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}' \
		    -c 1037 ./triad;
  done

The triad program is just the generation of loads/stores.

The warnings are triggered when an unexpected PEBS record (with a
different config and size) is found.

A system-wide PEBS event with the large PEBS config may be enabled
during a context switch. Some PEBS records for the system-wide PEBS
may be generated while the old task is sched out but the new one
hasn't been sched in yet. When the new task is sched in, the
cpuc->pebs_record_size may be updated for the per-task PEBS events. So
the existing system-wide PEBS records have a different size from the
later PEBS records.

The PEBS buffer should be flushed right before the hardware is
reprogrammed. The new size and threshold should be updated after the
old buffer has been flushed.

Reported-by: Stephane Eranian <eranian@google.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230421184529.3320912-1-kan.liang@linux.intel.com
2023-05-08 10:58:27 +02:00
Namhyung Kim
90befef5a9 perf/x86: Fix missing sample size update on AMD BRS
It missed to convert a PERF_SAMPLE_BRANCH_STACK user to call the new
perf_sample_save_brstack() helper in order to update the dyn_size.
This affects AMD Zen3 machines with the branch-brs event.

Fixes: eb55b455ef ("perf/core: Add perf_sample_save_brstack() helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230427030527.580841-1-namhyung@kernel.org
2023-05-08 10:58:26 +02:00
Linus Torvalds
7c339778f9 Perf changes for v6.4:
- Add Intel Granite Rapids support
 
  - Add uncore events for Intel SPR IMC PMU
 
  - Fix perf IRQ throttling bug
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK3GoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hqghAAwZyNLY8oAu/izkp5DKML7okLa48nh1+K
 JWsM4GT16ldx6MBqxX4V7tiYfAeo6ydCkd/LbxPklAk4Fmcwt5HWotOEGHab7N7B
 iTOix481TIa+E7ERuDLU5vS0chtdE5CrVfmBwtkI4WEv0c8vBwQHvRzy6RaoGG+2
 XSqwFhkDG7l6N6rIz/V8zJKFo0hNPou0bllYJMM+A+BRdOthKhgXNjM6yuuKXgst
 TWxByDCoAG59uWvzq3WmKRLk/bJ4VC2lRDFpo01xtPvn1hA/Bs8MDF1940NG8uEh
 u7aTI1ZyaJJnnxaTk98r/aZIMYlHHWIIKnymtVFT52ZLeKARDt6MmX5ttec16Cd1
 a5IXuiWPkiSfGlmN7Q4sK89eg606WQ/i+eIKfNMG0ZruvKFiT0uD+gNUlQvE/4fk
 6OgJot/iF2B5+UWcBmEOMYV4jagSWLFkIQ3vGVT6E3Zp8gEa1/R3ek1aSbQeb8Gi
 kvMjcwthUNOgGIXrFKbdDTN/3seoNmrgIP2wetUzXhVaAOIYz6mpxIcXLc+vNiDh
 soTRtkLgNTkrVQ8AKp7snKNRLjazL6CqUd+RILH4tM+bshFnyeH6K4Eck+Fo//Dn
 XtEHry6nJ0ZPhsJPRwFHOoYCAuDQdy5DG7QQ07qXYW9DyJV7ZmefDJYOEjgcip6G
 f3/sUe7ekRA=
 =41Sa
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Add Intel Granite Rapids support

 - Add uncore events for Intel SPR IMC PMU

 - Fix perf IRQ throttling bug

* tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add events for Intel SPR IMC PMU
  perf/core: Fix hardlockup failure caused by perf throttle
  perf/x86/cstate: Add Granite Rapids support
  perf/x86/msr: Add Granite Rapids
  perf/x86/intel: Add Granite Rapids
2023-04-28 14:41:53 -07:00
Stephane Eranian
743767d6f6 perf/x86/intel/uncore: Add events for Intel SPR IMC PMU
Add missing clockticks and cas_count_* events for Intel SapphireRapids IMC
PMU. These events are useful to measure memory bandwidth.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230419214241.2310385-1-eranian@google.com
2023-04-21 13:24:23 +02:00
Artem Bityutskiy
872d28001b perf/x86/cstate: Add Granite Rapids support
Granite Rapids Xeon is successor or Emerald Rapids Xeon, and it will use
the same C-state residency counters as Emerald Rapids (and previous
Xeons, all the way back to Ice Lake Xeon).

Add Granite Rapids Xeon support.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230314170041.2967712-3-kan.liang@linux.intel.com
2023-03-21 14:43:08 +01:00
Kan Liang
5a796d5cb5 perf/x86/msr: Add Granite Rapids
The same as Sapphire Rapids, the SMI_COUNT MSR is also supported on
Granite Rapids. Add Granite Rapids model.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230314170041.2967712-2-kan.liang@linux.intel.com
2023-03-21 14:43:08 +01:00
Kan Liang
bc4000fdb0 perf/x86/intel: Add Granite Rapids
From core PMU's perspective, Granite Rapids is similar to the Sapphire
Rapids. The key differences include:

 - Doesn't need the AUX event workaround for the mem load event.
   (Implement in this patch).

 - Support Retire Latency (Has been implemented in the commit
   c87a31093c ("perf/x86: Support Retire Latency"))

 - The event list, which will be supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230314170041.2967712-1-kan.liang@linux.intel.com
2023-03-21 14:43:08 +01:00
Breno Leitao
263f5ecaf7 perf/x86/amd/core: Always clear status for idx
The variable 'status' (which contains the unhandled overflow bits) is
not being properly masked in some cases, displaying the following
warning:

  WARNING: CPU: 156 PID: 475601 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x216/0x270

This seems to be happening because the loop is being continued before
the status bit being unset, in case x86_perf_event_set_period()
returns 0. This is also causing an inconsistency because the "handled"
counter is incremented, but the status bit is not cleaned.

Move the bit cleaning together above, together when the "handled"
counter is incremented.

Fixes: 7685665c39 ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230321113338.1669660-1-leitao@debian.org
2023-03-21 14:43:05 +01:00
Linus Torvalds
49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds
1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Linus Torvalds
a2f0e7eee1 The latest perf updates in this cycle are:
- Optimize perf_sample_data layout
  - Prepare sample data handling for BPF integration
  - Update the x86 PMU driver for Intel Meteor Lake
  - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
    discovery breakage
  - Fix the x86 Zhaoxin PMU driver
  - Cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzaHgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jYQg/+KRfobCevMQlZVnz09T3SsJ4ahJ587BL6
 g2C6kobyUNfeChpFVroBkTR+yCb6Mq4xGr2nda9+2E978BYu9eanpx/u/bXNQ6NU
 6YhLwgRrlFXonYn07kFfUJeELZ0W+zpPvymEN1KhTQWcrgXDfXRt2VfMwNsVxGRF
 ZRyCWK+UOzSMU22FtW3I/xVLBB0vio9Y6wRC5QOpDVW5YtGwQGust7GJ53JPK43J
 m2soJvWORauT+v0aqc7ggOtKd6pahVoXrDrbktxtq9N0ZGI+PubVCGevex++cXm/
 B3QSf6VcMMuU6pfzxiEwRa8Whrc3XFeSDEfvMjC5v3becGNkdNBnGOJzYprwgRZJ
 irb6/dSrv5P2lj6WphsO1Wzcm7EoWh8M7DVOMh/13Y/oODRdOrv48112Don9UURC
 EPyvzAzizqdwdDopUmfiqUwuAXqb8uPZqCgmlz/NJkVz1/ijlfrmLgeDuf0vI7Aq
 HznzzRwjFHzyCH7D+rtonFh3JDaqgaouY76tpC5yTtzKbZPlFT8kzeCvqkTMnGgH
 czZnSNc/kBup0HDkNSlthK+TyrMXWKeVa8KQSY1E0NJHO4IBBCMzZywSoAaeofQK
 hqfQyofX9XHmuHhCA4yIfv1XkZGlBTxpPAyDdHjgs9iJTsodSYMs8ESY08eW8DXn
 Ld/35O6SylM=
 =ztUT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Optimize perf_sample_data layout

 - Prepare sample data handling for BPF integration

 - Update the x86 PMU driver for Intel Meteor Lake

 - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
   discovery breakage

 - Fix the x86 Zhaoxin PMU driver

 - Cleanups

* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86/intel/uncore: Add Meteor Lake support
  x86/perf/zhaoxin: Add stepping check for ZXC
  perf/x86/intel/ds: Fix the conversion from TSC to perf time
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
  perf/x86/uncore: Add a quirk for UPI on SPR
  perf/x86/uncore: Ignore broken units in discovery table
  perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
  perf/x86/uncore: Factor out uncore_device_to_die()
  perf/core: Call perf_prepare_sample() before running BPF
  perf/core: Introduce perf_prepare_header()
  perf/core: Do not pass header for sample ID init
  perf/core: Set data->sample_flags in perf_prepare_sample()
  perf/core: Add perf_sample_save_brstack() helper
  perf/core: Add perf_sample_save_raw_data() helper
  perf/core: Add perf_sample_save_callchain() helper
  perf/core: Save the dynamic parts of sample data size
  x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
  perf/core: Change the layout of perf_sample_data
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
  ...
2023-02-20 17:29:55 -08:00
Paolo Bonzini
33436335e9 KVM/riscv changes for 6.3
- Fix wrong usage of PGDIR_SIZE to check page sizes
 - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
 - Redirect illegal instruction traps to guest
 - SBI PMU support for guest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH
 LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M
 wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399
 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG
 Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd
 ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH
 IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i
 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK
 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2
 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse
 uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq
 yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ=
 =spUL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
2023-02-15 12:33:28 -05:00
Sean Christopherson
4b4191b8ae perf/x86: Refuse to export capabilities for hybrid PMUs
Now that KVM disables vPMU support on hybrid CPUs, WARN and return zeros
if perf_get_x86_pmu_capability() is invoked on a hybrid CPU.  The helper
doesn't provide an accurate accounting of the PMU capabilities for hybrid
CPUs and needs to be enhanced if KVM, or anything else outside of perf,
wants to act on the PMU capabilities.

Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15 08:25:44 -05:00
Kan Liang
c828441f21 perf/x86/intel/uncore: Add Meteor Lake support
The uncore subsystem for Meteor Lake is similar to the previous Alder
Lake. The main difference is that MTL provides PMU support for different
tiles, while ADL only provides PMU support for the whole package. On
ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they
are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides
a fixed counter for clockticks. Also, new MSR addresses are introduced
on MTL.

The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for
Meteor Lake.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
2023-02-11 12:39:03 +01:00
silviazhao
fd636b6a9b x86/perf/zhaoxin: Add stepping check for ZXC
Some of Nano series processors will lead GP when accessing
PMC fixed counter. Meanwhile, their hardware support for PMC
has not announced externally. So exclude Nano CPUs from ZXC
by checking stepping information. This is an unambiguous way
to differentiate between ZXC and Nano CPUs.

Following are Nano and ZXC FMS information:
Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D]
ZXC FMS:  Family=6, Model=F, Stepping=E-F OR
          Family=6, Model=0x19, Stepping=0-3

Fixes: 3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")

Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com>
Reported-by: Kevin Brace <kevinbrace@gmx.com>
Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
2023-02-11 11:18:12 +01:00
Kan Liang
89e97eb8ce perf/x86/intel/ds: Fix the conversion from TSC to perf time
The time order is incorrect when the TSC in a PEBS record is used.

 $perf record -e cycles:upp dd if=/dev/zero of=/dev/null
  count=10000
 $ perf script --show-task-events
       perf-exec     0     0.000000: PERF_RECORD_COMM: perf-exec:915/915
              dd   915   106.479872: PERF_RECORD_COMM exec: dd:915/915
              dd   915   106.483270: PERF_RECORD_EXIT(915:915):(914:914)
              dd   915   106.512429:          1 cycles:upp:
 ffffffff96c011b7 [unknown] ([unknown])
 ... ...

The perf time is from sched_clock_cpu(). The current PEBS code
unconditionally convert the TSC to native_sched_clock(). There is a
shift between the two clocks. If the TSC is stable, the shift is
consistent, __sched_clock_offset. If the TSC is unstable, the shift has
to be calculated at runtime.

This patch doesn't support the conversion when the TSC is unstable. The
TSC unstable case is a corner case and very unlikely to happen. If it
happens, the TSC in a PEBS record will be dropped and fall back to
perf_event_clock().

Fixes: 47a3aeb39e ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
2023-02-11 11:18:12 +01:00
Like Xu
13738a3647 perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models
According to Intel SDM, the EPT-friendly PEBS is supported by all the
platforms after ICX, ADL and the future platforms with PEBS format 5.

Currently the only in-kernel user of this capability is KVM, which has
very limited support for hybrid core pmu, so ADL and its successors do
not currently expose this capability. When both hybrid core and PEBS
format 5 are present, KVM will decide on its own merits.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221109082802.27543-4-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-01 16:42:36 -08:00
Ingo Molnar
57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Kan Liang
5d515ee40c perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
The kernel warning message is triggered, when SPR MCC is used.

[   17.945331] ------------[ cut here ]------------
[   17.946305] WARNING: CPU: 65 PID: 1 at
arch/x86/events/intel/uncore_discovery.c:184
intel_uncore_has_discovery_tables+0x4c0/0x65c
[   17.946305] Modules linked in:
[   17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted
5.4.17-2136.313.1-X10-2c+ #4

It's caused by the broken discovery table of UPI.

The discovery tables are from hardware. Except for dropping the broken
information, there is nothing Linux can do. Using WARN_ON_ONCE() is
overkilled.

Use the pr_info() to replace WARN_ON_ONCE(), and specify what uncore unit
is dropped and the reason.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-6-kan.liang@linux.intel.com
2023-01-21 00:06:13 +01:00
Kan Liang
65248a9a9e perf/x86/uncore: Add a quirk for UPI on SPR
The discovery table of UPI on some SPR variants, e.g., MCC, is broken.
The third UPI table may includes a wrong address which points to a
non-exists device. The bug impacts both UPI and M3UPI uncore PMON.

Use a pre-defined UPI and M3UPI table to replace the broken table.

Different BIOS may populate a device into a different domain or a
different BUS. The accurate location can only be retrieved at load time.
Add spr_update_device_location() to update the location of the UPI and
M3UPI in the pre-defined table.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-5-kan.liang@linux.intel.com
2023-01-21 00:06:13 +01:00
Kan Liang
bd9514a4d5 perf/x86/uncore: Ignore broken units in discovery table
Some units in a discovery table may be broken, e.g., UPI of SPR MCC.
A generic method is required to ignore the broken units.

Add uncore_units_ignore in the struct intel_uncore_init_fun, which
indicates the type ID of broken units. It will be assigned by the
platform-specific code later when the platform has a broken discovery
table.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-4-kan.liang@linux.intel.com
2023-01-21 00:06:12 +01:00
Kan Liang
3af548f236 perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
The current code assumes that the discovery table provides valid
box_ids for the normal units. It's not the case anymore since some units
in the discovery table are broken on some SPR variants.

Factor out uncore_get_box_id(). Check the existence of the type->box_ids
before using it. If it's not available, use pmu_idx.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-3-kan.liang@linux.intel.com
2023-01-21 00:06:12 +01:00
Kan Liang
dbf061b262 perf/x86/uncore: Factor out uncore_device_to_die()
The same code is used to retrieve the logical die ID with a given PCI
device in both the discovery code and the code that supports a system
with > 8 nodes.

Factor out uncore_device_to_die() to replace the duplicate code.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-2-kan.liang@linux.intel.com
2023-01-21 00:06:11 +01:00
Kan Liang
5a8a05f165 perf/x86/intel/cstate: Add Emerald Rapids
From the perspective of Intel cstate residency counters,
Emerald Rapids is the same as the Sapphire Rapids and Ice Lake.
Add Emerald Rapids model.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-2-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Kan Liang
6795e558e9 perf/x86/intel: Add Emerald Rapids
From core PMU's perspective, Emerald Rapids is the same as the Sapphire
Rapids. The only difference is the event list, which will be
supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-1-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Namhyung Kim
f6e707156e perf/core: Introduce perf_prepare_header()
Factor out perf_prepare_header() so that it can call
perf_prepare_sample() without a header if not needed.

Also it checks the filtered_sample_type to avoid duplicate
work when perf_prepare_sample() is called twice (or more).

Suggested-by: Peter Zijlstr <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-8-namhyung@kernel.org
2023-01-18 11:57:20 +01:00
Namhyung Kim
eb55b455ef perf/core: Add perf_sample_save_brstack() helper
When we saves the branch stack to the perf sample data, we needs to
update the sample flags and the dynamic size.  To make sure this is
done consistently, add the perf_sample_save_brstack() helper and
convert all call sites.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
2023-01-18 11:57:20 +01:00
Namhyung Kim
0a9081cf0a perf/core: Add perf_sample_save_raw_data() helper
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size.  To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org
2023-01-18 11:57:19 +01:00
Namhyung Kim
31046500c1 perf/core: Add perf_sample_save_callchain() helper
When we save the callchain to the perf sample data, we need to update
the sample flags and the dynamic size.  To ensure this is done consistently,
add the perf_sample_save_callchain() helper and convert all call sites.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-3-namhyung@kernel.org
2023-01-18 11:57:19 +01:00
Ingo Molnar
65adf3a57c Linux 6.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPEGkMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGE4YH/1sxdve3nlWT7eyf
 VDanaCZFc2u6Sk1td23HyhvOVbFjj1E39UokHK9YfzD4hhn+u9SFvDgoXHGhHzYq
 IhDvNM3mLsKpEtjwNbbi/lt6tryJchhZ96O5leRiUxZaw/GBVnHrKGR56vCjC/DB
 zq8td4I+TmTPMpsL3e3WZqJoX3bjTqgZFNShA1mZFYtuQRpAVkJkafMRwuAySIMQ
 +FyK/zZ+MpGJ7y/UmWfaBmS5GtjBz0fva7fTbEkUqk8bDtNUWMBhsTKtgU1LBpBe
 Yrkhs3sdQdpKSm9yNaNrlLhSjFwkusV5kR09p6/5w+wthdPmjnoUZuEnAZpYyydE
 ZY7+Vqk=
 =q9Ac
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc4' into perf/core, to pick up fixes

Move from the -rc1 base to the fresher -rc4 kernel that
has various fixes included, before applying a larger
patchset.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-18 11:56:57 +01:00
Peter Zijlstra
1f7c232ee0 x86/perf/amd: Remove tracing from perf_lopwr_cb()
The perf_lopwr_cb() function is called from the idle routines; there
is no RCU there, we must not enter tracing.

Use __always_inline, noidle annotations and existing no-trace methods.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.392862891@infradead.org
2023-01-13 11:03:21 +01:00