Commit Graph

313 Commits

Author SHA1 Message Date
Paolo Bonzini
f292dc8aad KVM x86 misc changes for 6.7:
- Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
    forcing more common use cases to eat the extra memory overhead.
 
  - Add IBPB and SBPB virtualization support.
 
  - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
    creating the original vCPU would cause KVM to try to synchronize the vCPU's
    TSC and thus clobber the correct TSC being set by userspace.
 
  - Compute guest wall clock using a single TSC read to avoid generating an
    inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
 
  - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
     about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
 
  - Don't apply side effects to Hyper-V's synthetic timer on writes from
    userspace to fix an issue where the auto-enable behavior can trigger
    spurious interrupts, i.e. do auto-enabling only for guest writes.
 
  - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
    without PML enabled.
 
  - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
 
  - Use octal notation for file permissions through KVM x86.
 
  - Fix a handful of typo fixes and warts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8EugSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5xS0P+gPTDO81CUZO70LrO2W4E7toRBf/F9x1
 /v5D/76p9hG32Z6+BJs/xxDxJFagw75MtoR5oKivtXiip3TxbfOyDOlaQkIRo85E
 /d95il/LRidL3Mv3TXRj1lykXnxSSz9tigAGEZti1Y9Fn9fXEIwurJH7dU5cBI1E
 fin5bsDaTNRjG4jjTiEUbnKPRTlD/S7CQJn4CaYvZhMv/eJkYDLyBBVy4VLoLzvD
 ctL6VJQLGPVxbxr9mEmulaqMrSuDIQQLkRVQJAViKyerBInTEc5d/GPCHuE8O3zi
 0r/QSJbMS9titWLz07NhJ1UH4VJNyaEhRlyJPSFhBW4h6dzUb3EXdUe0Hwa+JH/S
 H2cVqsANItTCIhvDtuEGIRDahu0eD+63h90InJ0gEVL1kSJS+UWZHB71PkUEQgAV
 2OsuT1D26fuxrv+0b9ioBZURycqKw++zGsrwyVhe77eBgqBJ12tbL4TAD+QNjaQ5
 HZTCe6YV83gZoOMeVkoTGSf96s9lGORgxsaAIXmFuLB9RVCVXhVh0ph2HZsnV8Hw
 ZXEXpBEFo7GUhb0NIvsk2W73QL87A3fLv15yITWc8KuC7/dXP9z6KpSKjFySS69X
 uWD1MVx6shhvbg97UzoJlXc3/z0aVzmdZJudE5d0gcFvAjIItqp6ICPOoKxfj8pT
 tqRZu3kVHd61
 =sfp8
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.7:

 - Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
   forcing more common use cases to eat the extra memory overhead.

 - Add IBPB and SBPB virtualization support.

 - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
   creating the original vCPU would cause KVM to try to synchronize the vCPU's
   TSC and thus clobber the correct TSC being set by userspace.

 - Compute guest wall clock using a single TSC read to avoid generating an
   inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.

 - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
    about a "Firmware Bug" if the bit isn't set for select F/M/S combos.

 - Don't apply side effects to Hyper-V's synthetic timer on writes from
   userspace to fix an issue where the auto-enable behavior can trigger
   spurious interrupts, i.e. do auto-enabling only for guest writes.

 - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
   without PML enabled.

 - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.

 - Use octal notation for file permissions through KVM x86.

 - Fix a handful of typo fixes and warts.
2023-10-31 10:15:15 -04:00
Linus Torvalds
bceb7accb7 Performance events changes for v6.7 are:
- Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
  - Simplify & clean up the uncore management code
  - Fall back from RDPMC to RDMSR on certain uncore PMUs
  - Improve per-package and cstate event reading
  - Extend the Intel ref-cycles event to GP counters
  - Fix Intel MTL event constraints
  - Improve the Intel hybrid CPU handling code
  - Micro-optimize the RAPL code
  - Optimize perf_cgroup_switch()
  - Improve large AUX area error handling
 
  - Misc fixes and cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU89YsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iQqQ/9EF9mG4te5By4qN+B7jADCmE71xG5ViKz
 sp4Thl86SHxhwFuiHn8dMUixrp+qbcemi5yTbQ9TF8cKl4s3Ju2CihU8jaauUp0a
 iS5W0IliMqLD1pxQoXAPLuPVInVYgrNOCbR4l6l7D6ervh5Z6PVEf7SVeAP3L5wo
 QV/V3NKkrYeNQL+FoKhCH8Vhxw0HxUmKJO7UhW6yuCt7BAok9Es18h3OVnn+7es4
 BB7VI/JvdmXf2ioKhTPnDXJjC+vh5vnwiBoTcdQ2W9ADhWUvfL4ozxOXT6z7oC3A
 nwBOdXf8w8Rqnqqd8hduop1QUrusMxlEVgOMCk27qHx97uWgPceZWdoxDXGHBiRK
 fqJAwXERf9wp5/M57NDlPwyf/43Hocdx2CdLkQBpfD78/k/sB5hW0KxnzY0FUI9x
 jBRQyWD05IDJATBaMHz+VbrexS+Itvjp2QvSiSm9zislYD4zA9fQ3lAgFhEpcUbA
 ZA/nN4t+CbiGEAsJEuBPlvSC1ahUwVP/0nz3PFlVWFDqAx0mXgVNKBe083A9yh7I
 dVisVY6KPAVDzyOc1LqzU8WFXNFnIkIIaLrb6fRHJVEM8MDfpLPS/a+7AHdRcDP4
 yq6fjVVjyP7e9lSQLYBUP3/3uiVnWQj92l6V6CrcgDMX5rDOb0VN+BQrmhPR6fWY
 WEim6WZrZj4=
 =OLsv
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:
 - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
 - Simplify & clean up the uncore management code
 - Fall back from RDPMC to RDMSR on certain uncore PMUs
 - Improve per-package and cstate event reading
 - Extend the Intel ref-cycles event to GP counters
 - Fix Intel MTL event constraints
 - Improve the Intel hybrid CPU handling code
 - Micro-optimize the RAPL code
 - Optimize perf_cgroup_switch()
 - Improve large AUX area error handling
 - Misc fixes and cleanups

* tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
  perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
  x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
  perf: Optimize perf_cgroup_switch()
  perf/x86/amd/uncore: Add memory controller support
  perf/x86/amd/uncore: Add group exclusivity
  perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
  perf/x86/amd/uncore: Move discovery and registration
  perf/x86/amd/uncore: Refactor uncore management
  perf/core: Allow reading package events from perf_event_read_local
  perf/x86/cstate: Allow reading the package statistics from local CPU
  perf/x86/intel/pt: Fix kernel-doc comments
  perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
  perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability
  perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
  perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
  perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
  perf/core: Bail out early if the request AUX area is out of bound
  perf/x86/intel: Extend the ref-cycles event to GP counters
  perf/x86/intel: Fix broken fixed event constraints extension
  ...
2023-10-30 13:44:35 -10:00
Linus Torvalds
ca2e9c3bee - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
virtualization support is disabled in the BIOS on AMD and Hygon
   platforms
 
 - A minor cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU77KoACgkQEsHwGGHe
 VUrophAAtfsB+WhRydin0V6kjQeH+RbiWyx/jOw6eNqvzOzaOPxVXn0cAHRSgAO4
 +S8tKIqaWpXNNNKpOIKBVaDkh9qr50/p36/jfVkXi8GOLYrK633F0BMjcG4+/vYQ
 A9b5iNiJhZ7xWE6+qRrqdg+o+a6UyPUGz34HNp3KwJVTdaHU2OnXXwuWeiUkgRrJ
 uQSfLc4+UIeefIzNy8Tqg083iaENBYMya7U90rzewD64NF0bsA15AEPut/6tnUVq
 ej3UU3cqO7nKXyhuZX+zpt856MZFa1rNYVXUAfoAO4xhqdN0Q5LFWO506sqajNx/
 hqbT+hKDoC03zuLmbZO21s/uWQdtVFo63FU0h9QBRp1m6Ug5P3rQQCK8ydJc5xwr
 Yd7je6UPK9jIKBo9VP1qmsyzGwADNevNf1qGExHI2T6Wml7HgDmPysAHnGiKqRGI
 1o9+Yqa+VBt8Wml9M8Ny+dLyr5F/2uq8sMrQedQlXdFMSzVm2JYecukJ5BvUWE/r
 Qyll8mTpIdgGXjBt56lMrgH7ibMC5ct/4MvTHOHuA997g/PwuwtWj7QyKXpUq2Rf
 o/c3zKKWIFxevjzwU86haCBaz+5xAQlB6dJw61ExxsmUuT/kZzkN15w6aqGZtpns
 PsARwnvuwZJ7vfqFLIa0ZkPN4OgnkRX7HlNqrVyKpONDTocZd9E=
 =i9On
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
   virtualization support is disabled in the BIOS on AMD and Hygon
   platforms

 - A minor cleanup

* tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove redundant 'break' statement
  x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-30 12:10:24 -10:00
Kan Liang
3374491619 perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is displayed by the order
of counters.

The design proposed in this patch requires that the events which are
logged must be in a group with the event that has LBR. If there are
more than one LBR group, the counters logging information only from the
current group (overflowed) are stored for the perf tool, otherwise the
perf tool cannot know which and when other groups are scheduled
especially when multiplexing is triggered. The user can ensure it uses
the maximum number of counters that support LBR info (4 by now) by
making the group large enough.

The HW only logs events by the order of counters. The order may be
different from the order of enabling which the perf tool can understand.
When parsing the information of each branch entry, convert the counter
order to the enabled order, and store the enabled order in the extension
space.

Unconditionally reset LBRs for an LBR event group when it's deleted. The
logged counter information is only valid for the current LBR group. If
another LBR group is scheduled later, the information from the stale
LBRs would be otherwise wrongly interpreted.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)

Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch
stack setup.

Expose the maximum number of supported counters and the width of the
counters into the sysfs. The perf tool can use the information to parse
the logged counters in each branch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2023-10-27 15:05:11 +02:00
Maciej S. Szmigiero
2770d47220 KVM: x86: Ignore MSR_AMD64_TW_CFG access
Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen
since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED +
STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP
in the guest kernel).

This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't
handle receiving a #GP when doing so.

Give this MSR the same treatment that commit 2e32b71906
("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave
MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant
only.
Although apparently it was then needed for Linux guests, not Windows as in
this case.

With this change, the aforementioned guest setup is able to finish booting
successfully.

This issue can be reproduced either on a Summit Ridge Ryzen (with
just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since
EPYC is ordinarily stepping 2).

Alternatively, userspace could solve the problem by using MSR filters, but
forcing every userspace to define a filter isn't very friendly and doesn't
add much, if any, value.  The only potential hiccup is if one of these
"baremetal-only" MSRs ever requires actual emulation and/or has F/M/S
specific behavior.  But if that happens, then KVM can still punt *that*
handling to userspace since userspace MSR filters "win" over KVM's default
handling.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.1697731406.git.maciej.szmigiero@oracle.com
[sean: call out MSR filtering alternative]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19 10:55:14 -07:00
Borislav Petkov
deedec0a15 x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
The comments introduced in <asm/msr-index.h> in the merge conflict fixup in:

  8f4156d587 ("Merge branch 'x86/urgent' into perf/core, to resolve conflict")

... aren't right: AMD naming schemes are more complex than implied,
family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4.

So there's indeed four separate MSR namespaces for:

  MSR_F17H_
  MSR_F19H_
  MSR_ZEN2_
  MSR_ZEN4_

... and the namespaces cannot be merged.

Fix it up. No change in functionality.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
2023-10-12 20:10:39 +02:00
Ingo Molnar
8f4156d587 Merge branch 'x86/urgent' into perf/core, to resolve conflict
Resolve an MSR enumeration conflict.

Conflicts:
	arch/x86/include/asm/msr-index.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-11 22:54:03 +02:00
Borislav Petkov (AMD)
f454b18e07 x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
cause an #UD exception. The performance impact of the fix is negligible.

Reported-by: René Rebe <rene@exactcode.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: René Rebe <rene@exactcode.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
2023-10-11 11:00:11 +02:00
Sandipan Das
25e5684782 perf/x86/amd/uncore: Add memory controller support
Unified Memory Controller (UMC) events were introduced with Zen 4 as a
part of the Performance Monitoring Version 2 (PerfMonV2) enhancements.
An event is specified using the EventSelect bits and the RdWrMask bits
can be used for additional filtering of read and write requests.

As of now, a maximum of 12 channels of DDR5 are available on each socket
and each channel is controlled by a dedicated UMC. Each UMC, in turn,
has its own set of performance monitoring counters.

Since the MSR address space for the UMC PERF_CTL and PERF_CTR registers
are reused across sockets, uncore groups are created on the basis of
socket IDs. Hence, group exclusivity is mandatory while opening events
so that events for an UMC can only be opened on CPUs which are on the
same socket as the corresponding memory channel.

For each socket, the total number of available UMC counters and active
memory channels are determined from CPUID leaf 0x80000022 EBX and ECX
respectively. Usually, on Zen 4, each UMC has four counters.

MSR assignments are determined on the basis of active UMCs. E.g. if
UMCs 1, 4 and 9 are active for a given socket, then

  * UMC 1 gets MSRs 0xc0010800 to 0xc0010807 as PERF_CTLs and PERF_CTRs
  * UMC 4 gets MSRs 0xc0010808 to 0xc001080f as PERF_CTLs and PERF_CTRs
  * UMC 9 gets MSRs 0xc0010810 to 0xc0010817 as PERF_CTLs and PERF_CTRs

If there are sockets without any online CPUs when the amd_uncore driver
is loaded, UMCs for such sockets will not be discoverable since the
mechanism relies on executing the CPUID instruction on an online CPU
from the socket.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/b25f391205c22733493abec1ed850b71784edc5f.1696425185.git.sandipan.das@amd.com
2023-10-09 16:12:25 +02:00
Jithu Joseph
97a5e801b3
platform/x86/intel/ifs: Store IFS generation number
IFS generation number is reported via MSR_INTEGRITY_CAPS.  As IFS
support gets added to newer CPUs, some differences are expected during
IFS image loading and test flows.

Define MSR bitmasks to extract and store the generation in driver data,
so that driver can modify its MSR interaction appropriately.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lore.kernel.org/r/20231005195137.3117166-2-jithu.joseph@intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-10-06 13:05:13 +03:00
Paolo Bonzini
7deda2ce5b x86/cpu: Clear SVM feature if disabled by BIOS
When SVM is disabled by BIOS, one cannot use KVM but the
SVM feature is still shown in the output of /proc/cpuinfo.
On Intel machines, VMX is cleared by init_ia32_feat_ctl(),
so do the same on AMD and Hygon processors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230921114940.957141-1-pbonzini@redhat.com
2023-09-22 10:55:26 +02:00
Linus Torvalds
64094e7e31 Mitigate Gather Data Sampling issue
* Add Base GDS mitigation
  * Support GDS_NO under KVM
  * Fix a documentation typo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTJh5YACgkQaDWVMHDJ
 krAzAw/8DzjhAYEa7a1AodCBMNg8uNOPnLNoRPPNhaN5Iw6W3zXYDBDKT9PyjAIx
 RoIM0aHx/oY9nCpK441o25oCWAAyzk6E5/+q9hMa7B4aHUGKqiDUC6L9dC8UiiSN
 yvoBv4g7F81QnmyazwYI64S6vnbr4Cqe7K/mvVqQ/vbJiugD25zY8mflRV9YAuMk
 Oe7Ff/mCA+I/kqyKhJE3cf3qNhZ61FsFI886fOSvIE7g4THKqo5eGPpIQxR4mXiU
 Ri2JWffTaeHr2m0sAfFeLH4VTZxfAgBkNQUEWeG6f2kDGTEKibXFRsU4+zxjn3gl
 xug+9jfnKN1ceKyNlVeJJZKAfr2TiyUtrlSE5d+subIRKKBaAGgnCQDasaFAluzd
 aZkOYz30PCebhN+KTrR84FySHCaxnev04jqdtVGAQEDbTvyNagFUdZFGhWijJShV
 l2l4A0gFSYJmPfPVuuAwOJnnZtA1sRH9oz/Sny3+z9BKloZh+Nc/+Cu9zC8SLjaU
 BF3Qv2gU9HKTJ+MSy2JrGS52cONfpO5ngFHoOMilZ1KBHrfSb1eiy32PDT+vK60Y
 PFEmI8SWl7bmrO1snVUCfGaHBsHJSu5KMqwBGmM4xSRzJpyvRe493xC7+nFvqNLY
 vFOFc4jGeusOXgiLPpfGduppkTGcM7sy75UMLwTSLcQbDK99mus=
 =ZAPY
 -----END PGP SIGNATURE-----

Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/gds fixes from Dave Hansen:
 "Mitigate Gather Data Sampling issue:

   - Add Base GDS mitigation

   - Support GDS_NO under KVM

   - Fix a documentation typo"

* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix backwards on/off logic about YMM support
  KVM: Add GDS_NO support to KVM
  x86/speculation: Add Kconfig option for GDS
  x86/speculation: Add force option to GDS mitigation
  x86/speculation: Add Gather Data Sampling mitigation
2023-08-07 17:03:54 -07:00
Linus Torvalds
138bcddb86 Add a mitigation for the speculative RAS (Return Address Stack) overflow
vulnerability on AMD processors. In short, this is yet another issue
 where userspace poisons a microarchitectural structure which can then be
 used to leak privileged information through a side channel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmTQs1gACgkQEsHwGGHe
 VUo1UA/8C34PwJveZDcerdkaxSF+WKx7AjOI/L2ws1qn9YVFA3ItFMgVuFTrlY6c
 1eYKYB3FS9fVN3KzGOXGyhho6seHqfY0+8cyYupR+PVLn9rSy7GqHaIMr37FdQ2z
 yb9xu26v+gsvuPEApazS6MxijYS98u71rHhmg97qsHCnUiMJ01+TaGucntukNJv8
 FfwjZJvgeUiBPQ/6IeA/O0413tPPJ9weawPyW+sV1w7NlXjaUVkNXwiq/Xxbt9uI
 sWwMBjFHpSnhBRaDK8W5Blee/ZfsS6qhJ4jyEKUlGtsElMnZLPHbnrbpxxqA9gyE
 K+3ZhoHf/W1hhvcZcALNoUHLx0CvVekn0o41urAhPfUutLIiwLQWVbApmuW80fgC
 DhPedEFu7Wp6Okj5+Bqi/XOsOOWN2WRDSzdAq10o1C+e+fzmkr6y4E6gskfz1zXU
 ssD9S4+uAJ5bccS5lck4zLffsaA03nAYTlvl1KRP4pOz5G9ln6eyO20ar1WwfGAV
 o5ZsTJVGQMyVA49QFkksj+kOI3chkmDswPYyGn2y8OfqYXU4Ip4eN+VkjorIAo10
 zIec3Z0bCGZ9UUMylUmdtH3KAm8q0wVNoFrUkMEmO8j6nn7ew2BhwLMn4uu+nOnw
 lX2AG6PNhRLVDVaNgDsWMwejaDsitQPoWRuCIAZ0kQhbeYuwfpM=
 =73JY
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/srso fixes from Borislav Petkov:
 "Add a mitigation for the speculative RAS (Return Address Stack)
  overflow vulnerability on AMD processors.

  In short, this is yet another issue where userspace poisons a
  microarchitectural structure which can then be used to leak privileged
  information through a side channel"

* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/srso: Tie SBPB bit setting to microcode patch detection
  x86/srso: Add a forgotten NOENDBR annotation
  x86/srso: Fix return thunks in generated code
  x86/srso: Add IBPB on VMEXIT
  x86/srso: Add IBPB
  x86/srso: Add SRSO_NO support
  x86/srso: Add IBPB_BRTYPE support
  x86/srso: Add a Speculative RAS Overflow mitigation
  x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07 16:35:44 -07:00
Borislav Petkov (AMD)
1b5277c0ea x86/srso: Add SRSO_NO support
Add support for the CPUID flag which denotes that the CPU is not
affected by SRSO.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Daniel Sneddon
8974eb5882 x86/speculation: Add Gather Data Sampling mitigation
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.

Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.

This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.

Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.

The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:

    /sys/devices/system/cpu/vulnerabilities/gather_data_sampling

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-19 16:45:37 -07:00
Borislav Petkov (AMD)
522b1d6921 x86/cpu/amd: Add a Zenbleed fix
Add a fix for the Zen2 VZEROUPPER data corruption bug where under
certain circumstances executing VZEROUPPER can cause register
corruption or leak data.

The optimal fix is through microcode but in the case the proper
microcode revision has not been applied, enable a fallback fix using
a chicken bit.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-17 15:48:10 +02:00
Jithu Joseph
c68e3d4739 x86/include/asm/msr-index.h: Add IFS Array test bits
Define MSR bitfields for enumerating support for Array BIST test.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230322003359.213046-5-jithu.joseph@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-03-27 16:10:20 +02:00
Linus Torvalds
877934769e - Cache the AMD debug registers in per-CPU variables to avoid MSR writes
where possible, when supporting a debug registers swap feature for
   SEV-ES guests
 
 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change
 
 - Add support for a new x86 instruction - LKGS - Load kernel GS which is
   part of the FRED infrastructure
 
 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover
 
 - Other smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe
 VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t
 aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE
 E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk
 GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45
 isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9
 u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn
 A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg
 s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew
 eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t
 g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR
 RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o=
 =v/ZC
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Cache the AMD debug registers in per-CPU variables to avoid MSR
   writes where possible, when supporting a debug registers swap feature
   for SEV-ES guests

 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change

 - Add support for a new x86 instruction - LKGS - Load kernel GS which
   is part of the FRED infrastructure

 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover

 - Other smaller fixes and cleanups

* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd: Cache debug register values in percpu variables
  KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
  x86/cpu: Support AMD Automatic IBRS
  x86/cpu, kvm: Add the SMM_CTL MSR not present feature
  x86/cpu, kvm: Add the Null Selector Clears Base feature
  x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
  x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
  KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
  x86/cpu, kvm: Add support for CPUID_80000021_EAX
  x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
  x86/gsseg: Use the LKGS instruction if available for load_gs_index()
  x86/gsseg: Move load_gs_index() to its own new header file
  x86/gsseg: Make asm_load_gs_index() take an u16
  x86/opcode: Add the LKGS instruction to x86-opcode-map
  x86/cpufeature: Add the CPU feature bit for LKGS
  x86/bugs: Reset speculation control settings on init
  x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
2023-02-21 14:51:40 -08:00
Linus Torvalds
aa8c3db40a - Add support for a new AMD feature called slow memory bandwidth
allocation.  Its goal is to control resource allocation in external slow
 memory which is connected to the machine like for example through CXL devices,
 accelerators etc
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzmf4ACgkQEsHwGGHe
 VUppKg//Tq+lHaMYO8aTvk4jgqbR9RVXJwPbtEOp2C0kSLs5QxBms/o21IXnxJ07
 tdbIGOrfJGlbzSWP8ywysRRQwpKlwltWUVAjMOFqEfzEURLL042qtHZ8nxGKSGrc
 IZFJLNTMyx1Zyjc7e9A/hANCOoQFoPHT8zHf1CNNo1LtzgHzNZG6kggLHh5tRKSz
 Xi7wFbYBtmttsyIA/iAQjYAU0O9MnmdnktUb7XdPSFtTIZ3Nyw90We4gwYueEPzD
 S/rQHKr8V7ROZMHXQ/BWpVWdcxGoHD8acUSVq8j20KW3W9/H8KL9TRVakvnf0aRW
 g0efxKXdTjTRO49GgD7FUL8x1JdAOXeZwQYDzKPqW/GRESRdpOvsaMwcLDCEpIXK
 PmEOVReklokJF0btFqaVYkY6wGE2KLKmp97g/RffuHdIeIomwI9lTpy9kyQsKakc
 yJ+VsE85BlBEVkHNt49qFClO1L98G3IgZTTt6//EGv0EJl8pELfsddsbjG5uXun+
 xFhr2i7gllQcV4B4HSFFdYRBLvZYnTfKlNR7Hs9pRJT7V28Jv2GURiCHBm4sRv9O
 k3FX7sxytH2syBBwU1NNrMRMo+KgjVZurJwiHpTRbb39K6uCgLk/wbXfWh2SovW1
 BRItz2T6LFu4bo6WIhakx31pNmH94P8vC0acO8LHECVji7qvXFM=
 =8hmj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for a new AMD feature called slow memory bandwidth
   allocation. Its goal is to control resource allocation in external
   slow memory which is connected to the machine like for example
   through CXL devices, accelerators etc

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a silly -Wunused-but-set-variable warning
  Documentation/x86: Update resctrl.rst for new features
  x86/resctrl: Add interface to write mbm_local_bytes_config
  x86/resctrl: Add interface to write mbm_total_bytes_config
  x86/resctrl: Add interface to read mbm_local_bytes_config
  x86/resctrl: Add interface to read mbm_total_bytes_config
  x86/resctrl: Support monitor configuration
  x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  x86/resctrl: Include new features in command line options
  x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
2023-02-21 08:38:45 -08:00
Linus Torvalds
a2f0e7eee1 The latest perf updates in this cycle are:
- Optimize perf_sample_data layout
  - Prepare sample data handling for BPF integration
  - Update the x86 PMU driver for Intel Meteor Lake
  - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
    discovery breakage
  - Fix the x86 Zhaoxin PMU driver
  - Cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzaHgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jYQg/+KRfobCevMQlZVnz09T3SsJ4ahJ587BL6
 g2C6kobyUNfeChpFVroBkTR+yCb6Mq4xGr2nda9+2E978BYu9eanpx/u/bXNQ6NU
 6YhLwgRrlFXonYn07kFfUJeELZ0W+zpPvymEN1KhTQWcrgXDfXRt2VfMwNsVxGRF
 ZRyCWK+UOzSMU22FtW3I/xVLBB0vio9Y6wRC5QOpDVW5YtGwQGust7GJ53JPK43J
 m2soJvWORauT+v0aqc7ggOtKd6pahVoXrDrbktxtq9N0ZGI+PubVCGevex++cXm/
 B3QSf6VcMMuU6pfzxiEwRa8Whrc3XFeSDEfvMjC5v3becGNkdNBnGOJzYprwgRZJ
 irb6/dSrv5P2lj6WphsO1Wzcm7EoWh8M7DVOMh/13Y/oODRdOrv48112Don9UURC
 EPyvzAzizqdwdDopUmfiqUwuAXqb8uPZqCgmlz/NJkVz1/ijlfrmLgeDuf0vI7Aq
 HznzzRwjFHzyCH7D+rtonFh3JDaqgaouY76tpC5yTtzKbZPlFT8kzeCvqkTMnGgH
 czZnSNc/kBup0HDkNSlthK+TyrMXWKeVa8KQSY1E0NJHO4IBBCMzZywSoAaeofQK
 hqfQyofX9XHmuHhCA4yIfv1XkZGlBTxpPAyDdHjgs9iJTsodSYMs8ESY08eW8DXn
 Ld/35O6SylM=
 =ztUT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Optimize perf_sample_data layout

 - Prepare sample data handling for BPF integration

 - Update the x86 PMU driver for Intel Meteor Lake

 - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
   discovery breakage

 - Fix the x86 Zhaoxin PMU driver

 - Cleanups

* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86/intel/uncore: Add Meteor Lake support
  x86/perf/zhaoxin: Add stepping check for ZXC
  perf/x86/intel/ds: Fix the conversion from TSC to perf time
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
  perf/x86/uncore: Add a quirk for UPI on SPR
  perf/x86/uncore: Ignore broken units in discovery table
  perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
  perf/x86/uncore: Factor out uncore_device_to_die()
  perf/core: Call perf_prepare_sample() before running BPF
  perf/core: Introduce perf_prepare_header()
  perf/core: Do not pass header for sample ID init
  perf/core: Set data->sample_flags in perf_prepare_sample()
  perf/core: Add perf_sample_save_brstack() helper
  perf/core: Add perf_sample_save_raw_data() helper
  perf/core: Add perf_sample_save_callchain() helper
  perf/core: Save the dynamic parts of sample data size
  x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
  perf/core: Change the layout of perf_sample_data
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
  ...
2023-02-20 17:29:55 -08:00
Kim Phillips
e7862eda30 x86/cpu: Support AMD Automatic IBRS
The AMD Zen4 core supports a new feature called Automatic IBRS.

It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
h/w manages its IBRS mitigation resources automatically across CPL transitions.

The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
setting MSR C000_0080 (EFER) bit 21.

Enable Automatic IBRS by default if the CPU feature is present.  It typically
provides greater performance over the incumbent generic retpolines mitigation.

Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum.  AMD Automatic IBRS and
Intel Enhanced IBRS have similar enablement.  Add NO_EIBRS_PBRSB to
cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.

The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
IBRS, if available.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
2023-01-25 17:16:01 +01:00
Babu Moger
dc2a3e8579 x86/resctrl: Add interface to read mbm_total_bytes_config
The event configuration can be viewed by the user by reading the
configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.  The
event configuration settings are domain specific and will affect all the CPUs in
the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
  0=0x7f;1=0x7f;2=0x7f;3=0x7f

In this case, the event mbm_total_bytes is configured with 0x7f on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com
2023-01-23 17:40:24 +01:00
Babu Moger
5b6fac3fa4 x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-7-babu.moger@amd.com
2023-01-23 17:38:44 +01:00
Nikunj A Dadhania
8c29f01654 x86/sev: Add SEV-SNP guest feature negotiation support
The hypervisor can enable various new features (SEV_FEATURES[1:63]) and start a
SNP guest. Some of these features need guest side implementation. If any of
these features are enabled without it, the behavior of the SNP guest will be
undefined.  It may fail booting in a non-obvious way making it difficult to
debug.

Instead of allowing the guest to continue and have it fail randomly later,
detect this early and fail gracefully.

The SEV_STATUS MSR indicates features which the hypervisor has enabled.  While
booting, SNP guests should ascertain that all the enabled features have guest
side implementation. In case a feature is not implemented in the guest, the
guest terminates booting with GHCB protocol Non-Automatic Exit(NAE) termination
request event, see "SEV-ES Guest-Hypervisor Communication Block Standardization"
document (currently at https://developer.amd.com/wp-content/resources/56421.pdf),
section "Termination Request".

Populate SW_EXITINFO2 with mask of unsupported features that the hypervisor can
easily report to the user.

More details in the AMD64 APM Vol 2, Section "SEV_STATUS MSR".

  [ bp:
    - Massage.
    - Move snp_check_features() call to C code.
    Note: the CC:stable@ aspect here is to be able to protect older, stable
    kernels when running on newer hypervisors. Or not "running" but fail
    reliably and in a well-defined manner instead of randomly. ]

Fixes: cbd3d4f7c4 ("x86/sev: Check SEV-SNP features support")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230118061943.534309-1-nikunj@amd.com
2023-01-19 17:29:58 +01:00
Breno Leitao
0125acda7d x86/bugs: Reset speculation control settings on init
Currently, x86_spec_ctrl_base is read at boot time and speculative bits
are set if Kconfig items are enabled. For example, IBRS is enabled if
CONFIG_CPU_IBRS_ENTRY is configured, etc. These MSR bits are not cleared
if the mitigations are disabled.

This is a problem when kexec-ing a kernel that has the mitigation
disabled from a kernel that has the mitigation enabled. In this case,
the MSR bits are not cleared during the new kernel boot. As a result,
this might have some performance degradation that is hard to pinpoint.

This problem does not happen if the machine is (hard) rebooted because
the bit will be cleared by default.

  [ bp: Massage. ]

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221128153148.1129350-1-leitao@debian.org
2023-01-12 11:37:01 +01:00
Kan Liang
38aaf921e9 perf/x86: Add Meteor Lake support
From PMU's perspective, Meteor Lake is similar to Alder Lake. Both are
hybrid platforms, with e-core and p-core.

The key differences include:
- The e-core supports 2 PDIST GP counters (GP0 & GP1)
- New MSRs for the Module Snoop Response Events on the e-core.
- New Data Source fields are introduced for the e-core.
- There are 8 GP counters for the e-core.
- The load latency AUX event is not required for the p-core anymore.
- Retire Latency (Support in a separate patch) for both cores.

Since most of the code in the intel_pmu_init() should be the same as
Alder Lake, to avoid code duplication, share the path with Alder Lake.

Add new specific functions of extra_regs, and get_event_constraints
to support the OCR events, Module Snoop Response Events and 2 PDIST
GP counters on e-core.

Add new MTL specific mem_attrs which drops the load latency AUX event.

The Data Source field is extended to 4:0, which can contains max 32
sources.

The Retire Latency is implemented with a separate patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230104201349.1451191-2-kan.liang@linux.intel.com
2023-01-09 12:22:07 +01:00
Linus Torvalds
3ef3ace4e2 - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
guests which do not get MTRRs exposed but only PAT. (TDX guests do not
 support the cache disabling dance when setting up MTRRs so they fall
 under the same category.) This is a cleanup work to remove all the ugly
 workarounds for such guests and init things separately (Juergen Gross)
 
 - Add two new Intel CPUs to the list of CPUs with "normal" Energy
 Performance Bias, leading to power savings
 
 - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern Centaur
 CPUs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOYhIMACgkQEsHwGGHe
 VUpxug//ZKw3hYFroKhsULJi/e0j2nGARiSlJrJcFHl2vgh9yGvDsnYUyM/rgjgt
 cM3uCLbEG7nA6uhB3nupzaXZ8lBM1nU9kiEl/kjQ5oYf9nmJ48fLttvWGfxYN4s3
 kj5fYVhlOZpntQXIWrwxnPqghUysumMnZmBJeKYiYNNfkj62l3xU2Ni4Gnjnp02I
 9MmUhl7pj1aEyOQfM8rovy+wtYCg5WTOmXVlyVN+b9MwfYeK+stojvCZHxtJs9BD
 fezpJjjG+78xKUC7vVZXCh1p1N5Qvj014XJkVl9Hg0n7qizKFZRtqi8I769G2ptd
 exP8c2nDXKCqYzE8vK6ukWgDANQPs3d6Z7EqUKuXOCBF81PnMPSUMyNtQFGNM6Wp
 S5YSvFfCgUjp50IunOpvkDABgpM+PB8qeWUq72UFQJSOymzRJg/KXtE2X+qaMwtC
 0i6VLXfMddGcmqNKDppfGtCjq2W5VrNIIJedtAQQGyl+pl3XzZeNomhJpm/0mVfJ
 8UrlXZeXl/EUQ7qk40gC/Ash27pU9ZDx4CMNMy1jDIQqgufBjEoRIDSFqQlghmZq
 An5/BqMLhOMxUYNA7bRUnyeyxCBypetMdQt5ikBmVXebvBDmArXcuSNAdiy1uBFX
 KD8P3Y1AnsHIklxkLNyZRUy7fb4mgMFenUbgc0vmbYHbFl0C0pQ=
 =Zmgh
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
   guests which do not get MTRRs exposed but only PAT. (TDX guests do
   not support the cache disabling dance when setting up MTRRs so they
   fall under the same category)

   This is a cleanup work to remove all the ugly workarounds for such
   guests and init things separately (Juergen Gross)

 - Add two new Intel CPUs to the list of CPUs with "normal" Energy
   Performance Bias, leading to power savings

 - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern
   Centaur CPUs

* tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/mtrr: Make message for disabled MTRRs more descriptive
  x86/pat: Handle TDX guest PAT initialization
  x86/cpuid: Carve out all CPUID functionality
  x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV
  x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()
  x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack()
  x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()
  x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h
  x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs
  x86/mtrr: Simplify mtrr_ops initialization
  x86/cacheinfo: Switch cache_ap_init() to hotplug callback
  x86: Decouple PAT and MTRR handling
  x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
  x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
  x86/mtrr: Get rid of __mtrr_enabled bool
  x86/mtrr: Simplify mtrr_bp_init()
  x86/mtrr: Remove set_all callback from struct mtrr_ops
  x86/mtrr: Disentangle MTRR init from PAT init
  x86/mtrr: Move cache control code to cacheinfo.c
  x86/mtrr: Split MTRR-specific handling from cache dis/enabling
  ...
2022-12-13 14:56:56 -08:00
Linus Torvalds
287f037db5 Minor cleanups:
* Remove unnecessary arch_has_empty_bitmaps structure memory
  * Move rescrtl MSR defines into msr-index.h, like normal MSRs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYhsACgkQaDWVMHDJ
 krA75w//XmOC929XGMOY7WQL6IZlH62xsJbtb3BhmM24Ho7RHSNQGPD+ukArCb0u
 V/w50Q4crQrLsIxqWjXkyDQ7w66PvvsAIhYFBEV4kssRli9y173CzJQt/lQfUXL9
 T7vG5WY1n4f+vtvmZfwcFaGOPkZ5edp8v1y8Grk3r93ci2VDSk+yvEiq80c+JQoX
 ZnEYPxGPUpwAVuaysY8wkGCEc4Yln6gtTKzpVPXE18WAs82OeiCWBfldI/+95j3o
 /5r5asYQpD8bVhtLHi1mepkBAGbeVNWhSJVlOE9HdU9WnzCkNKn1ZXRuXSBlvTeq
 FPjg6vsBXuz8zQV4Dd3Jk3hWv3H/4sTWsgiyUFdHtz/VlE9M8NjGcE4caOgSuBqR
 2ovI/HwdvdYyiZwvNN0fXrnzEn1MliSXDgAscNuxzovJXqdTP2BpUj0SVlZdVs0U
 0xba5sZ5A6fh2SwKX7JQYYsEh4gudiixR+D2l5u7EUOiNyfw0DZgWi/ElpvX4ncy
 QvDIIqlm29A/VkJQAdSHJc0ew+w39M7f3VNfQviLXxudGFuhrg+kXlI1UYGcX/cH
 4LEjmE1KCymmq7v+7+zBrHwsVCxr5mi/CZnx+/4Y/2O+xOKJ1U7GQDXWzu/SC+aF
 tEwqDCldYKjqrfdkmuGXSt2YipkNOC2EBLY32mW7rtTDDIXPSro=
 =n2UE
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource control updates from Dave Hansen:
 "These declare the resource control (rectrl) MSRs a bit more normally
  and clean up an unnecessary structure member:

   - Remove unnecessary arch_has_empty_bitmaps structure memory

   - Move rescrtl MSR defines into msr-index.h, like normal MSRs"

* tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Move MSR defines into msr-index.h
  x86/resctrl: Remove arch_has_empty_bitmaps
2022-12-12 14:30:54 -08:00
Borislav Petkov
97fa21f65c x86/resctrl: Move MSR defines into msr-index.h
msr-index.h should contain all MSRs for easier grepping for MSR numbers
when dealing with unchecked MSR access warnings, for example.

Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
2022-11-27 23:00:45 +01:00
Borislav Petkov
2632daebaf x86/cpu: Restore AMD's DE_CFG MSR after resume
DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.

Unify and correct naming while at it.

Fixes: e4d0e84e49 ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-15 10:15:58 -08:00
Srinivas Pandruvada
7420ae3bb9 x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB
Intel processors support additional software hint called EPB ("Energy
Performance Bias") to guide the hardware heuristic of power management
features to favor increasing dynamic performance or conserve energy
consumption.

Since this EPB hint is processor specific, the same value of hint can
result in different behavior across generations of processors.

commit 4ecc933b7d ("x86: intel_epb: Allow model specific normal EPB
value")' introduced capability to update the default power up EPB
based on the CPU model and updated the default EPB to 7 for Alder Lake
mobile CPUs.

The same change is required for other Alder Lake-N and Raptor Lake-P
mobile CPUs as the current default of 6 results in higher uncore power
consumption. This increase in power is related to memory clock
frequency setting based on the EPB value.

Depending on the EPB the minimum memory frequency is set by the
firmware. At EPB = 7, the minimum memory frequency is 1/4th compared to
EPB = 6. This results in significant power saving for idle and
semi-idle workload on a Chrome platform.

For example Change in power and performance from EPB change from 6 to 7
on Alder Lake-N:

Workload    Performance diff (%)    power diff
----------------------------------------------------
VP9 FHD30	0 (FPS)		-218 mw
Google meet	0 (FPS)		-385 mw

This 200+ mw power saving is very significant for mobile platform for
battery life and thermal reasons.

But as the workload demands more memory bandwidth, the memory frequency
will be increased very fast. There is no power savings for such busy
workloads.

For example:

Workload		Performance diff (%) from EPB 6 to 7
-------------------------------------------------------
Speedometer 2.0		-0.8
WebGL Aquarium 10K
Fish    		-0.5
Unity 3D 2018		0.2
WebXPRT3		-0.5

There are run to run variations for performance scores for
such busy workloads. So the difference is not significant.

Add a new define ENERGY_PERF_BIAS_NORMAL_POWERSAVE for EPB 7
and use it for Alder Lake-N and Raptor Lake-P mobile CPUs.

This modification is done originally by
Jeremy Compostella <jeremy.compostella@intel.com>.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20221027220056.1534264-1-srinivas.pandruvada%40linux.intel.com
2022-11-03 11:31:01 -07:00
Linus Torvalds
3871d93b82 Perf events updates for v6.1:
- PMU driver updates:
 
      - Add AMD Last Branch Record Extension Version 2 (LbrExtV2)
        feature support for Zen 4 processors.
 
      - Extend the perf ABI to provide branch speculation information,
        if available, and use this on CPUs that have it (eg. LbrExtV2).
 
      - Improve Intel PEBS TSC timestamp handling & integration.
 
      - Add Intel Raptor Lake S CPU support.
 
      - Add 'perf mem' and 'perf c2c' memory profiling support on
        AMD CPUs by utilizing IBS tagged load/store samples.
 
      - Clean up & optimize various x86 PMU details.
 
  - HW breakpoints:
 
      - Big rework to optimize the code for systems with hundreds of CPUs and
        thousands of breakpoints:
 
         - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
 	  per-CPU rwsem that is read-locked during most of the key operations.
 
 	- Improve the O(#cpus * #tasks) logic in toggle_bp_slot()
 	  and fetch_bp_busy_slots().
 
 	- Apply micro-optimizations & cleanups.
 
   - Misc cleanups & enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/2pMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIMA/+J+MCEVTt9kwZeBtHoPX7iZ5gnq1+McoQ
 f6ALX19AO/ZSuA7EBA3cS3Ny5eyGy3ofYUnRW+POezu9CpflLW/5N27R2qkZFrWC
 A09B86WH676ZrmXt+oI05rpZ2y/NGw4gJxLLa4/bWF3g9xLfo21i+YGKwdOnNFpl
 DEdCVHtjlMcOAU3+on6fOYuhXDcYd7PKGcCfLE7muOMOAtwyj0bUDBt7m+hneZgy
 qbZHzDU2DZ5L/LXiMyuZj5rC7V4xUbfZZfXglG38YCW1WTieS3IjefaU2tREhu7I
 rRkCK48ULDNNJR3dZK8IzXJRxusq1ICPG68I+nm/K37oZyTZWtfYZWehW/d/TnPa
 tUiTwimabz7UUqaGq9ZptxwINcAigax0nl6dZ3EseeGhkDE6j71/3kqrkKPz4jth
 +fCwHLOrI3c4Gq5qWgPvqcUlUneKf3DlOMtzPKYg7sMhla2zQmFpYCPzKfm77U/Z
 BclGOH3FiwaK6MIjPJRUXTePXqnUseqCR8PCH/UPQUeBEVHFcMvqCaa15nALed8x
 dFi76VywR9mahouuLNq6sUNePlvDd2B124PygNwegLlBfY9QmKONg9qRKOnQpuJ6
 UprRJjLOOucZ/N/jn6+ShHkqmXsnY2MhfUoHUoMQ0QAI+n++e+2AuePo251kKWr8
 QlqKxd9PMQU=
 =LcGg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "PMU driver updates:

   - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature
     support for Zen 4 processors.

   - Extend the perf ABI to provide branch speculation information, if
     available, and use this on CPUs that have it (eg. LbrExtV2).

   - Improve Intel PEBS TSC timestamp handling & integration.

   - Add Intel Raptor Lake S CPU support.

   - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs
     by utilizing IBS tagged load/store samples.

   - Clean up & optimize various x86 PMU details.

  HW breakpoints:

   - Big rework to optimize the code for systems with hundreds of CPUs
     and thousands of breakpoints:

      - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
        per-CPU rwsem that is read-locked during most of the key
        operations.

      - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and
        fetch_bp_busy_slots().

      - Apply micro-optimizations & cleanups.

  - Misc cleanups & enhancements"

* tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
  perf: Fix pmu_filter_match()
  perf: Fix lockdep_assert_event_ctx()
  perf/x86/amd/lbr: Adjust LBR regardless of filtering
  perf/x86/utils: Fix uninitialized var in get_branch_type()
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/uncore: Add new Raptor Lake S support
  perf/x86/cstate: Add new Raptor Lake S support
  perf/x86/msr: Add new Raptor Lake S support
  perf/x86: Add new Raptor Lake S support
  bpf: Check flags for branch stack in bpf_read_branch_records helper
  perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
  perf: Use sample_flags for raw_data
  perf: Use sample_flags for addr
  ...
2022-10-10 09:27:46 -07:00
Daniel Sneddon
b8d1d16360 x86/apic: Don't disable x2APIC if locked
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC).  X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs.  The APIC mode is controlled by the EXT bit in the APIC MSR.

The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1].  This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.

Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode.  If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.

Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.

On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS.  If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.

[1]: https://aepicleak.com/aepicleak.pdf

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com
2022-08-31 14:34:11 -07:00
Sandipan Das
ca5b7c0d96 perf/x86/amd/lbr: Add LbrExtV2 branch record support
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
enable it alongside LBR Freeze on PMI when an event requests branch stack
i.e. PERF_SAMPLE_BRANCH_STACK.

Each branch record is represented by a pair of registers, LBR From and LBR
To. The freeze feature prevents any updates to these registers once a PMC
overflows. The contents remain unchanged until the freeze bit is cleared by
the PMI handler.

The branch records are read and copied to sample data before unfreezing.
However, only valid entries are copied. There is no additional register to
denote which of the register pairs represent the top of the stack (TOS)
since internal register renaming always ensures that the first pair (i.e.
index 0) is the one representing the most recent branch and so on.

The LBR registers are per-thread resources and are cleared explicitly
whenever a new task is scheduled in. There are no special implications on
the contents of these registers when transitioning to deep C-states.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/d3b8500a3627a0d4d0259b005891ee248f248d91.1660211399.git.sandipan.das@amd.com
2022-08-27 00:05:43 +02:00
Linus Torvalds
5318b987fe More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
 mispredictions when doing a VM Exit therefore an additional RSB,
 one-entry stuffing is needed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe
 VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ
 AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl
 2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt
 DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox
 ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf
 0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x
 GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK
 YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN
 jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS
 ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1
 xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo=
 =a8em
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 eIBRS fixes from Borislav Petkov:
 "More from the CPU vulnerability nightmares front:

  Intel eIBRS machines do not sufficiently mitigate against RET
  mispredictions when doing a VM Exit therefore an additional RSB,
  one-entry stuffing is needed"

* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add LFENCE to RSB fill sequence
  x86/speculation: Add RSB VM Exit protections
2022-08-09 09:29:07 -07:00
Linus Torvalds
7c5c3a6177 ARM:
* Unwinder implementations for both nVHE modes (classic and
   protected), complete with an overflow stack
 
 * Rework of the sysreg access from userspace, with a complete
   rewrite of the vgic-v3 view to allign with the rest of the
   infrastructure
 
 * Disagregation of the vcpu flags in separate sets to better track
   their use model.
 
 * A fix for the GICv2-on-v3 selftest
 
 * A small set of cosmetic fixes
 
 RISC-V:
 
 * Track ISA extensions used by Guest using bitmap
 
 * Added system instruction emulation framework
 
 * Added CSR emulation framework
 
 * Added gfp_custom flag in struct kvm_mmu_memory_cache
 
 * Added G-stage ioremap() and iounmap() functions
 
 * Added support for Svpbmt inside Guest
 
 s390:
 
 * add an interface to provide a hypervisor dump for secure guests
 
 * improve selftests to use TAP interface
 
 * enable interpretive execution of zPCI instructions (for PCI passthrough)
 
 * First part of deferred teardown
 
 * CPU Topology
 
 * PV attestation
 
 * Minor fixes
 
 x86:
 
 * Permit guests to ignore single-bit ECC errors
 
 * Intel IPI virtualization
 
 * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
 
 * PEBS virtualization
 
 * Simplify PMU emulation by just using PERF_TYPE_RAW events
 
 * More accurate event reinjection on SVM (avoid retrying instructions)
 
 * Allow getting/setting the state of the speaker port data bit
 
 * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
 
 * "Notify" VM exit (detect microarchitectural hangs) for Intel
 
 * Use try_cmpxchg64 instead of cmpxchg64
 
 * Ignore benign host accesses to PMU MSRs when PMU is disabled
 
 * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
 
 * Allow NX huge page mitigation to be disabled on a per-vm basis
 
 * Port eager page splitting to shadow MMU as well
 
 * Enable CMCI capability by default and handle injected UCNA errors
 
 * Expose pid of vcpu threads in debugfs
 
 * x2AVIC support for AMD
 
 * cleanup PIO emulation
 
 * Fixes for LLDT/LTR emulation
 
 * Don't require refcounted "struct page" to create huge SPTEs
 
 * Miscellaneous cleanups:
 ** MCE MSR emulation
 ** Use separate namespaces for guest PTEs and shadow PTEs bitmasks
 ** PIO emulation
 ** Reorganize rmap API, mostly around rmap destruction
 ** Do not workaround very old KVM bugs for L0 that runs with nesting enabled
 ** new selftests API for CPUID
 
 Generic:
 
 * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
 
 * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL
 jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e
 oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB
 vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO
 Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg
 iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA==
 =PM8o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Quite a large pull request due to a selftest API overhaul and some
  patches that had come in too late for 5.19.

  ARM:

   - Unwinder implementations for both nVHE modes (classic and
     protected), complete with an overflow stack

   - Rework of the sysreg access from userspace, with a complete rewrite
     of the vgic-v3 view to allign with the rest of the infrastructure

   - Disagregation of the vcpu flags in separate sets to better track
     their use model.

   - A fix for the GICv2-on-v3 selftest

   - A small set of cosmetic fixes

  RISC-V:

   - Track ISA extensions used by Guest using bitmap

   - Added system instruction emulation framework

   - Added CSR emulation framework

   - Added gfp_custom flag in struct kvm_mmu_memory_cache

   - Added G-stage ioremap() and iounmap() functions

   - Added support for Svpbmt inside Guest

  s390:

   - add an interface to provide a hypervisor dump for secure guests

   - improve selftests to use TAP interface

   - enable interpretive execution of zPCI instructions (for PCI
     passthrough)

   - First part of deferred teardown

   - CPU Topology

   - PV attestation

   - Minor fixes

  x86:

   - Permit guests to ignore single-bit ECC errors

   - Intel IPI virtualization

   - Allow getting/setting pending triple fault with
     KVM_GET/SET_VCPU_EVENTS

   - PEBS virtualization

   - Simplify PMU emulation by just using PERF_TYPE_RAW events

   - More accurate event reinjection on SVM (avoid retrying
     instructions)

   - Allow getting/setting the state of the speaker port data bit

   - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls
     are inconsistent

   - "Notify" VM exit (detect microarchitectural hangs) for Intel

   - Use try_cmpxchg64 instead of cmpxchg64

   - Ignore benign host accesses to PMU MSRs when PMU is disabled

   - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

   - Allow NX huge page mitigation to be disabled on a per-vm basis

   - Port eager page splitting to shadow MMU as well

   - Enable CMCI capability by default and handle injected UCNA errors

   - Expose pid of vcpu threads in debugfs

   - x2AVIC support for AMD

   - cleanup PIO emulation

   - Fixes for LLDT/LTR emulation

   - Don't require refcounted "struct page" to create huge SPTEs

   - Miscellaneous cleanups:
      - MCE MSR emulation
      - Use separate namespaces for guest PTEs and shadow PTEs bitmasks
      - PIO emulation
      - Reorganize rmap API, mostly around rmap destruction
      - Do not workaround very old KVM bugs for L0 that runs with nesting enabled
      - new selftests API for CPUID

  Generic:

   - Fix races in gfn->pfn cache refresh; do not pin pages tracked by
     the cache

   - new selftests API using struct kvm_vcpu instead of a (vm, id)
     tuple"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
  selftests: kvm: set rax before vmcall
  selftests: KVM: Add exponent check for boolean stats
  selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
  selftests: KVM: Check stat name before other fields
  KVM: x86/mmu: remove unused variable
  RISC-V: KVM: Add support for Svpbmt inside Guest/VM
  RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
  RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
  KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
  RISC-V: KVM: Add extensible CSR emulation framework
  RISC-V: KVM: Add extensible system instruction emulation framework
  RISC-V: KVM: Factor-out instruction emulation into separate sources
  RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
  RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
  RISC-V: KVM: Fix variable spelling mistake
  RISC-V: KVM: Improve ISA extension by using a bitmap
  KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
  KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
  KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
  KVM: x86: Do not block APIC write for non ICR registers
  ...
2022-08-04 14:59:54 -07:00
Daniel Sneddon
2b12993220 x86/speculation: Add RSB VM Exit protections
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced.  eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
	// Prepare to run guest
	VMRESUME();
	// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-08-03 11:23:52 +02:00
Paolo Bonzini
63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Len Brown
4af184ee8b tools/power turbostat: dump secondary Turbo-Ratio-Limit
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.

Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.

Signed-off-by: Len Brown <len.brown@intel.com>
2022-07-28 14:23:26 -04:00
Pawan Gupta
4ad3278df6 x86/speculation: Disable RRSBA behavior
Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.

Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may
fallback to alternate predictors. As a result, RET's predicted target
may get influenced by branch history.

A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2).

For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, set
RRSBA_DIS_S also to protect RETs for RSB-underflow case.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:45 +02:00
Peter Zijlstra
d7caac991f x86/cpu/amd: Add Spectral Chicken
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken
bit for some speculation behaviour. It needs setting.

Note: very belatedly AMD released naming; it's now officially called
      MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT
      but shall remain the SPECTRAL CHICKEN.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Peter Zijlstra
6ad0ad2bf8 x86/bugs: Report Intel retbleed vulnerability
Skylake suffers from RSB underflow speculation issues; report this
vulnerability and it's mitigation (spectre_v2=ibrs).

  [jpoimboe: cleanups, eibrs]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Linus Torvalds
8e8afafb0b Yet another hw vulnerability with a software mitigation: Processor MMIO
Stale Data.
 
 They are a class of MMIO-related weaknesses which can expose stale data
 by propagating it into core fill buffers. Data which can then be leaked
 using the usual speculative execution methods.
 
 Mitigations include this set along with microcode updates and are
 similar to MDS and TAA vulnerabilities: VERW now clears those buffers
 too.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
 7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
 RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
 hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
 zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
 KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
 QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
 2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
 cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
 5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
 UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
 /s55bWxHkR6S
 =LBxT
 -----END PGP SIGNATURE-----

Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data
2022-06-14 07:43:15 -07:00
Like Xu
c59a1f106f KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the
IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed
and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE
that enable generation of PEBS records. The general-purpose counter bits
start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at
bit IA32_PEBS_ENABLE[32].

When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be
added to the perf_guest_switch_msr() and atomically switched during
the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR.

Based on whether the platform supports x86_pmu.pebs_ept, it has also
refactored the way to add more msrs to arr[] in intel_guest_get_msrs()
for extensibility.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:55 -04:00
Robert Hoo
465932db25 x86/cpu: Add new VMX feature, Tertiary VM-Execution control
A new 64-bit control field "tertiary processor-based VM-execution
controls", is defined [1]. It's controlled by bit 17 of the primary
processor-based VM-execution controls.

Different from its brother VM-execution fields, this tertiary VM-
execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), is also semantically different from its brothers, whose 64 bits
consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2].
Therefore, its init_vmx_capabilities() is a little different from others.

[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

[2] SDM Vol3. Appendix A.3

Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153240.11549-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:13 -04:00
Linus Torvalds
09583dfed2 Power management updates for 5.19-rc1
- Update the Energy Model support code to allow the Energy Model to be
    artificial, which means that the power values may not be on a uniform
    scale with other devices providing power information, and update the
    cpufreq_cooling and devfreq_cooling thermal drivers to support
    artificial Energy Models (Lukasz Luba).
 
  - Make DTPM check the Energy Model type (Lukasz Luba).
 
  - Fix policy counter decrementation in cpufreq if Energy Model is in
    use (Pierre Gondois).
 
  - Add CPU-based scaling support to passive devfreq governor (Saravana
    Kannan, Chanwoo Choi).
 
  - Update the rk3399_dmc devfreq driver (Brian Norris).
 
  - Export dev_pm_ops instead of suspend() and resume() in the IIO
    chemical scd30 driver (Jonathan Cameron).
 
  - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
    PM-runtime counterparts (Jonathan Cameron).
 
  - Move symbol exports in the IIO chemical scd30 driver into the
    IIO_SCD30 namespace (Jonathan Cameron).
 
  - Avoid device PM-runtime usage count underflows (Rafael Wysocki).
 
  - Allow dynamic debug to control printing of PM messages  (David
    Cohen).
 
  - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
    Bai).
 
  - Preserve ACPI-table override during hibernation (Amadeusz Sławiński).
 
  - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
 
  - Make Intel RAPL power capping driver support the RaptorLake and
    AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
 
  - Remove redundant store to value after multiply in the RAPL power
    capping driver (Colin Ian King).
 
  - Add AlderLake processor support to the intel_idle driver (Zhang Rui).
 
  - Fix regression leading to no genpd governor in the PSCI cpuidle
    driver and fix the riscv-sbi cpuidle driver to allow a genpd
    governor to be used (Ulf Hansson).
 
  - Fix cpufreq governor clean up code to avoid using kfree() directly
    to free kobject-based items (Kevin Hao).
 
  - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy).
 
  - Make intel_pstate notify frequency invariance code when no_turbo is
    turned on and off (Chen Yu).
 
  - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
    Pandruvada).
 
  - Make cpufreq avoid unnecessary frequency updates due to mismatch
    between hardware and the frequency table (Viresh Kumar).
 
  - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
    code (Viresh Kumar).
 
  - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
    calling convention for some driver callbacks consistent (Rafael
    Wysocki).
 
  - Avoid accessing half-initialized cpufreq policies from the show()
    and store() sysfs functions (Schspa Shi).
 
  - Rearrange cpufreq_offline() to make the calling convention for some
    driver callbacks consistent (Schspa Shi).
 
  - Update CPPC handling in cpufreq (Pierre Gondois).
 
  - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
 
  - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
    Hansson).
 
  - Improve the way genpd deals with its governors (Ulf Hansson).
 
  - Update the turbostat utility to version 2022.04.16 (Len Brown,
    Dan Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen
    Yu).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKL3hsSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxW4oP/RzMh6dclWXs3J/gUCKTqRepq6cb80tq
 Q2r9xRRHwy6ZH/PVddGDHmhQ7d3NAv13s4srA9kznZognF3hzuxnGau226ilDqHh
 qxVSBRjWY9ijxRBvkcCaa6HZm4Chb91pUX0CLpdYSl9BTgIdk66HZYaMsKhHU/di
 j7KKHPdKyyQkssWnMjGEyuaF+UebiEgISCF3+X0eb6c1m7GHXpgLJVxNy0pKkUdK
 j+n6+ms12OlVLtg1eIl0J5824w/rkK3ZdqfEXJSq++mNMqSj/KCI3yWpzsLKp9AB
 xxhox/tPgJVyON8Vtbb2IkWkiQUKeSrAGIUYXWmnwIZYLPSGD7BPzr82Cxr7S/ez
 imMB+1Qd3SsOQ9EdI9rGYgNsEF2vOs1xjMehSdUdmTz148IzBOBt4YyQeb/mfXqH
 nh9eVuFCzqH1lAayYt6iP1+V5gQn9as/+rR91k4k4A6OKXomuQUGORLeHfuKMfNH
 eBZ72tdXqiq6z+ag3lY3pBAMSm11epCOa3VR6QNaC7hrlY3AZP+o3tIUL6W813b+
 V3l1gWApGHZE1hiDM95dll/dIt9IZpTRd3dlqF/YnFW7fPDrz71EGvhrZpO7vdO0
 /G6eJcCDjqJVcbCE8Y77I6/AXjpVQ7PRPeNx6aW7jPcQhpVIgcsF2BGjk9anjXDs
 3yHJs9R/HMmA
 =Hewm
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add support for 'artificial' Energy Models in which power
  numbers for different entities may be in different scales, add support
  for some new hardware, fix bugs and clean up code in multiple places.

  Specifics:

   - Update the Energy Model support code to allow the Energy Model to
     be artificial, which means that the power values may not be on a
     uniform scale with other devices providing power information, and
     update the cpufreq_cooling and devfreq_cooling thermal drivers to
     support artificial Energy Models (Lukasz Luba).

   - Make DTPM check the Energy Model type (Lukasz Luba).

   - Fix policy counter decrementation in cpufreq if Energy Model is in
     use (Pierre Gondois).

   - Add CPU-based scaling support to passive devfreq governor (Saravana
     Kannan, Chanwoo Choi).

   - Update the rk3399_dmc devfreq driver (Brian Norris).

   - Export dev_pm_ops instead of suspend() and resume() in the IIO
     chemical scd30 driver (Jonathan Cameron).

   - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
     PM-runtime counterparts (Jonathan Cameron).

   - Move symbol exports in the IIO chemical scd30 driver into the
     IIO_SCD30 namespace (Jonathan Cameron).

   - Avoid device PM-runtime usage count underflows (Rafael Wysocki).

   - Allow dynamic debug to control printing of PM messages (David
     Cohen).

   - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
     Bai).

   - Preserve ACPI-table override during hibernation (Amadeusz
     Sławiński).

   - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).

   - Make Intel RAPL power capping driver support the RaptorLake and
     AlderLake N processors (Zhang Rui, Sumeet Pawnikar).

   - Remove redundant store to value after multiply in the RAPL power
     capping driver (Colin Ian King).

   - Add AlderLake processor support to the intel_idle driver (Zhang
     Rui).

   - Fix regression leading to no genpd governor in the PSCI cpuidle
     driver and fix the riscv-sbi cpuidle driver to allow a genpd
     governor to be used (Ulf Hansson).

   - Fix cpufreq governor clean up code to avoid using kfree() directly
     to free kobject-based items (Kevin Hao).

   - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
     Leroy).

   - Make intel_pstate notify frequency invariance code when no_turbo is
     turned on and off (Chen Yu).

   - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
     Pandruvada).

   - Make cpufreq avoid unnecessary frequency updates due to mismatch
     between hardware and the frequency table (Viresh Kumar).

   - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
     code (Viresh Kumar).

   - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
     calling convention for some driver callbacks consistent (Rafael
     Wysocki).

   - Avoid accessing half-initialized cpufreq policies from the show()
     and store() sysfs functions (Schspa Shi).

   - Rearrange cpufreq_offline() to make the calling convention for some
     driver callbacks consistent (Schspa Shi).

   - Update CPPC handling in cpufreq (Pierre Gondois).

   - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).

   - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
     Hansson).

   - Improve the way genpd deals with its governors (Ulf Hansson).

   - Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
     Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"

* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
  PM: domains: Trust domain-idle-states from DT to be correct by genpd
  PM: domains: Measure power-on/off latencies in genpd based on a governor
  PM: domains: Allocate governor data dynamically based on a genpd governor
  PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
  PM: domains: Fix initialization of genpd's next_wakeup
  PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
  PM: domains: Measure suspend/resume latencies in genpd based on governor
  PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
  PM: domains: Allocate gpd_timing_data dynamically based on governor
  PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
  PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
  PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
  PM: domains: Drop redundant code for genpd always-on governor
  PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
  powercap: intel_rapl: remove redundant store to value after multiply
  cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
  cpufreq: CPPC: Enable fast_switch
  ACPI: CPPC: Assume no transition latency if no PCCT
  ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
  ACPI: CPPC: Check _OSC for flexible address space
  ...
2022-05-24 16:04:25 -07:00
Linus Torvalds
cfeb2522c3 Perf events changes for this cycle were:
Platform PMU changes:
 =====================
 
  - x86/intel:
     - Add new Intel Alder Lake and Raptor Lake support
 
  - x86/amd:
     - AMD Zen4 IBS extensions support
     - Add AMD PerfMonV2 support
     - Add AMD Fam19h Branch Sampling support
 
 Generic changes:
 ================
 
  - signal: Deliver SIGTRAP on perf event asynchronously if blocked
 
    Perf instrumentation can be driven via SIGTRAP, but this causes a problem
    when SIGTRAP is blocked by a task & terminate the task.
 
    Allow user-space to request these signals asynchronously (after they get
    unblocked) & also give the information to the signal handler when this
    happens:
 
      " To give user space the ability to clearly distinguish synchronous from
        asynchronous signals, introduce siginfo_t::si_perf_flags and
        TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
        required in future).
 
        The resolution to the problem is then to (a) no longer force the signal
        (avoiding the terminations), but (b) tell user space via si_perf_flags
        if the signal was synchronous or not, so that such signals can be
        handled differently (e.g. let user space decide to ignore or consider
        the data imprecise). "
 
  - Unify/standardize the /sys/devices/cpu/events/* output format.
 
  - Misc fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLuiURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ioSRAAgM3PneFHn5MFiuV/8ZfP3xMHNUOYOCgN
 JhALRcUhDdL4N9pS0DSImfXvAlYPJ/TZK8qBRNDsRgygp5vjrbr9zH2HdZBW1gyV
 qi3bpuNS+METnfNyumAoBeOYbMIvpm3NDUX+w68Xvkd1g8ykyno8Zc2H2hj3IDsW
 cK3ErP0CZLsnBZsymy29/bxCYhfxsED6J06hOa8R3Tvl4XYg/27Z+tEuZ4GYeFS8
 VikulYB9RhRWUbhkzwjyRSbTWyvsuXP+xD28ymUIxXaNCDOwxK8uYtVepUFIBO8X
 cZgtwT2faV3y5ZAnz02M+/JZl+Jz5EPm037vNQp9aJsTuAbAGnxh/hL0cBVuDqhv
 Nh9wkqS8FqwAbtpvg/IeamzqN5z/Yn2Q/Jyk/4oWipmeddXWUL7sYVoSduTGJJkz
 cZz2ciNQbnOCzv0ZSjihrGMqPaT+/wI/iLW3ouLoZXpfTtVVRiiLuI1DDAZ1rd2r
 D6djV8JjHIs71V/6E9ahVATxq8yMdikd7u734rA5K3XSxIBTYrdshbOhddzgeE7d
 chQ7XvpQXDoFrZtxkHXP5iIeNF7fU9MWNWaEcsrZaWEB/8UpD6eL2if1Kl8mog+h
 J4+zR1LWRHh8TNRfos3yCP2PSbbS6LPVsYLJzP+bb+pxgqdJ+urxfmxoCtY5trNI
 zHT52xfdxSo=
 =UqYA
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "Platform PMU changes:

   - x86/intel:
      - Add new Intel Alder Lake and Raptor Lake support

   - x86/amd:
      - AMD Zen4 IBS extensions support
      - Add AMD PerfMonV2 support
      - Add AMD Fam19h Branch Sampling support

  Generic changes:

   - signal: Deliver SIGTRAP on perf event asynchronously if blocked

     Perf instrumentation can be driven via SIGTRAP, but this causes a
     problem when SIGTRAP is blocked by a task & terminate the task.

     Allow user-space to request these signals asynchronously (after
     they get unblocked) & also give the information to the signal
     handler when this happens:

       "To give user space the ability to clearly distinguish
        synchronous from asynchronous signals, introduce
        siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for
        flags in case more binary information is required in future).

        The resolution to the problem is then to (a) no longer force the
        signal (avoiding the terminations), but (b) tell user space via
        si_perf_flags if the signal was synchronous or not, so that such
        signals can be handled differently (e.g. let user space decide
        to ignore or consider the data imprecise). "

   - Unify/standardize the /sys/devices/cpu/events/* output format.

   - Misc fixes & cleanups"

* tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/amd/core: Fix reloading events for SVM
  perf/x86/amd: Run AMD BRS code only on supported hw
  perf/x86/amd: Fix AMD BRS period adjustment
  perf/x86/amd: Remove unused variable 'hwc'
  perf/ibs: Fix comment
  perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
  perf/amd/ibs: Add support for L3 miss filtering
  perf/amd/ibs: Use ->is_visible callback for dynamic attributes
  perf/amd/ibs: Cascade pmu init functions' return value
  perf/x86/uncore: Add new Alder Lake and Raptor Lake support
  perf/x86/uncore: Clean up uncore_pci_ids[]
  perf/x86/cstate: Add new Alder Lake and Raptor Lake support
  perf/x86/msr: Add new Alder Lake and Raptor Lake support
  perf/x86: Add new Alder Lake and Raptor Lake support
  perf/amd/ibs: Use interrupt regs ip for stack unwinding
  perf/x86/amd/core: Add PerfMonV2 overflow handling
  perf/x86/amd/core: Add PerfMonV2 counter control
  perf/x86/amd/core: Detect available counters
  perf/x86/amd/core: Detect PerfMonV2 support
  x86/msr: Add PerfCntrGlobal* registers
  ...
2022-05-24 10:59:38 -07:00
Linus Torvalds
8443516da6 platform-drivers-x86 for v5.19-1
Highlights:
  -  New drivers:
     -  Intel "In Field Scan" (IFS) support
     -  Winmate FM07/FM07P buttons
     -  Mellanox SN2201 support
  -  AMD PMC driver enhancements
  -  Lots of various other small fixes and hardware-id additions
 
 The following is an automated git shortlog grouped by driver:
 
 Documentation:
  -  In-Field Scan
 
 Documentation/ABI:
  -  Add new attributes for mlxreg-io sysfs interfaces
  -  sysfs-class-firmware-attributes: Misc. cleanups
  -  sysfs-class-firmware-attributes: Fix Sphinx errors
  -  sysfs-driver-intel_sdsi: Fix sphinx warnings
 
 acerhdf:
  -  Cleanup str_starts_with()
 
 amd-pmc:
  -  Fix build error unused-function
  -  Shuffle location of amd_pmc_get_smu_version()
  -  Avoid reading SMU version at probe time
  -  Move FCH init to first use
  -  Move SMU logging setup out of init
  -  Fix compilation without CONFIG_SUSPEND
 
 amd_hsmp:
  -  Add HSMP protocol version 5 messages
 
 asus-nb-wmi:
  -  Add keymap for MyASUS key
 
 asus-wmi:
  -  Update unknown code message
  -  Use kobj_to_dev()
  -  Fix driver not binding when fan curve control probe fails
  -  Potential buffer overflow in asus_wmi_evaluate_method_buf()
 
 barco-p50-gpio:
  -  Fix duplicate included linux/io.h
 
 dell-laptop:
  -  Add quirk entry for Latitude 7520
 
 gigabyte-wmi:
  -  Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  -  added support for B660 GAMING X DDR4 motherboard
 
 hp-wmi:
  -  Correct code style related issues
 
 intel-hid:
  -  fix _DSM function index handling
 
 intel-uncore-freq:
  -  Prevent driver loading in guests
 
 intel_cht_int33fe:
  -  Set driver data
 
 platform/mellanox:
  -  Add support for new SN2201 system
 
 platform/surface:
  -  aggregator: Fix initialization order when compiling as builtin module
  -  gpe: Add support for Surface Pro 8
 
 platform/x86/dell:
  -  add buffer allocation/free functions for SMI calls
 
 platform/x86/intel:
  -  Fix 'rmmod pmt_telemetry' panic
  -  pmc/core: Use kobj_to_dev()
  -  pmc/core: change pmc_lpm_modes to static
 
 platform/x86/intel/ifs:
  -  Add CPU_SUP_INTEL dependency
  -  add ABI documentation for IFS
  -  Add IFS sysfs interface
  -  Add scan test support
  -  Authenticate and copy to secured memory
  -  Check IFS Image sanity
  -  Read IFS firmware image
  -  Add stub driver for In-Field Scan
 
 platform/x86/intel/sdsi:
  -  Fix bug in multi packet reads
  -  Poll on ready bit for writes
  -  Handle leaky bucket
 
 platform_data/mlxreg:
  -  Add field for notification callback
 
 pmc_atom:
  -  dont export pmc_atom_read - no modular users
  -  remove unused pmc_atom_write()
 
 samsung-laptop:
  -  use kobj_to_dev()
  -  Fix an unsigned comparison which can never be negative
 
 stop_machine:
  -  Add stop_core_cpuslocked() for per-core operations
 
 think-lmi:
  -  certificate support clean ups
 
 thinkpad_acpi:
  -  Correct dual fan probe
  -  Add a s2idle resume quirk for a number of laptops
  -  Convert btusb DMI list to quirks
 
 tools/power/x86/intel-speed-select:
  -  Fix warning for perf_cap.cpu
  -  Display error on turbo mode disabled
  -  fix build failure when using -Wl,--as-needed
 
 toshiba_acpi:
  -  use kobj_to_dev()
 
 trace:
  -  platform/x86/intel/ifs: Add trace point to track Intel IFS operations
 
 winmate-fm07-keys:
  -  Winmate FM07/FM07P buttons
 
 wmi:
  -  replace usage of found with dedicated list iterator variable
 
 x86/microcode/intel:
  -  Expose collect_cpu_info_early() for IFS
 
 x86/msr-index:
  -  Define INTEGRITY_CAPABILITIES MSR
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmKKlA0UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9w0Iwf+PYoq7qtU6j6N2f8gL2s65JpKiSPP
 CkgnCzTP+khvNnTWMQS8RW9VE6YrHXmN/+d3UAvRrHsOYm3nyZT5aPju9xJ6Xyfn
 5ZdMVvYxz7cm3lC6ay8AQt0Cmy6im/+lzP5vA5K68IYh0fPX/dvuOU57pNvXYFfk
 Yz5/Gm0t0C4CKVqkcdU/zkNawHP+2+SyQe+Ua2srz7S3DAqUci0lqLr/w9Xk2Yij
 nCgEWFB1Qjd2NoyRRe44ksLQ0dXpD4ADDzED+KPp6VTGnw61Eznf9319Z5ONNa/O
 VAaSCcDNKps8d3ZpfCpLb3Rs4ztBCkRnkLFczJBgPsBiuDmyTT2/yeEtNg==
 =HdEG
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:
 "This includes some small changes to kernel/stop_machine.c and arch/x86
  which are deps of the new Intel IFS support.

  Highlights:

   - New drivers:
       - Intel "In Field Scan" (IFS) support
       - Winmate FM07/FM07P buttons
       - Mellanox SN2201 support

   -  AMD PMC driver enhancements

   -  Lots of various other small fixes and hardware-id additions"

* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
  platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
  platform/x86: intel_cht_int33fe: Set driver data
  platform/x86: intel-hid: fix _DSM function index handling
  platform/x86: toshiba_acpi: use kobj_to_dev()
  platform/x86: samsung-laptop: use kobj_to_dev()
  platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
  tools/power/x86/intel-speed-select: Display error on turbo mode disabled
  Documentation: In-Field Scan
  platform/x86/intel/ifs: add ABI documentation for IFS
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: Add scan test support
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Read IFS firmware image
  platform/x86/intel/ifs: Add stub driver for In-Field Scan
  stop_machine: Add stop_core_cpuslocked() for per-core operations
  x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
  x86/microcode/intel: Expose collect_cpu_info_early() for IFS
  ...
2022-05-23 20:38:39 -07:00
Linus Torvalds
eb39e37d5c AMD SEV-SNP support
Add to confidential guests the necessary memory integrity protection
 against malicious hypervisor-based attacks like data replay, memory
 remapping and others, thus achieving a stronger isolation from the
 hypervisor.
 
 At the core of the functionality is a new structure called a reverse
 map table (RMP) with which the guest has a say in which pages get
 assigned to it and gets notified when a page which it owns, gets
 accessed/modified under the covers so that the guest can take an
 appropriate action.
 
 In addition, add support for the whole machinery needed to launch a SNP
 guest, details of which is properly explained in each patch.
 
 And last but not least, the series refactors and improves parts of the
 previous SEV support so that the new code is accomodated properly and
 not just bolted on.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLU2AACgkQEsHwGGHe
 VUpb/Q//f4LGiJf4nw1flzpe90uIsHNwAafng3NOjeXmhI/EcOlqPf23WHPCgg3Z
 2umfa4sRZyj4aZubDd7tYAoq4qWrQ7pO7viWCNTh0InxBAILOoMPMuq2jSAbq0zV
 ASUJXeQ2bqjYxX4JV4N5f3HT2l+k68M0mpGLN0H+O+LV9pFS7dz7Jnsg+gW4ZP25
 PMPLf6FNzO/1tU1aoYu80YDP1ne4eReLrNzA7Y/rx+S2NAetNwPn21AALVgoD4Nu
 vFdKh4MHgtVbwaQuh0csb/+4vD+tDXAhc8lbIl+Abl9ZxJaDWtAJW5D9e2CnsHk1
 NOkHwnrzizzhtGK1g56YPUVRFAWhZYMOI1hR0zGPLQaVqBnN4b+iahPeRiV0XnGE
 PSbIHSfJdeiCkvLMCdIAmpE5mRshhRSUfl1CXTCdetMn8xV/qz/vG6bXssf8yhTV
 cfLGPHU7gfVmsbR9nk5a8KZ78PaytxOxfIDXvCy8JfQwlIWtieaCcjncrj+sdMJy
 0fdOuwvi4jma0cyYuPolKiS1Hn4ldeibvxXT7CZQlIx6jZShMbpfpTTJs11XdtHm
 PdDAc1TY3AqI33mpy9DhDQmx/+EhOGxY3HNLT7evRhv4CfdQeK3cPVUWgo4bGNVv
 ZnFz7nvmwpyufltW9K8mhEZV267174jXGl6/idxybnlVE7ESr2Y=
 =Y8kW
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull AMD SEV-SNP support from Borislav Petkov:
 "The third AMD confidential computing feature called Secure Nested
  Paging.

  Add to confidential guests the necessary memory integrity protection
  against malicious hypervisor-based attacks like data replay, memory
  remapping and others, thus achieving a stronger isolation from the
  hypervisor.

  At the core of the functionality is a new structure called a reverse
  map table (RMP) with which the guest has a say in which pages get
  assigned to it and gets notified when a page which it owns, gets
  accessed/modified under the covers so that the guest can take an
  appropriate action.

  In addition, add support for the whole machinery needed to launch a
  SNP guest, details of which is properly explained in each patch.

  And last but not least, the series refactors and improves parts of the
  previous SEV support so that the new code is accomodated properly and
  not just bolted on"

* tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  x86/entry: Fixup objtool/ibt validation
  x86/sev: Mark the code returning to user space as syscall gap
  x86/sev: Annotate stack change in the #VC handler
  x86/sev: Remove duplicated assignment to variable info
  x86/sev: Fix address space sparse warning
  x86/sev: Get the AP jump table address from secrets page
  x86/sev: Add missing __init annotations to SEV init routines
  virt: sevguest: Rename the sevguest dir and files to sev-guest
  virt: sevguest: Change driver name to reflect generic SEV support
  x86/boot: Put globals that are accessed early into the .data section
  x86/boot: Add an efi.h header for the decompressor
  virt: sevguest: Fix bool function returning negative value
  virt: sevguest: Fix return value check in alloc_shared_pages()
  x86/sev-es: Replace open-coded hlt-loop with sev_es_terminate()
  virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement
  virt: sevguest: Add support to get extended report
  virt: sevguest: Add support to derive key
  virt: Add SEV-SNP guest driver
  x86/sev: Register SEV-SNP guest request platform device
  x86/sev: Provide support for SNP guest request NAEs
  ...
2022-05-23 17:38:01 -07:00
Pawan Gupta
027bbb884b KVM: x86/speculation: Disable Fill buffer clear within guests
The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an
accurate indicator on all CPUs of whether the VERW instruction will
overwrite fill buffers. FB_CLEAR enumeration in
IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not
vulnerable to MDS/TAA, indicating that microcode does overwrite fill
buffers.

Guests running in VMM environments may not be aware of all the
capabilities/vulnerabilities of the host CPU. Specifically, a guest may
apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable
to MDS/TAA even when the physical CPU is not. On CPUs that enumerate
FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill
buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS
during VMENTER and resetting on VMEXIT. For guests that enumerate
FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM
will not use FB_CLEAR_DIS.

Irrespective of guest state, host overwrites CPU buffers before VMENTER
to protect itself from an MMIO capable guest, as part of mitigation for
MMIO Stale Data vulnerabilities.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21 12:41:35 +02:00
Pawan Gupta
5180218615 x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For more details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst

Add the Processor MMIO Stale Data bug enumeration. A microcode update
adds new bits to the MSR IA32_ARCH_CAPABILITIES, define them.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21 12:14:30 +02:00
Tony Luck
db1af12929 x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
The INTEGRITY_CAPABILITIES MSR is enumerated by bit 2 of the
CORE_CAPABILITIES MSR.

Add defines for the CORE_CAPS enumeration as well as for the integrity
MSR.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-3-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Peter Zijlstra
47319846a9 Linux 5.18-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmJu9FYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAyEH/16xtJSpLmLwrQzG
 o+4ToQxSQ+/9UHyu0RTEvHg2THm9/8emtIuYyc/5FgdoWctcSa3AaDcveWmuWmkS
 KYcdhfJsaEqjNHS3OPYXN84fmo9Hel7263shu5+IYmP/sN0DfQp6UWTryX1q4B3Q
 4Pdutkuq63Uwd8nBZ5LXQBumaBrmkkuMgWEdT4+6FOo1mPzwdIGBxCuz1UsNNl5k
 chLWxkQfe2eqgWbYJrgCQfrVdORXVtoU2fGilZUNrHRVGkkldXkkz5clJfapyZD3
 odmZCEbrE4GPKgZwCmDERMfD1hzhZDtYKiHfOQ506szH5ykJjPBcOjHed7dA60eB
 J3+wdek=
 =39Ca
 -----END PGP SIGNATURE-----

Merge branch 'v5.18-rc5'

Obtain the new INTEL_FAM6 stuff required.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-05-11 16:27:06 +02:00
Sandipan Das
089be16d59 x86/msr: Add PerfCntrGlobal* registers
Add MSR definitions that will be used to enable the new AMD
Performance Monitoring Version 2 (PerfMonV2) features. These
include:

  * Performance Counter Global Control (PerfCntrGlobalCtl)
  * Performance Counter Global Status (PerfCntrGlobalStatus)
  * Performance Counter Global Status Clear (PerfCntrGlobalStatusClr)

The new Performance Counter Global Control and Status MSRs
provide an interface for enabling or disabling multiple
counters at the same time and for testing overflow without
probing the individual registers for each PMC.

The availability of these registers is indicated through the
PerfMonV2 feature bit of CPUID leaf 0x80000022 EAX.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/cdc0d8f75bd519848731b5c64d924f5a0619a573.1650515382.git.sandipan.das@amd.com
2022-05-04 11:18:26 +02:00
Rafael J. Wysocki
9765fa2566 Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat changes for 5.19 from Len Brown:

"Chen Yu (1):
      tools/power turbostat: Support thermal throttle count print

Dan Merillat (1):
      tools/power turbostat: fix dump for AMD cpus

Len Brown (5):
      tools/power turbostat: tweak --show and --hide capability
      tools/power turbostat: fix ICX DRAM power numbers
      tools/power turbostat: be more useful as non-root
      tools/power turbostat: No build warnings with -Wextra
      tools/power turbostat: version 2022.04.16

Sumeet Pawnikar (2):
      tools/power turbostat: Add Power Limit4 support
      tools/power turbostat: print power values upto three decimal

Zephaniah E. Loss-Cutler-Hull (2):
      tools/power turbostat: Allow -e for all names.
      tools/power turbostat: Allow printing header every N iterations"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2022.04.16
  tools/power turbostat: No build warnings with -Wextra
  tools/power turbostat: be more useful as non-root
  tools/power turbostat: fix ICX DRAM power numbers
  tools/power turbostat: Support thermal throttle count print
  tools/power turbostat: Allow printing header every N iterations
  tools/power turbostat: Allow -e for all names.
  tools/power turbostat: print power values upto three decimal
  tools/power turbostat: Add Power Limit4 support
  tools/power turbostat: fix dump for AMD cpus
  tools/power turbostat: tweak --show and --hide capability
2022-04-19 17:43:25 +02:00
Sumeet Pawnikar
f52ba93190 tools/power turbostat: Add Power Limit4 support
Add Power Limit4 support.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2022-04-16 21:58:14 -04:00
Pawan Gupta
400331f8ff x86/tsx: Disable TSX development mode at boot
A microcode update on some Intel processors causes all TSX transactions
to always abort by default[*]. Microcode also added functionality to
re-enable TSX for development purposes. With this microcode loaded, if
tsx=on was passed on the cmdline, and TSX development mode was already
enabled before the kernel boot, it may make the system vulnerable to TSX
Asynchronous Abort (TAA).

To be on safer side, unconditionally disable TSX development mode during
boot. If a viable use case appears, this can be revisited later.

  [*]: Intel TSX Disable Update for Selected Processors, doc ID: 643557

  [ bp: Drop unstable web link, massage heavily. ]

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/347bd844da3a333a9793c6687d4e4eb3b2419a3e.1646943780.git.pawan.kumar.gupta@linux.intel.com
2022-04-11 09:58:40 +02:00
Brijesh Singh
f742b90e61 x86/mm: Extend cc_attr to include AMD SEV-SNP
The CC_ATTR_GUEST_SEV_SNP can be used by the guest to query whether the
SNP (Secure Nested Paging) feature is active.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-10-brijesh.singh@amd.com
2022-04-06 13:02:34 +02:00
Stephane Eranian
ada543459c perf/x86/amd: Add AMD Fam19h Branch Sampling support
Add support for the AMD Fam19h 16-deep branch sampling feature as
described in the AMD PPR Fam19h Model 01h Revision B1.  This is a model
specific extension. It is not an architected AMD feature.

The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
registers. There is no branch type filtering. All control flow changes are
captured. BRS relies on specific programming of the core PMU of Fam19h.  In
particular, the following requirements must be met:
 - the sampling period be greater than 16 (BRS depth)
 - the sampling period must use a fixed and not frequency mode

BRS interacts with the NMI interrupt as well. Because enabling BRS is
expensive, it is only activated after P event occurrences, where P is the
desired sampling period.  At P occurrences of the event, the counter
overflows, the CPU catches the interrupt, activates BRS for 16 branches until
it saturates, and then delivers the NMI to the kernel.  Between the overflow
and the time BRS activates more branches may be executed skewing the period.
All along, the sampling event keeps counting. The skid may be attenuated by
reducing the sampling period by 16 (subsequent patch).

BRS is integrated into perf_events seamlessly via the same
PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
records in the sampling buffer. No prediction information is supported. The
branches are stored in reverse order of execution.  The most recent branch is
the first entry in each record.

No modification to the perf tool is necessary.

BRS can be used with any sampling event. However, it is recommended to use
the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
captures.

$ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test

$ perf report -D
56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
... branch stack: nr:16
.....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
.....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
.....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
.....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
.....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
.....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
.....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
.....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
.....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
.....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
 ... thread: test:18230
 ...... dso: test

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com
2022-04-05 10:24:37 +02:00
Linus Torvalds
7001052160 Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
 where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
 
 Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
 limited to 2 instructions (and typically fewer) on branch targets not starting
 with ENDBR. CET-IBT also limits speculation of the next sequential instruction
 after the indirect CALL/JMP [1].
 
 CET-IBT is fundamentally incompatible with retpolines, but provides, as
 described above, speculation limits itself.
 
 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
 bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
 urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
 BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
 y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
 go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
 ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
 x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
 dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
 EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
 beB/7u2KIUbKEkSN
 =jZfK
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
2022-03-27 10:17:23 -07:00
Linus Torvalds
95ab0e8768 Changes for this cycle were:
- Fix address filtering for Intel/PT,ARM/CoreSight
  - Enable Intel/PEBS format 5
  - Allow more fixed-function counters for x86
  - Intel/PT: Enable not recording Taken-Not-Taken packets
  - Add a few branch-types
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI4WdIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jdTA/7BADTYzFCbdwPzHt2mR8osv7k+pDvYxs9
 wxNjyi1X7N8cPkhqgIg9CfdhdyDOqo7+J4fG17f2qbwjNK7b2Fb1/U6ZoZaf+f8F
 W0e2LX5KZTXUhkA+TEjrXvYD9FmJaCPM/l2RQg8U7okBs2kb0H6QT2Yn21wd1roC
 WwI5KFiWSVS1IzpVLaXjDh+FJfJHd75ReMqJeus+QoVQ9NHeuI+t4DglSB1IBi54
 d/zeVXE/Y4dFTQOrU06S2HxcOEptvXZsPmVLvKab/veeGGyWiGPxQpvu6bXm6u3x
 0sV+dn67zut2m2pQlUZUucgGTSYIZTpOe+rNukTB9hJ4XeN4/1ohOOCrOuYM+63P
 lGFbN1v+LD7Wc6C2eEhw8G5GEL0qbwzFNQ06O3EOFi7C7GKn7WS/ET6XuuMOERFk
 uxEPb4pFtbBlJ0SriCprFJSd5NL3PORZlLIhv4hGH5hilLR1TFeKDuwZaM4noQxU
 dL3rKGLi9H+P46Eni9H28+0gDISbv1xL+WivHOFQNmhBqAZO52ZcF3J+dgBaR1B5
 pBxVTycFpZMjxSZnqTE0gMsFaLIpVGc+75Chns1rajR0mEtRtJUQUbYz4tK4zb0E
 dZR1p+VF6+DYmSRhiqeaTi9uz9oE8kMa8o/EcbFIg/9BgEnUwJXU20bjnar30xQ7
 9OIn7r9hjHI=
 =XPuo
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event updates from Ingo Molnar:

 - Fix address filtering for Intel/PT,ARM/CoreSight

 - Enable Intel/PEBS format 5

 - Allow more fixed-function counters for x86

 - Intel/PT: Enable not recording Taken-Not-Taken packets

 - Add a few branch-types

* tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT
  perf: Add irq and exception return branch types
  perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses
  perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
  perf/x86/intel/pt: Add a capability and config bit for event tracing
  perf/x86/intel: Increase max number of the fixed counters
  KVM: x86: use the KVM side max supported fixed counter
  perf/x86/intel: Enable PEBS format 5
  perf/core: Allow kernel address filter when not filtering the kernel
  perf/x86/intel/pt: Fix address filter config for 32-bit kernel
  perf/core: Fix address filter parser for multiple filters
  x86: Share definition of __is_canonical_address()
  perf/x86/intel/pt: Relax address filter validation
2022-03-22 13:06:49 -07:00
Rafael J. Wysocki
31035f3e20 Merge branch 'thermal-hfi'
Merge Intel Hardware Feedback Interface (HFI) thermal driver for
5.18-rc1 and update the intel-speed-select utility to support that
driver.

* thermal-hfi:
  tools/power/x86/intel-speed-select: v1.12 release
  tools/power/x86/intel-speed-select: HFI support
  tools/power/x86/intel-speed-select: OOB daemon mode
  thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
  thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub
  thermal: intel: hfi: Notify user space for HFI events
  thermal: netlink: Add a new event to notify CPU capabilities change
  thermal: intel: hfi: Enable notification interrupt
  thermal: intel: hfi: Handle CPU hotplug events
  thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
  x86/cpu: Add definitions for the Intel Hardware Feedback Interface
  x86/Documentation: Describe the Intel Hardware Feedback Interface
2022-03-18 19:00:26 +01:00
Peter Zijlstra
991625f3dd x86/ibt: Add IBT feature, MSR and #CP handling
The bits required to make the hardware go.. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
2022-03-15 10:32:39 +01:00
Alexander Shishkin
161a9a3370 perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called TNT-Disable which is enabled config bit 55.

TNT-Disable disables Taken-Not-Taken packets to reduce the tracing
overhead, but with the result that exact control flow information is
lost.

Add a capability and config bit for TNT-Disable.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-3-adrian.hunter@intel.com
2022-02-15 17:47:11 +01:00
Alexander Shishkin
28c24ded64 perf/x86/intel/pt: Add a capability and config bit for event tracing
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which is enabled config bit 31.

Event Trace exposes details about asynchronous events such as interrupts
and VM-Entry/Exit.

Add a capability and config bit for Event Trace.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-2-adrian.hunter@intel.com
2022-02-15 17:47:11 +01:00
Maxim Levitsky
3915035282 KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.

Also add some values from the spec that were not defined before
and will be soon useful.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:50 -05:00
Ricardo Neri
7b8f40b3de x86/cpu: Add definitions for the Intel Hardware Feedback Interface
Add the CPUID feature bit and the model-specific registers needed to
identify and configure the Intel Hardware Feedback Interface.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Huang Rui
89aa94b4a2 x86/msr: Add AMD CPPC MSR definitions
AMD CPPC (Collaborative Processor Performance Control) function uses MSR
registers to manage the performance hints. So add the MSR register macro
here.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 18:51:17 +01:00
Chang S. Bae
dae1bd5838 x86/msr-index: Add MSRs for XFD
XFD introduces two MSRs:

    - IA32_XFD to enable/disable a feature controlled by XFD

    - IA32_XFD_ERR to expose to the #NM trap handler which feature
      was tried to be used for the first time.

Both use the same xstate-component bitmap format, used by XCR0.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com
2021-10-26 10:18:09 +02:00
Linus Torvalds
2594b713c1 - New AMD models support
- Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too
 
 - Use the special RAPL CPUID bit to detect the functionality on AMD and
   Hygon instead of doing family matching.
 
 - Add support for new Intel microcode deprecating TSX on some models and
 do not enable kernel workarounds for those CPUs when TSX transactions
 always abort, as a result of that microcode update.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDZhzEACgkQEsHwGGHe
 VUo5ow//eRwlb1OL/D3jzLT4nTYX8+XdufaJF1HBr1Cf3mdNkiEgyu2bvsXNTpN/
 ZP7CFCHibgYeHJ7qTTkhoK1DCe4YHjj450oCgg7pv40Mv9E29Rpszie8y8e/ngkc
 g9OiAeEd4A32v8bRMAOOX0UZN4afismXBW0k4iwOAguNFiZ/usrrVYTZpJe3wG65
 /YM9FdDZ+Mt7BavJdVyGh03PpzoSMrKyEQ673CHhERQyy5oEublrDSmtt5hQJv1W
 4tgNOWpw57Gi7Vs7UYd7VvBQKwQZKeQeHJWu1TXUB6pw0lKYvULH6m0dasvc6cGb
 WtCBvbQU9MRP0LvdvYOdgmSgn400z7mEwlUWmAFJLIUlDsuRpZmVQ4C1/OUnOSdx
 amb7I3bp1z6Rqjs9ADW5h87qDA+q5OmbIZeIDvuRypQOB3yEktAEdUvWb65b1Fgm
 9CpzebxyaOUM9YRxDzDd2joZYKnfI3stF6UCrVXaZwYei+Jmzn5gc8ZOoOX9g6gO
 eX/sLW2RWRx6XxilaWZijOHJTjokVUpEnD12aGtKO6ou5QbFTwldc2Metpua42cL
 5p8wRxEYeKT/EE/GKy/qIEp624QaInSEmfyq8RFKU4em7GSaSUmoQF5151LfnoRY
 ARHkEdz+T8s5RI5xSvUZLRMNYjig9tZas3blYfbJHnU7V2+bspQ=
 =wW+k
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - New AMD models support

 - Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too

 - Use the special RAPL CPUID bit to detect the functionality on AMD and
   Hygon instead of doing family matching.

 - Add support for new Intel microcode deprecating TSX on some models
   and do not enable kernel workarounds for those CPUs when TSX
   transactions always abort, as a result of that microcode update.

* tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsx: Clear CPUID bits when TSX always force aborts
  x86/events/intel: Do not deploy TSX force abort workaround when TSX is deprecated
  x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  perf/x86/rapl: Use CPUID bit on AMD and Hygon parts
  x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems
  x86/amd_nb: Add AMD family 19h model 50h PCI ids
  x86/cpu: Fix core name for Sapphire Rapids
2021-06-28 11:22:40 -07:00
Pawan Gupta
1348924ba8 x86/msr: Define new bits in TSX_FORCE_ABORT MSR
Intel client processors that support the IA32_TSX_FORCE_ABORT MSR
related to perf counter interaction [1] received a microcode update that
deprecates the Transactional Synchronization Extension (TSX) feature.
The bit FORCE_ABORT_RTM now defaults to 1, writes to this bit are
ignored. A new bit TSX_CPUID_CLEAR clears the TSX related CPUID bits.

The summary of changes to the IA32_TSX_FORCE_ABORT MSR are:

  Bit 0: FORCE_ABORT_RTM (legacy bit, new default=1) Status bit that
  indicates if RTM transactions are always aborted. This bit is
  essentially !SDV_ENABLE_RTM(Bit 2). Writes to this bit are ignored.

  Bit 1: TSX_CPUID_CLEAR (new bit, default=0) When set, CPUID.HLE = 0
  and CPUID.RTM = 0.

  Bit 2: SDV_ENABLE_RTM (new bit, default=0) When clear, XBEGIN will
  always abort with EAX code 0. When set, XBEGIN will not be forced to
  abort (but will always abort in SGX enclaves). This bit is intended to
  be used on developer systems. If this bit is set, transactional
  atomicity correctness is not certain. SDV = Software Development
  Vehicle (SDV), i.e. developer systems.

Performance monitoring counter 3 is usable in all cases, regardless of
the value of above bits.

Add support for a new CPUID bit - CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11])
 - to indicate the status of always abort behavior.

[1] [ bp: Look for document ID 604224, "Performance Monitoring Impact
      of Intel Transactional Synchronization Extension Memory". Since
      there's no way for us to have stable links to documents... ]

 [ bp: Massage and extend commit message. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/9add61915b4a4eedad74fbd869107863a28b428e.1623704845.git-series.pawan.kumar.gupta@linux.intel.com
2021-06-15 17:23:15 +02:00
Brijesh Singh
059e5c321a x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
2021-05-10 07:51:38 +02:00
Linus Torvalds
42dec9a936 Perf events changes in this cycle were:
- Improve Intel uncore PMU support:
 
      - Parse uncore 'discovery tables' - a new hardware capability enumeration method
        introduced on the latest Intel platforms. This table is in a well-defined PCI
        namespace location and is read via MMIO. It is organized in an rbtree.
 
        These uncore tables will allow the discovery of standard counter blocks, but
        fancier counters still need to be enumerated explicitly.
 
      - Add Alder Lake support
 
      - Improve IIO stacks to PMON mapping support on Skylake servers
 
  - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs
    and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived)
    cores.
 
    The CPU-side feature set is entirely symmetrical - but on the PMU side there's
    core type dependent PMU functionality.
 
  - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by
    fixing the AUX allocation watermark logic.
 
  - Improve ring buffer allocation on NUMA systems
 
  - Put 'struct perf_event' into their separate kmem_cache pool
 
  - Add support for synchronous signals for select perf events. The immediate motivation
    is to support low-overhead sampling-based race detection for user-space code. The
    feature consists of the following main changes:
 
     - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits
       inheritance of events to CLONE_THREAD.
 
     - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec.
 
     - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64
       ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is
       PERF_TYPE_BREAKPOINT.
 
    The siginfo support is adequate for breakpoints right now - but the new field can be used
    to introduce support for other types of metadata passed over siginfo as well.
 
  - Misc fixes, cleanups and smaller updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJGpERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j9zBAAuVbG2snV6SBSdXLhQcM66N3NckOXvSY5
 QjjhQcuwJQEK/NJB3266K5d8qSmdyRBsWf3GCsrmyBT67P1V28K44Pu7oCV0UDtf
 mpVRjEP0oR7hNsANSSgo8Fa4ZD7H5waX7dK7925Tvw8By3mMoZoddiD/84WJHhxO
 NDF+GRFaRj+/dpbhV8cdCoXTjYdkC36vYuZs3b9lu0tS9D/AJgsNy7TinLvO02Cs
 5peP+2y29dgvCXiGBiuJtEA6JyGnX3nUJCvfOZZ/DWDc3fdduARlRrc5Aiq4n/wY
 UdSkw1VTZBlZ1wMSdmHQVeC5RIH3uWUtRoNqy0Yc90lBm55AQ0EENwIfWDUDC5zy
 USdBqWTNWKMBxlEilUIyqKPQK8LW/31TRzqy8BWKPNcZt5yP5YS1SjAJRDDjSwL/
 I+OBw1vjLJamYh8oNiD5b+VLqNQba81jFASfv+HVWcULumnY6ImECCpkg289Fkpi
 BVR065boifJDlyENXFbvTxyMBXQsZfA+EhtxG7ju2Ni+TokBbogyCb3L2injPt9g
 7jjtTOqmfad4gX1WSc+215iYZMkgECcUd9E+BfOseEjBohqlo7yNKIfYnT8mE/Xq
 nb7eHjyvLiE8tRtZ+7SjsujOMHv9LhWFAbSaxU/kEVzpkp0zyd6mnnslDKaaHLhz
 goUMOL/D0lg=
 =NhQ7
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - Improve Intel uncore PMU support:

     - Parse uncore 'discovery tables' - a new hardware capability
       enumeration method introduced on the latest Intel platforms. This
       table is in a well-defined PCI namespace location and is read via
       MMIO. It is organized in an rbtree.

       These uncore tables will allow the discovery of standard counter
       blocks, but fancier counters still need to be enumerated
       explicitly.

     - Add Alder Lake support

     - Improve IIO stacks to PMON mapping support on Skylake servers

 - Add Intel Alder Lake PMU support - which requires the introduction of
   'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big')
   and Gracemont ('small' - Atom derived) cores.

   The CPU-side feature set is entirely symmetrical - but on the PMU
   side there's core type dependent PMU functionality.

 - Reduce data loss with CPU level hardware tracing on Intel PT / AUX
   profiling, by fixing the AUX allocation watermark logic.

 - Improve ring buffer allocation on NUMA systems

 - Put 'struct perf_event' into their separate kmem_cache pool

 - Add support for synchronous signals for select perf events. The
   immediate motivation is to support low-overhead sampling-based race
   detection for user-space code. The feature consists of the following
   main changes:

     - Add thread-only event inheritance via
       perf_event_attr::inherit_thread, which limits inheritance of
       events to CLONE_THREAD.

     - Add the ability for events to not leak through exec(), via
       perf_event_attr::remove_on_exec.

     - Allow the generation of SIGTRAP via perf_event_attr::sigtrap,
       extend siginfo with an u64 ::si_perf, and add the breakpoint
       information to ::si_addr and ::si_perf if the event is
       PERF_TYPE_BREAKPOINT.

   The siginfo support is adequate for breakpoints right now - but the
   new field can be used to introduce support for other types of
   metadata passed over siginfo as well.

 - Misc fixes, cleanups and smaller updates.

* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  signal, perf: Add missing TRAP_PERF case in siginfo_layout()
  signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
  perf/x86: Allow for 8<num_fixed_counters<16
  perf/x86/rapl: Add support for Intel Alder Lake
  perf/x86/cstate: Add Alder Lake CPU support
  perf/x86/msr: Add Alder Lake CPU support
  perf/x86/intel/uncore: Add Alder Lake support
  perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
  perf/x86/intel: Add Alder Lake Hybrid support
  perf/x86: Support filter_match callback
  perf/x86/intel: Add attr_update for Hybrid PMUs
  perf/x86: Add structures for the attributes of Hybrid PMUs
  perf/x86: Register hybrid PMUs
  perf/x86: Factor out x86_pmu_show_pmu_cap
  perf/x86: Remove temporary pmu assignment in event_init
  perf/x86/intel: Factor out intel_pmu_check_extra_regs
  perf/x86/intel: Factor out intel_pmu_check_event_constraints
  perf/x86/intel: Factor out intel_pmu_check_num_counters
  perf/x86: Hybrid PMU support for extra_regs
  perf/x86: Hybrid PMU support for event constraints
  ...
2021-04-28 13:03:44 -07:00
Linus Torvalds
64f8e73de0 Support for enhanced split lock detection:
Newer CPUs provide a second mechanism to detect operations with lock
   prefix which go accross a cache line boundary. Such operations have to
   take bus lock which causes a system wide performance degradation when
   these operations happen frequently.
 
   The new mechanism is not using the #AC exception. It triggers #DB and is
   restricted to operations in user space. Kernel side split lock access can
   only be detected by the #AC based variant. Contrary to the #AC based
   mechanism the #DB based variant triggers _after_ the instruction was
   executed. The mechanism is CPUID enumerated and contrary to the #AC
   version which is based on the magic TEST_CTRL_MSR and model/family based
   enumeration on the way to become architectural.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGkr8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodUKD/9tUXhInR7+1ykEHpMvdmSp48vqY3nc
 sKmT22pPl+OchnJ62mw3T8gKpBYVleJmcCaY2qVx7hfaVcWApLGJvX4tmfXmv422
 XDSJ6b8Os6wfgx5FR//I17z8ZtXnnuKkPrTMoRsQUw2qLq31y6fdQv+GW/cc1Kpw
 mengjmPE+HnpaKbtuQfPdc4a+UvLjvzBMAlDZPTBPKYrP4FFqYVnUVwyTg5aLVDY
 gHz4V8+b502RS/zPfTAtE3J848od+NmcUPdFlcG9DVA+hR0Rl0thvruCTFiD2vVh
 i9DJ7INof5FoJDEzh0dGsD7x+MB6OY8GZyHdUMeGgIRPtWkqrG52feQQIn2YYlaL
 fB3DlpNv7NIJ/0JMlALvh8S0tEoOcYdHqH+M/3K/zbzecg/FAo+lVo8WciGLPqWs
 ykUG5/f/OnlTvgB8po1ebJu0h0jHnoK9heWWXk9zWIRVDPXHFOWKW3kSbTTb3icR
 9hfjP/SNejpmt9Ju1OTwsgnV7NALIdVX+G5jyIEsjFl31Co1RZNYhHLFvi11FWlQ
 /ssvFK9O5ZkliocGCAN9+yuOnM26VqWSCE4fis6/2aSgD2Y4Gpvb//cP96SrcNAH
 u8eXNvGLlniJP3F3JImWIfIPQTrpvQhcU4eZ6NtviXqj/utQXX6c9PZ1PLYpcvUh
 9AWF8rwhT8X4oA==
 =lmi8
 -----END PGP SIGNATURE-----

Merge tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 bus lock detection updates from Thomas Gleixner:
 "Support for enhanced split lock detection:

  Newer CPUs provide a second mechanism to detect operations with lock
  prefix which go accross a cache line boundary. Such operations have to
  take bus lock which causes a system wide performance degradation when
  these operations happen frequently.

  The new mechanism is not using the #AC exception. It triggers #DB and
  is restricted to operations in user space. Kernel side split lock
  access can only be detected by the #AC based variant.

  Contrary to the #AC based mechanism the #DB based variant triggers
  _after_ the instruction was executed. The mechanism is CPUID
  enumerated and contrary to the #AC version which is based on the magic
  TEST_CTRL_MSR and model/family based enumeration on the way to become
  architectural"

* tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/admin-guide: Change doc for split_lock_detect parameter
  x86/traps: Handle #DB for bus lock
  x86/cpufeatures: Enumerate #DB for bus lock detection
2021-04-26 10:09:38 -07:00
Kan Liang
d0946a882e perf/x86/intel: Hybrid PMU support for perf capabilities
Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.

Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.

The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.

For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for thse two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.

Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system.
It will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com
2021-04-19 20:03:24 +02:00
Fenghua Yu
ebb1064e7c x86/traps: Handle #DB for bus lock
Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.

#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.

split_lock_detect=
		#AC for split lock		#DB for bus lock

off		Do nothing			Do nothing

warn		Kernel OOPs			Warn once per task and
		Warn once per task and		and continues to run.
		disable future checking
	 	When both features are
		supported, warn in #AC

fatal		Kernel OOPs			Send SIGBUS to user.
		Send SIGBUS to user
		When both features are
		supported, fatal in #AC

ratelimit:N	Do nothing			Limit bus lock rate to
						N per second in the
						current non-root user.

Default option is "warn".

Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So no need to handle #DB for bus lock detected in the kernel.

#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.

Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.

Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
the #DB handler right after reading DR6.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
2021-03-28 22:52:15 +02:00
Dave Hansen
09141ec0e4 x86: Remove duplicate TSC DEADLINE MSR definitions
There are two definitions for the TSC deadline MSR in msr-index.h,
one with an underscore and one without.  Axe one of them and move
all the references over to the other one.

 [ bp: Fixup the MSR define in handle_fastpath_set_msr_irqoff() too. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200305174706.0D6B8EE4@viggo.jf.intel.com
2021-03-08 11:05:20 +01:00
Linus Torvalds
6a447b0e31 ARM:
* PSCI relay at EL2 when "protected KVM" is enabled
 * New exception injection code
 * Simplification of AArch32 system register handling
 * Fix PMU accesses when no PMU is enabled
 * Expose CSV3 on non-Meltdown hosts
 * Cache hierarchy discovery fixes
 * PV steal-time cleanups
 * Allow function pointers at EL2
 * Various host EL2 entry cleanups
 * Simplification of the EL2 vector allocation
 
 s390:
 * memcg accouting for s390 specific parts of kvm and gmap
 * selftest for diag318
 * new kvm_stat for when async_pf falls back to sync
 
 x86:
 * Tracepoints for the new pagetable code from 5.10
 * Catch VFIO and KVM irqfd events before userspace
 * Reporting dirty pages to userspace with a ring buffer
 * SEV-ES host support
 * Nested VMX support for wait-for-SIPI activity state
 * New feature flag (AVX512 FP16)
 * New system ioctl to report Hyper-V-compatible paravirtualization features
 
 Generic:
 * Selftest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
 lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
 BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
 XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
 bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
 drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
 =ISud
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Much x86 work was pushed out to 5.12, but ARM more than made up for it.

  ARM:
   - PSCI relay at EL2 when "protected KVM" is enabled
   - New exception injection code
   - Simplification of AArch32 system register handling
   - Fix PMU accesses when no PMU is enabled
   - Expose CSV3 on non-Meltdown hosts
   - Cache hierarchy discovery fixes
   - PV steal-time cleanups
   - Allow function pointers at EL2
   - Various host EL2 entry cleanups
   - Simplification of the EL2 vector allocation

  s390:
   - memcg accouting for s390 specific parts of kvm and gmap
   - selftest for diag318
   - new kvm_stat for when async_pf falls back to sync

  x86:
   - Tracepoints for the new pagetable code from 5.10
   - Catch VFIO and KVM irqfd events before userspace
   - Reporting dirty pages to userspace with a ring buffer
   - SEV-ES host support
   - Nested VMX support for wait-for-SIPI activity state
   - New feature flag (AVX512 FP16)
   - New system ioctl to report Hyper-V-compatible paravirtualization features

  Generic:
   - Selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: SVM: fix 32-bit compilation
  KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
  KVM: SVM: Provide support to launch and run an SEV-ES guest
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Do not report support for SMM for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Support string IO operations for an SEV-ES guest
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Create trace events for VMGEXIT processing
  ...
2020-12-20 10:44:05 -08:00
Linus Torvalds
b4ec805464 Power management updates for 5.11-rc1
- Use local_clock() instead of jiffies in the cpufreq statistics to
    improve accuracy (Viresh Kumar).
 
  - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
    drivers (Viresh Kumar).
 
  - Clean up the cpufreq core, the intel_pstate driver and the
    schedutil cpufreq governor (Rafael Wysocki).
 
  - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
    drivers (Yangtao Li, Qinglang Miao).
 
  - Fix cpufreq_online() to return error codes instead of success (0)
    in all cases when it fails (Wang ShaoBo).
 
  - Add mt8167 support to the mediatek cpufreq driver and blacklist
    mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
 
  - Modify the tegra194 cpufreq driver to always return values from
    the frequency table as the current frequency and clean up that
    driver (Sumit Gupta, Jon Hunter).
 
  - Modify the arm_scmi cpufreq driver to allow it to discover the
    power scale present in the performance protocol and provide this
    information to the Energy Model (Lukasz Luba).
 
  - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
    Rohár).
 
  - Clean up the CPPC cpufreq driver (Ionela Voinescu).
 
  - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
    Bergmann).
 
  - Rework the poling interval selection for the polling state in
    cpuidle (Mel Gorman).
 
  - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle
    driver (Ulf Hansson).
 
  - Modify the OPP framework to support empty (node-less) OPP tables
    in DT for passing dependency information (Nicola Mazzucato).
 
  - Fix potential lockdep issue in the OPP core and clean up the OPP
    core (Viresh Kumar).
 
  - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
    update its users accordingly (Viresh Kumar).
 
  - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
 
  - Add support for governor feature flags to devfreq, make devfreq
    sysfs file permissions depend on the governor and clean up the
    devfreq core (Chanwoo Choi).
 
  - Clean up the tegra20 devfreq driver and deprecate it to allow
    another driver based on EMC_STAT to be used instead of it (Dmitry
    Osipenko).
 
  - Add interconnect support to the tegra30 devfreq driver, allow it
    to take the interconnect and OPP information from DT and clean it
    up ((Dmitry Osipenko).
 
  - Add interconnect support to the exynos-bus devfreq driver along
    with interconnect properties documentation (Sylwester Nawrocki).
 
  - Add suport for AMD Fam17h and Fam19h processors to the RAPL power
    capping driver (Victor Ding, Kim Phillips).
 
  - Fix handling of overly long constraint names in the powercap
    framework (Lukasz Luba).
 
  - Fix the wakeup configuration handling for bridges in the ACPI
    device power management core (Rafael Wysocki).
 
  - Add support for using an abstract scale for power units in the
    Energy Model (EM) and document it (Lukasz Luba).
 
  - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
    Kondeti).
 
  - Modify the generic power domains (genpd) framwework to support
    suspend-to-idle (Ulf Hansson).
 
  - Fix creation of debugfs nodes in genpd (Thierry Strudel).
 
  - Clean up genpd (Lina Iyer).
 
  - Clean up the core system-wide suspend code and make it print
    driver flags for devices with debug enabled (Alex Shi, Patrice
    Chotard, Chen Yu).
 
  - Modify the ACPI system reboot code to make it prepare for system
    power off to avoid confusing the platform firmware (Kai-Heng Feng).
 
  - Update the pm-graph (multiple changes, mostly usability-related)
    and cpupower (online and offline CPU information support) PM
    utilities (Todd Brandt, Brahadambal Srinivasan).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl/Y8mcSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxjY4QAKsNFJeEtjGCxq7MxQIML3QLAsdJM9of
 9kkY9skMEw4v1TRmyy7sW9jZW2pLSRcLJwWRKWu4143qUS3YUp2DQ0lqX4WyXoWu
 BhnkhkMUl6iCeBO8CWnt8zsTuqSa20A13sL9LyqN1+7OZKHD8StbT4hKjBncdNNN
 4aDj+1uAPyOgj2iCUZuHQ8DtpBvOLjgTh367vbhbufjeJ//8/9+R7s4Xzrj7wtmv
 JlE0LDgvge9QeGTpjhxQJzn0q2/H5fg9jbmjPXUfbHJNuyKhrqnmjGyrN5m256JI
 8DqGqQtJpmFp7Ihrur3uKTk3gWO05YwJ1FdeEooAKEjEMObm5xuYhKVRoDhmlJAu
 G6ui+OAUvNR0FffJtbzvWe/pLovLGOEOHdvTrZxUF8Abo6br3untTm8rKTi1fhaF
 wWndSMw0apGsPzCx5T+bE7AbJz2QHFpLhaVAutenuCzNI8xoMlxNKEzsaVz/+FqL
 Pq/PdFaM4vNlMbv7hkb/fujkCs/v3EcX2ihzvt7I2o8dBS0D1X8A4mnuWJmiGslw
 1ftbJ6M9XacwkPBTHPgeXxJh2C1yxxe5VQ9Z5fWWi7sPOUeJnUwxKaluv+coFndQ
 sO6JxsPQ4hQihg8yOxLEkL6Wn68sZlmp+u2Oj+TPFAsAGANIA8rJlBPo1ppJWvdQ
 j1OCIc/qzwpH
 =BVdX
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These update cpufreq (core and drivers), cpuidle (polling state
  implementation and the PSCI driver), the OPP (operating performance
  points) framework, devfreq (core and drivers), the power capping RAPL
  (Running Average Power Limit) driver, the Energy Model support, the
  generic power domains (genpd) framework, the ACPI device power
  management, the core system-wide suspend code and power management
  utilities.

  Specifics:

   - Use local_clock() instead of jiffies in the cpufreq statistics to
     improve accuracy (Viresh Kumar).

   - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
     drivers (Viresh Kumar).

   - Clean up the cpufreq core, the intel_pstate driver and the
     schedutil cpufreq governor (Rafael Wysocki).

   - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
     drivers (Yangtao Li, Qinglang Miao).

   - Fix cpufreq_online() to return error codes instead of success (0)
     in all cases when it fails (Wang ShaoBo).

   - Add mt8167 support to the mediatek cpufreq driver and blacklist
     mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).

   - Modify the tegra194 cpufreq driver to always return values from the
     frequency table as the current frequency and clean up that driver
     (Sumit Gupta, Jon Hunter).

   - Modify the arm_scmi cpufreq driver to allow it to discover the
     power scale present in the performance protocol and provide this
     information to the Energy Model (Lukasz Luba).

   - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
     Rohár).

   - Clean up the CPPC cpufreq driver (Ionela Voinescu).

   - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
     Bergmann).

   - Rework the poling interval selection for the polling state in
     cpuidle (Mel Gorman).

   - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
     (Ulf Hansson).

   - Modify the OPP framework to support empty (node-less) OPP tables in
     DT for passing dependency information (Nicola Mazzucato).

   - Fix potential lockdep issue in the OPP core and clean up the OPP
     core (Viresh Kumar).

   - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
     update its users accordingly (Viresh Kumar).

   - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).

   - Add support for governor feature flags to devfreq, make devfreq
     sysfs file permissions depend on the governor and clean up the
     devfreq core (Chanwoo Choi).

   - Clean up the tegra20 devfreq driver and deprecate it to allow
     another driver based on EMC_STAT to be used instead of it (Dmitry
     Osipenko).

   - Add interconnect support to the tegra30 devfreq driver, allow it to
     take the interconnect and OPP information from DT and clean it up
     (Dmitry Osipenko).

   - Add interconnect support to the exynos-bus devfreq driver along
     with interconnect properties documentation (Sylwester Nawrocki).

   - Add suport for AMD Fam17h and Fam19h processors to the RAPL power
     capping driver (Victor Ding, Kim Phillips).

   - Fix handling of overly long constraint names in the powercap
     framework (Lukasz Luba).

   - Fix the wakeup configuration handling for bridges in the ACPI
     device power management core (Rafael Wysocki).

   - Add support for using an abstract scale for power units in the
     Energy Model (EM) and document it (Lukasz Luba).

   - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
     Kondeti).

   - Modify the generic power domains (genpd) framwework to support
     suspend-to-idle (Ulf Hansson).

   - Fix creation of debugfs nodes in genpd (Thierry Strudel).

   - Clean up genpd (Lina Iyer).

   - Clean up the core system-wide suspend code and make it print driver
     flags for devices with debug enabled (Alex Shi, Patrice Chotard,
     Chen Yu).

   - Modify the ACPI system reboot code to make it prepare for system
     power off to avoid confusing the platform firmware (Kai-Heng Feng).

   - Update the pm-graph (multiple changes, mostly usability-related)
     and cpupower (online and offline CPU information support) PM
     utilities (Todd Brandt, Brahadambal Srinivasan)"

* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
  cpufreq: Fix cpufreq_online() return value on errors
  cpufreq: Fix up several kerneldoc comments
  cpufreq: stats: Use local_clock() instead of jiffies
  cpufreq: schedutil: Simplify sugov_update_next_freq()
  cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
  PM: domains: create debugfs nodes when adding power domains
  opp: of: Allow empty opp-table with opp-shared
  dt-bindings: opp: Allow empty OPP tables
  media: venus: dev_pm_opp_put_*() accepts NULL argument
  drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
  drm/lima: dev_pm_opp_put_*() accepts NULL argument
  PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
  opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
  opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
  cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
  opp: Reduce the size of critical section in _opp_kref_release()
  PM / EM: Micro optimization in em_cpu_energy
  cpufreq: arm_scmi: Discover the power scale in performance protocol
  ...
2020-12-15 16:30:31 -08:00
Linus Torvalds
5583ff677b "Intel SGX is new hardware functionality that can be used by
applications to populate protected regions of user code and data called
 enclaves. Once activated, the new hardware protects enclave code and
 data from outside access and modification.
 
 Enclaves provide a place to store secrets and process data with those
 secrets. SGX has been used, for example, to decrypt video without
 exposing the decryption keys to nosy debuggers that might be used to
 subvert DRM. Software has generally been rewritten specifically to
 run in enclaves, but there are also projects that try to run limited
 unmodified software in enclaves."
 
 Most of the functionality is concentrated into arch/x86/kernel/cpu/sgx/
 except the addition of a new mprotect() hook to control enclave page
 permissions and support for vDSO exceptions fixup which will is used by
 SGX enclaves.
 
 All this work by Sean Christopherson, Jarkko Sakkinen and many others.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XTtMACgkQEsHwGGHe
 VUqxFw/+NZGf2b3CWPcrvwXCpkvSpIrqh1jQwyvkZyJ1gen7Vy8dkvf99h8+zQPI
 4wSArEyjhYJKAAmBNefLKi/Cs/bdkGzLlZyDGqtM641XRjf0xXIpQkOBb6UBa+Pv
 to8veQmVH2bBTM49qnd+H1wM6FzYvhTYCD8xr4HlLXtIfpP2CK2GvCb8s/4LifgD
 fTucZX9TFwLgVkWOHWHN0n8XMR2Fjb2YCrwjFMKyr/M2W+pPoOCTIt4PWDuXiOeG
 rFP7R4DT9jDg8ht5j2dHQT/Bo8TvTCB4Oj98MrX1TTgkSjLJySSMfyQg5EwNfSIa
 HC0lg/6qwAxnhWX7cCCBETNZ4aYDmz/dxcCSsLbomGP9nMaUgUy7qn5nNuNbJilb
 oCBsr8LDMzu1LJzmkduM8Uw6OINh+J8ICoVXaR5pS7gSZz/+vqIP/rK691AiqhJL
 QeMkI9gQ83jEXpr/AV7ABCjGCAeqELOkgravUyTDev24eEc0LyU0qENpgxqWSTca
 OvwSWSwNuhCKd2IyKZBnOmjXGwvncwX0gp1KxL9WuLkR6O8XldLAYmVCwVAOrIh7
 snRot8+3qNjELa65Nh5DapwLJrU24TRoKLHLgfWK8dlqrMejNtXKucQ574Np0feR
 p2hrNisOrtCwxAt7OAgWygw8agN6cJiY18onIsr4wSBm5H7Syb0=
 =k7tj
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGC support from Borislav Petkov:
 "Intel Software Guard eXtensions enablement. This has been long in the
  making, we were one revision number short of 42. :)

  Intel SGX is new hardware functionality that can be used by
  applications to populate protected regions of user code and data
  called enclaves. Once activated, the new hardware protects enclave
  code and data from outside access and modification.

  Enclaves provide a place to store secrets and process data with those
  secrets. SGX has been used, for example, to decrypt video without
  exposing the decryption keys to nosy debuggers that might be used to
  subvert DRM. Software has generally been rewritten specifically to run
  in enclaves, but there are also projects that try to run limited
  unmodified software in enclaves.

  Most of the functionality is concentrated into arch/x86/kernel/cpu/sgx/
  except the addition of a new mprotect() hook to control enclave page
  permissions and support for vDSO exceptions fixup which will is used
  by SGX enclaves.

  All this work by Sean Christopherson, Jarkko Sakkinen and many others"

* tag 'x86_sgx_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  x86/sgx: Return -EINVAL on a zero length buffer in sgx_ioc_enclave_add_pages()
  x86/sgx: Fix a typo in kernel-doc markup
  x86/sgx: Fix sgx_ioc_enclave_provision() kernel-doc comment
  x86/sgx: Return -ERESTARTSYS in sgx_ioc_enclave_add_pages()
  selftests/sgx: Use a statically generated 3072-bit RSA key
  x86/sgx: Clarify 'laundry_list' locking
  x86/sgx: Update MAINTAINERS
  Documentation/x86: Document SGX kernel architecture
  x86/sgx: Add ptrace() support for the SGX driver
  x86/sgx: Add a page reclaimer
  selftests/x86: Add a selftest for SGX
  x86/vdso: Implement a vDSO for Intel SGX enclave call
  x86/traps: Attempt to fixup exceptions in vDSO before signaling
  x86/fault: Add a helper function to sanitize error code
  x86/vdso: Add support for exception fixup in vDSO functions
  x86/sgx: Add SGX_IOC_ENCLAVE_PROVISION
  x86/sgx: Add SGX_IOC_ENCLAVE_INIT
  x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES
  x86/sgx: Add SGX_IOC_ENCLAVE_CREATE
  x86/sgx: Add an SGX misc driver interface
  ...
2020-12-14 13:14:57 -08:00
Tom Lendacky
69372cf012 x86/cpu: Add VM page flush MSR availablility as a CPUID feature
On systems that do not have hardware enforced cache coherency between
encrypted and unencrypted mappings of the same physical page, the
hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
contents of an SEV guest page. When a small number of pages are being
flushed, this can be used in place of issuing a WBINVD across all CPUs.

CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is
available. Add a CPUID feature to indicate it is supported and define the
MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14 11:09:30 -05:00
Sean Christopherson
d205e0f142 x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits
The SGX Launch Control hardware helps restrict which enclaves the
hardware will run.  Launch control is intended to restrict what software
can run with enclave protections, which helps protect the overall system
from bad enclaves.

For the kernel's purposes, there are effectively two modes in which the
launch control hardware can operate: rigid and flexible. In its rigid
mode, an entity other than the kernel has ultimate authority over which
enclaves can be run (firmware, Intel, etc...). In its flexible mode, the
kernel has ultimate authority over which enclaves can run.

Enable X86_FEATURE_SGX_LC to enumerate when the CPU supports SGX Launch
Control in general.

Add MSR_IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}, which when combined contain a
SHA256 hash of a 3072-bit RSA public key. The hardware allows SGX enclaves
signed with this public key to initialize and run [*]. Enclaves not signed
with this key can not initialize and run.

Add FEAT_CTL_SGX_LC_ENABLED, which informs whether the SGXLEPUBKEYHASH MSRs
can be written by the kernel.

If the MSRs do not exist or are read-only, the launch control hardware is
operating in rigid mode. Linux does not and will not support creating
enclaves when hardware is configured in rigid mode because it takes away
the authority for launch decisions from the kernel. Note, this does not
preclude KVM from virtualizing/exposing SGX to a KVM guest when launch
control hardware is operating in rigid mode.

[*] Intel SDM: 38.1.4 Intel SGX Launch Control Configuration

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-5-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Sean Christopherson
e7b6385b01 x86/cpufeatures: Add Intel SGX hardware bits
Populate X86_FEATURE_SGX feature from CPUID and tie it to the Kconfig
option with disabled-features.h.

IA32_FEATURE_CONTROL.SGX_ENABLE must be examined in addition to the CPUID
bits to enable full SGX support.  The BIOS must both set this bit and lock
IA32_FEATURE_CONTROL for SGX to be supported (Intel SDM section 36.7.1).
The setting or clearing of this bit has no impact on the CPUID bits above,
which is why it needs to be detected separately.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-4-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Victor Ding
43756a2989 powercap: Add AMD Fam17h RAPL support
Enable AMD Fam17h RAPL support for the power capping framework.

The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh
(Zen1) PPR.

Tested by comparing the results of following two sysfs entries and the
values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
  /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
  /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj

Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 19:59:07 +01:00
Victor Ding
298ed2b31f x86/msr-index: sort AMD RAPL MSRs by address
MSRs in the rest of this file are sorted by their addresses; fixing the
two outliers.

No functional changes.

Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 19:59:06 +01:00
Tony Luck
68299a42f8 x86/mce: Enable additional error logging on certain Intel CPUs
The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
optional additional error logging mode which is enabled by an MSR.

Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu,
but userspace should not be poking at MSRs. So move the enabling into
the kernel.

 [ bp: Correct the explanation why this is done. ]

Suggested-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201030190807.GA13884@agluck-desk2.amr.corp.intel.com
2020-11-02 11:15:59 +01:00
Linus Torvalds
da9803dfd3 This feature enhances the current guest memory encryption support
called SEV by also encrypting the guest register state, making the
 registers inaccessible to the hypervisor by en-/decrypting them on world
 switches. Thus, it adds additional protection to Linux guests against
 exfiltration, control flow and rollback attacks.
 
 With SEV-ES, the guest is in full control of what registers the
 hypervisor can access. This is provided by a guest-host exchange
 mechanism based on a new exception vector called VMM Communication
 Exception (#VC), a new instruction called VMGEXIT and a shared
 Guest-Host Communication Block which is a decrypted page shared between
 the guest and the hypervisor.
 
 Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
 in order for that exception mechanism to work, the early x86 init code
 needed to be made able to handle exceptions, which, in itself, brings
 a bunch of very nice cleanups and improvements to the early boot code
 like an early page fault handler, allowing for on-demand building of the
 identity mapping. With that, !KASLR configurations do not use the EFI
 page table anymore but switch to a kernel-controlled one.
 
 The main part of this series adds the support for that new exchange
 mechanism. The goal has been to keep this as much as possibly
 separate from the core x86 code by concentrating the machinery in two
 SEV-ES-specific files:
 
  arch/x86/kernel/sev-es-shared.c
  arch/x86/kernel/sev-es.c
 
 Other interaction with core x86 code has been kept at minimum and behind
 static keys to minimize the performance impact on !SEV-ES setups.
 
 Work by Joerg Roedel and Thomas Lendacky and others.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
 VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
 PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
 gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
 Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
 rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
 LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
 cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
 nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
 XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
 hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
 eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
 =toqi
 -----END PGP SIGNATURE-----

Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV-ES support from Borislav Petkov:
 "SEV-ES enhances the current guest memory encryption support called SEV
  by also encrypting the guest register state, making the registers
  inaccessible to the hypervisor by en-/decrypting them on world
  switches. Thus, it adds additional protection to Linux guests against
  exfiltration, control flow and rollback attacks.

  With SEV-ES, the guest is in full control of what registers the
  hypervisor can access. This is provided by a guest-host exchange
  mechanism based on a new exception vector called VMM Communication
  Exception (#VC), a new instruction called VMGEXIT and a shared
  Guest-Host Communication Block which is a decrypted page shared
  between the guest and the hypervisor.

  Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
  so in order for that exception mechanism to work, the early x86 init
  code needed to be made able to handle exceptions, which, in itself,
  brings a bunch of very nice cleanups and improvements to the early
  boot code like an early page fault handler, allowing for on-demand
  building of the identity mapping. With that, !KASLR configurations do
  not use the EFI page table anymore but switch to a kernel-controlled
  one.

  The main part of this series adds the support for that new exchange
  mechanism. The goal has been to keep this as much as possibly separate
  from the core x86 code by concentrating the machinery in two
  SEV-ES-specific files:

    arch/x86/kernel/sev-es-shared.c
    arch/x86/kernel/sev-es.c

  Other interaction with core x86 code has been kept at minimum and
  behind static keys to minimize the performance impact on !SEV-ES
  setups.

  Work by Joerg Roedel and Thomas Lendacky and others"

* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
  x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
  x86/sev-es: Check required CPU features for SEV-ES
  x86/efi: Add GHCB mappings when SEV-ES is active
  x86/sev-es: Handle NMI State
  x86/sev-es: Support CPU offline/online
  x86/head/64: Don't call verify_cpu() on starting APs
  x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
  x86/realmode: Setup AP jump table
  x86/realmode: Add SEV-ES specific trampoline entry point
  x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
  x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
  x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
  x86/sev-es: Handle #DB Events
  x86/sev-es: Handle #AC Events
  x86/sev-es: Handle VMMCALL Events
  x86/sev-es: Handle MWAIT/MWAITX Events
  x86/sev-es: Handle MONITOR/MONITORX Events
  x86/sev-es: Handle INVD Events
  x86/sev-es: Handle RDPMC Events
  x86/sev-es: Handle RDTSC(P) Events
  ...
2020-10-14 10:21:34 -07:00
Linus Torvalds
3bff6112c8 These are the performance events changes for v5.10:
x86 Intel updates:
 
  - Add Jasper Lake support
 
  - Add support for TopDown metrics on Ice Lake
 
  - Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support
 
  - Add a PCI sub driver to support uncore PMUs where the PCI resources
    have been claimed already - extending the range of supported systems.
 
 x86 AMD updates:
 
  - Restore 'perf stat -a' behaviour to program the uncore PMU
    to count all CPU threads.
 
  - Fix setting the proper count when sampling Large Increment
    per Cycle events / 'paired' events.
 
  - Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
    greatly reducing the number of interrupts when large sample
    periods are specified.
 
  - Extends Family 17h RAPL support to also work on compatible
    F19h machines.
 
 Core code updates:
 
  - Fix race in perf_mmap_close()
 
  - Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
    closed if the leader is removed.
 
  - Smaller fixes and updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Ef40RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h7NQ//ZdQ26Yg79ZaxBX1QSINJ9AgXDi6rXs75
 qU9qNwr/6EF+633RZoPQGAE0Iy5v6h7iLFokcJzM9+kK/rE3ax44tSnPlcMa0+6N
 SHXKCa5iL+hH7o2Spo2MZwCYseH79rloX3TSH7ajnN3X8PvwgWshF0lUE3WEWtCs
 eHSojdCk43IuL9TpusuNOBM2FvgnheFYWiMbFHd0MTBUMxul30sLVCG8IIWCPA+q
 TwG4RJS3X42VbL3SuAGFmOv4OmqNsfkvHvjpDs4NF07tRB9zjXzGrxmGhgSw0NAN
 2KK25qbmrpKATIb4Eqsgk/yikX/SCrDEXrjhg3r8FnyPvRfctq1crZjjf672PI2E
 bDda76dH6Lq9jv5fsyJjas5OsYdMKBCnA+tGQxXPGbmTXeEcYMRbDnwhYnevI/Q/
 8pP+xstF0pmBA3tvpDPrQnYH72Qt7CLJSdcTB15NqZftU2tJxaAyJGx4gJy33jxQ
 wu6BIEGHQ7onQYiIyTwsBHyz6xNsF/CRHwAPcGdYrRRbXB5K5nxHiXNb4awciTMx
 2HF31/S4OqURNpfcpxOQo+1fb/cLqj3loGqE4jCTwkbS3lrHcAcfxyv9QNn77l1f
 hdQ0jworbUNVLUYEUQz1bkZ06GD3LSSas2ZlY1NNdHo62mjyXMQmgirNcZmrFgWl
 tl2gNFAU9x4=
 =2fuY
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "x86 Intel updates:

   - Add Jasper Lake support

   - Add support for TopDown metrics on Ice Lake

   - Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support

   - Add a PCI sub driver to support uncore PMUs where the PCI resources
     have been claimed already - extending the range of supported
     systems.

  x86 AMD updates:

   - Restore 'perf stat -a' behaviour to program the uncore PMU to count
     all CPU threads.

   - Fix setting the proper count when sampling Large Increment per
     Cycle events / 'paired' events.

   - Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
     greatly reducing the number of interrupts when large sample periods
     are specified.

   - Extends Family 17h RAPL support to also work on compatible F19h
     machines.

  Core code updates:

   - Fix race in perf_mmap_close()

   - Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
     closed if the leader is removed.

   - Smaller fixes and updates"

* tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  perf/core: Fix race in the perf_mmap_close() function
  perf/x86: Fix n_metric for cancelled txn
  perf/x86: Fix n_pair for cancelled txn
  x86/events/amd/iommu: Fix sizeof mismatch
  perf/x86/intel: Check perf metrics feature for each CPU
  perf/x86/intel: Fix Ice Lake event constraint table
  perf/x86/intel/uncore: Fix the scale of the IMC free-running events
  perf/x86/intel/uncore: Fix for iio mapping on Skylake Server
  perf/x86/msr: Add Jasper Lake support
  perf/x86/intel: Add Jasper Lake support
  perf/x86/intel/uncore: Reduce the number of CBOX counters
  perf/x86/intel/uncore: Update Ice Lake uncore units
  perf/x86/intel/uncore: Split the Ice Lake and Tiger Lake MSR uncore support
  perf/x86/intel/uncore: Support PCIe3 unit on Snow Ridge
  perf/x86/intel/uncore: Generic support for the PCI sub driver
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()
  perf/x86/intel/uncore: Factor out uncore_pci_find_dev_pmu()
  perf/x86/intel/uncore: Factor out uncore_pci_get_dev_die_info()
  perf/amd/uncore: Inform the user how many counters each uncore PMU has
  ...
2020-10-12 14:14:35 -07:00
Fenghua Yu
f0f2f9feb4 x86/msr-index: Define an IA32_PASID MSR
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses the PASID to identify a
process address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-7-git-send-email-fenghua.yu@intel.com
2020-09-17 20:22:15 +02:00
Kim Phillips
36e1be8ada perf/x86/amd/ibs: Fix raw sample data accumulation
Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode.
Don't accumulate them into raw sample user data in that case.

Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR.

Technically, there is an ABI change here with respect to the IBS raw
sample data format, but I don't see any perf driver version information
being included in perf.data file headers, but, existing users can detect
whether the size of the sample record has reduced by 8 bytes to
determine whether the IBS driver has this fix.

Fixes: 904cb3677f ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions")
Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com
2020-09-10 11:19:35 +02:00
Joerg Roedel
b57de6cd16 x86/sev-es: Add SEV-ES Feature Detection
Add a sev_es_active() function for checking whether SEV-ES is enabled.
Also cache the value of MSR_AMD64_SEV at boot to speed up the feature
checking in the running code.

 [ bp: Remove "!!" in sev_active() too. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-37-joro@8bytes.org
2020-09-07 23:00:20 +02:00
Joerg Roedel
29dcc60f6a x86/boot/compressed/64: Add stage1 #VC handler
Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because the kernel might still be running on the EFI page table.

The stage 1 handler is limited to the MSR-based protocol to talk to the
hypervisor and can only support CPUID exit-codes, but that is enough to
get to stage 2.

 [ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-20-joro@8bytes.org
2020-09-07 19:45:25 +02:00
Kan Liang
59a854e2f3 perf/x86/intel: Support TopDown metrics on Ice Lake
Ice Lake supports the hardware TopDown metrics feature, which can free
up the scarce GP counters.

Update the event constraints for the metrics events. The metric counters
do not exist, which are mapped to a dummy offset. The sharing between
multiple users of the same metric without multiplexing is not allowed.

Implement set_topdown_event_period for Ice Lake. The values in
PERF_METRICS MSR are derived from the fixed counter 3. Both registers
should start from zero.

Implement update_topdown_event for Ice Lake. The metric is reported by
multiplying the metric (fraction) with slots. To maintain accurate
measurements, both registers are cleared for each update. The fixed
counter 3 should always be cleared before the PERF_METRICS.

Implement td_attr for the new metrics events and the new slots fixed
counter. Make them visible to the perf user tools.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
2020-08-18 16:34:37 +02:00
Kan Liang
7b2c05a15d perf/x86/intel: Generic support for hardware TopDown metrics
Intro
=====

The TopDown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. Current perf has supported the method.

The method works well, but there is one problem. To collect the TopDown
events, several GP counters have to be used. If a user wants to collect
other events at the same time, the multiplexing probably be triggered,
which impacts the accuracy.

To free up the scarce GP counters, the hardware TopDown metrics feature
is introduced from Ice Lake. The hardware implements an additional
"metrics" register and a new Fixed Counter 3 that measures pipeline
"slots". The TopDown events can be calculated from them instead.

Events
======

The level 1 TopDown has four metrics. There is no event-code assigned to
the TopDown metrics. Four metric events are exported as separate perf
events, which map to the internal "metrics" counter register. Those
events do not exist in hardware, but can be allocated by the scheduler.

For the event mapping, a special 0x00 event code is used, which is
reserved for fake events. The metric events start from umask 0x10.

When setting up the metric events, they point to the Fixed Counter 3.
They have to be specially handled.
- Add the update_topdown_event() callback to read the additional metrics
  MSR and generate the metrics.
- Add the set_topdown_event_period() callback to initialize metrics MSR
  and the fixed counter 3.
- Add a variable n_metric_event to track the number of the accepted
  metrics events. The sharing between multiple users of the same metric
  without multiplexing is not allowed.
- Only enable/disable the fixed counter 3 when there are no other active
  TopDown events, which avoid the unnecessary writing of the fixed
  control register.
- Disable the PMU when reading the metrics event. The metrics MSR and
  the fixed counter 3 are read separately. The values may be modified by
  an NMI.

All four metric events don't support sampling. Since they will be
handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
introduced to indicate this case.

The slots event can support both sampling and counting.
For counting, the flag is also applied.
For sampling, it will be handled normally as other normal events.

Groups
======

The slots event is required in a Topdown group.
To avoid reading the METRICS register multiple times, the metrics and
slots value can only be updated by slots event in a group.
All active slots and metrics events will be updated one time.
Therefore, the slots event must be before any metric events in a Topdown
group.

NMI
======

The METRICS related register may be overflow. The bit 48 of the STATUS
register will be set. If so, PERF_METRICS and Fixed counter 3 are
required to be reset. The patch also update all active slots and
metrics events in the NMI handler.

The update_topdown_event() has to read two registers separately. The
values may be modified by an NMI. PMU has to be disabled before calling
the function.

RDPMC
======

RDPMC is temporarily disabled. A later patch will enable it.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
2020-08-18 16:34:36 +02:00
Linus Torvalds
0408497800 Power management updates for 5.9-rc1
- Make the Energy Model cover non-CPU devices (Lukasz Luba).
 
  - Add Ice Lake server idle states table to the intel_idle driver
    and eliminate a redundant static variable from it (Chen Yu,
    Rafael Wysocki).
 
  - Eliminate all W=1 build warnings from cpufreq (Lee Jones).
 
  - Add support for Sapphire Rapids and for Power Limit 4 to the
    Intel RAPL power capping driver (Sumeet Pawnikar, Zhang Rui).
 
  - Fix function name in kerneldoc comments in the idle_inject power
    capping driver (Yangtao Li).
 
  - Fix locking issues with cpufreq governors and drop a redundant
    "weak" function definition from cpufreq (Viresh Kumar).
 
  - Rearrange cpufreq to register non-modular governors at the
    core_initcall level and allow the default cpufreq governor to
    be specified in the kernel command line (Quentin Perret).
 
  - Extend, fix and clean up the intel_pstate driver (Srinivas
    Pandruvada, Rafael Wysocki):
 
    * Add a new sysfs attribute for disabling/enabling CPU
      energy-efficiency optimizations in the processor.
 
    * Make the driver avoid enabling HWP if EPP is not supported.
 
    * Allow the driver to handle numeric EPP values in the sysfs
      interface and fix the setting of EPP via sysfs in the active
      mode.
 
    * Eliminate a static checker warning and clean up a kerneldoc
      comment.
 
  - Clean up some variable declarations in the powernv cpufreq
    driver (Wei Yongjun).
 
  - Fix up the ->enter_s2idle callback definition to cover the case
    when it points to the same function as ->idle correctly (Neal
    Liu).
 
  - Rearrange and clean up the PSCI cpuidle driver (Ulf Hansson).
 
  - Make the PM core emit "changed" uevent when adding/removing the
    "wakeup" sysfs attribute of devices (Abhishek Pandit-Subedi).
 
  - Add a helper macro for declaring PM callbacks and use it in the
    MMC jz4740 driver (Paul Cercueil).
 
  - Fix white space in some places in the hibernate code and make the
    system-wide PM code use "const char *" where appropriate (Xiang
    Chen, Alexey Dobriyan).
 
  - Add one more "unsafe" helper macro to the freezer to cover the NFS
    use case (He Zhe).
 
  - Change the language in the generic PM domains framework to use
    parent/child terminology and clean up a typo and some comment
    fromatting in that code (Kees Cook, Geert Uytterhoeven).
 
  - Update the operating performance points OPP framework (Lukasz
    Luba, Andrew-sh.Cheng, Valdis Kletnieks):
 
    * Refactor dev_pm_opp_of_register_em() and update related drivers.
 
    * Add a missing function export.
 
    * Allow disabled OPPs in dev_pm_opp_get_freq().
 
  - Update devfreq core and drivers (Chanwoo Choi, Lukasz Luba, Enric
    Balletbo i Serra, Dmitry Osipenko, Kieran Bingham, Marc Zyngier):
 
    * Add support for delayed timers to the devfreq core and make the
      Samsung exynos5422-dmc driver use it.
 
    * Unify sysfs interface to use "df-" as a prefix in instance names
      consistently.
 
    * Fix devfreq_summary debugfs node indentation.
 
    * Add the rockchip,pmu phandle to the rk3399_dmc driver DT
      bindings.
 
    * List Dmitry Osipenko as the Tegra devfreq driver maintainer.
 
    * Fix typos in the core devfreq code.
 
  - Update the pm-graph utility to version 5.7 including a number of
    fixes related to suspend-to-idle (Todd Brandt).
 
  - Fix coccicheck errors and warnings in the cpupower utility (Shuah
    Khan).
 
  - Replace HTTP links with HTTPs ones in multiple places (Alexander
    A. Klimov).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl8oO24SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx7ZQP/0lQ0yABnASnwomdOH6+K/m7rvc+e9FE
 zx5pTDQswhU5tM7SQAIKqe0uSI+okF2UrBrT5onA16F+JUbnrbexJLazBPfVTTGF
 AKpKEQ7Wh69Wz+Y6cQZjm1dTuRL+dlBJuBrzR2tLSnONPMMHuFcO3xd7lgE9UAxC
 oGEf393taA6OqcUNRQIa2gqbq+k1qhKjeDucGkbOaoJ6CL0ZyWI+Tfw1WWaBBGv0
 /2wBd6V513OH8WtQCW6H3YpHmhYW6OwL8w19KyGcjPRGJaeaIP4W/Ng7mkvgL5ZB
 vZqg3XiufFV9uTe8W1NQaVv/NjlN256OteuK809aosTVjD0dhFkhBYg5TLu6HbQq
 C/NciZ+78oLedWLT73EUfw3NyS+V0jk6X2EIlBUwNi0Qw1B1pCifGOCKzWFFe5cr
 ci4xr4FG7dBkxScOxwFAU2s5TdPHLOkGkQtg4jZr0OYDrzkyLEdsnZEUjLPORo+0
 6EBXGfTOSy2CBHcYswRtzJr/1pUTzj7oejhTAMCCuYW2r3VyQtnYcVjlehtp20if
 6BfmGisk8nmtxlSm+/Y2FqKa4bNnSTMmr0UJQ+Rjp0tHs47QeucI0ORfZ5nPaBac
 +ptvIjWmn3xejT/+oAehpH9066Iuy66vzHdnj7x5+WAsmYS8n8OFtlBFkYELmLJB
 3xI5hIl7WtGo
 =8cUO
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The most significant change here is the extension of the Energy Model
  to cover non-CPU devices (as well as CPUs) from Lukasz Luba.

  There is also some new hardware support (Ice Lake server idle states
  table for intel_idle, Sapphire Rapids and Power Limit 4 support in the
  RAPL driver), some new functionality in the existing drivers (eg. a
  new switch to disable/enable CPU energy-efficiency optimizations in
  intel_pstate, delayed timers in devfreq), some assorted fixes (cpufreq
  core, intel_pstate, intel_idle) and cleanups (eg. cpuidle-psci,
  devfreq), including the elimination of W=1 build warnings from cpufreq
  done by Lee Jones.

  Specifics:

   - Make the Energy Model cover non-CPU devices (Lukasz Luba).

   - Add Ice Lake server idle states table to the intel_idle driver and
     eliminate a redundant static variable from it (Chen Yu, Rafael
     Wysocki).

   - Eliminate all W=1 build warnings from cpufreq (Lee Jones).

   - Add support for Sapphire Rapids and for Power Limit 4 to the Intel
     RAPL power capping driver (Sumeet Pawnikar, Zhang Rui).

   - Fix function name in kerneldoc comments in the idle_inject power
     capping driver (Yangtao Li).

   - Fix locking issues with cpufreq governors and drop a redundant
     "weak" function definition from cpufreq (Viresh Kumar).

   - Rearrange cpufreq to register non-modular governors at the
     core_initcall level and allow the default cpufreq governor to be
     specified in the kernel command line (Quentin Perret).

   - Extend, fix and clean up the intel_pstate driver (Srinivas
     Pandruvada, Rafael Wysocki):

       * Add a new sysfs attribute for disabling/enabling CPU
         energy-efficiency optimizations in the processor.

       * Make the driver avoid enabling HWP if EPP is not supported.

       * Allow the driver to handle numeric EPP values in the sysfs
         interface and fix the setting of EPP via sysfs in the active
         mode.

       * Eliminate a static checker warning and clean up a kerneldoc
         comment.

   - Clean up some variable declarations in the powernv cpufreq driver
     (Wei Yongjun).

   - Fix up the ->enter_s2idle callback definition to cover the case
     when it points to the same function as ->idle correctly (Neal Liu).

   - Rearrange and clean up the PSCI cpuidle driver (Ulf Hansson).

   - Make the PM core emit "changed" uevent when adding/removing the
     "wakeup" sysfs attribute of devices (Abhishek Pandit-Subedi).

   - Add a helper macro for declaring PM callbacks and use it in the MMC
     jz4740 driver (Paul Cercueil).

   - Fix white space in some places in the hibernate code and make the
     system-wide PM code use "const char *" where appropriate (Xiang
     Chen, Alexey Dobriyan).

   - Add one more "unsafe" helper macro to the freezer to cover the NFS
     use case (He Zhe).

   - Change the language in the generic PM domains framework to use
     parent/child terminology and clean up a typo and some comment
     fromatting in that code (Kees Cook, Geert Uytterhoeven).

   - Update the operating performance points OPP framework (Lukasz Luba,
     Andrew-sh.Cheng, Valdis Kletnieks):

       * Refactor dev_pm_opp_of_register_em() and update related drivers.

       * Add a missing function export.

       * Allow disabled OPPs in dev_pm_opp_get_freq().

   - Update devfreq core and drivers (Chanwoo Choi, Lukasz Luba, Enric
     Balletbo i Serra, Dmitry Osipenko, Kieran Bingham, Marc Zyngier):

       * Add support for delayed timers to the devfreq core and make the
         Samsung exynos5422-dmc driver use it.

       * Unify sysfs interface to use "df-" as a prefix in instance
         names consistently.

       * Fix devfreq_summary debugfs node indentation.

       * Add the rockchip,pmu phandle to the rk3399_dmc driver DT
         bindings.

       * List Dmitry Osipenko as the Tegra devfreq driver maintainer.

       * Fix typos in the core devfreq code.

   - Update the pm-graph utility to version 5.7 including a number of
     fixes related to suspend-to-idle (Todd Brandt).

   - Fix coccicheck errors and warnings in the cpupower utility (Shuah
     Khan).

   - Replace HTTP links with HTTPs ones in multiple places (Alexander A.
     Klimov)"

* tag 'pm-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
  cpuidle: ACPI: fix 'return' with no value build warning
  cpufreq: intel_pstate: Fix EPP setting via sysfs in active mode
  cpufreq: intel_pstate: Rearrange the storing of new EPP values
  intel_idle: Customize IceLake server support
  PM / devfreq: Fix the wrong end with semicolon
  PM / devfreq: Fix indentaion of devfreq_summary debugfs node
  PM / devfreq: Clean up the devfreq instance name in sysfs attr
  memory: samsung: exynos5422-dmc: Add module param to control IRQ mode
  memory: samsung: exynos5422-dmc: Adjust polling interval and uptreshold
  memory: samsung: exynos5422-dmc: Use delayed timer as default
  PM / devfreq: Add support delayed timer for polling mode
  dt-bindings: devfreq: rk3399_dmc: Add rockchip,pmu phandle
  PM / devfreq: tegra: Add Dmitry as a maintainer
  PM / devfreq: event: Fix trivial spelling
  PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
  cpuidle: change enter_s2idle() prototype
  cpuidle: psci: Prevent domain idlestates until consumers are ready
  cpuidle: psci: Convert PM domain to platform driver
  cpuidle: psci: Fix error path via converting to a platform driver
  cpuidle: psci: Fail cpuidle registration if set OSI mode failed
  ...
2020-08-03 20:28:08 -07:00
Linus Torvalds
37e88224c0 Misc cleanups all around the place.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oRTgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1huHQ//T2hZk5zlpOtojxvdAzsPgtV4tHawseK8
 +ZZEbrH5qo5/ZMF18qyEJCm9p1yg8uIu71InULRCSgjU3v82GVCcuLXuE36U904G
 gHUqkYPnqxCqx+Li125aye9tKWahXe1DxX+uWbV0Ju7fiCO0rwYIzpWn1bnR6ilp
 fmLGSbgPlTVJwZ9mBvyi3VUlH5tDYidFN74TREUOwx2g5uhg+8uEo44Eb/bx8ESF
 dGt1Z/fnfDHkUZtmhzJk5Uz8nbw7rPHU/EZ4iZAxEzxTutY5PhsvbIfLO4t4HhGn
 utZCk/pIdiLLQ1GaTvFxqi3iolDqpOuXpnDlfEAJD8UlMCnwyh1Certq5LaRbtHS
 8SW3/CeJgzqzrrsYhkxVu2PMFWriSMxgKTLiN0KnzJN0Hu7A5lHbBY/6G7zpsF/A
 2KJ4e8lZiPCcNF7LteSRroUe4hNOYxZ2FlYTXm3AgycSL189UMfWlHFb5c+b4m1a
 cNJpz+jAom8foXN4KhRkl5PFKXVXDGTVln3NRJCh1Mqd1Ef4hsTo9H6FgHX/EfHg
 slJDwwPac80v0dzlMTSsMkyseaKRAqIObWOiknPt1wv/qja7ibVZ5mUbZ+/mfJX/
 YWybcPi1omgUSNt7TNx6jtma67rUjmJW0x9g7UJ/ttEkf6yG2lemrdusydBYuIni
 0Z2+hWzI9MM=
 =X7o0
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups all around the place"

* tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioperm: Initialize pointer bitmap with NULL rather than 0
  x86: uv: uv_hub.h: Delete duplicated word
  x86: cmpxchg_32.h: Delete duplicated word
  x86: bootparam.h: Delete duplicated word
  x86/mm: Remove the unused mk_kernel_pgd() #define
  x86/tsc: Remove unused "US_SCALE" and "NS_SCALE" leftover macros
  x86/ioapic: Remove unused "IOAPIC_AUTO" define
  x86/mm: Drop unused MAX_PHYSADDR_BITS
  x86/msr: Move the F15h MSRs where they belong
  x86/idt: Make idt_descr static
  initrd: Remove erroneous comment
  x86/mm/32: Fix -Wmissing prototypes warnings for init.c
  cpu/speculation: Add prototype for cpu_show_srbds()
  x86/mm: Fix -Wmissing-prototypes warnings for arch/x86/mm/init.c
  x86/asm: Unify __ASSEMBLY__ blocks
  x86/cpufeatures: Mark two free bits in word 3
  x86/msr: Lift AMD family 0x15 power-specific MSRs
2020-08-03 16:53:28 -07:00
Kan Liang
d6a162a41b x86/msr-index: Add bunch of MSRs for Arch LBR
Add Arch LBR related MSRs and the new LBR INFO bits in MSR-index.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-8-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:52 +02:00
Srinivas Pandruvada
ed7bde7a6d cpufreq: intel_pstate: Allow enable/disable energy efficiency
By default intel_pstate the driver disables energy efficiency by setting
MSR_IA32_POWER_CTL bit 19 for Kaby Lake desktop CPU model in HWP mode.
This CPU model is also shared by Coffee Lake desktop CPUs. This allows
these systems to reach maximum possible frequency. But this adds power
penalty, which some customers don't want. They want some way to enable/
disable dynamically.

So, add an additional attribute "energy_efficiency" under
/sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows
to read and write bit 19 ("Disable Energy Efficiency Optimization") in
the MSR IA32_POWER_CTL.

This attribute is present in both HWP and non-HWP mode as this has an
effect in both modes. Refer to Intel Software Developer's manual for
details.

The scope of this bit is package wide. Also these systems are single
package systems. So read/write MSR on the current CPU is enough.

The energy efficiency (EE) bit setting needs to be preserved during
suspend/resume and CPU offline/online operation. To do this:
- Restoring the EE setting from the cpufreq resume() callback, if there
is change from the system default.
- By default, don't disable EE from cpufreq init() callback for matching
CPU models. Since the scope is package wide and is a single package
system, move the disable EE calls from init() callback to
intel_pstate_init() function, which is called only once.

Suggested-by: Len Brown <lenb@kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-02 13:02:46 +02:00
Borislav Petkov
99e40204e0 x86/msr: Move the F15h MSRs where they belong
1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")

moved the three F15h power MSRs to the architectural list but that was
wrong as they belong in the family 0x15 list. That also caused:

  In file included from trace/beauty/tracepoints/x86_msr.c:10:
  perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
    292 |  [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
        |                                             ^~~~~~~~~~~
  perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')

due to MSR_F15H_PTSC ending up being defined twice. Move them where they
belong and drop the duplicate.

Also, drop the respective tools/ changes of the msr-index.h copy the
above commit added because perf tool developers prefer to go through
those changes themselves in order to figure out whether changes to the
kernel headers would need additional handling in perf.

Fixes: 1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
2020-06-22 17:15:53 +02:00
Borislav Petkov
1068ed4547 x86/msr: Lift AMD family 0x15 power-specific MSRs
... into the global msr-index.h header because they're used in multiple
compilation units. Sort the MSR list a bit. Update the msr-index.h copy
in tools.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20200608164847.14232-1-bp@alien8.de
2020-06-15 19:25:53 +02:00