Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For more details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
Add the Processor MMIO Stale Data bug enumeration. A microcode update
adds new bits to the MSR IA32_ARCH_CAPABILITIES, define them.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
The INTEGRITY_CAPABILITIES MSR is enumerated by bit 2 of the
CORE_CAPABILITIES MSR.
Add defines for the CORE_CAPS enumeration as well as for the integrity
MSR.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-3-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Add MSR definitions that will be used to enable the new AMD
Performance Monitoring Version 2 (PerfMonV2) features. These
include:
* Performance Counter Global Control (PerfCntrGlobalCtl)
* Performance Counter Global Status (PerfCntrGlobalStatus)
* Performance Counter Global Status Clear (PerfCntrGlobalStatusClr)
The new Performance Counter Global Control and Status MSRs
provide an interface for enabling or disabling multiple
counters at the same time and for testing overflow without
probing the individual registers for each PMC.
The availability of these registers is indicated through the
PerfMonV2 feature bit of CPUID leaf 0x80000022 EAX.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/cdc0d8f75bd519848731b5c64d924f5a0619a573.1650515382.git.sandipan.das@amd.com
Pull turbostat changes for 5.19 from Len Brown:
"Chen Yu (1):
tools/power turbostat: Support thermal throttle count print
Dan Merillat (1):
tools/power turbostat: fix dump for AMD cpus
Len Brown (5):
tools/power turbostat: tweak --show and --hide capability
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: be more useful as non-root
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: version 2022.04.16
Sumeet Pawnikar (2):
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: print power values upto three decimal
Zephaniah E. Loss-Cutler-Hull (2):
tools/power turbostat: Allow -e for all names.
tools/power turbostat: Allow printing header every N iterations"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2022.04.16
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: be more useful as non-root
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: Support thermal throttle count print
tools/power turbostat: Allow printing header every N iterations
tools/power turbostat: Allow -e for all names.
tools/power turbostat: print power values upto three decimal
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: fix dump for AMD cpus
tools/power turbostat: tweak --show and --hide capability
A microcode update on some Intel processors causes all TSX transactions
to always abort by default[*]. Microcode also added functionality to
re-enable TSX for development purposes. With this microcode loaded, if
tsx=on was passed on the cmdline, and TSX development mode was already
enabled before the kernel boot, it may make the system vulnerable to TSX
Asynchronous Abort (TAA).
To be on safer side, unconditionally disable TSX development mode during
boot. If a viable use case appears, this can be revisited later.
[*]: Intel TSX Disable Update for Selected Processors, doc ID: 643557
[ bp: Drop unstable web link, massage heavily. ]
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/347bd844da3a333a9793c6687d4e4eb3b2419a3e.1646943780.git.pawan.kumar.gupta@linux.intel.com
The CC_ATTR_GUEST_SEV_SNP can be used by the guest to query whether the
SNP (Secure Nested Paging) feature is active.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-10-brijesh.singh@amd.com
Add support for the AMD Fam19h 16-deep branch sampling feature as
described in the AMD PPR Fam19h Model 01h Revision B1. This is a model
specific extension. It is not an architected AMD feature.
The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
registers. There is no branch type filtering. All control flow changes are
captured. BRS relies on specific programming of the core PMU of Fam19h. In
particular, the following requirements must be met:
- the sampling period be greater than 16 (BRS depth)
- the sampling period must use a fixed and not frequency mode
BRS interacts with the NMI interrupt as well. Because enabling BRS is
expensive, it is only activated after P event occurrences, where P is the
desired sampling period. At P occurrences of the event, the counter
overflows, the CPU catches the interrupt, activates BRS for 16 branches until
it saturates, and then delivers the NMI to the kernel. Between the overflow
and the time BRS activates more branches may be executed skewing the period.
All along, the sampling event keeps counting. The skid may be attenuated by
reducing the sampling period by 16 (subsequent patch).
BRS is integrated into perf_events seamlessly via the same
PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
records in the sampling buffer. No prediction information is supported. The
branches are stored in reverse order of execution. The most recent branch is
the first entry in each record.
No modification to the perf tool is necessary.
BRS can be used with any sampling event. However, it is recommended to use
the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
captures.
$ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
$ perf report -D
56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
... branch stack: nr:16
..... 0: 0000000000401d24 -> 0000000000401d5a 0 cycles 0
..... 1: 0000000000401d5c -> 0000000000401d24 0 cycles 0
..... 2: 0000000000401d22 -> 0000000000401d5c 0 cycles 0
..... 3: 0000000000401d5e -> 0000000000401d22 0 cycles 0
..... 4: 0000000000401d20 -> 0000000000401d5e 0 cycles 0
..... 5: 0000000000401d3e -> 0000000000401d20 0 cycles 0
..... 6: 0000000000401d42 -> 0000000000401d3e 0 cycles 0
..... 7: 0000000000401d3c -> 0000000000401d42 0 cycles 0
..... 8: 0000000000401d44 -> 0000000000401d3c 0 cycles 0
..... 9: 0000000000401d3a -> 0000000000401d44 0 cycles 0
..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles 0
..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles 0
..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles 0
..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles 0
..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles 0
..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles 0
... thread: test:18230
...... dso: test
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
limited to 2 instructions (and typically fewer) on branch targets not starting
with ENDBR. CET-IBT also limits speculation of the next sequential instruction
after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides, as
described above, speculation limits itself.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
beB/7u2KIUbKEkSN
=jZfK
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
"Add support for Intel CET-IBT, available since Tigerlake (11th gen),
which is a coarse grained, hardware based, forward edge
Control-Flow-Integrity mechanism where any indirect CALL/JMP must
target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
is limited to 2 instructions (and typically fewer) on branch targets
not starting with ENDBR. CET-IBT also limits speculation of the next
sequential instruction after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides,
as described above, speculation limits itself"
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
kvm/emulate: Fix SETcc emulation for ENDBR
x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
kbuild: Fixup the IBT kbuild changes
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
x86: Remove toolchain check for X32 ABI capability
x86/alternative: Use .ibt_endbr_seal to seal indirect calls
objtool: Find unused ENDBR instructions
objtool: Validate IBT assumptions
objtool: Add IBT/ENDBR decoding
objtool: Read the NOENDBR annotation
x86: Annotate idtentry_df()
x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
x86: Annotate call_on_stack()
objtool: Rework ASM_REACHABLE
x86: Mark __invalid_creds() __noreturn
exit: Mark do_group_exit() __noreturn
x86: Mark stop_this_cpu() __noreturn
objtool: Ignore extra-symbol code
objtool: Rename --duplicate to --lto
...
- Fix address filtering for Intel/PT,ARM/CoreSight
- Enable Intel/PEBS format 5
- Allow more fixed-function counters for x86
- Intel/PT: Enable not recording Taken-Not-Taken packets
- Add a few branch-types
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI4WdIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jdTA/7BADTYzFCbdwPzHt2mR8osv7k+pDvYxs9
wxNjyi1X7N8cPkhqgIg9CfdhdyDOqo7+J4fG17f2qbwjNK7b2Fb1/U6ZoZaf+f8F
W0e2LX5KZTXUhkA+TEjrXvYD9FmJaCPM/l2RQg8U7okBs2kb0H6QT2Yn21wd1roC
WwI5KFiWSVS1IzpVLaXjDh+FJfJHd75ReMqJeus+QoVQ9NHeuI+t4DglSB1IBi54
d/zeVXE/Y4dFTQOrU06S2HxcOEptvXZsPmVLvKab/veeGGyWiGPxQpvu6bXm6u3x
0sV+dn67zut2m2pQlUZUucgGTSYIZTpOe+rNukTB9hJ4XeN4/1ohOOCrOuYM+63P
lGFbN1v+LD7Wc6C2eEhw8G5GEL0qbwzFNQ06O3EOFi7C7GKn7WS/ET6XuuMOERFk
uxEPb4pFtbBlJ0SriCprFJSd5NL3PORZlLIhv4hGH5hilLR1TFeKDuwZaM4noQxU
dL3rKGLi9H+P46Eni9H28+0gDISbv1xL+WivHOFQNmhBqAZO52ZcF3J+dgBaR1B5
pBxVTycFpZMjxSZnqTE0gMsFaLIpVGc+75Chns1rajR0mEtRtJUQUbYz4tK4zb0E
dZR1p+VF6+DYmSRhiqeaTi9uz9oE8kMa8o/EcbFIg/9BgEnUwJXU20bjnar30xQ7
9OIn7r9hjHI=
=XPuo
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event updates from Ingo Molnar:
- Fix address filtering for Intel/PT,ARM/CoreSight
- Enable Intel/PEBS format 5
- Allow more fixed-function counters for x86
- Intel/PT: Enable not recording Taken-Not-Taken packets
- Add a few branch-types
* tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT
perf: Add irq and exception return branch types
perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses
perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
perf/x86/intel/pt: Add a capability and config bit for event tracing
perf/x86/intel: Increase max number of the fixed counters
KVM: x86: use the KVM side max supported fixed counter
perf/x86/intel: Enable PEBS format 5
perf/core: Allow kernel address filter when not filtering the kernel
perf/x86/intel/pt: Fix address filter config for 32-bit kernel
perf/core: Fix address filter parser for multiple filters
x86: Share definition of __is_canonical_address()
perf/x86/intel/pt: Relax address filter validation
Merge Intel Hardware Feedback Interface (HFI) thermal driver for
5.18-rc1 and update the intel-speed-select utility to support that
driver.
* thermal-hfi:
tools/power/x86/intel-speed-select: v1.12 release
tools/power/x86/intel-speed-select: HFI support
tools/power/x86/intel-speed-select: OOB daemon mode
thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub
thermal: intel: hfi: Notify user space for HFI events
thermal: netlink: Add a new event to notify CPU capabilities change
thermal: intel: hfi: Enable notification interrupt
thermal: intel: hfi: Handle CPU hotplug events
thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
x86/cpu: Add definitions for the Intel Hardware Feedback Interface
x86/Documentation: Describe the Intel Hardware Feedback Interface
The bits required to make the hardware go.. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called TNT-Disable which is enabled config bit 55.
TNT-Disable disables Taken-Not-Taken packets to reduce the tracing
overhead, but with the result that exact control flow information is
lost.
Add a capability and config bit for TNT-Disable.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-3-adrian.hunter@intel.com
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which is enabled config bit 31.
Event Trace exposes details about asynchronous events such as interrupts
and VM-Entry/Exit.
Add a capability and config bit for Event Trace.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-2-adrian.hunter@intel.com
asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.
Also add some values from the spec that were not defined before
and will be soon useful.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the CPUID feature bit and the model-specific registers needed to
identify and configure the Intel Hardware Feedback Interface.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AMD CPPC (Collaborative Processor Performance Control) function uses MSR
registers to manage the performance hints. So add the MSR register macro
here.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
XFD introduces two MSRs:
- IA32_XFD to enable/disable a feature controlled by XFD
- IA32_XFD_ERR to expose to the #NM trap handler which feature
was tried to be used for the first time.
Both use the same xstate-component bitmap format, used by XCR0.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com
- Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too
- Use the special RAPL CPUID bit to detect the functionality on AMD and
Hygon instead of doing family matching.
- Add support for new Intel microcode deprecating TSX on some models and
do not enable kernel workarounds for those CPUs when TSX transactions
always abort, as a result of that microcode update.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDZhzEACgkQEsHwGGHe
VUo5ow//eRwlb1OL/D3jzLT4nTYX8+XdufaJF1HBr1Cf3mdNkiEgyu2bvsXNTpN/
ZP7CFCHibgYeHJ7qTTkhoK1DCe4YHjj450oCgg7pv40Mv9E29Rpszie8y8e/ngkc
g9OiAeEd4A32v8bRMAOOX0UZN4afismXBW0k4iwOAguNFiZ/usrrVYTZpJe3wG65
/YM9FdDZ+Mt7BavJdVyGh03PpzoSMrKyEQ673CHhERQyy5oEublrDSmtt5hQJv1W
4tgNOWpw57Gi7Vs7UYd7VvBQKwQZKeQeHJWu1TXUB6pw0lKYvULH6m0dasvc6cGb
WtCBvbQU9MRP0LvdvYOdgmSgn400z7mEwlUWmAFJLIUlDsuRpZmVQ4C1/OUnOSdx
amb7I3bp1z6Rqjs9ADW5h87qDA+q5OmbIZeIDvuRypQOB3yEktAEdUvWb65b1Fgm
9CpzebxyaOUM9YRxDzDd2joZYKnfI3stF6UCrVXaZwYei+Jmzn5gc8ZOoOX9g6gO
eX/sLW2RWRx6XxilaWZijOHJTjokVUpEnD12aGtKO6ou5QbFTwldc2Metpua42cL
5p8wRxEYeKT/EE/GKy/qIEp624QaInSEmfyq8RFKU4em7GSaSUmoQF5151LfnoRY
ARHkEdz+T8s5RI5xSvUZLRMNYjig9tZas3blYfbJHnU7V2+bspQ=
=wW+k
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- New AMD models support
- Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too
- Use the special RAPL CPUID bit to detect the functionality on AMD and
Hygon instead of doing family matching.
- Add support for new Intel microcode deprecating TSX on some models
and do not enable kernel workarounds for those CPUs when TSX
transactions always abort, as a result of that microcode update.
* tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Clear CPUID bits when TSX always force aborts
x86/events/intel: Do not deploy TSX force abort workaround when TSX is deprecated
x86/msr: Define new bits in TSX_FORCE_ABORT MSR
perf/x86/rapl: Use CPUID bit on AMD and Hygon parts
x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems
x86/amd_nb: Add AMD family 19h model 50h PCI ids
x86/cpu: Fix core name for Sapphire Rapids
Intel client processors that support the IA32_TSX_FORCE_ABORT MSR
related to perf counter interaction [1] received a microcode update that
deprecates the Transactional Synchronization Extension (TSX) feature.
The bit FORCE_ABORT_RTM now defaults to 1, writes to this bit are
ignored. A new bit TSX_CPUID_CLEAR clears the TSX related CPUID bits.
The summary of changes to the IA32_TSX_FORCE_ABORT MSR are:
Bit 0: FORCE_ABORT_RTM (legacy bit, new default=1) Status bit that
indicates if RTM transactions are always aborted. This bit is
essentially !SDV_ENABLE_RTM(Bit 2). Writes to this bit are ignored.
Bit 1: TSX_CPUID_CLEAR (new bit, default=0) When set, CPUID.HLE = 0
and CPUID.RTM = 0.
Bit 2: SDV_ENABLE_RTM (new bit, default=0) When clear, XBEGIN will
always abort with EAX code 0. When set, XBEGIN will not be forced to
abort (but will always abort in SGX enclaves). This bit is intended to
be used on developer systems. If this bit is set, transactional
atomicity correctness is not certain. SDV = Software Development
Vehicle (SDV), i.e. developer systems.
Performance monitoring counter 3 is usable in all cases, regardless of
the value of above bits.
Add support for a new CPUID bit - CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11])
- to indicate the status of always abort behavior.
[1] [ bp: Look for document ID 604224, "Performance Monitoring Impact
of Intel Transactional Synchronization Extension Memory". Since
there's no way for us to have stable links to documents... ]
[ bp: Massage and extend commit message. ]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/9add61915b4a4eedad74fbd869107863a28b428e.1623704845.git-series.pawan.kumar.gupta@linux.intel.com
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability enumeration method
introduced on the latest Intel platforms. This table is in a well-defined PCI
namespace location and is read via MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter blocks, but
fancier counters still need to be enumerated explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs
and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived)
cores.
The CPU-side feature set is entirely symmetrical - but on the PMU side there's
core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by
fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The immediate motivation
is to support low-overhead sampling-based race detection for user-space code. The
feature consists of the following main changes:
- Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits
inheritance of events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64
::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the new field can be used
to introduce support for other types of metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJGpERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j9zBAAuVbG2snV6SBSdXLhQcM66N3NckOXvSY5
QjjhQcuwJQEK/NJB3266K5d8qSmdyRBsWf3GCsrmyBT67P1V28K44Pu7oCV0UDtf
mpVRjEP0oR7hNsANSSgo8Fa4ZD7H5waX7dK7925Tvw8By3mMoZoddiD/84WJHhxO
NDF+GRFaRj+/dpbhV8cdCoXTjYdkC36vYuZs3b9lu0tS9D/AJgsNy7TinLvO02Cs
5peP+2y29dgvCXiGBiuJtEA6JyGnX3nUJCvfOZZ/DWDc3fdduARlRrc5Aiq4n/wY
UdSkw1VTZBlZ1wMSdmHQVeC5RIH3uWUtRoNqy0Yc90lBm55AQ0EENwIfWDUDC5zy
USdBqWTNWKMBxlEilUIyqKPQK8LW/31TRzqy8BWKPNcZt5yP5YS1SjAJRDDjSwL/
I+OBw1vjLJamYh8oNiD5b+VLqNQba81jFASfv+HVWcULumnY6ImECCpkg289Fkpi
BVR065boifJDlyENXFbvTxyMBXQsZfA+EhtxG7ju2Ni+TokBbogyCb3L2injPt9g
7jjtTOqmfad4gX1WSc+215iYZMkgECcUd9E+BfOseEjBohqlo7yNKIfYnT8mE/Xq
nb7eHjyvLiE8tRtZ+7SjsujOMHv9LhWFAbSaxU/kEVzpkp0zyd6mnnslDKaaHLhz
goUMOL/D0lg=
=NhQ7
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability
enumeration method introduced on the latest Intel platforms. This
table is in a well-defined PCI namespace location and is read via
MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter
blocks, but fancier counters still need to be enumerated
explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of
'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big')
and Gracemont ('small' - Atom derived) cores.
The CPU-side feature set is entirely symmetrical - but on the PMU
side there's core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX
profiling, by fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The
immediate motivation is to support low-overhead sampling-based race
detection for user-space code. The feature consists of the following
main changes:
- Add thread-only event inheritance via
perf_event_attr::inherit_thread, which limits inheritance of
events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via
perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap,
extend siginfo with an u64 ::si_perf, and add the breakpoint
information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the
new field can be used to introduce support for other types of
metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
signal, perf: Add missing TRAP_PERF case in siginfo_layout()
signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
perf/x86: Allow for 8<num_fixed_counters<16
perf/x86/rapl: Add support for Intel Alder Lake
perf/x86/cstate: Add Alder Lake CPU support
perf/x86/msr: Add Alder Lake CPU support
perf/x86/intel/uncore: Add Alder Lake support
perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
perf/x86/intel: Add Alder Lake Hybrid support
perf/x86: Support filter_match callback
perf/x86/intel: Add attr_update for Hybrid PMUs
perf/x86: Add structures for the attributes of Hybrid PMUs
perf/x86: Register hybrid PMUs
perf/x86: Factor out x86_pmu_show_pmu_cap
perf/x86: Remove temporary pmu assignment in event_init
perf/x86/intel: Factor out intel_pmu_check_extra_regs
perf/x86/intel: Factor out intel_pmu_check_event_constraints
perf/x86/intel: Factor out intel_pmu_check_num_counters
perf/x86: Hybrid PMU support for extra_regs
perf/x86: Hybrid PMU support for event constraints
...
Newer CPUs provide a second mechanism to detect operations with lock
prefix which go accross a cache line boundary. Such operations have to
take bus lock which causes a system wide performance degradation when
these operations happen frequently.
The new mechanism is not using the #AC exception. It triggers #DB and is
restricted to operations in user space. Kernel side split lock access can
only be detected by the #AC based variant. Contrary to the #AC based
mechanism the #DB based variant triggers _after_ the instruction was
executed. The mechanism is CPUID enumerated and contrary to the #AC
version which is based on the magic TEST_CTRL_MSR and model/family based
enumeration on the way to become architectural.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGkr8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodUKD/9tUXhInR7+1ykEHpMvdmSp48vqY3nc
sKmT22pPl+OchnJ62mw3T8gKpBYVleJmcCaY2qVx7hfaVcWApLGJvX4tmfXmv422
XDSJ6b8Os6wfgx5FR//I17z8ZtXnnuKkPrTMoRsQUw2qLq31y6fdQv+GW/cc1Kpw
mengjmPE+HnpaKbtuQfPdc4a+UvLjvzBMAlDZPTBPKYrP4FFqYVnUVwyTg5aLVDY
gHz4V8+b502RS/zPfTAtE3J848od+NmcUPdFlcG9DVA+hR0Rl0thvruCTFiD2vVh
i9DJ7INof5FoJDEzh0dGsD7x+MB6OY8GZyHdUMeGgIRPtWkqrG52feQQIn2YYlaL
fB3DlpNv7NIJ/0JMlALvh8S0tEoOcYdHqH+M/3K/zbzecg/FAo+lVo8WciGLPqWs
ykUG5/f/OnlTvgB8po1ebJu0h0jHnoK9heWWXk9zWIRVDPXHFOWKW3kSbTTb3icR
9hfjP/SNejpmt9Ju1OTwsgnV7NALIdVX+G5jyIEsjFl31Co1RZNYhHLFvi11FWlQ
/ssvFK9O5ZkliocGCAN9+yuOnM26VqWSCE4fis6/2aSgD2Y4Gpvb//cP96SrcNAH
u8eXNvGLlniJP3F3JImWIfIPQTrpvQhcU4eZ6NtviXqj/utQXX6c9PZ1PLYpcvUh
9AWF8rwhT8X4oA==
=lmi8
-----END PGP SIGNATURE-----
Merge tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 bus lock detection updates from Thomas Gleixner:
"Support for enhanced split lock detection:
Newer CPUs provide a second mechanism to detect operations with lock
prefix which go accross a cache line boundary. Such operations have to
take bus lock which causes a system wide performance degradation when
these operations happen frequently.
The new mechanism is not using the #AC exception. It triggers #DB and
is restricted to operations in user space. Kernel side split lock
access can only be detected by the #AC based variant.
Contrary to the #AC based mechanism the #DB based variant triggers
_after_ the instruction was executed. The mechanism is CPUID
enumerated and contrary to the #AC version which is based on the magic
TEST_CTRL_MSR and model/family based enumeration on the way to become
architectural"
* tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/admin-guide: Change doc for split_lock_detect parameter
x86/traps: Handle #DB for bus lock
x86/cpufeatures: Enumerate #DB for bus lock detection
Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.
Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.
The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.
For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for thse two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.
Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system.
It will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com
Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.
#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.
split_lock_detect=
#AC for split lock #DB for bus lock
off Do nothing Do nothing
warn Kernel OOPs Warn once per task and
Warn once per task and and continues to run.
disable future checking
When both features are
supported, warn in #AC
fatal Kernel OOPs Send SIGBUS to user.
Send SIGBUS to user
When both features are
supported, fatal in #AC
ratelimit:N Do nothing Limit bus lock rate to
N per second in the
current non-root user.
Default option is "warn".
Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So no need to handle #DB for bus lock detected in the kernel.
#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.
Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.
Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
the #DB handler right after reading DR6.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
There are two definitions for the TSC deadline MSR in msr-index.h,
one with an underscore and one without. Axe one of them and move
all the references over to the other one.
[ bp: Fixup the MSR define in handle_fastpath_set_msr_irqoff() too. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200305174706.0D6B8EE4@viggo.jf.intel.com
* PSCI relay at EL2 when "protected KVM" is enabled
* New exception injection code
* Simplification of AArch32 system register handling
* Fix PMU accesses when no PMU is enabled
* Expose CSV3 on non-Meltdown hosts
* Cache hierarchy discovery fixes
* PV steal-time cleanups
* Allow function pointers at EL2
* Various host EL2 entry cleanups
* Simplification of the EL2 vector allocation
s390:
* memcg accouting for s390 specific parts of kvm and gmap
* selftest for diag318
* new kvm_stat for when async_pf falls back to sync
x86:
* Tracepoints for the new pagetable code from 5.10
* Catch VFIO and KVM irqfd events before userspace
* Reporting dirty pages to userspace with a ring buffer
* SEV-ES host support
* Nested VMX support for wait-for-SIPI activity state
* New feature flag (AVX512 FP16)
* New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
* Selftest improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
=ISud
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from
the frequency table as the current frequency and clean up that
driver (Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle
driver (Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables
in DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it
to take the interconnect and OPP information from DT and clean it
up ((Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print
driver flags for devices with debug enabled (Alex Shi, Patrice
Chotard, Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl/Y8mcSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxjY4QAKsNFJeEtjGCxq7MxQIML3QLAsdJM9of
9kkY9skMEw4v1TRmyy7sW9jZW2pLSRcLJwWRKWu4143qUS3YUp2DQ0lqX4WyXoWu
BhnkhkMUl6iCeBO8CWnt8zsTuqSa20A13sL9LyqN1+7OZKHD8StbT4hKjBncdNNN
4aDj+1uAPyOgj2iCUZuHQ8DtpBvOLjgTh367vbhbufjeJ//8/9+R7s4Xzrj7wtmv
JlE0LDgvge9QeGTpjhxQJzn0q2/H5fg9jbmjPXUfbHJNuyKhrqnmjGyrN5m256JI
8DqGqQtJpmFp7Ihrur3uKTk3gWO05YwJ1FdeEooAKEjEMObm5xuYhKVRoDhmlJAu
G6ui+OAUvNR0FffJtbzvWe/pLovLGOEOHdvTrZxUF8Abo6br3untTm8rKTi1fhaF
wWndSMw0apGsPzCx5T+bE7AbJz2QHFpLhaVAutenuCzNI8xoMlxNKEzsaVz/+FqL
Pq/PdFaM4vNlMbv7hkb/fujkCs/v3EcX2ihzvt7I2o8dBS0D1X8A4mnuWJmiGslw
1ftbJ6M9XacwkPBTHPgeXxJh2C1yxxe5VQ9Z5fWWi7sPOUeJnUwxKaluv+coFndQ
sO6JxsPQ4hQihg8yOxLEkL6Wn68sZlmp+u2Oj+TPFAsAGANIA8rJlBPo1ppJWvdQ
j1OCIc/qzwpH
=BVdX
-----END PGP SIGNATURE-----
Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
applications to populate protected regions of user code and data called
enclaves. Once activated, the new hardware protects enclave code and
data from outside access and modification.
Enclaves provide a place to store secrets and process data with those
secrets. SGX has been used, for example, to decrypt video without
exposing the decryption keys to nosy debuggers that might be used to
subvert DRM. Software has generally been rewritten specifically to
run in enclaves, but there are also projects that try to run limited
unmodified software in enclaves."
Most of the functionality is concentrated into arch/x86/kernel/cpu/sgx/
except the addition of a new mprotect() hook to control enclave page
permissions and support for vDSO exceptions fixup which will is used by
SGX enclaves.
All this work by Sean Christopherson, Jarkko Sakkinen and many others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XTtMACgkQEsHwGGHe
VUqxFw/+NZGf2b3CWPcrvwXCpkvSpIrqh1jQwyvkZyJ1gen7Vy8dkvf99h8+zQPI
4wSArEyjhYJKAAmBNefLKi/Cs/bdkGzLlZyDGqtM641XRjf0xXIpQkOBb6UBa+Pv
to8veQmVH2bBTM49qnd+H1wM6FzYvhTYCD8xr4HlLXtIfpP2CK2GvCb8s/4LifgD
fTucZX9TFwLgVkWOHWHN0n8XMR2Fjb2YCrwjFMKyr/M2W+pPoOCTIt4PWDuXiOeG
rFP7R4DT9jDg8ht5j2dHQT/Bo8TvTCB4Oj98MrX1TTgkSjLJySSMfyQg5EwNfSIa
HC0lg/6qwAxnhWX7cCCBETNZ4aYDmz/dxcCSsLbomGP9nMaUgUy7qn5nNuNbJilb
oCBsr8LDMzu1LJzmkduM8Uw6OINh+J8ICoVXaR5pS7gSZz/+vqIP/rK691AiqhJL
QeMkI9gQ83jEXpr/AV7ABCjGCAeqELOkgravUyTDev24eEc0LyU0qENpgxqWSTca
OvwSWSwNuhCKd2IyKZBnOmjXGwvncwX0gp1KxL9WuLkR6O8XldLAYmVCwVAOrIh7
snRot8+3qNjELa65Nh5DapwLJrU24TRoKLHLgfWK8dlqrMejNtXKucQ574Np0feR
p2hrNisOrtCwxAt7OAgWygw8agN6cJiY18onIsr4wSBm5H7Syb0=
=k7tj
-----END PGP SIGNATURE-----
Merge tag 'x86_sgx_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGC support from Borislav Petkov:
"Intel Software Guard eXtensions enablement. This has been long in the
making, we were one revision number short of 42. :)
Intel SGX is new hardware functionality that can be used by
applications to populate protected regions of user code and data
called enclaves. Once activated, the new hardware protects enclave
code and data from outside access and modification.
Enclaves provide a place to store secrets and process data with those
secrets. SGX has been used, for example, to decrypt video without
exposing the decryption keys to nosy debuggers that might be used to
subvert DRM. Software has generally been rewritten specifically to run
in enclaves, but there are also projects that try to run limited
unmodified software in enclaves.
Most of the functionality is concentrated into arch/x86/kernel/cpu/sgx/
except the addition of a new mprotect() hook to control enclave page
permissions and support for vDSO exceptions fixup which will is used
by SGX enclaves.
All this work by Sean Christopherson, Jarkko Sakkinen and many others"
* tag 'x86_sgx_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86/sgx: Return -EINVAL on a zero length buffer in sgx_ioc_enclave_add_pages()
x86/sgx: Fix a typo in kernel-doc markup
x86/sgx: Fix sgx_ioc_enclave_provision() kernel-doc comment
x86/sgx: Return -ERESTARTSYS in sgx_ioc_enclave_add_pages()
selftests/sgx: Use a statically generated 3072-bit RSA key
x86/sgx: Clarify 'laundry_list' locking
x86/sgx: Update MAINTAINERS
Documentation/x86: Document SGX kernel architecture
x86/sgx: Add ptrace() support for the SGX driver
x86/sgx: Add a page reclaimer
selftests/x86: Add a selftest for SGX
x86/vdso: Implement a vDSO for Intel SGX enclave call
x86/traps: Attempt to fixup exceptions in vDSO before signaling
x86/fault: Add a helper function to sanitize error code
x86/vdso: Add support for exception fixup in vDSO functions
x86/sgx: Add SGX_IOC_ENCLAVE_PROVISION
x86/sgx: Add SGX_IOC_ENCLAVE_INIT
x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES
x86/sgx: Add SGX_IOC_ENCLAVE_CREATE
x86/sgx: Add an SGX misc driver interface
...
On systems that do not have hardware enforced cache coherency between
encrypted and unencrypted mappings of the same physical page, the
hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
contents of an SEV guest page. When a small number of pages are being
flushed, this can be used in place of issuing a WBINVD across all CPUs.
CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is
available. Add a CPUID feature to indicate it is supported and define the
MSR.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The SGX Launch Control hardware helps restrict which enclaves the
hardware will run. Launch control is intended to restrict what software
can run with enclave protections, which helps protect the overall system
from bad enclaves.
For the kernel's purposes, there are effectively two modes in which the
launch control hardware can operate: rigid and flexible. In its rigid
mode, an entity other than the kernel has ultimate authority over which
enclaves can be run (firmware, Intel, etc...). In its flexible mode, the
kernel has ultimate authority over which enclaves can run.
Enable X86_FEATURE_SGX_LC to enumerate when the CPU supports SGX Launch
Control in general.
Add MSR_IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}, which when combined contain a
SHA256 hash of a 3072-bit RSA public key. The hardware allows SGX enclaves
signed with this public key to initialize and run [*]. Enclaves not signed
with this key can not initialize and run.
Add FEAT_CTL_SGX_LC_ENABLED, which informs whether the SGXLEPUBKEYHASH MSRs
can be written by the kernel.
If the MSRs do not exist or are read-only, the launch control hardware is
operating in rigid mode. Linux does not and will not support creating
enclaves when hardware is configured in rigid mode because it takes away
the authority for launch decisions from the kernel. Note, this does not
preclude KVM from virtualizing/exposing SGX to a KVM guest when launch
control hardware is operating in rigid mode.
[*] Intel SDM: 38.1.4 Intel SGX Launch Control Configuration
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-5-jarkko@kernel.org
Populate X86_FEATURE_SGX feature from CPUID and tie it to the Kconfig
option with disabled-features.h.
IA32_FEATURE_CONTROL.SGX_ENABLE must be examined in addition to the CPUID
bits to enable full SGX support. The BIOS must both set this bit and lock
IA32_FEATURE_CONTROL for SGX to be supported (Intel SDM section 36.7.1).
The setting or clearing of this bit has no impact on the CPUID bits above,
which is why it needs to be detected separately.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-4-jarkko@kernel.org
Enable AMD Fam17h RAPL support for the power capping framework.
The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh
(Zen1) PPR.
Tested by comparing the results of following two sysfs entries and the
values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
MSRs in the rest of this file are sorted by their addresses; fixing the
two outliers.
No functional changes.
Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
optional additional error logging mode which is enabled by an MSR.
Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu,
but userspace should not be poking at MSRs. So move the enabling into
the kernel.
[ bp: Correct the explanation why this is done. ]
Suggested-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201030190807.GA13884@agluck-desk2.amr.corp.intel.com
called SEV by also encrypting the guest register state, making the
registers inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared between
the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
in order for that exception mechanism to work, the early x86 init code
needed to be made able to handle exceptions, which, in itself, brings
a bunch of very nice cleanups and improvements to the early boot code
like an early page fault handler, allowing for on-demand building of the
identity mapping. With that, !KASLR configurations do not use the EFI
page table anymore but switch to a kernel-controlled one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly
separate from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and behind
static keys to minimize the performance impact on !SEV-ES setups.
Work by Joerg Roedel and Thomas Lendacky and others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
=toqi
-----END PGP SIGNATURE-----
Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
x86 Intel updates:
- Add Jasper Lake support
- Add support for TopDown metrics on Ice Lake
- Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support
- Add a PCI sub driver to support uncore PMUs where the PCI resources
have been claimed already - extending the range of supported systems.
x86 AMD updates:
- Restore 'perf stat -a' behaviour to program the uncore PMU
to count all CPU threads.
- Fix setting the proper count when sampling Large Increment
per Cycle events / 'paired' events.
- Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
greatly reducing the number of interrupts when large sample
periods are specified.
- Extends Family 17h RAPL support to also work on compatible
F19h machines.
Core code updates:
- Fix race in perf_mmap_close()
- Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
closed if the leader is removed.
- Smaller fixes and updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Ef40RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h7NQ//ZdQ26Yg79ZaxBX1QSINJ9AgXDi6rXs75
qU9qNwr/6EF+633RZoPQGAE0Iy5v6h7iLFokcJzM9+kK/rE3ax44tSnPlcMa0+6N
SHXKCa5iL+hH7o2Spo2MZwCYseH79rloX3TSH7ajnN3X8PvwgWshF0lUE3WEWtCs
eHSojdCk43IuL9TpusuNOBM2FvgnheFYWiMbFHd0MTBUMxul30sLVCG8IIWCPA+q
TwG4RJS3X42VbL3SuAGFmOv4OmqNsfkvHvjpDs4NF07tRB9zjXzGrxmGhgSw0NAN
2KK25qbmrpKATIb4Eqsgk/yikX/SCrDEXrjhg3r8FnyPvRfctq1crZjjf672PI2E
bDda76dH6Lq9jv5fsyJjas5OsYdMKBCnA+tGQxXPGbmTXeEcYMRbDnwhYnevI/Q/
8pP+xstF0pmBA3tvpDPrQnYH72Qt7CLJSdcTB15NqZftU2tJxaAyJGx4gJy33jxQ
wu6BIEGHQ7onQYiIyTwsBHyz6xNsF/CRHwAPcGdYrRRbXB5K5nxHiXNb4awciTMx
2HF31/S4OqURNpfcpxOQo+1fb/cLqj3loGqE4jCTwkbS3lrHcAcfxyv9QNn77l1f
hdQ0jworbUNVLUYEUQz1bkZ06GD3LSSas2ZlY1NNdHo62mjyXMQmgirNcZmrFgWl
tl2gNFAU9x4=
=2fuY
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"x86 Intel updates:
- Add Jasper Lake support
- Add support for TopDown metrics on Ice Lake
- Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support
- Add a PCI sub driver to support uncore PMUs where the PCI resources
have been claimed already - extending the range of supported
systems.
x86 AMD updates:
- Restore 'perf stat -a' behaviour to program the uncore PMU to count
all CPU threads.
- Fix setting the proper count when sampling Large Increment per
Cycle events / 'paired' events.
- Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
greatly reducing the number of interrupts when large sample periods
are specified.
- Extends Family 17h RAPL support to also work on compatible F19h
machines.
Core code updates:
- Fix race in perf_mmap_close()
- Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
closed if the leader is removed.
- Smaller fixes and updates"
* tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
perf/core: Fix race in the perf_mmap_close() function
perf/x86: Fix n_metric for cancelled txn
perf/x86: Fix n_pair for cancelled txn
x86/events/amd/iommu: Fix sizeof mismatch
perf/x86/intel: Check perf metrics feature for each CPU
perf/x86/intel: Fix Ice Lake event constraint table
perf/x86/intel/uncore: Fix the scale of the IMC free-running events
perf/x86/intel/uncore: Fix for iio mapping on Skylake Server
perf/x86/msr: Add Jasper Lake support
perf/x86/intel: Add Jasper Lake support
perf/x86/intel/uncore: Reduce the number of CBOX counters
perf/x86/intel/uncore: Update Ice Lake uncore units
perf/x86/intel/uncore: Split the Ice Lake and Tiger Lake MSR uncore support
perf/x86/intel/uncore: Support PCIe3 unit on Snow Ridge
perf/x86/intel/uncore: Generic support for the PCI sub driver
perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()
perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()
perf/x86/intel/uncore: Factor out uncore_pci_find_dev_pmu()
perf/x86/intel/uncore: Factor out uncore_pci_get_dev_die_info()
perf/amd/uncore: Inform the user how many counters each uncore PMU has
...
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses the PASID to identify a
process address space and direct responses to the right address space.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-7-git-send-email-fenghua.yu@intel.com
Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode.
Don't accumulate them into raw sample user data in that case.
Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR.
Technically, there is an ABI change here with respect to the IBS raw
sample data format, but I don't see any perf driver version information
being included in perf.data file headers, but, existing users can detect
whether the size of the sample record has reduced by 8 bytes to
determine whether the IBS driver has this fix.
Fixes: 904cb3677f ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions")
Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com
Add a sev_es_active() function for checking whether SEV-ES is enabled.
Also cache the value of MSR_AMD64_SEV at boot to speed up the feature
checking in the running code.
[ bp: Remove "!!" in sev_active() too. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-37-joro@8bytes.org
Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because the kernel might still be running on the EFI page table.
The stage 1 handler is limited to the MSR-based protocol to talk to the
hypervisor and can only support CPUID exit-codes, but that is enough to
get to stage 2.
[ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-20-joro@8bytes.org
Ice Lake supports the hardware TopDown metrics feature, which can free
up the scarce GP counters.
Update the event constraints for the metrics events. The metric counters
do not exist, which are mapped to a dummy offset. The sharing between
multiple users of the same metric without multiplexing is not allowed.
Implement set_topdown_event_period for Ice Lake. The values in
PERF_METRICS MSR are derived from the fixed counter 3. Both registers
should start from zero.
Implement update_topdown_event for Ice Lake. The metric is reported by
multiplying the metric (fraction) with slots. To maintain accurate
measurements, both registers are cleared for each update. The fixed
counter 3 should always be cleared before the PERF_METRICS.
Implement td_attr for the new metrics events and the new slots fixed
counter. Make them visible to the perf user tools.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
Intro
=====
The TopDown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. Current perf has supported the method.
The method works well, but there is one problem. To collect the TopDown
events, several GP counters have to be used. If a user wants to collect
other events at the same time, the multiplexing probably be triggered,
which impacts the accuracy.
To free up the scarce GP counters, the hardware TopDown metrics feature
is introduced from Ice Lake. The hardware implements an additional
"metrics" register and a new Fixed Counter 3 that measures pipeline
"slots". The TopDown events can be calculated from them instead.
Events
======
The level 1 TopDown has four metrics. There is no event-code assigned to
the TopDown metrics. Four metric events are exported as separate perf
events, which map to the internal "metrics" counter register. Those
events do not exist in hardware, but can be allocated by the scheduler.
For the event mapping, a special 0x00 event code is used, which is
reserved for fake events. The metric events start from umask 0x10.
When setting up the metric events, they point to the Fixed Counter 3.
They have to be specially handled.
- Add the update_topdown_event() callback to read the additional metrics
MSR and generate the metrics.
- Add the set_topdown_event_period() callback to initialize metrics MSR
and the fixed counter 3.
- Add a variable n_metric_event to track the number of the accepted
metrics events. The sharing between multiple users of the same metric
without multiplexing is not allowed.
- Only enable/disable the fixed counter 3 when there are no other active
TopDown events, which avoid the unnecessary writing of the fixed
control register.
- Disable the PMU when reading the metrics event. The metrics MSR and
the fixed counter 3 are read separately. The values may be modified by
an NMI.
All four metric events don't support sampling. Since they will be
handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
introduced to indicate this case.
The slots event can support both sampling and counting.
For counting, the flag is also applied.
For sampling, it will be handled normally as other normal events.
Groups
======
The slots event is required in a Topdown group.
To avoid reading the METRICS register multiple times, the metrics and
slots value can only be updated by slots event in a group.
All active slots and metrics events will be updated one time.
Therefore, the slots event must be before any metric events in a Topdown
group.
NMI
======
The METRICS related register may be overflow. The bit 48 of the STATUS
register will be set. If so, PERF_METRICS and Fixed counter 3 are
required to be reset. The patch also update all active slots and
metrics events in the NMI handler.
The update_topdown_event() has to read two registers separately. The
values may be modified by an NMI. PMU has to be disabled before calling
the function.
RDPMC
======
RDPMC is temporarily disabled. A later patch will enable it.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
- Make the Energy Model cover non-CPU devices (Lukasz Luba).
- Add Ice Lake server idle states table to the intel_idle driver
and eliminate a redundant static variable from it (Chen Yu,
Rafael Wysocki).
- Eliminate all W=1 build warnings from cpufreq (Lee Jones).
- Add support for Sapphire Rapids and for Power Limit 4 to the
Intel RAPL power capping driver (Sumeet Pawnikar, Zhang Rui).
- Fix function name in kerneldoc comments in the idle_inject power
capping driver (Yangtao Li).
- Fix locking issues with cpufreq governors and drop a redundant
"weak" function definition from cpufreq (Viresh Kumar).
- Rearrange cpufreq to register non-modular governors at the
core_initcall level and allow the default cpufreq governor to
be specified in the kernel command line (Quentin Perret).
- Extend, fix and clean up the intel_pstate driver (Srinivas
Pandruvada, Rafael Wysocki):
* Add a new sysfs attribute for disabling/enabling CPU
energy-efficiency optimizations in the processor.
* Make the driver avoid enabling HWP if EPP is not supported.
* Allow the driver to handle numeric EPP values in the sysfs
interface and fix the setting of EPP via sysfs in the active
mode.
* Eliminate a static checker warning and clean up a kerneldoc
comment.
- Clean up some variable declarations in the powernv cpufreq
driver (Wei Yongjun).
- Fix up the ->enter_s2idle callback definition to cover the case
when it points to the same function as ->idle correctly (Neal
Liu).
- Rearrange and clean up the PSCI cpuidle driver (Ulf Hansson).
- Make the PM core emit "changed" uevent when adding/removing the
"wakeup" sysfs attribute of devices (Abhishek Pandit-Subedi).
- Add a helper macro for declaring PM callbacks and use it in the
MMC jz4740 driver (Paul Cercueil).
- Fix white space in some places in the hibernate code and make the
system-wide PM code use "const char *" where appropriate (Xiang
Chen, Alexey Dobriyan).
- Add one more "unsafe" helper macro to the freezer to cover the NFS
use case (He Zhe).
- Change the language in the generic PM domains framework to use
parent/child terminology and clean up a typo and some comment
fromatting in that code (Kees Cook, Geert Uytterhoeven).
- Update the operating performance points OPP framework (Lukasz
Luba, Andrew-sh.Cheng, Valdis Kletnieks):
* Refactor dev_pm_opp_of_register_em() and update related drivers.
* Add a missing function export.
* Allow disabled OPPs in dev_pm_opp_get_freq().
- Update devfreq core and drivers (Chanwoo Choi, Lukasz Luba, Enric
Balletbo i Serra, Dmitry Osipenko, Kieran Bingham, Marc Zyngier):
* Add support for delayed timers to the devfreq core and make the
Samsung exynos5422-dmc driver use it.
* Unify sysfs interface to use "df-" as a prefix in instance names
consistently.
* Fix devfreq_summary debugfs node indentation.
* Add the rockchip,pmu phandle to the rk3399_dmc driver DT
bindings.
* List Dmitry Osipenko as the Tegra devfreq driver maintainer.
* Fix typos in the core devfreq code.
- Update the pm-graph utility to version 5.7 including a number of
fixes related to suspend-to-idle (Todd Brandt).
- Fix coccicheck errors and warnings in the cpupower utility (Shuah
Khan).
- Replace HTTP links with HTTPs ones in multiple places (Alexander
A. Klimov).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl8oO24SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx7ZQP/0lQ0yABnASnwomdOH6+K/m7rvc+e9FE
zx5pTDQswhU5tM7SQAIKqe0uSI+okF2UrBrT5onA16F+JUbnrbexJLazBPfVTTGF
AKpKEQ7Wh69Wz+Y6cQZjm1dTuRL+dlBJuBrzR2tLSnONPMMHuFcO3xd7lgE9UAxC
oGEf393taA6OqcUNRQIa2gqbq+k1qhKjeDucGkbOaoJ6CL0ZyWI+Tfw1WWaBBGv0
/2wBd6V513OH8WtQCW6H3YpHmhYW6OwL8w19KyGcjPRGJaeaIP4W/Ng7mkvgL5ZB
vZqg3XiufFV9uTe8W1NQaVv/NjlN256OteuK809aosTVjD0dhFkhBYg5TLu6HbQq
C/NciZ+78oLedWLT73EUfw3NyS+V0jk6X2EIlBUwNi0Qw1B1pCifGOCKzWFFe5cr
ci4xr4FG7dBkxScOxwFAU2s5TdPHLOkGkQtg4jZr0OYDrzkyLEdsnZEUjLPORo+0
6EBXGfTOSy2CBHcYswRtzJr/1pUTzj7oejhTAMCCuYW2r3VyQtnYcVjlehtp20if
6BfmGisk8nmtxlSm+/Y2FqKa4bNnSTMmr0UJQ+Rjp0tHs47QeucI0ORfZ5nPaBac
+ptvIjWmn3xejT/+oAehpH9066Iuy66vzHdnj7x5+WAsmYS8n8OFtlBFkYELmLJB
3xI5hIl7WtGo
=8cUO
-----END PGP SIGNATURE-----
Merge tag 'pm-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The most significant change here is the extension of the Energy Model
to cover non-CPU devices (as well as CPUs) from Lukasz Luba.
There is also some new hardware support (Ice Lake server idle states
table for intel_idle, Sapphire Rapids and Power Limit 4 support in the
RAPL driver), some new functionality in the existing drivers (eg. a
new switch to disable/enable CPU energy-efficiency optimizations in
intel_pstate, delayed timers in devfreq), some assorted fixes (cpufreq
core, intel_pstate, intel_idle) and cleanups (eg. cpuidle-psci,
devfreq), including the elimination of W=1 build warnings from cpufreq
done by Lee Jones.
Specifics:
- Make the Energy Model cover non-CPU devices (Lukasz Luba).
- Add Ice Lake server idle states table to the intel_idle driver and
eliminate a redundant static variable from it (Chen Yu, Rafael
Wysocki).
- Eliminate all W=1 build warnings from cpufreq (Lee Jones).
- Add support for Sapphire Rapids and for Power Limit 4 to the Intel
RAPL power capping driver (Sumeet Pawnikar, Zhang Rui).
- Fix function name in kerneldoc comments in the idle_inject power
capping driver (Yangtao Li).
- Fix locking issues with cpufreq governors and drop a redundant
"weak" function definition from cpufreq (Viresh Kumar).
- Rearrange cpufreq to register non-modular governors at the
core_initcall level and allow the default cpufreq governor to be
specified in the kernel command line (Quentin Perret).
- Extend, fix and clean up the intel_pstate driver (Srinivas
Pandruvada, Rafael Wysocki):
* Add a new sysfs attribute for disabling/enabling CPU
energy-efficiency optimizations in the processor.
* Make the driver avoid enabling HWP if EPP is not supported.
* Allow the driver to handle numeric EPP values in the sysfs
interface and fix the setting of EPP via sysfs in the active
mode.
* Eliminate a static checker warning and clean up a kerneldoc
comment.
- Clean up some variable declarations in the powernv cpufreq driver
(Wei Yongjun).
- Fix up the ->enter_s2idle callback definition to cover the case
when it points to the same function as ->idle correctly (Neal Liu).
- Rearrange and clean up the PSCI cpuidle driver (Ulf Hansson).
- Make the PM core emit "changed" uevent when adding/removing the
"wakeup" sysfs attribute of devices (Abhishek Pandit-Subedi).
- Add a helper macro for declaring PM callbacks and use it in the MMC
jz4740 driver (Paul Cercueil).
- Fix white space in some places in the hibernate code and make the
system-wide PM code use "const char *" where appropriate (Xiang
Chen, Alexey Dobriyan).
- Add one more "unsafe" helper macro to the freezer to cover the NFS
use case (He Zhe).
- Change the language in the generic PM domains framework to use
parent/child terminology and clean up a typo and some comment
fromatting in that code (Kees Cook, Geert Uytterhoeven).
- Update the operating performance points OPP framework (Lukasz Luba,
Andrew-sh.Cheng, Valdis Kletnieks):
* Refactor dev_pm_opp_of_register_em() and update related drivers.
* Add a missing function export.
* Allow disabled OPPs in dev_pm_opp_get_freq().
- Update devfreq core and drivers (Chanwoo Choi, Lukasz Luba, Enric
Balletbo i Serra, Dmitry Osipenko, Kieran Bingham, Marc Zyngier):
* Add support for delayed timers to the devfreq core and make the
Samsung exynos5422-dmc driver use it.
* Unify sysfs interface to use "df-" as a prefix in instance
names consistently.
* Fix devfreq_summary debugfs node indentation.
* Add the rockchip,pmu phandle to the rk3399_dmc driver DT
bindings.
* List Dmitry Osipenko as the Tegra devfreq driver maintainer.
* Fix typos in the core devfreq code.
- Update the pm-graph utility to version 5.7 including a number of
fixes related to suspend-to-idle (Todd Brandt).
- Fix coccicheck errors and warnings in the cpupower utility (Shuah
Khan).
- Replace HTTP links with HTTPs ones in multiple places (Alexander A.
Klimov)"
* tag 'pm-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
cpuidle: ACPI: fix 'return' with no value build warning
cpufreq: intel_pstate: Fix EPP setting via sysfs in active mode
cpufreq: intel_pstate: Rearrange the storing of new EPP values
intel_idle: Customize IceLake server support
PM / devfreq: Fix the wrong end with semicolon
PM / devfreq: Fix indentaion of devfreq_summary debugfs node
PM / devfreq: Clean up the devfreq instance name in sysfs attr
memory: samsung: exynos5422-dmc: Add module param to control IRQ mode
memory: samsung: exynos5422-dmc: Adjust polling interval and uptreshold
memory: samsung: exynos5422-dmc: Use delayed timer as default
PM / devfreq: Add support delayed timer for polling mode
dt-bindings: devfreq: rk3399_dmc: Add rockchip,pmu phandle
PM / devfreq: tegra: Add Dmitry as a maintainer
PM / devfreq: event: Fix trivial spelling
PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent
cpuidle: change enter_s2idle() prototype
cpuidle: psci: Prevent domain idlestates until consumers are ready
cpuidle: psci: Convert PM domain to platform driver
cpuidle: psci: Fix error path via converting to a platform driver
cpuidle: psci: Fail cpuidle registration if set OSI mode failed
...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oRTgRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1huHQ//T2hZk5zlpOtojxvdAzsPgtV4tHawseK8
+ZZEbrH5qo5/ZMF18qyEJCm9p1yg8uIu71InULRCSgjU3v82GVCcuLXuE36U904G
gHUqkYPnqxCqx+Li125aye9tKWahXe1DxX+uWbV0Ju7fiCO0rwYIzpWn1bnR6ilp
fmLGSbgPlTVJwZ9mBvyi3VUlH5tDYidFN74TREUOwx2g5uhg+8uEo44Eb/bx8ESF
dGt1Z/fnfDHkUZtmhzJk5Uz8nbw7rPHU/EZ4iZAxEzxTutY5PhsvbIfLO4t4HhGn
utZCk/pIdiLLQ1GaTvFxqi3iolDqpOuXpnDlfEAJD8UlMCnwyh1Certq5LaRbtHS
8SW3/CeJgzqzrrsYhkxVu2PMFWriSMxgKTLiN0KnzJN0Hu7A5lHbBY/6G7zpsF/A
2KJ4e8lZiPCcNF7LteSRroUe4hNOYxZ2FlYTXm3AgycSL189UMfWlHFb5c+b4m1a
cNJpz+jAom8foXN4KhRkl5PFKXVXDGTVln3NRJCh1Mqd1Ef4hsTo9H6FgHX/EfHg
slJDwwPac80v0dzlMTSsMkyseaKRAqIObWOiknPt1wv/qja7ibVZ5mUbZ+/mfJX/
YWybcPi1omgUSNt7TNx6jtma67rUjmJW0x9g7UJ/ttEkf6yG2lemrdusydBYuIni
0Z2+hWzI9MM=
=X7o0
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups all around the place"
* tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioperm: Initialize pointer bitmap with NULL rather than 0
x86: uv: uv_hub.h: Delete duplicated word
x86: cmpxchg_32.h: Delete duplicated word
x86: bootparam.h: Delete duplicated word
x86/mm: Remove the unused mk_kernel_pgd() #define
x86/tsc: Remove unused "US_SCALE" and "NS_SCALE" leftover macros
x86/ioapic: Remove unused "IOAPIC_AUTO" define
x86/mm: Drop unused MAX_PHYSADDR_BITS
x86/msr: Move the F15h MSRs where they belong
x86/idt: Make idt_descr static
initrd: Remove erroneous comment
x86/mm/32: Fix -Wmissing prototypes warnings for init.c
cpu/speculation: Add prototype for cpu_show_srbds()
x86/mm: Fix -Wmissing-prototypes warnings for arch/x86/mm/init.c
x86/asm: Unify __ASSEMBLY__ blocks
x86/cpufeatures: Mark two free bits in word 3
x86/msr: Lift AMD family 0x15 power-specific MSRs
By default intel_pstate the driver disables energy efficiency by setting
MSR_IA32_POWER_CTL bit 19 for Kaby Lake desktop CPU model in HWP mode.
This CPU model is also shared by Coffee Lake desktop CPUs. This allows
these systems to reach maximum possible frequency. But this adds power
penalty, which some customers don't want. They want some way to enable/
disable dynamically.
So, add an additional attribute "energy_efficiency" under
/sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows
to read and write bit 19 ("Disable Energy Efficiency Optimization") in
the MSR IA32_POWER_CTL.
This attribute is present in both HWP and non-HWP mode as this has an
effect in both modes. Refer to Intel Software Developer's manual for
details.
The scope of this bit is package wide. Also these systems are single
package systems. So read/write MSR on the current CPU is enough.
The energy efficiency (EE) bit setting needs to be preserved during
suspend/resume and CPU offline/online operation. To do this:
- Restoring the EE setting from the cpufreq resume() callback, if there
is change from the system default.
- By default, don't disable EE from cpufreq init() callback for matching
CPU models. Since the scope is package wide and is a single package
system, move the disable EE calls from init() callback to
intel_pstate_init() function, which is called only once.
Suggested-by: Len Brown <lenb@kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
moved the three F15h power MSRs to the architectural list but that was
wrong as they belong in the family 0x15 list. That also caused:
In file included from trace/beauty/tracepoints/x86_msr.c:10:
perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
292 | [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
| ^~~~~~~~~~~
perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')
due to MSR_F15H_PTSC ending up being defined twice. Move them where they
belong and drop the duplicate.
Also, drop the respective tools/ changes of the msr-index.h copy the
above commit added because perf tool developers prefer to go through
those changes themselves in order to figure out whether changes to the
kernel headers would need additional handling in perf.
Fixes: 1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
... into the global msr-index.h header because they're used in multiple
compilation units. Sort the MSR list a bit. Update the msr-index.h copy
in tools.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20200608164847.14232-1-bp@alien8.de
Pull x86 srbds fixes from Thomas Gleixner:
"The 9th episode of the dime novel "The performance killer" with the
subtitle "Slow Randomizing Boosts Denial of Service".
SRBDS is an MDS-like speculative side channel that can leak bits from
the random number generator (RNG) across cores and threads. New
microcode serializes the processor access during the execution of
RDRAND and RDSEED. This ensures that the shared buffer is overwritten
before it is released for reuse. This is equivalent to a full bus
lock, which means that many threads running the RNG instructions in
parallel have the same effect as the same amount of threads issuing a
locked instruction targeting an address which requires locking of two
cachelines at once.
The mitigation support comes with the usual pile of unpleasant
ingredients:
- command line options
- sysfs file
- microcode checks
- a list of vulnerable CPUs identified by model and stepping this
time which requires stepping match support for the cpu match logic.
- the inevitable slowdown of affected CPUs"
* branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add Ivy Bridge to affected list
x86/speculation: Add SRBDS vulnerability and mitigation documentation
x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
x86/cpu: Add 'table' argument to cpu_matches()
This patch enables AMD Fam17h RAPL support for the Package level metric.
The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR.
The same output is available via the energy-pkg pseudo event:
$ perf stat -a -I 1000 --per-socket -e power/energy-pkg/
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-6-eranian@google.com
SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.
While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.
The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.
* Enable administrator to configure the mitigation off when desired using
either mitigations=off or srbds=off.
* Export vulnerability status via sysfs
* Rename file-scoped macros to apply for non-whitelist table initializations.
[ bp: Massage,
- s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
- do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
- flip check in cpu_set_bug_bits() to save an indentation level,
- reflow comments.
jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
tglx: Dropped the fused off magic for now
]
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
- Atomic operations (lock prefixed instructions) which span two cache
lines have to acquire the global bus lock. This is at least 1k cycles
slower than an atomic operation within a cache line and disrupts
performance on other cores. Aside of performance disruption this is
a unpriviledged form of DoS.
Some newer CPUs have the capability to raise an #AC trap when such an
operation is attempted. The detection is by default enabled in warning
mode which will warn once when a user space application is caught. A
command line option allows to disable the detection or to select fatal
mode which will terminate offending applications with SIGBUS.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/uMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocsAD/9yqpw+XlPKNPsfbm9sbirBDfTrENcL
F44iwn4WnrjoW/gnnZCYmPxJFsTtGVPqxHdUf4eyGemg9r9ZEO0DQftmUHC5Z6KX
aa/b5JoeM61wp9HlpVlD4D1jVt4pWyQODQeZnUXE4DEzmRc3cD/5lSU+/VeaIwwz
lxwUemqmXK7ucH2KA7smOGsl2nU6ED84q3mdOB1b4Cw+gWYMUnPJnuS/ipriBRx4
BYbMItcxsFvtdO9Hx8PvGd5LUK0wW8JOWrYQICD2kLpZtHtGeaHpBzFzL0+nMU7d
1epyDqJQDmX+PAzvj+EYyn3HTfobZlckn+tbxMQkkS+oDk1ywOZd+BancClvn5/5
jMfPIQJF5bGASVnzGMWhzVdwthTZiMG4d1iKsUWOA/hN0ch0+rm1BqraToabsEFg
Sv7/rvl9KtSOtMJTeAmMhlZUMBj9m8BtPFjniDwp6nw/upGgJdST5mrKFNYZvqOj
JnXsEMr/nJVW6bnUvT6LF66xbHlzHdxtodkQWqF+IEsyRaOz1zAGpQamP98KxNLc
dq/XYoEe1KqIFbg4BkNP+GeDL3FQDxjFNwPQnnjQEzWRbjkHlfmq1uKCsR2r8mBO
fYNJ1X8lTyGV0kx/ERpWGazzabpzh+8Lr1yMhnoA3EWvlzUjmpN2PFI4oTpTrtzT
c/q16SCxim3NWA==
=D9x8
-----END PGP SIGNATURE-----
Merge tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Thomas Gleixner:
"Support for 'split lock' detection:
Atomic operations (lock prefixed instructions) which span two cache
lines have to acquire the global bus lock. This is at least 1k cycles
slower than an atomic operation within a cache line and disrupts
performance on other cores. Aside of performance disruption this is a
unpriviledged form of DoS.
Some newer CPUs have the capability to raise an #AC trap when such an
operation is attempted. The detection is by default enabled in warning
mode which will warn once when a user space application is caught. A
command line option allows to disable the detection or to select fatal
mode which will terminate offending applications with SIGBUS"
* tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
x86/split_lock: Rework the initialization flow of split lock detection
x86/split_lock: Enable split lock detection by kernel
A split-lock occurs when an atomic instruction operates on data that spans
two cache lines. In order to maintain atomicity the core takes a global bus
lock.
This is typically >1000 cycles slower than an atomic operation within a
cache line. It also disrupts performance on other cores (which must wait
for the bus lock to be released before their memory operations can
complete). For real-time systems this may mean missing deadlines. For other
systems it may just be very annoying.
Some CPUs have the capability to raise an #AC trap when a split lock is
attempted.
Provide a command line option to give the user choices on how to handle
this:
split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock
On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it will
OOPs.
One implementation wrinkle is that the MSR to control the split lock
detection is per-core, not per thread. This might result in some short
lived races on HT systems in "warn" mode if Linux tries to enable on one
thread while disabling on the other. Race analysis by Sean Christopherson:
- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between off/warn/fatal modes at runtime isn't supported
and disabling is tracked per task, so hardware will always reach a steady
state that matches the configured mode. I.e. split-lock is guaranteed to
be enabled in hardware once all _TIF_SLD threads have been scheduled out.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
Commit
aaf248848d ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
performance counter")
added support for access to the free-running counter via 'perf -e
msr/irperf/', but when exercised, it always returns a 0 count:
BEFORE:
$ perf stat -e instructions,msr/irperf/ true
Performance counter stats for 'true':
624,833 instructions
0 msr/irperf/
Simply set its enable bit - HWCR bit 30 - to make it start counting.
Enablement is restricted to all machines advertising IRPERF capability,
except those susceptible to an erratum that makes the IRPERF return
bad values.
That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
models 20h and above [2].
AFTER (on a family 17h model 31h machine):
$ perf stat -e instructions,msr/irperf/ true
Performance counter stats for 'true':
621,690 instructions
622,490 msr/irperf/
[1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
[2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors
The revision guides are available from the bugzilla Link below.
[ bp: Massage commit message. ]
Fixes: aaf248848d ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXdtnVQAKCRCAXGG7T9hj
vg1hAQDqG1DKZvR6BtlvETFMz7ZlrXVkpm6C74Wy4bLiO5KSSAEAneFbrDwFVa0c
d05Z6wemjlyEd7u3gkVQBKfHkbWBRQQ=
=aDIL
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small series to remove the build constraint of Xen x86 MCE handling
to 64-bit only
- a bunch of minor cleanups
* tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix Kconfig indentation
xen/mcelog: also allow building for 32-bit kernels
xen/mcelog: add PPIN to record when available
xen/mcelog: drop __MC_MSR_MCGCAP
xen/gntdev: Use select for DMA_SHARED_BUFFER
xen: mm: make xen_mm_init static
xen: mm: include <xen/xen-ops.h> for missing declarations
This is to augment commit 3f5a7896a5 ("x86/mce: Include the PPIN in MCE
records when available").
I'm also adding "synd" and "ipid" fields to struct xen_mce, in an
attempt to keep field offsets in sync with struct mce. These two fields
won't get populated for now, though.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:
https://bugzilla.kernel.org/show_bug.cgi?id=205195
There are other processors affected for which the erratum does not fully
disclose the impact.
This issue affects both bare-metal x86 page tables and EPT.
It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.
Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.
Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microachitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.
This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. Scope of exposure is within the same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.
A new MSR IA32_TSX_CTRL in future and current processors after a
microcode update can be used to control the TSX feature. There are two
bits in that MSR:
* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).
* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.
The second mitigation approach is similar to MDS which is clearing the
affected CPU buffers on return to user space and when entering a guest.
Relevant microcode update is required for the mitigation to work. More
details on this approach can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.
[ bp:
- massage + comments cleanup
- s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
- remove partial TAA mitigation in update_mds_branch_idle() - Josh.
- s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Transactional Synchronization Extensions (TSX) may be used on certain
processors as part of a speculative side channel attack. A microcode
update for existing processors that are vulnerable to this attack will
add a new MSR - IA32_TSX_CTRL to allow the system administrator the
option to disable TSX as one of the possible mitigations.
The CPUs which get this new MSR after a microcode upgrade are the ones
which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
CPU buffers takes care of the TAA case as well.
[ Note that future processors that are not vulnerable will also
support the IA32_TSX_CTRL MSR. ]
Add defines for the new IA32_TSX_CTRL MSR and its bits.
TSX has two sub-features:
1. Restricted Transactional Memory (RTM) is an explicitly-used feature
where new instructions begin and end TSX transactions.
2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
"old" style locks are used by software.
Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
IA32_TSX_CTRL MSR.
There are two control bits in IA32_TSX_CTRL MSR:
Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
sub-feature of TSX (will force all transactions to abort on the
XBEGIN instruction).
Bit 1: When set, it disables the enumeration of the RTM and HLE feature
(i.e. it will make CPUID(EAX=7).EBX{bit4} and
CPUID(EAX=7).EBX{bit11} read as 0).
The other TSX sub-feature, Hardware Lock Elision (HLE), is
unconditionally disabled by the new microcode but still enumerated
as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Pull x86 cpu-feature updates from Ingo Molnar:
- Rework the Intel model names symbols/macros, which were decades of
ad-hoc extensions and added random noise. It's now a coherent, easy
to follow nomenclature.
- Add new Intel CPU model IDs:
- "Tiger Lake" desktop and mobile models
- "Elkhart Lake" model ID
- and the "Lightning Mountain" variant of Airmont, plus support code
- Add the new AVX512_VP2INTERSECT instruction to cpufeatures
- Remove Intel MPX user-visible APIs and the self-tests, because the
toolchain (gcc) is not supporting it going forward. This is the
first, lowest-risk phase of MPX removal.
- Remove X86_FEATURE_MFENCE_RDTSC
- Various smaller cleanups and fixes
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/cpu: Update init data for new Airmont CPU model
x86/cpu: Add new Airmont variant to Intel family
x86/cpu: Add Elkhart Lake to Intel family
x86/cpu: Add Tiger Lake to Intel family
x86: Correct misc typos
x86/intel: Add common OPTDIFFs
x86/intel: Aggregate microserver naming
x86/intel: Aggregate big core graphics naming
x86/intel: Aggregate big core mobile naming
x86/intel: Aggregate big core client naming
x86/cpufeature: Explain the macro duplication
x86/ftrace: Remove mcount() declaration
x86/PCI: Remove superfluous returns from void functions
x86/msr-index: Move AMD MSRs where they belong
x86/cpu: Use constant definitions for CPU models
lib: Remove redundant ftrace flag removal
x86/crash: Remove unnecessary comparison
x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
x86: Remove X86_FEATURE_MFENCE_RDTSC
x86/mpx: Remove MPX APIs
...
If PEBS declares ability to output its data to Intel PT stream, use the
aux_output attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.
The output setting is per-CPU, so all PEBS events must be either writing
to PT or to the DS area, therefore, in case of conflict, the conflicting
event will fail to schedule, allowing the rotation logic to alternate
between the PEBS->PT and PEBS->DS events.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: kan.liang@linux.intel.com
Link: https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shishkin@linux.intel.com
There have been reports of RDRAND issues after resuming from suspend on
some AMD family 15h and family 16h systems. This issue stems from a BIOS
not performing the proper steps during resume to ensure RDRAND continues
to function properly.
RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
support using CPUID, including the kernel, will believe that RDRAND is
not supported.
Update the CPU initialization to clear the RDRAND CPUID bit for any family
15h and 16h processor that supports RDRAND. If it is known that the family
15h or family 16h system does not have an RDRAND resume issue or that the
system will not be placed in suspend, the "rdrand=force" kernel parameter
can be used to stop the clearing of the RDRAND CPUID bit.
Additionally, update the suspend and resume path to save and restore the
MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
place after resuming from suspend.
Note, that clearing the RDRAND CPUID bit does not prevent a processor
that normally supports the RDRAND instruction from executing it. So any
code that determined the support based on family and model won't #UD.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
... sort them in and fixup comment, while at it.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190819070140.23708-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period specified by
the instruction or until the system time limit or until a store to the
monitored address range in umwait.
IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
the processor and to set the maximum time the processor can reside in C0.1
or C0.2.
By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.
Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
default. A quote from Andy:
"What I want to avoid is the case where it works dramatically differently
on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
behave a bit differently if the max timeout is hit, and I'd like that
path to get exercised widely by making it happen even on default
configs."
A sysfs interface to adjust the time and the C0.2 enablement is provided in
a follow up change.
[ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
mask throughout the code.
Massaged comments and changelog ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
* POWER: support for direct access to the POWER9 XIVE interrupt controller,
memory and performance optimizations.
* x86: support for accessing memory not backed by struct page, fixes and refactoring
* Generic: dirty page tracking improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJc3qV/AAoJEL/70l94x66Dn3QH/jX1Bn0P/RZAIt4w0SySklSg
PqxUKDyBQqB9vN9Qeb9jWXAKPH2CtM3+up/rz7oRnBWp7qA6vXcC/R/QJYAvzdXE
nklsR/oYCsflR1KdlVYuDvvPCPP2fLBU5zfN83OsaBQ8fNRkm3gN+N5XQ2SbXbLy
Mo9tybS4otY201UAC96e8N0ipwwyCRpDneQpLcl+F5nH3RBt63cVbs04O+70MXn7
eT4I+8K3+Go7LATzT8hglD21D/7uvE31qQb6yr5L33IfhU4GB51RZzBXTNaAdY8n
hT1rMrRkAMAFWYZPQDfoMadjWU3i5DIfstKjDxOr9oTfuOEp5Z+GvJwvVnUDg1I=
=D0+p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
Let guests clear the Intel PT ToPA PMI status (bit 55 of
MSR_CORE_PERF_GLOBAL_OVF_CTRL).
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Inject a PMI for KVM guest when Intel PT working
in Host-Guest mode and Guest ToPA entry memory buffer
was completely filled.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adaptive PEBS is a new way to report PEBS sampling information. Instead
of a fixed size record for all PEBS events it allows to configure the
PEBS record to only include the information needed. Events can then opt
in to use such an extended record, or stay with a basic record which
only contains the IP.
The major new feature is to support LBRs in PEBS record.
Besides normal LBR, this allows (much faster) large PEBS, while still
supporting callstacks through callstack LBR. So essentially a lot of
profiling can now be done without frequent interrupts, dropping the
overhead significantly.
The main requirement still is to use a period, and not use frequency
mode, because frequency mode requires reevaluating the frequency on each
overflow.
The floating point state (XMM) is also supported, which allows efficient
profiling of FP function arguments.
Introduce specific drain function to handle variable length records.
Use a new callback to parse the new record format, and also handle the
STATUS field now being at a different offset.
Add code to set up the configuration register. Since there is only a
single register, all events either get the full super set of all events,
or only the basic record.
Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
[ Renamed GPRS => GP. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Microarchitectural Data Sampling (MDS), is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:
- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.
MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.
MLDPS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.
All variants have the same mitigation for single CPU thread case (SMT off),
so the kernel can treat them as one MDS issue.
Add the basic infrastructure to detect if the current CPU is affected by
MDS.
[ tglx: Rewrote changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.
Clean it up.
[ Josh Poimboeuf: Fix tools build ]
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Skylake systems will receive a microcode update to address a TSX
errata. This microcode will (by default) clobber PMC3 when TSX
instructions are (speculatively or not) executed.
It also provides an MSR to cause all TSX transaction to abort and
preserve PMC3.
Add the CPUID enumeration and MSR definition.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
single-stepping fixes, improved tracing, various timer and vGIC
fixes
* x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
enlightenments (direct mode for synthetic timers)
* PPC: nested VFIO
* s390: bugfixes only this time
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
=b4Jx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- selftests improvements
- large PUD support for HugeTLB
- single-stepping fixes
- improved tracing
- various timer and vGIC fixes
x86:
- Processor Tracing virtualization
- STIBP support
- some correctness fixes
- refactorings and splitting of vmx.c
- use the Hyper-V range TLB flush hypercall
- reduce order of vcpu struct
- WBNOINVD support
- do not use -ftrace for __noclone functions
- nested guest support for PAUSE filtering on AMD
- more Hyper-V enlightenments (direct mode for synthetic timers)
PPC:
- nested VFIO
s390:
- bugfixes only this time"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: x86: Add CPUID support for new instruction WBNOINVD
kvm: selftests: ucall: fix exit mmio address guessing
Revert "compiler-gcc: disable -ftracer for __noclone functions"
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
KVM: Make kvm_set_spte_hva() return int
KVM: Replace old tlb flush function with new one to flush a specified range.
KVM/MMU: Add tlb flush with range helper function
KVM/VMX: Add hv tlb range flush support
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM: x86: Disable Intel PT when VMXON in L1 guest
KVM: x86: Set intercept for Intel PT MSRs read/write
KVM: x86: Implement Intel PT MSRs read/write emulation
...
Intel Processor Trace virtualization can be work in one
of 2 possible modes:
a. System-Wide mode (default):
When the host configures Intel PT to collect trace packets
of the entire system, it can leave the relevant VMX controls
clear to allow VMX-specific packets to provide information
across VMX transitions.
KVM guest will not aware this feature in this mode and both
host and KVM guest trace will output to host buffer.
b. Host-Guest mode:
Host can configure trace-packet generation while in
VMX non-root operation for guests and root operation
for native executing normally.
Intel PT will be exposed to KVM guest in this mode, and
the trace output to respective buffer of host and guest.
In this mode, tht status of PT will be saved and disabled
before VM-entry and restored after VM-exit if trace
a virtual machine.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add bit definitions for Intel PT MSRs to support trace output
directed to the memeory subsystem and holds a count if packet
bytes that have been sent out.
These are required by the upcoming PT support in KVM guests
for MSRs read/write emulation.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Intel Processor Trace (PT) MSR bit defines are in a private
header. The upcoming support for PT virtualization requires these defines
to be accessible from KVM code.
Move them to the global MSR header file.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some guests OSes (including Windows 10) write to MSR 0xc001102c
on some cases (possibly while trying to apply a CPU errata).
Make KVM ignore reads and writes to that MSR, so the guest won't
crash.
The MSR is documented as "Execution Unit Configuration (EX_CFG)",
at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
15h Models 00h-0Fh Processors".
Cc: stable@vger.kernel.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To avoid the overhead of STIBP always on, it's necessary to allow per task
control of STIBP.
Add a new task flag TIF_SPEC_IB and evaluate it during context switch if
SMT is active and flag evaluation is enabled by the speculation control
code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the
guest/host switch works properly.
This has no effect because TIF_SPEC_IB cannot be set yet and the static key
which controls evaluation is off. Preparatory patch for adding the control
code.
[ tglx: Simplify the context switch logic and make the TIF evaluation
depend on SMP=y and on the static key controlling the conditional
update. Rename it to TIF_SPEC_IB because it controls both STIBP and
IBPB ]
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de
Implements counter freezing for Arch Perfmon v4 (Skylake and
newer). This allows to speed up the PMI handler by avoiding
unnecessary MSR writes and make it more accurate.
The Arch Perfmon v4 PMI handler is substantially different than
the older PMI handler.
Differences to the old handler:
- It relies on counter freezing, which eliminates several MSR
writes from the PMI handler and lowers the overhead significantly.
It makes the PMI handler more accurate, as all counters get
frozen atomically as soon as any counter overflows. So there is
much less counting of the PMI handler itself.
With the freezing we don't need to disable or enable counters or
PEBS. Only BTS which does not support auto-freezing still needs to
be explicitly managed.
- The PMU acking is done at the end, not the beginning.
This makes it possible to avoid manual enabling/disabling
of the PMU, instead we just rely on the freezing/acking.
- The APIC is acked before reenabling the PMU, which avoids
problems with LBRs occasionally not getting unfreezed on Skylake.
- Looping is only needed to workaround a corner case which several PMIs
are very close to each other. For common cases, the counters are freezed
during PMI handler. It doesn't need to do re-check.
This patch:
- Adds code to enable v4 counter freezing
- Fork <=v3 and >=v4 PMI handlers into separate functions.
- Add kernel parameter to disable counter freezing. It took some time to
debug counter freezing, so in case there are new problems we added an
option to turn it off. Would not expect this to be used until there
are new bugs.
- Only for big core. The patch for small core will be posted later
separately.
Performance:
When profiling a kernel build on Kabylake with different perf options,
measuring the length of all NMI handlers using the nmi handler
trace point:
V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)
perf options ` V3(ns) V4(ns) delta
-c 100000 1088 894 -18%
-g -c 100000 1862 1646 -12%
--call-graph lbr -c 100000 3649 3367 -8%
--c.g. dwarf -c 100000 2248 1982 -12%
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed. Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
MSRs defined in the document.
The semantics of this MSR is to allow "finer granularity invalidation of
caching structures than existing mechanisms like WBINVD. It will writeback
and invalidate the L1 data cache, including all cachelines brought in by
preceding instructions, without invalidating all caches (eg. L2 or
LLC). Some processors may also invalidate the first level level instruction
cache on a L1D_FLUSH command. The L1 data and instruction caches may be
shared across the logical processors of a core."
Use it instead of the loop based L1 flush algorithm.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Avoid allocating pages when the MSR is available ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added in the
initial turbostat commit. The reversed definitions were presumably
copied from turbostat.c to this file.
Fixes: 9c63a650bb ("tools/power/x86/turbostat: share kernel MSR #defines")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence lets sync-up.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).
Hence changing it.
It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.
Also fixed the missing space in X86_FEATURE_AMD_SSBD.
[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.
Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.
If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.
Update the KVM related speculation control functions to take TID_RDS into
account as well.
Based on a patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Intel CPUs expose methods to:
- Detect whether RDS capability is available via CPUID.7.0.EDX[31],
- The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS.
- MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS.
With that in mind if spec_store_bypass_disable=[auto,on] is selected set at
boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it.
Note that this does not fix the KVM case where the SPEC_CTRL is exposed to
guests which can muck with it, see patch titled :
KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.
And for the firmware (IBRS to be set), see patch titled:
x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits
[ tglx: Distangled it from the intel implementation and kept the call order ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum related changes:
- Code simplifications and cleanups for RSB and retpolines.
- Make the indirect calls in KVM speculation safe.
- Whitelist CPUs which are known not to speculate from Meltdown and
prepare for the new CPUID flag which tells the kernel that a CPU is
not affected.
- A less rigorous variant of the module retpoline check which merily
warns when a non-retpoline protected module is loaded and reflects
that fact in the sysfs file.
- Prepare for Indirect Branch Prediction Barrier support.
- Prepare for exposure of the Speculation Control MSRs to guests, so
guest OSes which depend on those "features" can use them. Includes
a blacklist of the broken microcodes. The actual exposure of the
MSRs through KVM is still being worked on"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify indirect_branch_prediction_barrier()
x86/retpoline: Simplify vmexit_fill_RSB()
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
x86/cpu/bugs: Make retpoline module warning conditional
x86/bugs: Drop one "mitigation" from dmesg
x86/nospec: Fix header guards names
x86/alternative: Print unadorned pointers
x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
x86/msr: Add definitions for new speculation control MSRs
x86/cpufeatures: Add AMD feature bits for Speculation Control
x86/cpufeatures: Add Intel feature bits for Speculation Control
x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
module/retpoline: Warn about missing retpoline in module
KVM: VMX: Make indirect call speculation safe
KVM: x86: Make indirect calls in emulator speculation safe
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC. However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully. If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE. This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not
have this MSR. For these families, the LFENCE instruction is already
serializing.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature. SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.
Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).
This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.
After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so. This allows
to distinguish between SME and SEV, each of which have unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with
SEV) or EFI tables (always access decrypted with SME).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kvm@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-13-brijesh.singh@amd.com
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature. SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG). Only show the SME feature as available if reported by
CPUID, enabled by BIOS and not configured as CONFIG_X86_32=y.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/85c17ff450721abccddc95e611ae8df3f4d9718b.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements
There is a small conflict in arch/s390 due to an arch-wide field rename.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
=OD7H
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"PPC:
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
Update my email address
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
kvm: x86: mmu: allow A/D bits to be disabled in an mmu
x86: kvm: mmu: make spte mmio mask more explicit
x86: kvm: mmu: dead code thanks to access tracking
KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
KVM: x86: remove ignored type attribute
KVM: LAPIC: Fix lapic timer injection delay
KVM: lapic: reorganize restart_apic_timer
KVM: lapic: reorganize start_hv_timer
kvm: nVMX: Check memory operand to INVVPID
KVM: s390: Inject machine check into the nested guest
KVM: s390: Inject machine check into the guest
tools/kvm_stat: add new interactive command 'b'
tools/kvm_stat: add new command line switch '-i'
tools/kvm_stat: fix error on interactive command 'g'
KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
...
- Rework suspend-to-idle to allow it to take wakeup events signaled
by the EC into account on ACPI-based platforms in order to properly
support power button wakeup from suspend-to-idle on recent Dell
laptops (Rafael Wysocki).
That includes the core suspend-to-idle code rework, support for
the Low Power S0 _DSM interface, and support for the ACPI INT0002
Virtual GPIO device from Hans de Goede (required for USB keyboard
wakeup from suspend-to-idle to work on some machines).
- Stop trying to export the current CPU frequency via /proc/cpuinfo
on x86 as that is inaccurate and confusing (Len Brown).
- Rework the way in which the current CPU frequency is exported by
the kernel (over the cpufreq sysfs interface) on x86 systems with
the APERF and MPERF registers by always using values read from
these registers, when available, to compute the current frequency
regardless of which cpufreq driver is in use (Len Brown).
- Rework the PCI/ACPI device wakeup infrastructure to remove the
questionable and artificial distinction between "devices that
can wake up the system from sleep states" and "devices that can
generate wakeup signals in the working state" from it, which
allows the code to be simplified quite a bit (Rafael Wysocki).
- Fix the wakeup IRQ framework by making it use SRCU instead of
RCU which doesn't allow sleeping in the read-side critical
sections, but which in turn is expected to be allowed by the
IRQ bus locking infrastructure (Thomas Gleixner).
- Modify some computations in the intel_pstate driver to avoid
rounding errors resulting from them (Srinivas Pandruvada).
- Reduce the overhead of the intel_pstate driver in the HWP
(hardware-managed P-states) mode and when the "performance"
P-state selection algorithm is in use by making it avoid
registering scheduler callbacks in those cases (Len Brown).
- Rework the energy_performance_preference sysfs knob in
intel_pstate by changing the values that correspond to
different symbolic hint names used by it (Len Brown).
- Make it possible to use more than one cpuidle driver at the same
time on ARM (Daniel Lezcano).
- Make it possible to prevent the cpuidle menu governor from using
the 0 state by disabling it via sysfs (Nicholas Piggin).
- Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1
on AMD systems (Yazen Ghannam).
- Make the CPPC cpufreq driver take the lowest nonlinear performance
information into account (Prashanth Prakash).
- Add support for hi3660 to the cpufreq-dt driver, fix the
imx6q driver and clean up the sfi, exynos5440 and intel_pstate
drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila,
Rafael Wysocki, Tao Wang).
- Fix a few minor issues in the generic power domains (genpd)
framework and clean it up somewhat (Krzysztof Kozlowski,
Mikko Perttunen, Viresh Kumar).
- Fix a couple of minor issues in the operating performance points
(OPP) framework and clean it up somewhat (Viresh Kumar).
- Fix a CONFIG dependency in the hibernation core and clean it up
slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
- Add rk3228 support to the rockchip-io adaptive voltage scaling
(AVS) driver (David Wu).
- Fix an incorrect bit shift operation in the RAPL power capping
driver (Adam Lessnau).
- Add support for the EPP field in the HWP (hardware managed
P-states) control register, HWP.EPP, to the x86_energy_perf_policy
tool and update msr-index.h with HWP.EPP values (Len Brown).
- Fix some minor issues in the turbostat tool (Len Brown).
- Add support for AMD family 0x17 CPUs to the cpupower tool and fix
a minor issue in it (Sherry Hurwitz).
- Assorted cleanups, mostly related to the constification of some
data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
Kozlowski).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZWrICAAoJEILEb/54YlRxZYMQAIRhfbyDxKq+ByvSilUS8kTA
AItwJ8FFzykhiwN75Cqabg4rAGyWma7IRs1vzU7zeC1aEQIn+bTQtvk+utZNI+g2
ANFlDha20q/sXsP/CDMMTIAdW9tSOC0TOvFI9s2V2Y8dJZhoekO4ctx34FAfUS5d
Ao6rwSAWCMsCXcGaTAlqTA+TEJmBG7u6Iq6hq6ngltoFwOv3mWWBVn52VVaJ7SMp
9/IPbbLGMFAedrgEBRGCR+MME1xZZpvcZIJaTt1Mgn7Cx3cJaysIUAvqY/SsvFGq
5FcUTcF2qpK3+AGawiAxZIjvOBsGRtIwqKinNIzYWs/NjiIdzmgVAmTeuPtTqp+5
HFehUdtkFcnuDnLqSNzAaZUa7tw84cJkwnbVMnesx0MkG6rZ1SeL22E2Sabpcdsh
3Yo1ThzJSxi59DhiiE92EQnNCEjmCldRy+8q5Ag035muxl6EJYvuNBMnZv/BMCUn
ltSNOrmps1DlN+Col8ORIeNzQ1YjYzWMqKAYzSbyccm4ug/iSHx0/DuESmQ4GTlF
YCwkmqyWiHrBwpl51jc+4a7SGlMmKRqU+MJes0CjagaaqoUAb8qeBOpzEJ0yNwjZ
wtI41l6blE6kbMD3yqGdCfiB2S7GlPVoxa15eX1wRyLH3fLjwwrzJirEaiBS86tI
1PzHZEOmBlh3DYC6DBKA
=Wsph
-----END PGP SIGNATURE-----
Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The big ticket items here are the rework of suspend-to-idle in order
to add proper support for power button wakeup from it on recent Dell
laptops and the rework of interfaces exporting the current CPU
frequency on x86.
In addition to that, support for a few new pieces of hardware is
added, the PCI/ACPI device wakeup infrastructure is simplified
significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
bus locking infrastructure.
Also, there are some functional improvements for intel_pstate, tools
updates and small fixes and cleanups all over.
Specifics:
- Rework suspend-to-idle to allow it to take wakeup events signaled
by the EC into account on ACPI-based platforms in order to properly
support power button wakeup from suspend-to-idle on recent Dell
laptops (Rafael Wysocki).
That includes the core suspend-to-idle code rework, support for the
Low Power S0 _DSM interface, and support for the ACPI INT0002
Virtual GPIO device from Hans de Goede (required for USB keyboard
wakeup from suspend-to-idle to work on some machines).
- Stop trying to export the current CPU frequency via /proc/cpuinfo
on x86 as that is inaccurate and confusing (Len Brown).
- Rework the way in which the current CPU frequency is exported by
the kernel (over the cpufreq sysfs interface) on x86 systems with
the APERF and MPERF registers by always using values read from
these registers, when available, to compute the current frequency
regardless of which cpufreq driver is in use (Len Brown).
- Rework the PCI/ACPI device wakeup infrastructure to remove the
questionable and artificial distinction between "devices that can
wake up the system from sleep states" and "devices that can
generate wakeup signals in the working state" from it, which allows
the code to be simplified quite a bit (Rafael Wysocki).
- Fix the wakeup IRQ framework by making it use SRCU instead of RCU
which doesn't allow sleeping in the read-side critical sections,
but which in turn is expected to be allowed by the IRQ bus locking
infrastructure (Thomas Gleixner).
- Modify some computations in the intel_pstate driver to avoid
rounding errors resulting from them (Srinivas Pandruvada).
- Reduce the overhead of the intel_pstate driver in the HWP
(hardware-managed P-states) mode and when the "performance" P-state
selection algorithm is in use by making it avoid registering
scheduler callbacks in those cases (Len Brown).
- Rework the energy_performance_preference sysfs knob in intel_pstate
by changing the values that correspond to different symbolic hint
names used by it (Len Brown).
- Make it possible to use more than one cpuidle driver at the same
time on ARM (Daniel Lezcano).
- Make it possible to prevent the cpuidle menu governor from using
the 0 state by disabling it via sysfs (Nicholas Piggin).
- Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
AMD systems (Yazen Ghannam).
- Make the CPPC cpufreq driver take the lowest nonlinear performance
information into account (Prashanth Prakash).
- Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
driver and clean up the sfi, exynos5440 and intel_pstate drivers
(Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
Wysocki, Tao Wang).
- Fix a few minor issues in the generic power domains (genpd)
framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
Perttunen, Viresh Kumar).
- Fix a couple of minor issues in the operating performance points
(OPP) framework and clean it up somewhat (Viresh Kumar).
- Fix a CONFIG dependency in the hibernation core and clean it up
slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
- Add rk3228 support to the rockchip-io adaptive voltage scaling
(AVS) driver (David Wu).
- Fix an incorrect bit shift operation in the RAPL power capping
driver (Adam Lessnau).
- Add support for the EPP field in the HWP (hardware managed
P-states) control register, HWP.EPP, to the x86_energy_perf_policy
tool and update msr-index.h with HWP.EPP values (Len Brown).
- Fix some minor issues in the turbostat tool (Len Brown).
- Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
minor issue in it (Sherry Hurwitz).
- Assorted cleanups, mostly related to the constification of some
data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
Kozlowski)"
* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
cpufreq: Update scaling_cur_freq documentation
cpufreq: intel_pstate: Clean up after performance governor changes
PM: hibernate: constify attribute_group structures.
cpuidle: menu: allow state 0 to be disabled
intel_idle: Use more common logging style
PM / Domains: Fix missing default_power_down_ok comment
PM / Domains: Fix unsafe iteration over modified list of domains
PM / Domains: Fix unsafe iteration over modified list of domain providers
PM / Domains: Fix unsafe iteration over modified list of device links
PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
PM / Domains: Call driver's noirq callbacks
PM / core: Drop run_wake flag from struct dev_pm_info
PCI / PM: Simplify device wakeup settings code
PCI / PM: Drop pme_interrupt flag from struct pci_dev
ACPI / PM: Consolidate device wakeup settings code
ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
PM / QoS: constify *_attribute_group.
PM / AVS: rockchip-io: add io selectors and supplies for rk3228
powercap/RAPL: prevent overridding bits outside of the mask
PM / sysfs: Constify attribute groups
...
Bits 11:2 must be zero and the linear addess in bits 63:12 must be
canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP.
Fixes: 0dd376e709 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Currently, the SMIs are visible to all performance counters, because
many users want to measure everything including SMIs. But in some
cases, the SMI cycles should not be counted - for example, to calculate
the cost of an SMI itself. So a knob is needed.
When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
counters will be effected. There is no way to do per-counter freeze
on SMI. So it should not use the per-event interface (e.g. ioctl or
event attribute) to set FREEZE_WHILE_SMM bit.
Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
while in SMM.
Value has to be 0 or 1. It will be applied to all processors.
Also serialize the entire setting so we don't get multiple concurrent
threads trying to update to different values.
Signed-off-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: bp@alien8.de
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
x = 1
ulong_long = x << 32;
results in:
warning: left shift count >= width of type
x = 8
ulong_long = x << 24;
results in a sign extended ulong_long
Cast x to unsigned long long in these macros
to prevent these errors.
Signed-off-by: Len Brown <len.brown@intel.com>
The Hardware Performance State request MSR has a field
to express the "Energy Performance Preference" (HWP.EPP).
Decode that field so the definition may be shared by
by the intel_pstate driver and any utilities that
decode the same register.
Signed-off-by: Len Brown <len.brown@intel.com>
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.
When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991
Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
is enabled (and thus the CPUID instruction is not available) or 1 if
CPUID faulting is not enabled.
ARCH_SET_CPUID: Set the CPUID state to the second argument. If
cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
be deactivated. Returns ENODEV if CPUID faulting is not supported on
this system.
The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. This will allow a ptracer to emulate the CPUID
instruction.
Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991
Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-8-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This matches the only public Intel documentation of this MSR, in the
"Virtualization Technology FlexMigration Application Note"
(preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991)
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-2-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The debug control MSR is "highly magical" as the blockstep bit can be
cleared by hardware under not well documented circumstances.
So a task switch relying on the bit set by the previous task (according to
the previous tasks thread flags) can trip over this and not update the flag
for the next task.
To fix this its required to handle DEBUGCTLMSR_BTF when either the previous
or the next or both tasks have the TIF_BLOCKSTEP flag set.
While at it avoid branching within the TIF_BLOCKSTEP case and evaluating
boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR.
x86_64: arch/x86/kernel/process.o
text data bss dec hex
3024 8577 16 11617 2d61 Before
3008 8577 16 11601 2d51 After
i386: No change
[ tglx: Made the shift value explicit, use a local variable to make the
code readable and massaged changelog]
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20170214081104.9244-3-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull changes related to turbostat for v4.11 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (44 commits)
tools/power turbostat: version 17.02.24
tools/power turbostat: bugfix: --add u32 was printed as u64
tools/power turbostat: show error on exec
tools/power turbostat: dump p-state software config
tools/power turbostat: show package number, even without --debug
tools/power turbostat: support "--hide C1" etc.
tools/power turbostat: move --Package and --processor into the --cpu option
tools/power turbostat: turbostat.8 update
tools/power turbostat: update --list feature
tools/power turbostat: use wide columns to display large numbers
tools/power turbostat: Add --list option to show available header names
tools/power turbostat: fix zero IRQ count shown in one-shot command mode
tools/power turbostat: add --cpu parameter
tools/power turbostat: print sysfs C-state stats
tools/power turbostat: extend --add option to accept /sys path
tools/power turbostat: skip unused counters on BDX
tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
tools/power turbostat: skip unused counters on SKX
tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
tools/power turbostat: initial Gemini Lake SOC support
...
This non-architectural MSR has disable bits
for various prefetchers on modern processors.
While these bits are generally touched only by the BIOS,
say, via BIOS SETUP, it is useful to dump them
when examining options that can alter performance.
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
The Baytrail SOC, with its Silvermont core, has some unique properties:
1. a hardware CC1 residency counter
2. a module-c6 residency counter
3. a package-c6 counter at traditional package-c7 counter address.
The SOC does not support c3, pc3, c7 or pc7 counters.
Signed-off-by: Len Brown <len.brown@intel.com>
The two users, intel_idle driver and turbostat utility
are using the new name, MSR_PKG_CST_CONFIG_CONTROL
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
define MSR_PKG_CST_CONFIG_CONTROL (0xE2),
which is the string used by Intel Documentation.
We use this MSR in intel_idle and turbostat by a previous name,
to be updated in the next patch.
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Define new MSR MISC_FEATURE_ENABLES (0x140).
On supported CPUs if bit 1 of this MSR is set, then calling MONITOR and
MWAIT instructions outside of ring 0 will not cause invalid-opcode
exception.
The MSR MISC_FEATURE_ENABLES is not yet documented in the SDM. Here is the
relevant documentation:
Hex Dec Name Scope
140H 320 MISC_FEATURE_ENABLES Thread
0 Reserved
1 If set to 1, the MONITOR and MWAIT instructions do not
cause invalid-opcode exceptions when executed with CPL > 0
or in virtual-8086 mode. If MWAIT is executed when CPL > 0
or in virtual-8086 mode, and if EAX indicates a C-state
other than C0 or C1, the instruction operates as if EAX
indicated the C-state C1.
63:2 Reserved
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-2-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Intel Xeons from Ivy Bridge onwards support a processor identification
number set in the factory. To the user this is a handy unique number to
identify a particular CPU. Intel can decode this to the fab/production
run to track errors. On systems that have it, include it in the machine
check record. I'm told that this would be helpful for users that run
large data centers with multi-socket servers to keep track of which CPUs
are seeing errors.
Boris:
* Add some clarifying comments and spacing.
* Mask out [63:2] in the disabled-but-not-locked case
* Call the MSR variable "val" for more readability.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove MSR_NHM_TURBO_RATIO_LIMIT and MSR_IVT_TURBO_RATIO_LIMIT as
they are duplicate.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Nothing outside of the Intel PT driver should ever care about its MSR
bits, so there is no reason to keep them in msr-index.h. This patch
moves them to a pt-local header.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1461771888-10409-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add Skylake client support for RAPL domains. In addition to RAPL domains
in Broadwell clients, it has support for platform domain (aka PSys). The
PSys domain controls the entire SoC instead of just a CPU package. Unlike
package domain, PSys support requires more than just processor level
implementation. The other parts in the system need additional HW level
signaling, which OEMs need to support. When not supported, the energy
counter register in PSys domain returns 0.
Also corrected error in comment for GPU counter, which previously was
DRAM counter.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com
[ Cnverted to model_match stuff. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jacob.jun.pan@linux.intel.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter
which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a 100
MHz rate in all P-states, and C states, S0, or S1. The frequency is
about 100MHz. This counter will be used to calculate processor power
and other parts. So add an interface into the MSR PMU to get the PTSC
counter value.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The list of CPU model specific registers contains two copies of TDP
registers, remove the one, which is out of numerical order in the
list.
Fixes: 6a35fc2d6c ("cpufreq: intel_pstate: get P1 from TAR when available")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Kristen Carlson
Accardi <kristen@linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- Redesign of cpufreq governors and the intel_pstate driver to
make them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers
for that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it
more straightforward and fix some concurrency problems in it
(Rafael Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve
its handling of voltage regulators and device clocks and updates
of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization
and cleanup problems in it and correct its worker thread handling
with respect to CPU offline, new powernv_throttle tracepoint
(Shilpasri Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced
by previos changes in the ACPICA code (Bob Moore, Lv Zheng,
David Box, Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers)
and ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as
a valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls made,
fixes related to Xeon x200 processors, compiler warning fixes) and
cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK
QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+
nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V
JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX
bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb
tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P
fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO
UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ
Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es
mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq
alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN
WB+itPc4Fw0YHOrAFsrx
=cfty
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the majority of changes go into cpufreq and they are
significant.
First off, the way CPU frequency updates are triggered is different
now. Instead of having to set up and manage a deferrable timer for
each CPU in the system to evaluate and possibly change its frequency
periodically, cpufreq governors set up callbacks to be invoked by the
scheduler on a regular basis (basically on utilization updates). The
"old" governors, "ondemand" and "conservative", still do all of their
work in process context (although that is triggered by the scheduler
now), but intel_pstate does it all in the callback invoked by the
scheduler with no need for any additional asynchronous processing.
Of course, this eliminates the overhead related to the management of
all those timers, but also it allows the cpufreq governor code to be
simplified quite a bit. On top of that, the common code and data
structures used by the "ondemand" and "conservative" governors are
cleaned up and made more straightforward and some long-standing and
quite annoying problems are addressed. In particular, the handling of
governor sysfs attributes is modified and the related locking becomes
more fine grained which allows some concurrency problems to be avoided
(particularly deadlocks with the core cpufreq code).
In principle, the new mechanism for triggering frequency updates
allows utilization information to be passed from the scheduler to
cpufreq. Although the current code doesn't make use of it, in the
works is a new cpufreq governor that will make decisions based on the
scheduler's utilization data. That should allow the scheduler and
cpufreq to work more closely together in the long run.
In addition to the core and governor changes, cpufreq drivers are
updated too. Fixes and optimizations go into intel_pstate, the
cpufreq-dt driver is updated on top of some modification in the
Operating Performance Points (OPP) framework and there are fixes and
other updates in the powernv cpufreq driver.
Apart from the cpufreq updates there is some new ACPICA material,
including a fix for a problem introduced by previous ACPICA updates,
and some less significant changes in the ACPI code, like CPPC code
optimizations, ACPI processor driver cleanups and support for loading
ACPI tables from initrd.
Also updated are the generic power domains framework, the Intel RAPL
power capping driver and the turbostat utility and we have a bunch of
traditional assorted fixes and cleanups.
Specifics:
- Redesign of cpufreq governors and the intel_pstate driver to make
them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers for
that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it more
straightforward and fix some concurrency problems in it (Rafael
Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve its
handling of voltage regulators and device clocks and updates of the
cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization and
cleanup problems in it and correct its worker thread handling with
respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced by
previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers) and
ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as a
valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls
made, fixes related to Xeon x200 processors, compiler warning
fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"
* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
ACPI / APEI: ERST: Fixed leaked resources in erst_init
ACPI / APEI: Fix leaked resources
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
...
Pull turbostat updates for 4.6 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
tools/power turbostat: Decode MSR_MISC_PWR_MGMT
tools/power turbostat: decode HWP registers
x86 msr-index: Simplify syntax for HWP fields
tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
tools/power turbostat: decode more CPUID fields
In order to keep this file's size sensible and not cause too much
unnecessary churn, make the rule explicit - similar to pci_ids.h - that
only MSRs which are used in multiple compilation units, should get added
to it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: alex.williamson@redhat.com
Cc: gleb@kernel.org
Cc: joro@8bytes.org
Cc: kvm@vger.kernel.org
Cc: sherry.hurwitz@amd.com
Cc: wei@redhat.com
Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it
checks whether the way access filter is enabled on some F15h
models, and, if so, disables it.
kvm doesn't handle that MSR access and complains about it, which
can get really noisy in dmesg when one starts kvm guests all the
time for testing. And it is useless anyway - guest kernel
shouldn't be doing such changes anyway so tell it that that
filter is disabled.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
MSR_NHM_PLATFORM_INFO has been replaced by...
MSR_PLATFORM_INFO
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
activation ratio (TAR), determines the max for the current level of TDP.
The ratio in platform info is physical max. The TAR MSR can be locked,
so updating this value is not possible on all platforms.
This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if
available,
but also do some sanity checking to make sure that this value is
correct.
The sanity check involves reading the TDP ratio for the current tdp
control value when platform has configurable TDP present and matching
TAC
with this.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull perf fixes from Thomas Gleixner:
"Another pile of fixes for perf:
- Plug overflows and races in the core code
- Sanitize the flow of the perf syscall so we error out before
handling the more complex and hard to undo setups
- Improve and fix Broadwell and Skylake hardware support
- Revert a fix which broke what it tried to fix in perf tools
- A couple of smaller fixes in various places of perf tools"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix copying of /proc/kcore
perf intel-pt: Remove no_force_psb from documentation
perf probe: Use existing routine to look for a kernel module by dso->short_name
perf/x86: Change test_aperfmperf() and test_intel() to static
tools lib traceevent: Fix string handling in heterogeneous arch environments
perf record: Avoid infinite loop at buildid processing with no samples
perf: Fix races in computing the header sizes
perf: Fix u16 overflows
perf: Restructure perf syscall point of no return
perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask
perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific
perf tools: Bool functions shouldn't return -1
tools build: Add test for presence of __get_cpuid() gcc builtin
tools build: Add test for presence of numa_num_possible_cpus() in libnuma
Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum"
perf stat: Fix per-pkg event reporting bug
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.
Reported-by: M A Young <m.a.young@durham.ac.uk>
Cc: stable@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile
frontend problems (like ITLB or decoding issues).
The new event is configured through a separate MSR, which selects
a range of sub events.
Define the extra MSR as a extra reg and export support for it
through sysfs. To avoid duplicating the existing
tables use a new function to add new entries to existing tables.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- ACPICA update to upstream revision 20150818 including method
tracing extensions to allow more in-depth AML debugging in the
kernel and a number of assorted fixes and cleanups (Bob Moore,
Lv Zheng, Markus Elfring).
- ACPI sysfs code updates and a documentation update related to
AML method tracing (Lv Zheng).
- ACPI EC driver fix related to serialized evaluations of _Qxx
methods and ACPI tools updates allowing the EC userspace tool
to be built from the kernel source (Lv Zheng).
- ACPI processor driver updates preparing it for future
introduction of CPPC support and ACPI PCC mailbox driver
updates (Ashwin Chaugule).
- ACPI interrupts enumeration fix for a regression related
to the handling of IRQ attribute conflicts between MADT
and the ACPI namespace (Jiang Liu).
- Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar).
- ACPI device registration code reorganization to separate the
sysfs-related code and bus type operations from the rest (Rafael
J Wysocki).
- Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
- ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups
(Pan Xinhui, Rafael J Wysocki).
- cpufreq core cleanups on top of the previous changes allowing it
to preseve its sysfs directories over system suspend/resume (Viresh
Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
- cpufreq fixes and cleanups related to governors (Viresh Kumar).
- cpufreq updates (core and the cpufreq-dt driver) related to the
turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
- New DT bindings for Operating Performance Points (OPP), support
for them in the OPP framework and in the cpufreq-dt driver plus
related OPP framework fixes and cleanups (Viresh Kumar).
- cpufreq powernv driver updates (Shilpasri G Bhat).
- New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
- Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
- intel_pstate driver updates including Skylake-S support, support
for enabling HW P-states per CPU and an additional vendor bypass
list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
- cpuidle core fixes related to the handling of coupled idle states
(Xunlei Pang).
- intel_idle driver updates including Skylake Client support and
support for freeze-mode-specific idle states (Len Brown).
- Driver core updates related to power management (Andy Shevchenko,
Rafael J Wysocki).
- Generic power domains framework fixes and cleanups (Jon Hunter,
Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
- Device PM QoS framework update to allow the latency tolerance
setting to be exposed to user space via sysfs (Mika Westerberg).
- devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
- System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
- rockchip-io AVS support updates (Heiko Stuebner).
- PM core clocks support fixup (Colin Ian King).
- Power capping RAPL driver update including support for Skylake H/S
and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
- Generic device properties framework fixes related to the handling
of static (driver-provided) property sets (Andy Shevchenko).
- turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
Shreyas B Prabhu).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m
ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC
R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/
fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X
9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O
1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ
walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/
fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6
H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q
P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA
gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi
I3rfxlXoc/5xJWCgNB8f
=fTgI
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"From the number of commits perspective, the biggest items are ACPICA
and cpufreq changes with the latter taking the lead (over 50 commits).
On the cpufreq front, there are many cleanups and minor fixes in the
core and governors, driver updates etc. We also have a new cpufreq
driver for Mediatek MT8173 chips.
ACPICA mostly updates its debug infrastructure and adds a number of
fixes and cleanups for a good measure.
The Operating Performance Points (OPP) framework is updated with new
DT bindings and support for them among other things.
We have a few updates of the generic power domains framework and a
reorganization of the ACPI device enumeration code and bus type
operations.
And a lot of fixes and cleanups all over.
Included is one branch from the MFD tree as it contains some
PM-related driver core and ACPI PM changes a few other commits are
based on.
Specifics:
- ACPICA update to upstream revision 20150818 including method
tracing extensions to allow more in-depth AML debugging in the
kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
Zheng, Markus Elfring).
- ACPI sysfs code updates and a documentation update related to AML
method tracing (Lv Zheng).
- ACPI EC driver fix related to serialized evaluations of _Qxx
methods and ACPI tools updates allowing the EC userspace tool to be
built from the kernel source (Lv Zheng).
- ACPI processor driver updates preparing it for future introduction
of CPPC support and ACPI PCC mailbox driver updates (Ashwin
Chaugule).
- ACPI interrupts enumeration fix for a regression related to the
handling of IRQ attribute conflicts between MADT and the ACPI
namespace (Jiang Liu).
- Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
Kasagar).
- ACPI device registration code reorganization to separate the
sysfs-related code and bus type operations from the rest (Rafael J
Wysocki).
- Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
- ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
Xinhui, Rafael J Wysocki).
- cpufreq core cleanups on top of the previous changes allowing it to
preseve its sysfs directories over system suspend/resume (Viresh
Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
- cpufreq fixes and cleanups related to governors (Viresh Kumar).
- cpufreq updates (core and the cpufreq-dt driver) related to the
turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
- New DT bindings for Operating Performance Points (OPP), support for
them in the OPP framework and in the cpufreq-dt driver plus related
OPP framework fixes and cleanups (Viresh Kumar).
- cpufreq powernv driver updates (Shilpasri G Bhat).
- New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
- Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
- intel_pstate driver updates including Skylake-S support, support
for enabling HW P-states per CPU and an additional vendor bypass
list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
- cpuidle core fixes related to the handling of coupled idle states
(Xunlei Pang).
- intel_idle driver updates including Skylake Client support and
support for freeze-mode-specific idle states (Len Brown).
- Driver core updates related to power management (Andy Shevchenko,
Rafael J Wysocki).
- Generic power domains framework fixes and cleanups (Jon Hunter,
Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
- Device PM QoS framework update to allow the latency tolerance
setting to be exposed to user space via sysfs (Mika Westerberg).
- devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
- System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
- rockchip-io AVS support updates (Heiko Stuebner).
- PM core clocks support fixup (Colin Ian King).
- Power capping RAPL driver update including support for Skylake H/S
and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
- Generic device properties framework fixes related to the handling
of static (driver-provided) property sets (Andy Shevchenko).
- turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
Shreyas B Prabhu)"
* tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
cpufreq: speedstep-lib: Use monotonic clock
cpufreq: powernv: Increase the verbosity of OCC console messages
cpufreq: sfi: use kmemdup rather than duplicating its implementation
cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
cpufreq: remove redundant 'policy' field from user_policy
cpufreq: remove redundant 'governor' field from user_policy
cpufreq: update user_policy.* on success
cpufreq: use memcpy() to copy policy
cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
cpufreq: mediatek: Add MT8173 cpufreq driver
dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
PM / Domains: Fix typo in description of genpd_dev_pm_detach()
PM / Domains: Remove unusable governor dummies
PM / Domains: Make pm_genpd_init() available to modules
PM / domains: Align column headers and data in pm_genpd_summary output
powercap / RAPL: disable the 2nd power limit properly
tools: cpupower: Fix error when running cpupower monitor
PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
PM / OPP: Fix static checker warning (broken 64bit big endian systems)
...
Pull turbostat changes for v4.3 from Len Brown.
* 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: fix typo on DRAM column in Joules-mode
tools/power turbostat: fix parameter passing for forked command
tools/power turbostat: dump CONFIG_TDP
tools/power turbostat: cpu0 is no longer hard-coded, so update output
tools/power turbostat: update turbostat(8)
Add new MSRs (LBR_INFO) and some new MSR bits used by the Intel Skylake
PMU driver.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel PT chapter in the new Intel Architecture SDM adds several packets
corresponding enable bits and registers that control packet generation.
Also, additional bits in the Intel PT CPUID leaf were added to enumerate
presence and parameters of these new packets and features.
The packets and enables are:
* CYC: cycle accurate mode, provides the number of cycles elapsed since
previous CYC packet; its presence and available threshold values are
enumerated via CPUID;
* MTC: mini time counter packets, used for tracking TSC time between
full TSC packets; its presence and available resolution options are
enumerated via CPUID;
* PSB packet period is now configurable, available period values are
enumerated via CPUID.
This patch adds corresponding bit and register definitions, pmu driver
capabilities based on CPUID enumeration, new attribute format bits for
the new featurens and extends event configuration validation function
to take these into account.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This header containing all MSRs and respective bit definitions
got exported to userspace in conjunction with the big UAPI
shuffle.
But, it doesn't belong in the UAPI headers because userspace can
do its own MSR defines and exporting them from the kernel blocks
us from doing cleanups/renames in that header. Which is
ridiculous - it is not kernel's job to export such a header and
keep MSRs list and their names stable.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>