mirror_ubuntu-kernels/arch/x86/kernel/cpu/mce
Tony Luck 1f68ce2a02 x86/mce: Handle Intel threshold interrupt storms
Add an Intel specific hook into machine_check_poll() to keep track of
per-CPU, per-bank corrected error logs (with a stub for the
CONFIG_MCE_INTEL=n case).

When a storm is observed, the interrupt rate is reduced by setting a
large threshold value for this bank in IA32_MCi_CTL2. The bank is
added to the bitmap of banks for this CPU to poll, and the polling rate
is increased to once per second.

When a storm ends, reset the threshold in IA32_MCi_CTL2 back to 1, remove
the bank from the bitmap for polling, and change the polling rate back
to the default.
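
The storm begin/end transitions described above can be sketched as a small userspace model. This is not the kernel code from this commit: the struct, the function names, the bank count, the STORM_THRESHOLD value, and the 300-second default interval are all assumptions made for illustration. The only details taken from the description are the threshold of 1 outside a storm, the per-CPU polling bitmap, and the once-per-second storm polling rate. Returning to the default rate only when no bank on the CPU remains in storm mode is also an assumption.

```c
#include <stdint.h>

#define NUM_BANKS        8        /* hypothetical bank count for this model */
#define THRESHOLD_MASK   0x7fffu  /* low 15 bits of the modelled CTL2 value hold the threshold */
#define STORM_THRESHOLD  0x7ff0u  /* illustrative "large" threshold, not the kernel's constant */
#define NORMAL_THRESHOLD 1u       /* value restored when the storm ends */
#define STORM_POLL_SEC   1u       /* once per second, per the description */
#define DEFAULT_POLL_SEC 300u     /* assumed default polling interval */

/* Hypothetical per-CPU state: modelled CTL2 values, polling bitmap, timer. */
struct cpu_model {
    uint64_t ctl2[NUM_BANKS];
    uint32_t poll_banks;
    unsigned poll_interval_sec;
};

/* Replace only the threshold field, leaving the other CTL2 bits untouched. */
static void set_threshold(struct cpu_model *c, int bank, uint64_t thr)
{
    c->ctl2[bank] = (c->ctl2[bank] & ~(uint64_t)THRESHOLD_MASK) |
                    (thr & THRESHOLD_MASK);
}

/* Storm observed: raise the threshold, poll the bank, poll faster. */
static void storm_begin(struct cpu_model *c, int bank)
{
    set_threshold(c, bank, STORM_THRESHOLD);
    c->poll_banks |= 1u << bank;
    c->poll_interval_sec = STORM_POLL_SEC;
}

/* Storm over: restore the threshold and stop polling this bank. */
static void storm_end(struct cpu_model *c, int bank)
{
    set_threshold(c, bank, NORMAL_THRESHOLD);
    c->poll_banks &= ~(1u << bank);
    /* Assumption: slow down only once no bank on this CPU is in a storm. */
    if (c->poll_banks == 0)
        c->poll_interval_sec = DEFAULT_POLL_SEC;
}
```

In this model, calling storm_begin() for two banks and storm_end() for one of them leaves the CPU polling once per second until the second storm also ends.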

If a CPU with banks in storm mode is taken offline, the new CPU that
inherits ownership of those banks takes over management of storm(s) in
the inherited bank(s).
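
A minimal sketch of that handoff, again as a userspace model rather than the kernel implementation. The names (system_model, claim_bank, cpu_offline), the two-CPU layout, and the per-bank in_storm flag are hypothetical. The flag stands in for whatever shared state lets the new owner recognise that an inherited bank is already in storm mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 8      /* hypothetical bank count for this model */
#define NUM_CPUS  2      /* hypothetical CPU count for this model */
#define NO_OWNER  (-1)

/* Hypothetical per-CPU view: owned banks, banks to poll, fast polling flag. */
struct cpu_state {
    uint32_t owned;
    uint32_t poll_banks;
    bool     fast_poll;
};

struct system_model {
    int              owner[NUM_BANKS];    /* CPU index or NO_OWNER */
    bool             in_storm[NUM_BANKS]; /* assumed shared per-bank storm state */
    struct cpu_state cpu[NUM_CPUS];
};

/* Take ownership of a bank; if it is already in a storm, inherit that too. */
static void claim_bank(struct system_model *s, int cpu, int bank)
{
    s->owner[bank] = cpu;
    s->cpu[cpu].owned |= 1u << bank;
    if (s->in_storm[bank]) {
        s->cpu[cpu].poll_banks |= 1u << bank;
        s->cpu[cpu].fast_poll = true;
    }
}

/* Take a CPU offline and hand every bank it owned to the heir. */
static void cpu_offline(struct system_model *s, int cpu, int heir)
{
    for (int bank = 0; bank < NUM_BANKS; bank++) {
        if (s->owner[bank] != cpu)
            continue;
        s->owner[bank] = NO_OWNER;
        claim_bank(s, heir, bank);
    }
    s->cpu[cpu] = (struct cpu_state){0};
}
```

The point of the model is that storm state follows the bank, not the CPU: the heir starts polling an inherited bank immediately instead of waiting to detect the storm again.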

The cmci_discover() function was already very large. These changes
pushed it well over the top. Refactor with three helper functions to
bring it back under control.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-4-tony.luck@intel.com
2023-12-15 14:53:42 +01:00
File          Last commit date            Last commit subject
amd.c         2023-11-28 16:26:55 +01:00  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
apei.c        2023-10-10 14:38:17 +02:00  x86/cpu: Move phys_proc_id into topology info
core.c        2023-12-15 14:52:01 +01:00  x86/mce: Add per-bank CMCI storm mitigation
dev-mcelog.c  2023-01-07 11:47:35 +01:00  x86/mce/dev-mcelog: use strscpy() to instead of strncpy()
genpool.c     2019-06-05 17:37:17 +02:00  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
inject.c      2023-11-22 19:13:38 +01:00  x86/mce/inject: Clear test status value
intel.c       2023-12-15 14:53:42 +01:00  x86/mce: Handle Intel threshold interrupt storms
internal.h    2023-12-15 14:53:42 +01:00  x86/mce: Handle Intel threshold interrupt storms
Makefile      2021-02-08 11:43:20 +01:00  thermal: Move therm_throt there from x86/mce
p5.c          2021-09-23 11:15:49 +02:00  x86/mce: Get rid of machine_check_vector
severity.c    2022-10-31 17:01:19 +01:00  x86/mce: Use severity table to handle uncorrected errors in kernel
threshold.c   2023-12-15 14:53:42 +01:00  x86/mce: Handle Intel threshold interrupt storms
winchip.c     2021-09-23 11:15:49 +02:00  x86/mce: Get rid of machine_check_vector