mirror_ubuntu-kernels/arch/x86
Borislav Petkov 9a90ed065a x86/thermal: Fix LVT thermal setup for SMI delivery mode
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31 22:32:26 +02:00
..
boot x86/boot/compressed: Enable -Wundef 2021-05-12 21:39:56 +02:00
configs module: remove EXPORT_UNUSED_SYMBOL* 2021-02-08 12:28:07 +01:00
crypto Objtool updates in this cycle were: 2021-04-28 12:53:24 -07:00
entry quota: Disable quotactl_path syscall 2021-05-17 14:39:56 +02:00
events perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context 2021-05-18 12:53:47 +02:00
hyperv The x86 MM changes in this cycle were: 2021-04-29 11:41:43 -07:00
ia32 x86/ia32_signal: Propagate __user annotation properly 2020-12-11 19:44:31 +01:00
include x86/thermal: Fix LVT thermal setup for SMI delivery mode 2021-05-31 22:32:26 +02:00
kernel x86/thermal: Fix LVT thermal setup for SMI delivery mode 2021-05-31 22:32:26 +02:00
kvm - Enable -Wundef for the compressed kernel build stage 2021-05-16 09:31:06 -07:00
lib - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
math-emu x86/fpu/math-emu: Fix function cast warning 2021-03-23 00:08:02 +01:00
mm x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
net Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
pci x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
platform x86/sev-es: Rename sev-es.{ch} to sev.{ch} 2021-05-10 07:40:27 +02:00
power - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
purgatory crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
ras
realmode x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
tools x86/tools/insn_sanity: Convert to insn_decode() 2021-03-15 12:21:11 +01:00
um um: elf.h: Fix W=1 warning for empty body in 'do' statement 2021-04-15 23:10:50 +02:00
video
xen x86/Xen: swap NX determination and GDT setup on BSP 2021-05-21 09:53:52 +02:00
.gitignore
Kbuild
Kconfig x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE 2021-05-05 11:27:27 -07:00
Kconfig.assembler
Kconfig.cpu
Kconfig.debug x86, libnvdimm/test: Remove COPY_MC_TEST 2020-10-26 18:08:35 +01:00
Makefile x86/build: Fix location of '-plugin-opt=' flags 2021-05-19 13:05:53 +02:00
Makefile_32.cpu
Makefile.um