Commit Graph

18732 Commits

Author SHA1 Message Date
Eric DeVolder
fed8d8773b x86/acpi/boot: Correct acpi_is_processor_usable() check
The logic in acpi_is_processor_usable() requires the online capable
bit be set for hotpluggable CPUs.  The online capable bit has been
introduced in ACPI 6.3.

However, for ACPI revisions < 6.3 which do not support that bit, CPUs
should be reported as usable, not the other way around.

Reverse the check.

  [ bp: Rewrite commit message. ]

Fixes: e2869bd7af ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
Suggested-by: Miguel Luis <miguel.luis@oracle.com>
Suggested-by: Boris Ostrovsky <boris.ovstrosky@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: David R <david@unsolicited.net>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230327191026.3454-2-eric.devolder@oracle.com
2023-03-30 11:07:30 +02:00
Mario Limonciello
a74fabfbd1 x86/ACPI/boot: Use FADT version to check support for online capable
ACPI 6.3 introduced the online capable bit, and also introduced MADT
version 5.

Latter was used to distinguish whether the offset storing online capable
could be used. However ACPI 6.2b has MADT version "45" which is for
an errata version of the ACPI 6.2 spec.  This means that the Linux code
for detecting availability of MADT will mistakenly flag ACPI 6.2b as
supporting online capable which is inaccurate as it's an ACPI 6.3 feature.

Instead use the FADT major and minor revision fields to distinguish this.

  [ bp: Massage. ]

Fixes: aa06e20f1b ("x86/ACPI: Don't add CPUs that are not online capable")
Reported-by: Eric DeVolder <eric.devolder@oracle.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/943d2445-84df-d939-f578-5d8240d342cc@unsolicited.net
2023-03-30 10:50:30 +02:00
Michael Kelley
812b0597fb x86/hyperv: Change vTOM handling to use standard coco mechanisms
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves.

vTOM is the dividing line where the uppermost bit of the physical
address space is set; e.g., with 47 bits of guest physical address
space, vTOM is 0x400000000000 (bit 46 is set).  Guest physical memory is
accessible at two parallel physical addresses -- one below vTOM and one
above vTOM.  Accesses below vTOM are private (encrypted) while accesses
above vTOM are shared (decrypted). In this sense, vTOM is like the
GPA.SHARED bit in Intel TDX.

Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.

A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms.

When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit in the
guest physical address, just as with TDX. The hypercalls to change the
memory status on the host side are made using the existing callback
mechanism. Everything just works, with a minor tweak to map the IO-APIC
to use private accesses.

To accomplish the switch in approach, the following must be done:

* Update Hyper-V initialization to set the cc_mask based on vTOM
  and do other coco initialization.

* Update physical_mask so the vTOM bit is no longer treated as part
  of the physical address

* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
  under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
  the vTOM bit as a protection flag.

* Code already exists to make hypercalls to inform Hyper-V about pages
  changing between shared and private.  Update this code to run as a
  callback from __set_memory_enc_pgtable().

* Remove the Hyper-V special case from __set_memory_enc_dec()

* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
  since mem_encrypt_init() will now do it.

* Add a Hyper-V specific implementation of the is_private_mmio()
  callback that returns true for the IO-APIC and vTPM MMIO addresses

  [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/
  [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/

  [ bp: Touchups. ]

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
2023-03-27 09:31:43 +02:00
Michael Kelley
88e378d400 x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
Current code always maps MMIO devices as shared (decrypted) in a
confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM
use a paravisor running in VMPL0 to emulate some devices, such as the
IO-APIC and TPM. In such a case, the device must be accessed as private
(encrypted) because the paravisor emulates the device at an address below
vTOM, where all accesses are encrypted.

Add a new hypervisor callback to determine if an MMIO address should
be mapped private. The callback allows hypervisor-specific code to handle
any quirks, the use of a paravisor, etc. in determining whether a mapping
must be private. If the callback is not used by a hypervisor, default
to returning "false", which is consistent with normal coco VM behavior.

Use this callback as another special case to check for when doing
ioremap().  Just checking the starting address is sufficient as an
ioremap range must be all private or all shared.

Also make the callback in early boot IO-APIC mapping code that uses the
fixmap.

  [ bp: Touchups. ]

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1678329614-3482-2-git-send-email-mikelley@microsoft.com
2023-03-26 23:42:40 +02:00
Linus Torvalds
986c63741d - Add a AMX ptrace self test
- Prevent a false-positive warning when retrieving the (invalid) address of
   dynamic FPU features in their init state which are not saved in
   init_fpstate at all
 
 - Randomize per-CPU entry areas only when KASLR is enabled
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQgPFAACgkQEsHwGGHe
 VUrAfA//QyZE5JnH0Ber3upRlZ/dPSNKIaOX6DMLshGj7QDqs2utTnjc4pwaqGWD
 OWpPuAJvOo2+NsN4nfB12venasIzseXDBBhEw6a5kYx73QmFbZ4XswFBLl2Eh8we
 cFbqU4B8SQvFQaahZ4kRRHpsmNGEPYRvgh2lBjcKUJBUaCuu6KoqE9+I3t173Obc
 sPfkXmhintDjYIjKfllN78rsBq4uCCaOVu5u299ZFMdBakRtx0M7U3547+4hwoE3
 txP+VK+TPs8e64XJtCTem1br8HXNt/W5pC4IoQPnH8V+FLhUp1iIz6FpVHnJ7VMD
 9c8VL7e8BNXhKkQn8sSkSVUZV3xNP7n4MbKKbba3f6EWPZnI28WQ3w09LUte/1aa
 hHEHyjMVyJfUiAcfuE1gZflG1+TqT8GkQJ+hqG9+/iSCWftOMuhfsKCROCLGhltJ
 yYBoyR2ZC1ErSLIOvgYAEUIeZ9FkzreOU0Pit6P/5qaPu+EXw3uDzoZB0WQH40Z5
 PQwz04/s3idPwbfCZDOyNc7QZwxbGu1ESkdiTtCJmbBLW0MkWiBCnf/qZsK7PdD1
 Q2qmx86ewIo6QipJpGK9pqWuzwFYNEJJHn3P7T1CcYQnQb+61m+b6WeYozQCgyMF
 0dII6JulW98/WzjVgH6zUA0a0dicO7FM9H6iEGqlIcvxv0PuM7M=
 =eZTj
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a AMX ptrace self test

 - Prevent a false-positive warning when retrieving the (invalid)
   address of dynamic FPU features in their init state which are not
   saved in init_fpstate at all

 - Randomize per-CPU entry areas only when KASLR is enabled

* tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/amx: Add a ptrace test
  x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
  x86/mm: Do not shuffle CPU entry areas without KASLR
2023-03-26 09:01:24 -07:00
Josh Poimboeuf
fb799447ae x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

  1) The end of the kernel stack unwind before hitting user entry, boot
     code, or fork entry

  2) A blind spot in ORC coverage where the unwinder has to bail due to
     lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf
4708ea14be x86,objtool: Separate unret validation from unwind hints
The ENTRY unwind hint type is serving double duty as both an empty
unwind hint and an unret validation annotation.

Unret validation is unrelated to unwinding. Separate it out into its own
annotation.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ff7448d492ea21b86d8a90264b105fbd0d751077.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf
f902cfdd46 x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things.  Separate them
out more explicitly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:57 +01:00
Luis Chamberlain
89d7971eb2 x86: Simplify one-level sysctl registration for itmt_kern_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230310233248.3965389-3-mcgrof%40kernel.org
2023-03-22 11:47:21 -07:00
Uros Bizjak
22767544e9 x86/ACPI/boot: Improve __acpi_acquire_global_lock
Improve __acpi_acquire_global_lock by using a temporary variable.
This enables compiler to perform if-conversion and improves generated
code from:

 ...
 72a:	d1 ea                	shr    %edx
 72c:	83 e1 fc             	and    $0xfffffffc,%ecx
 72f:	83 e2 01             	and    $0x1,%edx
 732:	09 ca                	or     %ecx,%edx
 734:	83 c2 02             	add    $0x2,%edx
 737:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 73b:	75 e9                	jne    726 <__acpi_acquire_global_lock+0x6>
 73d:	83 e2 03             	and    $0x3,%edx
 740:	31 c0                	xor    %eax,%eax
 742:	83 fa 03             	cmp    $0x3,%edx
 745:	0f 95 c0             	setne  %al
 748:	f7 d8                	neg    %eax

to:

 ...
 72a:	d1 e9                	shr    %ecx
 72c:	83 e2 fc             	and    $0xfffffffc,%edx
 72f:	83 e1 01             	and    $0x1,%ecx
 732:	09 ca                	or     %ecx,%edx
 734:	83 c2 02             	add    $0x2,%edx
 737:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 73b:	75 e9                	jne    726 <__acpi_acquire_global_lock+0x6>
 73d:	8d 41 ff             	lea    -0x1(%rcx),%eax

BTW: the compiler could generate:

	lea 0x2(%rcx,%rdx,1),%edx

instead of:

	or     %ecx,%edx
	add    $0x2,%edx

but unwated conversion from add to or when bits are known to be zero
prevents this improvement. This is GCC PR108477.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20230320212012.12704-1-ubizjak%40gmail.com
2023-03-22 11:32:39 -07:00
Chang S. Bae
b158888402 x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
__copy_xstate_to_uabi_buf() copies either from the tasks XSAVE buffer
or from init_fpstate into the ptrace buffer. Dynamic features, like
XTILEDATA, have an all zeroes init state and are not saved in
init_fpstate, which means the corresponding bit is not set in the
xfeatures bitmap of the init_fpstate header.

But __copy_xstate_to_uabi_buf() retrieves addresses for both the tasks
xstate and init_fpstate unconditionally via __raw_xsave_addr().

So if the tasks XSAVE buffer has a dynamic feature set, then the
address retrieval for init_fpstate triggers the warning in
__raw_xsave_addr() which checks the feature bit in the init_fpstate
header.

Remove the address retrieval from init_fpstate for extended features.
They have an all zeroes init state so init_fpstate has zeros for them.
Then zeroing the user buffer for the init state is the same as copying
them from init_fpstate.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/kvm/20230221163655.920289-2-mizhang@google.com/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/all/20230227210504.18520-2-chang.seok.bae%40intel.com
Cc: stable@vger.kernel.org
2023-03-22 10:59:13 -07:00
Mark Rutland
fee86a4ed5 ftrace: selftest: remove broken trace_direct_tramp
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.

On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.

Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.

On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.

To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.

Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:59:29 -04:00
Dionna Glaze
0144e3b85d x86/sev: Change snp_guest_issue_request()'s fw_err argument
The GHCB specification declares that the firmware error value for
a guest request will be stored in the lower 32 bits of EXIT_INFO_2.  The
upper 32 bits are for the VMM's own error code. The fw_err argument to
snp_guest_issue_request() is thus a misnomer, and callers will need
access to all 64 bits.

The type of unsigned long also causes problems, since sw_exit_info2 is
u64 (unsigned long long) vs the argument's unsigned long*. Change this
type for issuing the guest request. Pass the ioctl command struct's error
field directly instead of in a local variable, since an incomplete guest
request may not set the error code, and uninitialized stack memory would
be written back to user space.

The firmware might not even be called, so bookend the call with the no
firmware call error and clear the error.

Since the "fw_err" field is really exitinfo2 split into the upper bits'
vmm error code and lower bits' firmware error code, convert the 64 bit
value to a union.

  [ bp:
   - Massage commit message
   - adjust code
   - Fix a build issue as
   Reported-by: kernel test robot <lkp@intel.com>
   Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com
   - print exitinfo2 in hex
   Tom:
    - Correct -EIO exit case. ]

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com
Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de
2023-03-21 15:43:19 +01:00
David Woodhouse
805ae9dc3b x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
When bringing up a secondary CPU from do_boot_cpu(), the warm reset flag
is set in CMOS and the starting IP for the trampoline written inside the
BDA at 0x467. Once the CPU is running, the CMOS flag is unset and the
value in the BDA cleared.

To allow for parallel bringup of CPUs, add a reference count to track the
number of CPUs currently bring brought up, and clear the state only when
the count reaches zero.

Since the RTC spinlock is required to write to the CMOS, it can be used
for mutual exclusion on the refcount too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Link: https://lore.kernel.org/r/20230316222109.1940300-5-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst
8f6be6d870 x86/smpboot: Remove initial_gs
Given its CPU#, each CPU can find its own per-cpu offset, and directly set
GSBASE accordingly. The global variable can be eliminated.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-9-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst
c253b64020 x86/smpboot: Remove early_gdt_descr on 64-bit
Build the GDT descriptor on the stack instead.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-8-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst
3adee777ad x86/smpboot: Remove initial_stack on 64-bit
In order to facilitate parallel startup, start to eliminate some of the
global variables passing information to CPUs in the startup path.

However, start by introducing one more: smpboot_control. For now this
merely holds the CPU# of the CPU which is coming up. Each CPU can then
find its own per-cpu data, and everything else it needs can be found
from there, allowing the other global variables to be removed.

First to be removed is initial_stack. Each CPU can load %rsp from its
current_task->thread.sp instead. That is already set up with the correct
idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that
the BSP also finds a suitable stack pointer in the static per-cpu data
when coming up on first boot.

On resume from S3, the CPU needs a temporary stack because its idle task
is already active. Instead of setting initial_stack, the sleep code can
simply set its own current->thread.sp to point to the temporary stack.
Nobody else cares about ->thread.sp for a thread which is currently on
a CPU, because the true value is actually in the %rsp register. Which
is restored with the rest of the CPU context in do_suspend_lowlevel().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-7-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
David Woodhouse
cefad862f2 x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
Each of the sibling CPUs in a cluster uses the same clustermask. The first
CPU in a cluster will need a new clustermask allocated, while subsequent
siblings will use the same clustermask as the first.

However, the CPU being brought up cannot yet perform memory allocations
at the point that this occurs in init_x2apic_ldr().

So at present, the alloc_clustermask() function allocates a clustermask
just in case it's needed, storing it in the global cluster_hotplug_mask.
A CPU which is the first sibling of a cluster will "take" it from there
and set cluster_hotplug_mask to NULL, in order for alloc_clustermask()
to allocate a new one before bringing up the next CPU.

To facilitate parallel bringup of CPUs in future, switch to a model
where alloc_clustermask() prepopulates the clustermask in the per_cpu
data for each present CPU in the cluster in advance. All that the CPU
needs to do for itself in init_x2apic_ldr() is set its own bit in that
mask.

The 'node' and 'clusterid' members of struct cluster_mask are thus
redundant, and it can become a simple struct cpumask instead.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Link: https://lore.kernel.org/r/20230316222109.1940300-2-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Muralidhara M K
4c1cdec319 x86/MCE/AMD: Use an u64 for bank_map
Thee maximum number of MCA banks is 64 (MAX_NR_BANKS), see

  a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64").

However, the bank_map which contains a bitfield of which banks to
initialize is of type unsigned int and that overflows when those bit
numbers are >= 32, leading to UBSAN complaining correctly:

  UBSAN: shift-out-of-bounds in arch/x86/kernel/cpu/mce/amd.c:1365:38
  shift exponent 32 is too large for 32-bit type 'int'

Change the bank_map to a u64 and use the proper BIT_ULL() macro when
modifying bits in there.

  [ bp: Rewrite commit message. ]

Fixes: a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64")
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127151601.1068324-1-muralimk@amd.com
2023-03-19 19:07:04 +01:00
Linus Torvalds
c46a7d0473 - Flush out logged errors immediately after MCA banks configuration
changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQXB9kACgkQEsHwGGHe
 VUqqjA//YbcRx2PFcZT5nnuQlb6bptsluCUrHOcJVT/1fe0ayrlvahuw/QtSXRH4
 Vwukc3+1cehp3CcSbHKAKOArTL7NV2tbk+EZQk+Ae+7QdRz/9TuEenL6ipCC1cr4
 Z3Bo3KZmHlBcoJaQDcQWWIL8TiYAPXdqXWksh8q+0pxDI2wuFguFBJI84j+AUZH+
 I4EDXLfzQn8RQZgiggEIez0aOIig74eaPfhHsNlqJJYG4x/EVgmRn9qJpYBGAeq6
 xQR6NvHUTjCCZAASI1QJ/IT5rXD17iey3J/gIw3QZEhotBCCDdk5vh8S8zqDStRF
 x3Za7qeC5m4HMfB/09v8HGeTitlaT0BYmM2CFOsru7I/qI+dJccDTwLmF8UY5Nj2
 G6454A7ZEQ13lhfAoDIeVFfoSkqyXNz+McTtOQ8/xDJ5hnuNJ4WtT7sWemWZlV5S
 l14xVFbojtGNmQygUGeL7cxl6h12Y9zFNwh1A5HzwH4EvywQJW7/35pxXEZIO3tl
 EioXKe1eSLcKoD9VAv8icmstpwJl1Gm5Xge1oyw8cyTW6d3hM8ZOEqdTAJvRkfG7
 LwPl3qC6Hrqhjc26WZ9pxmvR1hSYLWIidy6MlNeO9mf6wZR/ub+SmuHzy6n7TZl4
 pTsVver93ZgS1J8CJ0ohCK1jHs+2aLvh/6qiJvIw9lbgAZ2HKPo=
 =Q6Dq
 -----END PGP SIGNATURE-----

Merge tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fix from Borislav Petkov:

 - Flush out logged errors immediately after MCA banks configuration
   changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached

* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make sure logged MCEs are processed after sysfs update
2023-03-19 09:57:53 -07:00
Linus Torvalds
4ac39c5910 - Check cmdline_find_option()'s return value before further processing
- Clear temporary storage in the resctrl code to prevent access to an
   unexistent MSR
 
 - Add a simple throttling mechanism to protect the hypervisor from potentially
   malicious SEV guests issuing requests in rapid succession.
 
   In order to not jeopardize the sanity of everyone involved in
   maintaining this code, the request issuing side has received
   a cleanup, split in more or less trivial, small and digestible pieces.
   Otherwise, the code was threatening to become an unmaintainable mess.
 
   Therefore, that cleanup is marked indirectly also for stable so that
   there's no differences between the upstream code and the stable
   variant when it comes down to backporting more there.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQW/64ACgkQEsHwGGHe
 VUoWzBAAl1KD4RR5EhrppOCl5mWtZmKUf+COag7RiqggXJyhCTXO+5N24dHcgoJB
 h60gY7Nxg0CpZVbkMDSpJIuclmlMkiCLgUeuvN6E5ofgb/ZSv9nDuCXPUtLQ962d
 T6071/v48G+2PVGm+PD1xAwP3065i3itVV/k6Xn8fxeXf/fq8L5eU5tADuFICI0b
 dKbd7U+TEQAh5E6BUwms2G1P0glJqqL37H22fTcyxI6D2T/UJLlc4+or5JmTofDa
 XJE/UHn+ZaGZYjhdr/BrlcxnY1jUTQH2K3wciADmNolkuCpDQJs6GgN98lXdhT34
 vyWQVokHGEKE8Va6m5wZX90eKraSc27/0d5ZlHz/rIJgVBxp/VvCzqLUZRvkRwwk
 k7bVOeZHe6P+b0QQl7uL9U2ff0sV/4PX0NLr+jzQdlA2ZYuTV6YgBDl7nAe1Tw/J
 gJViAvDbm26mlTG1wQrvw9M2P4AQIYpEmD4KPs7j2aQafUgtGqfTBwyeKHXdtMLJ
 TrkEISZZ8BVVvYghctN4R21IryUSnfq2eXxPwxUMh78SrO8sC23QJ5PVqKM/enF8
 azf/ZBgANidzqJ44k2Ow2bnO0ZTYZblvl3NUCMNa5SjQmzAEUzupHEKUgV10MFMR
 J3lspGU47BVeirFPWlCYKr+3Buwzur5xo5wCmrezxbN0fAo9k5M=
 =6UGz
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "There's a little bit more 'movement' in there for my taste but it
  needs to happen and should make the code better after it.

   - Check cmdline_find_option()'s return value before further
     processing

   - Clear temporary storage in the resctrl code to prevent access to an
     unexistent MSR

   - Add a simple throttling mechanism to protect the hypervisor from
     potentially malicious SEV guests issuing requests in rapid
     succession.

     In order to not jeopardize the sanity of everyone involved in
     maintaining this code, the request issuing side has received a
     cleanup, split in more or less trivial, small and digestible
     pieces. Otherwise, the code was threatening to become an
     unmaintainable mess.

     Therefore, that cleanup is marked indirectly also for stable so
     that there's no differences between the upstream code and the
     stable variant when it comes down to backporting more there"

* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use of uninitialized buffer in sme_enable()
  x86/resctrl: Clear staged_config[] before and after it is used
  virt/coco/sev-guest: Add throttling awareness
  virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
  virt/coco/sev-guest: Do some code style cleanups
  virt/coco/sev-guest: Carve out the request issuing logic into a helper
  virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
  virt/coco/sev-guest: Simplify extended guest request handling
  virt/coco/sev-guest: Check SEV_SNP attribute at probe time
2023-03-19 09:43:41 -07:00
Greg Kroah-Hartman
60260272dc x86/umwait: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-10-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:29 +01:00
Greg Kroah-Hartman
216f58beb2 x86/microcode: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-9-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:26 +01:00
Greg Kroah-Hartman
1aaba11da9 driver core: class: remove module * from class_create()
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something.  So just remove it and fix up all callers of the function in
the kernel tree at the same time.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:33 +01:00
Juergen Gross
11af36cb89 x86/paravirt: Convert simple paravirt functions to asm
All functions referenced via __PV_IS_CALLEE_SAVE() need to be assembler
functions, as those functions calls are hidden from the compiler.

In case the kernel is compiled with "-fzero-call-used-regs" the compiler
will clobber caller-saved registers at the end of C functions, which
will result in unexpectedly zeroed registers at the call site of the
related paravirt functions.

Replace the C functions with DEFINE_PARAVIRT_ASM() constructs using
the same instructions as the related paravirt calls in the
PVOP_ALT_[V]CALLEE*() macros. And since they're not C functions visible
to the compiler anymore, latter won't do the callee-clobbered zeroing
invoked by -fzero-call-used-regs and thus won't corrupt registers.

  [ bp: Extend commit message. ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230317063325.361-1-jgross@suse.com
2023-03-17 13:29:47 +01:00
Michael Kelley
f8acb24aaf x86/hyperv: Block root partition functionality in a Confidential VM
Hyper-V should never specify a VM that is a Confidential VM and also
running in the root partition.  Nonetheless, explicitly block such a
combination to guard against a compromised Hyper-V maliciously trying to
exploit root partition functionality in a Confidential VM to expose
Confidential VM secrets. No known bug is being fixed, but the attack
surface for Confidential VMs on Hyper-V is reduced.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1678894453-95392-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-03-17 10:57:35 +00:00
Kirill A. Shutemov
23e5d9ec2b x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect
canonical addresses. An attempt to pass down tagged pointer will lead
to address translation failure.

By default do not allow to enable both LAM and use SVA in the same
process.

The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation.
By using the arch_prctl() userspace takes responsibility to never pass
tagged address to the device.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Kirill A. Shutemov
400b9b9344 iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
Kernel has few users of pasid_valid() and all but one checks if the
process has PASID allocated. The helper takes ioasid_t as the input.

Replace the helper with mm_valid_pasid() that takes mm_struct as the
argument. The only call that checks PASID that is not tied to mm_struct
is open-codded now.

This is preparatory patch. It helps avoid ifdeffery: no need to
dereference mm->pasid in generic code to check if the process has PASID.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-11-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Kirill A. Shutemov
2f8794bd08 x86/mm: Provide arch_prctl() interface for LAM
Add a few of arch_prctl() handles:

 - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number
   of tag bits. It is rounded up to the nearest LAM mode that can
   provide it. For now only LAM_U57 is supported, with 6 tag bits.

 - ARCH_GET_UNTAG_MASK returns untag mask. It can indicates where tag
   bits located in the address.

 - ARCH_GET_MAX_TAG_BITS returns the maximum tag bits user can request.
   Zero if LAM is not supported.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-9-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Kirill A. Shutemov
74c228d20a x86/uaccess: Provide untagged_addr() and remove tags before address check
untagged_addr() is a helper used by the core-mm to strip tag bits and
get the address to the canonical shape based on rules of the current
thread. It only handles userspace addresses.

The untagging mask is stored in per-CPU variable and set on context
switching to the task.

The tags must not be included into check whether it's okay to access the
userspace address. Strip tags in access_ok().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-7-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Kirill A. Shutemov
5ef495e55f x86: Allow atomic MM_CONTEXT flags setting
So far there's no need in atomic setting of MM context flags in
mm_context_t::flags. The flags set early in exec and never change
after that.

LAM enabling requires atomic flag setting. The upcoming flag
MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process
lifetime where multiple threads exist.

Convert the field to unsigned long and do MM_CONTEXT_* accesses with
__set_bit() and test_bit().

No functional changes.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-3-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Fenghua Yu
d7ce15e1d4 x86/split_lock: Enumerate architectural split lock disable bit
The December 2022 edition of the Intel Instruction Set Extensions manual
defined that the split lock disable bit in the IA32_CORE_CAPABILITIES MSR
is (and retrospectively always has been) architectural.

Remove all the model specific checks except for Ice Lake variants which are
still needed because these CPU models do not enumerate presence of the
IA32_CORE_CAPABILITIES MSR.

Originally-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/20220701131958.687066-1-fenghua.yu@intel.com/t/#mada243bee0915532a6adef6a9e32d244d1a9aef4
2023-03-16 11:50:51 +01:00
Borislav Petkov (AMD)
8cc68c9c9e x86/CPU/AMD: Make sure EFER[AIBRSE] is set
The AutoIBRS bit gets set only on the BSP as part of determining which
mitigation to enable on AMD. Setting on the APs relies on the
circumstance that the APs get booted through the trampoline and EFER
- the MSR which contains that bit - gets replicated on every AP from the
BSP.

However, this can change in the future and considering the security
implications of this bit not being set on every CPU, make sure it is set
by verifying EFER later in the boot process and on every AP.

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230224185257.o3mcmloei5zqu7wa@treble
2023-03-16 11:50:00 +01:00
Peter Newman
322b72e0fd x86/resctrl: Avoid redundant counter read in __mon_event_count()
__mon_event_count() does the per-RMID, per-domain work for
user-initiated event count reads and the initialization of new monitor
groups.

In the initialization case, after resctrl_arch_reset_rmid() calls
__rmid_read() to record an initial count for a new monitor group, it
immediately calls resctrl_arch_rmid_read(). This re-read of the hardware
counter is unnecessary and the following computations are ignored by the
caller during initialization.

Following return from resctrl_arch_reset_rmid(), just clear the
mbm_state and return. This involves moving the mbm_state lookup into the
rr->first case, as it's not needed for regular event count reads: the
QOS_L3_OCCUP_EVENT_ID case was redundant with the accumulating logic at
the end of the function.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/all/20221220164132.443083-2-peternewman%40google.com
2023-03-15 15:44:15 -07:00
Shawn Wang
0424a7dfe9 x86/resctrl: Clear staged_config[] before and after it is used
As a temporary storage, staged_config[] in rdt_domain should be cleared
before and after it is used. The stale value in staged_config[] could
cause an MSR access error.

Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3
Cache (MBA should be disabled if the number of CLOSIDs for MB is less than
16.) :
	mount -t resctrl resctrl -o cdp /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..7}
	umount /sys/fs/resctrl/
	mount -t resctrl resctrl /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..8}

An error occurs when creating resource group named p8:
    unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    Call Trace:
     <IRQ>
     __flush_smp_call_function_queue+0x11d/0x170
     __sysvec_call_function+0x24/0xd0
     sysvec_call_function+0x89/0xc0
     </IRQ>
     <TASK>
     asm_sysvec_call_function+0x16/0x20

When creating a new resource control group, hardware will be configured
by the following process:
    rdtgroup_mkdir()
      rdtgroup_mkdir_ctrl_mon()
        rdtgroup_init_alloc()
          resctrl_arch_update_domains()

resctrl_arch_update_domains() iterates and updates all resctrl_conf_type
whose have_new_ctrl is true. Since staged_config[] holds the same values as
when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA
configurations. When group p8 is created, get_config_index() called in
resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for
CDP_CODE and CDP_DATA, which will be translated to an invalid register -
0xca0 in this scenario.

Fix it by clearing staged_config[] before and after it is used.

[reinette: re-order commit tags]

Fixes: 75408e4350 ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-03-15 15:19:43 -07:00
Linus Torvalds
29db00c252 Tracing fixes for v6.3
- Do not allow histogram values to have modifies.
   Can cause a NULL pointer dereference if they do.
 
 - Warn if hist_field_name() is passed a NULL.
   Prevent the NULL pointer dereference mentioned above.
 
 - Fix invalid address look up race in lookup_rec()
 
 - Define ftrace_stub_graph conditionally to prevent linker errors
 
 - Always check if RCU is watching at all tracepoint locations
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZBDuTBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsboAP4yfrFYvIIKM5EkzkEiPI+V2hdlA12x
 bt839jO5AWCmhAEAiY8FmKatpBJQKsiGqSOab8aHOMnhGFZwltCHAPa9PAI=
 =vtA2
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Do not allow histogram values to have modifies. They can cause a NULL
   pointer dereference if they do.

 - Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
   dereference mentioned above.

 - Fix invalid address look up race in lookup_rec()

 - Define ftrace_stub_graph conditionally to prevent linker errors

 - Always check if RCU is watching at all tracepoint locations

* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Make tracepoint lockdep check actually test something
  ftrace,kcfi: Define ftrace_stub_graph conditionally
  ftrace: Fix invalid address access in lookup_rec() when index is 0
  tracing: Check field value in hist_field_name()
  tracing: Do not let histogram values have some modifiers
2023-03-14 17:07:54 -07:00
Dionna Glaze
72f7754dcf virt/coco/sev-guest: Add throttling awareness
A potentially malicious SEV guest can constantly hammer the hypervisor
using this driver to send down requests and thus prevent or at least
considerably hinder other guests from issuing requests to the secure
processor which is a shared platform resource.

Therefore, the host is permitted and encouraged to throttle such guest
requests.

Add the capability to handle the case when the hypervisor throttles
excessive numbers of requests issued by the guest. Otherwise, the VM
platform communication key will be disabled, preventing the guest from
attesting itself.

Realistically speaking, a well-behaved guest should not even care about
throttling. During its lifetime, it would end up issuing a handful of
requests which the hardware can easily handle.

This is more to address the case of a malicious guest. Such guest should
get throttled and if its VMPCK gets disabled, then that's its own
wrongdoing and perhaps that guest even deserves it.

To the implementation: the hypervisor signals with SNP_GUEST_REQ_ERR_BUSY
that the guest requests should be throttled. That error code is returned
in the upper 32-bit half of exitinfo2 and this is part of the GHCB spec
v2.

So the guest is given a throttling period of 1 minute in which it
retries the request every 2 seconds. This is a good default but if it
turns out to not pan out in practice, it can be tweaked later.

For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
must remain in the kernel to complete the request with the current
sequence number. Returning without finishing the request allows the
guest to make another request but with different message contents. This
is IV reuse, and breaks cryptographic protections.

  [ bp:
    - Rewrite commit message and do a simplified version.
    - The stable tags are supposed to denote that a cleanup should go
      upfront before backporting this so that any future fixes to this
      can preserve the sanity of the backporter(s). ]

Fixes: d5af44dde5 ("x86/sev: Provide support for SNP guest request NAEs")
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org> # d6fd48eff7 ("virt/coco/sev-guest: Check SEV_SNP attribute at probe time")
Cc: <stable@kernel.org> # 970ab82374 (" virt/coco/sev-guest: Simplify extended guest request handling")
Cc: <stable@kernel.org> # c5a338274b ("virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()")
Cc: <stable@kernel.org> # 0fdb6cc7c8 ("virt/coco/sev-guest: Carve out the request issuing logic into a helper")
Cc: <stable@kernel.org> # d25bae7dc7 ("virt/coco/sev-guest: Do some code style cleanups")
Cc: <stable@kernel.org> # fa4ae42cc6 ("virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case")
Link: https://lore.kernel.org/r/20230214164638.1189804-2-dionnaglaze@google.com
2023-03-13 13:29:27 +01:00
Borislav Petkov (AMD)
fa4ae42cc6 virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
snp_issue_guest_request() checks the value returned by the hypervisor in
sw_exit_info_2 and returns a different error depending on it.

Convert those checks into a switch-case to make it more readable when
more error values are going to be checked in the future.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230307192449.24732-8-bp@alien8.de
2023-03-13 12:55:34 +01:00
Borislav Petkov (AMD)
970ab82374 virt/coco/sev-guest: Simplify extended guest request handling
Return a specific error code - -ENOSPC - to signal the too small cert
data buffer instead of checking exit code and exitinfo2.

While at it, hoist the *fw_err assignment in snp_issue_guest_request()
so that a proper error value is returned to the callers.

  [ Tom: check override_err instead of err. ]

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230307192449.24732-4-bp@alien8.de
2023-03-13 11:27:10 +01:00
Borislav Petkov (AMD)
d6fd48eff7 virt/coco/sev-guest: Check SEV_SNP attribute at probe time
No need to check it on every ioctl. And yes, this is a common SEV driver
but it does only SNP-specific operations currently. This can be
revisited later, when more use cases appear.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230307192449.24732-3-bp@alien8.de
2023-03-13 11:20:20 +01:00
Yazen Ghannam
4783b9cb37 x86/mce: Make sure logged MCEs are processed after sysfs update
A recent change introduced a flag to queue up errors found during
boot-time polling. These errors will be processed during late init once
the MCE subsystem is fully set up.

A number of sysfs updates call mce_restart() which goes through a subset
of the CPU init flow. This includes polling MCA banks and logging any
errors found. Since the same function is used as boot-time polling,
errors will be queued. However, the system is now past late init, so the
errors will remain queued until another error is found and the workqueue
is triggered.

Call mce_schedule_work() at the end of mce_restart() so that queued
errors are processed.

Fixes: 3bff147b18 ("x86/mce: Defer processing of early errors")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230301221420.2203184-1-yazen.ghannam@amd.com
2023-03-12 21:12:21 +01:00
Linus Torvalds
d3d0cac69f - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
impact to anything as those machines will fallback to XSAVEC which is
   equivalent there.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQNtvAACgkQEsHwGGHe
 VUpGiRAAjlYpvaQK24s8MiQr3LBC0pKsgKstf1Jx5C+HspmS5JAdF83646kMOUKm
 MUGPfQwK1nN5kO0/fOlo4O6vhSIF2Ft/Xfrd/APZm6qJhR3pli9675NeF8fH2D5t
 Ypgtl6psRudkB3RUmE1cmHWbr9dMnHZZLnL6iA/qHYXCY3kaw96ncM6HjdnrjXRd
 OV2+N4dyhTet3MdUdw7dSr1uz75O5PQH/1FwR1V2zroF1sjImaIwQ7JN51hIITxw
 DzfTbfuJzdAqwfztBFG/yZ5K+DEoU5BemHHIuhq+X9/7GeLMd059DdnZuXSX8mcH
 jjzOa/E5r/PjYze0XRWT3RbI5fbSc1qhNbmj3kLNP3KE/F3S74n6FR58oLNqosVk
 zw1TYP8oocdjG1VxJdm5qndIzwHMSj3qkd+BSNZZ1fwINVLXtSDubtThkN/i+81+
 nqnMA8HFrcwy1bhwq4jd5dmP7tjlODATfeL4ZV6/6J1RX8Vwu+bjdy8PM+vJYJ0d
 pnFLT20cf6Or0MQHUssO+uh6oC3aQ6AxPWJcuUfbdSLYzjr2EObgCHXGZOhCjvhC
 CsALcmwnLh5XzwglzWoXyyv+tsJar63XYcPSEIt+gIfXpLf7ZbzcOSDLDkri6B3Z
 fCABGASFnoXr7ZYnGxH4L5WKWOk1W+pgpxyC4mnzD9oHtXIzUPU=
 =u6kj
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single erratum fix for AMD machines:

   - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
     impact to anything as those machines will fallback to XSAVEC which
     is equivalent there"

* tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Disable XSAVES on AMD family 0x17
2023-03-12 09:12:03 -07:00
Arnd Bergmann
aa69f81492 ftrace,kcfi: Define ftrace_stub_graph conditionally
When CONFIG_FUNCTION_GRAPH_TRACER is disabled, __kcfi_typeid_ftrace_stub_graph
is missing, causing a link failure:

 ld.lld: error: undefined symbol: __kcfi_typeid_ftrace_stub_graph
 referenced by arch/x86/kernel/ftrace_64.o:(__cfi_ftrace_stub_graph) in archive vmlinux.a

Mark the reference to it as conditional on the same symbol, as
is done on arm64.

Link: https://lore.kernel.org/linux-trace-kernel/20230131093643.3850272-1-arnd@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Fixes: 883bbbffa5 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()")
See-also: 2598ac6ec4 ("arm64: ftrace: Define ftrace_stub_graph only with FUNCTION_GRAPH_TRACER")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09 22:17:06 -05:00
Song Liu
ac3b432839 module: replace module_layout with module_memory
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
   obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

        MOD_TEXT,
        MOD_DATA,
        MOD_RODATA,
        MOD_RO_AFTER_INIT,
        MOD_INIT_TEXT,
        MOD_INIT_DATA,
        MOD_INIT_RODATA,

and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.

Signed-off-by: Song Liu <song@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-09 12:55:15 -08:00
Linus Torvalds
7fef099702 x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-08 11:48:11 -08:00
Philippe Mathieu-Daudé
b4c108d7da x86/cpu: Expose arch_cpu_idle_dead()'s prototype definition
Include <linux/cpu.h> to make sure arch_cpu_idle_dead() matches its
prototype going forward.

Inspired-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20230214083857.50163-1-philmd@linaro.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:30 -08:00
Josh Poimboeuf
071c44e427 sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.

There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).

Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.

This will cause the compiler to complain if an arch-specific
implementation might return.  It also improves code generation for both
caller and callee.

Also fixes the following warning:

  vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:28 -08:00
Josh Poimboeuf
eab89405b6 x86/cpu: Mark play_dead() __noreturn
play_dead() doesn't return.  Annotate it as such.  By extension this
also makes arch_cpu_idle_dead() noreturn.

Link: https://lore.kernel.org/r/f3a069e6869c51ccfdda656b76882363bc9fcfa4.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:26 -08:00
Andrew Cooper
b0563468ee x86/CPU/AMD: Disable XSAVES on AMD family 0x17
AMD Erratum 1386 is summarised as:

  XSAVES Instruction May Fail to Save XMM Registers to the Provided
  State Save Area

This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.

Ignore the XSAVES feature on all AMD Zen1/2 hardware.  The XSAVEC
instruction (which works fine) is equivalent on affected parts.

  [ bp: Typos, move it into the F17h-specific function. ]

Reported-by: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com
2023-03-08 16:56:08 +01:00
Borislav Petkov (AMD)
554eec0b4a x86/mce: Always inline old MCA stubs
The stubs for the ancient MCA support (CONFIG_X86_ANCIENT_MCE) are
normally optimized away on 64-bit builds. However, an allmodconfig one
causes the compiler to add sanitizer calls gunk into them and they exist
as constprop calls. Which objtool then complains about:

  vmlinux.o: warning: objtool: do_machine_check+0xad8: call to \
    pentium_machine_check.constprop.0() leaves .noinstr.text section

due to them missing noinstr. One could tag them "noinstr" but what
should really happen is, they should be forcefully inlined so that all
that gunk gets optimized away and the warning doesn't even have a chance
to fire.

Do so.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230222191054.4701-1-bp@alien8.de
2023-03-08 13:50:07 +01:00
Thomas Weißschuh
7214b32b6f x86/MCE/AMD: Make kobj_type structure constant
Since

  ee6d3dd4ed ("driver core: make kobj_type constant.")

the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217-kobj_type-mce-amd-v1-1-40ef94816444@weissschuh.net
2023-03-06 09:57:27 +01:00
Juergen Gross
c9ae1b10d9 x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
The two paravirt callbacks .mmu.activate_mm() and .mmu.dup_mmap() are
sharing the same implementations in all cases: for Xen PV guests they
are pinning the PGD of the new mm_struct, and for all other cases they
are a NOP.

In the end, both callbacks are meant to register an address space with
the underlying hypervisor, so there needs to be only a single callback
for that purpose.

So merge them to a common callback .mmu.enter_mmap() (in contrast to the
corresponding already existing .mmu.exit_mmap()).

As the first parameter of the old callbacks isn't used, drop it from the
replacement one.

  [ bp: Remove last occurrence of paravirt_activate_mm() in
    asm/mmu_context.h ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Link: https://lore.kernel.org/r/20230207075902.7539-1-jgross@suse.com
2023-03-06 09:41:37 +01:00
Linus Torvalds
7f9ec7d816 A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
    guests is not large enough.
 
  - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on
    return to userspace for performance reasons, but the leaves user space
    vulnerable to cross-thread attacks which STIBP prevents. Update the
    documentation accordingly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmQEVnETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoegJEACbn+CQKFxB4kXJ1xBamYsqQfxY1mM1
 yFziEVH3VCXSshfvKePH7fnoAUHTzhy+SjN6c1ERvl82WVXm/BoF2B81KpN9Yd18
 R6wTpIS227Pn+Ll1yfVQJMHrb0mnSczo5vCGyOzMOxkqIbNCkHRMoeSBspfNLLGM
 3D2+IQqBaqBgNzPQ3JHrwRqQAy/3ZJT4IrHSFe0LwgYQ/EeAGydY8UN0wB1y5YN0
 SoFhPd7B7UWxUD7PrfriBc3B2HN44QkMpe/fQJ4y0GVF+1Uqp6Ti7ouCEVg60A3g
 8kiS+98FBIzHySk+xfX/vlhiQD/J2c6/+p28gw+iGf6YmUsQbeu64tV5TAUGGBN+
 kErLvJmJnC/dwWiEMXzv/e6sNKoZi0Yz/JVq6atuoT/521cjDEDapZRxBSmaW33M
 Zn6YF8FIsUTHGdt9Equ+HPjZZTyk34W8f0d0N+lws0QNWtk5d0KU5XP2PDp+Mj6O
 dGVaGv88qmMIr0o/s9CgvpefSM8L7fC0WQwRpRr905gu8k6YxuEWQofuh365ZcKT
 sEDeRqZYi+ue4+gW1GRje6M5ODftTWoLPlX2f+iZui1gwwpuczvj0sRR10kKfKRD
 qxpHcxyIzS2MW4aT1JnVgeWStt0x5wWeq1qzO1bwBJCAlS63vln/mUnBq7+uV0ca
 KiEah5vP4dcenA==
 =1RwH
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 updates from Thomas Gleixner:
 "A small set of updates for x86:

   - Return -EIO instead of success when the certificate buffer for SEV
     guests is not large enough

   - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared
     on return to userspace for performance reasons, but the leaves user
     space vulnerable to cross-thread attacks which STIBP prevents.
     Update the documentation accordingly"

* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt/sev-guest: Return -EIO if certificate buffer is not large enough
  Documentation/hw-vuln: Document the interaction between IBRS and STIBP
  x86/speculation: Allow enabling STIBP with legacy IBRS
2023-03-05 11:27:48 -08:00
Linus Torvalds
857f1268a5 Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory
    footprint.
 
  - Other maximum memory usage reductions - this makes the build both faster,
    and fixes kernel build OOM failures on allyesconfig and similar configs
    when they try to build the final (large) vmlinux.o.
 
  - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
    single-byte instruction (PUSH/POP or LEAVE). This requires the
    extension of the ORC metadata structure with a 'signal' field.
 
  - Misc fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI
 hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt
 BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe
 RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36
 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt
 HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe
 gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB
 BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE
 TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr
 /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9
 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX
 8nK4xnla2tM=
 =bpPY
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Shrink 'struct instruction', to improve objtool performance & memory
   footprint

 - Other maximum memory usage reductions - this makes the build both
   faster, and fixes kernel build OOM failures on allyesconfig and
   similar configs when they try to build the final (large) vmlinux.o

 - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
   single-byte instruction (PUSH/POP or LEAVE). This requires the
   extension of the ORC metadata structure with a 'signal' field

 - Misc fixes & cleanups

* tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  objtool: Fix ORC 'signal' propagation
  objtool: Remove instruction::list
  x86: Fix FILL_RETURN_BUFFER
  objtool: Fix overlapping alternatives
  objtool: Union instruction::{call_dest,jump_table}
  objtool: Remove instruction::reloc
  objtool: Shrink instruction::{type,visited}
  objtool: Make instruction::alts a single-linked list
  objtool: Make instruction::stack_ops a single-linked list
  objtool: Change arch_decode_instruction() signature
  x86/entry: Fix unwinding from kprobe on PUSH/POP instruction
  x86/unwind/orc: Add 'signal' field to ORC metadata
  objtool: Optimize layout of struct special_alt
  objtool: Optimize layout of struct symbol
  objtool: Allocate multiple structures with calloc()
  objtool: Make struct check_options static
  objtool: Make struct entries[] static and const
  objtool: Fix HOSTCC flag usage
  objtool: Properly support make V=1
  objtool: Install libsubcmd in build
  ...
2023-03-02 09:45:34 -08:00
KP Singh
6921ed9049 x86/speculation: Allow enabling STIBP with legacy IBRS
When plain IBRS is enabled (not enhanced IBRS), the logic in
spectre_v2_user_select_mitigation() determines that STIBP is not needed.

The IBRS bit implicitly protects against cross-thread branch target
injection. However, with legacy IBRS, the IBRS bit is cleared on
returning to userspace for performance reasons which leaves userspace
threads vulnerable to cross-thread branch target injection against which
STIBP protects.

Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for
enabling STIBP (through seccomp/prctl() by default or always-on, if
selected by spectre_v2_user kernel cmdline parameter).

  [ bp: Massage. ]

Fixes: 7c693f54c8 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Reported-by: José Oliveira <joseloliveira11@gmail.com>
Reported-by: Rodrigo Branco <rodrigo@kernelhacking.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org
Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.org
2023-02-27 18:57:09 +01:00
Linus Torvalds
49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds
d8e473182a - Fixup comment typo
- Prevent unexpected #VE's from:
   - Hosts removing perfectly good guest mappings (SEPT_VE_DISABLE
   - Excessive #VE notifications (NOTIFY_ENABLES) which are
     delivered via a #VE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmP1WzMACgkQaDWVMHDJ
 krAqig/+MzIYmUIkuYbluektxPdzI6zhY/Z+eD5DDH9OFZX5e0WQrmHpQbJ3i4Q6
 LT5JQ+yAI2ox/mhPfyCeDXqdRiatJJExUDUepc0qsOEW9gTsJ+edYwUsJg8HII61
 +TLz/BiMSF6xCUk46b4CqzhoeEk1dupFAG204uc4vGwSfXdysN3buAcciJc1rOTS
 7G9hI9fdLSjEJ8yyFebSDMPxSmdnjJPrDK3LF/leGJEpAQ/eMU0entG4ZH3Uyh2s
 3EnDpOdRjX56LAEixB4e5igXyS7wesCun4ytOnwndzW8p4gPIsypcJUEbVt84BfA
 HQaSWP35BFAn0JshJnFPmj4r4jV2EB8l630dVTOKdNSiIa3YjyB5nbzy+mMPFl4f
 8vcrHEZ6boEcRhgz0zFG0RfnDsjdbqKgFBXdRt0vYB/CG+EfmYaPoDXsb/8A7dtc
 8IQ9wLk2AqG0L8blZVS2kjFxNa/9lkDcMsAbfZmlORTQTF2WN2Jlbxri87vuBpRy
 8sqMUhgvHoffd/SIiDzJJIBjOH5/RhXLKhGzXQHI1vpZdU6ps9KIvohiycgx1mUQ
 lXXQwN5OWSHdUXZ7TFBIGXy9n32Ak/k5GCzCJSqvsMJDDdbycGVB+YCaKX6QK30+
 HAHrPy/FQ3FFvZWdsDMD5Pn4RkF4LYH/k4QZwqBFMs9+/Sdzwxc=
 =UpyL
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull Intel Trust Domain Extensions (TDX) updates from Dave Hansen:
 "Other than a minor fixup, the content here is to ensure that TDX
  guests never see virtualization exceptions (#VE's) that might be
  induced by the untrusted VMM.

  This is a highly desirable property. Without it, #VE exception
  handling would fall somewhere between NMIs, machine checks and total
  insanity. With it, #VE handling remains pretty mundane.

  Summary:

   - Fixup comment typo

   - Prevent unexpected #VE's from:
      - Hosts removing perfectly good guest mappings (SEPT_VE_DISABLE)
      - Excessive #VE notifications (NOTIFY_ENABLES) which are delivered
        via a #VE"

* tag 'x86_tdx_for_6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Do not corrupt frame-pointer in __tdx_hypercall()
  x86/tdx: Disable NOTIFY_ENABLES
  x86/tdx: Relax SEPT_VE_DISABLE check for debug TD
  x86/tdx: Use ReportFatalError to report missing SEPT_VE_DISABLE
  x86/tdx: Expand __tdx_hypercall() to handle more arguments
  x86/tdx: Refactor __tdx_hypercall() to allow pass down more arguments
  x86/tdx: Add more registers to struct tdx_hypercall_args
  x86/tdx: Fix typo in comment in __tdx_hypercall()
2023-02-25 09:11:30 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Linus Torvalds
7dd86cf801 Livepatching changes for 6.3
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmP01jEACgkQUqAMR0iA
 lPIH1g/9G3CLt9+by3d0FiS5AsbK6vohZRzKxqCTyX9b2p2sLQuiTu8TodJ9IOur
 axpaOuPOIjQ253yfqrYL2YZHdfr/632nSTGsT18p7k8a9m6ghQ6cW+uq23Ro9W43
 uubjyvFMTeLBnkDTSJquciENdMtyPwyiTN0+ZvDOsAOKoBt7i3Rt7lcorjDuALwb
 2SQ80qCjYLy1r6mkQyy3IhLQVSeRiaqkR6IAdxR5EeXmbSi0YYh0Jxz5AzPMeDw6
 F1w7Whe6Vf8vf4JHVDqNazB3f+JH4JrmRvk6LxlGU33z9uJac4NIbboGDXzRP21h
 A0UGqwJimeoi8dji9Y3cXRIYRc+HqyqL0IadSHHLCou746/CJDBQN1EjNNBgZgu6
 cTELnRL9TsV9tWt9boqbMK0KfCFnOsGPMDAXXDH5G/ZUsvyDweGMNelgNTXURHFX
 f1cTfdDj2T/iG3XDZHf35/rSI56BDQJcJ1G1fQc3sl+G9jDGxX3PBJYNAUpvC6cy
 iewDeROcAAXmwUqaC8epJJfN2m9zJJpDgR1t4mZTnKkmRF090ZApyDA4M769kn0J
 ioHvE8Pe9+GjY8syGzq3EJRyQnfXgTbAyKfiTpGWefLPrgkQ9LfFQRQ1S21sy483
 Bz/KrNl1tMWFsSmngNYaiOZifjxVj8pYoIz1W2s0ore9sYu78YM=
 =xdQJ
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching updates from Petr Mladek:

 - Allow reloading a livepatched module by clearing livepatch-specific
   relocations in the livepatch module.

   Otherwise, the repeated load would fail on consistency checks.

* tag 'livepatching-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch,x86: Clear relocation targets on a module removal
  x86/module: remove unused code in __apply_relocate_add
2023-02-23 14:00:10 -08:00
Linus Torvalds
2b79eb73e2 probes updates for 6.3:
- Skip negative return code check for snprintf in eprobe.
 
 - Add recursive call test cases for kprobe unit test
 
 - Add 'char' type to probe events to show it as the character instead of value.
 
 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols.
 
 - Fix kselftest to check filter on eprobe event correctly.
 
 - Add filter on eprobe to the README file in tracefs.
 
 - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly.
 
 - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly.
 
 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP0JdYACgkQ2/sHvwUr
 Pxt6sQf/TD9Kwqx3XG1tnLPev6yt2nuggUippHwWUFHlJtMyUaLV8aKFqByyEe+j
 tCQvrFIIJq242xg0Jac/MAf2exlWG9jsmVZPmvC1YzepOAbjXu2eBkIS7LsbeHjF
 JJypNnEceffWCpNoD6nlvR0xWXenqRbZJwdsGqo3u+fXnzTurEMY2GU2xOyv39tv
 S1uNLPANJxdMb/2iUsUE3hMbe82dqr8zPcApqWFtTBB6QPHI3B2SjuQHpQxwbTPl
 bzAl0yQkLSQXprVzT7xJ4xLnzbl1ljgJBci5aX8BFF+VD9oYkypdfYVczBH5VsP9
 E3eT9T9lRf4Q99EqxNy5uw7NqQXGQg==
 =CMPb
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull kprobes updates from Masami Hiramatsu:

 - Skip negative return code check for snprintf in eprobe

 - Add recursive call test cases for kprobe unit test

 - Add 'char' type to probe events to show it as the character instead
   of value

 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols

 - Fix kselftest to check filter on eprobe event correctly

 - Add filter on eprobe to the README file in tracefs

 - Fix optprobes to check whether there is 'under unoptimizing' optprobe
   when optimizing another kprobe correctly

 - Fix optprobe to check whether there is 'under unoptimizing' optprobe
   when fetching the original instruction correctly

 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly

* tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: no need to check for negative ret value for snprintf
  test_kprobes: Add recursed kprobe test case
  tracing/probe: add a char type to show the character value of traced arguments
  selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols
  selftests/ftrace: Fix eprobe syntax test case to check filter support
  tracing/eprobe: Fix to add filter on eprobe description in README file
  x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
  x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
  kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23 13:03:08 -08:00
Linus Torvalds
525445efac NMI diagnostics for v6.3
Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
 on the one hand and failing hardware on the other.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPq3x0THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jOwED/41rLVFORQqNpK5mitA2acqVzRmUG9J
 sUHkJCPmHVr4sEDwqi2u+iBqHMqm8COaQOKA65tPsHJKI1PPcIjBG371QPPYsdRl
 +qNq6oLCrD37Dgs7CmJPjIO+0P2Xb765GUmcNhR9aH9QnYGz2a3s7QfXV2WlFjq3
 1LJ6Z8euEQBb5IE1syp3HHYf3IP4Z88gQxcU4kgV16uADnW0IKSw8F7p9B/EjSnB
 IjIh8gkAbfqNh0VXpex/wzPkrXRbjcOr1s43YkoYS1t3ggIZc6MEGs1kTmXAjxo2
 4S4CAPKfh4Btlez9VVIMwCDb56fHG6I5wyP+jH51dhNNKiuLqnSyHU3kWU3GFiYn
 5Ix7BKtAtp/AzASrL1xildOYjN6gB2QdQijs+bvqzH8Rm8Nl1Yy1z6p6iqcJGe/q
 cerzulajs+/UG2XKwRTWw6I3km4WkueEHYjEzmer+olK/Akx4COYiqMivdY5HIwK
 M7cFVQz2EiHLP3fu2LWrOkNi/Dy96Vsuya0n3E8Ch7Xtdjez7QPAdWU7bLXY/OTd
 jCRdd1MDPz87XQLSmbCV42nJzP5nNryBfijS7swqq9qL/D152ycctxOpIHa6/fJH
 pze5nqRBxjwlhOatJds5mVbhtKwF01YV4wzcaHypCmBXXTz1zVj/hFSbzuUuFwEI
 06c2sVNqzez4tw==
 =MPMw
 -----END PGP SIGNATURE-----

Merge tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull x86 NMI diagnostics from Paul McKenney:
 "Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
  on the one hand and failing hardware on the other"

* tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  x86/nmi: Print reasons why backtrace NMIs are ignored
  x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
2023-02-23 09:28:37 -08:00
Ingo Molnar
585a78c1f7 Merge branch 'linus' into objtool/core, to pick up Xen dependencies
Pick up dependencies - freshly merged upstream via xen-next - before applying
dependent objtool changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-02-23 09:16:39 +01:00
Linus Torvalds
36289a03bc This update includes the following changes:
API:
 
 - Use kmap_local instead of kmap_atomic.
 - Change request callback to take void pointer.
 - Print FIPS status in /proc/crypto (when enabled).
 
 Algorithms:
 
 - Add rfc4106/gcm support on arm64.
 - Add ARIA AVX2/512 support on x86.
 
 Drivers:
 
 - Add TRNG driver for StarFive SoC.
 - Delete ux500/hash driver (subsumed by stm32/hash).
 - Add zlib support in qat.
 - Add RSA support in aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmPzAiwACgkQxycdCkmx
 i6et8xAAoO3w5MZFGXMzWsYhfSZFdceXBEQfDR7JOCdHxpMIQhw0FLlb0uttFk6m
 SeWrdP9wiifBDoCmw7qffFJml8ZftPL/XeXjob2d9v7jKbPyw3lDSIdsNfN/5EEL
 oIc9915zwrgawvahPAa+PQ4Ue03qRjUyOcV42dpd1W3NYhzDVHoK5OUU+mEFYDvx
 Sgw/YUugKf0VXkVDFzG5049+CPcheyRZqclAo9jyl2eZiXujgUyV33nxRCtqIA+t
 7jlHKwi+6QzFHY0CX5BvShR8xyEuH5MLoU3H/jYGXnRb3nEpRYAEO4VZchIHqF0F
 Y6pKIKc6Q8OyIVY8RsjQY3hioCqYnQFZ5Xtc1zGtOYEitVLbkmItMG0mVn0XOfyt
 gJDi6gkEw5uPUbEQdI4R1xEgJ8eCckMsOJ+uRxqTm+uLqNDxPbsB9bohKniMogXV
 lDlVXjU23AA9VeKtqU8FvWjfgqsN47X4aoq1j4/4aI7X9F7P9FOP21TZloP7+ssj
 PFrzNaRXUrMEsvyS1wqPegIh987lj6WkH4hyU0wjzaIq4IQELidHsSXFS12iWIPH
 kTEoC/trAVoYSr0zXKWUCs4h/x0FztVNbjs4KiDP2FLXX1RzeVZ0WlaXZhryHr+n
 1+8yCuS6tVofAbSX0wNkZdf0x5+3CIBw4kqSIvjKDPYYEfIDaT0=
 =dMYe
 -----END PGP SIGNATURE-----

Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Use kmap_local instead of kmap_atomic
   - Change request callback to take void pointer
   - Print FIPS status in /proc/crypto (when enabled)

  Algorithms:
   - Add rfc4106/gcm support on arm64
   - Add ARIA AVX2/512 support on x86

  Drivers:
   - Add TRNG driver for StarFive SoC
   - Delete ux500/hash driver (subsumed by stm32/hash)
   - Add zlib support in qat
   - Add RSA support in aspeed"

* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
  crypto: x86/aria-avx - Do not use avx2 instructions
  crypto: aspeed - Fix modular aspeed-acry
  crypto: hisilicon/qm - fix coding style issues
  crypto: hisilicon/qm - update comments to match function
  crypto: hisilicon/qm - change function names
  crypto: hisilicon/qm - use min() instead of min_t()
  crypto: hisilicon/qm - remove some unused defines
  crypto: proc - Print fips status
  crypto: crypto4xx - Call dma_unmap_page when done
  crypto: octeontx2 - Fix objects shared between several modules
  crypto: nx - Fix sparse warnings
  crypto: ecc - Silence sparse warning
  tls: Pass rec instead of aead_req into tls_encrypt_done
  crypto: api - Remove completion function scaffolding
  tls: Remove completion function scaffolding
  tipc: Remove completion function scaffolding
  net: ipv6: Remove completion function scaffolding
  net: ipv4: Remove completion function scaffolding
  net: macsec: Remove completion function scaffolding
  dm: Remove completion function scaffolding
  ...
2023-02-21 18:10:50 -08:00
Linus Torvalds
b8878e5a5c hyperv-next for v6.3.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmPzgDgTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrc7CACfG4SSd8KkWU/y8Q66Irxdau0a3ETD
 KL4UNRKGIyKujufgFsme79O6xVSSsCNSay449wk20hqn8lnwbSRi9pUwmLn29hfd
 CMFleWIqgwGFfC1do5DRF1vrt1siuG/jVE07mWsEwuY2iHx/es+H7LiQKidhkndZ
 DhXRqoi7VYiJv5fRSumpkUJrMZiI96o9Mk09HUksdMwCn3+7RQEqHnlTH5KOozKF
 iMroDB72iNw5Na/USZwWL2EDRptENam3lFkPBeDPqNw0SbG4g65JGPR9DSa0Lkbq
 AGCJQkdU33mcYQG5MY7R4K1evufpOl/apqLW7h92j45Znr9ok6Vr2c1R
 =J1VT
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - allow Linux to run as the nested root partition for Microsoft
   Hypervisor (Jinank Jain and Nuno Das Neves)

 - clean up the return type of callback functions (Dawei Li)

* tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Fix hv_get/set_register for nested bringup
  Drivers: hv: Make remove callback of hyperv driver void returned
  Drivers: hv: Enable vmbus driver for nested root partition
  x86/hyperv: Add an interface to do nested hypercalls
  Drivers: hv: Setup synic registers in case of nested root partition
  x86/hyperv: Add support for detecting nested hypervisor
2023-02-21 16:59:23 -08:00
Linus Torvalds
877934769e - Cache the AMD debug registers in per-CPU variables to avoid MSR writes
where possible, when supporting a debug registers swap feature for
   SEV-ES guests
 
 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change
 
 - Add support for a new x86 instruction - LKGS - Load kernel GS which is
   part of the FRED infrastructure
 
 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover
 
 - Other smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe
 VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t
 aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE
 E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk
 GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45
 isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9
 u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn
 A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg
 s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew
 eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t
 g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR
 RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o=
 =v/ZC
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Cache the AMD debug registers in per-CPU variables to avoid MSR
   writes where possible, when supporting a debug registers swap feature
   for SEV-ES guests

 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change

 - Add support for a new x86 instruction - LKGS - Load kernel GS which
   is part of the FRED infrastructure

 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover

 - Other smaller fixes and cleanups

* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd: Cache debug register values in percpu variables
  KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
  x86/cpu: Support AMD Automatic IBRS
  x86/cpu, kvm: Add the SMM_CTL MSR not present feature
  x86/cpu, kvm: Add the Null Selector Clears Base feature
  x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
  x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
  KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
  x86/cpu, kvm: Add support for CPUID_80000021_EAX
  x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
  x86/gsseg: Use the LKGS instruction if available for load_gs_index()
  x86/gsseg: Move load_gs_index() to its own new header file
  x86/gsseg: Make asm_load_gs_index() take an u16
  x86/opcode: Add the LKGS instruction to x86-opcode-map
  x86/cpufeature: Add the CPU feature bit for LKGS
  x86/bugs: Reset speculation control settings on init
  x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
2023-02-21 14:51:40 -08:00
Linus Torvalds
2504ba8b01 Power management updates for 6.3-rc1
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
    Karny, Arnd Bergmann, Bagas Sanjaya).
 
  - Drop the custom cpufreq driver for loongson1 that is not necessary
    any more and the corresponding cpufreq platform device (Keguang
    Zhang).
 
  - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
    entries (Paul E. McKenney).
 
  - Enable thermal cooling for Tegra194 (Yi-Wei Wang).
 
  - Register module device table and add missing compatibles for
    cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
 
  - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
    (Christian Marangi).
 
  - Make kobj_type structure in the cpufreq core constant (Thomas
    Weißschuh).
 
  - Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
 
  - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).
 
  - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
    cpuidle driver is selected and replace a default_idle() call in that
    driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
 
  - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
    Bityutskiy).
 
  - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
    avoid randconfig build failures (Arnd Bergmann).
 
  - Make kobj_type structures used in the cpuidle sysfs interface
    constant (Thomas Weißschuh).
 
  - Make the cpuidle driver registration code update microsecond values
    of idle state parameters in accordance with their nanosecond values
    if they are provided (Rafael Wysocki).
 
  - Make the PSCI cpuidle driver prevent topology CPUs from being
    suspended on PREEMPT_RT (Krzysztof Kozlowski).
 
  - Document that pm_runtime_force_suspend() cannot be used with
    DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
 
  - Add EXPORT macros for exporting PM functions from drivers (Richard
    Fitzgerald).
 
  - Remove /** from non-kernel-doc comments in hibernation code (Randy
    Dunlap).
 
  - Fix possible name leak in powercap_register_zone() (Yang Yingliang).
 
  - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
    capping driver (Zhang Rui).
 
  - Modify the idle_inject power capping facility to support 100% idle
    injection (Srinivas Pandruvada).
 
  - Fix large time windows handling in the intel_rapl power capping
    driver (Zhang Rui).
 
  - Fix memory leaks with using debugfs_lookup() in the generic PM
    domains and Energy Model code (Greg Kroah-Hartman).
 
  - Add missing 'cache-unified' property in the example for kryo OPP
    bindings (Rob Herring).
 
  - Fix error checking in opp_migrate_dentry() (Qi Zheng).
 
  - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
    Dybcio).
 
  - Modify some power management utilities to use the canonical ftrace
    path (Ross Zwisler).
 
  - Correct spelling problems for Documentation/power/ as reported by
    codespell (Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
 STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
 aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
 I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
 AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
 KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
 suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
 wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
 21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
 ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
 GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
 15Tn8cjH7pp0
 =lZLx
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00
Linus Torvalds
9e58df973d Updates for the interrupt subsystem:
Core:
 
     - Move the interrupt affinity spreading mechanism into lib/group_cpus
       so it can be used for similar spreading requirements, e.g. in the
       block multi-queue code.
 
       This also contains a first usecase in the block multi-queue code which
       Jens asked to take along with the librarization.
 
     - Improve irqdomain locking to close a number race conditions which
       can be observed with massive parallel device driver probing.
 
     - Enforce and document the semantics of disable_irq() which cannot be
       invoked safely from non-sleepable context.
 
     - Move the IPI multiplexing code from the Apple AIC driver into the
       core. so it can be reused by RISCV.
 
   Drivers:
 
     - Plug OF node refcounting leaks in various drivers.
 
     - Correctly mark level triggered interrupts in the Broadcom L2 drivers.
 
     - The usual small fixes and improvements.
 
     - No new drivers for the record!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzUSkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoY3DEAC9E4yLO7VxxTrs/KrAVCgL3SnHVXQU
 nE42uFbQwpCILuNmnqP3uvTHLCsXZkbuBaZEbxLBxC2iyU6+31N1Is+e6cClGMjK
 kX6U9g9EqiRCdX3fgJiEU16fCgE8D1AEg+7XKLjeasQhCfKQGGtCtE9/Gmg/Ji92
 gcEY/bjvm1hcoNo9dh/vR4k0k63fb13716RLScozUkS/XYVlu+LrrG349gD2WEA9
 lh1twDkXvZTWkiYKWAkLorxcNyKhcnJxJw8zEIGVF5b6pCCudK8gXjBbMD5abC7W
 xano6B8F455eSKNsi2TWyW47ZHUkC60sqCNDgI2MBTsI7D72UpAJoDfe0VjbMoaH
 RQJnrGsUQbviBUen+LEet7nWZBQJRKZHOVtYEjA8ndB3PJUXKKcLeODdw11odyjR
 bgZk+0wnowMArIaoLfeItF2oSpfSzLVxh2i8Aeus5tBesvhVCOi4LABRBKGCWvMj
 cpSlMhZ4znMnr5j5lOGpcAjKFlWVh1HmF70Y2deGZi5xC8EXFL/VsB7rH5LEEEuF
 7I8CO8M1mXeOTJoCchCbuAYgZyuk1DIhKUyOiYQZblaPNGcVGvCIN31SFBRT9h/8
 e0VwSvVL756GhotUp/LjgTdG7MoKspWqRG00+q84SsDalsKGXMW7zmHc+1NgGN/C
 Yxio1Jlly9Rwyw==
 =+pu3
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:

   - Move the interrupt affinity spreading mechanism into lib/group_cpus
     so it can be used for similar spreading requirements, e.g. in the
     block multi-queue code

     This also contains a first usecase in the block multi-queue code
     which Jens asked to take along with the librarization

   - Improve irqdomain locking to close a number race conditions which
     can be observed with massive parallel device driver probing

   - Enforce and document the semantics of disable_irq() which cannot be
     invoked safely from non-sleepable context

   - Move the IPI multiplexing code from the Apple AIC driver into the
     core, so it can be reused by RISCV

  Drivers:

   - Plug OF node refcounting leaks in various drivers

   - Correctly mark level triggered interrupts in the Broadcom L2
     drivers

   - The usual small fixes and improvements

   - No new drivers for the record!"

* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
  irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
  irqdomain: Switch to per-domain locking
  irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
  irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
  irqchip/gic-v2m: Use irq_domain_create_hierarchy()
  irqchip/alpine-msi: Use irq_domain_add_hierarchy()
  x86/uv: Use irq_domain_create_hierarchy()
  x86/ioapic: Use irq_domain_create_hierarchy()
  irqdomain: Clean up irq_domain_push/pop_irq()
  irqdomain: Drop leftover brackets
  irqdomain: Drop dead domain-name assignment
  irqdomain: Drop revmap mutex
  irqdomain: Fix domain registration race
  irqdomain: Fix mapping-creation race
  irqdomain: Refactor __irq_domain_alloc_irqs()
  irqdomain: Look for existing mapping only once
  irqdomain: Drop bogus fwspec-mapping error handling
  ...
2023-02-21 10:03:48 -08:00
Linus Torvalds
560b803067 Updates for timekeeping, timers and clockevent/source drivers:
Core:
 
     - Yet another round of improvements to make the clocksource watchdog
       more robust:
 
       	 - Relax the clocksource-watchdog skew criteria to match the NTP
            criteria.
 
 	 - Temporarily skip the watchdog when high memory latencies are
 	   detected which can lead to false-positives.
 
 	 - Provide an option to enable TSC skew detection even on systems
            where TSC is marked as reliable.
 
       Sigh!
 
     - Initialize the restart block in the nanosleep syscalls to be directed
       to the no restart function instead of doing a partial setup on entry.
 
       This prevents an erroneous restart_syscall() invocation from
       corrupting user space data. While such a situation is clearly a user
       space bug, preventing this is a correctness issue and caters to the
       least suprise principle.
 
     - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
       to align it with the nanosleep semantics.
 
   Drivers:
 
     - The obligatory new driver bindings for Mediatek, Rockchip and RISC-V
       variants.
 
     - Add support for the C3STOP misfeature to the RISC-V timer to handle
       the case where the timer stops in deeper idle state.
 
     - Set up a static key in the RISC-V timer correctly before first use.
 
     - The usual small improvements and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzV+cTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYlDEACMrjN2F6qeiOW94t4nQ3qP1M9AMSgO
 OihC04XuM14/3tEviu/cUOd60wYcUQ/kfI5C+IL35ezeP2w9lnuKqeFpG7aDOa33
 5F3isDPamJdXZEZs44CW15brR6dqDlEi5acKee/TtFV9mN6xNhzxM64IaFqecPmW
 P+BTwunB8xwquY8RzsHXor/GOGb6mqWQIPoHEPnywTDe/xQYWt0Exzi7ch6HQr5Z
 ZzHG6X4h6UTNimjay6L4qsRQWILmPIg4Z5IlycWMQ8qDFM0lbnIJqkG4JwceolI6
 aRQyLe3NQFcPYgq3ue+SNm4RckYn4NbAa1zFm0d5VDgKp4xW1sxvtkxOJuxjaOw2
 /rLkHkmyuVvCeTMAySfxrwnszAoM505CHC6CEYc1xELbeCkROFUaymtVyNFnnTru
 V/Jt/T2Gyx6tOrafX7u+djUjv9figddRpNbskVZvEi3Ztq4MQ069nK3oSUqtP5vO
 INApNg4lq6s8aGqVE+Kp9+CKwGqZqI4MdxQMNMAmCRLPon6apActVawbj18qO/wS
 qblQ0cbF8a16itlQ3V68qmhcPh6EZOuq8II4etNq6U0ulV9712WfMbat3z53LG94
 QNkAmZ3/wui93I+Q2NPxhf5ybJFQZhR0SOtVO6xIdTgOntkODwzzGu9UapfD8mLb
 k5BpWnH8CoUgiw==
 =I67j
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timekeeping, timers and clockevent/source drivers:

  Core:

   - Yet another round of improvements to make the clocksource watchdog
     more robust:

       - Relax the clocksource-watchdog skew criteria to match the NTP
         criteria.

       - Temporarily skip the watchdog when high memory latencies are
         detected which can lead to false-positives.

       - Provide an option to enable TSC skew detection even on systems
         where TSC is marked as reliable.

     Sigh!

   - Initialize the restart block in the nanosleep syscalls to be
     directed to the no restart function instead of doing a partial
     setup on entry.

     This prevents an erroneous restart_syscall() invocation from
     corrupting user space data. While such a situation is clearly a
     user space bug, preventing this is a correctness issue and caters
     to the least suprise principle.

   - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
     to align it with the nanosleep semantics.

  Drivers:

   - The obligatory new driver bindings for Mediatek, Rockchip and
     RISC-V variants.

   - Add support for the C3STOP misfeature to the RISC-V timer to handle
     the case where the timer stops in deeper idle state.

   - Set up a static key in the RISC-V timer correctly before first use.

   - The usual small improvements and fixes all over the place"

* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
  clocksource/drivers/em_sti: Mark driver as non-removable
  clocksource/drivers/sh_tmu: Mark driver as non-removable
  clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
  clocksource/drivers/timer-microchip-pit64b: Add delay timer
  clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
  dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
  dt-bindings: timer: mediatek,mtk-timer: add MT8365
  clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
  clocksource/drivers/sh_cmt: Mark driver as non-removable
  clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
  clocksource/drivers/riscv: Increase the clock source rating
  clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
  dt-bindings: timer: Add bindings for the RISC-V timer device
  RISC-V: time: initialize hrtimer based broadcast clock event device
  dt-bindings: timer: rk-timer: Add rktimer for rv1126
  time/debug: Fix memory leak with using debugfs_lookup()
  clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
  posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
  clocksource: Verify HPET and PMTMR when TSC unverified
  ...
2023-02-21 09:45:13 -08:00
Linus Torvalds
056612fd41 Miscellaneous cleanups in X86:
- Correct the common copy and pasted mishandling of kstrtobool() in the
     strict_sas_size() setup function.
 
   - Make recalibrate_cpu_khz() an GPL only export.
 
   - Check TSC feature before doing anything else which avoids pointless
     code execution if TSC is not available.
 
   - Remove or fixup stale and misleading comments.
 
   - Remove unused or pointelessly duplicated variables.
 
   - Spelling and typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzWVkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodbEEAC7XjF7BkZ9nhmAMWgwThKbHhNb3QLk
 oO0pcbbff2o7bhcP55Mb6R52G1a/kEvpFg/iF6+4/GcsbxHtLhILtG0PGOgmg28p
 UcdXt8EvkMv+bICr3gYtnwqB50stc/1s8JhHVItaDIXbRjNOrkBHQzgcPx0qfC8w
 INPhlqShSehGtzmaoP4AWMfVtBlqKXlCADpQGd8hcTojlNRAJwzBF9mZbWGdgopW
 qa3yoa+s6kL3M2lXvwREuz/1JnmtKx7cav9ldWlSno2dgDbw1ioDZg9tJhARJo//
 toF9Y9h12ASDBaqVoyVJgKmDQddsdxkBTrMCKQX8yRH21pEX9eeHM/re9lNtUbhl
 4/0juvAKFyviatWAHHCPYGyuPGrSsrsj5sea2fNURnkc6TZ4pHHArDytpAOhYqh2
 8CPpT2Qn/C6CqUsc9Z2fbDZBAOTKR/IF93NzE+HcjRjDyjm30ImeKEbwMHfEa7lX
 V3/wvXH9+WIzvVC3EqbvVqkArG1YQTqQHBZIl9+Za2iEeLz8DGEWCH0b7w8/m2Cg
 0mzUOzjJviy6ShO0B8fZK8LuCoDbPAmL4etfjp1t3q+EsuG5pYOrYtrnZ76XWYD7
 TWxlBHhrYuqUBERpN7SCJgixqXgWVUe2/hZwstQqbmvH/jOe9TGgxrIu2MmvB1kK
 5+ul2d2uwbd4cA==
 =zlRy
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull miscellaneous x86 cleanups from Thomas Gleixner:

 - Correct the common copy and pasted mishandling of kstrtobool() in the
   strict_sas_size() setup function

 - Make recalibrate_cpu_khz() an GPL only export

 - Check TSC feature before doing anything else which avoids pointless
   code execution if TSC is not available

 - Remove or fixup stale and misleading comments

 - Remove unused or pointelessly duplicated variables

 - Spelling and typo fixes

* tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hotplug: Remove incorrect comment about mwait_play_dead()
  x86/tsc: Do feature check as the very first thing
  x86/tsc: Make recalibrate_cpu_khz() export GPL only
  x86/cacheinfo: Remove unused trace variable
  x86/Kconfig: Fix spellos & punctuation
  x86/signal: Fix the value returned by strict_sas_size()
  x86/cpu: Remove misleading comment
  x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery
  x86/boot/e820: Fix typo in e820.c comment
2023-02-21 09:24:08 -08:00
Linus Torvalds
3f0b0903fd - Add getcpu support for the 32-bit version of the vDSO
- Some smaller fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzusMACgkQEsHwGGHe
 VUojfQ/7BOqXI0XsHTIwilF12w2bLQl1PeI4bSk6VY+iAN2YmQkq2qvNUgwt62e5
 5Z95cDuCZ8sx6L3mDIoOgWBN9zdLbxNhezLFDykb+6as67PMaww9l9R6n3JoC2qm
 ELso5JZnWvIZ7Cu7RRm9IzbSj93JAlN3Aypexe61NywMyge9CAvCiOEhvW+lkYSD
 lhZqgbm5WAB14F1CeqFyC8kVvUez1GH9Dunbe7ozk7LqRfTRlf5YPH88iE4UKzdg
 JXmbcHB2K4aQzfIW66OFPnl/4Cl+XxS/i5CR2NtWlB4/ANZBPoUr7QAS239OpC6u
 3uwv/qPmMe7p/lYMaGXSUpzD/MOCHP1HPN8/CWgdyK+Mdmctpqr0FYh1qXXm1Nuu
 v0SE3btHVIB5UfvImoOlV/RfCx3+TqxzqUU2erc0iD5VxlRfrqJEwJdJHOgRGxFU
 vflRxMQOshhyI7+Q7et0S0QlgK4HvGEHmBUwBsUbfyptIxbqpOLK8INC6N8qwGKZ
 gTuBxLNZ5yRE/NeOVe0cL2ooelfOlg7GKUI+gZbfzzQw8M5WZW9qEDS9y2wIuGey
 wBFJNzjKXSkrTxc6Hd136N7DX7PlMjiJhXP42s+7rXJguPvgk1oVyEuaX540+xX4
 HphXRC2QW0o0hCeFgP11Ai4oq/vRW1RFvdDimJjveJAv19bQNv0=
 =Wg/8
 -----END PGP SIGNATURE-----

Merge tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 vdso updates from Borislav Petkov:

 - Add getcpu support for the 32-bit version of the vDSO

 - Some smaller fixes

* tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix -Wmissing-prototypes warnings
  x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu
  selftests: Emit a warning if getcpu() is missing on 32bit
  x86/vdso: Provide getcpu for x86-32.
  x86/cpu: Provide the full setup for getcpu() on x86-32
  x86/vdso: Move VDSO image init to vdso2c generated code
2023-02-21 08:54:41 -08:00
Linus Torvalds
efebca0ba9 - Fix mixed steppings support on AMD which got broken somewhere along
the way
 
 - Improve revision reporting
 
 - Properly check CPUID capabilities after late microcode upgrade to
   avoid false positives
 
 - A garden variety of other small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzs/sACgkQEsHwGGHe
 VUonGw//RgIVCZIkuytiZesFsAXD3sn4Mmji7WoRZvu3XooA0idOo+7ujBeNcJGw
 aFGjf0K5b7eAfiREqTXPlFSymPid7aN+7cPJD7iURJ5UEoDXca1vVh9Jeq7lhvRL
 M5CErroStya17vFqU5pz50EcUwGcao/N3wY+0rERk8Rkqu864PgI+KahS2V2D2PU
 XolD4CH/+JZMAJPaTG5dSkSf3gJevW/owZ+F2oqKKYNlFsQ6aYd/JZYwIQ2X7W9T
 HdVYzeASZs0tfBEPOsZUSobmIlqUU/MziefDyUuTYbO1DPJ525787RLpRyubhG9k
 b/7DWUNymR56B8AUq/RV6YE/Dw2YpcrP3Eu0pSbD5xUfEy8eFCcIr+cUL5M9+I4W
 iCZtYYGypNbDQf5NRkubtQu8xIwEbjOZNv444kMMBimZGzt/WDEGMHqgRbKpJ2MQ
 F2HoBnNVC5O2BddS0ErTpQDWK8B/c0+S4L1ZTkbh/y9CNhzITZ10sLAEGQawvBEk
 PBYeCQ98m72ijLcecz0vvVO81UHGicqyY86OqeqRx0FbGO9cZJg+8BqyTLxsRTSW
 OgxtB/moURdanWAAOdxZ91yUw40CYWn7qXhYxilZDtGgkFT6sUdA126uMxLJ8u2v
 WiOHmj/ymszHhkJiahcSMaD8gRFnLQ59jNatHNa/5Jyw0mi330g=
 =z8rd
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loader updates from Borislav Petkov:

 - Fix mixed steppings support on AMD which got broken somewhere along
   the way

 - Improve revision reporting

 - Properly check CPUID capabilities after late microcode upgrade to
   avoid false positives

 - A garden variety of other small fixes

* tag 'x86_microcode_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/core: Return an error only when necessary
  x86/microcode/AMD: Fix mixed steppings support
  x86/microcode/AMD: Add a @cpu parameter to the reloading functions
  x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
  x86/microcode: Allow only "1" as a late reload trigger value
  x86/microcode/intel: Print old and new revision during early boot
  x86/microcode/intel: Pass the microcode revision to print_ucode_info() directly
  x86/microcode: Adjust late loading result reporting message
  x86/microcode: Check CPU capabilities after late microcode update correctly
  x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
  x86/microcode: Use the DEVICE_ATTR_RO() macro
  x86/microcode/AMD: Handle multiple glued containers properly
  x86/microcode/AMD: Rename a couple of functions
2023-02-21 08:47:36 -08:00
Linus Torvalds
aa8c3db40a - Add support for a new AMD feature called slow memory bandwidth
allocation.  Its goal is to control resource allocation in external slow
 memory which is connected to the machine like for example through CXL devices,
 accelerators etc
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzmf4ACgkQEsHwGGHe
 VUppKg//Tq+lHaMYO8aTvk4jgqbR9RVXJwPbtEOp2C0kSLs5QxBms/o21IXnxJ07
 tdbIGOrfJGlbzSWP8ywysRRQwpKlwltWUVAjMOFqEfzEURLL042qtHZ8nxGKSGrc
 IZFJLNTMyx1Zyjc7e9A/hANCOoQFoPHT8zHf1CNNo1LtzgHzNZG6kggLHh5tRKSz
 Xi7wFbYBtmttsyIA/iAQjYAU0O9MnmdnktUb7XdPSFtTIZ3Nyw90We4gwYueEPzD
 S/rQHKr8V7ROZMHXQ/BWpVWdcxGoHD8acUSVq8j20KW3W9/H8KL9TRVakvnf0aRW
 g0efxKXdTjTRO49GgD7FUL8x1JdAOXeZwQYDzKPqW/GRESRdpOvsaMwcLDCEpIXK
 PmEOVReklokJF0btFqaVYkY6wGE2KLKmp97g/RffuHdIeIomwI9lTpy9kyQsKakc
 yJ+VsE85BlBEVkHNt49qFClO1L98G3IgZTTt6//EGv0EJl8pELfsddsbjG5uXun+
 xFhr2i7gllQcV4B4HSFFdYRBLvZYnTfKlNR7Hs9pRJT7V28Jv2GURiCHBm4sRv9O
 k3FX7sxytH2syBBwU1NNrMRMo+KgjVZurJwiHpTRbb39K6uCgLk/wbXfWh2SovW1
 BRItz2T6LFu4bo6WIhakx31pNmH94P8vC0acO8LHECVji7qvXFM=
 =8hmj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for a new AMD feature called slow memory bandwidth
   allocation. Its goal is to control resource allocation in external
   slow memory which is connected to the machine like for example
   through CXL devices, accelerators etc

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a silly -Wunused-but-set-variable warning
  Documentation/x86: Update resctrl.rst for new features
  x86/resctrl: Add interface to write mbm_local_bytes_config
  x86/resctrl: Add interface to write mbm_total_bytes_config
  x86/resctrl: Add interface to read mbm_local_bytes_config
  x86/resctrl: Add interface to read mbm_total_bytes_config
  x86/resctrl: Support monitor configuration
  x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  x86/resctrl: Include new features in command line options
  x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
2023-02-21 08:38:45 -08:00
Linus Torvalds
1adce1b944 - Teach the static_call patching infrastructure to handle conditional
tall calls properly which can be static calls too
 
 - Add proper struct alt_instr.flags which controls different aspects of
   insn patching behavior
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzkAcACgkQEsHwGGHe
 VUpA3A//bALDnLosUQe/m8CTcj1AU12Y59fGInoLl5xArM3liOhRYWj9yu8+2r5N
 j+89yjWoiaogu/9B18pV0+VnBrFUbALZmHxec0+4VAWyMqYuTbqN28Nj/2cZiHdP
 I/9mwGu40I/Ira021D132EcdoZI7O/6bFlh+kEoAqxc7rsqhD5KKRMlrTTdEPVjH
 aRbWIuzqDWNhbi7IwfgEBIPLiQZQKmIdH5hsFMD6yOMIdMRL6CwKmXVg2M1Zp8ta
 5v2Aqgvu2nZYCIteP4GQck2AlUBlGR4ClGQQRII+U1o8c9dM0hfcIDgsbSYKvgrY
 ANm9MQJaF7MRomk9y4E0EHPZAJEMLKUgiQXMxWpER3O1GOKgZPlyzNSe0gRCiL6O
 NZWZ2cGtdhQMrko4EapE3GNryM1HoCY/QCuD1fCYwoc/pRBDhCxsSqjWUd8G/6wn
 s3S/mD0v3nmTrxHg8sWvqhKshsd7B9V0LSkTpHktz3soFIJGXTxbrtty0CIS61pM
 4iUMYB9SjunoEmdwC7+gCN3sCiRpRqfmIybqXdsW3d37QI+FM5aSBPw51xULubfY
 Wsxo8SkH+IMYxXmfbQuUppsGZ+1QHzU08+MrlvNxGHUjS1aMnsrFF/fbfbbCnWvX
 7hcyBPT0jxc9RPMNeKDm4ItapMMGxGdv6XiRmM8LiUtVG2fMaW4=
 =XUqC
 -----END PGP SIGNATURE-----

Merge tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm alternatives updates from Borislav Petkov:

 - Teach the static_call patching infrastructure to handle conditional
   tall calls properly which can be static calls too

 - Add proper struct alt_instr.flags which controls different aspects of
   insn patching behavior

* tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/static_call: Add support for Jcc tail-calls
  x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
  x86/alternatives: Introduce int3_emulate_jcc()
  x86/alternatives: Add alt_instr.flags
2023-02-21 08:27:47 -08:00
Linus Torvalds
0246725d73 - Add support for reporting more bits of the physical address on error,
on newer AMD CPUs
 
 - Mask out bits which don't belong to the address of the error being
   reported
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzUAEACgkQEsHwGGHe
 VUr22RAAh7fi3s8sDP4B2WBe1LPKZystZamxlLObBG2eLT7g0YmSKV12+bHCGf/B
 nGqz9iy+e/T1Khxv0gdEyuENwzuitXgiEOYgB4u70HimWy5422ZCzn1EiOMFtyST
 g0ehOR+tU84YwMVR40ui3spI1DHgeVPqVBLHBARZ1OAaA58N8eVREC6MqJAeAzIU
 +VYiBbn69quECTuU1P7yaT8NDnbm5G6pA1dhKLc7vLl9QWzoW1yWLLcp+oGFN6B8
 rcGDKEDK1OYtdHScRCfhFrznkeYP6SVnSt4wlAgX+HVGPoMpvq8nJygxCWdE0yjd
 aQGhdcVJkQlSqm1iJUv0MK9nkolqXVVSVTurpHunAq7ctul6Qm/X+fsfwBgSIXXn
 Gdj3in374MLWCz/xGqeBS8IiiPxGxJA9s350jyk02LK6Np6sXeuc4PpR66+6FAKQ
 Ypen+uWJ6oBof04bW7DBK0R14atA8EpOOLUrrGIsSkNSEIjLaCipMZOpRCbOw76N
 bXcdnKKsaEDjKtHClvx/vZXklfzWk0OgF8qtY0nGF+khvDAi3pQaIIlCehf0Qemh
 6j00TqIYBCXa0kuKktdPzVJSM7A7TZ5ftboa1IPhE+GYrFFee/VJ3yfgqz102FWI
 RJsY8JXt+EP3VMSOQYqQ5KzcLBJ2uDiRYtgUo4P1CITNpRfZEMc=
 =e9v9
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add support for reporting more bits of the physical address on error,
   on newer AMD CPUs

 - Mask out bits which don't belong to the address of the error being
   reported

* tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Mask out non-address bits from machine check bank
  x86/mce: Add support for Extended Physical Address MCA changes
  x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
2023-02-21 08:04:51 -08:00
Linus Torvalds
89f5349e06 Changes in this cycle:
- Simplify add_rtc_cmos()
 
  - Use strscpy() in the mcelog code
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzdU8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gS6w/8DFsYUIK4CGEtG0MYH+DUVsz/zyo4pejM
 yukMpKxXMJKi7pZ6k+0he9LOawa2WeK/hzzJh0zP3EzJtF1RR831XSsGRNPu3OTQ
 Q6mAuFdlLC1EAgJs6muqhIF5/bxfti6pFpZBN8Dwi/9VPpUQwayOgaJXysiRA2aN
 r/czEgp5fGgExC4QLE6HIPIzhsyjUlngH2F4xNeO13cS6S0T3Ns1xcSJs9jJfwMW
 2Vx0FCSo/cj8Hdr10NGEJNrqCzN9yvUuZuQ4utp6yf03zWyP1L4c2MB0E+aw6d1c
 ygpYWm400vlEFHfT8x9UVnybR5wABG4GP8JNtSBHASk7rNPKl5cKfOIPHzdsCnFO
 bHh1Xc4gduFVB8nUilUlcvoseum8GaYOqhi6ov2hwq+n47uam9H5QlCWn7OyLBbW
 88Ajg+wqNxG/R3mhyPslXDMr/dccQ9mcZSxbPDX14LpG8bWAjvM3yP43N8w6myVs
 1Br8Lsbf8lm5jiJ5hv2GBGQa9eDA0qLBFnkvBTe9zx7AV/K/KnTUXlK2DdeZVfO8
 eqgyTrXANyyJqTC/s2GAOLFwySRZFx9EHw6Kg8cmjoG9o8VCpXljQ8qj71ZOtFlo
 xBDlmg4Y6czRZC2kQEFC1kA30nLw+2UAnOEwr11JgGod9K+DqFtLKnVVEsrtxmGH
 E7ccV3QRc/Q=
 =Nuzs
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform update from Ingo Molnar:

 - Simplify add_rtc_cmos()

 - Use strscpy() in the mcelog code

* tag 'x86-platform-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/dev-mcelog: use strscpy() to instead of strncpy()
  x86/rtc: Simplify PNP ids check
2023-02-20 19:04:54 -08:00
Linus Torvalds
2e0ddb34e5 Updates in this cycle:
- Replace zero-length array in struct xregs_state with flexible-array member,
    to help the enabling of stricter compiler checks.
 
  - Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzc+cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gJ2RAAni5xUIObYPeEH4gEL0hXaLDXXYM+iyvK
 Fp4kI/9HLdgEOYjIFZG1H6nOlSDOSwFHg2mhpxXRujByAuohGT8cU2ax6Zh3ya1g
 qJy5UepKa+Twvoifts4mmClJrK+tVCWfKthoqCc5C6D2lGroCplvp5hE2/KIs/9d
 QcnpKaFAapYNat1rzLtBLQUovxVdd8yow5+pTEJhUnhF6EYMIn4M5cPncINPdx98
 AnQi1v6pWjhgK5gP7G1maz7J0khr0nNzd17Dtx+iknLX4cVxa/Hx+wQtsom80Xwt
 aeIwPYiAYTFs0ZjVrEtwwtV0ub29viHeS2x1v8nkLSA0LJAVrIX/1yuIMbUuFqS8
 PW/mSKkA0phhtVbe6SYtUyXv8zLysWenFre6nD7PNpWFmjOyNUSJcw/clUYwBYVX
 LbtwaOwI+shbd/BEGwAPC4235XOl1wq/g9gvV4bt3vTi0pE8tke0oHVkIG6bCAK1
 +CB5pnAHfZRNC8H1CkyurbmHPUOMaS/PK4LpvPK1/G03Fe4s42SUNUd7DWCevxq3
 agvfOmiH/XvoLdJejYmqeLAz700gBntYkNvBg2P1f30EVj/wGvVDEen8MBGMaIDe
 5bef3pIUfP5Ye3DdzFRcjI5OjbijNHxQ7MnIqqW3Uuz2ujdq8ME88l8eUxJE7JZN
 KZv7r+asPEM=
 =az43
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Ingo Molnar:

 - Replace zero-length array in struct xregs_state with flexible-array
   member, to help the enabling of stricter compiler checks.

 - Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads.

* tag 'x86-fpu-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
  x86/fpu: Replace zero-length array in struct xregs_state with flexible-array member
2023-02-20 18:50:02 -08:00
Linus Torvalds
8a68bd3e9f Changes in this cycle:
- Clean up the signal frame layout tests
 
  - Suppress KMSAN false positive reports in arch_within_stack_frames()
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzctERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gYdxAArL7aQy1GW51Wo+HjQXm6ONalHpb9hgjz
 rTUp+eFPhlPq6dtPhcNQoT3eyEzlgoIHw5b2gDByLzU+kKROaGW6T/ao3jxKtR2+
 YVj+P/9jCx/Kak+3Fw2rAZjpRIJn/iWEaPVIXVGj9wAml8aeJWy14tOTUSaS0fIB
 EGAtpXaPCg8/DtU2q4mSwcW0n2bwqFs0jYJRaSxuZo16woQffkMpAF97P/q1TKNs
 t5Bvx40Hio9kijYuZmSrcygAtOxlL+rcHXsJofPmlOlt/N5yopszrwfKUGzwN1b8
 BTOPac5gGbhfY/pKC+rB8nf2bDdYEdqBW9hYhy7AGiAAHOy4xoULxQXxicANRU6+
 2m1q/jmYah282AOOUAy8kfx9DmEZbXmZcefkAJfjhobJIwYXxa4tKayDowUph2LP
 c3ronMZoLq1tCZ/rCVeOfijOu+cDMJy2gI6lLBxFVxsIHQigWfCEWcCp0z+10FAJ
 puUGIZJnA+Gchf1zrIEcoAYVw3bkFZ/Mx6cE8nogLO9XT76S5R94Z/sE70gzX439
 gbsE1GnffyFwF2o2ClzoCobHs50JSMQGPbI7hJy+MyaRqoiyMaegNE1+unUAHR4O
 dJifzhr8LyBr4maO8AbeNtreIlW67CpSEUM/TY+KithkRtkZOCUJO9iDsggKH96n
 E3+WC9zqUgg=
 =T5pv
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Ingo Molnar:

 - Clean up the signal frame layout tests

 - Suppress KMSAN false positive reports in arch_within_stack_frames()

* tag 'x86-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Suppress KMSAN reports in arch_within_stack_frames()
  x86/signal/compat: Move sigaction_compat_abi() to signal_64.c
  x86/signal: Move siginfo field tests
2023-02-20 18:40:45 -08:00
Linus Torvalds
35011c67c8 Changes in this cycle:
- Robustify/fix calling startup_{32,64}() from the decompressor code,
    and removing x86 quirk from scripts/head-object-list.txt as
    a result.
 
  - Do not register processors that cannot be onlined for x2APIC
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzcNsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gFjhAAqxnVl1X413IK9sd4C56wWdoLlRo9uGvO
 HtAYK1SRzibTOrFn+ByBhugYzYPyxIx634rM6hyp4nkVEnbgCXQ+Qc1xOwjyW8fh
 gxR3FCxsQiqajg7/1DOOSoMc/rc3adU73RHCWTjcHV/Zo7KEtvVa6AFvMTd1xzt9
 eMPqi7wsPflbdUV9wvf6cKkFPe3Nm3P1hOlUDHGmYZkDw30N8UlZmxvegwrBFDdV
 SpiJ0ZLV90NGJ6k6O3XSd4pVDxMn9DlYd0v/0r+YAT56hiRhefSKR2/jQntutZqp
 YlyZYjvwUjwEgOdUWPPRbndWWEfFsE2XQQclr4L+ZLQ/Gm8jTsT2b/IvXBmF4FzV
 0kzjNdhkPObx3X6UQZ47r6J3x8SWA9qZ6JH+uqCd6w/UW1KIiMBZ2kuIXvJn6eSr
 xFLabjPPeOeRXFpiQJjIZ31m7i3JlQbIsfb8IIxI1D55nEkNywjk9VqlLEVw23qD
 p93l0+ehpnZ2YCjV0kts/EXMikSmVZorA5wkTzEmG5ER+2BuIDin+wuGPawXrKew
 QCa2X7GoVmxf81Rcz7f/E+JnYcSTQ6AQzFkOxe3zb97bnRsckM/87buC0GktcPjW
 C8iy3yZzEIhRj2ilKEZLl7jIK59B4jReUKJx+vsxk2k2p5fuRdMkMtPfIZDBwVHQ
 PzRZGSDY4FI=
 =p3z1
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

 - Robustify/fix calling startup_{32,64}() from the decompressor code,
   and removing x86 quirk from scripts/head-object-list.txt as a result.

 - Do not register processors that cannot be onlined for x2APIC

* tag 'x86-boot-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
  scripts/head-object-list: Remove x86 from the list
  x86/boot: Robustify calling startup_{32,64}() from the decompressor code
2023-02-20 18:32:55 -08:00
Linus Torvalds
1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Linus Torvalds
a2f0e7eee1 The latest perf updates in this cycle are:
- Optimize perf_sample_data layout
  - Prepare sample data handling for BPF integration
  - Update the x86 PMU driver for Intel Meteor Lake
  - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
    discovery breakage
  - Fix the x86 Zhaoxin PMU driver
  - Cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzaHgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jYQg/+KRfobCevMQlZVnz09T3SsJ4ahJ587BL6
 g2C6kobyUNfeChpFVroBkTR+yCb6Mq4xGr2nda9+2E978BYu9eanpx/u/bXNQ6NU
 6YhLwgRrlFXonYn07kFfUJeELZ0W+zpPvymEN1KhTQWcrgXDfXRt2VfMwNsVxGRF
 ZRyCWK+UOzSMU22FtW3I/xVLBB0vio9Y6wRC5QOpDVW5YtGwQGust7GJ53JPK43J
 m2soJvWORauT+v0aqc7ggOtKd6pahVoXrDrbktxtq9N0ZGI+PubVCGevex++cXm/
 B3QSf6VcMMuU6pfzxiEwRa8Whrc3XFeSDEfvMjC5v3becGNkdNBnGOJzYprwgRZJ
 irb6/dSrv5P2lj6WphsO1Wzcm7EoWh8M7DVOMh/13Y/oODRdOrv48112Don9UURC
 EPyvzAzizqdwdDopUmfiqUwuAXqb8uPZqCgmlz/NJkVz1/ijlfrmLgeDuf0vI7Aq
 HznzzRwjFHzyCH7D+rtonFh3JDaqgaouY76tpC5yTtzKbZPlFT8kzeCvqkTMnGgH
 czZnSNc/kBup0HDkNSlthK+TyrMXWKeVa8KQSY1E0NJHO4IBBCMzZywSoAaeofQK
 hqfQyofX9XHmuHhCA4yIfv1XkZGlBTxpPAyDdHjgs9iJTsodSYMs8ESY08eW8DXn
 Ld/35O6SylM=
 =ztUT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Optimize perf_sample_data layout

 - Prepare sample data handling for BPF integration

 - Update the x86 PMU driver for Intel Meteor Lake

 - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
   discovery breakage

 - Fix the x86 Zhaoxin PMU driver

 - Cleanups

* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86/intel/uncore: Add Meteor Lake support
  x86/perf/zhaoxin: Add stepping check for ZXC
  perf/x86/intel/ds: Fix the conversion from TSC to perf time
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
  perf/x86/uncore: Add a quirk for UPI on SPR
  perf/x86/uncore: Ignore broken units in discovery table
  perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
  perf/x86/uncore: Factor out uncore_device_to_die()
  perf/core: Call perf_prepare_sample() before running BPF
  perf/core: Introduce perf_prepare_header()
  perf/core: Do not pass header for sample ID init
  perf/core: Set data->sample_flags in perf_prepare_sample()
  perf/core: Add perf_sample_save_brstack() helper
  perf/core: Add perf_sample_save_raw_data() helper
  perf/core: Add perf_sample_save_callchain() helper
  perf/core: Save the dynamic parts of sample data size
  x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
  perf/core: Change the layout of perf_sample_data
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
  ...
2023-02-20 17:29:55 -08:00
Linus Torvalds
6e649d0856 Updates for this cycle were:
- rwsem micro-optimizations
  - spinlock micro-optimizations
  - cleanups, simplifications
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzZkURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gBvA/6A5RMmSdOIVojmiwUYvZInA/Kpm2wuW0q
 bQWW9maLb96JMpj3FB5Xs5U993WiF0Gt9aGHoND9V2wOYbnv01ElCKKgsw7zLnXb
 c++txpmD+HoUGp94H8T2nA3szPLR7OpPpLmfjTWHKeWQRTStJobTTqi5jVTUZT37
 92MZ2tVzapQJq5VESk0C+0FBFDobh0gTX8hwkEj83ubXK4rC071/gJD4JHZt4nWN
 Up9YGNoNvw+ns7upo2C1XJ4H4ucFoCXT2smH4Oh0gk8Cfs6oP1k5H8J5aJQ+23fT
 EOWzkk0vJdpukNXI1+4G4KMwCO6zv+xVxXpEBizEeTgKWwbJpgBeGrisheAUyMHT
 bwfztsn+NQET11NsccmtRzspscUT42Nc+FUW0KeR2LiBKZhuD6l1Tac3w2HolycA
 2YjMQx3ATOEnMFgv4jGlldlasIAnYj0qitw6wCGqkJSvrC3au/LfcBHn45SxkBWc
 KZV1Oj26aH1hDxYSLyZRmFEvf/46D9CHmv8ReuFbmM6FIYwL+go+Odw/MHMFAbZR
 aP9YR4e94p6WmaNwMqzozP+wN67E4TME2vG6+T1n/szKDogoBlcn/wl053pkcHa/
 CsjELY82/CRRrDgWSnSKUZEFvnnBujyEiSz7pTZCzdBTMc/EcxK5CHOSyN23x+LI
 TvvxFn7KM/o=
 =yTly
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - rwsem micro-optimizations

 - spinlock micro-optimizations

 - cleanups, simplifications

* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vduse: Remove include of rwlock.h
  locking/lockdep: Remove lockdep_init_map_crosslock.
  x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
  x86/PAT: Use try_cmpxchg() in set_page_memtype()
  locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
  locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
  locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  locking/qspinlock: Micro-optimize pending state waiting for unlock
2023-02-20 17:18:23 -08:00
Yang Jihong
f1c97a1b4e x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.

For example, assume that register two kprobes whose addresses are
<func+9> and <func+11> in "func" function.
The original code of "func" function is as follows:

   0xffffffff816cb5e9 <+9>:     push   %r12
   0xffffffff816cb5eb <+11>:    xor    %r12d,%r12d
   0xffffffff816cb5ee <+14>:    test   %rdi,%rdi
   0xffffffff816cb5f1 <+17>:    setne  %r12b
   0xffffffff816cb5f5 <+21>:    push   %rbp

1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1.
  After the optimization, "func" code changes to:

   0xffffffff816cc079 <+9>:     push   %r12
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

Now op1->flags == KPROBE_FLAG_OPTIMATED;

2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1->optinsn.copied_insn,
                                      // jump address = <func+14>

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p->flags |= KPROBE_FLAG_DISABLED;  // op1->flags ==  KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
        return;
      p->kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 <+9>:     int3
   0xffffffff816cc07a <+10>:    push   %rsp
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

5. if call "func", int3 handler call setup_detour_execution:

  if (p->flags & KPROBE_FLAG_OPTIMIZED) {
    ...
    regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee <func+14>

However, <func+14> is not a valid start instruction address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Yang Jihong
868a6fc0ca x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
Since the following commit:

  commit f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp->flags
has KPROBE_FLAG_OPTIMIZED and op->list is not empty.

The __recover_optprobed_insn check logic is incorrect, a kprobe in the
unoptimizing state may be incorrectly determined as unoptimizing.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported for invoking in
arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Nuno Das Neves
b14033a3e6 x86/hyperv: Fix hv_get/set_register for nested bringup
hv_get_nested_reg only translates SINT0, resulting in the wrong sint
being registered by nested vmbus.

Fix the issue with new utility function hv_is_sint_reg.

While at it, improve clarity of hv_set_non_nested_register and hv_is_synic_reg.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Jinank Jain <jinankjain@linux.microsoft.com>
Link: https://lore.kernel.org/r/1675980172-6851-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-02-16 14:32:37 +00:00
Paolo Bonzini
33436335e9 KVM/riscv changes for 6.3
- Fix wrong usage of PGDIR_SIZE to check page sizes
 - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
 - Redirect illegal instruction traps to guest
 - SBI PMU support for guest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH
 LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M
 wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399
 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG
 Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd
 ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH
 IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i
 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK
 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2
 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse
 uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq
 yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ=
 =spUL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
2023-02-15 12:33:28 -05:00
Paolo Bonzini
27b025ebb0 KVM VMX changes for 6.3:
- Handle NMI VM-Exits before leaving the noinstr region
 
  - A few trivial cleanups in the VM-Enter flows
 
  - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
    EPTP switching (or any other VM function) for L1
 
  - Fix a crash when using eVMCS's enlighted MSR bitmaps
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsL4USHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5fvQP/jOhCos/8VrNPJ5vtVkELSgbefbLFPbR
 ThCefVekbnbqkJ4PdxLFOw/nzK+NREnoet02lvLXIeBt243qMeDWbNuGncZ+rqqX
 ohJ6ALIS7ag2egG2ZllM4KTV1jWnm7bG6C8QsJwrFuG7wHe69lPYNnv34OD4l8QO
 PR7reXBl7iyqQehF0rCqkYvtpv1QI+VlVd3ODCg/0vaZX4AOPZI6X1hX2+myUfTy
 LI1oQ88SQ0ptoH5o7xBtNQw7Zknm/SP82Djzpg/1Wpr3cZv4g/1vblNcxpU/D+3I
 Cp8xM42c82GIX1Zk6uYfsoCyQZ7Bd38llbgcLM5Fi0F2IuKJKoN1iuH77RJ1Juqf
 v+SkRmFAr5Uf3UYVp4ck4FxTRH58ooDPeIv2jdaguyE9NgRfbVsghn6QDC+gkZMx
 z4xdwCBXgsAAkhJwOLqDy98ZrQZoy1kxlmG2jLIe5RvOzI4WQDUJkJAd+u0dBD/W
 U7PRqbqfJFtvvKN784BTa4CO8m+KwaNz4Xrwcg4YtBnKBgDU/cxFUZnnSEHdi9Pr
 tVf462dfReK3slK3rkPw5HlacFdQsgfX6+VX1eeFKY+gE88TXoZks/X7ma24MXEw
 hOu6VZMat1NjkfcgUcMffBVZlejK7C6oXTeVmqVKsx6hUs+zzqxpQ05AT4TMbAUI
 gPTHJyy9ByHO
 =WXGG
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-vmx-6.3' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.3:

 - Handle NMI VM-Exits before leaving the noinstr region

 - A few trivial cleanups in the VM-Enter flows

 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1

 - Fix a crash when using eVMCS's enlighted MSR bitmaps
2023-02-15 12:23:19 -05:00
Rafael J. Wysocki
7e71a13353 Merge branches 'pm-cpuidle', 'pm-core' and 'pm-sleep'
Merge cpuidle updates, PM core updates and changes related to system
sleep handling for 6.3-rc1:

 - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).

 - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
   cpuidle driver is selected and replace a default_idle() call in that
   driver with arch_cpu_idle() which allows MWAIT to be used (Li
   RongQing).

 - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
   Bityutskiy).

 - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
   avoid randconfig build failures (Arnd Bergmann).

 - Make kobj_type structures used in the cpuidle sysfs interface
   constant (Thomas Weißschuh).

 - Make the cpuidle driver registration code update microsecond values
   of idle state parameters in accordance with their nanosecond values
   if they are provided (Rafael Wysocki).

 - Make the PSCI cpuidle driver prevent topology CPUs from being
   suspended on PREEMPT_RT (Krzysztof Kozlowski).

 - Document that pm_runtime_force_suspend() cannot be used with
   DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).

 - Add EXPORT macros for exporting PM functions from drivers (Richard
   Fitzgerald).

 - Drop "select SRCU" from system sleep Kconfig (Paul E. McKenney).

 - Remove /** from non-kernel-doc comments in hibernation code (Randy
   Dunlap).

* pm-cpuidle:
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  intel_idle: add Emerald Rapids Xeon support
  cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle()
  cpuidle-haltpoll: select haltpoll governor
  cpuidle: teo: Introduce util-awareness
  cpuidle: teo: Optionally skip polling states in teo_find_shallower_state()

* pm-core:
  PM: Add EXPORT macros for exporting PM functions
  PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND

* pm-sleep:
  PM: sleep: Remove "select SRCU"
  PM: hibernate: swap: don't use /** for non-kernel-doc comments
2023-02-15 15:59:48 +01:00
Srivatsa S. Bhat (VMware)
fcb3a81d22 x86/hotplug: Remove incorrect comment about mwait_play_dead()
The comment that says mwait_play_dead() returns only on failure is a bit
misleading because mwait_play_dead() could actually return for valid
reasons (such as mwait not being supported by the platform) that do not
indicate a failure of the CPU offline operation. So, remove the comment.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230128003751.141317-1-srivatsa@csail.mit.edu
2023-02-14 23:44:34 +01:00
Linus Torvalds
82eac0c830 Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
 transitions out of C0 state, the other thread gets access to twice as many
 entries in the RSB, but unfortunately the predictions of the now-halted
 logical processor are not purged.  Therefore, the executing processor
 could speculatively execute from locations that the now-halted processor
 had trained the RSB on.
 
 The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
 when context switching to the idle thread. However, KVM allows a VMM to
 prevent exiting guest mode when transitioning out of C0 using the
 KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change this
 behavior. To mitigate the cross-thread return address predictions bug,
 a VMM must not be allowed to override the default behavior to intercept
 C0 transitions.
 
 These patches introduce a KVM module parameter that, if set, will prevent
 the user from disabling the HLT, MWAIT and CSTATE exits.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPrvAAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMQAgf8CirL+yrngzQ+39UTFQgXj3IS1UyR
 o2mF39w3bVlQhLNf5MBHF965FUeOV/8A16x73hJGjhiiuisphtQWS/6xKR6uYbq7
 0Qi821skqN6XpRsWTWqFHMsdY+n0skr8QeXG4k/GJu7Ghb3tqs4eTGgnf2WBfI8/
 K1UgTmjd9+ikM5gKZoVLpcqZnti0gx3lM+cvZGdfrIUaXB+i+hNd2NfRTiGsTOiK
 fX7vZtLvOeje2TPoKLhzekTbh8kTU07HRWID9aVXT8bLy6Zd6tg2CHlv11noKpwv
 DFVV+RsJ1SiAQYSwT+4IvWfIG4oq4onBQ972g2a27pP2cxF+38GXzt4NQw==
 =xscg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Certain AMD processors are vulnerable to a cross-thread return address
  predictions bug. When running in SMT mode and one of the sibling
  threads transitions out of C0 state, the other thread gets access to
  twice as many entries in the RSB, but unfortunately the predictions of
  the now-halted logical processor are not purged. Therefore, the
  executing processor could speculatively execute from locations that
  the now-halted processor had trained the RSB on.

  The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
  when context switching to the idle thread. However, KVM allows a VMM
  to prevent exiting guest mode when transitioning out of C0 using the
  KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change
  this behavior. To mitigate the cross-thread return address predictions
  bug, a VMM must not be allowed to override the default behavior to
  intercept C0 transitions.

  These patches introduce a KVM module parameter that, if set, will
  prevent the user from disabling the HLT, MWAIT and CSTATE exits"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
  KVM: x86: Mitigate the cross-thread return address predictions bug
  x86/speculation: Identify processors vulnerable to SMT RSB predictions
2023-02-14 09:17:01 -08:00
Johan Hovold
bc1bc1b309 x86/ioapic: Use irq_domain_create_hierarchy()
Use the irq_domain_create_hierarchy() helper to create the hierarchical
domain, which both serves as documentation and avoids poking at
irqdomain internals.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-13-johan+linaro@kernel.org
2023-02-13 19:31:24 +00:00
Thomas Gleixner
ab407a1919 Clocksource watchdog commits for v6.3
This pull request contains the following:
 
 o	Improvements to clocksource-watchdog console messages.
 
 o	Loosening of the clocksource-watchdog skew criteria to match
 	those of NTP (500 parts per million, relaxed from 400 parts
 	per million).  If it is good enough for NTP, it is good enough
 	for the clocksource watchdog.
 
 o	Suspend clocksource-watchdog checking temporarily when high
 	memory latencies are detected.	This avoids the false-positive
 	clock-skew events that have been seen on production systems
 	running memory-intensive workloads.
 
 o	On systems where the TSC is deemed trustworthy, use it as the
 	watchdog timesource, but only when specifically requested using
 	the tsc=watchdog kernel boot parameter.  This permits clock-skew
 	events to be detected, but avoids forcing workloads to use the
 	slow HPET and ACPI PM timers.  These last two timers are slow
 	enough to cause systems to be needlessly marked bad on the one
 	hand, and real skew does sometimes happen on production systems
 	running production workloads on the other.  And sometimes it is
 	the fault of the TSC, or at least of the firmware that told the
 	kernel to program the TSC with the wrong frequency.
 
 o	Add a tsc=revalidate kernel boot parameter to allow the kernel
 	to diagnose cases where the TSC hardware works fine, but was told
 	by firmware to tick at the wrong frequency.  Such cases are rare,
 	but they really have happened on production systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPhnhkTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jClDD/9gTo62MakVQz2wzBRBcWunzX4BAfy2
 2ORqZYqq8cJ4ccFVWtSq7gZ+0bxiT+J4jaVyJpmUPzaiCSfNUT+GLjWyLGzF9Xq+
 xLWpFJOhFhKYjYN2m1ottuQ81V7aTlorC8AJt/o+oCJFGUCb/heg/UrmoZ6DweHw
 H7uXS9yenKdKgYoMENW+8IVsy16sT4D5Fe8XAD/2J6vBBUbgBzKWhi8XSgSHB/Xw
 GCP4UfXVGl5QRG9Xu4ZgrFV1t4azxtmdBghFm7/Kep/j6ttSY78yoS43AbI57bhD
 fWB5mfAQvO+Zo5/9rLjcDzeZCp/PSdARD41aycPMiei08K278tIN9T/fmfSoG6rV
 lVRdFxTHrQcqc9d+g+mGASQBezCF8pxonm9HYLBpNjyfYHnKV70SPXywO4oqAJ1I
 7dCm+uv3Y8KaJdVnPUWOHJjvQLx9NWK5/pXBYjsYnLR+69EVmGDgPZ+/ulQxkWBj
 DtrQgs+sHQ8gngNpAilxuu/lrUXzrC8N4mtxXKBFQoCPYQMFBkr9S+aAEHIgZT9H
 1dWwR1QxeR5uxt7U+3DmTyJ1XKfYjDyyScesILlLMLbdKgZtTS5wGaK4QdJ3QW2z
 z4zqPDccWDDZKZy9W4QBnFBx6Rn49C8xThy7f6Loc+2cKAT10hrEmRJsn79AOCDc
 6hV0S2U9a6ypQg==
 =OWY2
 -----END PGP SIGNATURE-----

Merge tag 'clocksource.2023.02.06b' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into timers/core

Pull clocksource watchdog changes from Paul McKenney:

     o	Improvements to clocksource-watchdog console messages.

     o	Loosening of the clocksource-watchdog skew criteria to match
     	those of NTP (500 parts per million, relaxed from 400 parts
     	per million).  If it is good enough for NTP, it is good enough
     	for the clocksource watchdog.

     o	Suspend clocksource-watchdog checking temporarily when high
     	memory latencies are detected.	This avoids the false-positive
     	clock-skew events that have been seen on production systems
     	running memory-intensive workloads.

     o	On systems where the TSC is deemed trustworthy, use it as the
     	watchdog timesource, but only when specifically requested using
     	the tsc=watchdog kernel boot parameter.  This permits clock-skew
     	events to be detected, but avoids forcing workloads to use the
     	slow HPET and ACPI PM timers.  These last two timers are slow
     	enough to cause systems to be needlessly marked bad on the one
     	hand, and real skew does sometimes happen on production systems
     	running production workloads on the other.  And sometimes it is
     	the fault of the TSC, or at least of the firmware that told the
     	kernel to program the TSC with the wrong frequency.

     o	Add a tsc=revalidate kernel boot parameter to allow the kernel
     	to diagnose cases where the TSC hardware works fine, but was told
     	by firmware to tick at the wrong frequency.  Such cases are rare,
     	but they really have happened on production systems.

Link: https://lore.kernel.org/r/20230210193640.GA3325193@paulmck-ThinkPad-P17-Gen-1
2023-02-13 19:28:48 +01:00
Josh Poimboeuf
ffb1b4a410 x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the
instruction pointer should be taken literally (like for most interrupts
and exceptions) rather than decremented (like for call stack return
addresses) when used to find the next ORC entry.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
2023-02-11 12:37:51 +01:00
Borislav Petkov (AMD)
6b8d5dde5b x86/tsc: Do feature check as the very first thing
Do the feature check as the very first thing in the function. Everything
else comes after that and is meaningless work if the TSC CPUID bit is
not even set. Switch to cpu_feature_enabled() too, while at it.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11 10:44:07 +01:00
Borislav Petkov (AMD)
8fe6d84947 x86/tsc: Make recalibrate_cpu_khz() export GPL only
A quick search doesn't reveal any use outside of the kernel - which
would be questionable to begin with anyway - so make the export GPL
only.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11 10:44:07 +01:00
Borislav Petkov (AMD)
851026a2bf x86/cacheinfo: Remove unused trace variable
15cd8812ab ("x86: Remove the CPU cache size printk's") removed the
last use of the trace local var. Remove it too and the useless trace
cache case.

No functional changes.

Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de
Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11 10:43:31 +01:00
Tom Lendacky
be8de49bea x86/speculation: Identify processors vulnerable to SMT RSB predictions
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.

The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 06:43:03 -05:00
Suren Baghdasaryan
1c71222e5f mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:39 -08:00
Ard Biesheuvel
93be2859e2 efi: x86: Wire up IBT annotation in memory attributes table
UEFI v2.10 extends the EFI memory attributes table with a flag that
indicates whether or not all RuntimeServicesCode regions were
constructed with ENDBR landing pads, permitting the OS to map these
regions with IBT restrictions enabled.

So let's take this into account on x86 as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org> # ibt_save() changes
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2023-02-09 19:30:54 +01:00
Nadav Amit
ae052e3ae0 x86/kprobes: Fix 1 byte conditional jump target
Commit 3bc753c06d ("kbuild: treat char as always unsigned") broke
kprobes.  Setting a probe-point on 1 byte conditional jump can cause the
kernel to crash when the (signed) relative jump offset gets treated as
unsigned.

Fix by replacing the unsigned 'immediate.bytes' (plus a cast) with the
signed 'immediate.value' when assigning to the relative jump offset.

[ dhansen: clarified changelog ]

Fixes: 3bc753c06d ("kbuild: treat char as always unsigned")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230208071708.4048-1-namit%40vmware.com
2023-02-08 12:03:27 -08:00
Paul E. McKenney
0051293c53 clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
Unconditionally enabling TSC watchdog checking of the HPET and PMTMR
clocksources can degrade latency and performance.  Therefore, provide
a new "watchdog" option to the tsc= boot parameter that opts into such
checking.  Note that tsc=watchdog is overridden by a tsc=nowatchdog
regardless of their relative positions in the list of boot parameters.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
2023-02-06 16:38:30 -08:00
Sebastian Andrzej Siewior
717cce3bdc x86/cpu: Provide the full setup for getcpu() on x86-32
setup_getcpu() configures two things:

  - it writes the current CPU & node information into MSR_TSC_AUX
  - it writes the same information as a GDT entry.

By using the "full" setup_getcpu() on i386 it is possible to read the CPU
information in userland via RDTSCP() or via LSL from the GDT.

Provide an GDT_ENTRY_CPUNODE for x86-32 and make the setup function
unconditionally available.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Link: https://lore.kernel.org/r/20221125094216.3663444-2-bigeasy@linutronix.de
2023-02-06 15:48:54 +01:00
Borislav Petkov (AMD)
f33e0c893b x86/microcode/core: Return an error only when necessary
Return an error from the late loading function which is run on each CPU
only when an error has actually been encountered during the update.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-5-bp@alien8.de
2023-02-06 13:41:31 +01:00
Borislav Petkov (AMD)
7ff6edf4fe x86/microcode/AMD: Fix mixed steppings support
The AMD side of the loader has always claimed to support mixed
steppings. But somewhere along the way, it broke that by assuming that
the cached patch blob is a single one instead of it being one per
*node*.

So turn it into a per-node one so that each node can stash the blob
relevant for it.

  [ NB: Fixes tag is not really the exactly correct one but it is good
    enough. ]

Fixes: fe055896c0 ("x86/microcode: Merge the early microcode loader")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org> # 2355370cd9 ("x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter")
Cc: <stable@kernel.org> # a5ad92134b ("x86/microcode/AMD: Add a @cpu parameter to the reloading functions")
Link: https://lore.kernel.org/r/20230130161709.11615-4-bp@alien8.de
2023-02-06 13:40:16 +01:00
Borislav Petkov (AMD)
a5ad92134b x86/microcode/AMD: Add a @cpu parameter to the reloading functions
Will be used in a subsequent change.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-3-bp@alien8.de
2023-02-06 12:14:20 +01:00
Borislav Petkov (AMD)
2355370cd9 x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
It is always the BSP.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-2-bp@alien8.de
2023-02-06 11:13:04 +01:00
Song Liu
0c05e7bd2d livepatch,x86: Clear relocation targets on a module removal
Josh reported a bug:

  When the object to be patched is a module, and that module is
  rmmod'ed and reloaded, it fails to load with:

  module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

  The livepatch module has a relocation which references a symbol
  in the _previous_ loading of nfsd. When apply_relocate_add()
  tries to replace the old relocation with a new one, it sees that
  the previous one is nonzero and it errors out.

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Originally-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230125185401.279042-2-song@kernel.org
2023-02-03 11:28:22 +01:00
Song Liu
bbb93362a4 x86/module: remove unused code in __apply_relocate_add
This "#if 0" block has been untouched for many years. Remove it to clean
up the code.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230125185401.279042-1-song@kernel.org
2023-02-03 11:27:23 +01:00
Paul E. McKenney
efc8b329c7 clocksource: Verify HPET and PMTMR when TSC unverified
On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
TSC is disabled.  This works well much of the time, but there is the
occasional production-level system that meets all of these criteria, but
which still has a TSC that skews significantly from atomic-clock time.
This is usually attributed to a firmware or hardware fault.  Yes, the
various NTP daemons do express their opinions of userspace-to-atomic-clock
time skew, but they put them in various places, depending on the daemon
and distro in question.  It would therefore be good for the kernel to
have some clue that there is a problem.

The old behavior of marking the TSC unstable is a non-starter because a
great many workloads simply cannot tolerate the overheads and latencies
of the various non-TSC clocksources.  In addition, NTP-corrected systems
sometimes can tolerate significant kernel-space time skew as long as
the userspace time sources are within epsilon of atomic-clock time.

Therefore, when watchdog verification of TSC is disabled, enable it for
HPET and PMTMR (AKA ACPI PM timer).  This provides the needed in-kernel
time-skew diagnostic without degrading the system's performance.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <x86@kernel.org>
Tested-by: Feng Tang <feng.tang@intel.com>
2023-02-02 14:23:02 -08:00
Feng Tang
a7ec817d55 x86/tsc: Add option to force frequency recalibration with HW timer
The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. And there was case
that some user used atomic clock to measure the TSC frequency and
reported an inaccuracy issue, which was later fixed in firmware.

Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET or PM timer, and warn if the
deviation from previous value is more than about 500 PPM, which
provides a way to verify the data from hardware / firmware.

There is no functional change to existing work flow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1], on
that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
and the force-reclibration with HPET/PMTIMER both calibrated out
value of 1975 MHz, which also matched with check from software
'chronyd', indicating it's a problem of BIOS or firmware.

[Thanks tglx for helping improving the commit log]
[ paulmck: Wordsmith Kconfig help text. ]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <x86@kernel.org>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-02-02 14:22:52 -08:00
Alexey Kardashevskiy
7914695743 x86/amd: Cache debug register values in percpu variables
Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to
be noticeable with the AMD KVM SEV-ES DebugSwap feature enabled.  KVM is
going to store host's DR[0-3] and DR[0-3]_ADDR_MASK before switching to
a guest; the hardware is going to swap these on VMRUN and VMEXIT.

Store MSR values passed to set_dr_addr_mask() in percpu variables
(when changed) and return them via new amd_get_dr_addr_mask().
The gain here is about 10x.

As set_dr_addr_mask() uses the array too, change the @dr type to
unsigned to avoid checking for <0. And give it the amd_ prefix to match
the new helper as the whole DR_ADDR_MASK feature is AMD-specific anyway.

While at it, replace deprecated boot_cpu_has() with cpu_feature_enabled()
in set_dr_addr_mask().

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230120031047.628097-2-aik@amd.com
2023-01-31 20:09:26 +01:00
Ashok Raj
25d0dc4b95 x86/microcode: Allow only "1" as a late reload trigger value
Microcode gets reloaded late only if "1" is written to the reload file.
However, the code silently treats any other unsigned integer as a
successful write even though no actions are performed to load microcode.

Make the loader more strict to accept only "1" as a trigger value and
return an error otherwise.

  [ bp: Massage commit message. ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130213955.6046-3-ashok.raj@intel.com
2023-01-31 16:47:03 +01:00
Peter Zijlstra
923510c88d x86/static_call: Add support for Jcc tail-calls
Clang likes to create conditional tail calls like:

  0000000000000350 <amd_pmu_add_event>:
  350:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1) 351: R_X86_64_NONE      __fentry__-0x4
  355:       48 83 bf 20 01 00 00 00         cmpq   $0x0,0x120(%rdi)
  35d:       0f 85 00 00 00 00       jne    363 <amd_pmu_add_event+0x13>     35f: R_X86_64_PLT32     __SCT__amd_pmu_branch_add-0x4
  363:       e9 00 00 00 00          jmp    368 <amd_pmu_add_event+0x18>     364: R_X86_64_PLT32     __x86_return_thunk-0x4

Where 0x35d is a static call site that's turned into a conditional
tail-call using the Jcc class of instructions.

Teach the in-line static call text patching about this.

Notably, since there is no conditional-ret, in that case patch the Jcc
to point at an empty stub function that does the ret -- or the return
thunk when needed.

Reported-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/Y9Kdg9QjHkr9G5b5@hirez.programming.kicks-ass.net
2023-01-31 15:05:31 +01:00
Peter Zijlstra
ac0ee0a956 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
In order to re-write Jcc.d32 instructions text_poke_bp() needs to be
taught about them.

The biggest hurdle is that the whole machinery is currently made for 5
byte instructions and extending this would grow struct text_poke_loc
which is currently a nice 16 bytes and used in an array.

However, since text_poke_loc contains a full copy of the (s32)
displacement, it is possible to map the Jcc.d32 2 byte opcodes to
Jcc.d8 1 byte opcode for the int3 emulation.

This then leaves the replacement bytes; fudge that by only storing the
last 5 bytes and adding the rule that 'length == 6' instruction will
be prefixed with a 0x0f byte.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
2023-01-31 15:05:31 +01:00
Peter Zijlstra
db7adcfd1c x86/alternatives: Introduce int3_emulate_jcc()
Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be
used by more code -- specifically static_call() will need this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org
2023-01-31 15:05:30 +01:00
Peter Zijlstra
8739c68115 sched/clock/x86: Mark sched_clock() noinstr
In order to use sched_clock() from noinstr code, mark it and all it's
implenentations noinstr.

The whole pvclock thing (used by KVM/Xen) is a bit of a pain,
since it calls out to watchdogs, create a
pvclock_clocksource_read_nowd() variant doesn't do that and can be
noinstr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.702003578@infradead.org
2023-01-31 15:01:47 +01:00
Uros Bizjak
5c9da9fe82 x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
Improve atomic update of last_value in pvclock_clocksource_read:

- Atomic update can be skipped if the "last_value" is already
  equal to "ret".

- The detection of atomic update failure is not correct. The value,
  returned by atomic64_cmpxchg should be compared to the old value
  from the location to be updated. If these two are the same, then
  atomic update succeeded and "last_value" location is updated to
  "ret" in an atomic way. Otherwise, the atomic update failed and
  it should be retried with the value from "last_value" - exactly
  what atomic64_try_cmpxchg does in a correct and more optimal way.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20230118202330.3740-1-ubizjak@gmail.com
Link: https://lore.kernel.org/r/20230126151323.643408110@infradead.org
2023-01-31 15:01:46 +01:00
Ingo Molnar
57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Linus Torvalds
bc6bc34b10 - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM
16 will support it
 
 - Fix a NULL ptr deref when suspending with Xen PV
 
 - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor
   and fail gracefully if some are unsupported by the guest instead of failing in
   a non-obvious and hard-to-debug way
 
 - Fix a MSI descriptor leakage under Xen
 
 - Mark Xen's MSI domain as supporting MSI-X
 
 - Prevent legacy PIC interrupts from being resent in software by marking them
   level triggered, as they should be, which lead to a NULL ptr deref
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWdaAACgkQEsHwGGHe
 VUrHUBAAvRh7cDVKvr2wPNJ+9RFjPFubcLq/+4yTe4Bu14wnYn8+gb2S7P2bPJWl
 TxbZhJGV2IeYyB+aJybjGwX96AzE5lWD6epQLRLI/BOVmqLA8Eyw9te0NEQDd9Wq
 hgY1/hhUmLdpXubq09YjIht5nlxVZ+JFfppouTR7LVtZl7MFvQbAOU8rWzpcbm7l
 UlA4GLakFeOYpbI5g+zzsZ1a1M0cSz8wQ43plD5aAtNy2wKbWiA4QDXJo9J7+ZQg
 1FmOHZ3ChPjEhf9k/N2uAZ6RUHVEDIJ7hDfsM3j/qsKGzCV4cho2Ify+0PFWo4pt
 FO2bRh1yDFqdz4m6ulhnmuMnoRnEuswwrTzrG+HYu1ntpUt72dULmmRBpJ0s6C7W
 BHpCThNWIlcfHbLNY+VAOJ2hiOqfU/8ld+8R0sn7xmomjgCRYHh9kDGOeuvh1hJl
 jfITDuL2gjcj3Ph6+xh6KvOga41ff3EcxkfvqolZ/emRllPiDWsKYXVgUIl22FHt
 23xH7gutPCp27MdoXM1EaWs5s/PQfctvm4LrW6XS8IjWURLo4RrkU6y7YD5VKVy8
 KCKcpF+JMdE1Ao1WpMFfjDG4ObyMYyqnD790nQVM1e5kIhOpeWaa7WzkGFRyIo6Q
 5BU3GGDyMT8SHy7NFhQL62YJe0P2ZctNIDjiTQlYDnWWhjmvk8I=
 =4QMN
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Start checking for -mindirect-branch-cs-prefix clang support too now
   that LLVM 16 will support it

 - Fix a NULL ptr deref when suspending with Xen PV

 - Have a SEV-SNP guest check explicitly for features enabled by the
   hypervisor and fail gracefully if some are unsupported by the guest
   instead of failing in a non-obvious and hard-to-debug way

 - Fix a MSI descriptor leakage under Xen

 - Mark Xen's MSI domain as supporting MSI-X

 - Prevent legacy PIC interrupts from being resent in software by
   marking them level triggered, as they should be, which lead to a NULL
   ptr deref

* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
  acpi: Fix suspend with Xen PV
  x86/sev: Add SEV-SNP guest feature negotiation support
  x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
  x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
  x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
2023-01-29 11:17:34 -08:00
Kirill A. Shutemov
0da908c291 x86/tdx: Add more registers to struct tdx_hypercall_args
struct tdx_hypercall_args is used to pass down hypercall arguments to
__tdx_hypercall() assembly routine.

Currently __tdx_hypercall() handles up to 6 arguments. In preparation to
changes in __tdx_hypercall(), expand the structure to 6 more registers
and generate asm offsets for them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-3-kirill.shutemov%40linux.intel.com
2023-01-27 09:42:09 -08:00
Uros Bizjak
890a0794b3 x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
__acpi_{acquire,release}_global_lock().  x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after CMPXCHG
(and related MOV instruction in front of CMPXCHG).

Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG
fails. There is no need to re-read the value in the loop.

Note that the value from *ptr should be read using READ_ONCE() to prevent
the compiler from merging, refetching or reordering the read.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230116162522.4072-1-ubizjak@gmail.com
2023-01-26 11:49:40 +01:00
Borislav Petkov (AMD)
793207bad7 x86/resctrl: Fix a silly -Wunused-but-set-variable warning
clang correctly complains

  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \
     'h' set but not used [-Wunused-but-set-variable]
          u32 h;
              ^

but it can't know whether this use is innocuous or really a problem.
There's a reason why those warning switches are behind a W=1 and not
enabled by default - yes, one needs to do:

  make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/

with clang 14 in order to trigger it.

I would normally not take a silly fix like that but this one is simple
and doesn't make the code uglier so...

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
2023-01-26 11:15:20 +01:00
Kim Phillips
e7862eda30 x86/cpu: Support AMD Automatic IBRS
The AMD Zen4 core supports a new feature called Automatic IBRS.

It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
h/w manages its IBRS mitigation resources automatically across CPL transitions.

The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
setting MSR C000_0080 (EFER) bit 21.

Enable Automatic IBRS by default if the CPU feature is present.  It typically
provides greater performance over the incumbent generic retpolines mitigation.

Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum.  AMD Automatic IBRS and
Intel Enhanced IBRS have similar enablement.  Add NO_EIBRS_PBRSB to
cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.

The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
IBRS, if available.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
2023-01-25 17:16:01 +01:00
Kim Phillips
5b909d4ae5 x86/cpu, kvm: Add the Null Selector Clears Base feature
The Null Selector Clears Base feature was being open-coded for KVM.
Add it to its newly added native CPUID leaf 0x80000021 EAX proper.

Also drop the bit description comments now it's more self-describing.

  [ bp: Convert test in check_null_seg_clears_base() too. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-6-kim.phillips@amd.com
2023-01-25 16:25:46 +01:00
Kim Phillips
84168ae786 x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
The LFENCE always serializing feature bit was defined as scattered
LFENCE_RDTSC and its native leaf bit position open-coded for KVM.  Add
it to its newly added CPUID leaf 0x80000021 EAX proper.  With
LFENCE_RDTSC in its proper place, the kernel's set_cpu_cap() will
effectively synthesize the feature for KVM going forward.

Also, DE_CFG[1] doesn't need to be set on such CPUs anymore.

  [ bp: Massage and merge diff from Sean. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-5-kim.phillips@amd.com
2023-01-25 13:06:13 +01:00
Jens Axboe
cb3ea4b767 x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
We don't set it on PF_KTHREAD threads as they never return to userspace,
and PF_IO_WORKER threads are identical in that regard. As they keep
running in the kernel until they die, skip setting the FPU flag on them.

More of a cosmetic thing that was found while debugging and
issue and pondering why the FPU flag is set on these threads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/560c844c-f128-555b-40c6-31baff27537f@kernel.dk
2023-01-25 12:35:15 +01:00
Brian Gerst
4c382d723e x86/vdso: Move VDSO image init to vdso2c generated code
Generate an init function for each VDSO image, replacing init_vdso() and
sysenter_setup().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230124184019.26850-1-brgerst@gmail.com
2023-01-25 12:33:40 +01:00
Kim Phillips
8415a74852 x86/cpu, kvm: Add support for CPUID_80000021_EAX
Add support for CPUID leaf 80000021, EAX. The majority of the features will be
used in the kernel and thus a separate leaf is appropriate.

Include KVM's reverse_cpuid entry because features are used by VM guests, too.

  [ bp: Massage commit message. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com
2023-01-25 12:33:06 +01:00
Sean Christopherson
54a3b70a75 x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit
path for 32-bit even though using a dedicated entry for 32-bit isn't
strictly necessary.  Exposing a single symbol will allow KVM to reference
the entry point in assembly code without having to resort to more #ifdefs
(or #defines).  identry.h is intended to be included from asm files only
once, and so simply including idtentry.h in KVM assembly isn't an option.

Bypassing the ESP fixup and CR3 switching in the standard NMI entry code
is safe as KVM always handles NMIs that occur in the guest on a kernel
stack, with a kernel CR3.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:36:40 -08:00
Sean Christopherson
a2b07fa7b9 x86/reboot: Disable SVM, not just VMX, when stopping CPUs
Disable SVM and more importantly force GIF=1 when halting a CPU or
rebooting the machine.  Similar to VMX, SVM allows software to block
INITs via CLGI, and thus can be problematic for a crash/reboot.  The
window for failure is smaller with SVM as INIT is only blocked while
GIF=0, i.e. between CLGI and STGI, but the window does exist.

Fixes: fba4f472b3 ("x86/reboot: Turn off KVM when halting a CPU")
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:22 -08:00
Sean Christopherson
d81f952aa6 x86/reboot: Disable virtualization in an emergency if SVM is supported
Disable SVM on all CPUs via NMI shootdown during an emergency reboot.
Like VMX, SVM can block INIT, e.g. if the emergency reboot is triggered
between CLGI and STGI, and thus can prevent bringing up other CPUs via
INIT-SIPI-SIPI.

Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:22 -08:00
Sean Christopherson
26044aff37 x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
Disable virtualization in crash_nmi_callback() and rework the
emergency_vmx_disable_all() path to do an NMI shootdown if and only if a
shootdown has not already occurred.   NMI crash shootdown fundamentally
can't support multiple invocations as responding CPUs are deliberately
put into halt state without unblocking NMIs.  But, the emergency reboot
path doesn't have any work of its own, it simply cares about disabling
virtualization, i.e. so long as a shootdown occurred, emergency reboot
doesn't care who initiated the shootdown, or when.

If "crash_kexec_post_notifiers" is specified on the kernel command line,
panic() will invoke crash_smp_send_stop() and result in a second call to
nmi_shootdown_cpus() during native_machine_emergency_restart().

Invoke the callback _before_ disabling virtualization, as the current
VMCS needs to be cleared before doing VMXOFF.  Note, this results in a
subtle change in ordering between disabling virtualization and stopping
Intel PT on the responding CPUs.  While VMX and Intel PT do interact,
VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
another, which is all that matters when panicking.

Harden nmi_shootdown_cpus() against multiple invocations to try and
capture any such kernel bugs via a WARN instead of hanging the system
during a crash/dump, e.g. prior to the recent hardening of
register_nmi_handler(), re-registering the NMI handler would trigger a
double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y.

 list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800.
 WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70
 Call Trace:
  __register_nmi_handler+0xcf/0x130
  nmi_shootdown_cpus+0x39/0x90
  native_machine_emergency_restart+0x1c9/0x1d0
  panic+0x237/0x29b

Extract the disabling logic to a common helper to deduplicate code, and
to prepare for doing the shootdown in the emergency reboot path if SVM
is supported.

Note, prior to commit ed72736183 ("x86/reboot: Force all cpus to exit
VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected
against a second invocation by a cpu_vmx_enabled() check as the kdump
handler would disable VMX if it ran first.

Fixes: ed72736183 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:21 -08:00
Paolo Bonzini
dc7c31e922 Merge branch 'kvm-v6.2-rc4-fixes' into HEAD
ARM:

* Fix the PMCR_EL0 reset value after the PMU rework

* Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

* Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

* Put the Apple M2 on the naughty list for not being able to
  correctly implement the vgic SEIS feature, just like the M1
  before it

* Reviewer updates: Alex is stepping down, replaced by Zenghui

x86:

* Fix various rare locking issues in Xen emulation and teach lockdep
  to detect them

* Documentation improvements

* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
2023-01-24 06:05:23 -05:00
Babu Moger
4fe61bff5a x86/resctrl: Add interface to write mbm_local_bytes_config
The event configuration for mbm_local_bytes can be changed by the
user by writing to the configuration file
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example, to change the mbm_local_bytes_config to count all the non-temporal
writes on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex
0xc).
Run the command:

  $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

To change the mbm_local_bytes to count only reads to local NUMA domain 1,
the bit 0 needs to be set which 1b (in hex 0x1). Run the command:

  $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-13-babu.moger@amd.com
2023-01-23 17:40:32 +01:00
Babu Moger
92bd5a1390 x86/resctrl: Add interface to write mbm_total_bytes_config
The event configuration for mbm_total_bytes can be changed by the user by
writing to the file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

The event configuration settings are domain specific and affect all the
CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example:

To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
Run the command:

  $echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

To change the mbm_total_bytes to count all the slow memory reads on domain 1,
the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command:

  $echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-12-babu.moger@amd.com
2023-01-23 17:40:30 +01:00
Babu Moger
73afb2d3ce x86/resctrl: Add interface to read mbm_local_bytes_config
The event configuration can be viewed by the user by reading the configuration
file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.  The event
configuration settings are domain specific and will affect all the CPUs in the
domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_local_bytes_config is set to 0x15 to count all the local
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
  0=0x15;1=0x15;2=0x15;3=0x15

In this case, the event mbm_local_bytes is configured with 0x15 on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-11-babu.moger@amd.com
2023-01-23 17:40:27 +01:00
Babu Moger
dc2a3e8579 x86/resctrl: Add interface to read mbm_total_bytes_config
The event configuration can be viewed by the user by reading the
configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.  The
event configuration settings are domain specific and will affect all the CPUs in
the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
  0=0x7f;1=0x7f;2=0x7f;3=0x7f

In this case, the event mbm_total_bytes is configured with 0x7f on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com
2023-01-23 17:40:24 +01:00
Babu Moger
d507f83ced x86/resctrl: Support monitor configuration
Add a new field in struct mon_evt to support Bandwidth Monitoring Event
Configuration (BMEC) and also update the "mon_features" display.

The resctrl file "mon_features" will display the supported events
and files that can be used to configure those events if monitor
configuration is supported.

Before the change:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes

After the change when BMEC is supported:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_total_bytes_config
  mbm_local_bytes
  mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-9-babu.moger@amd.com
2023-01-23 17:40:21 +01:00
Babu Moger
bd334c86b5 x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
In an upcoming change, rdt_get_mon_l3_config() needs to call rdt_cpu_has() to
query the monitor related features. It cannot be called right now because
rdt_cpu_has() has the __init attribute but rdt_get_mon_l3_config() doesn't.

Add the __init attribute to rdt_get_mon_l3_config() that is only called by
get_rdt_mon_resources() that already has the __init attribute. Also make
rdt_cpu_has() available to by rdt_get_mon_l3_config() via the internal header
file.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-8-babu.moger@amd.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-01-23 17:40:11 +01:00
Babu Moger
5b6fac3fa4 x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-7-babu.moger@amd.com
2023-01-23 17:38:44 +01:00
Babu Moger
a76f65c89f x86/resctrl: Include new features in command line options
Add the command line options to enable or disable the new resctrl features:

smba: Slow Memory Bandwidth Allocation
bmec: Bandwidth Monitor Event Configuration.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-6-babu.moger@amd.com
2023-01-23 17:38:38 +01:00
Babu Moger
78335aac61 x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

The feature support is identified via CPUID Fn8000_0020_EBX_x0[3]: EVT_CFG -
Bandwidth Monitoring Event Configuration (BMEC)

The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes are set to
count all the total and local reads/writes, respectively. With the introduction
of slow memory, the two counters are not enough to count all the different types
of memory events. Therefore, BMEC provides the option to configure
mbm_total_bytes and mbm_local_bytes to count the specific type of events.

Each BMEC event has a configuration MSR which contains one field for each
bandwidth type that can be used to configure the bandwidth event to track any
combination of supported bandwidth types. The event will count requests from
every bandwidth type bit that is set in the corresponding configuration
register.

Following are the types of events supported:

  ====    ========================================================
  Bits    Description
  ====    ========================================================
  6       Dirty Victims from the QOS domain to all types of memory
  5       Reads to slow memory in the non-local NUMA domain
  4       Reads to slow memory in the local NUMA domain
  3       Non-temporal writes to non-local NUMA domain
  2       Non-temporal writes to local NUMA domain
  1       Reads to memory in the non-local NUMA domain
  0       Reads to memory in the local NUMA domain
  ====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to 0x15 to
count all the local memory events.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication" at
https://bugzilla.kernel.org/attachment.cgi?id=301365

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-5-babu.moger@amd.com
2023-01-23 17:38:31 +01:00
Babu Moger
a5b6996655 x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
Add a new resource type RDT_RESOURCE_SMBA to handle the QoS enforcement
policies on the external slow memory.

Mostly initialization of the essentials. Setting fflags to RFTYPE_RES_MB
configures the SMBA resource to have the same resctrl files as the
existing MBA resource. The SMBA resource has identical properties to
the existing MBA resource. These properties will be enumerated in an
upcoming change and exposed via resctrl because of this flag.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-4-babu.moger@amd.com
2023-01-23 17:38:22 +01:00
Babu Moger
f334f723a6 x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
Add the new AMD feature X86_FEATURE_SMBA. With it, the QOS enforcement policies
can be applied to external slow memory connected to the host. QOS enforcement is
accomplished by assigning a Class Of Service (COS) to a processor and specifying
allocations or limits for that COS for each resource to be allocated.

This feature is identified by the CPUID function 0x8000_0020_EBX_x0[2]:
L3SBE - L3 external slow memory bandwidth enforcement.

CXL.memory is the only supported "slow" memory device. With SMBA, the hardware
enables bandwidth allocation on the slow memory devices.  If there are multiple
slow memory devices in the system, then the throttling logic groups all the slow
sources together and applies the limit on them as a whole.

The presence of the SMBA feature (with CXL.memory) is independent of whether
slow memory device is actually present in the system. If there is no slow memory
in the system, then setting a SMBA limit will have no impact on the performance
of the system.

Presence of CXL memory can be identified by the numactl command:

  $numactl -H
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  node 0 size: 63678 MB node 0 free: 59542 MB
  node 1 cpus:
  node 1 size: 16122 MB
  node 1 free: 15627 MB
  node distances:
  node   0   1
     0:  10  50
     1:  50  10

CPU list for CXL memory will be empty. The cpu-cxl node distance is greater than
cpu-to-cpu distances. Node 1 has the CXL memory in this case. CXL memory can
also be identified using ACPI SRAT table and memory maps.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022" at
https://bugzilla.kernel.org/attachment.cgi?id=301365

See also https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-3-babu.moger@amd.com
2023-01-23 17:38:17 +01:00
Babu Moger
fc3b618c87 x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
on_each_cpu_mask() runs the function on each CPU specified by cpumask,
which may include the local processor.

Replace smp_call_function_many() with on_each_cpu_mask() to simplify
the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-2-babu.moger@amd.com
2023-01-23 17:38:04 +01:00
Linus Torvalds
2475bf0250 - Make sure the scheduler doesn't use stale frequency scaling values when latter
get disabled due to a value error
 
 - Fix a NULL pointer access on UP configs
 
 - Use the proper locking when updating CPU capacity
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPNKRsACgkQEsHwGGHe
 VUr8NA/9GTqGUpOq0iRQkEOE1oAdT4ZxJ9dQpaWjwWSNv40HT7XJBzjtvzAyiOBF
 LTUVClBct5i1/j0tVcNq1zXEj3im2e24Ki6A1TWukejgGAMT/7siSkuChEDAMg2M
 b79nKCMpuIZMzkJND3qTkW/aMPpAyU82G8BeLCjw7vgPBsbVkjgbxGFKYdHgpLZa
 kdX/GhOufu40jcGeUxA4zWTpXfuXT3OG7JYLrlHeJ/HEdzy9kLCkWH4jHHllzPQw
 c4JwG7UnNkDKD6zkG0Guzi1zQy39egh/kaj7FQmVap9sneq+x69T4ta0BfXdBXnQ
 Vsqc/nnOybUzr9Gjg5W6KRZWk0k6hK9n3+cye88BfRvzMY0KAgxeCiZEn7cuKqZp
 15Agzz77vcwt32QsxjSp+hKQxJtePcBCurDhkfuAZyPELNIBeCW6inrXArprfTIg
 IfEF068GKsvDGu3I/z49VXFZ9YBoerQREHbN/xL1VNeB8VoAxAE227OMUvhMcIu1
 jxVvkESwc9BIybTcJbUjgg2i9t+Fv3gtZBQ6GFfDboHneia4U5a9aTyN6QGjX+5P
 SIXPhePbFCNrhW7JbSGqKBd96MczbFNQinRhgX1lBU0241cAnchY4nH9fUvv7Zrt
 b/QDg58tb2bJkm1Z08L256ZETELOU9nLJVxUbtBNSSO9dfkvoBs=
 =aNSK
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the scheduler doesn't use stale frequency scaling values
   when latter get disabled due to a value error

 - Fix a NULL pointer access on UP configs

 - Use the proper locking when updating CPU capacity

* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
  sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
  sched/fair: Fixes for capacity inversion detection
  sched/uclamp: Fix a uninitialized variable warnings
2023-01-22 12:14:58 -08:00
Ashok Raj
a9a5cac225 x86/microcode/intel: Print old and new revision during early boot
Make early loading message match late loading message and print both old
and new revisions.

This is helpful to know what the BIOS loaded revision is before an early
update.

Cache the early BIOS revision before the microcode update and have
print_ucode_info() print both the old and new revision in the same
format as microcode_reload_late().

  [ bp: Massage, remove useless comment. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-6-ashok.raj@intel.com
2023-01-21 14:55:21 +01:00
Ashok Raj
174f1b909a x86/microcode/intel: Pass the microcode revision to print_ucode_info() directly
print_ucode_info() takes a struct ucode_cpu_info pointer as parameter.
Its sole purpose is to print the microcode revision.

The only available ucode_cpu_info always describes the currently loaded
microcode revision. After a microcode update is successful, this is the
new revision, or on failure it is the original revision.

In preparation for future changes, replace the struct ucode_cpu_info
pointer parameter with a plain integer which contains the revision
number and adjust the call sites accordingly.

No functional change.

  [ bp:
    - Fix + cleanup commit message.
    - Revert arbitrary, unrelated change.
  ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-5-ashok.raj@intel.com
2023-01-21 14:55:20 +01:00
Ashok Raj
6eab3abac7 x86/microcode: Adjust late loading result reporting message
During late microcode loading, the "Reload completed" message is issued
unconditionally, regardless of success or failure.

Adjust the message to report the result of the update.

  [ bp: Massage. ]

Fixes: 9bd681251b ("x86/microcode: Announce reload operation's completion")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/874judpqqd.ffs@tglx/
2023-01-21 14:55:20 +01:00
Ashok Raj
c0dd9245aa x86/microcode: Check CPU capabilities after late microcode update correctly
The kernel caches each CPU's feature bits at boot in an x86_capability[]
structure. However, the capabilities in the BSP's copy can be turned off
as a result of certain command line parameters or configuration
restrictions, for example the SGX bit. This can cause a mismatch when
comparing the values before and after the microcode update.

Another example is X86_FEATURE_SRBDS_CTRL which gets added only after
microcode update:

  --- cpuid.before	2023-01-21 14:54:15.652000747 +0100
  +++ cpuid.after	2023-01-21 14:54:26.632001024 +0100
  @@ -10,7 +10,7 @@ CPU:
      0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
      0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
      0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000001 edx=0x00000000
  -   0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002400
  +   0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002e00
  									     ^^^

and which proves for a gazillionth time that late loading is a bad bad
idea.

microcode_check() is called after an update to report any previously
cached CPUID bits which might have changed due to the update.

Therefore, store the cached CPU caps before the update and compare them
with the CPU caps after the microcode update has succeeded.

Thus, the comparison is done between the CPUID *hardware* bits before
and after the upgrade instead of using the cached, possibly runtime
modified values in BSP's boot_cpu_data copy.

As a result, false warnings about CPUID bits changes are avoided.

  [ bp:
  	- Massage.
	- Add SRBDS_CTRL example.
	- Add kernel-doc.
	- Incorporate forgotten review feedback from dhansen.
	]

Fixes: 1008c52c09 ("x86/CPU: Add a microcode loader callback")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-3-ashok.raj@intel.com
2023-01-21 14:53:20 +01:00
Ashok Raj
ab31c74455 x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
Add a parameter to store CPU capabilities before performing a microcode
update so that CPU capabilities can be compared before and after update.

  [ bp: Massage. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-2-ashok.raj@intel.com
2023-01-20 21:45:13 +01:00
Li RongQing
716ff71ae2 cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle()
When a KVM guest has MWAIT, mwait_idle() is used as the default idle
function.

However, the cpuidle-haltpoll driver calls default_idle() from
default_enter_idle() directly and that one uses HLT instead of MWAIT,
which may affect performance adversely, because MWAIT is preferred to
HLT as explained by the changelog of commit aebef63cf7 ("x86: Remove
vendor checks from prefer_mwait_c1_over_halt").

Make default_enter_idle() call arch_cpu_idle(), which can use MWAIT,
instead of default_idle() to address this issue.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
[ rjw: Changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 17:33:52 +01:00
Paul E. McKenney
344da544f1 x86/nmi: Print reasons why backtrace NMIs are ignored
Instrument nmi_trigger_cpumask_backtrace() to dump out diagnostics based
on evidence accumulated by exc_nmi().  These diagnostics are dumped for
CPUs that ignored an NMI backtrace request for more than 10 seconds.

[ paulmck: Apply Ingo Molnar feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2023-01-19 15:55:12 -08:00
Paul E. McKenney
1a3ea611fc x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
CPUs ignoring NMIs is often a sign of those CPUs going bad, but there
are quite a few other reasons why a CPU might ignore NMIs.  Therefore,
accumulate evidence within exc_nmi() as to what might be preventing a
given CPU from responding to an NMI.

[ paulmck: Apply Peter Zijlstra feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2023-01-19 15:54:41 -08:00
Guangju Wang[baidu]
59047d942b x86/microcode: Use the DEVICE_ATTR_RO() macro
Use DEVICE_ATTR_RO() helper instead of open-coded DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.

No change in functionality.

Signed-off-by: Guangju Wang[baidu] <wgj900@163.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230118023554.1898-1-wgj900@163.com
2023-01-18 12:02:20 +01:00
Ingo Molnar
65adf3a57c Linux 6.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPEGkMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGE4YH/1sxdve3nlWT7eyf
 VDanaCZFc2u6Sk1td23HyhvOVbFjj1E39UokHK9YfzD4hhn+u9SFvDgoXHGhHzYq
 IhDvNM3mLsKpEtjwNbbi/lt6tryJchhZ96O5leRiUxZaw/GBVnHrKGR56vCjC/DB
 zq8td4I+TmTPMpsL3e3WZqJoX3bjTqgZFNShA1mZFYtuQRpAVkJkafMRwuAySIMQ
 +FyK/zZ+MpGJ7y/UmWfaBmS5GtjBz0fva7fTbEkUqk8bDtNUWMBhsTKtgU1LBpBe
 Yrkhs3sdQdpKSm9yNaNrlLhSjFwkusV5kR09p6/5w+wthdPmjnoUZuEnAZpYyydE
 ZY7+Vqk=
 =q9Ac
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc4' into perf/core, to pick up fixes

Move from the -rc1 base to the fresher -rc4 kernel that
has various fixes included, before applying a larger
patchset.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-18 11:56:57 +01:00
Jinank Jain
7fec185a56 Drivers: hv: Setup synic registers in case of nested root partition
Child partitions are free to allocate SynIC message and event page but in
case of root partition it must use the pages allocated by Microsoft
Hypervisor (MSHV). Base address for these pages can be found using
synthetic MSRs exposed by MSHV. There is a slight difference in those MSRs
for nested vs non-nested root partition.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/cb951fb1ad6814996fc54f4a255c5841a20a151f.1672639707.git.jinankjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-01-17 13:36:43 +00:00
Thomas Gleixner
5fa5595072 x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
Baoquan reported that after triggering a crash the subsequent crash-kernel
fails to boot about half of the time. It triggers a NULL pointer
dereference in the periodic tick code.

This happens because the legacy timer interrupt (IRQ0) is resent in
software which happens in soft interrupt (tasklet) context. In this context
get_irq_regs() returns NULL which leads to the NULL pointer dereference.

The reason for the resend is a spurious APIC interrupt on the IRQ0 vector
which is captured and leads to a resend when the legacy timer interrupt is
enabled. This is wrong because the legacy PIC interrupts are level
triggered and therefore should never be resent in software, but nothing
ever sets the IRQ_LEVEL flag on those interrupts, so the core code does not
know about their trigger type.

Ensure that IRQ_LEVEL is set when the legacy PCI interrupts are set up.

Fixes: a4633adcdb ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/87mt6rjrra.ffs@tglx
2023-01-16 17:24:56 +01:00
Yair Podemsky
5f5cc9ed99 x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
Once disable_freq_invariance_work is called the scale_freq_tick function
will not compute or update the arch_freq_scale values.
However the scheduler will still read these values and use them.
The result is that the scheduler might perform unfair decisions based on stale
values.

This patch adds the step of setting the arch_freq_scale values for all
cpus to the default (max) value SCHED_CAPACITY_SCALE, Once all cpus
have the same arch_freq_scale value the scaling is meaningless.

Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
2023-01-16 10:19:15 +01:00
Christophe JAILLET
ef6dfc4b23 x86/signal: Fix the value returned by strict_sas_size()
Functions used with __setup() return 1 when the argument has been
successfully parsed.

Reverse the returned value so that 1 is returned when kstrtobool() is
successful (i.e. returns 0).

My understanding of these __setup() functions is that returning 1 or 0
does not change much anyway - so this is more of a cleanup than a
functional fix.

I spot it and found it spurious while looking at something else.
Even if the output is not perfect, you'll get the idea with:

   $ git grep -B2 -A10 retu.*kstrtobool | grep __setup -B10

Fixes: 3aac3ebea0 ("x86/signal: Implement sigaltstack size validation")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/73882d43ebe420c9d8fb82d0560021722b243000.1673717552.git.christophe.jaillet@wanadoo.fr
2023-01-15 09:54:27 +01:00
Juergen Gross
d55dcb7384 x86/cpu: Remove misleading comment
The comment of the "#endif" after setup_disable_pku() is wrong.

As the related #ifdef is only a few lines above, just remove the
comment.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230113130126.1966-1-jgross@suse.com
2023-01-13 14:20:20 +01:00
Peter Zijlstra
10a099405f cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean
objtool found a few cases where this code called out into instrumented
code:

  vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section
  vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section
  vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section

Solve this by:

 - marking arch_safe_halt(), wbinvd(), native_wbinvd() and
   HYPERVISOR_sched_op() as __always_inline().

 - Explicitly uninlining xen_safe_halt() and pv_native_wbinvd() [they were
   already uninlined by the compiler on use as function pointers] and
   annotating them as 'noinstr'.

 - Annotating pv_native_safe_halt() as 'noinstr'.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.171918174@infradead.org
2023-01-13 11:48:16 +01:00
Peter Zijlstra
89b3098703 arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
9b461a6faa cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS
objtool to the rescue:

  vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
821ad23d0e cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE
Fix instrumentation bugs objtool found:

  vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
  vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.494977795@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
2b5a0e425e objtool/idle: Validate __cpuidle code as noinstr
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
aaa3896b96 x86/idle: Replace 'x86_idle' function pointer with a static_call
Typical boot time setup; no need to suffer an indirect call for that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.453613251@infradead.org
2023-01-13 11:03:21 +01:00
H. Peter Anvin (Intel)
92cbbadf73 x86/gsseg: Use the LKGS instruction if available for load_gs_index()
The LKGS instruction atomically loads a segment descriptor into the
%gs descriptor registers, *except* that %gs.base is unchanged, and the
base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly
what we want this function to do.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230112072032.35626-6-xin3.li@intel.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-13 10:07:27 +01:00
Jinank Jain
c4bdf94f97 x86/hyperv: Add support for detecting nested hypervisor
Detect if Linux is running as a nested hypervisor in the root
partition for Microsoft Hypervisor, using flags provided by MSHV.
Expose a new variable hv_nested that is used later for decisions
specific to the nested use case.

Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/8e3e7112806e81d2292a66a56fe547162754ecea.1672639707.git.jinankjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-01-12 15:23:26 +00:00
H. Peter Anvin (Intel)
ae53fa1870 x86/gsseg: Move load_gs_index() to its own new header file
GS is a special segment on x86_64, move load_gs_index() to its own new
header file to simplify header inclusion.

No change in functionality.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112072032.35626-5-xin3.li@intel.com
2023-01-12 13:06:36 +01:00
Breno Leitao
0125acda7d x86/bugs: Reset speculation control settings on init
Currently, x86_spec_ctrl_base is read at boot time and speculative bits
are set if Kconfig items are enabled. For example, IBRS is enabled if
CONFIG_CPU_IBRS_ENTRY is configured, etc. These MSR bits are not cleared
if the mitigations are disabled.

This is a problem when kexec-ing a kernel that has the mitigation
disabled from a kernel that has the mitigation enabled. In this case,
the MSR bits are not cleared during the new kernel boot. As a result,
this might have some performance degradation that is hard to pinpoint.

This problem does not happen if the machine is (hard) rebooted because
the bit will be cleared by default.

  [ bp: Massage. ]

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221128153148.1129350-1-leitao@debian.org
2023-01-12 11:37:01 +01:00
Yuntao Wang
50c66d7b04 x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery
Both the if and else blocks define an exact same boot_cpu_data variable, move
the duplicate variable definition out of the if/else block.

In addition, do some other minor cleanups.

  [ bp: Massage. ]

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20220601122914.820890-1-ytcoode@gmail.com
2023-01-11 12:45:16 +01:00
Wang Yong
b7d1f15b5c x86/boot/e820: Fix typo in e820.c comment
change "itsmain" to "its main".

Fixes: 544a0f47e7 ("x86/boot/e820: Rename e820_table_saved to e820_table_firmware and improve the description")
Signed-off-by: Wang Yong <yongw.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221211103849.173870-1-yongw.kernel@gmail.com
2023-01-11 12:45:03 +01:00
Peter Newman
2a81160d29 x86/resctrl: Fix event counts regression in reused RMIDs
When creating a new monitoring group, the RMID allocated for it may have
been used by a group which was previously removed. In this case, the
hardware counters will have non-zero values which should be deducted
from what is reported in the new group's counts.

resctrl_arch_reset_rmid() initializes the prev_msr value for counters to
0, causing the initial count to be charged to the new group. Resurrect
__rmid_read() and use it to initialize prev_msr correctly.

Unlike before, __rmid_read() checks for error bits in the MSR read so
that callers don't need to.

Fixes: 1d81d15db3 ("x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221220164132.443083-1-peternewman@google.com
2023-01-10 19:51:59 +01:00
Peter Newman
fe1f071438 x86/resctrl: Fix task CLOSID/RMID update race
When the user moves a running task to a new rdtgroup using the task's
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
task(s) CPUs.

x86 allows reordering loads with prior stores, so if the task starts
running between a task_curr() check that the CPU hoisted before the
stores in the CLOSID/RMID update then it can start running with the old
CLOSID/RMID until it is switched again because __rdtgroup_move_task()
failed to determine that it needs to be interrupted to obtain the new
CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first loop. However, benchmarking
results below showed too little performance impact in the simple
approach to justify implementing the two-pass approach.

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae4 ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be94 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
2023-01-10 19:47:30 +01:00
Kishon Vijay Abraham I
e2869bd7af x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
Section 5.2.12.12 Processor Local x2APIC Structure in the ACPI v6.5
spec mandates that both "enabled" and "online capable" Local APIC Flags
should be used to determine if the processor is usable or not.

However, Linux doesn't use the "online capable" flag for x2APIC to
determine if the processor is usable. As a result, cpu_possible_mask has
incorrect value and results in more memory getting allocated for per_cpu
variables than it is going to be used.

Make sure Linux parses both "enabled" and "online capable" flags for
x2APIC to correctly determine if the processor is usable.

Fixes: aa06e20f1b ("x86/ACPI: Don't add CPUs that are not online capable")
Reported-by: Leo Duran <leo.duran@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230105041059.39366-1-kvijayab@amd.com
2023-01-10 19:21:07 +01:00
Ashok Raj
bb5525a506 x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
The prototype for the x86_read_arch_cap_msr() function has moved to
arch/x86/include/asm/cpu.h - kill the redundant definition in arch/x86/kernel/cpu.h
and include the header.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Link: https://lore.kernel.org/r/20221128172451.792595-1-ashok.raj@intel.com
2023-01-10 12:40:24 +01:00
Chuang Wang
9fcad995c6 x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
For the `FF /digit` opcodes in prepare_emulation, use switch-case
instead of hand-written code to make the logic easier to understand.

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20221129084022.718355-1-nashuiliang@gmail.com
2023-01-10 12:37:14 +01:00
Tony Luck
8a01ec97dc x86/mce: Mask out non-address bits from machine check bank
Systems that support various memory encryption schemes (MKTME, TDX, SEV)
use high order physical address bits to indicate which key should be
used for a specific memory location.

When a memory error is reported, some systems may report those key
bits in the IA32_MCi_ADDR machine check MSR.

The Intel SDM has a footnote for the contents of the address register
that says: "Useful bits in this field depend on the address methodology
in use when the register state is saved."

AMD Processor Programming Reference has a more explicit description
of the MCA_ADDR register:

 "For physical addresses, the most significant bit is given by
  Core::X86::Cpuid::LongModeInfo[PhysAddrSize]."

Add a new #define MCI_ADDR_PHYSADDR for the mask of valid physical
address bits within the machine check bank address register. Use this
mask for recoverable machine check handling and in the EDAC driver to
ignore any key bits that may be present.

  [ Tony: Based on independent fixes proposed by Fan Du and Isaku Yamahata ]

Reported-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reported-by: Fan Du <fan.du@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20230109152936.397862-1-tony.luck@intel.com
2023-01-10 11:47:07 +01:00
Masami Hiramatsu (Google)
8e791f7eba x86/kprobes: Drop removed INT3 handling code
Drop removed INT3 handling code from kprobe_int3_handler() because this
case (get_kprobe() doesn't return corresponding kprobe AND the INT3 is
removed) must not happen with the kprobe managed INT3, but can happen
with the non-kprobe INT3, which should be handled by other callbacks.

For the kprobe managed INT3, it is already safe. The commit 5c02ece818
("x86/kprobes: Fix ordering while text-patching") introduced
text_poke_sync() to the arch_disarm_kprobe() right after removing INT3.
Since this text_poke_sync() uses IPI to call sync_core() on all online
cpus, that ensures that all running INT3 exception handlers have done.
And, the unregister_kprobe() will remove the kprobe from the hash table
after arch_disarm_kprobe().

Thus, when the kprobe managed INT3 hits, kprobe_int3_handler() should
be able to find corresponding kprobe always by get_kprobe(). If it can
not find any kprobe, this means that is NOT a kprobe managed INT3.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/166981518895.1131462.4693062055762912734.stgit@devnote3
2023-01-07 12:29:08 +01:00
Xu Panda
7ddf0050a2 x86/mce/dev-mcelog: use strscpy() to instead of strncpy()
The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.

Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/202212031419324523731@zte.com.cn
2023-01-07 11:47:35 +01:00
Taehee Yoo
35344cf30f crypto: x86/aria - do not use magic number offsets of aria_ctx
aria-avx assembly code accesses members of aria_ctx with magic number
offset. If the shape of struct aria_ctx is changed carelessly,
aria-avx will not work.
So, we need to ensure accessing members of aria_ctx with correct
offset values, not with magic numbers.

It adds ARIA_CTX_enc_key, ARIA_CTX_dec_key, and ARIA_CTX_rounds in the
asm-offsets.c So, correct offset definitions will be generated.
aria-avx assembly code can access members of aria_ctx safely with
these definitions.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-06 17:15:47 +08:00
Hans de Goede
bd4edba265 x86/rtc: Simplify PNP ids check
compare_pnp_id() already iterates over the single linked pnp_ids list
starting with the id past to it.

So there is no need for add_rtc_cmos() to call compare_pnp_id()
for each id on the list.

No change in functionality intended.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: linux-kernel@vger.kernel.org
2023-01-06 04:22:34 +01:00
Brian Gerst
6be9a8f18f x86/signal/compat: Move sigaction_compat_abi() to signal_64.c
Also remove the now-empty signal_compat.c.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221219193904.190220-3-brgerst@gmail.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
2023-01-06 04:16:02 +01:00
Brian Gerst
f6e2a56c2b x86/signal: Move siginfo field tests
Move the tests to the appropriate signal_$(BITS).c file.

Convert them to use static_assert(), removing the need for a dummy
function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221219193904.190220-2-brgerst@gmail.com
Cc: Al Viro <viro@zeniv.linux.org.uk>
2023-01-06 04:16:02 +01:00
Borislav Petkov (AMD)
5d1dd961e7 x86/alternatives: Add alt_instr.flags
Add a struct alt_instr.flags field which will contain different flags
controlling alternatives patching behavior.

The initial idea was to be able to specify it as a separate macro
parameter but that would mean touching all possible invocations of the
alternatives macros and thus a lot of churn.

What is more, as PeterZ suggested, being able to say ALT_NOT(feature) is
very readable and explains exactly what is meant.

So make the feature field a u32 where the patching flags are the upper
u16 part of the dword quantity while the lower u16 word is the feature.

The highest feature number currently is 0x26a (i.e., word 19) so there
is plenty of space. If that becomes insufficient, the field can be
extended to u64 which will then make struct alt_instr of the nice size
of 16 bytes (14 bytes currently).

There should be no functional changes resulting from this.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Y6RCoJEtxxZWwotd@zn.tnic
2023-01-05 12:46:47 +01:00
Rodrigo Branco
a664ec9158 x86/bugs: Flush IBP in ib_prctl_set()
We missed the window between the TIF flag update and the next reschedule.

Signed-off-by: Rodrigo Branco <bsdaemon@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
2023-01-04 11:25:32 +01:00
Jason A. Donenfeld
72bb8f8cc0 x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type
Both <linux/mmiotrace.h> and <asm/insn-eval.h> define various MMIO_ enum constants,
whose namespace overlaps.

Rename the <asm/insn-eval.h> ones to have a INSN_ prefix, so that the headers can be
used from the same source file.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230101162910.710293-2-Jason@zx2c4.com
2023-01-03 18:46:06 +01:00
Takashi Iwai
d00dd2f264 x86/kexec: Fix double-free of elf header buffer
After

  b3e34a47f9 ("x86/kexec: fix memory leak of elf header buffer"),

freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.

Drop the superfluous vfree() call at the error path of
crash_load_segments().

Fixes: b3e34a47f9 ("x86/kexec: fix memory leak of elf header buffer")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de
2023-01-02 18:56:21 +01:00
Paolo Bonzini
fc471e8310 Merge branch 'kvm-late-6.1' into HEAD
x86:

* Change tdp_mmu to a read-only parameter

* Separate TDP and shadow MMU page fault paths

* Enable Hyper-V invariant TSC control

selftests:

* Use TAP interface for kvm_binary_stats_test and tsc_msrs_test

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:36:47 -05:00
Vitaly Kuznetsov
b961aa757f x86/hyperv: Add HV_EXPOSE_INVARIANT_TSC define
Avoid open coding BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL by adding
a dedicated define. While there's only one user at this moment, the
upcoming KVM implementation of Hyper-V Invariant TSC feature will need
to use it as well.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:27 -05:00
Smita Koralahalli
fcd343a285 x86/mce: Add support for Extended Physical Address MCA changes
Newer AMD CPUs support more physical address bits.

That is, the MCA_ADDR registers on Scalable MCA systems contain the
ErrorAddr in bits [56:0] instead of [55:0]. Hence, the existing LSB field
from bits [61:56] in MCA_ADDR must be moved around to accommodate the
larger ErrorAddr size.

MCA_CONFIG[McaLsbInStatusSupported] indicates this change. If set, the
LSB field will be found in MCA_STATUS rather than MCA_ADDR.

Each logical CPU has unique MCA bank in hardware and is not shared with
other logical CPUs. Additionally, on SMCA systems, each feature bit may
be different for each bank within same logical CPU.

Check for MCA_CONFIG[McaLsbInStatusSupported] for each MCA bank and for
each CPU.

Additionally, all MCA banks do not support maximum ErrorAddr bits in
MCA_ADDR. Some banks might support fewer bits but the remaining bits are
marked as reserved.

  [ Yazen: Rebased and fixed up formatting.
    bp: Massage comments. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221206173607.1185907-5-yazen.ghannam@amd.com
2022-12-28 22:37:37 +01:00
Smita Koralahalli
2117654e80 x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
Move MCA_ADDR[ErrorAddr] extraction into a separate helper function. This
will be further refactored to support extended ErrorAddr bits in MCA_ADDR
in newer AMD CPUs.

  [ bp: Massage. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/all/20220225193342.215780-3-Smita.KoralahalliChannabasappa@amd.com/
2022-12-28 22:11:48 +01:00
Masami Hiramatsu (Google)
63dc6325ff x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
speculative execution after function return, kprobe jump optimization
always fails on the functions with such INT3 inside the function body.
(It already checks the INT3 padding between functions, but not inside
 the function)

To avoid this issue, as same as kprobes, check whether the INT3 comes
from kgdb or not, and if so, stop decoding and make it fail. The other
INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
treated as a one-byte instruction.

Fixes: e463a09af2 ("x86: Add straight-line-speculation mitigation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/167146051929.1374301.7419382929328081706.stgit@devnote3
2022-12-27 12:51:58 +01:00
Masami Hiramatsu (Google)
1993bf9799 x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
speculative execution after RET instruction, kprobes always failes to
check the probed instruction boundary by decoding the function body if
the probed address is after such sequence. (Note that some conditional
code blocks will be placed after function return, if compiler decides
it is not on the hot path.)

This is because kprobes expects kgdb puts the INT3 as a software
breakpoint and it will replace the original instruction.
But these INT3 are not such purpose, it doesn't need to recover the
original instruction.

To avoid this issue, kprobes checks whether the INT3 is owned by
kgdb or not, and if so, stop decoding and make it fail. The other
INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
treated as a one-byte instruction.

Fixes: e463a09af2 ("x86: Add straight-line-speculation mitigation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/167146051026.1374301.392728975473572291.stgit@devnote3
2022-12-27 12:51:58 +01:00
Arnd Bergmann
ade8c20847 x86/calldepth: Fix incorrect init section references
The addition of callthunks_translate_call_dest means that
skip_addr() and patch_dest() can no longer be discarded
as part of the __init section freeing:

WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> patch_dest (section: .init.text)
WARNING: modpost: vmlinux.o: section mismatch in reference: is_callthunk.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
ERROR: modpost: Section mismatches detected.
Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.

Fixes: b2e9dfe54b ("x86/bpf: Emit call depth accounting if required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221215164334.968863-1-arnd@kernel.org
2022-12-27 12:51:58 +01:00
Borislav Petkov
ba73e369b7 x86/microcode/AMD: Handle multiple glued containers properly
It can happen that - especially during testing - the microcode
blobs of all families are all glued together in the initrd. The
current code doesn't check whether the current container matched
a microcode patch and continues to the next one, which leads to
save_microcode_in_initrd_amd() to look at the next and thus wrong one:

  microcode: parse_container: ucode: 0xffff88807e9d9082
  microcode: verify_patch: buf: 0xffff88807e9d90ce, buf_size: 26428
  microcode: verify_patch: proc_id: 0x8082, patch_fam: 0x17, this family: 0x17
  microcode: verify_patch: buf: 0xffff88807e9d9d56, buf_size: 23220
  microcode: verify_patch: proc_id: 0x8012, patch_fam: 0x17, this family: 0x17
  microcode: parse_container: MATCH: eq_id: 0x8012, patch proc_rev_id: 0x8012

<-- matching patch found

  microcode: verify_patch: buf: 0xffff88807e9da9de, buf_size: 20012
  microcode: verify_patch: proc_id: 0x8310, patch_fam: 0x17, this family: 0x17
  microcode: verify_patch: buf: 0xffff88807e9db666, buf_size: 16804
  microcode: Invalid type field (0x414d44) in container file section header.
  microcode: Patch section fail

<-- checking chokes on the microcode magic value of the next container.

  microcode: parse_container: saving container 0xffff88807e9d9082
  microcode: save_microcode_in_initrd_amd: scanned containers, data: 0xffff88807e9d9082, size: 9700a

and now if there's a next (and last container) it'll use that in
save_microcode_in_initrd_amd() and not find a proper patch, ofc.

Fix that by moving the out: label up, before the desc->mc check which
jots down the pointer of the matching patch and is used to signal to the
caller that it has found a matching patch in the current container.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221219210656.5140-2-bp@alien8.de
2022-12-26 06:41:05 +01:00
Borislav Petkov
61de9b7036 x86/microcode/AMD: Rename a couple of functions
- Rename apply_microcode_early_amd() to early_apply_microcode():
simplify the name so that it is clear what it does and when does it do
it.

- Rename __load_ucode_amd() to find_blobs_in_containers(): the new name
actually explains what it does.

Document some.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221219210656.5140-1-bp@alien8.de
2022-12-26 06:30:31 +01:00
Linus Torvalds
4f292c4de4 New Feature:
* Randomize the per-cpu entry areas
 Cleanups:
 * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
   coding it
 * Move to "native" set_memory_rox() helper
 * Clean up pmd_get_atomic() and i386-PAE
 * Remove some unused page table size macros
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
 krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
 cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
 Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
 MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
 UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
 +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
 x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
 VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
 D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
 dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
 =wwVF
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
 "New Feature:

   - Randomize the per-cpu entry areas

  Cleanups:

   - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it

   - Move to "native" set_memory_rox() helper

   - Clean up pmd_get_atomic() and i386-PAE

   - Remove some unused page table size macros"

* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/mm: Ensure forced page table splitting
  x86/kasan: Populate shadow for shared chunk of the CPU entry area
  x86/kasan: Add helpers to align shadow addresses up and down
  x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
  x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
  x86/mm: Recompute physical address for every page of per-CPU CEA mapping
  x86/mm: Rename __change_page_attr_set_clr(.checkalias)
  x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
  x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
  x86/mm: Add a few comments
  x86/mm: Fix CR3_ADDR_MASK
  x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
  mm: Convert __HAVE_ARCH_P..P_GET to the new style
  mm: Remove pointless barrier() after pmdp_get_lockless()
  x86/mm/pae: Get rid of set_64bit()
  x86_64: Remove pointless set_64bit() usage
  x86/mm/pae: Be consistent with pXXp_get_and_clear()
  x86/mm/pae: Use WRITE_ONCE()
  x86/mm/pae: Don't (ab)use atomic64
  mm/gup: Fix the lockless PMD access
  ...
2022-12-17 14:06:53 -06:00
Linus Torvalds
71a7507afb Driver Core changes for 6.2-rc1
Here is the set of driver core and kernfs changes for 6.2-rc1.
 
 The "big" change in here is the addition of a new macro,
 container_of_const() that will preserve the "const-ness" of a pointer
 passed into it.
 
 The "problem" of the current container_of() macro is that if you pass in
 a "const *", out of it can comes a non-const pointer unless you
 specifically ask for it.  For many usages, we want to preserve the
 "const" attribute by using the same call.  For a specific example, this
 series changes the kobj_to_dev() macro to use it, allowing it to be used
 no matter what the const value is.  This prevents every subsystem from
 having to declare 2 different individual macros (i.e.
 kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
 the const value at build time, which having 2 macros would not do
 either.
 
 The driver for all of this have been discussions with the Rust kernel
 developers as to how to properly mark driver core, and kobject, objects
 as being "non-mutable".  The changes to the kobject and driver core in
 this pull request are the result of that, as there are lots of paths
 where kobjects and device pointers are not modified at all, so marking
 them as "const" allows the compiler to enforce this.
 
 So, a nice side affect of the Rust development effort has been already
 to clean up the driver core code to be more obvious about object rules.
 
 All of this has been bike-shedded in quite a lot of detail on lkml with
 different names and implementations resulting in the tiny version we
 have in here, much better than my original proposal.  Lots of subsystem
 maintainers have acked the changes as well.
 
 Other than this change, included in here are smaller stuff like:
   - kernfs fixes and updates to handle lock contention better
   - vmlinux.lds.h fixes and updates
   - sysfs and debugfs documentation updates
   - device property updates
 
 All of these have been in the linux-next tree for quite a while with no
 problems, OTHER than some merge issues with other trees that should be
 obvious when you hit them (block tree deletes a driver that this tree
 modifies, iommufd tree modifies code that this tree also touches).  If
 there are merge problems with these trees, please let me know.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY5wz3A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yks0ACeKYUlVgCsER8eYW+x18szFa2QTXgAn2h/VhZe
 1Fp53boFaQkGBjl8mGF8
 =v+FB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core and kernfs changes for 6.2-rc1.

  The "big" change in here is the addition of a new macro,
  container_of_const() that will preserve the "const-ness" of a pointer
  passed into it.

  The "problem" of the current container_of() macro is that if you pass
  in a "const *", out of it can comes a non-const pointer unless you
  specifically ask for it. For many usages, we want to preserve the
  "const" attribute by using the same call. For a specific example, this
  series changes the kobj_to_dev() macro to use it, allowing it to be
  used no matter what the const value is. This prevents every subsystem
  from having to declare 2 different individual macros (i.e.
  kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
  the const value at build time, which having 2 macros would not do
  either.

  The driver for all of this have been discussions with the Rust kernel
  developers as to how to properly mark driver core, and kobject,
  objects as being "non-mutable". The changes to the kobject and driver
  core in this pull request are the result of that, as there are lots of
  paths where kobjects and device pointers are not modified at all, so
  marking them as "const" allows the compiler to enforce this.

  So, a nice side affect of the Rust development effort has been already
  to clean up the driver core code to be more obvious about object
  rules.

  All of this has been bike-shedded in quite a lot of detail on lkml
  with different names and implementations resulting in the tiny version
  we have in here, much better than my original proposal. Lots of
  subsystem maintainers have acked the changes as well.

  Other than this change, included in here are smaller stuff like:

   - kernfs fixes and updates to handle lock contention better

   - vmlinux.lds.h fixes and updates

   - sysfs and debugfs documentation updates

   - device property updates

  All of these have been in the linux-next tree for quite a while with
  no problems"

* tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits)
  device property: Fix documentation for fwnode_get_next_parent()
  firmware_loader: fix up to_fw_sysfs() to preserve const
  usb.h: take advantage of container_of_const()
  device.h: move kobj_to_dev() to use container_of_const()
  container_of: add container_of_const() that preserves const-ness of the pointer
  driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion.
  driver core: fix up missed scsi/cxlflash class.devnode() conversion.
  driver core: fix up some missing class.devnode() conversions.
  driver core: make struct class.devnode() take a const *
  driver core: make struct class.dev_uevent() take a const *
  cacheinfo: Remove of_node_put() for fw_token
  device property: Add a blank line in Kconfig of tests
  device property: Rename goto label to be more precise
  device property: Move PROPERTY_ENTRY_BOOL() a bit down
  device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*()
  kernfs: fix all kernel-doc warnings and multiple typos
  driver core: pass a const * into of_device_uevent()
  kobject: kset_uevent_ops: make name() callback take a const *
  kobject: kset_uevent_ops: make filter() callback take a const *
  kobject: make kobject_namespace take a const *
  ...
2022-12-16 03:54:54 -08:00
Linus Torvalds
fe36bb8736 Tracing updates for 6.2:
- Add options to the osnoise tracer
   o panic_on_stop option that panics the kernel if osnoise is greater than some
     user defined threshold.
   o preempt option, to test noise while preemption is disabled
   o irq option, to test noise when interrupts are disabled
 
 - Add .percent and .graph suffix to histograms to give different outputs
 
 - Add nohitcount to disable showing hitcount in histogram output
 
 - Add new __cpumask() to trace event fields to annotate that a unsigned long
   array is a cpumask to user space and should be treated as one.
 
 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing and take
   snapshots.
 
 - Fix x86/kmmio mmio tracer to work with the updates to lockdep
 
 - Unify the panic and die notifiers
 
 - Add back ftrace_expect reference that is used to extract more information in
   the ftrace_bug() code.
 
 - Have trigger filter parsing errors show up in the tracing error log.
 
 - Updated MAINTAINERS file to add kernel tracing  mailing list and patchwork
   info
 
 - Use IDA to keep track of event type numbers.
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
 1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
 =z1pe
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
2022-12-15 18:01:16 -08:00
Linus Torvalds
8fa590bf34 ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 * Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
   which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a9:
   "Fix a number of issues with MTE, such as races on the tags being
   initialised vs the PG_mte_tagged flag as well as the lack of support
   for VM_SHARED when KVM is involved.  Patches from Catalin Marinas and
   Peter Collingbourne").
 
 * Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 * Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 * Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 * Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 s390:
 
 * Second batch of the lazy destroy patches
 
 * First batch of KVM changes for kernel virtual != physical address support
 
 * Removal of a unused function
 
 x86:
 
 * Allow compiling out SMM support
 
 * Cleanup and documentation of SMM state save area format
 
 * Preserve interrupt shadow in SMM state save area
 
 * Respond to generic signals during slow page faults
 
 * Fixes and optimizations for the non-executable huge page errata fix.
 
 * Reprogram all performance counters on PMU filter change
 
 * Cleanups to Hyper-V emulation and tests
 
 * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
   running on top of a L1 Hyper-V hypervisor)
 
 * Advertise several new Intel features
 
 * x86 Xen-for-KVM:
 
 ** Allow the Xen runstate information to cross a page boundary
 
 ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
 
 ** Add support for 32-bit guests in SCHEDOP_poll
 
 * Notable x86 fixes and cleanups:
 
 ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
 ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
 ** Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
 ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
 ** Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 
 ** Advertise (on AMD) that the SMM_CTL MSR is not supported
 
 ** Remove unnecessary exports
 
 Generic:
 
 * Support for responding to signals during page faults; introduces
   new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
 
 Selftests:
 
 * Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.
 
 * Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.
 
 * Introduce actual atomics for clear/set_bit() in selftests
 
 * Add support for pinning vCPUs in dirty_log_perf_test.
 
 * Rename the so called "perf_util" framework to "memstress".
 
 * Add a lightweight psuedo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the memstress tests.
 
 * Add a common ucall implementation; code dedup and pre-work for running
   SEV (and beyond) guests in selftests.
 
 * Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).
 
 * A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
   breakpoints, stage-2 faults and access tracking.
 
 * x86-specific selftest changes:
 
 ** Clean up x86's page table management.
 
 ** Clean up and enhance the "smaller maxphyaddr" test, and add a related
    test to cover generic emulation failure.
 
 ** Clean up the nEPT support checks.
 
 ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
 
 ** Fix an ordering issue in the AMX test introduced by recent conversions
    to use kvm_cpu_has(), and harden the code to guard against similar bugs
    in the future.  Anything that tiggers caching of KVM's supported CPUID,
    kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
    the caching occurs before the test opts in via prctl().
 
 Documentation:
 
 * Remove deleted ioctls from documentation
 
 * Clean up the docs for the x86 MSR filter.
 
 * Various fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
 KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
 mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
 yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
 Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
 MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
 =QAYX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a9: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
2022-12-15 11:12:21 -08:00
Pasha Tatashin
82328227db x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
Other architectures and the common mm/ use P*D_MASK, and P*D_SIZE.
Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE which are only
used in x86/*.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com
2022-12-15 10:37:27 -08:00
Peter Zijlstra
d48567c9a0 mm: Introduce set_memory_rox()
Because endlessly repeating:

	set_memory_ro()
	set_memory_x()

is getting tedious.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
2022-12-15 10:37:26 -08:00
Peter Zijlstra
eb7d389d5b x86/ftrace: Remove SYSTEM_BOOTING exceptions
Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
97e3d26b5e x86/mm: Randomize per-cpu entry area
Seth found that the CPU-entry-area; the piece of per-cpu data that is
mapped into the userspace page-tables for kPTI is not subject to any
randomization -- irrespective of kASLR settings.

On x86_64 a whole P4D (512 GB) of virtual address space is reserved for
this structure, which is plenty large enough to randomize things a
little.

As such, use a straight forward randomization scheme that avoids
duplicates to spread the existing CPUs over the available space.

  [ bp: Fix le build. ]

Reported-by: Seth Jenkins <sethjenkins@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-12-15 10:37:26 -08:00
Linus Torvalds
94a855111e - Add the call depth tracking mitigation for Retbleed which has
been long in the making. It is a lighterweight software-only fix for
 Skylake-based cores where enabling IBRS is a big hammer and causes a
 significant performance impact.
 
 What it basically does is, it aligns all kernel functions to 16 bytes
 boundary and adds a 16-byte padding before the function, objtool
 collects all functions' locations and when the mitigation gets applied,
 it patches a call accounting thunk which is used to track the call depth
 of the stack at any time.
 
 When that call depth reaches a magical, microarchitecture-specific value
 for the Return Stack Buffer, the code stuffs that RSB and avoids its
 underflow which could otherwise lead to the Intel variant of Retbleed.
 
 This software-only solution brings a lot of the lost performance back,
 as benchmarks suggest:
 
   https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
 
 That page above also contains a lot more detailed explanation of the
 whole mechanism
 
 - Implement a new control flow integrity scheme called FineIBT which is
 based on the software kCFI implementation and uses hardware IBT support
 where present to annotate and track indirect branches using a hash to
 validate them
 
 - Other misc fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe
 VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb
 VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN
 gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A
 iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y
 Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A
 ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh
 ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd
 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP
 i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n
 Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/
 zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs=
 =cRy1
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Add the call depth tracking mitigation for Retbleed which has been
   long in the making. It is a lighterweight software-only fix for
   Skylake-based cores where enabling IBRS is a big hammer and causes a
   significant performance impact.

   What it basically does is, it aligns all kernel functions to 16 bytes
   boundary and adds a 16-byte padding before the function, objtool
   collects all functions' locations and when the mitigation gets
   applied, it patches a call accounting thunk which is used to track
   the call depth of the stack at any time.

   When that call depth reaches a magical, microarchitecture-specific
   value for the Return Stack Buffer, the code stuffs that RSB and
   avoids its underflow which could otherwise lead to the Intel variant
   of Retbleed.

   This software-only solution brings a lot of the lost performance
   back, as benchmarks suggest:

       https://lore.kernel.org/all/20220915111039.092790446@infradead.org/

   That page above also contains a lot more detailed explanation of the
   whole mechanism

 - Implement a new control flow integrity scheme called FineIBT which is
   based on the software kCFI implementation and uses hardware IBT
   support where present to annotate and track indirect branches using a
   hash to validate them

 - Other misc fixes and cleanups

* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  x86/paravirt: Use common macro for creating simple asm paravirt functions
  x86/paravirt: Remove clobber bitmask from .parainstructions
  x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
  x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
  x86/Kconfig: Enable kernel IBT by default
  x86,pm: Force out-of-line memcpy()
  objtool: Fix weak hole vs prefix symbol
  objtool: Optimize elf_dirty_reloc_sym()
  x86/cfi: Add boot time hash randomization
  x86/cfi: Boot time selection of CFI scheme
  x86/ibt: Implement FineIBT
  objtool: Add --cfi to generate the .cfi_sites section
  x86: Add prefix symbols for function padding
  objtool: Add option to generate prefix symbols
  objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
  objtool: Slice up elf_create_section_symbol()
  kallsyms: Revert "Take callthunks into account"
  x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
  x86/retpoline: Fix crash printing warning
  x86/paravirt: Fix a !PARAVIRT build warning
  ...
2022-12-14 15:03:00 -08:00
Linus Torvalds
c7020e1b34 pci-v6.2-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmOYpTIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxuZhAAhGjE8voLZeOYwxbvfL69hGTAZ+Me
 x2hqRWVhh/IGWXTTaoSLwSjMMokcmAKN5S/wv8qdCG5sB8EN8FyTBIZDy8PuRRdl
 8UlqlBMSL+d4oSRDCnYLxFNcynLRNnmx2dfcdw9tJ4zjTLN8Y4o8PHFogR6pJ3MT
 sDC8S0myTQKXr4wAGzTZycKsiGManviYtByp6dCcKD3Oy5Q2uZ9OKO2DP2yQpn+F
 c3IJSV9oDz3KR8JVJ5Q1iz9cdMXbGwjkM3JLlHpxhedwjN4ErLumPutKcebtzO5C
 aTqabN7Nnzc4yJusAIfojFCWH7fgaYUyJ3pxcFyJ4tu4m9Last+2I5UB/kV2sYAD
 jWiCYx3sA/mRopNXOnrBGae+Lgy+sQnt8or0grySr0bK+b+ArAGis4uT4A0uASGO
 RUQdIQwz7zhHeQrwAladHWxnx4BEDNCatgfn38p4fklIYKydCY5nfZURMDvHezSR
 G6Nu08hoE9ZXlmkWTFw+5F23wPWKcCpzZj0hf7OroIouXUp8vqSFSqatH5vGkbCl
 bDswck9GdRJ2hl5SvFOeelaXkM42du45TMLU2JmIn6dYYFNrO93JgdvKSU7E2CpG
 AmDIpg1Idxo8fEPPGH1I7RVU5+ilzmmPQQY7poQW+va4/dEd/QVp1+ZZTDnMC1qk
 qi3ck22VdvPU2VU=
 =KULr
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and
     make more things static.

   - Make portdrv bind to Switch Ports that have AER. Previously, if
     these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant
     the Ports couldn't be suspended to low-power states. AER on these
     Ports doesn't use interrupts, and the AER driver doesn't need to
     claim them.

   - Assign PCI domain IDs using ida_alloc(), which makes host bridge
     add/remove work better.

  Resource management:

   - To work better with recent BIOSes that use EfiMemoryMappedIO for
     PCI host bridge apertures, remove those regions from the E820 map
     (E820 entries normally prevent us from allocating BARs). In v5.19,
     we added some quirks to disable E820 checking, but that's not very
     maintainable. EfiMemoryMappedIO means the OS needs to map the
     region for use by EFI runtime services; it shouldn't prevent OS
     from using it.

  PCIe native device hotplug:

   - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4
     PCIe tunneling depends on native PCIe hotplug.

   - Enable Command Completed Interrupt only if supported to avoid user
     confusion from lspci output that says this is enabled but not
     supported.

   - Prevent pciehp from binding to Switch Upstream Ports; this happened
     because of interaction with acpiphp and caused devices below the
     Upstream Port to disappear.

  Power management:

   - Convert AGP drivers to generic power management. We hope to remove
     legacy power management from the PCI core eventually.

  Virtualization:

   - Fix pci_device_is_present(), which previously always returned
     "false" for VFs, causing virtio hangs when unbinding the driver.

  Miscellaneous:

   - Convert drivers to gpiod API to prepare for dropping some legacy
     code.

   - Fix DOE fencepost error for the maximum data object length.

  Baikal-T1 PCIe controller driver:

   - Add driver and DT bindings.

  Broadcom STB PCIe controller driver:

   - Enable Multi-MSI.

   - Delay 100ms after PERST# deassert to allow power and clocks to
     stabilize.

   - Configure Read Completion Boundary to 64 bytes.

  Freescale i.MX6 PCIe controller driver:

   - Initialize PHY before deasserting core reset to fix a regression in
     v6.0 on boards where the PHY provides the reference.

   - Fix imx6sx and imx8mq clock names in DT schema.

  Intel VMD host bridge driver:

   - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe
     SSDs in VT-d pass-through scenarios.

   - Disable MSI remapping, which gets re-enabled by firmware during
     suspend/resume.

  MediaTek PCIe Gen3 controller driver:

   - Add MT7986 and MT8195 support.

  Qualcomm PCIe controller driver:

   - Add SC8280XP/SA8540P basic interconnect support.

  Rockchip DesignWare PCIe controller driver:

   - Base DT schema on common Synopsys schema.

  Synopsys DesignWare PCIe core:

   - Collect DT items shared between Root Port and Endpoint (PERST GPIO,
     PHY info, clocks, resets, link speed, number of lanes, number of
     iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml.

   - Add dma-ranges support for Root Ports and Endpoints.

   - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to
     reduce code duplication.

   - Add generic names for clocks and resets to encourage more
     consistent naming across drivers using DesignWare IP.

   - Stop advertising PTM Responder role for Endpoints, which aren't
     allowed to be responders.

  TI J721E PCIe driver:

   - Add j721s2 host mode ID to DT schema.

   - Add interrupt properties to DT schema.

  Toshiba Visconti PCIe controller driver:

   - Fix interrupts array max constraints in DT schema"

* tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits)
  x86/PCI: Use pr_info() when possible
  x86/PCI: Fix log message typo
  x86/PCI: Tidy E820 removal messages
  PCI: Skip allocate_resource() if too little space available
  efi/x86: Remove EfiMemoryMappedIO from E820 map
  PCI/portdrv: Allow AER service only for Root Ports & RCECs
  PCI: xilinx-nwl: Fix coding style violations
  PCI: mvebu: Switch to using gpiod API
  PCI: pciehp: Enable Command Completed Interrupt only if supported
  PCI: aardvark: Switch to using devm_gpiod_get_optional()
  dt-bindings: PCI: mediatek-gen3: add support for mt7986
  dt-bindings: PCI: mediatek-gen3: add SoC based clock config
  dt-bindings: PCI: qcom: Allow 'dma-coherent' property
  PCI: mt7621: Add sentinel to quirks table
  PCI: vmd: Fix secondary bus reset for Intel bridges
  PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
  PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
  PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
  PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
  PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
  ...
2022-12-14 09:54:10 -08:00
Linus Torvalds
e2ca6ba6ba MM patches for 6.2-rc1.
- More userfaultfs work from Peter Xu.
 
 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
 
 - Some filemap cleanups from Vishal Moola.
 
 - David Hildenbrand added the ability to selftest anon memory COW handling.
 
 - Some cpuset simplifications from Liu Shixin.
 
 - Addition of vmalloc tracing support by Uladzislau Rezki.
 
 - Some pagecache folioifications and simplifications from Matthew Wilcox.
 
 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
 
 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.  This series shold have been in the
   non-MM tree, my bad.
 
 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages.
 
 - DAMON cleanups and tuneups from SeongJae Park
 
 - Tony Luck fixed the handling of COW faults against poisoned pages.
 
 - Peter Xu utilized the PTE marker code for handling swapin errors.
 
 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient.
 
 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand.
 
 - zram support for multiple compression streams from Sergey Senozhatsky.
 
 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway.
 
 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations.
 
 - Vishal Moola removed the try_to_release_page() wrapper.
 
 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache.
 
 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking.
 
 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend.
 
 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range().
 
 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen.
 
 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests.  Better, but still not perfect.
 
 - Christoph Hellwig has removed the .writepage() method from several
   filesystems.  They only need .writepages().
 
 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting.
 
 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines.
 
 - Many singleton patches, as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
 jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
 CodAgiA51qwzId3GRytIo/tfWZSezgA=
 =d19R
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfs work from Peter Xu

 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying

 - Some filemap cleanups from Vishal Moola

 - David Hildenbrand added the ability to selftest anon memory COW
   handling

 - Some cpuset simplifications from Liu Shixin

 - Addition of vmalloc tracing support by Uladzislau Rezki

 - Some pagecache folioifications and simplifications from Matthew
   Wilcox

 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
   it

 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.

   This series should have been in the non-MM tree, my bad

 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages

 - DAMON cleanups and tuneups from SeongJae Park

 - Tony Luck fixed the handling of COW faults against poisoned pages

 - Peter Xu utilized the PTE marker code for handling swapin errors

 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient

 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand

 - zram support for multiple compression streams from Sergey Senozhatsky

 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway

 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations

 - Vishal Moola removed the try_to_release_page() wrapper

 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache

 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking

 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend

 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range()

 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen

 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests. Better, but still not perfect

 - Christoph Hellwig has removed the .writepage() method from several
   filesystems. They only need .writepages()

 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting

 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines

 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
2022-12-13 19:29:45 -08:00
Linus Torvalds
a70210f415 - Add support for multiple testing sequences to the Intel In-Field Scan
driver in order to be able to run multiple different test patterns.
 Rework things and remove the BROKEN dependency so that the driver can be
 enabled (Jithu Joseph)
 
 - Remove the subsys interface usage in the microcode loader because it
 is not really needed
 
 - A couple of smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOYjh8ACgkQEsHwGGHe
 VUpu8xAAhY7ywLcAoG9p3AaGiXpryFwnXFBah13o1rkgkJGRaG/eVjPJ4KUUjOQs
 Wo3WUHeeHwmFWq+F/OSRefNsptOLBQ3u/cSza9TDDjPoS3glO5cIFc34JqIItMTg
 L1GMB4LfmD1+9lYpM6Td11/Dluqf7EjeEdF4qDmCRZ5i4YNsaAlM4HtgATavNkYc
 6Bvsi1r7tv7tCNDAEYqEfsQLoc79Yca4W5s86HNIyrxtyk9RLrK75WvRkcpTSnK9
 SEpgpYwZy4iRTtZmePC7BqqbHfV6NoeuRqIMR73FrNK9pQuauGFMPkIx08Sgl3BW
 /YGpefleGBHhy6Dqa6rEPsYS9xHfhqYAde09zzECJWW4VSI0PuFKyfm67ep2O7q6
 zbV2DjxEZ+8kWeO9cDJPedEd8pXC8Ua7H+KNl00npdfNlkBaVR9ZRjX7ZVoiFMi8
 6SRmCr1MLngldSMkUr6cYiLpoXmRzM+7gnKhVzhO6yNa0eihYBAIZ5lei0n9Q01W
 Soxvec2KKeSZraNLoQH0MSndEJY4sqx6lPjlXgFT6gGHzgfQZTg+9INdaPK9gbI7
 tg5j1e0/1UyvWrxYxOdzThtRY1X7Y1QtdpQDcatkVOgR1uZi1CTDx1dxTrHP5jbZ
 7MSKn/8/T61beG6ujjif+pC8kOwNISLNDBBZGNzeLRyx8t9/6jQ=
 =Z2Nu
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode and IFS updates from Borislav Petkov:
 "The IFS (In-Field Scan) stuff goes through tip because the IFS driver
  uses the same structures and similar functionality as the microcode
  loader and it made sense to route it all through this branch so that
  there are no conflicts.

   - Add support for multiple testing sequences to the Intel In-Field
     Scan driver in order to be able to run multiple different test
     patterns. Rework things and remove the BROKEN dependency so that
     the driver can be enabled (Jithu Joseph)

   - Remove the subsys interface usage in the microcode loader because
     it is not really needed

   - A couple of smaller fixes and cleanups"

* tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/microcode/intel: Do not retry microcode reloading on the APs
  x86/microcode/intel: Do not print microcode revision and processor flags
  platform/x86/intel/ifs: Add missing kernel-doc entry
  Revert "platform/x86/intel/ifs: Mark as BROKEN"
  Documentation/ABI: Update IFS ABI doc
  platform/x86/intel/ifs: Add current_batch sysfs entry
  platform/x86/intel/ifs: Remove reload sysfs entry
  platform/x86/intel/ifs: Add metadata validation
  platform/x86/intel/ifs: Use generic microcode headers and functions
  platform/x86/intel/ifs: Add metadata support
  x86/microcode/intel: Use a reserved field for metasize
  x86/microcode/intel: Add hdr_type to intel_microcode_sanity_check()
  x86/microcode/intel: Reuse microcode_sanity_check()
  x86/microcode/intel: Use appropriate type in microcode_sanity_check()
  x86/microcode/intel: Reuse find_matching_signature()
  platform/x86/intel/ifs: Remove memory allocation from load path
  platform/x86/intel/ifs: Remove image loading during init
  platform/x86/intel/ifs: Return a more appropriate error code
  platform/x86/intel/ifs: Remove unused selection
  x86/microcode: Drop struct ucode_cpu_info.valid
  ...
2022-12-13 15:05:29 -08:00
Linus Torvalds
3ef3ace4e2 - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
guests which do not get MTRRs exposed but only PAT. (TDX guests do not
 support the cache disabling dance when setting up MTRRs so they fall
 under the same category.) This is a cleanup work to remove all the ugly
 workarounds for such guests and init things separately (Juergen Gross)
 
 - Add two new Intel CPUs to the list of CPUs with "normal" Energy
 Performance Bias, leading to power savings
 
 - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern Centaur
 CPUs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOYhIMACgkQEsHwGGHe
 VUpxug//ZKw3hYFroKhsULJi/e0j2nGARiSlJrJcFHl2vgh9yGvDsnYUyM/rgjgt
 cM3uCLbEG7nA6uhB3nupzaXZ8lBM1nU9kiEl/kjQ5oYf9nmJ48fLttvWGfxYN4s3
 kj5fYVhlOZpntQXIWrwxnPqghUysumMnZmBJeKYiYNNfkj62l3xU2Ni4Gnjnp02I
 9MmUhl7pj1aEyOQfM8rovy+wtYCg5WTOmXVlyVN+b9MwfYeK+stojvCZHxtJs9BD
 fezpJjjG+78xKUC7vVZXCh1p1N5Qvj014XJkVl9Hg0n7qizKFZRtqi8I769G2ptd
 exP8c2nDXKCqYzE8vK6ukWgDANQPs3d6Z7EqUKuXOCBF81PnMPSUMyNtQFGNM6Wp
 S5YSvFfCgUjp50IunOpvkDABgpM+PB8qeWUq72UFQJSOymzRJg/KXtE2X+qaMwtC
 0i6VLXfMddGcmqNKDppfGtCjq2W5VrNIIJedtAQQGyl+pl3XzZeNomhJpm/0mVfJ
 8UrlXZeXl/EUQ7qk40gC/Ash27pU9ZDx4CMNMy1jDIQqgufBjEoRIDSFqQlghmZq
 An5/BqMLhOMxUYNA7bRUnyeyxCBypetMdQt5ikBmVXebvBDmArXcuSNAdiy1uBFX
 KD8P3Y1AnsHIklxkLNyZRUy7fb4mgMFenUbgc0vmbYHbFl0C0pQ=
 =Zmgh
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
   guests which do not get MTRRs exposed but only PAT. (TDX guests do
   not support the cache disabling dance when setting up MTRRs so they
   fall under the same category)

   This is a cleanup work to remove all the ugly workarounds for such
   guests and init things separately (Juergen Gross)

 - Add two new Intel CPUs to the list of CPUs with "normal" Energy
   Performance Bias, leading to power savings

 - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern
   Centaur CPUs

* tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/mtrr: Make message for disabled MTRRs more descriptive
  x86/pat: Handle TDX guest PAT initialization
  x86/cpuid: Carve out all CPUID functionality
  x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV
  x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()
  x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack()
  x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()
  x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h
  x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs
  x86/mtrr: Simplify mtrr_ops initialization
  x86/cacheinfo: Switch cache_ap_init() to hotplug callback
  x86: Decouple PAT and MTRR handling
  x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
  x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
  x86/mtrr: Get rid of __mtrr_enabled bool
  x86/mtrr: Simplify mtrr_bp_init()
  x86/mtrr: Remove set_all callback from struct mtrr_ops
  x86/mtrr: Disentangle MTRR init from PAT init
  x86/mtrr: Move cache control code to cacheinfo.c
  x86/mtrr: Split MTRR-specific handling from cache dis/enabling
  ...
2022-12-13 14:56:56 -08:00
Linus Torvalds
4eb77fa102 - Do some spring cleaning to the compressed boot code by moving the
EFI mixed-mode code to a separate compilation unit, the AMD memory
 encryption early code where it belongs and fixing up build dependencies.
 Make the deprecated EFI handover protocol optional with the goal of
 removing it at some point (Ard Biesheuvel)
 
 - Skip realmode init code on Xen PV guests as it is not needed there
 
 - Remove an old 32-bit PIC code compiler workaround
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOYaiMACgkQEsHwGGHe
 VUrNVhAAk3lLagEsrBcQ24SnMMAyQvdKfRucn9fbs72jBCyWbDqXcE59qNgdbMS1
 3rIL+EJdF8jlm5K28GjRS1WSvwUyYbyFEfUcYfqZl9L/5PAl7PlG7nNQw7/gXnw+
 xS57w/Q3cONlo5LC0K2Zkbj/59RvDoBEs3nkhozkKR0npTDW/LK3Vl0zgKTkvqsV
 DzRIHhWsqSEvpdowbQmQCyqFh/pOoQlZkQwjYVA9+SaQYdH3Yo1dpLd5i9I9eVmJ
 dci/HDU+plwYYuZ1XhxwXr82PcdCUVYjJ/DTt9GkTVYq7u5EWx62puxTl+c+wbG2
 H1WBXuZHBGdzNMFdnb1k9RuLCaYdaxKTNlZh3FPMMDtkjtjKTl/olXTlFUYFgI6E
 FPv4hi15g6pMveS3K6YUAd0uGvpsjvLUZHPqMDVS2trhxLENQALc6Id/PwqzrQ1T
 FzfPYcDyFFwMM3MDuWc8ClwEDD9wr0Z4m4Aek/ca2r85AKEX8ZtTTlWZoI4E9A4B
 hEjUFnRhT/d6XLWwZqcOIKfwtbpKAjdsCN3ElFst8ogRFAXqW8luDoI4BRCkBC4p
 T4RHdij4afkuFjSAxBacazpaavtcCsDqXwBpeL4YN+4fA7+NokVZGiQVh/3S8BPn
 LlgIf6awFq6yQq7JyEGPdk+dWn5sknldixZ55m666ZLzSvQhvE8=
 =VGZx
 -----END PGP SIGNATURE-----

Merge tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Borislav Petkov:
 "A  of early boot cleanups and fixes.

   - Do some spring cleaning to the compressed boot code by moving the
     EFI mixed-mode code to a separate compilation unit, the AMD memory
     encryption early code where it belongs and fixing up build
     dependencies. Make the deprecated EFI handover protocol optional
     with the goal of removing it at some point (Ard Biesheuvel)

   - Skip realmode init code on Xen PV guests as it is not needed there

   - Remove an old 32-bit PIC code compiler workaround"

* tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Remove x86_32 PIC using %ebx workaround
  x86/boot: Skip realmode init code when running as Xen PV guest
  x86/efi: Make the deprecated EFI handover protocol optional
  x86/boot/compressed: Only build mem_encrypt.S if AMD_MEM_ENCRYPT=y
  x86/boot/compressed: Adhere to calling convention in get_sev_encryption_bit()
  x86/boot/compressed: Move startup32_check_sev_cbit() out of head_64.S
  x86/boot/compressed: Move startup32_check_sev_cbit() into .text
  x86/boot/compressed: Move startup32_load_idt() out of head_64.S
  x86/boot/compressed: Move startup32_load_idt() into .text section
  x86/boot/compressed: Pull global variable reference into startup32_load_idt()
  x86/boot/compressed: Avoid touching ECX in startup32_set_idt_entry()
  x86/boot/compressed: Simplify IDT/GDT preserve/restore in the EFI thunk
  x86/boot/compressed, efi: Merge multiple definitions of image_offset into one
  x86/boot/compressed: Move efi32_pe_entry() out of head_64.S
  x86/boot/compressed: Move efi32_entry out of head_64.S
  x86/boot/compressed: Move efi32_pe_entry into .text section
  x86/boot/compressed: Move bootargs parsing out of 32-bit startup code
  x86/boot/compressed: Move 32-bit entrypoint code into .text section
  x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S
2022-12-13 14:45:29 -08:00
Linus Torvalds
fc4c9f4504 EFI updates for v6.2:
- Refactor the zboot code so that it incorporates all the EFI stub
   logic, rather than calling the decompressed kernel as a EFI app.
 - Add support for initrd= command line option to x86 mixed mode.
 - Allow initrd= to be used with arbitrary EFI accessible file systems
   instead of just the one the kernel itself was loaded from.
 - Move some x86-only handling and manipulation of the EFI memory map
   into arch/x86, as it is not used anywhere else.
 - More flexible handling of any random seeds provided by the boot
   environment (i.e., systemd-boot) so that it becomes available much
   earlier during the boot.
 - Allow improved arch-agnostic EFI support in loaders, by setting a
   uniform baseline of supported features, and adding a generic magic
   number to the DOS/PE header. This should allow loaders such as GRUB or
   systemd-boot to reduce the amount of arch-specific handling
   substantially.
 - (arm64) Run EFI runtime services from a dedicated stack, and use it to
   recover from synchronous exceptions that might occur in the firmware
   code.
 - (arm64) Ensure that we don't allocate memory outside of the 48-bit
   addressable physical range.
 - Make EFI pstore record size configurable
 - Add support for decoding CXL specific CPER records
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
 jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
 wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
 ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
 +Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
 FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
 ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
 dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
 qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
 Sk71c6Em
 =JEXz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Another fairly sizable pull request, by EFI subsystem standards.

  Most of the work was done by me, some of it in collaboration with the
  distro and bootloader folks (GRUB, systemd-boot), where the main focus
  has been on removing pointless per-arch differences in the way EFI
  boots a Linux kernel.

   - Refactor the zboot code so that it incorporates all the EFI stub
     logic, rather than calling the decompressed kernel as a EFI app.

   - Add support for initrd= command line option to x86 mixed mode.

   - Allow initrd= to be used with arbitrary EFI accessible file systems
     instead of just the one the kernel itself was loaded from.

   - Move some x86-only handling and manipulation of the EFI memory map
     into arch/x86, as it is not used anywhere else.

   - More flexible handling of any random seeds provided by the boot
     environment (i.e., systemd-boot) so that it becomes available much
     earlier during the boot.

   - Allow improved arch-agnostic EFI support in loaders, by setting a
     uniform baseline of supported features, and adding a generic magic
     number to the DOS/PE header. This should allow loaders such as GRUB
     or systemd-boot to reduce the amount of arch-specific handling
     substantially.

   - (arm64) Run EFI runtime services from a dedicated stack, and use it
     to recover from synchronous exceptions that might occur in the
     firmware code.

   - (arm64) Ensure that we don't allocate memory outside of the 48-bit
     addressable physical range.

   - Make EFI pstore record size configurable

   - Add support for decoding CXL specific CPER records"

* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
  arm64: efi: Recover from synchronous exceptions occurring in firmware
  arm64: efi: Execute runtime services from a dedicated stack
  arm64: efi: Limit allocations to 48-bit addressable physical region
  efi: Put Linux specific magic number in the DOS header
  efi: libstub: Always enable initrd command line loader and bump version
  efi: stub: use random seed from EFI variable
  efi: vars: prohibit reading random seed variables
  efi: random: combine bootloader provided RNG seed with RNG protocol output
  efi/cper, cxl: Decode CXL Error Log
  efi/cper, cxl: Decode CXL Protocol Error Section
  efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
  efi: x86: Move EFI runtime map sysfs code to arch/x86
  efi: runtime-maps: Clarify purpose and enable by default for kexec
  efi: pstore: Add module parameter for setting the record size
  efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
  efi: memmap: Move manipulation routines into x86 arch tree
  efi: memmap: Move EFI fake memmap support into x86 arch tree
  efi: libstub: Undeprecate the command line initrd loader
  efi: libstub: Add mixed mode support to command line initrd loader
  efi: libstub: Permit mixed mode return types other than efi_status_t
  ...
2022-12-13 14:31:47 -08:00
Linus Torvalds
75f4d9af8b iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing
 more of the same for the future.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ
 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N
 MzwiRTeqnGDxTTF7mgd//IB6hoatAA==
 =bcvF
 -----END PGP SIGNATURE-----

Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter updates from Al Viro:
 "iov_iter work; most of that is about getting rid of direction
  misannotations and (hopefully) preventing more of the same for the
  future"

* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  use less confusing names for iov_iter direction initializers
  iov_iter: saner checks for attempt to copy to/from iterator
  [xen] fix "direction" argument of iov_iter_kvec()
  [vhost] fix 'direction' argument of iov_iter_{init,bvec}()
  [target] fix iov_iter_bvec() "direction" argument
  [s390] memcpy_real(): WRITE is "data source", not destination...
  [s390] zcore: WRITE is "data source", not destination...
  [infiniband] READ is "data destination", not source...
  [fsi] WRITE is "data source", not destination...
  [s390] copy_oldmem_kernel() - WRITE is "data source", not destination
  csum_and_copy_to_iter(): handle ITER_DISCARD
  get rid of unlikely() on page_copy_sane() calls
2022-12-12 18:29:54 -08:00
Linus Torvalds
268325bda5 Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
 A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
 bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
 RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
 WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
 waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
 ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
 /ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
 PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
 RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
 CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
 APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
 =QRhK
 -----END PGP SIGNATURE-----

Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:

 - Replace prandom_u32_max() and various open-coded variants of it,
   there is now a new family of functions that uses fast rejection
   sampling to choose properly uniformly random numbers within an
   interval:

       get_random_u32_below(ceil) - [0, ceil)
       get_random_u32_above(floor) - (floor, U32_MAX]
       get_random_u32_inclusive(floor, ceil) - [floor, ceil]

   Coccinelle was used to convert all current users of
   prandom_u32_max(), as well as many open-coded patterns, resulting in
   improvements throughout the tree.

   I'll have a "late" 6.1-rc1 pull for you that removes the now unused
   prandom_u32_max() function, just in case any other trees add a new
   use case of it that needs to converted. According to linux-next,
   there may be two trivial cases of prandom_u32_max() reintroductions
   that are fixable with a 's/.../.../'. So I'll have for you a final
   conversion patch doing that alongside the removal patch during the
   second week.

   This is a treewide change that touches many files throughout.

 - More consistent use of get_random_canary().

 - Updates to comments, documentation, tests, headers, and
   simplification in configuration.

 - The arch_get_random*_early() abstraction was only used by arm64 and
   wasn't entirely useful, so this has been replaced by code that works
   in all relevant contexts.

 - The kernel will use and manage random seeds in non-volatile EFI
   variables, refreshing a variable with a fresh seed when the RNG is
   initialized. The RNG GUID namespace is then hidden from efivarfs to
   prevent accidental leakage.

   These changes are split into random.c infrastructure code used in the
   EFI subsystem, in this pull request, and related support inside of
   EFISTUB, in Ard's EFI tree. These are co-dependent for full
   functionality, but the order of merging doesn't matter.

 - Part of the infrastructure added for the EFI support is also used for
   an improvement to the way vsprintf initializes its siphash key,
   replacing an sleep loop wart.

 - The hardware RNG framework now always calls its correct random.c
   input function, add_hwgenerator_randomness(), rather than sometimes
   going through helpers better suited for other cases.

 - The add_latent_entropy() function has long been called from the fork
   handler, but is a no-op when the latent entropy gcc plugin isn't
   used, which is fine for the purposes of latent entropy.

   But it was missing out on the cycle counter that was also being mixed
   in beside the latent entropy variable. So now, if the latent entropy
   gcc plugin isn't enabled, add_latent_entropy() will expand to a call
   to add_device_randomness(NULL, 0), which adds a cycle counter,
   without the absent latent entropy variable.

 - The RNG is now reseeded from a delayed worker, rather than on demand
   when used. Always running from a worker allows it to make use of the
   CPU RNG on platforms like S390x, whose instructions are too slow to
   do so from interrupts. It also has the effect of adding in new inputs
   more frequently with more regularity, amounting to a long term
   transcript of random values. Plus, it helps a bit with the upcoming
   vDSO implementation (which isn't yet ready for 6.2).

 - The jitter entropy algorithm now tries to execute on many different
   CPUs, round-robining, in hopes of hitting even more memory latencies
   and other unpredictable effects. It also will mix in a cycle counter
   when the entropy timer fires, in addition to being mixed in from the
   main loop, to account more explicitly for fluctuations in that timer
   firing. And the state it touches is now kept within the same cache
   line, so that it's assured that the different execution contexts will
   cause latencies.

* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
  random: include <linux/once.h> in the right header
  random: align entropy_timer_state to cache line
  random: mix in cycle counter when jitter timer fires
  random: spread out jitter callback to different CPUs
  random: remove extraneous period and add a missing one in comments
  efi: random: refresh non-volatile random seed when RNG is initialized
  vsprintf: initialize siphash key using notifier
  random: add back async readiness notifier
  random: reseed in delayed work rather than on-demand
  random: always mix cycle counter in add_latent_entropy()
  hw_random: use add_hwgenerator_randomness() for early entropy
  random: modernize documentation comment on get_random_bytes()
  random: adjust comment to account for removed function
  random: remove early archrandom abstraction
  random: use random.trust_{bootloader,cpu} command line option only
  stackprotector: actually use get_random_canary()
  stackprotector: move get_random_canary() into stackprotector.h
  treewide: use get_random_u32_inclusive() when possible
  treewide: use get_random_u32_{above,below}() instead of manual loop
  treewide: use get_random_u32_below() instead of deprecated function
  ...
2022-12-12 16:22:22 -08:00
Linus Torvalds
2f60f83084 - Have alternatives patch the same sections in modules as in vmlinux
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXitsACgkQEsHwGGHe
 VUp8IA//W0CiKuDcqZcMLzl5A16ZOSNXS1xuxBjFcDhS2JtCb3NZEeXLPRcEWglO
 HwOZpLq5gc32kSboujT4XqKFnrtcdS94fO+BPwB/xxlM4Y4WWp4JRwbAylzGOOft
 5NmZFB35zLMAKDpCogrigYtvav+usZqeCt2SRxAGrK8MuCXLk53OndQdChfJj0+O
 VzZsd6gdhjCJ20lZzSYiAZWUYE1Ibfd6hch37A/T1bLD8crANWJPV97PCCJivWIH
 PVx3NTWzCjSX507eX3+v1Nf8a+GpCGcJzJwu8+0o5T6lWrf4vyXF/Evz4jgdUe4i
 8ZeeCDTsIgbDT7WhLpM6DS5SvkgrYkCamsSQzFLFdeXwpPlgvogyc6DmoQvPy1sw
 WFPTYy+HOp/5Slz7GIPN2WdjE5RkfFDQ6a+w72R6YZDlZLYEKCWciZmRwOlrtmR6
 K2ujo4ipEK9I7QW9EES2WAAvaM0AsxfjX545T9IDI4W+AXil+m08TFDPwmoag4Ja
 q0MUTDFAfmvdhAa2rxJuSedbpjYflb/uuYHqX/kdR9syHhtJpiw1TypaVYD89hpL
 AceNGn1gHMVTcC19+Ey9uoL9+Y3VfazD1UIFuzK/iFIgsieqK6zxTRKo8mZ0NHYE
 fNnA2sdcteKgaW+d2aKsb0yF1OZ05nfg1YgmUZLbcE1oXmGEJ5A=
 =expz
 -----END PGP SIGNATURE-----

Merge tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 alternative update from Borislav Petkov:
 "A single alternatives patching fix for modules:

   - Have alternatives patch the same sections in modules as in vmlinux"

* tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: Consistently patch SMP locks in vmlinux and modules
2022-12-12 14:54:24 -08:00
Linus Torvalds
9196a0ba9f - Fix confusing output from /sys/kernel/debug/ras/daemon_active
- Add another MCE severity error case to the Intel error severity
 table to promote UC and AR errors to panic severity and remove the
 corresponding code condition doing that.
 
 - Make sure the thresholding and deferred error interrupts on AMD SMCA
 systems clear the all registers reporting an error so that there are no
 multiple errors logged for the same event
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXiPcACgkQEsHwGGHe
 VUomHQ/9Gj6Go0ILvIEpn3VCC8bRc0nf/SFJ+4BnFBc+GdiN6ePhDwn1of3bPk/d
 zNuPFGEa1rV7r97MDsggGetvNVVrA6zumoPmLrHPrZSvhRyCW620J43RH+3T4bzz
 XfCMU6oRJg+F4dFUPnnAnp/6DTwLSe0ofpc2eARlmjdOxTo4MRfxWfwe5XALGGN1
 q+8ycB3Gb8cvhjlB61PL7hhjuy6yH29v63vjUMqsyfDmVhXRxY3xymg+4SxalCBf
 0Zz8/RRJFHSOwzUPsQUm9kVMN8phhJ/fN0B1wsLDWlt0K5Vx5D19l+wbH4aaTTcF
 8bWMMmS43raFuARkcROAEbDrWM2kEo5Qe9eWhZ7HB9wLG9SicJfZIH5Th4ul40V8
 4RARSj1ve4vfxzNmmhFf+RL8kdYWLpxlwVJUhdHkiqKTqN8SIQMb/dsNg/7KjFsV
 N3PSZ0lOEQ5Q2l5fSZoL+auqXgJBD5BUy+Gjk0awZavzZCdI35/LK8xVzrgCgsRk
 AlAcfvngpyZB7A0aeiwalIszcyjk1cnK8+RwIRtvM0CAYhKP+SVTvq4wHtRiO5HD
 TfuVgaHSgOyoOD0NdUGM6PXgrBXqnyIYI2me8Gwtg7gzenizBsv/uh8cvWa3DQnG
 5NItCYEG7Paa7p3VQDvFqxKldIC1Bjwi2pYYIDOgRiwWGbYzSCU=
 =sKP5
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Fix confusing output from /sys/kernel/debug/ras/daemon_active

 - Add another MCE severity error case to the Intel error severity table
   to promote UC and AR errors to panic severity and remove the
   corresponding code condition doing that.

 - Make sure the thresholding and deferred error interrupts on AMD SMCA
   systems clear the all registers reporting an error so that there are
   no multiple errors logged for the same event

* tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  RAS: Fix return value from show_trace()
  x86/mce: Use severity table to handle uncorrected errors in kernel
  x86/MCE/AMD: Clear DFR errors found in THR handler
2022-12-12 14:51:56 -08:00
Linus Torvalds
40deb5e41a * Clarify XSAVE consistency warnings
* Fix up ptrace interface to protection keys register (PKRU)
  * Avoid undefined compiler behavior with TYPE_ALIGN
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYisACgkQaDWVMHDJ
 krAJkA//QRChRwyKi1syinXt2SGoSa3mTzP23SyV0TunOfKBiBUreFJ2mMFjsX0h
 V7SJcu82sCWLHAY6LZRdyiF8zK3Cfzpbgb1QfzBCefE/gU801FhCypqNbQO5Lpdr
 PEo+naaDOzwDWDt0A6OkAArgb0zfaOGL+OBhuwT7mcUtBz6gCakFqG2BMgOzqD1z
 SAp0RraoSsFnKFl5Gv44+gkThq8/8yL5tyrJtnGv1jAsbhw9zmloaOue6MNMPJhH
 3sFQnML3qeNRozquWWeCPu/hxWuFDitPhwdmNRZrnQ3DyRdDhCZPOjv+tQmxI3EO
 5c+UIkMIsRh2nZLwHcM+iO5cWE7lyiAWpgqqArB+r2CFXWK5q2lplhXngBodE9Kr
 ki/NZ6oEitT3+bLXhCwyc7WKxohl2IlmclJ4AD3Qrp4bzPhfsZebL6nNs/3bxWuF
 CxJWIKzjtIcgNSEJaDOzFA5CAImq74r/kCW4e11ZXwmOnx6PX1YG6p0C1yknrZYJ
 bvy8WxureO7OJEcVZfwxpXLYbb+7Q/k/l2DkUdVAvKSCB81uWR4JzEp4oooDxf2j
 6x9qT5Mi95FhAHOCmlxwkQJTBCB36LkVF/3ESEOqJmun4F5ghPbMX2JzpBa6jPCS
 lzkBrzA8MAdmaLHhDO+nd5m8HVY3QBSXDVtRTycmuloeoSeyBno=
 =An0n
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Dave Hansen:
 "There are two little fixes in here, one to give better XSAVE warnings
  and another to address some undefined behavior in offsetof().

  There is also a collection of patches to fix some issues with ptrace
  and the protection keys register (PKRU). PKRU is a real oddity because
  it is exposed in the XSAVE-related ABIs, but it is generally managed
  without using XSAVE in the kernel. This fix thankfully came with a
  selftest to ward off future regressions.

  Summary:

   - Clarify XSAVE consistency warnings

   - Fix up ptrace interface to protection keys register (PKRU)

   - Avoid undefined compiler behavior with TYPE_ALIGN"

* tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
  selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace
  x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set
  x86/fpu: Allow PKRU to be (once again) written by ptrace.
  x86/fpu: Add a pkru argument to copy_uabi_to_xstate()
  x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().
  x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()
  x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
2022-12-12 14:41:57 -08:00
Linus Torvalds
1cab145a94 Add a sysctl to control the split lock misery mode
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYlcACgkQaDWVMHDJ
 krB+IQ//fLNzHnNmTaFq3TlmF9HiJ926uCu2C3MMma0g8NlFXGLTbeI/UaaER12/
 m/N4d+/WPSO1PnsMR36f10Byr8PN2fpbJHGMsHtZ4Y4MTnGycM6JxjDYeFuaSPB5
 Cw2IRsO6c7X/dWEVW7hLbHhlG4MpsiX9APt2/PBGpGJm88wL1RDosMKst6430UQK
 24JZtFbdyaPnUlo48ql85VkGtdgFHXRnebhM0sX95bVWdSLvNWUSpQAyETp+U9rn
 CH75pnoKcJspKun5FmdN2n3gix8Rumz8OZuv9e4XAfBl94H4OZ+SeRN4YbKUvzJP
 PtSCz7PT8VQNsJVCA58TQ+QdmhtKsT4ia0ylDvMhHiozzjUNeeS54qJQSUyPLOqK
 dBl4hl6BmGMMH2fAZGeoxVmVZMIdLaE0PBECjBEuPAG15IqlxQwTdSeyo0k+S0wV
 wYUtCqmxOItW3TA8y044zDjCcIN6wiFymBJtjKbAMxz54ONfnUqgAUluXLeE3xim
 8UqL/uM869Ptu6sDO6sfROd1K8EA3KXrsmOGZV7s9hp+qGQcxsvUDhePT9EosS/G
 JcmYspV211FO2fTAAOiCe5SJRkoPw/lRWufjNNNWWd3mawJhDeYujZ2fQAxEThC+
 Mf8FyFsbxOdbJ1UatgWs/iLOnVwMJf/E1hraq7mdRuZHbNQm7H4=
 =yyO+
 -----END PGP SIGNATURE-----

Merge tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 splitlock updates from Dave Hansen:
 "Add a sysctl to control the split lock misery mode.

  This enables users to reduce the penalty inflicted on split lock
  users. There are some proprietary, binary-only games which became
  entirely unplayable with the old penalty.

  Anyone opting into the new mode is, of course, more exposed to the DoS
  nasitness inherent with split locks, but they can play their games
  again"

* tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Add sysctl to control the misery mode
2022-12-12 14:39:51 -08:00
Linus Torvalds
287f037db5 Minor cleanups:
* Remove unnecessary arch_has_empty_bitmaps structure memory
  * Move rescrtl MSR defines into msr-index.h, like normal MSRs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYhsACgkQaDWVMHDJ
 krA75w//XmOC929XGMOY7WQL6IZlH62xsJbtb3BhmM24Ho7RHSNQGPD+ukArCb0u
 V/w50Q4crQrLsIxqWjXkyDQ7w66PvvsAIhYFBEV4kssRli9y173CzJQt/lQfUXL9
 T7vG5WY1n4f+vtvmZfwcFaGOPkZ5edp8v1y8Grk3r93ci2VDSk+yvEiq80c+JQoX
 ZnEYPxGPUpwAVuaysY8wkGCEc4Yln6gtTKzpVPXE18WAs82OeiCWBfldI/+95j3o
 /5r5asYQpD8bVhtLHi1mepkBAGbeVNWhSJVlOE9HdU9WnzCkNKn1ZXRuXSBlvTeq
 FPjg6vsBXuz8zQV4Dd3Jk3hWv3H/4sTWsgiyUFdHtz/VlE9M8NjGcE4caOgSuBqR
 2ovI/HwdvdYyiZwvNN0fXrnzEn1MliSXDgAscNuxzovJXqdTP2BpUj0SVlZdVs0U
 0xba5sZ5A6fh2SwKX7JQYYsEh4gudiixR+D2l5u7EUOiNyfw0DZgWi/ElpvX4ncy
 QvDIIqlm29A/VkJQAdSHJc0ew+w39M7f3VNfQviLXxudGFuhrg+kXlI1UYGcX/cH
 4LEjmE1KCymmq7v+7+zBrHwsVCxr5mi/CZnx+/4Y/2O+xOKJ1U7GQDXWzu/SC+aF
 tEwqDCldYKjqrfdkmuGXSt2YipkNOC2EBLY32mW7rtTDDIXPSro=
 =n2UE
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource control updates from Dave Hansen:
 "These declare the resource control (rectrl) MSRs a bit more normally
  and clean up an unnecessary structure member:

   - Remove unnecessary arch_has_empty_bitmaps structure memory

   - Move rescrtl MSR defines into msr-index.h, like normal MSRs"

* tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Move MSR defines into msr-index.h
  x86/resctrl: Remove arch_has_empty_bitmaps
2022-12-12 14:30:54 -08:00
Linus Torvalds
2da68a77b9 * Introduce a new SGX feature (Asynchrounous Exit Notification)
for bare-metal enclaves and KVM guests to mitigate single-step
    attacks
  * Increase batching to speed up enclave release
  * Replace kmap/kunmap_atomic() calls
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYkEACgkQaDWVMHDJ
 krB5Og//Vn0oy0pGhda+LtHJgpa9/qPlzvoZCBxi/6SfLneadE5/g/q2KHbiCgVf
 sQ6SEZ0MiVc2SrQcA6CntMO+stJIHG4LqYutygfKDoxXHGzxotzvzTmRV7Qxfhj5
 LrPfl4cLWVO/jGDs0XQpOVFykKgdMcg1OjlnQYfriFiIiBkcClC7F0zYrOWAQWW0
 z+4h3mlWzyAcBdxrZ9qPVqBMbM3qVKQWeE4D9K2Edfgx1lhQBmvtRdYXTplk08tV
 DrfEkG5L189lrwlmbkKT5+pXSTmJqJzBoYyAGOH8n4Wb9aKLdagJErVg0ocXx8uV
 ngPFU5vmaZza7EZcQheu8iRfM+zQCrcVjBImrRLyQPgCeMBX7o75axYvu4/bvPkP
 3+1/JUL6/m738Fqom4wUKdeoJFw/HLGRyQ36yhZAEzH7wPv7/9Q1zpdxcypE6a+Q
 B7UGQNVXV9g5Ivhe44gZIKx/3VL7AthtyCQvhwGQzzm4jX2SwnQKNXy0iKlJr2iI
 LyREdYlJsRR1/wMdjnj2QqtnWPRZ5/rzl7bvWqiXa4xyvcgArrBowjMdZBttaItJ
 cVK5Aj2bvR3Yc/e9GtPoLvwU5IwtoXgUe1B4DsJtoFoUq7gUGZZcEd5uAYRAk7PX
 lyP2LQNxX5i150cxjlSYLLLTNmwvZQ+5PFq+V5+McKbAge8OD8g=
 =bIXL
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 sgx updates from Dave Hansen:
 "The biggest deal in this series is support for a new hardware feature
  that allows enclaves to detect and mitigate single-stepping attacks.

  There's also a minor performance tweak and a little piece of the
  kmap_atomic() -> kmap_local() transition.

  Summary:

   - Introduce a new SGX feature (Asynchrounous Exit Notification) for
     bare-metal enclaves and KVM guests to mitigate single-step attacks

   - Increase batching to speed up enclave release

   - Replace kmap/kunmap_atomic() calls"

* tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Replace kmap/kunmap_atomic() calls
  KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest
  x86/sgx: Allow enclaves to use Asynchrounous Exit Notification
  x86/sgx: Reduce delay and interference of enclave release
2022-12-12 14:18:44 -08:00
Linus Torvalds
631aa74442 Updates for miscellaneous x86 areas:
- Reserve a new boot loader type for barebox which is usally used on ARM
     and MIPS, but can also be utilized as EFI payload on x86 to provide
     watchdog-supervised boot up.
 
   - Consolidate the native and compat 32bit signal handling code and split
     the 64bit version out into a separate source file
 
   - Switch the ESPFIX random usage to get_random_long().
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUvMQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQmmD/9xVeaZbBInehnzbsZi4C4WyOMGUg4l
 AoZC0QSzp2hFZRwpbu4Df1Zh2VN5nItAhQUvNLfdZv9/GL5VkhO+J5fPEHUbtnQ8
 34TujaTHAssyib8uRFTAxxGSz3S2jPRrzUloZ71M+Whx7Fw7Fh8M/t8DmnvnaPtw
 uYbBmZd9mZ0Y7BVMoXh70V0nd21PN8a8qQhYRaUD7lyb1w6Tcfzag4J1DXFfP8Lm
 ovaf2AW3mgt+RmzIRNqP28weLt/VxFC38H/nZ9Jlc9npfnLTyGfwfOxE0CILfEo+
 cYYVbMaIN+vs5kJQaVbvEJvk7oumLC9CvwE6oIL8J0XOs8dbBHkbZPQYW0yVF1/m
 rXEd3LBSNhnZIF0aMUoJrBZAI++nGZo0izSu3eGwLZXSbWBVjlzPAqeBJQtqfQ/E
 j87IisQjkWeOOSNvBas1bURWa7Gy5QFRCxbJQFfAZjIHhg+fIwxrK0HlSqxUXqK5
 PRbc1LsWjUn9TspOC+mRIKrqAfetkohL7BGc+uuslH3uXiMQVAghg37+rSqvAjkn
 50d8XxqOd7aC0NOVn8BfxhMf85Ge7z/0r7JJcaLcRY7/CP6S3vTCAgbSjN4+WzfN
 sRu5W/m8oLuF8Q9DdgqtqiNrYezhoEKJHZsGoi/IGy6eAYjMxPX/Cl4YysdqV32N
 Z55ZeEBwg9KC1g==
 =AHdL
 -----END PGP SIGNATURE-----

Merge tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Thomas Gleixner:
 "Updates for miscellaneous x86 areas:

   - Reserve a new boot loader type for barebox which is usally used on
     ARM and MIPS, but can also be utilized as EFI payload on x86 to
     provide watchdog-supervised boot up.

   - Consolidate the native and compat 32bit signal handling code and
     split the 64bit version out into a separate source file

   - Switch the ESPFIX random usage to get_random_long()"

* tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/espfix: Use get_random_long() rather than archrandom
  x86/signal/64: Move 64-bit signal code to its own file
  x86/signal/32: Merge native and compat 32-bit signal code
  x86/signal: Add ABI prefixes to frame setup functions
  x86/signal: Merge get_sigframe()
  x86: Remove __USER32_DS
  signal/compat: Remove compat_sigset_t override
  x86/signal: Remove sigset_t parameter from frame setup functions
  x86/signal: Remove sig parameter from frame setup functions
  Documentation/x86/boot: Reserve type_of_loader=13 for barebox
2022-12-12 13:01:14 -08:00
Linus Torvalds
79ad89123c A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit. The original
     implementation tried to minimize the allocation size with quite some
     hard to understand and fragile tricks. Make it robust and straight
     forward by separating the register enumerations for 32 and 64 bit
     completely.
 
   - Add a few missing static annotations
 
   - Remove the stale unused setup_once() assembly function
 
   - Address a few minor static analysis and kernel-doc warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUu0ATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUNzEACNn5XbRqxPQZak5XHeJ46/VNVTqTE0
 Z7euwF8oP+aAybyDevvm18D2hB9Atn4vU9QJYhnTxBXbCLUNErKrH8FcXdNOBbeC
 YdAX7nO5WH8IM+drCMySeK6Tv6rvhnDUtgBzdBSl4NdPXUSOnGo+jHqHfN/Q+/n0
 yvbwSoVAjD01sxVZQqKQOrzDgDuR/zlISCVudfS+tR4Rm/CYj0cl+MQS9Z1VM3Z6
 7pqyypd5+CyNAD6vTDY/q+ZK0ShfNnU9TIIoGmOB/pc0kLctwIu3MY76Uo2DUgGn
 n/ItR9mvYu/QelCwX02VG3aRYJPLRfBa+DjQfZUwZapRz3rsjKtfa8ogpPZTLrSO
 o4ht/jxlKKDyNOQKYeL2yy054JR4DkKziilEzw5GZHeH2y66XWudRuWfMwbTdrGc
 esP5fSNfZ9uluYl6GCCw6S83RJzQ8aZXRcAy7CJgw2Qb4XE7IOA2jf18x5AYaDUp
 4a6HCjbxYkEmKCkzkh9+w5koYruyizMBKMBBh5QsMzH4xp20s/vffHwbZ1tls9Za
 eTDC/E+wW9Om3qynRynm0EmcHpa0j+RcmkHOhFcXj6SRLnhzktk4Rrr3vlhardS3
 Pc8h3GnE5mFXqS8t3r6/hvMk+6svhSu3RbICiLNU72F/tVLU628ux/WoCKfXZloE
 7HxWoVhkTF7eOw==
 =DTBQ
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Thomas Gleixner:
 "A set of x86 cleanups:

   - Rework the handling of x86_regset for 32 and 64 bit.

     The original implementation tried to minimize the allocation size
     with quite some hard to understand and fragile tricks. Make it
     robust and straight forward by separating the register enumerations
     for 32 and 64 bit completely.

   - Add a few missing static annotations

   - Remove the stale unused setup_once() assembly function

   - Address a few minor static analysis and kernel-doc warnings"

* tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/32: Remove setup_once()
  x86/kaslr: Fix process_mem_region()'s return value
  x86: Fix misc small issues
  x86/boot: Repair kernel-doc for boot_kstrtoul()
  x86: Improve formatting of user_regset arrays
  x86: Separate out x86_regset for 32 and 64 bit
  x86/i8259: Make default_legacy_pic static
  x86/tsc: Make art_related_clocksource static
2022-12-12 12:44:03 -08:00
Linus Torvalds
369013162f A set of changes for the x86 APIC code:
- Handle the case where x2APIC is enabled and locked by the BIOS on a
     kernel with CONFIG_X86_X2APIC=n gracefully. Instead of a panic which
     does not make it to the graphical console during very early boot,
     simply disable the local APIC completely and boot with the PIC and very
     limited functionality, which allows to diagnose the issue.
 
   - Convert x86 APIC device tree bindings to YAML
 
   - Extend x86 APIC device tree bindings to configure interrupt delivery
     mode and handle this in during init. This allows to boot with device
     tree on platforms which lack a legacy PIC.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuYUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaTED/9D33bnJesbDVZs31HxLJc/jZED0/Do
 dli0wRHWmQx9jpUmTXlKRhhIcUOjPy3Cdz44yoOH14wdJ96qUCBUj8sS9vFO4F7M
 CS/eoO77GKG6oXpMvsNC5TcSaZnXAb4UYz5wCV21ZXL6P0izhOivKSqTR222jT6e
 afEzQhwWhHZmrkX44F1YvMuc+HP6+swfO635vNtZhKtlA7NeKdHRijGZhrXEhNO/
 Pue2xbYVMSLNaRTRtN0Mjm6UvShBLQhbmD/vXrVOCztfzhSfwq0LRC9xXcXmdWCY
 XjflM+osQxIUs2WbpL1lohq5VUzTlWVNsZe4YkH5b0xMEO9HkD7apF03p03SIO4n
 X37joMbrfPz9ZsmSdaN836YZd74IfQ5wnFFQTVL0BC0M4lZNeAnNcxVr3Mfio4yX
 GvYahmyvxHlbWag4SYqVsy15QiNV/xZZZD6uIvBvMCfxoFKw8tBF+9/2Iy+3R+zj
 n7q17Y9bLSXwh1Z/9xgwdTs+7SNCpIlZ/5nz8NpBhHaZF2BziICCv2TEKZUXmli3
 HHkWM7ikj67zgFMiWLLOZpiYz/vgJEFE9nhlmXEH1RNMIfqom/JG8FN8GE1C9kYV
 dmSjOE7x/CdZfJ83BRlTx5j2HfAs7RW4A7IMWPIxNdqEFmhxWnQIHasAfMrHcoIU
 pAQ8u/qoduJA4A==
 =dpZx
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 apic update from Thomas Gleixner:
 "A set of changes for the x86 APIC code:

   - Handle the case where x2APIC is enabled and locked by the BIOS on a
     kernel with CONFIG_X86_X2APIC=n gracefully.

     Instead of a panic which does not make it to the graphical console
     during very early boot, simply disable the local APIC completely
     and boot with the PIC and very limited functionality, which allows
     to diagnose the issue

   - Convert x86 APIC device tree bindings to YAML

   - Extend x86 APIC device tree bindings to configure interrupt
     delivery mode and handle this in during init. This allows to boot
     with device tree on platforms which lack a legacy PIC"

* tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/of: Add support for boot time interrupt delivery mode configuration
  x86/of: Replace printk(KERN_LVL) with pr_lvl()
  dt-bindings: x86: apic: Introduce new optional bool property for lapic
  dt-bindings: x86: apic: Convert Intel's APIC bindings to YAML schema
  x86/of: Remove unused early_init_dt_add_memory_arch()
  x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
2022-12-12 12:30:31 -08:00
Linus Torvalds
9d33edb20f Updates for the interrupt core and driver subsystem:
- Core:
 
    The bulk is the rework of the MSI subsystem to support per device MSI
    interrupt domains. This solves conceptual problems of the current
    PCI/MSI design which are in the way of providing support for PCI/MSI[-X]
    and the upcoming PCI/IMS mechanism on the same device.
 
    IMS (Interrupt Message Store] is a new specification which allows device
    manufactures to provide implementation defined storage for MSI messages
    contrary to the uniform and specification defined storage mechanisms for
    PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations
    of the MSI-X table, but also gives the device manufacturer the freedom to
    store the message in arbitrary places, even in host memory which is shared
    with the device.
 
    There have been several attempts to glue this into the current MSI code,
    but after lengthy discussions it turned out that there is a fundamental
    design problem in the current PCI/MSI-X implementation. This needs some
    historical background.
 
    When PCI/MSI[-X] support was added around 2003, interrupt management was
    completely different from what we have today in the actively developed
    architectures. Interrupt management was completely architecture specific
    and while there were attempts to create common infrastructure the
    commonalities were rudimentary and just providing shared data structures and
    interfaces so that drivers could be written in an architecture agnostic
    way.
 
    The initial PCI/MSI[-X] support obviously plugged into this model which
    resulted in some basic shared infrastructure in the PCI core code for
    setting up MSI descriptors, which are a pure software construct for holding
    data relevant for a particular MSI interrupt, but the actual association to
    Linux interrupts was completely architecture specific. This model is still
    supported today to keep museum architectures and notorious stranglers
    alive.
 
    In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel,
    which was creating yet another architecture specific mechanism and resulted
    in an unholy mess on top of the existing horrors of x86 interrupt handling.
    The x86 interrupt management code was already an incomprehensible maze of
    indirections between the CPU vector management, interrupt remapping and the
    actual IO/APIC and PCI/MSI[-X] implementation.
 
    At roughly the same time ARM struggled with the ever growing SoC specific
    extensions which were glued on top of the architected GIC interrupt
    controller.
 
    This resulted in a fundamental redesign of interrupt management and
    provided the today prevailing concept of hierarchical interrupt
    domains. This allowed to disentangle the interactions between x86 vector
    domain and interrupt remapping and also allowed ARM to handle the zoo of
    SoC specific interrupt components in a sane way.
 
    The concept of hierarchical interrupt domains aims to encapsulate the
    functionality of particular IP blocks which are involved in interrupt
    delivery so that they become extensible and pluggable. The X86
    encapsulation looks like this:
 
                                             |--- device 1
      [Vector]---[Remapping]---[PCI/MSI]--|...
                                             |--- device N
 
    where the remapping domain is an optional component and in case that it is
    not available the PCI/MSI[-X] domains have the vector domain as their
    parent. This reduced the required interaction between the domains pretty
    much to the initialization phase where it is obviously required to
    establish the proper parent relation ship in the components of the
    hierarchy.
 
    While in most cases the model is strictly representing the chain of IP
    blocks and abstracting them so they can be plugged together to form a
    hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware
    it's clear that the actual PCI/MSI[-X] interrupt controller is not a global
    entity, but strict a per PCI device entity.
 
    Here we took a short cut on the hierarchical model and went for the easy
    solution of providing "global" PCI/MSI domains which was possible because
    the PCI/MSI[-X] handling is uniform across the devices. This also allowed
    to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in
    turn made it simple to keep the existing architecture specific management
    alive.
 
    A similar problem was created in the ARM world with support for IP block
    specific message storage. Instead of going all the way to stack a IP block
    specific domain on top of the generic MSI domain this ended in a construct
    which provides a "global" platform MSI domain which allows overriding the
    irq_write_msi_msg() callback per allocation.
 
    In course of the lengthy discussions we identified other abuse of the MSI
    infrastructure in wireless drivers, NTB etc. where support for
    implementation specific message storage was just mindlessly glued into the
    existing infrastructure. Some of this just works by chance on particular
    platforms but will fail in hard to diagnose ways when the driver is used
    on platforms where the underlying MSI interrupt management code does not
    expect the creative abuse.
 
    Another shortcoming of today's PCI/MSI-X support is the inability to
    allocate or free individual vectors after the initial enablement of
    MSI-X. This results in an works by chance implementation of VFIO (PCI
    pass-through) where interrupts on the host side are not set up upfront to
    avoid resource exhaustion. They are expanded at run-time when the guest
    actually tries to use them. The way how this is implemented is that the
    host disables MSI-X and then re-enables it with a larger number of
    vectors again. That works by chance because most device drivers set up
    all interrupts before the device actually will utilize them. But that's
    not universally true because some drivers allocate a large enough number
    of vectors but do not utilize them until it's actually required,
    e.g. for acceleration support. But at that point other interrupts of the
    device might be in active use and the MSI-X disable/enable dance can
    just result in losing interrupts and therefore hard to diagnose subtle
    problems.
 
    Last but not least the "global" PCI/MSI-X domain approach prevents to
    utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS
    is not longer providing a uniform storage and configuration model.
 
    The solution to this is to implement the missing step and switch from
    global PCI/MSI domains to per device PCI/MSI domains. The resulting
    hierarchy then looks like this:
 
                               |--- [PCI/MSI] device 1
      [Vector]---[Remapping]---|...
                               |--- [PCI/MSI] device N
 
    which in turn allows to provide support for multiple domains per device:
 
                               |--- [PCI/MSI] device 1
                               |--- [PCI/IMS] device 1
      [Vector]---[Remapping]---|...
                               |--- [PCI/MSI] device N
                               |--- [PCI/IMS] device N
 
    This work converts the MSI and PCI/MSI core and the x86 interrupt
    domains to the new model, provides new interfaces for post-enable
    allocation/free of MSI-X interrupts and the base framework for PCI/IMS.
    PCI/IMS has been verified with the work in progress IDXD driver.
 
    There is work in progress to convert ARM over which will replace the
    platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
    "solutions" are in the works as well.
 
  - Drivers:
 
    - Updates for the LoongArch interrupt chip drivers
 
    - Support for MTK CIRQv2
 
    - The usual small fixes and updates all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC
 DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf
 br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2
 wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y
 wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7
 fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+
 CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT
 gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR
 BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y
 NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5
 /FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM
 03JfvdxnueM3gw==
 =9erA
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt core and driver subsystem:

  The bulk is the rework of the MSI subsystem to support per device MSI
  interrupt domains. This solves conceptual problems of the current
  PCI/MSI design which are in the way of providing support for
  PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.

  IMS (Interrupt Message Store] is a new specification which allows
  device manufactures to provide implementation defined storage for MSI
  messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
  message store which is uniform accross all devices). The PCI/MSI[-X]
  uniformity allowed us to get away with "global" PCI/MSI domains.

  IMS not only allows to overcome the size limitations of the MSI-X
  table, but also gives the device manufacturer the freedom to store the
  message in arbitrary places, even in host memory which is shared with
  the device.

  There have been several attempts to glue this into the current MSI
  code, but after lengthy discussions it turned out that there is a
  fundamental design problem in the current PCI/MSI-X implementation.
  This needs some historical background.

  When PCI/MSI[-X] support was added around 2003, interrupt management
  was completely different from what we have today in the actively
  developed architectures. Interrupt management was completely
  architecture specific and while there were attempts to create common
  infrastructure the commonalities were rudimentary and just providing
  shared data structures and interfaces so that drivers could be written
  in an architecture agnostic way.

  The initial PCI/MSI[-X] support obviously plugged into this model
  which resulted in some basic shared infrastructure in the PCI core
  code for setting up MSI descriptors, which are a pure software
  construct for holding data relevant for a particular MSI interrupt,
  but the actual association to Linux interrupts was completely
  architecture specific. This model is still supported today to keep
  museum architectures and notorious stragglers alive.

  In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
  kernel, which was creating yet another architecture specific mechanism
  and resulted in an unholy mess on top of the existing horrors of x86
  interrupt handling. The x86 interrupt management code was already an
  incomprehensible maze of indirections between the CPU vector
  management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
  implementation.

  At roughly the same time ARM struggled with the ever growing SoC
  specific extensions which were glued on top of the architected GIC
  interrupt controller.

  This resulted in a fundamental redesign of interrupt management and
  provided the today prevailing concept of hierarchical interrupt
  domains. This allowed to disentangle the interactions between x86
  vector domain and interrupt remapping and also allowed ARM to handle
  the zoo of SoC specific interrupt components in a sane way.

  The concept of hierarchical interrupt domains aims to encapsulate the
  functionality of particular IP blocks which are involved in interrupt
  delivery so that they become extensible and pluggable. The X86
  encapsulation looks like this:

                                            |--- device 1
     [Vector]---[Remapping]---[PCI/MSI]--|...
                                            |--- device N

  where the remapping domain is an optional component and in case that
  it is not available the PCI/MSI[-X] domains have the vector domain as
  their parent. This reduced the required interaction between the
  domains pretty much to the initialization phase where it is obviously
  required to establish the proper parent relation ship in the
  components of the hierarchy.

  While in most cases the model is strictly representing the chain of IP
  blocks and abstracting them so they can be plugged together to form a
  hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
  hardware it's clear that the actual PCI/MSI[-X] interrupt controller
  is not a global entity, but strict a per PCI device entity.

  Here we took a short cut on the hierarchical model and went for the
  easy solution of providing "global" PCI/MSI domains which was possible
  because the PCI/MSI[-X] handling is uniform across the devices. This
  also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
  unchanged which in turn made it simple to keep the existing
  architecture specific management alive.

  A similar problem was created in the ARM world with support for IP
  block specific message storage. Instead of going all the way to stack
  a IP block specific domain on top of the generic MSI domain this ended
  in a construct which provides a "global" platform MSI domain which
  allows overriding the irq_write_msi_msg() callback per allocation.

  In course of the lengthy discussions we identified other abuse of the
  MSI infrastructure in wireless drivers, NTB etc. where support for
  implementation specific message storage was just mindlessly glued into
  the existing infrastructure. Some of this just works by chance on
  particular platforms but will fail in hard to diagnose ways when the
  driver is used on platforms where the underlying MSI interrupt
  management code does not expect the creative abuse.

  Another shortcoming of today's PCI/MSI-X support is the inability to
  allocate or free individual vectors after the initial enablement of
  MSI-X. This results in an works by chance implementation of VFIO (PCI
  pass-through) where interrupts on the host side are not set up upfront
  to avoid resource exhaustion. They are expanded at run-time when the
  guest actually tries to use them. The way how this is implemented is
  that the host disables MSI-X and then re-enables it with a larger
  number of vectors again. That works by chance because most device
  drivers set up all interrupts before the device actually will utilize
  them. But that's not universally true because some drivers allocate a
  large enough number of vectors but do not utilize them until it's
  actually required, e.g. for acceleration support. But at that point
  other interrupts of the device might be in active use and the MSI-X
  disable/enable dance can just result in losing interrupts and
  therefore hard to diagnose subtle problems.

  Last but not least the "global" PCI/MSI-X domain approach prevents to
  utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
  that IMS is not longer providing a uniform storage and configuration
  model.

  The solution to this is to implement the missing step and switch from
  global PCI/MSI domains to per device PCI/MSI domains. The resulting
  hierarchy then looks like this:

                              |--- [PCI/MSI] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N

  which in turn allows to provide support for multiple domains per
  device:

                              |--- [PCI/MSI] device 1
                              |--- [PCI/IMS] device 1
     [Vector]---[Remapping]---|...
                              |--- [PCI/MSI] device N
                              |--- [PCI/IMS] device N

  This work converts the MSI and PCI/MSI core and the x86 interrupt
  domains to the new model, provides new interfaces for post-enable
  allocation/free of MSI-X interrupts and the base framework for
  PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
  driver.

  There is work in progress to convert ARM over which will replace the
  platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
  "solutions" are in the works as well.

  Drivers:

   - Updates for the LoongArch interrupt chip drivers

   - Support for MTK CIRQv2

   - The usual small fixes and updates all over the place"

* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
  irqchip/ti-sci-inta: Fix kernel doc
  irqchip/gic-v2m: Mark a few functions __init
  irqchip/gic-v2m: Include arm-gic-common.h
  irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
  iommu/amd: Enable PCI/IMS
  iommu/vt-d: Enable PCI/IMS
  x86/apic/msi: Enable PCI/IMS
  PCI/MSI: Provide pci_ims_alloc/free_irq()
  PCI/MSI: Provide IMS (Interrupt Message Store) support
  genirq/msi: Provide constants for PCI/IMS support
  x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
  PCI/MSI: Provide prepare_desc() MSI domain op
  PCI/MSI: Split MSI-X descriptor setup
  genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  genirq/msi: Provide msi_domain_alloc_irq_at()
  genirq/msi: Provide msi_domain_ops:: Prepare_desc()
  genirq/msi: Provide msi_desc:: Msi_data
  genirq/msi: Provide struct msi_map
  x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  ...
2022-12-12 11:21:29 -08:00
Linus Torvalds
9c2b840a3b Three small x86 fixes which did not make it into 6.1:
- Remove a superfluous noinline which prevents GCC-7.3 to optimize a stub
     function away.
 
   - Allow uprobes on REP NOP and do not treat them like word-sized branch
     instructions.
 
   - Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on
     CONFIG_X86_SGX to prevent build fails with newer LLVM versions which
     rightfully detect that there is no function behind the symbol.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOW+sQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWH5EACPYcRw9PNBLMC6L0MF5G0qCFmLcjqn
 Fe8LxLywsKdyT6f1aAcOetIqkwDN/fuUyJHcioKqyqSkNlNeRV2hoZ9OlsBGJ7zC
 6HH41ZCrY39liKzMM2JmfxU6XxT74zEt3Fly4G127d78HBi9DYwk8fT6GY8/BOk6
 wkeWuczqRY1NNek1SBIciBn/FMZU8UShqjKzQsS1Bpj2Dm2ZvHdVh+P2okp2wl9Z
 gMbFN0Jq+8jRWOb4BF0Hx2Fg+WjXZPhT8msDXh8Vnr0u7bchWCljbLvvFST2hfpo
 +u/uKeOgOHm0XfUBOQa2WpEpev4M3ve1WFSkmP/0Qe3tcaRabMRDXGezZJSAdf1K
 dZV0tQu+4rygzZwEf4ppskxejG7LSvyzrLdebPvzUYFT14C5E22jRxp1+Mpswq28
 ZPiw6yc3XXUqboNV3JVNs3PDPBVucSCHfQfUNEfjUayaMhb4w5jQyy93WIffOzVU
 0KnXe9XX0MA3e5zVJMXExW4907Iks/K+qNgXtx/8fJnqaECIJInxZfbPmj74ZpfT
 6b0sJVt04eFX4uYKoLPpFoP9LFUvzU5eR7e7yuoiSGFh3D3p9bimyR5xhBxNqs8Y
 j7XL2i0jY95w6v1kK3Kmgr2L+JCAN2v/JFJ+eIOYQAIb/VkhTfNq/MHL33bDJ1X3
 2IrBEgo5tk7VNw==
 =oJ/K
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Three small x86 fixes which did not make it into 6.1:

   - Remove a superfluous noinline which prevents GCC-7.3 to optimize a
     stub function away

   - Allow uprobes on REP NOP and do not treat them like word-sized
     branch instructions

   - Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on
     CONFIG_X86_SGX to prevent build failures with newer LLVM versions
     which rightfully detect that there is no function behind the
     symbol"

* tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Conditionally export __vdso_sgx_enter_enclave()
  uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix
  x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
2022-12-12 11:10:02 -08:00
Linus Torvalds
7d62159919 hyperv-next for v6.2
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmORzR4THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXkqCCACFwHz04iepLE7R8ZZ6BVUhD6uzfzDo
 s1j7ozOUGUe3vI6q0DElHWVQZgzIzLypVsfWkZToe6jeOU6R48b0tZSFyJCUNwGM
 ogmS7N8fBdHfY9SBFoUPoziBifXpf3kq4hhX/w+1Lge9CN5Ywc4KjuJb91EAInbs
 lm47O4KQY8w8A7BbPBHYBueUVWLvgwPRPOS032zqxN1787m2tCxpqkfnImK39kh6
 IsBBIZfYsok0H5wldhZXnsARpEOeFF6BoFBXpFPlmnbv2VcK2AfZgTYdA3ESyEgd
 NyOFDfh6BO07gTR1xCH6gvOpkHwx6xKAkjE36RymdhXS6fhRCRsfahVB
 =m78g
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav
   Kohli)

 - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli)

 - Allow IRQ remapping to work without x2apic (Nuno Das Neves)

 - Fix comments (Olaf Hering)

 - Expand hv_vp_assist_page definition (Saurabh Sengar)

 - Improvement to page reporting (Shradha Gupta)

 - Make sure TSC clocksource works when Linux runs as the root partition
   (Stanislav Kinsburskiy)

* tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Remove unregister syscore call from Hyper-V cleanup
  iommu/hyper-v: Allow hyperv irq remapping without x2apic
  clocksource: hyper-v: Add TSC page support for root partition
  clocksource: hyper-v: Use TSC PFN getter to map vvar page
  clocksource: hyper-v: Introduce TSC PFN getter
  clocksource: hyper-v: Introduce a pointer to TSC page
  x86/hyperv: Expand definition of struct hv_vp_assist_page
  PCI: hv: update comment in x86 specific hv_arch_irq_unmask
  hv: fix comment typo in vmbus_channel/low_latency
  drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
  video: hyperv_fb: Avoid taking busy spinlock on panic path
  hv_balloon: Add support for configurable order free page reporting
  mm/page_reporting: Add checks for page_reporting_order param
2022-12-12 09:34:16 -08:00
Bjorn Helgaas
00904bf64c x86/PCI: Tidy E820 removal messages
These messages:

  clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff]

aren't as useful as they could be because (a) the resource is often
IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we
print the available resource even if it is empty after removing the E820
entry.

Print the available space by hand to avoid the IORESOURCE_UNSET problem and
only if it's non-empty.  No functional change intended.

Link: https://lore.kernel.org/r/20221208190341.1560157-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-10 10:33:11 -06:00
Steven Rostedt (Google)
fd3dc56253 ftrace/x86: Add back ftrace_expected for ftrace bug reports
After someone reported a bug report with a failed modification due to the
expected value not matching what was found, it came to my attention that
the ftrace_expected is no longer set when that happens. This makes for
debugging the issue a bit more difficult.

Set ftrace_expected to the expected code before calling ftrace_bug, so
that it shows what was expected and why it failed.

Link: https://lore.kernel.org/all/CA+wXwBQ-VhK+hpBtYtyZP-NiX4g8fqRRWithFOHQW-0coQ3vLg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221209105247.01d4e51d@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a ("x86/ftrace: Use text_poke()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 11:54:28 -05:00
Ard Biesheuvel
d9f26ae731 Linux 6.1-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb
 7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T
 U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo
 u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ
 wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj
 dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/
 38/5HWc=
 =EuoA
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc8' into efi/next

Linux 6.1-rc8
2022-12-07 19:08:57 +01:00
Thomas Gleixner
6e24c88773 x86/apic/msi: Enable PCI/IMS
Enable IMS in the domain init and allocation mapping code, but do not
enable it on the vector domain as discussed in various threads on LKML.

The interrupt remap domains can expand this setting like they do with
PCI multi MSI.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.022658817@linutronix.de
2022-12-05 22:22:35 +01:00
Thomas Gleixner
4d5a4ccc51 x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
and related code which is not longer required now that the interrupt remap
code has been converted to MSI parent domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de
2022-12-05 22:22:33 +01:00
Thomas Gleixner
cc7594ffad iommu/amd: Switch to MSI base domains
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de
2022-12-05 22:22:33 +01:00
Thomas Gleixner
9a945234ab iommu/vt-d: Switch to MSI parent domains
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-05 22:22:33 +01:00
Thomas Gleixner
b6d5fc3a52 x86/apic/vector: Provide MSI parent domain
Enable MSI parent domain support in the x86 vector domain and fixup the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.

The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.

None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are
affected either. They still work the same way both at the low level and the
PCI/MSI implementations they provide.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-05 22:22:33 +01:00
Ashok Raj
be1b670f61 x86/microcode/intel: Do not retry microcode reloading on the APs
The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.

  [ bp: Zap the circumstantial reasoning from the commit message. ]

Fixes: 06b8534cb7 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221129210832.107850-3-ashok.raj@intel.com
2022-12-05 21:22:21 +01:00
Thomas Gleixner
3dad5f9ad9 genirq/msi: Move IRQ_DOMAIN_MSI_NOMASK_QUIRK to MSI flags
It's truly a MSI only flag and for the upcoming per device MSI domains this
must be in the MSI flags so it can be set during domain setup without
exposing this quirk outside of x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.454246167@linutronix.de
2022-12-05 19:20:58 +01:00
Oleg Nesterov
cefa72129e uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix
Intel ICC -hotpatch inserts 2-byte "0x66 0x90" NOP at the start of each
function to reserve extra space for hot-patching, and currently it is not
possible to probe these functions because branch_setup_xol_ops() wrongly
rejects NOP with REP prefix as it treats them like word-sized branch
instructions.

Fixes: 250bbd12c2 ("uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns")
Reported-by: Seiji Nishikawa <snishika@redhat.com>
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20221204173933.GA31544@redhat.com
2022-12-05 11:55:18 +01:00
Juergen Gross
7882b69eb6 x86/mtrr: Make message for disabled MTRRs more descriptive
Instead of just saying "Disabled" when MTRRs are disabled for any
reason, tell what is disabled and why.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221205080433.16643-3-jgross@suse.com
2022-12-05 11:08:25 +01:00
Ashok Raj
5b1586ab06 x86/microcode/intel: Do not print microcode revision and processor flags
collect_cpu_info() is used to collect the current microcode revision and
processor flags on every CPU.

It had a weird mechanism to try to mimick a "once" functionality in the
sense that, that information should be issued only when it is differing
from the previous CPU.

However (1):

the new calling sequence started doing that in parallel:

  microcode_init()
  |-> schedule_on_each_cpu(setup_online_cpu)
      |-> collect_cpu_info()

resulting in multiple redundant prints:

  microcode: sig=0x50654, pf=0x80, revision=0x2006e05
  microcode: sig=0x50654, pf=0x80, revision=0x2006e05
  microcode: sig=0x50654, pf=0x80, revision=0x2006e05

However (2):

dumping this here is not that important because the kernel does not
support mixed silicon steppings microcode. Finally!

Besides, there is already a pr_info() in microcode_reload_late() that
shows both the old and new revisions.

What is more, the CPU signature (sig=0x50654) and Processor Flags
(pf=0x80) above aren't that useful to the end user, they are available
via /proc/cpuinfo and they don't change anyway.

Remove the redundant pr_info().

  [ bp: Heavily massage. ]

Fixes: b6f86689d5 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221103175901.164783-2-ashok.raj@intel.com
2022-12-03 14:41:06 +01:00
Pawan Gupta
6606515742 x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3
The "force" argument to write_spec_ctrl_current() is currently ambiguous
as it does not guarantee the MSR write. This is due to the optimization
that writes to the MSR happen only when the new value differs from the
cached value.

This is fine in most cases, but breaks for S3 resume when the cached MSR
value gets out of sync with the hardware MSR value due to S3 resetting
it.

When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
is skipped. Which results in SPEC_CTRL mitigations not getting restored.

Move the MSR write from write_spec_ctrl_current() to a new function that
unconditionally writes to the MSR. Update the callers accordingly and
rename functions.

  [ bp: Rework a bit. ]

Fixes: caa0ff24d5 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-02 15:45:33 -08:00
Kristen Carlson Accardi
89e927bbcd x86/sgx: Replace kmap/kunmap_atomic() calls
kmap_local_page() is the preferred way to create temporary mappings when it
is feasible, because the mappings are thread-local and CPU-local.

kmap_local_page() uses per-task maps rather than per-CPU maps. This in
effect removes the need to disable preemption on the local CPU while the
mapping is active, and thus vastly reduces overall system latency. It is
also valid to take pagefaults within the mapped region.

The use of kmap_atomic() in the SGX code was not an explicit design choice
to disable page faults or preemption, and there is no compelling design
reason to using kmap_atomic() vs. kmap_local_page().

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/linux-sgx/Y0biN3%2FJsZMa0yUr@kernel.org/
Link: https://lore.kernel.org/r/20221115161627.4169428-1-kristen@linux.intel.com
2022-12-02 14:59:56 +01:00
Rahul Tanwar
2833275568 x86/of: Add support for boot time interrupt delivery mode configuration
Presently, init/boot time interrupt delivery mode is enumerated only for
ACPI enabled systems by parsing MADT table or for older systems by parsing
MP table. But for OF based x86 systems, it is assumed & hardcoded to be
legacy PIC mode. This causes a boot time crash for platforms which do not
provide a 8259 compliant legacy PIC.

Add support for configuration of init time interrupt delivery mode for x86
OF based systems by introducing a new optional boolean property
'intel,virtual-wire-mode' for the local APIC interrupt-controller
node. This property emulates IMCRP Bit 7 of MP feature info byte 2 of MP
floating pointer structure.

Defaults to legacy PIC mode if absent. Configures it to virtual wire
compatibility mode if present.

Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20221124084143.21841-5-rtanwar@maxlinear.com
2022-12-02 14:57:14 +01:00
Rahul Tanwar
535403323b x86/of: Replace printk(KERN_LVL) with pr_lvl()
Use pr_lvl() instead of the deprecated printk(KERN_LVL).

Just a upgrade of print utilities usage. no functional changes.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20221124084143.21841-4-rtanwar@maxlinear.com
2022-12-02 14:57:14 +01:00
Andy Shevchenko
9b09927c0c x86/of: Remove unused early_init_dt_add_memory_arch()
Recently objtool started complaining about dead code in the object files,
in particular

vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x191: unreachable instruction

when CONFIG_OF=y.

Indeed, early_init_dt_scan() is not used on x86 and making it compile (with
help of CONFIG_OF) will abrupt the code flow since in the middle of it
there is a BUG() instruction.

Remove the pointless function.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221124184824.9548-1-andriy.shevchenko@linux.intel.com
2022-12-02 14:57:13 +01:00
Mateusz Jończyk
e3998434da x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.

The kernel was supposed to panic with an approprite error message in
validate_x2apic() due to the missing X2APIC support.

However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.

In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.

Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.

Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.

The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.

Make check_x2apic() unconditionally available and remove the empty stub.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Robert Elliott (Servers) <elliott@hpe.com>
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de
Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl
Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl
2022-12-02 14:28:52 +01:00
Brian Gerst
ff4c85c053 x86/asm/32: Remove setup_once()
After the removal of the stack canary segment setup code, this function
does nothing.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221115184328.70874-1-brgerst@gmail.com
2022-12-02 14:06:34 +01:00
Miaohe Lin
023e59d4ce x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the
argument setup of:

	apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end);

even when X86_KERNEL_IBT=n and the function is an empty stub, which leads
to link errors due to missing __ibt_endbr_seal* symbols:

ld: arch/x86/kernel/alternative.o: in function `alternative_instructions':
alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end'
ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal'

Remove the explicit 'noinline' to help gcc optimize them away.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221011113803.956808-1-linmiaohe@huawei.com
2022-12-02 12:54:43 +01:00
Andrew Morton
a38358c934 Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
Nuno Das Neves
fea858dc5d iommu/hyper-v: Allow hyperv irq remapping without x2apic
If x2apic is not available, hyperv-iommu skips remapping
irqs. This breaks root partition which always needs irqs
remapped.

Fix this by allowing irq remapping regardless of x2apic,
and change hyperv_enable_irq_remapping() to return
IRQ_REMAP_XAPIC_MODE in case x2apic is missing.

Tested with root and non-root hyperv partitions.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28 16:48:20 +00:00