pve-kernel/patches/kernel
Fiona Ebner 6c9726f077 cherry-pick potential fix for NULL pointer deref with AMD Arcturus GPU during boot
The issue was reported via enterprise support and is being handled by
Alexander Zeidler. It produces the trace below [0] and causes
networking problems later in the boot process, because 'udevadm settle'
times out. The customer reported that mainline kernel 6.9.3 boots fine.
Looking at the new commits, this one stood out, as it heavily modifies
the arcturus_get_power_limit() function. While not tagged for stable,
it seems straightforward enough and has a good chance of fixing the
issue.

[0]:

> Jul 09 07:34:59 proxmox kernel: BUG: kernel NULL pointer dereference, address: 000000000000000f
> Jul 09 07:34:59 proxmox kernel: #PF: supervisor read access in kernel mode
> Jul 09 07:34:59 proxmox kernel: #PF: error_code(0x0000) - not-present page
> Jul 09 07:34:59 proxmox kernel: PGD 0 P4D 0
> Jul 09 07:34:59 proxmox kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> Jul 09 07:34:59 proxmox kernel: CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: P           O       6.8.8-2-pve #1
> Jul 09 07:34:59 proxmox kernel: Hardware name: Supermicro AS -4124GS-TNR-03-EB004/H12DSG-O-CPU, BIOS 2.7 09/21/2023
> Jul 09 07:34:59 proxmox kernel: Workqueue: events work_for_cpu_fn
> Jul 09 07:34:59 proxmox kernel: RIP: 0010:arcturus_get_power_limit+0xb5/0x1b0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel: Code: 24 48 85 d2 74 05 8b 45 cc 89 02 4d 85 ff 74 38 44 0f b6 a3 b8 06 00 00 41 80 fc 01 0f 87 81 d7 3d 00 48 8b 45 b0 41 83 e4 01 <0f> b6 40 0f 75 10 84 c0 74 14 45 8b bf 86 01 00 00 45 31 e4 eb 0e
> Jul 09 07:34:59 proxmox kernel: RSP: 0018:ffffaa42c029fc38 EFLAGS: 00010246
> Jul 09 07:34:59 proxmox kernel: RAX: 0000000000000000 RBX: ffff8d803362b000 RCX: 0000000000000000
> Jul 09 07:34:59 proxmox kernel: RDX: ffff8d803362b6c0 RSI: 0000000000000000 RDI: 0000000000000000
> Jul 09 07:34:59 proxmox kernel: RBP: ffffaa42c029fc88 R08: 0000000000000000 R09: ffffffffc177e1f0
> Jul 09 07:34:59 proxmox kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> Jul 09 07:34:59 proxmox kernel: R13: ffff8d803362b6c8 R14: ffff8d803362b6c4 R15: ffff8d80424a1014
> Jul 09 07:34:59 proxmox kernel: FS:  0000000000000000(0000) GS:ffff8e7f0ae00000(0000) knlGS:0000000000000000
> Jul 09 07:34:59 proxmox kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 09 07:34:59 proxmox kernel: CR2: 000000000000000f CR3: 0000006b24a36003 CR4: 0000000000f70ef0
> Jul 09 07:34:59 proxmox kernel: PKRU: 55555554
> Jul 09 07:34:59 proxmox kernel: Call Trace:
> Jul 09 07:34:59 proxmox kernel:  <TASK>
> Jul 09 07:34:59 proxmox kernel:  ? show_regs+0x6d/0x80
> Jul 09 07:34:59 proxmox kernel:  ? __die+0x24/0x80
> Jul 09 07:34:59 proxmox kernel:  ? page_fault_oops+0x176/0x500
> Jul 09 07:34:59 proxmox kernel:  ? do_user_addr_fault+0x2f9/0x6b0
> Jul 09 07:34:59 proxmox kernel:  ? exc_page_fault+0x83/0x1b0
> Jul 09 07:34:59 proxmox kernel:  ? asm_exc_page_fault+0x27/0x30
> Jul 09 07:34:59 proxmox kernel:  ? __pfx_arcturus_get_power_limit+0x10/0x10 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  ? arcturus_get_power_limit+0xb5/0x1b0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  ? arcturus_get_power_limit+0x62/0x1b0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  smu_late_init+0x16f/0x4d0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  amdgpu_device_ip_late_init+0x68/0x2a0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  amdgpu_device_init+0x242d/0x26e0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
> Jul 09 07:34:59 proxmox kernel:  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  amdgpu_pci_probe+0x195/0x520 [amdgpu]
> Jul 09 07:34:59 proxmox kernel:  local_pci_probe+0x47/0xb0
> Jul 09 07:34:59 proxmox kernel:  work_for_cpu_fn+0x1a/0x30
> Jul 09 07:34:59 proxmox kernel:  process_one_work+0x16d/0x350
> Jul 09 07:34:59 proxmox kernel:  worker_thread+0x306/0x440
> Jul 09 07:34:59 proxmox kernel:  ? __pfx_worker_thread+0x10/0x10
> Jul 09 07:34:59 proxmox kernel:  kthread+0xf2/0x120
> Jul 09 07:34:59 proxmox kernel:  ? __pfx_kthread+0x10/0x10
> Jul 09 07:34:59 proxmox kernel:  ret_from_fork+0x47/0x70
> Jul 09 07:34:59 proxmox kernel:  ? __pfx_kthread+0x10/0x10
> Jul 09 07:34:59 proxmox kernel:  ret_from_fork_asm+0x1b/0x30
> Jul 09 07:34:59 proxmox kernel:  </TASK>
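
As a reading aid for the trace above, the following is a minimal sketch (not part of the cherry-picked patch) of how the faulting address can be cross-checked against the register dump. It decodes only the four bytes starting at the marked `<0f>` in the "Code:" line, assuming they form a `movzx r32, byte [base + disp8]` instruction (opcode 0F B6 with a ModRM byte and an 8-bit displacement). The helper name `decode_movzx_disp8` and the `REGS` table are hypothetical and exist only for illustration.

```python
# Bytes from the "Code:" line, starting at the byte the kernel marked with <..>,
# i.e. the instruction RIP pointed at when the fault happened.
insn = bytes.fromhex("0fb6400f")

# 64-bit register names indexed by the 3-bit ModRM fields (no REX prefix present).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_movzx_disp8(code: bytes):
    """Decode 'movzx r32, byte [base + disp8]' (0F B6 /r with ModRM mod=01).

    This sketch deliberately handles only this one encoding.
    """
    assert code[0:2] == b"\x0f\xb6", "not a MOVZX r32, r/m8 opcode"
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm != 0b100, "only base+disp8 without SIB is handled"
    disp = int.from_bytes(code[3:4], "little", signed=True)
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_movzx_disp8(insn)

# Register value copied from the trace: RAX: 0000000000000000
registers = {"rax": 0x0}
fault_addr = registers[base] + disp

# CR2 from the trace holds the address that triggered the page fault.
cr2 = 0x000000000000000F

print(f"movzx e{dest[1:]}, byte [{base}+{disp:#x}] -> address {fault_addr:#x}")
print("matches CR2:", fault_addr == cr2)
```

Under these assumptions, the instruction reads one byte at offset 0xf from the pointer held in RAX. With RAX being zero, the access lands at address 0xf, which agrees with both the reported fault address and CR2.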

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-12 16:51:42 +02:00
0001-Make-mkcompile_h-accept-an-alternate-timestamp-strin.patch update kernel and patches for Ubuntu-6.5.0-20.20 2024-02-14 11:08:30 +01:00
0002-wireless-Add-Debian-wireless-regdb-certificates.patch wireless: Add Debian wireless-regdb certificates 2023-02-10 12:48:20 +01:00
0003-bridge-keep-MAC-of-first-assigned-port.patch wireless: Add Debian wireless-regdb certificates 2023-02-10 12:48:20 +01:00
0004-pci-Enable-overrides-for-missing-ACS-capabilities-4..patch rebase patches on top of Ubuntu-6.8.0-38.38 2024-06-10 11:10:14 +02:00
0005-kvm-disable-default-dynamic-halt-polling-growth.patch update submodule and patches for 24.04 Noble based kernel 2024-04-02 18:14:21 +02:00
0006-net-core-downgrade-unregister_netdevice-refcount-lea.patch rebase patches on top of Ubuntu-6.8.0-38.38 2024-06-10 11:10:14 +02:00
0007-Revert-fortify-Do-not-cast-to-unsigned-char.patch update submodule and patches for 24.04 Noble based kernel 2024-04-02 18:14:21 +02:00
0008-kvm-xsave-set-mask-out-PKRU-bit-in-xfeatures-if-vCPU.patch rebase patches on top of Ubuntu-6.8.0-38.38 2024-06-10 11:10:14 +02:00
0009-allow-opt-in-to-allow-pass-through-on-broken-hardwar.patch rebase patches on top of Ubuntu-6.8.0-38.38 2024-06-10 11:10:14 +02:00
0010-KVM-nSVM-Advertise-support-for-flush-by-ASID.patch update submodule and patches for 24.04 Noble based kernel 2024-04-02 18:14:21 +02:00
0011-revert-memfd-improve-userspace-warnings-for-missing-.patch update submodule and patches for 24.04 Noble based kernel 2024-04-02 18:14:21 +02:00
0012-apparmor-expect-msg_namelen-0-for-recvmsg-calls.patch update sources and patches to Ubuntu-6.8.0-32.32 2024-05-02 13:51:01 +02:00
0013-x86-CPU-AMD-Improve-the-erratum-1386-workaround.patch update sources and patches to Ubuntu-6.8.0-32.32 2024-05-02 13:51:01 +02:00
0014-block-fix-request.queuelist-usage-in-flush.patch update fix for managing block flush queue list 2024-06-10 13:34:41 +02:00
0015-scsi-core-Handle-devices-which-return-an-unusually-l.patch fix #5448: support SCSI controllers with bad VPD page length encoding again 2024-06-20 10:55:23 +02:00
0016-e1000e-change-usleep_range-to-udelay-in-PHY-mdic-acc.patch fix #5554: improve e1000e stability on cable reconnection 2024-06-24 10:22:20 +02:00
0017-virtio-pci-Check-if-is_avq-is-NULL.patch cherry-pick "virtio-pci: Check if is_avq is NULL" 2024-06-24 10:59:15 +02:00
0018-cifs-fix-pagecache-leak-when-do-writepages.patch add fix for CIFS client memory leak 2024-07-12 16:51:42 +02:00
0019-drm-amdgpu-pm-Don-t-use-OD-table-on-Arcturus.patch cherry-pick potential fix for NULL pointer deref with AMD Arcturus GPU during boot 2024-07-12 16:51:42 +02:00