Dave Airlie
be3cd668ff
drm-misc-next for 6.17:
...
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- mode_config: Change fb_create prototype to pass the drm_format_info
and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
improvements
- tests: kunit EDID update
Driver Changes:
- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support
- bridge:
- Make connector available to bridge detect hook
- panel:
- More refcounting changes
- New panels: BOE NE14QDM
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaHisvgAKCRAnX84Zoj2+
dihcAX49H551lQ42amN11jN4pR2tpKLjRMCSsNnXzkokJYx8adEaGTcZq+0Oi9rZ
muGQKJABf2SSb/bem7qEu0JVL3/Pz8DOLbKIE4ltPMlHfYF5NzJhy3AKIMIjMLH7
XqSDXOMgjA==
=CfQG
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2025-07-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- mode_config: Change fb_create prototype to pass the drm_format_info
and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
improvements
- tests: kunit EDID update
Driver Changes:
- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support
- bridge:
- Make connector available to bridge detect hook
- panel:
- More refcounting changes
- New panels: BOE NE14QDM
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
2025-07-21 09:16:51 +10:00
Christophe JAILLET
28c5c48638
drm/amdgpu: Fix missing unlocking in an error path in amdgpu_userq_create()
...
If kasprintf() fails, some mutex still need to be released to avoid locking
issue, as already done in all other error handling path.
Fixes: c03ea34cbf
("drm/amdgpu: add support of debugfs for mqd information")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/all/366557fa7ca8173fd78c58336986ca56953369b9.1752087753.git.christophe.jaillet@wanadoo.fr/
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16 15:46:04 -04:00
Simona Vetter
7e11e01d1f
amd-drm-next-6.17-2025-07-11:
...
amdgpu:
- Clean up function signatures
- GC 10 KGQ reset fix
- SDMA reset cleanups
- Misc fixes
- LVDS fixes
- UserQ fix
amdkfd:
- Reset fix
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaHF5fgAKCRC93/aFa7yZ
2FbQAPsH59r6i1K6TfyLYuCIOwFXyqx/AEy2/HmlBx3r2ElC2AEA8R0rrKTVpzlP
ZN6WaCDLEBOlk21S3U3QCHuSlf/G1Ac=
=H/st
-----END PGP SIGNATURE-----
Merge tag 'amd-drm-next-6.17-2025-07-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.17-2025-07-11:
amdgpu:
- Clean up function signatures
- GC 10 KGQ reset fix
- SDMA reset cleanups
- Misc fixes
- LVDS fixes
- UserQ fix
amdkfd:
- Reset fix
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
2025-07-11 23:55:40 +02:00
Sunil Khatri
8f9abaff41
drm/amdgpu: fix MQD debugfs undefined symbol when DEBUG_FS=n
...
Fix undefined reference to amdgpu_mqd_info_fops during
debugfs_create_file if DEBUG_FS=n
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Link: https://lore.kernel.org/r/20250708101551.68033-1-sunil.khatri@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-09 10:14:19 +02:00
Vitaly Prosyak
a886d26f2c
drm/amdgpu: fix use-after-free in amdgpu_userq_suspend+0x51a/0x5a0
...
[ +0.000020] BUG: KASAN: slab-use-after-free in amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000817] Read of size 8 at addr ffff88812eec8c58 by task amd_pci_unplug/1733
[ +0.000027] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: G W 6.14.0+ #2
[ +0.000009] Tainted: [W]=WARN
[ +0.000003] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000003] dump_stack_lvl+0x76/0xa0
[ +0.000011] print_report+0xce/0x600
[ +0.000009] ? srso_return_thunk+0x5/0x5f
[ +0.000006] ? kasan_complete_mode_report_info+0x76/0x200
[ +0.000007] ? kasan_addr_to_slab+0xd/0xb0
[ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000707] kasan_report+0xbe/0x110
[ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000541] __asan_report_load8_noabort+0x14/0x30
[ +0.000005] amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000535] ? stop_cpsch+0x396/0x600 [amdgpu]
[ +0.000556] ? stop_cpsch+0x429/0x600 [amdgpu]
[ +0.000536] ? __pfx_amdgpu_userq_suspend+0x10/0x10 [amdgpu]
[ +0.000536] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? kgd2kfd_suspend+0x132/0x1d0 [amdgpu]
[ +0.000542] amdgpu_device_fini_hw+0x581/0xe90 [amdgpu]
[ +0.000485] ? down_write+0xbb/0x140
[ +0.000007] ? __mutex_unlock_slowpath.constprop.0+0x317/0x360
[ +0.000005] ? __pfx_amdgpu_device_fini_hw+0x10/0x10 [amdgpu]
[ +0.000482] ? __kasan_check_write+0x14/0x30
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? up_write+0x55/0xb0
[ +0.000007] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? blocking_notifier_chain_unregister+0x6c/0xc0
[ +0.000008] amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
[ +0.000484] amdgpu_pci_remove+0x93/0x130 [amdgpu]
[ +0.000482] pci_device_remove+0xae/0x1e0
[ +0.000008] device_remove+0xc7/0x180
[ +0.000008] device_release_driver_internal+0x3d4/0x5a0
[ +0.000007] device_release_driver+0x12/0x20
[ +0.000004] pci_stop_bus_device+0x104/0x150
[ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
[ +0.000005] remove_store+0xd7/0xf0
[ +0.000005] ? __pfx_remove_store+0x10/0x10
[ +0.000006] ? __pfx__copy_from_iter+0x10/0x10
[ +0.000006] ? __pfx_dev_attr_store+0x10/0x10
[ +0.000006] dev_attr_store+0x3f/0x80
[ +0.000006] sysfs_kf_write+0x125/0x1d0
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __kasan_check_write+0x14/0x30
[ +0.000005] kernfs_fop_write_iter+0x2ea/0x490
[ +0.000005] ? rw_verify_area+0x70/0x420
[ +0.000005] ? __pfx_kernfs_fop_write_iter+0x10/0x10
[ +0.000006] vfs_write+0x90d/0xe70
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __pfx_vfs_write+0x10/0x10
[ +0.000004] ? local_clock+0x15/0x30
[ +0.000008] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_slab_free+0x5f/0x80
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? fdget_pos+0x1d3/0x500
[ +0.000007] ksys_write+0x119/0x220
[ +0.000005] ? putname+0x1c/0x30
[ +0.000006] ? __pfx_ksys_write+0x10/0x10
[ +0.000007] __x64_sys_write+0x72/0xc0
[ +0.000006] x64_sys_call+0x18ab/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __pfx___x64_sys_openat+0x10/0x10
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? do_syscall_64+0x88/0x170
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? irqentry_exit+0x43/0x50
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? exc_page_fault+0x7c/0x110
[ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000006] RIP: 0033:0x7480c0b14887
[ +0.000005] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ +0.000005] RSP: 002b:00007fff142b0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007480c0b14887
[ +0.000003] RDX: 0000000000000001 RSI: 00007480c0e7365a RDI: 0000000000000004
[ +0.000003] RBP: 00007fff142b0080 R08: 0000563b2e73c170 R09: 0000000000000000
[ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff142b02f8
[ +0.000003] R13: 0000563b159a72a9 R14: 0000563b159a9d48 R15: 00007480c0f19040
[ +0.000008] </TASK>
[ +0.000445] Allocated by task 427 on cpu 5 at 29.342331s:
[ +0.000011] kasan_save_stack+0x28/0x60
[ +0.000006] kasan_save_track+0x18/0x70
[ +0.000006] kasan_save_alloc_info+0x38/0x60
[ +0.000005] __kasan_kmalloc+0xc1/0xd0
[ +0.000006] __kmalloc_cache_noprof+0x1bd/0x430
[ +0.000007] amdgpu_driver_open_kms+0x172/0x760 [amdgpu]
[ +0.000493] drm_file_alloc+0x569/0x9a0
[ +0.000007] drm_client_init+0x1b7/0x410
[ +0.000007] drm_fbdev_client_setup+0x174/0x470
[ +0.000006] drm_client_setup+0x8a/0xf0
[ +0.000006] amdgpu_pci_probe+0x510/0x10c0 [amdgpu]
[ +0.000483] local_pci_probe+0xe7/0x1b0
[ +0.000006] pci_device_probe+0x5bf/0x890
[ +0.000006] really_probe+0x1fd/0x950
[ +0.000005] __driver_probe_device+0x307/0x410
[ +0.000006] driver_probe_device+0x4e/0x150
[ +0.000005] __driver_attach+0x223/0x510
[ +0.000006] bus_for_each_dev+0x102/0x1a0
[ +0.000005] driver_attach+0x3d/0x60
[ +0.000006] bus_add_driver+0x309/0x650
[ +0.000005] driver_register+0x13d/0x490
[ +0.000006] __pci_register_driver+0x1ee/0x2b0
[ +0.000006] rfcomm_dlc_clear_state+0x69/0x220 [rfcomm]
[ +0.000011] do_one_initcall+0x9c/0x3e0
[ +0.000007] do_init_module+0x29e/0x7f0
[ +0.000006] load_module+0x5c75/0x7c80
[ +0.000006] init_module_from_file+0x106/0x180
[ +0.000006] idempotent_init_module+0x377/0x740
[ +0.000006] __x64_sys_finit_module+0xd7/0x180
[ +0.000006] x64_sys_call+0x1f0b/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000013] Freed by task 1733 on cpu 5 at 59.907086s:
[ +0.000011] kasan_save_stack+0x28/0x60
[ +0.000006] kasan_save_track+0x18/0x70
[ +0.000005] kasan_save_free_info+0x3b/0x60
[ +0.000005] __kasan_slab_free+0x54/0x80
[ +0.000006] kfree+0x127/0x470
[ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu]
[ +0.000493] drm_file_free.part.0+0x5b1/0xba0
[ +0.000006] drm_file_free+0x13/0x30
[ +0.000006] drm_client_release+0x1c4/0x2b0
[ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper]
[ +0.000007] put_fb_info+0x97/0xe0
[ +0.000007] unregister_framebuffer+0x197/0x380
[ +0.000005] drm_fb_helper_unregister_info+0x94/0x100
[ +0.000005] drm_fbdev_client_unregister+0x3c/0x80
[ +0.000007] drm_client_dev_unregister+0x144/0x330
[ +0.000006] drm_dev_unregister+0x49/0x1b0
[ +0.000006] drm_dev_unplug+0x4c/0xd0
[ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu]
[ +0.000484] pci_device_remove+0xae/0x1e0
[ +0.000008] device_remove+0xc7/0x180
[ +0.000007] device_release_driver_internal+0x3d4/0x5a0
[ +0.000006] device_release_driver+0x12/0x20
[ +0.000007] pci_stop_bus_device+0x104/0x150
[ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
[ +0.000006] remove_store+0xd7/0xf0
[ +0.000006] dev_attr_store+0x3f/0x80
[ +0.000005] sysfs_kf_write+0x125/0x1d0
[ +0.000006] kernfs_fop_write_iter+0x2ea/0x490
[ +0.000006] vfs_write+0x90d/0xe70
[ +0.000006] ksys_write+0x119/0x220
[ +0.000006] __x64_sys_write+0x72/0xc0
[ +0.000006] x64_sys_call+0x18ab/0x26f0
[ +0.000005] do_syscall_64+0x7c/0x170
[ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000012] The buggy address belongs to the object at ffff88812eec8000
which belongs to the cache kmalloc-rnd-07-4k of size 4096
[ +0.000016] The buggy address is located 3160 bytes inside of
freed 4096-byte region [ffff88812eec8000, ffff88812eec9000)
[ +0.000023] The buggy address belongs to the physical page:
[ +0.000009] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12eec8
[ +0.000007] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ +0.000005] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ +0.000007] page_type: f5(slab)
[ +0.000008] raw: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000
[ +0.000005] raw: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000
[ +0.000005] head: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000003 ffffea0004bbb201 ffffffffffffffff 0000000000000000
[ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[ +0.000005] page dumped because: kasan: bad access detected
[ +0.000010] Memory state around the buggy address:
[ +0.000009] ffff88812eec8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff88812eec8b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] >ffff88812eec8c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ^
[ +0.000010] ffff88812eec8c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ffff88812eec8d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ==================================================================
The use-after-free occurs because a delayed work item (`suspend_work`) may still
be pending or running when resources it accesses are freed during device removal
or file close. The previous code used `flush_work(&fpriv->evf_mgr.suspend_work.work)`,
which does not wait for delayed work that has not yet started. As a result, the
delayed work could run after its memory was freed, causing a use-after-free.
By switching to `flush_delayed_work(&fpriv->evf_mgr.suspend_work)`, we ensure that
the kernel waits for both queued and delayed work to finish before
freeing memory, closing this race.
Fixes: adba092973
("drm/amdgpu: Fix Illegal opcode in command stream Error")
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07 13:50:42 -04:00
Sunil Khatri
c03ea34cbf
drm/amdgpu: add support of debugfs for mqd information
...
Add debugfs support for mqd for each queue of the client.
The address exposed to debugfs could be used to dump
the mqd.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250704075548.1549849-5-sunil.khatri@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-04 16:00:48 +02:00
Jesse.Zhang
96a86dcb5b
drm/amdgpu: Fix circular locking in userq creation
...
A circular locking dependency was detected between the global
`adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when
creating user queues. The issue occurs because:
1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then
`userq_mgr->userq_mutex`
2. While `amdgpu_userq_create()` takes them in reverse order
This patch resolves the issue by:
1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()`
to cover the `amdgpu_userq_ensure_ev_fence()` call
2. Releasing it after we're done with both queue creation and the
scheduling halt check
v2: remove unused adev->userq_mutex lock (Prike)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14 11:29:38 -04:00
Arunpravin Paneer Selvam
bc5bab82d3
drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings
...
The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the
doorbell bo reserve up before pin/unpin.
WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm]
[ +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15
[ +0.000006] Tainted: [W]=WARN
[ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024
[ +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm]
[ +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246
[ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000
[ +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000
[ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848
[ +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0
[ +0.000005] FS: 00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000
[ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0
[ +0.000004] PKRU: 55555554
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000005] ? show_regs+0x6c/0x80
[ +0.000007] ? __warn+0xd2/0x2d0
[ +0.000007] ? ttm_bo_pin+0x1f6/0x270 [ttm]
[ +0.000031] ? report_bug+0x282/0x2f0
[ +0.000012] ? handle_bug+0x6e/0xc0
[ +0.000007] ? exc_invalid_op+0x18/0x50
[ +0.000007] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000017] ? ttm_bo_pin+0x1f6/0x270 [ttm]
[ +0.000014] amdgpu_bo_pin+0x365/0x9d0 [amdgpu]
[ +0.000191] ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu]
[ +0.000185] ? drm_gem_object_lookup+0x81/0xc0
[ +0.000008] ? kasan_save_alloc_info+0x37/0x60
[ +0.000007] ? __kasan_kmalloc+0xc3/0xd0
[ +0.000013] amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu]
[ +0.000209] amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu]
[ +0.000193] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000211] ? lock_acquire+0x7c/0xc0
[ +0.000006] ? drm_dev_enter+0x51/0x190
[ +0.000015] drm_ioctl_kernel+0x18b/0x330
[ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000190] ? __pfx_drm_ioctl_kernel+0x10/0x10
[ +0.000005] ? lock_acquire+0x7c/0xc0
[ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? __kasan_check_write+0x14/0x30
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000011] drm_ioctl+0x589/0xd00
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000194] ? __pfx_drm_ioctl+0x10/0x10
[ +0.000006] ? __pm_runtime_resume+0x80/0x110
[ +0.000021] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? trace_hardirqs_on+0x53/0x60
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80
[ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu]
[ +0.000185] __x64_sys_ioctl+0x13a/0x1c0
[ +0.000010] x64_sys_call+0x11ad/0x25f0
[ +0.000007] do_syscall_64+0x91/0x180
[ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? irqentry_exit+0x77/0xb0
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? exc_page_fault+0x93/0x150
[ +0.000009] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000005] RIP: 0033:0x7c45ff924ded
[ +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded
[ +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b
[ +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000
[ +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48
[ +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950
[ +0.000022] </TASK>
[ +0.000004] irq event stamp: 80693
[ +0.000004] hardirqs last enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0
[ +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0
[ +0.000005] softirqs last enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000006] ---[ end trace 0000000000000000 ]---
------------------------------------------------------------------------------------------------------
[ +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm]
[ +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15
[ +0.000006] Tainted: [W]=WARN
[ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024
[ +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm]
[ +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246
[ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000
[ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000
[ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050
[ +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28
[ +0.000004] FS: 00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000
[ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0
[ +0.000005] PKRU: 55555554
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000005] ? show_regs+0x6c/0x80
[ +0.000008] ? __warn+0xd2/0x2d0
[ +0.000007] ? ttm_bo_unpin+0x21f/0x2c0 [ttm]
[ +0.000012] ? report_bug+0x282/0x2f0
[ +0.000013] ? handle_bug+0x6e/0xc0
[ +0.000006] ? exc_invalid_op+0x18/0x50
[ +0.000008] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000017] ? ttm_bo_unpin+0x21f/0x2c0 [ttm]
[ +0.000011] ? ttm_bo_unpin+0x217/0x2c0 [ttm]
[ +0.000011] amdgpu_bo_unpin+0x45/0x250 [amdgpu]
[ +0.000216] amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu]
[ +0.000226] ? drm_dev_exit+0x2d/0x60
[ +0.000010] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000201] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? lock_acquire+0x7c/0xc0
[ +0.000006] ? drm_dev_enter+0x51/0x190
[ +0.000015] drm_ioctl_kernel+0x18b/0x330
[ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000188] ? __pfx_drm_ioctl_kernel+0x10/0x10
[ +0.000006] ? lock_acquire+0x7c/0xc0
[ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? __kasan_check_write+0x14/0x30
[ +0.000006] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000010] drm_ioctl+0x589/0xd00
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu]
[ +0.000211] ? __pfx_drm_ioctl+0x10/0x10
[ +0.000006] ? __pm_runtime_resume+0x80/0x110
[ +0.000020] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000006] ? trace_hardirqs_on+0x53/0x60
[ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80
[ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu]
[ +0.000186] __x64_sys_ioctl+0x13a/0x1c0
[ +0.000010] x64_sys_call+0x11ad/0x25f0
[ +0.000007] do_syscall_64+0x91/0x180
[ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? do_syscall_64+0x9d/0x180
[ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000010] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ +0.000005] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[ +0.000013] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? syscall_exit_to_user_mode+0x95/0x260
[ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? do_syscall_64+0x9d/0x180
[ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? do_syscall_64+0x9d/0x180
[ +0.000011] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000010] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? irqentry_exit_to_user_mode+0x8b/0x260
[ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000006] ? irqentry_exit+0x77/0xb0
[ +0.000004] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000005] ? exc_page_fault+0x93/0x150
[ +0.000010] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000005] RIP: 0033:0x7c45ff924ded
[ +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded
[ +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b
[ +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007
[ +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b
[ +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8
[ +0.000022] </TASK>
[ +0.000005] irq event stamp: 87419
[ +0.000004] hardirqs last enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0
[ +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0
[ +0.000005] softirqs last enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000005] ---[ end trace 0000000000000000 ]---
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:39:46 -04:00
Jesse.Zhang
648a0dc0d7
drm/amdgpu: Fix user queue deadlock by reordering mutex locking
...
This resolves a deadlock between user queue management and GPU reset
paths by enforcing consistent lock ordering.
The deadlock occurred when:
1. Process exit path (amdgpu_userq_mgr_fini) would:
- Take uqm->userq_mutex
- Then try to take adev->userq_mutex for list operations
2. GPU reset path (amdgpu_userq_pre_reset) would:
- Take adev->userq_mutex first (for list traversal)
- Then take uqm->userq_mutex
The solution establishes a strict top-down locking order:
1. Always take adev->userq_mutex before any uqm->userq_mutex
2. Maintain this order consistently across all code paths
Changes made:
- Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
- Kept existing proper order in amdgpu_userq_pre_reset()
- Simplified the fini flow by removing redundant operations
This prevents circular dependencies while maintaining thread safety
during both normal operation and GPU reset scenarios.
Fixes: 4ce60dbada
("drm/amdgpu: store userq_managers in a list in adev")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:32:25 -04:00
Arvind Yadav
f10eb185ad
drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
...
Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure
the delayed work has finished executing before proceeding with
resource cleanup. This prevents a potential use-after-free or
NULL dereference if the resume_work is still running during finalization.
BUG: kernel NULL pointer dereference, address: 0000000000000140
[ +0.000050] #PF: supervisor read access in kernel mode
[ +0.000019] #PF: error_code(0x0000) - not-present page
[ +0.000021] PGD 0 P4D 0
[ +0.000015] Oops: Oops: 0000 [#1 ] PREEMPT SMP NOPTI
[ +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G U 6.14.0-org-staging #1
[ +0.000032] Tainted: [U]=USER
[ +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024
[ +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu]
[ +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec]
[ +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0
[ +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297
[ +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000
[ +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88
[ +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000
[ +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88
[ +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90
[ +0.000023] FS: 0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000
[ +0.000024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0
[ +0.000025] Call Trace:
[ +0.000018] <TASK>
[ +0.000015] ? show_regs+0x69/0x80
[ +0.000022] ? __die+0x25/0x70
[ +0.000019] ? page_fault_oops+0x15d/0x510
[ +0.000024] ? do_user_addr_fault+0x312/0x690
[ +0.000024] ? sched_clock_cpu+0x10/0x1a0
[ +0.000028] ? exc_page_fault+0x78/0x1b0
[ +0.000025] ? asm_exc_page_fault+0x27/0x30
[ +0.000024] ? drm_exec_lock_obj+0x32/0x210 [drm_exec]
[ +0.000024] drm_exec_prepare_obj+0x21/0x60 [drm_exec]
[ +0.000021] amdgpu_vm_lock_pd+0x22/0x30 [amdgpu]
[ +0.000266] amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu]
[ +0.000333] amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu]
[ +0.000316] process_one_work+0x189/0x3c0
[ +0.000021] worker_thread+0x2a4/0x3b0
[ +0.000022] kthread+0x109/0x220
[ +0.000018] ? __pfx_worker_thread+0x10/0x10
[ +0.000779] ? _raw_spin_unlock_irq+0x1f/0x40
[ +0.000560] ? __pfx_kthread+0x10/0x10
[ +0.000543] ret_from_fork+0x3c/0x60
[ +0.000507] ? __pfx_kthread+0x10/0x10
[ +0.000515] ret_from_fork_asm+0x1a/0x30
[ +0.000515] </TASK>
v2: Replace cancel_delayed_work() to cancel_delayed_work_sync()
in amdgpu_userq_destroy() and amdgpu_userq_evict().
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:21:39 -04:00
Sunil Khatri
71353c1a4f
drm/amdgpu: change DRM_DBG_DRIVER to drm_dbg_driver
...
update the functions in amdgpu_userqueues.c from
DRM_DBG_DRIVER to drm_dbg_driver so multi gpu instance
can be logged in.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:38 -04:00
Sunil Khatri
c46a37628a
drm/amdgpu: change DRM_ERROR to drm_file_err in amdgpu_userq.c
...
change the DRM_ERROR and drm_err to drm_file_err
to add process name and pid to the logging.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:33 -04:00
Sunil Khatri
8c97cdb1a6
drm/amdgpu: use drm_file_err in fence timeouts
...
use drm_file_err instead of DRM_ERROR which adds
process and pid information in the userqueue error
logging.
Sample log:
[ 19.802315] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Couldn't unmap all the queues
[ 19.802319] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Failed to evict userqueue
[ 19.838432] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Couldn't unmap all the queues
[ 19.838436] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Failed to evict userqueue
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:25 -04:00
Sunil Khatri
30ff75809d
drm/amdgpu: add drm_file reference in userq_mgr
...
drm_file will be used in usermode queues code to
enable better process information in logging and hence
add drm_file part of the userq_mgr struct.
update the drm_file pointer in userq_mgr for each
amdgpu_driver_open_kms.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:18 -04:00
Dan Carpenter
97c39b4da6
drm/amdgpu/userq: remove unnecessary NULL check
...
The "ticket" pointer points to in the middle of the &exec struct so it
can't be NULL. Remove the check.
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:18:52 -04:00
Alex Deucher
482d485332
drm/amdgpu/userq: take the userq_mgr lock in enforce isolation
...
Add the missing locking.
Fixes: 94976e7e5e
("drm/amdgpu/userq: add helpers to start/stop scheduling")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:26 -04:00
Alex Deucher
c5e02d6588
drm/amdgpu/userq: take the userq_mgr lock in suspend/resume
...
Add the missing locking.
Fixes: 73e12e98ec
("drm/amdgpu/userq: add suspend and resume helpers")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:24 -04:00
Arvind Yadav
56801cb83c
drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
...
DRM_AMDGPU_NAVI3X_USERQ config support is not required for
usermode queue.
v2: rebase.
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:00 -04:00
Sunil Khatri
127e612bf1
drm/amdgpu: update fence ptr with context:seqno
...
log context:seqno of the fence during timeout rather
than logging fence pointer.
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher
42a6667780
drm/amdgpu/userq: use consistent function naming
...
s/userqueue/userq/
1. remove the mix of amdgpu_userqueue and amdgpu_userq
2. to be consistent with other amdgpu_userq_fence.c
3. it's shorter
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00