Commit Graph

30 Commits

Author SHA1 Message Date
Thomas Zimmermann
c598d5eb9f Merge drm/drm-next into drm-misc-next
Backmerging to forward to v6.16-rc1

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-06-11 09:01:34 +02:00
Dan Carpenter
98a46a4089 drm/amdgpu: Fix integer overflow issues in amdgpu_userq_fence.c
This patch only affects 32bit systems.  There are several integer
overflows bugs here but only the "sizeof(u32) * num_syncobj"
multiplication is a problem at runtime.  (The last lines of this patch).

These variables are u32 variables that come from the user.  The issue
is the multiplications can overflow leading to us allocating a smaller
buffer than intended.  For the first couple integer overflows, the
syncobj_handles = memdup_user() allocation is immediately followed by
a kmalloc_array():

	syncobj = kmalloc_array(num_syncobj_handles, sizeof(*syncobj), GFP_KERNEL);

In that situation the kmalloc_array() works as a bounds check and we
haven't accessed the syncobj_handlesp[] array yet so the integer overflow
is harmless.

But the "num_syncobj" multiplication doesn't have that and the integer
overflow could lead to an out of bounds access.

Fixes: a292fdecd7 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-03 15:06:35 -04:00
Tvrtko Ursulin
bf33a0003d dma-fence: Use a flag for 64-bit seqnos
With the goal of reducing the need for drivers to touch (and dereference)
fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops to
the fence->flags.

Drivers which were setting this flag are changed to use new
dma_fence_init64() instead of dma_fence_init().

v2:
 * Streamlined init and added kerneldoc.
 * Rebase for amdgpu userq which landed since.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com> # v1
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20250515095004.28318-3-tvrtko.ursulin@igalia.com
2025-06-03 17:38:04 +01:00
Arunpravin Paneer Selvam
218caca4ba drm/amdgpu/userq: Fix lock contention in userq fence
Fix lockdep warnings.

[  +0.000637] ================================
[  +0.000004] WARNING: inconsistent lock state
[  +0.000004] 6.12.0+ #18 Tainted: G        W  OE
[  +0.000004] --------------------------------
[  +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu]
[  +0.000208] {IN-HARDIRQ-W} state was registered at:
[  +0.000004]   lock_acquire.part.0+0x116/0x360
[  +0.000005]   lock_acquire+0x7c/0xc0
[  +0.000005]   _raw_spin_lock+0x2f/0x60
[  +0.000005]   amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu]
[  +0.000185]   gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu]
[  +0.000210]   amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu]
[  +0.000191]   amdgpu_ih_process+0x1e1/0x3d0 [amdgpu]
[  +0.000185]   amdgpu_irq_handler+0x28/0xc0 [amdgpu]
[  +0.000183]   __handle_irq_event_percpu+0x1bb/0x590
[  +0.000005]   handle_irq_event+0xab/0x1d0
[  +0.000005]   handle_edge_irq+0x1fd/0xc10
[  +0.000005]   __common_interrupt+0x83/0x190
[  +0.000004]   common_interrupt+0xb1/0xe0
[  +0.000005]   asm_common_interrupt+0x27/0x40
[  +0.000004]   cpuidle_enter_state+0x2ba/0x530
[  +0.000005]   cpuidle_enter+0x4f/0xb0
[  +0.000006]   call_cpuidle+0x46/0xd0
[  +0.000005]   do_idle+0x367/0x430
[  +0.000004]   cpu_startup_entry+0x58/0x70
[  +0.000005]   start_secondary+0x224/0x2b0
[  +0.000005]   common_startup_64+0x13e/0x141
[  +0.000005] irq event stamp: 88271
[  +0.000004] hardirqs last  enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80
[  +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80
[  +0.000005] softirqs last  enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[  +0.000005]
              other info that might help us debug this:
[  +0.000004]  Possible unsafe locking scenario:

[  +0.000003]        CPU0
[  +0.000004]        ----
[  +0.000003]   lock(&fence_drv->fence_list_lock);

v2:
  Drop fence_list_flags and use xa_lock_irqsave() flags parameter (Christian)
  Fix merge conflicts.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 13:39:20 -04:00
Arvind Yadav
010503a3cb drm/amdgpu: Fix amdgpu_userq_wait_ioctl() warn missing error code 'r'
To resolve the warning regarding the missing error code 'r' in
amdgpu_userq_wait_ioctl(), assign the value 'r = -EINVAL'.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080458.rnV8YfiY-lkp@intel.com/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:21:56 -04:00
Arvind Yadav
3e50b1d625 drm/amdgpu: only keep most recent fence for each context
Keep only the latest fences to reduce the number of values
given back to userspace

v2: - Export this code from dma-fence-unwrap.c(by Christian).
v3: - To split this in a dma_buf patch and amd userq patch(by Sunil).
    - No need to add a new function just re-use existing(by Christian).
v4: Export dma_fence_dedub_array function and used it(by Christian).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:58 -04:00
Dan Carpenter
d6c6d5ec66 drm/amdgpu/userq: Call unreserve on error in amdgpu_userq_fence_read_wptr()
This error path should call amdgpu_bo_unreserve() before returning.

Fixes: d8675102ba ("drm/amdgpu: add vm root BO lock before accessing the vm")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:17:42 -04:00
Arvind Yadav
56801cb83c drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
DRM_AMDGPU_NAVI3X_USERQ config support is not required for
usermode queue.

v2: rebase.

Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:00 -04:00
Alex Deucher
42a6667780 drm/amdgpu/userq: use consistent function naming
s/userqueue/userq/

1. remove the mix of amdgpu_userqueue and amdgpu_userq
2. to be consistent with other amdgpu_userq_fence.c
3. it's shorter

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Arunpravin Paneer Selvam
4b27406380 drm/amdgpu: Add queue id support to the user queue wait IOCTL
Add queue id support to the user queue wait IOCTL
drm_amdgpu_userq_wait structure.

This is required to retrieve the wait user queue and maintain
the fence driver references in it so that the user queue in
the same context releases their reference to the fence drivers
at some point before queue destruction.

Otherwise, we would gather those references until we
don't have any more space left and crash.

v2: Modify the UAPI comment as per the mesa and libdrm UAPI comment.

Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Dan Carpenter
0e023c327b drm/amdgpu: Clean up error handling in amdgpu_userq_fence_driver_alloc()
1) Checkpatch complains if we print an error message for kzalloc()
   failure.  The kzalloc() failure already has it's own error messages
   built in.  Also this allocation is small enough that it is guaranteed
   to succeed.
2) Return directly instead of doing a goto free_fence_drv.  The
   "fence_drv" is already NULL so no cleanup is necessary.

Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:10 -04:00
Dan Carpenter
8ff7c78bae drm/amdgpu: Fix double free in amdgpu_userq_fence_driver_alloc()
The goto frees "fence_drv" so this is a double free bug.  There is no
need to call amdgpu_seq64_free(adev, fence_drv->va) since the seq64
allocation failed so change the goto to goto free_fence_drv.  Also
propagate the error code from amdgpu_seq64_alloc() instead of hard coding
it to -ENOMEM.

Fixes: e7cf21fbb2 ("drm/amdgpu: Few optimization and fixes for userq fence driver")
Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:52:53 -04:00
Alex Deucher
edc762a51c drm/amdgpu/userq: move some code around
Move some userq fence handling code into amdgpu_userq_fence.c.
This matches the other code in that file.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:20 -04:00
Arvind Yadav
f15d4e92f7 drm/amdgpu: Fix display freeze lockup error
A deadlock situation has arised between the userq
signal ioctl and the eviction fence. In this scenario,
the function amdgpu_userq_signal_ioctl() has acquired a reservation
lock on the read/write buffer object (BO) through drm_exec.
Subsequently, it calls amdgpu_userqueue_ensure_ev_fence(),
which is in a waiting for the userq resume work.
Meanwhile, the userq suspend worker has initiated the userq resume
work(amdgpu_userqueue_resume_worker). This userq resume work attempts
to validate the vm->done BO, leading to amdgpu_userqueue_validate_bos
also attempting to reservation lock the same write BO that is already
locked by amdgpu_userq_signal_ioctl.
As a result, the resume work becomes stalled, causing
amdgpu_userqueue_ensure_ev_fence to remain in a waiting state.

Call Trace:
[  242.836469] INFO: task gnome-shel:cs0:1288 blocked for more than 120 seconds.
[  242.836486]       Tainted: G           OE      6.12.0-rc2rebased-oct-24+ #4
[  242.836491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.836494] task:gnome-shel:cs0  state:D stack:0     pid:1288  tgid:1282  ppid:1180   flags:0x00000002
[  242.836503] Call Trace:
[  242.836508]  <TASK>
[  242.836517]  __schedule+0x3e0/0xb10
[  242.836530]  ? srso_return_thunk+0x5/0x5f
[  242.836541]  schedule+0x31/0x120
[  242.836546]  schedule_timeout+0x150/0x160
[  242.836551]  ? srso_return_thunk+0x5/0x5f
[  242.836555]  ? sysvec_call_function+0x69/0xd0
[  242.836562]  ? srso_return_thunk+0x5/0x5f
[  242.836567]  ? preempt_count_add+0x7f/0xd0
[  242.836577]  __wait_for_common+0x91/0x180
[  242.836582]  ? __pfx_schedule_timeout+0x10/0x10
[  242.836590]  wait_for_completion+0x28/0x30
[  242.836595]  __flush_work+0x16c/0x290
[  242.836602]  ? __pfx_wq_barrier_func+0x10/0x10
[  242.836611]  flush_delayed_work+0x3a/0x60
[  242.836621]  amdgpu_userqueue_ensure_ev_fence+0x2d/0xb0 [amdgpu]
[  242.836966]  amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[  242.837171]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.837365]  drm_ioctl_kernel+0xae/0x100 [drm]
[  242.837398]  drm_ioctl+0x2a1/0x500 [drm]
[  242.837420]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.837622]  ? srso_return_thunk+0x5/0x5f
[  242.837627]  ? srso_return_thunk+0x5/0x5f
[  242.837630]  ? _raw_spin_unlock_irqrestore+0x2b/0x50
[  242.837635]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[  242.837811]  __x64_sys_ioctl+0x99/0xd0
[  242.837820]  x64_sys_call+0x1209/0x20d0
[  242.837825]  do_syscall_64+0x51/0x120
[  242.837830]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  242.837835] RIP: 0033:0x7f2f33f1a94f
[  242.837838] RSP: 002b:00007f2f24ffea30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  242.837842] RAX: ffffffffffffffda RBX: 00007f2f24ffebd0 RCX: 00007f2f33f1a94f
[  242.837845] RDX: 00007f2f24ffebd0 RSI: 00000000c0306457 RDI: 000000000000000d
[  242.837847] RBP: 00007f2f24ffeab0 R08: 0000000000000000 R09: 0000000000000000
[  242.837849] R10: 00007f2f24ffecd0 R11: 0000000000000246 R12: 00007f2f25000640
[  242.837851] R13: 00000000c0306457 R14: 000000000000000d R15: 00007fff3b39c1e0
[  242.837858]  </TASK>
[  242.837865] INFO: task Xwayland:cs0:1517 blocked for more than 120 seconds.
[  242.837869]       Tainted: G           OE      6.12.0-rc2rebased-oct-24+ #4
[  242.837872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.837874] task:Xwayland:cs0    state:D stack:0     pid:1517  tgid:1338  ppid:1282   flags:0x00004002
[  242.837878] Call Trace:
[  242.837880]  <TASK>
[  242.837883]  __schedule+0x3e0/0xb10
[  242.837890]  schedule+0x31/0x120
[  242.837894]  schedule_preempt_disabled+0x1c/0x30
[  242.837897]  __mutex_lock.constprop.0+0x386/0x6e0
[  242.837902]  ? srso_return_thunk+0x5/0x5f
[  242.837905]  ? __timer_delete_sync+0x81/0xe0
[  242.837911]  __mutex_lock_slowpath+0x13/0x20
[  242.837915]  mutex_lock+0x3b/0x50
[  242.837919]  amdgpu_userqueue_ensure_ev_fence+0x35/0xb0 [amdgpu]
[  242.838138]  amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[  242.838340]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.838531]  drm_ioctl_kernel+0xae/0x100 [drm]
[  242.838559]  drm_ioctl+0x2a1/0x500 [drm]
[  242.838580]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.838778]  ? srso_return_thunk+0x5/0x5f
[  242.838783]  ? srso_return_thunk+0x5/0x5f
[  242.838786]  ? _raw_spin_unlock_irqrestore+0x2b/0x50
[  242.838791]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[  242.838967]  __x64_sys_ioctl+0x99/0xd0
[  242.838972]  x64_sys_call+0x1209/0x20d0
[  242.838975]  do_syscall_64+0x51/0x120
[  242.838979]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  242.838982] RIP: 0033:0x7f9118b1a94f
[  242.838985] RSP: 002b:00007f910cdff760 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  242.838989] RAX: ffffffffffffffda RBX: 00007f910cdff910 RCX: 00007f9118b1a94f
[  242.838991] RDX: 00007f910cdff910 RSI: 00000000c0306457 RDI: 000000000000000c
[  242.838993] RBP: 00007f910cdff7e0 R08: 0000000000000000 R09: 0000000000000001
[  242.838995] R10: 00007f910cdff9d4 R11: 0000000000000246 R12: 00007f910ce00640
[  242.838997] R13: 00000000c0306457 R14: 000000000000000c R15: 00007fff9dd11d10
[  242.839004]  </TASK>

v2: Addressed review comemnts from Christian.
v3/v4: Addressed review comemnts from Christian.
   - Move drm_exec drm_exec loop after userq fence create.
   - cleanup the newly created userq fence in case of error.
v5 - Addressed review comemnts from Christian.
   - Create a new amdgpu_userq_fence_alloc() function for allocation.
   - Calling dma_fence_put for cleanup procedure.
   - make amdgpu_userq_fence_create() function static.
   - drm_exec_init is called after mutex_unlock.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:20 -04:00
Arunpravin Paneer Selvam
239a310b49 drm/amdgpu: Fix out-of-bounds issue in user fence
Fix out-of-bounds issue in userq fence create when
accessing the userq xa structure. Added a lock to
protect the race condition.

v2:(Christian)
  - Allocate memory with GFP_ATOMIC.

v3:
  - Moved to 2 xa approach.

v4:(Christian)
  - Lock the xa_for_each blocks and memory allocation part
    as well to make sure that xa is not modified in between
    the 2 xa_for_each blocks.

BUG: KASAN: slab-out-of-bounds in amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[  +0.000006] Call Trace:
[  +0.000005]  <TASK>
[  +0.000005]  dump_stack_lvl+0x6c/0x90
[  +0.000011]  print_report+0xc4/0x5e0
[  +0.000009]  ? srso_return_thunk+0x5/0x5f
[  +0.000008]  ? kasan_complete_mode_report_info+0x26/0x1d0
[  +0.000007]  ? amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[  +0.000405]  kasan_report+0xdf/0x120
[  +0.000009]  ? amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[  +0.000405]  __asan_report_store8_noabort+0x17/0x20
[  +0.000007]  amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[  +0.000406]  ? __pfx_amdgpu_userq_fence_create+0x10/0x10 [amdgpu]
[  +0.000408]  ? srso_return_thunk+0x5/0x5f
[  +0.000008]  ? ttm_resource_move_to_lru_tail+0x235/0x4f0 [ttm]
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000008]  amdgpu_userq_signal_ioctl+0xd29/0x1c70 [amdgpu]
[  +0.000412]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  +0.000404]  ? try_to_wake_up+0x165/0x1840
[  +0.000010]  ? __pfx_futex_wake_mark+0x10/0x10
[  +0.000011]  drm_ioctl_kernel+0x178/0x2f0 [drm]
[  +0.000050]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  +0.000404]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.000043]  ? __kasan_check_read+0x11/0x20
[  +0.000007]  ? srso_return_thunk+0x5/0x5f
[  +0.000007]  ? __kasan_check_write+0x14/0x20
[  +0.000008]  drm_ioctl+0x513/0xd20 [drm]
[  +0.000040]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  +0.000407]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.000044]  ? srso_return_thunk+0x5/0x5f
[  +0.000007]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.000006]  ? __rseq_handle_notify_resume+0x188/0xc30
[  +0.000008]  ? srso_return_thunk+0x5/0x5f
[  +0.000008]  ? srso_return_thunk+0x5/0x5f
[  +0.000006]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  +0.000010]  amdgpu_drm_ioctl+0xcd/0x1d0 [amdgpu]
[  +0.000388]  __x64_sys_ioctl+0x135/0x1b0
[  +0.000009]  x64_sys_call+0x1205/0x20d0
[  +0.000007]  do_syscall_64+0x4d/0x120
[  +0.000008]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000007] RIP: 0033:0x7f7c3d31a94f

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:20 -04:00
Arunpravin Paneer Selvam
c9e20cb005 drm/amdgpu: Fix NULL ptr dereference issue for non userq fences
Add the correct fences count variable [num_fences] in the fences
array iteration to handle the userq / non-userq fences.

v2:(Christian)
  - All fences in the array either come from some reservation object
    or drm_syncobj. If any of those are NULL then there is a bug
    somewhere else.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:19 -04:00
Shashank Sharma
a242a3e4b5 drm/amdgpu: simplify eviction fence suspend/resume
The basic idea in this redesign is to add an eviction fence only in UQ
resume path. When userqueue is not present, keep ev_fence as NULL

Main changes are:
 - do not create the eviction fence during evf_mgr_init, keeping
   evf_mgr->ev_fence=NULL until UQ get active.
 - do not replace the ev_fence in evf_resume path, but replace it only in
   uq_resume path, so remove all the unnecessary code from ev_fence_resume.
 - add a new helper function (amdgpu_userqueue_ensure_ev_fence) which
   will do the following:
   - flush any pending uq_resume work, so that it could create an
     eviction_fence
   - if there is no pending uq_resume_work, add a uq_resume work and
     wait for it to execute so that we always have a valid ev_fence
 - call this helper function from two places, to ensure we have a valid
   ev_fence:
   - when a new uq is created
   - when a new uq completion fence is created

v2: Worked on review comments by Christian.
v3: Addressed few more review comments by Christian.
v4: Move mutex lock outside of the amdgpu_userqueue_suspend()
    function (Christian).
v5: squash in build fix (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:19 -04:00
Shashank Sharma
b0328087c1 drm/amdgpu: suspend gfx userqueues
This patch adds suspend support for gfx userqueues. It typically does
the following:
- adds an enable_signaling function for the eviction fence, so that it
  can trigger the userqueue suspend,
- adds a delayed work to handle suspending of the eviction_fence
- adds a suspend function to handle suspending of userqueues which
  suspends all the queues under this userq manager and signals the
  eviction fence,
- adds a function to replace the old eviction fence with a new one and
  attach it to each of the objects,
- adds reference of userq manager in the eviction fence container so
  that it can be used in the suspend function.

V2: Addressed Christian's review comments:
    - schedule suspend work immediately

V4: Addressed Christian's review comments:
    - wait for pending uq fences before starting suspend, added
      queue->last_fence for the same
    - accommodate ev_fence_mgr into existing code
    - some bug fixes and NULL checks

V5: Addressed Christian's review comments (gitlab)
    - Wait for eviction fence to get signaled in destroy,
      don't signal it
    - Wait for eviction fence to get signaled in replace fence,
      don't signal it

V6: Addressed Christian's review comments
    - Do not destroy the old eviction fence until we have it replaced
    - Change the sequence of fence replacement sub-tasks
    - reusing the ev_fence delayed work for userqueue suspend as well
      (Shashank).

V7: Addressed Christian's review comments
    - give evf_mgr as argument (instead of fpriv) to replace_fence()
    - save ptr to evf_mgr in ev_fence (instead of uq_mgr)
    - modify suspend_all_queues logic to reflect error properly
    - remove the garbage drm_exec_lock section in wait_for_signal
    - grab the userqueue mutex before starting the wait for fence
    - remove the unrelated gobj check from signal_ioctl

V8: Added race condition fixes

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:18 -04:00
Arunpravin Paneer Selvam
2761bb9a31 drm/amdgpu: Modify userq signal/wait struct field names
Modify kernel UAPI userq signal/wait struct field names and
description corresponding to the libdrm UAPI review comments.

libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
189ee986b0 drm/amdgpu: add userq specific kernel config for fence ioctls
Keep the user queue fence signal and wait IOCTLs in the
kernel config CONFIG_DRM_AMDGPU_NAVI3X_USERQ.

v2(Christian):
  - Remove the userq specific config added for kernel queues fence init
    function.

v3(Alex):
  - It will be better to return an error(-ENOTSUPP) in these cases.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
f7cb6a28e1 drm/amdgpu: Add gpu_addr support to seq64 allocation
Add gpu address support to seq64 alloc function.

v1:(Christian)
  - Add the user of this new interface change to the same
    patch.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
cb4a73f46f drm/amdgpu: Add separate array of read and write for BO handles
Drop AMDGPU_USERQ_BO_WRITE as this should not be a global option
of the IOCTL, It should be option per buffer. Hence adding separate
array for read and write BO handles.

v2(Marek):
  - Internal kernel details shouldn't be here. This file should only
    document the observed behavior, not the implementation .

v3:
  - Fix DAL CI clang issue.

v4:
  - Added Alex RB to merge the kernel UAPI changes since he has
    already approved the amdgpu_drm.h changes.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
d8675102ba drm/amdgpu: add vm root BO lock before accessing the vm
Add a vm root BO lock before accessing the userqueue VM.

v1:(Christian)
   - Keep the VM locked until you are done with the mapping.
   - Grab a temporary BO reference, drop the VM lock and acquire the BO.
     When you are done with everything just drop the BO lock and
     then the temporary BO reference.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
fbea3d3174 drm/amdgpu: Add the missing error handling for xa_store() call
Add the missing error handling for xa_store() call in the function
amdgpu_userq_fence_driver_alloc().

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:17 -04:00
Arunpravin Paneer Selvam
e7cf21fbb2 drm/amdgpu: Few optimization and fixes for userq fence driver
Few optimization and fixes for userq fence driver.

v1:(Christian):
  - Remove unnecessary comments.
  - In drm_exec_init call give num_bo_handles as last parameter it would
    making allocation of the array more efficient
  - Handle return value of __xa_store() and improve the error handling of
    amdgpu_userq_fence_driver_alloc().

v2:(Christian):
   - Revert userq_xa xarray init to XA_FLAGS_LOCK_IRQ.
   - move the xa_unlock before the error check of the call xa_err(__xa_store())
     and moved this change to a separate patch as this is adding a missing error
     handling.
   - Removed the unnecessary comments.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arunpravin Paneer Selvam
8949843762 drm/amdgpu: Enable userq fence interrupt support
Add support to handle the userqueue protected fence signal hardware
interrupt.

Create a xarray which maps the doorbell index to the fence driver address.
This would help to retrieve the fence driver information when an userq fence
interrupt is triggered. Firmware sends the doorbell offset value and
this info is compared with the queue's mqd doorbell offset value.
If they are same, we process the userq fence interrupt.

v1:(Christian):
  - use xa_load to extract the fence driver.
  - move the amdgpu_userq_fence_driver_process call within the xa_lock
    as there is a chance that fence_drv might be freed.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arunpravin Paneer Selvam
15e30a6e47 drm/amdgpu: Add wait IOCTL timeline syncobj support
Add user fence wait IOCTL timeline syncobj support.

v2:(Christian)
  - handle dma_fence_wait() return value.
  - shorten the variable name syncobj_timeline_points a bit.
  - move num_points up to avoid padding issues.

v3:(Christian)
  - Handle timeline drm_syncobj_find_fence() call error
    handling
  - Use dma_fence_unwrap_for_each() in timeline fence as
    there could be more than one fence.

v4:(Christian)
  - Drop the first num_fences since fence is always included in
    the dma_fence_unwrap_for_each() iteration, when fence != f
    then fence is most likely just a container.

v5: Added Alex RB to merge the kernel UAPI changes since he has
    already approved the amdgpu_drm.h changes.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arunpravin Paneer Selvam
a292fdecd7 drm/amdgpu: Implement userqueue signal/wait IOCTL
This patch introduces new IOCTL for userqueue secure semaphore.

The signal IOCTL called from userspace application creates a drm
syncobj and array of bo GEM handles and passed in as parameter to
the driver to install the fence into it.

The wait IOCTL gets an array of drm syncobjs, finds the fences
attached to the drm syncobjs and obtain the array of
memory_address/fence_value combintion which are returned to
userspace.

v2: (Christian)
    - Install fence into GEM BO object.
    - Lock all BO's using the dma resv subsystem
    - Reorder the sequence in signal IOCTL function.
    - Get write pointer from the shadow wptr
    - use userq_fence to fetch the va/value in wait IOCTL.

v3: (Christian)
    - Use drm_exec helper for the proper BO drm reserve and avoid BO
      lock/unlock issues.
    - fence/fence driver reference count logic for signal/wait IOCTLs.

v4: (Christian)
    - Fixed the drm_exec calling sequence
    - use dma_resv_for_each_fence_unlock if BO's are not locked
    - Modified the fence_info array storing logic.

v5: (Christian)
    - Keep fence_drv until wait queue execution.
    - Add dma_fence_wait for other fences.
    - Lock BO's using drm_exec as the number of fences in them could
      change.
    - Install signaled fences as well into BO/Syncobj.
    - Move Syncobj fence installation code after the drm_exec_prepare_array.
    - Directly add dma_resv_usage_rw(args->bo_flags....
    - remove unnecessary dma_fence_put.

v6: (Christian)
    - Add xarray stuff to store the fence_drv
    - Implement a function to iterate over the xarray and drop
      the fence_drv references.
    - Add drm_exec_until_all_locked() wrapper
    - Add a check that if we haven't exceeded the user allocated num_fences
      before adding dma_fence to the fences array.

v7: (Christian)
    - Use memdup_user() for kmalloc_array + copy_from_user
    - Move the fence_drv references from the xarray into the newly created fence
      and drop the fence_drv references when we signal this fence.
    - Move this locking of BOs before the "if (!wait_info->num_fences)",
      this way you need this code block only once.
    - Merge the error handling code and the cleanup + return 0 code.
    - Initializing the xa should probably be done in the userq code.
    - Remove the userq back pointer stored in fence_drv.
    - Pass xarray as parameter in amdgpu_userq_walk_and_drop_fence_drv()

v8: (Christian)
    - Move fence_drv references must come before adding the fence to the list.
    - Use xa_lock_irqsave_nested for nested spinlock operations.
    - userq_mgr should be per fpriv and not one per device.
    - Restructure the interrupt process code for the early exit of the loop.
    - The reference acquired in the syncobj fence replace code needs to be
      kept around.
    - Modify the dma_fence acquire placement in wait IOCTL.
    - Move USERQ_BO_WRITE flag to UAPI header file.
    - drop the fence drv reference after telling the hw to stop accessing it.
    - Add multi sync object support to userq signal IOCTL.

V9: (Christian)
    - Store all the fence_drv ref to other drivers and not ourself.
    - Remove the userq fence xa implementation and replace with
      kvmalloc_array.

v10: (Christian)
     - Add a comment for the userq_xa xarray
     - drop the if check of userq_fence->fence_drv_array
     - use the i variable to initialize userq_fence->fence_drv_array_count
     - drop the fence reference before you free the array in the error handling,
       otherwise it could be that some references leaked.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arunpravin Paneer Selvam
2e65ea1ab2 drm/amdgpu: screen freeze and userq driver crash
Screen freeze and userq fence driver crash while playing Xonotic

v2: (Christian)
    - There is change that fence might signal in between testing
      and grabbing the lock. Hence we can move the lock above the
      if..else check and use the dma_fence_is_signaled_locked().

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arunpravin Paneer Selvam
97ff194625 drm/amdgpu: Implement a new userqueue fence driver
Developed a userqueue fence driver for the userqueue process shared
BO synchronization.

Create a dma fence having write pointer as the seqno and allocate a
seq64 memory for each user queue process and feed this memory address
into the firmware/hardware, thus the firmware writes the read pointer
into the given address when the process completes it execution.
Compare wptr and rptr, if rptr >= wptr, signal the fences for the waiting
process to consume the buffers.

v2: Worked on review comments from Christian for the following
    modifications

    - Add wptr as sequence number into the fence
    - Add a reference count for the fence driver
    - Add dma_fence_put below the list_del as it might
      frees the userq fence.
    - Trim unnecessary code in interrupt handler.
    - Check dma fence signaled state in dma fence creation
      function for a potential problem of hardware completing
      the job processing beforehand.
    - Add necessary locks.
    - Create a list and process all the unsignaled fences.
    - clean up fences in destroy function.
    - implement .signaled callback function

v3: Worked on review comments from Christian
    - Modify naming convention for reference counted objects
    - Fix fence driver reference drop issue
    - Drop amdgpu_userq_fence_driver_process() function return value

v4: Worked on review comments from Christian
    - Moved fence driver allocation into amdgpu_userq_fence_driver_alloc()
    - Added detail doc mentioning the differences b/w
      two spinlocks declared.

v5: Worked on review comments from Christian
    - Check before upcast and remove local variable
    - Add error handling in fence_drv alloc function.
    - Move rptr read fn outside of the loop and remove WARN_ON in
      destroy function.

v6:
  - clear the seq64 memory in user fence driver(Christian)
  - fix for the wptr va bo mapping(Christian)
  - move the fence_drv xa entry erase code from the interrupt handler
    into user fence destroy function

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00