Commit Graph

33 Commits

Author SHA1 Message Date
Dave Airlie
7e2818386a amd-drm-next-6.17-2025-07-01:
amdgpu:
 - FAMS2 fixes
 - OLED fixes
 - Misc cleanups
 - AUX fixes
 - DMCUB updates
 - SR-IOV hibernation support
 - RAS updates
 - DP tunneling fixes
 - DML2 fixes
 - Backlight improvements
 - Suspend improvements
 - Use scaling for non-native modes on eDP
 - SDMA 4.4.x fixes
 - PCIe DPM fixes
 - SDMA 5.x fixes
 - Cleaner shader updates for GC 9.x
 - Remove fence slab
 - ISP genpd support
 - Parition handling rework
 - SDMA FW checks for userq support
 - Add missing firmware declaration
 - Fix leak in amdgpu_ctx_mgr_entity_fini()
 - Freesync fix
 - Ring reset refactoring
 - Legacy dpm verbosity changes
 
 amdkfd:
 - GWS fix
 - mtype fix for ext coherent system memory
 - MMU notifier fix
 - gfx7/8 fix
 
 radeon:
 - CS validation support for additional GL extensions
 - Bump driver version for new CS validation checks
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaGQ6hwAKCRC93/aFa7yZ
 2JVJAQC/IpZoSmW22VrtXjBiQ3yoYt61oOM/WJVXPV11FmisyQEAqQ9nc+/lY9SM
 s5/RskaQGlfCGqafnaSGuoILkD/YDQ4=
 =i8mT
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.17-2025-07-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.17-2025-07-01:

amdgpu:
- FAMS2 fixes
- OLED fixes
- Misc cleanups
- AUX fixes
- DMCUB updates
- SR-IOV hibernation support
- RAS updates
- DP tunneling fixes
- DML2 fixes
- Backlight improvements
- Suspend improvements
- Use scaling for non-native modes on eDP
- SDMA 4.4.x fixes
- PCIe DPM fixes
- SDMA 5.x fixes
- Cleaner shader updates for GC 9.x
- Remove fence slab
- ISP genpd support
- Parition handling rework
- SDMA FW checks for userq support
- Add missing firmware declaration
- Fix leak in amdgpu_ctx_mgr_entity_fini()
- Freesync fix
- Ring reset refactoring
- Legacy dpm verbosity changes

amdkfd:
- GWS fix
- mtype fix for ext coherent system memory
- MMU notifier fix
- gfx7/8 fix

radeon:
- CS validation support for additional GL extensions
- Bump driver version for new CS validation checks

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-07-04 10:06:29 +10:00
Alex Deucher
bf1cd14f9e drm/amdgpu: switch job hw_fence to amdgpu_fence
Use the amdgpu fence container so we can store additional
data in the fence.  This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.

Fixes: 3f4c175d62 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18 12:19:21 -04:00
Maxime Ripard
7b1166dee8 drm for 6.16-rc1
new drivers:
 - bring in the asahi uapi header standalone
 - nova-drm: stub driver
 
 rust dependencies (for nova-core):
 - auxiliary
   - bus abstractions
   - driver registration
   - sample driver
 - devres changes from driver-core
 - revocable changes
 
 core:
 - add Apple fourcc modifiers
 - add virtio capset definitions
 - extend EXPORT_SYNC_FILE for timeline syncobjs
 - convert to devm_platform_ioremap_resource
 - refactor shmem helper page pinning
 - DP powerup/down link helpers
 - remove disgusting turds
 - extended %p4cc in vsprintf.c to support fourcc prints
 - change vsprintf %p4cn to %p4chR, remove %p4cn
 - Add drm_file_err function
 - IN_FORMATS_ASYNC property
 - move sitronix from tiny to their own subdir
 
 rust:
 - add drm core infrastructure rust abstractions
   (device/driver, ioctl, file, gem)
 
 dma-buf:
 - adjust sg handling to not cache map on attach
 - allow setting dma-device for import
 - Add a helper to sort and deduplicate dma_fence arrays
 
 docs:
 - updated drm scheduler docs
 - fbdev todo update
 - fb rendering
 - actual brightness
 
 ttm:
 - fix delayed destroy resv object
 
 bridge:
 - add kunit tests
 - convert tc358775 to atomic
 - convert drivers to devm_drm_bridge_alloc
 - convert rk3066_hdmi to bridge driver
 
 scheduler:
 - add kunit tests
 
 panel:
 - refcount panels to improve lifetime handling
 - Powertip PH128800T004-ZZA01
 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00
 - Himax HX8279/HX8279-D DDIC
 - Visionox G2647FB105
 - Sitronix ST7571
 - ZOTAC rotation quirk
 
 vkms:
 - allow attaching more displays
 
 i915:
 - xe3lpd display updates
 - vrr refactor
 - intel_display struct conversions
 - xe2hpd memory type identification
 - add link rate/count to i915_display_info
 - cleanup VGA plane handling
 - refactor HDCP GSC
 - fix SLPC wait boosting reference counting
 - add 20ms delay to engine reset
 - fix fence release on early probe errors
 
 xe:
 - SRIOV updates
 - BMG PCI ID update
 - support separate firmware for each GT
 - SVM fix, prelim SVM multi-device work
 - export fan speed
 - temp disable d3cold on BMG
 - backup VRAM in PM notifier instead of suspend/freeze
 - update xe_ttm_access_memory to use GPU for non-visible access
 - fix guc_info debugfs for VFs
 - use copy_from_user instead of __copy_from_user
 - append PCIe gen5 limitations to xe_firmware document
 
 amdgpu:
 - DSC cleanup
 - DC Scaling updates
 - Fused I2C-over-AUX updates
 - DMUB updates
 - Use drm_file_err in amdgpu
 - Enforce isolation updates
 - Use new dma_fence helpers
 - USERQ fixes
 - Documentation updates
 - SR-IOV updates
 - RAS updates
 - PSP 12 cleanups
 - GC 9.5 updates
 - SMU 13.x updates
 - VCN / JPEG SR-IOV updates
 
 amdkfd:
 - Update error messages for SDMA
 - Userptr updates
 - XNACK fixes
 
 radeon:
 - CIK doorbell cleanup
 
 nouveau:
 - add support for NVIDIA r570 GSP firmware
 - enable Hopper/Blackwell support
 
 nova-core:
 - fix task list
 - register definition infrastructure
 - move firmware into own rust module
 - register auxiliary device for nova-drm
 
 nova-drm:
 - initial driver skeleton
 
 msm:
 - GPU:
   - ACD (adaptive clock distribution) for X1-85
   - drop fictional address_space_size
   - improve GMU HFI response time out robustness
   - fix crash when throttling during boot
 - DPU:
   - use single CTL path for flushing on DPU 5.x+
   - improve SSPP allocation code for better sharing
   - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
   - Added SAR2130P support
   - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
 - DP:
   - switch to new audio helpers
   - better LTTPR handling
 - DSI:
   - Added support for SA8775P
   - Added SAR2130P support
 - HDMI:
   - Switched to use new helpers for ACR data
   - Fixed old standing issue of HPD not working in some cases
 
 amdxdna:
 - add dma-buf support
 - allow empty command submits
 
 renesas:
 - add dma-buf support
 - add zpos, alpha, blend support
 
 panthor:
 - fail properly for NO_MMAP bos
 - add SET_LABEL ioctl
 - debugfs BO dumping support
 
 imagination:
 - update DT bindings
 - support TI AM68 GPU
 
 hibmc:
 - improve interrupt handling and HPD support
 
 virtio:
 - add panic handler support
 
 rockchip:
 - add RK3588 support
 - add DP AUX bus panel support
 
 ivpu:
 - add heartbeat based hangcheck
 
 mediatek:
 - prepares support for MT8195/99 HDMIv2/DDCv2
 
 anx7625:
 - improve HPD
 
 tegra:
 - speed up firmware loading
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmg2aVAACgkQDHTzWXnE
 hr6DjhAApr2fZjugU3EmpsARdcIWgEd+X65R97ef7RlUGqBKm2joSwZGOhH0oBsG
 9WyO92Qzu6XMe8OibKqY4D2hir9UPz5v+uEWe3q9CzZGbNyAwyVRjVkaKpnI9upv
 1dmHFI7HgPu6qbz6RfPIfgALBLXvVXMaQ4+ZgN/cLtZFa+OLAV5ByqWsRPPXZFb0
 F/pQGQ4ursglfA+LH3SVPfnTN53lu93IlM5/Os9OQQGj+44w94zQ6DCm7CY1AugH
 n+RM/0Yv7WaoF1ByeOtq4FcrmLRrd+ozsvITbRZqhOx7zS/mhP8LRzAwgKWOYzSh
 puKunyQiSdHR7FSqSi8uyY3YumcLWNa/17LMKoTf+KqweJbKGE7RVBuFBn6WUdPb
 AYHZrSB4USAeyahdrrsU+q7ltu5urs5ckpbXsRurMiaUz/BLim1PIm3N5FDLPY7B
 PD1n1FcMUv3CmJT5Y+aNIQgmf1/dETESRTSAgSoOo3gNp6jdRCYqSuWIBsppibWT
 26+tyz0/FGhE50QviHzg0Sv+jd/g93fN6snNlV8wNFMviq3bC69Toa+y3qJ5e7UC
 /42R7nCWdkCZJfr6E67rOaahe9TDV/LXLqPErwptOkdK8sMchaIgF+deybgTtTi/
 zGRBfjLvb5ocYBmPbeGX4mtXNRpyZ3o9I0QUyGUO4zMwFXmFwn0=
 =jpVr
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaD7zuQAKCRAnX84Zoj2+
 dv21AX4qAXMoS1eQQOzx5/MN0LhibwHO8lq0HgyhKKCMZTUvFP91hvuB6qKGzxEU
 +RJmN5cBgPGNuXwr9zLe5A/Lv1LWgfSj1DaAlauYvduFh1xyLOLuo0H3xfTsKrcl
 Onjxi5QVsg==
 =bMa5
 -----END PGP SIGNATURE-----

Merge drm-next-2025-05-28 into drm-misc-next

Christian needs a recent drm-next branch to merge fence patches.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-03 15:07:39 +02:00
Pierre-Eric Pelloux-Prayer
2956554823 drm/sched: Store the drm client_id in drm_sched_fence
This will be used in a later commit to trace the drm client_id in
some of the gpu_scheduler trace events.

This requires changing all the users of drm_sched_job_init to
add an extra parameter.

The newly added drm_client_id field in the drm_sched_fence is a bit
of a duplicate of the owner one. One suggestion I received was to
merge those 2 fields - this can't be done right now as amdgpu uses
some special values (AMDGPU_FENCE_OWNER_*) that can't really be
translated into a client id. Christian is working on getting rid of
those; when it's done we should be able to squash owner/drm_client_id
together.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250526125505.2360-3-pierre-eric.pelloux-prayer@amd.com
2025-05-28 16:15:58 +02:00
Alex Deucher
2e0454b730 drm/amdgpu: adjust enforce_isolation handling
Switch from a bool to an enum and allow more options
for enforce isolation.  There are now 3 modes of operation:
- Disabled (0)
- Enabled (serialization and cleaner shader) (1)
- Enabled in legacy mode (no serialization or cleaner shader) (2)
This provides better flexibility for more use cases.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:15 -04:00
Srinivasan Shanmugam
dba1a6cfc3 drm/amdgpu: Enforce isolation as part of the job
This patch adds a new parameter 'enforce_isolation' to the amdgpu_job
structure. This parameter is used to determine whether shader isolation
should be enforced for a job. The enforce_isolation parameter is then
stored in the amdgpu_job structure and used when flushing the VM.

The enforce_isolation field of the amdgpu_job structure is set directly
after the job is allocated

This change allows more fine-grained control over shader isolation,
making it possible to enforce isolation on a per-job basis rather than
globally. This can be useful in scenarios where only certain jobs
require isolation.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-20 22:06:43 -04:00
Christian König
f88e295e90 drm/amdgpu: add VM generation token
Instead of using the VRAM lost counter add a 64bit token which indicates
if a context or job is still valid to use.

Should the VRAM be lost or the page tables need re-creation the token will
change indicating that userspace needs to act and re-create the contexts
and re-submit the work.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
ac9287055f drm/amdgpu: add gfx shadow CS IOCTL support
Add support for submitting the shadow update packet
when submitting an IB.  Needed for MCBP on GFX11.

v2: update API for CSA (Alex)
v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
    Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
    amdgpu_cs_pass1()
    Only initialize shadow on first use
    (Alex)
v4: Pass parameters rather than job to new ring callback (Alex)
v5: squash in change to call SET_Q_PREEMPTION_MODE/COND_EXEC
    before RELEASE_MEM to complete the UMDs use of the shadow (Alex)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-24 18:16:19 -04:00
Christian König
5f3c40e9e2 drm/amdgpu: cleanup SPM support a bit
This should probably not access job->vm and also emit the SPM switch
under the conditional execute.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14 09:48:32 -05:00
Christian König
56b0989e29 drm/amdgpu: fix GDS/GWS/OA switch handling
Bas pointed out that this isn't working as expected and could cause
crashes. Fix the handling by storing the marker that a switch is needed
inside the job instead.

Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14 09:48:32 -05:00
Christian König
1728baa7e4 drm/amdgpu: use scheduler dependencies for CS
Entirely remove the sync obj in the job.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-11-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König
1b2d5eda5a drm/amdgpu: move explicit sync check into the CS
This moves the memory allocation out of the critical code path.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-8-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König
f7d66fb2ea drm/amdgpu: cleanup scheduler job initialization v2
Init the DRM scheduler base class while allocating the job.

This makes the whole handling much more cleaner.

v2: fix coding style

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-7-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König
68ce8b2422 drm/amdgpu: add gang submit backend v2
Allows submitting jobs as gang which needs to run on multiple
engines at the same time.

Basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as gang are never re-submitted in case of a GPU reset since this
won't work and will just deadlock the hardware immediately again.

v2: fix logic inversion, improve documentation, fix rcu

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:40:32 -04:00
Christian König
c2b08e7a6d drm/amdgpu: move entity selection and job init earlier during CS
Initialize the entity for the CS and scheduler job much earlier.

v2: fix job initialisation order and use correct scheduler instance

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:18:23 -04:00
Christian König
736ec9fadd drm/amdgpu: move setting the job resources
Move setting the job resources into amdgpu_job.c

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13 14:33:01 -04:00
Andrey Grodzovsky
f6a3f66063 drm/amdgpu: Get rid of amdgpu_job->external_hw_fence
This is a follow-up cleanup to [1]. See bellow refcount balancing
for calling amdgpu_job_submit_direct after this cleanup as far
as I calculated.

amdgpu_fence_emit
	dma_fence_init 1
	dma_fence_get(fence) 2
	rcu_assign_pointer(*ptr, dma_fence_get(fence) 3

---> amdgpu_job_submit_direct completes before fence signaled
			amdgpu_sa_bo_free
				(*sa_bo)->fence = dma_fence_get(fence) 4

			amdgpu_job_free
				dma_fence_put 3

			amdgpu_vcn_enc_get_destroy_msg
				*fence = dma_fence_get(f) 4
				dma_fence_put(f); 3

			amdgpu_vcn_enc_ring_test_ib
				dma_fence_put(fence) 2

			amdgpu_fence_process
				dma_fence_put 1

			amdgpu_sa_bo_remove_locked
				dma_fence_put 0

---> amdgpu_job_submit_direct completes after fence signaled
			amdgpu_fence_process
				dma_fence_put 2

			amdgpu_job_free
				dma_fence_put 1

			amdgpu_vcn_enc_get_destroy_msg
				*fence = dma_fence_get(f) 2
				dma_fence_put(f); 1

			amdgpu_vcn_enc_ring_test_ib
				dma_fence_put(fence) 0

[1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:37:25 -04:00
Christian König
6103b2f24e drm/amdgpu: properly embed the IBs into the job
We now have standard macros for that.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Christian König
a190f8dc4a drm/amdgpu: header cleanup
No function change, just move a bunch of definitions from amdgpu.h into
separate header files.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Jack Zhang
c530b02f39 drm/amd/amdgpu embed hw_fence into amdgpu_job
Why: Previously hw fence is alloced separately with job.
It caused historical lifetime issues and corner cases.
The ideal situation is to take fence to manage both job
and fence's lifetime, and simplify the design of gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:16:58 -04:00
Luben Tuikov
0bb5d5b03f drm/amdgpu: Move to a per-IB secure flag (TMZ)
Move from a per-CS secure flag (TMZ) to a per-IB
secure flag.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Huang Rui
cb5fae143d drm/amdgpu: job is secure iff CS is secure (v5)
Mark a job as secure, if and only if the command
submission flag has the secure flag set.

v2: fix the null job pointer while in vmid 0
submission.
v3: Context --> Command submission.
v4: filling cs parser with cs->in.flags
v5: move the job secure flag setting out of amdgpu_cs_submit()

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
xinhui pan
c8e42d5785 drm/amdgpu: implement more ib pools (v2)
We have three ib pools, they are normal, VM, direct pools.

Any jobs which schedule IBs without dependence on gpu scheduler should
use DIRECT pool.

Any jobs schedule direct VM update IBs should use VM pool.

Any other jobs use NORMAL pool.

v2: squash in coding style fix

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Christian König
971fe55545 drm/amdgpu: drop amdgpu_job.owner
Entirely unused.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:35:10 -05:00
Andrey Grodzovsky
7c6e68c777 drm/amdgpu: Avoid HW GPU reset for RAS.
Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt  will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:41:05 -05:00
Jack Xiao
d8780dc71d drm/amdgpu: add ib preemption status in amdgpu_job (v2)
Add ib preemption status in amdgpu_job, so that ring level function
can detect preemption and program for resuming it.

v2: squash in fix to restore job->preamble_status back to status value (Jack)

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:57:40 -05:00
Rex Zhu
34955e038a drm/amdgpu: Modify the argument of emit_ib interface
use the point of struct amdgpu_job as the function
argument instand of vmid, so the other members of
struct amdgpu_job can be visit in emit_ib function.

v2: add a wrapper for getting the VMID
    add the job before the ib on the parameter list.
v3: refine the wrapper name

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:50 -05:00
Christian König
a1917b73d8 drm/amdgpu: remove job->adev (v2)
We can get that from the ring.

v2: squash in "drm/amdgpu: always initialize job->base.sched" (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-17 14:18:28 -05:00
Christian König
ee913fd9e1 drm/amdgpu: add amdgpu_job_submit_direct helper
Make sure that we properly initialize at least the sched member.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16 16:11:53 -05:00
Christian König
3320b8d2ac drm/amdgpu: remove job->ring
We can easily get that from the scheduler.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16 16:11:53 -05:00
Christian König
0e28b10ff1 drm/amdgpu: remove ring parameter from amdgpu_job_submit
We know the ring through the entity anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16 16:11:52 -05:00
Christian König
eb3961a574 drm/amdgpu: remove fence context from the job
Can be obtained directly from the fence as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16 16:11:52 -05:00
Christian König
050d9d43a7 drm/amdgpu: cleanup job header
Move job related defines, structure and function declarations to
amdgpu_job.h

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16 16:11:51 -05:00