mirror_ubuntu-kernels

mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git synced 2025-12-08 17:57:15 +00:00

Author	SHA1	Message	Date
Mario Limonciello	b7cdb41e7d	drm/amd: Delay removal of the firmware framebuffer Removing the firmware framebuffer from the driver means that even if the driver doesn't support the IP blocks in a GPU it will no longer be functional after the driver fails to initialize. This change will ensure that unsupported IP blocks at least cause the driver to work with the EFI framebuffer. Cc: stable@vger.kernel.org Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-09 16:26:48 -05:00
Luben Tuikov	08e60fac1d	drm/amdgpu: Fix potential NULL dereference Fix potential NULL dereference, in the case when "man", the resource manager might be NULL, when/if we print debug information. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Fixes: `7554886daa` ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-09 16:24:26 -05:00
Aaron Liu	f6e856e72c	drm/amdgpu: update ta_secureDisplay_if.h to v27.00.00.08 1. Rename securedisplay_cmd to ta_securedisplay_cmd. 2. Rename ta_securedisplay_max_phy to ta_securedisplay_phy_ID. 3. update securedisplay_cmd to ta_securedisplay_cmd Signed-off-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alan Liu <HaoPing.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-05 11:43:46 -05:00
Hawking Zhang	4a1c9a444b	drm/amdgpu: allow query error counters for specific IP block amdgpu_ras_block_late_init will be invoked in IP specific ras_late_init call as a common helper for all the IP blocks. However, when amdgpu_ras_block_late_init call amdgpu_ras_query_error_count to query ras error counters, amdgpu_ras_query_error_count queries all the IP blocks that support ras query interface. This results to wrong error counters cached in software copies when there are ras errors detected at time zero or warm reset procedure. i.e., in sdma_ras_late_init phase, it counts on sdma/mmhub errors, while, in mmhub_ras_late_init phase, it still counts on sdma/mmhub errors. The change updates amdgpu_ras_query_error_count interface to allow query specific ip error counter. It introduces a new input parameter: query_info. if query_info is NULL, it means query all the IP blocks, otherwise, only query the ip block specified by query_info. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-05 11:42:14 -05:00
xurui	90f56611fc	drm/amdgpu: Retry DDC probing on DVI on failure if we got an HPD interrupt HPD signals on DVI ports can be fired off before the pins required for DDC probing actually make contact, due to the pins for HPD making contact first. This results in a HPD signal being asserted but DDC probing failing, resulting in hotplugging occasionally failing. Rescheduling the hotplug work for a second when we run into an HPD signal with a failing DDC probe usually gives enough time for the rest of the connector's pins to make contact, and fixes this issue. Signed-off-by: xurui <xurui@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-05 11:42:09 -05:00
Michel Dänzer	4243c84aa0	Revert "drm/amd/display: Enable Freesync Video Mode by default" This reverts commit `de05abe6b9`. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:57:58 -05:00
Stanley.Yang	c26cd99918	drm/amdgpu: remove enable ras cmd call trace [Why] [ 41.285804] RIP: 0010:amdgpu_ras_feature_enable+0x15c/0x310 [amdgpu] [ 41.285945] Code: 48 89 c1 48 c7 c2 b9 f2 88 c1 48 c7 c0 c0 f2 88 c1 49 8b 3c 24 48 0f 44 d0 48 c7 c6 98 33 80 c1 e8 5f 52 75 d9 e9 fa fe ff ff <0f> 0b e9 66 ff ff ff 48 8b 3d 86 8c 0f da ba 00 04 00 00 be c0 0d [ 41.285946] RSP: 0018:ffffbccdc72efc90 EFLAGS: 00010246 [ 41.285948] RAX: 0000000000000004 RBX: ffff931897406980 RCX: 0000000000000002 [ 41.285949] RDX: 0000000000000dc0 RSI: 0000000000000002 RDI: ffff931500042b00 [ 41.285950] RBP: ffffbccdc72efcc0 R08: 0000000000000002 R09: ffff931885b87000 [ 41.285951] R10: 0000000000ffff10 R11: 0000000000000001 R12: ffff931893e20000 [ 41.285952] R13: 0000000000000001 R14: ffff931885b87000 R15: 0000000000000000 [ 41.285953] FS: 0000000000000000(0000) GS:ffff931c6f200000(0000) knlGS:0000000000000000 [ 41.285954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.285955] CR2: 000055dd6f532008 CR3: 000000061b010006 CR4: 00000000003706e0 [ 41.285956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.285957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.285958] Call Trace: [ 41.285959] <TASK> [ 41.285963] ? gfx_v11_0_early_init+0x250/0x250 [amdgpu] [ 41.286117] gfx_v11_0_late_init+0x8c/0xb0 [amdgpu] [ 41.286271] amdgpu_device_ip_late_init+0x8d/0x3c0 [amdgpu] [ 41.286401] amdgpu_device_init.cold+0x1677/0x1fda [amdgpu] [ 41.286616] ? pci_bus_read_config_word+0x4a/0x70 [ 41.286621] ? do_pci_enable_device+0xdb/0x110 [ 41.286625] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ 41.286762] amdgpu_pci_probe+0x18d/0x3a0 [amdgpu] [ 41.286898] local_pci_probe+0x4b/0x90 [ 41.286901] work_for_cpu_fn+0x1a/0x30 [ 41.286903] process_one_work+0x22b/0x3d0 [ 41.286905] worker_thread+0x223/0x420 [ 41.286907] ? process_one_work+0x3d0/0x3d0 [ 41.286908] kthread+0x12a/0x150 [ 41.286911] ? set_kthread_struct+0x50/0x50 [ 41.286913] ret_from_fork+0x22/0x30 [How] For specific asic, only mem ecc is enabled, sram ecc is not enabled, but it still need to send ras enable cmd to gfx block to support poison mode, so add check posion mode. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:57:58 -05:00
Stanley.Yang	6ecc01a9ce	drm/amdgpu: correct umc poison mode set value For GFX 11.0.3, Due to security policy, there is no way to check UcFatalEn field of UMCCH0_0_GeccCtrl to identify UMC poison mode. This is workaround force set umc poison mode as 1 for GFX 11.0.3 Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:57:58 -05:00
Christian König	0b04ea391c	drm/amdgpu: allow zero as vram limit This allows testing the driver without any VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:50:22 -05:00
Christian König	da2f992091	drm/amdgpu: cleanup visible vram size handling Centralize the limit handling and validation in one place instead of spreading that around in different hw generations. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:50:14 -05:00
Christian König	7ccfd79fdd	drm/amdgpu: rename vram_scratch into mem_scratch Rename vram_scratch into mem_scratch and allow allocating it into GTT as well. The only problem with that is that we won't have a default page for the system aperture any more. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:50:03 -05:00
Christian König	58ab2c08d7	drm/amdgpu: use VRAM\|GTT for a bunch of kernel allocations Technically all of those can use GTT as well, no need to force things into VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:49:54 -05:00
Saleemkhan Jamadar	9c705b96d2	drm/amdgpu: enable VCN DPG for GC IP v11.0.4 Enable VCN Dynamic Power Gating control for GC IP v11.0.4. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:49:43 -05:00
Likun Gao	360cd08196	drm/amdgpu: adjust the sequence to check soft reset 1.Drop soft reset check when do should recover gpu check. (As it will skip gpu reset operation if some ip is hang but not support soft reset) 2.Check soft reset status before do soft reset when pre asic reset. a. If check soft reset return true, it means: some ip is hang and it also support soft reset, will try soft reset first. b. If check soft reset return false, it means: I. All the ip are not hang, will skip gpu reset. II. Some ip is hang but not support soft reset, will skip soft reset and retry with full reset later. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:49:26 -05:00
Tim Huang	e3bf7e96d0	drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 MES is part of gfxoff and MES suspend and resume are skipped for S0i3. But the mes_self_test call path is still in the amdgpu_device_ip_late_init. it's should also be skipped for s0ix as no hardware re-initialization happened. Besides, mes_self_test will free the BO that triggers a lot of warning messages while in the suspend state. [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu] [ 81.679435] Call Trace: [ 81.679726] <TASK> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu] [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu] [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu] [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu] [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu] [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu] [ 81.684818] pci_pm_resume+0x5c/0xa0 [ 81.685247] ? pci_pm_thaw+0x90/0x90 [ 81.685658] dpm_run_callback+0x4e/0x160 [ 81.686110] device_resume+0xad/0x210 [ 81.686529] async_resume+0x1e/0x40 [ 81.686931] async_run_entry_fn+0x33/0x120 [ 81.687405] process_one_work+0x21d/0x3f0 [ 81.687869] worker_thread+0x4a/0x3c0 [ 81.688293] ? process_one_work+0x3f0/0x3f0 [ 81.688777] kthread+0xff/0x130 [ 81.689157] ? kthread_complete_and_exit+0x20/0x20 [ 81.689707] ret_from_fork+0x22/0x30 [ 81.690118] </TASK> [ 81.690380] ---[ end trace 0000000000000000 ]--- v2: make the comment clean and use adev->in_s0ix instead of adev->suspend Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:47:58 -05:00
Alex Deucher	5620a1889e	drm/amdgpu: skip MES for S0ix as well since it's part of GFX It's also part of gfxoff. Cc: stable@vger.kernel.org # 6.0, 6.1 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:47:07 -05:00
Alex Deucher	b7665165ae	Revert "drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix" This reverts commit `e5d59cfa33`. This is no longer needed since we no longer suspend SDMA during S0ix. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:47:04 -05:00
Alex Deucher	5804463a65	Revert "drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume" This reverts commit `f543d28687`. This is no longer needed since we no longer touch SDMA 5.x for s0i3. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:58 -05:00
Alex Deucher	2a7798ea73	drm/amdgpu: for S0ix, skip SDMA 5.x+ suspend/resume SDMA 5.x is part of the GFX block so it's controlled via GFXOFF. Skip suspend as it should be handled the same as GFX. v2: drop SDMA 4.x. That requires special handling. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:55 -05:00
Alex Deucher	47198eb721	drm/amdgpu: don't mess with SDMA clock or powergating in S0ix It's handled by GFXOFF for SDMA 5.x and SMU saves the state on SDMA 4.x. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:52 -05:00
Alex Deucher	735c706468	drm/amdgpu/gmc11: don't touch gfxhub registers during S0ix gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:49 -05:00
Alex Deucher	d5d29009b8	drm/amdgpu/gmc10: don't touch gfxhub registers during S0ix gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:46 -05:00
Alex Deucher	b93df61dda	drm/amdgpu/gmc9: don't touch gfxhub registers during S0ix gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:41 -05:00
Philip Yang	41d82649ca	drm/amdkfd: Fix double release compute pasid If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access. Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid. amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-20 12:46:20 -05:00
Candice Li	e4f665de41	drm/amdgpu: Add poison mode query for df v4_3 Add poison mode query support on df v4_3. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:19:30 -05:00
Evan Quan	cf5cf34983	drm/amdgpu: bump minor version number for DEV_INFO and SENSOR IOCTLs update Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock. And update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock. User applications can better utilize these IOCTLs to get needed informations. Increase the minor version number to indicate that the new flags are available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/278 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:19:07 -05:00
Evan Quan	88347fa18b	drm/amdgpu: expose the minimum shader/memory clock frequency Otherwise, some UMD tools will treate them as 0 at default while actually they are not. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:19:01 -05:00
Evan Quan	5cfd978490	drm/amdgpu: expose peak profiling mode shader/memory clocks Expose those informations to UMD who need them as for standard profiling mode. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:54 -05:00
Tao Zhou	2dd9032beb	drm/amdgpu: define RAS query poison mode function 1. no need to query poison mode on SRIOV guest side, host can handle it. 2. define the function to simplify code. v2: rename amdgpu_ras_poison_mode_query to amdgpu_ras_query_poison_mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	3189501e6f	drm/amdgpu: update VCN/JPEG RAS setting Support VCN/JPEG RAS in both bare metal and SRIOV environment. v2: update commit description. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	248c9635b8	drm/amdgpu: skip RAS error injection in SRIOV Injection on guest is not allowed. v2: return directly in SRIOV environment. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	6a822b7ace	drm/amdgpu: add VCN poison consumption handler for SRIOV Inform host and let host handle consumption interrupt. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	e643823d62	drm/amdgpu: add RAS poison consumption handler for SRIOV Send message to PF if VF receives RAS poison consumption interrupt. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	ae844dd79f	drm/amdgpu: add RAS poison consumption handler for NV SRIOV Send handling request to host. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Tao Zhou	8ede944da6	drm/amdgpu: add RAS poison consumption handler for AI SRIOV Send message to host and host will handle it. v2: split the patch into two parts, one is for mxgpu ai and another one is for common poison consumption handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-15 12:18:19 -05:00
Christian König	4772222066	drm/amdgpu: revert "generally allow over-commit during BO allocation" This reverts commit `f9d00a4a8d`. This causes problem for KFD because when we overcommit we accidentially bind the BO to GTT for moving it into VRAM. We also need to make sure that this is done only as fallback after trying to evict first. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 16:48:00 -05:00
Luben Tuikov	3273f11675	drm/amdgpu: Remove unnecessary domain argument Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this function takes an "offset" argument which is the offset off of VRAM, and as such allocation always takes place in VRAM. Thus, the "domain" argument is unnecessary. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 16:48:00 -05:00
Luben Tuikov	7554886daa	drm/amdgpu: Fix size validation for non-exclusive domains (v4) Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man". v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 16:48:00 -05:00
Luben Tuikov	28afcb0ad5	drm/amdgpu: Check if fru_addr is not NULL (v2) Always check if fru_addr is not NULL. This commit also fixes a "smatch" warning. v2: Add a Fixes tag. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Fixes: `afbe5d1e4b` ("drm/amdgpu: Bug-fix: Reading I2C FRU data on newer ASICs") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 16:48:00 -05:00
Christian König	e44a0fe630	drm/amdgpu: rework reserved VMID handling Instead of reserving a VMID for a single process allow that many processes use the reserved ID. This allows for proper isolation between the processes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:33 -05:00
Christian König	053499f7b4	drm/amdgpu: stop waiting for the VM during unreserve This is completely pointless since the VMID always stays allocated until the VM is idle. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:32 -05:00
Christian König	5f3c40e9e2	drm/amdgpu: cleanup SPM support a bit This should probably not access job->vm and also emit the SPM switch under the conditional execute. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:32 -05:00
Christian König	56b0989e29	drm/amdgpu: fix GDS/GWS/OA switch handling Bas pointed out that this isn't working as expected and could cause crashes. Fix the handling by storing the marker that a switch is needed inside the job instead. Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:32 -05:00
Felix Kuehling	f95f51a4c3	drm/amdgpu: Add notifier lock for KFD userptrs Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Xiaogang Chen<Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:05 -05:00
Christian König	4d2ccd96ac	drm/amdgpu: WARN when freeing kernel memory during suspend When buffers are freed during suspend there is no guarantee that they can be re-allocated during resume. The PSP subsystem seems to be quite buggy regarding this, so add a WARN_ON() to point out those bugs. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexdeucher@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:04 -05:00
Christian König	9c3db58bf8	drm/amdgpu: fixx NULL pointer deref in gmc_v9_0_get_vm_pte We not only need to make sure that we have a BO, but also that the BO has some backing store. Fixes: `d1a372af1c` ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-14 09:48:04 -05:00
Shikang Fan	47ea20762b	drm/amdgpu: Add an extra evict_resource call during device_suspend. - evict_resource is taking too long causing sriov full access mode timeout. So, add an extra evict_resource in the beginning as an early evict. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-12-13 17:10:32 -05:00
Alex Deucher	1d4624cd72	drm/amdgpu: handle polaris10/11 overlap asics (v2) Some special polaris 10 chips overlap with the polaris11 DID range. Handle this properly in the driver. v2: use local flags for other function calls. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-12-09 16:50:30 -05:00
Alex Deucher	81d0bcf990	drm/amdgpu: make display pinning more flexible (v2) Only apply the static threshold for Stoney and Carrizo. This hardware has certain requirements that don't allow mixing of GTT and VRAM. Newer asics do not have these requirements so we should be able to be more flexible with where buffers end up. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255 Acked-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-12-09 16:50:30 -05:00
Dave Airlie	66efff515a	Merge tag 'amd-drm-next-6.2-2022-12-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-12-07: amdgpu: - DSC fixes for DCN 2.1 - HDMI PCON fixes - PSR fixes - DC DML fixes - Properly throttle on BO allocation - GFX 11.0.4 fixes - MMHUB fix - Make some functions static Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221207232439.5908-1-alexander.deucher@amd.com	2022-12-09 12:08:33 +10:00

1 2 3 4 5 ...

11703 Commits