mirror_ubuntu-kernels

mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git synced 2025-11-29 09:14:10 +00:00

Author	SHA1	Message	Date
Prike Liang	d61e1d1d52	drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume In the S2idle suspend/resume phase gfxoff remains functional, so some IP blocks are likely to be reinitialized at gfxoff entry, which results in a failure to program GC registers. Therefore, disallow gfxoff until the AMDGPU IP blocks are completely reinitialized. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.15.x	2022-10-26 17:48:43 -04:00
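The commit above keeps gfxoff blocked for the whole s2idle resume. A minimal sketch of one way to express that, assuming a simple "disallow request" counter; every name here (`struct gfx_state`, `gfxoff_disallow()`, `gfxoff_allow()`, `s2idle_resume()`) is hypothetical and does not reproduce the amdgpu implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: gfxoff may only be entered while nobody holds a
 * "disallow" request. Names are illustrative, not amdgpu's. */
struct gfx_state {
	unsigned int disallow_count; /* outstanding disallow requests */
	bool gfxoff_enabled;         /* whether the GFX block may power off */
};

static void gfxoff_disallow(struct gfx_state *g)
{
	g->disallow_count++;
	g->gfxoff_enabled = false; /* block gfxoff entry immediately */
}

static void gfxoff_allow(struct gfx_state *g)
{
	assert(g->disallow_count > 0);
	if (--g->disallow_count == 0)
		g->gfxoff_enabled = true; /* last holder released */
}

/* Resume path: keep gfxoff blocked until every IP block is reinitialized,
 * so GC registers are never programmed while the block may power off. */
static void s2idle_resume(struct gfx_state *g, int nr_ip_blocks,
			  int *reinit_done)
{
	gfxoff_disallow(g);
	for (int i = 0; i < nr_ip_blocks; i++) {
		assert(!g->gfxoff_enabled); /* invariant during reinit */
		reinit_done[i] = 1;         /* stand-in for per-IP resume work */
	}
	gfxoff_allow(g);
}
```

A counter rather than a single flag lets several independent paths block gfxoff at once; it is only re-enabled when the last one releases it.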
Joaquín Ignacio Aramendía	809734c110	drm/amd/display: Revert logic for plane modifiers This file was split in commit `5d945cbcd4` ("drm/amd/display: Create a file dedicated to planes") and the logic in dm_plane_format_mod_supported() function got changed by a switch logic. That change broke drm_plane modifiers setting on series 5000 APUs (tested on OXP mini AMD 5800U and HP Dev One 5850U PRO) leading to Gamescope not working as reported on GitHub[1]. To reproduce the issue, enter a TTY and run: $ gamescope -- vkcube With said commit applied it will abort. This one restores the old logic, fixing the issue that affects Gamescope. [1] https://github.com/Plagman/gamescope/issues/624 Cc: <stable@vger.kernel.org> # 6.0.x Signed-off-by: Joaquín Ignacio Aramendía <samsagax@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:46:29 -04:00
Jesse Zhang	969758bbf5	drm/amdkfd: correct the cache info for gfx1036 correct the cache information for gfx1036 Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-10-24 14:44:03 -04:00
Prike Liang	9656db1b93	drm/amdkfd: update gfx1037 Lx cache setting Update the gfx1037 L1/L2 cache setting. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-10-24 14:44:03 -04:00
YuBiao Wang	e105b6212f	drm/amdgpu: skip mes self test for gc 11.0.3 in recover Temporarily disable mes self test for gc 11.0.3 during gpu_recovery. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
David Francis	68bc147363	drm/amd: Add IMU fw version to fw version queries IMU is a new firmware for GFX11. There are four means by which firmware version can be queried from the driver: device attributes, vf2pf, debugfs, and the AMDGPU_INFO_FW_VERSION option in the amdgpu info ioctl. Add IMU as an option for those four methods. V2: Added debugfs Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
Alvin Lee	abe4d9f03f	drm/amd/display: Don't return false if no stream pipe_ctx[i] exists even if the pipe is not in use. If the pipe is not in use it will always have a null stream, so don't return false in this case. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
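The fix above turns a NULL stream on an unused pipe from a failure into a skip. A minimal sketch of that loop shape, assuming simplified, hypothetical types (`struct stream` with a placeholder `ok` flag, `struct pipe_ctx`, `all_active_streams_ok()`); the real DC check and structures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the DC structures. */
struct stream { bool ok; };                      /* placeholder condition */
struct pipe_ctx { const struct stream *stream; };

/* Returns true when every pipe that is in use has an acceptable stream.
 * A pipe slot exists even when unused, and an unused pipe always has a
 * NULL stream, so it is skipped instead of failing the whole check. */
static bool all_active_streams_ok(const struct pipe_ctx *pipes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!pipes[i].stream)
			continue; /* unused pipe: not an error */
		if (!pipes[i].stream->ok)
			return false;
	}
	return true;
}
```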
Rodrigo Siqueira	ca08a1725d	drm/amd/display: Remove wrong pipe control lock When using a device based on DCN32/321, we have an issue where a second 4k@60Hz display does not light up, and the system becomes unresponsive for a few minutes. In the debug process, it was possible to see a hang in the function dcn20_post_unlock_program_front_end in this part: for (j = 0; j < TIMEOUT_FOR_PIPE_ENABLE_MS*1000 && hubp->funcs->hubp_is_flip_pending(hubp); j++) mdelay(1); } The hubp_is_flip_pending always returns positive for waiting pending flips which is a symptom of pipe hang. Additionally, the dmesg log shows this message after a few minutes: BUG: soft lockup - CPU#4 stuck for 26s! ... [ +0.000003] dcn20_post_unlock_program_front_end+0x112/0x340 [amdgpu] [ +0.000171] dc_commit_state_no_check+0x63d/0xbf0 [amdgpu] [ +0.000155] ? dc_validate_global_state+0x358/0x3d0 [amdgpu] [ +0.000154] dc_commit_state+0xe2/0xf0 [amdgpu] This confirmed the hypothesis that we had a pipe hanging somewhere. Next, after checking the ftrace entries, we have the below weird sequence: [..] 2) | dcn10_lock_all_pipes [amdgpu]() { 2) 0.120 us | optc1_is_tg_enabled [amdgpu](); 2) | dcn20_pipe_control_lock [amdgpu]() { 2) | dc_dmub_srv_clear_inbox0_ack [amdgpu]() { 2) 0.121 us | amdgpu_dm_dmub_reg_write [amdgpu](); 2) 0.551 us | } 2) | dc_dmub_srv_send_inbox0_cmd [amdgpu]() { 2) 0.110 us | amdgpu_dm_dmub_reg_write [amdgpu](); 2) 0.511 us | } 2) | dc_dmub_srv_wait_for_inbox0_ack [amdgpu]() { 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); 2) 0.110 us | amdgpu_dm_dmub_reg_read [amdgpu](); [..] We are not expected to read from dmub register so many times and for so long. 
From the trace log, it was possible to identify that the function dcn20_pipe_control_lock was triggering the dmub operation when it was unnecessary and causing the hang issue. This commit drops the unnecessary dmub code and, consequently, fixes the second display not lighting up the issue. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
Kenneth Feng	08841950db	drm/amd/pm: allow gfxoff on gc_11_0_3 allow gfxoff on gc_11_0_3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
Rafael Mendonca	90bfee142a	drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr() If the number of pages from the userptr BO differs from the SG BO then the allocated memory for the SG table doesn't get freed before returning -EINVAL, which may lead to a memory leak in some error paths. Fix this by checking the number of pages before allocating memory for the SG table. Fixes: `264fb4d332` ("drm/amdgpu: Add multi-GPU DMA mapping helpers") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-24 14:44:03 -04:00
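The leak fix above is an ordering change: validate the page counts before allocating, so the `-EINVAL` path owns nothing that must be freed. A minimal userspace sketch of that ordering; `struct sg_table_sketch`, `dmamap_userptr_sketch()` and `sg_free_sketch()` are hypothetical stand-ins, not the kernel's `sg_table` API or the body of `kfd_mem_dmamap_userptr()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SG table; not the kernel's sg_table. */
struct sg_table_sketch {
	unsigned int nents;
	void **entries;
};

/* Returns 0 and fills *out on success, -EINVAL on a page-count mismatch.
 * The mismatch check runs BEFORE any allocation, so the error path has
 * nothing to free. */
static int dmamap_userptr_sketch(unsigned int userptr_pages,
				 unsigned int sg_bo_pages,
				 struct sg_table_sketch **out)
{
	struct sg_table_sketch *sg;

	if (userptr_pages != sg_bo_pages)
		return -EINVAL; /* validate first, allocate second */

	sg = malloc(sizeof(*sg));
	if (!sg)
		return -ENOMEM;
	sg->nents = userptr_pages;
	sg->entries = calloc(userptr_pages ? userptr_pages : 1, sizeof(void *));
	if (!sg->entries) {
		free(sg); /* undo the partial allocation */
		return -ENOMEM;
	}
	*out = sg;
	return 0;
}

static void sg_free_sketch(struct sg_table_sketch *sg)
{
	free(sg->entries);
	free(sg);
}
```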
Lijo Lazar	d2c4c1569a	drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x MMHUB 2.1.x versions don't have ATCL2. Remove accesses to ATCL2 registers. Since they are non-existing registers, read access will cause a 'Completer Abort' and gets reported when AER is enabled with the below patch. Tagging with the patch so that this is backported along with it. v2: squash in uninitialized warning fix (Nathan Chancellor) Fixes: `8795e182b0` ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-10-24 14:44:03 -04:00
Yiqing Yao	226dcfad34	drm/amdgpu: Adjust MES polling timeout for sriov [why] MES response time in sriov may be longer than default value due to reset or init in other VF. A timeout value specific to sriov is needed. [how] When in sriov, adjust the timeout value to calculated worst case scenario. Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-21 16:13:26 -04:00
Kenneth Feng	09aef0258a	drm/amd/pm: update driver-if header for smu_v13_0_10 update driver-if header for smu_v13_0_10 and merge with smu_v13_0_0 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-21 16:12:59 -04:00
Chengming Gui	79610d3041	drm/amdgpu: fix pstate setting issue [WHY] 0, original pstate X 1, ctx_A_create -> ctx_A->stable_pstate = X 2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD) 3, ctx_B_create -> ctx_B->stable_pstate = Y 4, ctx_A_destroy -> restore pstate to X 5, ctx_B_destroy -> restore pstate to Y Above sequence will cause final pstate is wrong (Y), should be original X. [HOW] When ctx_B create, if ctx_A touched pstate setting (not auto, stable_pstate_ctx != NULL), set ctx_B->stable_pstate the same value as ctx_A saved, if stable_pstate_ctx == NULL, fetch current pstate to fill ctx_B->stable_pstate. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2022-10-21 16:12:09 -04:00
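The [WHY]/[HOW] sequence above can be replayed with a small model. This is a simplified sketch under stated assumptions: the original pstate X is taken to be auto, there is no locking and no `-EBUSY` handling, and all names (`ctx_create()`, `ctx_set_pstate()`, `ctx_destroy()`, `current_pstate`) are hypothetical rather than the amdgpu context code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the pstate bookkeeping; the names do
 * not match the real amdgpu code and locking is omitted. */
enum pstate { PSTATE_AUTO, PSTATE_STANDARD, PSTATE_PEAK };

struct ctx {
	enum pstate stable_pstate; /* value to restore when the ctx dies */
};

static enum pstate current_pstate = PSTATE_AUTO; /* original pstate X */
static struct ctx *stable_pstate_ctx; /* ctx that forced a non-auto pstate */

static void ctx_create(struct ctx *c)
{
	/* The fix: if another ctx has touched the pstate, inherit its saved
	 * value instead of snapshotting the already-modified current one. */
	if (stable_pstate_ctx)
		c->stable_pstate = stable_pstate_ctx->stable_pstate;
	else
		c->stable_pstate = current_pstate;
}

static void ctx_set_pstate(struct ctx *c, enum pstate p)
{
	current_pstate = p;
	stable_pstate_ctx = (p == PSTATE_AUTO) ? NULL : c;
}

static void ctx_destroy(struct ctx *c)
{
	ctx_set_pstate(c, c->stable_pstate); /* restore the saved value */
}
```

Replaying steps 1 to 5 (A create, A set PEAK, B create, A destroy, B destroy) leaves `current_pstate` at the original auto value, because B saved X rather than Y.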
Dave Airlie	cbc543c59e	Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v6.1-rc2: - Fix a buffer overflow in format_helper_test. - Set DDC pointer in drmm_connector_init. - Compiler fixes for panfrost. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c4d05683-8ebe-93b8-d24c-d1d2c68f12c4@linux.intel.com	2022-10-21 09:56:14 +10:00
Alex Deucher	50b0e4d4da	drm/amdgpu: fix sdma doorbell init ordering on APUs Commit `8795e182b0` ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") uncovered a bug in amdgpu that required a reordering of the driver init sequence to avoid accessing a special register on the GPU before it was properly set up leading to an PCI AER error. This reordering uncovered a different hw programming ordering dependency in some APUs where the SDMA doorbells need to be programmed before the GFX doorbells. To fix this, move the SDMA doorbell programming back into the soc15 common code, but use the actual doorbell range values directly rather than the values stored in the ring structure since those will not be initialized at this point. This is a partial revert, but with the doorbell assignment fixed so the proper doorbell index is set before it's used. Fixes: `e3163bc8ff` ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: skhan@linuxfoundation.org Cc: stable@vger.kernel.org	2022-10-20 09:35:51 -04:00
Thomas Zimmermann	1aca5ce036	Merge drm/drm-fixes into drm-misc-fixes Backmerging to get v6.1-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2022-10-20 09:09:00 +02:00
Christian König	01f2cf5384	drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates Make sure that we always have a CPU round trip to let the submission code correctly decide if a TLB flush is necessary or not. Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-2-christian.koenig@amd.com	2022-10-19 12:45:00 +02:00
Arunpravin Paneer Selvam	8273b40486	drm/amdgpu: Fix for BO move issue A user reported a bug on CAPE VERDE system where uvd_v3_1 IP component failed to initialize as there is an issue with BO move code from one memory to other. In function amdgpu_mem_visible() called by amdgpu_bo_move(), when there are no blocks to compare or if we have a single block then break the loop. Fixes: `312b4dc11d` ("drm/amdgpu: Fix VRAM BO swap issue") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:14:07 -04:00
YuBiao Wang	2abe92c7ad	drm/amdgpu: dequeue mes scheduler during fini [Why] If mes is not dequeued during fini, mes will be in an uncleaned state during reload, then mes couldn't receive some commands which leads to reload failure. [How] Perform MES dequeue via MMIO after all the unmap jobs are done by mes and before kiq fini. v2: Move the dequeue operation inside kiq_hw_fini. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:13:59 -04:00
Kenneth Feng	5ce4726a13	drm/amd/pm: enable thermal alert on smu_v13_0_10 enable thermal alert on smu_v13_0_10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:13:34 -04:00
Yifan Zha	97a3d6090f	drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11 [Why] L1 blocks most of GC registers accessing by MMIO. [How] Use RLCG interface to program GC registers under SRIOV VF in full access time. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:13:24 -04:00
Nathan Chancellor	e688ba3e27	drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp(): [ 12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794) Clang's kernel Control Flow Integrity (kCFI) makes sure that all indirect call targets have a type that exactly matches the function pointer prototype. In this case, hqd_destroy()'s third parameter, reset_type, should have a type of 'uint32_t' but every implementation of this callback has a third parameter type of 'enum kfd_preempt_type'. Update the function pointer prototype to match reality so that there is no more CFI violation. Link: https://github.com/ClangBuiltLinux/linux/issues/1738 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:13:12 -04:00
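The CFI fix above is purely a prototype change: the function-pointer type must name the same parameter types as its implementations. A simplified sketch of the corrected shape; the real `hqd_destroy` callback takes more parameters, and `struct kfd2kgd_calls_sketch` and `hqd_destroy_sketch()` are hypothetical. As the CFI log in the commit shows, kCFI compares an expected type hash at the call site, and that check treats `enum kfd_preempt_type` and `uint32_t` as different types even though they have the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical sketch; the real callback has more parameters. */
enum kfd_preempt_type {
	KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
	KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
};

struct kfd2kgd_calls_sketch {
	/* The prototype names the enum, matching every implementation.
	 * Declaring 'uint32_t reset_type' here instead is the mismatch
	 * that kCFI rejects at the indirect call. */
	int (*hqd_destroy)(void *mqd, enum kfd_preempt_type reset_type,
			   unsigned int timeout);
};

static int hqd_destroy_sketch(void *mqd, enum kfd_preempt_type reset_type,
			      unsigned int timeout)
{
	(void)mqd;
	(void)timeout;
	/* Placeholder behaviour: report whether a reset was requested. */
	return reset_type == KFD_PREEMPT_TYPE_WAVEFRONT_RESET ? 1 : 0;
}

static const struct kfd2kgd_calls_sketch calls = {
	.hqd_destroy = hqd_destroy_sketch, /* types match exactly */
};
```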
Guenter Roeck	8a70b2d89e	drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o Building 32-bit images may fail with the following error. drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c: In function ‘dml32_UseMinimumDCFCLK’: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1: error: the frame size of 1096 bytes is larger than 1024 bytes This is seen when building i386:allmodconfig with any of the following compilers. gcc (Debian 12.2.0-3) 12.2.0 gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY because in that case CONFIG_FRAME_WARN is already set to 2048 even for 32-bit builds. dml32_UseMinimumDCFCLK() was introduced with commit `dda4fb85e4` ("drm/amd/display: DML changes for DCN32/321"). It declares a large number of local variables. Increase the frame size for the affected file to 2048, similar to other files in the same directory, to enable 32-bit build tests with affected compilers. Fixes: `dda4fb85e4` ("drm/amd/display: DML changes for DCN32/321") Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Reported-by: Łukasz Bartosik <ukaszb@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:13:03 -04:00
Tim Huang	31c261a7ff	drm/amd/pm: add SMU IP v13.0.4 IF version define to V7 The pmfw has changed the driver interface version, so keep same with the fw. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:51 -04:00
Tim Huang	853fdb4916	drm/amd/pm: update SMU IP v13.0.4 driver interface version Update the SMU driver interface version to V7. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:44 -04:00
ZhenGuo Yin	5fa993737b	drm/amd/pm: Init pm_attr_list when dpm is disabled [Why] In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't be initialized. There will be a NULL pointer call trace after removing the dpm check condition in amdgpu_pm_sysfs_fini. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu] Call Trace: <TASK> amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu] amdgpu_device_fini_hw+0xdf/0x290 [amdgpu] [How] List pm_attr_list should be initialized when dpm is disabled. Fixes: `a6ad27cec5` ("drm/amd/pm: Remove redundant check condition") Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:36 -04:00
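The NULL dereference above comes from walking a list head that was never initialized. A userspace sketch of the fix's idea, initializing the head unconditionally before the dpm-disabled early return. The list type is a hypothetical stand-in modelled loosely on the kernel's circular `list_head`, and `pm_sysfs_init()` and `pm_sysfs_fini_count()` are hypothetical names, not the amdgpu functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list in the style of the kernel's list_head; a
 * hypothetical userspace stand-in, not <linux/list.h>. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *h) { h->next = h->prev = h; }

struct pm_sketch {
	struct list_node attr_list; /* must be valid before any fini walk */
	bool dpm_enabled;
};

static void pm_sysfs_init(struct pm_sketch *pm)
{
	/* Initialize the list unconditionally, before the dpm-disabled
	 * early return, so the fini path can always walk it. */
	list_init(&pm->attr_list);
	if (!pm->dpm_enabled)
		return;
	/* ... attribute creation would populate attr_list here ... */
}

/* Counts the entries fini would remove. On an initialized empty list the
 * loop body never runs; on a zeroed head, next would be NULL and the walk
 * would dereference it. */
static size_t pm_sysfs_fini_count(const struct pm_sketch *pm)
{
	size_t n = 0;
	for (const struct list_node *it = pm->attr_list.next;
	     it != &pm->attr_list; it = it->next)
		n++;
	return n;
}
```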
Evan Quan	3059cd8c5f	drm/amd/pm: disable cstate feature for gpu reset scenario Suggested by PMFW team and same as what did for gfxoff feature. This can address some Mode1Reset failures observed on SMU13.0.0. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:20 -04:00
Evan Quan	ba2f09960e	drm/amd/pm: fulfill SMU13.0.7 cstate control interface Fulfill the functionality for cstate control. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:14 -04:00
Evan Quan	528c0e66e0	drm/amd/pm: fulfill SMU13.0.0 cstate control interface Fulfill the functionality for cstate control. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:12:01 -04:00
YiPeng Chai	3bd026c3e3	drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported V2: Add sriov vf ras support in amdgpu_ras_asic_supported. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:11:48 -04:00
YiPeng Chai	001ebcf5b9	drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10 V1: Enable ras support for CHIP_IP_DISCOVERY asic type. V2: 1. Change commit comment. 2. Enable ras support for mp0 v13_0_0 and v13_0_10. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:11:37 -04:00
YiPeng Chai	7cd3f6c3ac	drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3 Enable gmc soft reset on gmc_v11_0_3. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:48 -04:00
Likun Gao	b7a76a2914	drm/amdgpu: skip mes self test for gc 11.0.3 Temporarily disable mes self test for gc 11.0.3. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:41 -04:00
Kenneth Feng	f700486cd1	drm/amd/pm: skip loading pptable from driver on secure board for smu_v13_0_10 skip loading pptable from driver on secure board since it's loaded from psp. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Guan Yu <Guan.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:31 -04:00
Kenneth Feng	657e07221c	drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10 enable gfx clock gating features on smu_v13_0_10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Jack Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:23 -04:00
Kenneth Feng	4c7f9a3c15	drm/amd/pm: remove the pptable id override on smu_v13_0_10 remove the pptable id override on smu_v13_0_10, and the id is fetched from vbios now. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:17 -04:00
Kenneth Feng	4d72a4e4fb	drm/amd/pm: temporarily disable thermal alert on smu_v13_0_10 temporarily disable thermal alert on smu_v13_0_10 due to kfd test fail. will enable it again after confirming the thermal hardware setting. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:09:10 -04:00
Asher Song	4545ae2ed3	drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly" This reverts commit `16fb4dca95`. Unfortunately, that commit causes fan monitors can't be read and written properly. Fixes: `16fb4dca95` ("drm/amdgpu: getting fan speed pwm for vega10 properly") Signed-off-by: Asher Song <Asher.Song@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:08:49 -04:00
Victor Zhao	a31e62873f	drm/amdgpu: Refactor mode2 reset logic for v11.0.7 - refactor mode2 on v11.0.7 to align with aldebaran - comment out using mode2 reset as default for now, will introduce another controller to replace previous reset_level_mask v2: squash in unused variable removal (Alex) Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:08:40 -04:00
Victor Zhao	a340847b02	Revert "drm/amdgpu: let mode2 reset fallback to default when failure" This reverts commit `dac6b80818`. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: `dac6b80818` ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:08:33 -04:00
Victor Zhao	afbaa15501	Revert "drm/amdgpu: add debugfs amdgpu_reset_level" This reverts commit `5bd8d53f6f`. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: `5bd8d53f6f` ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:08:25 -04:00
Danijel Slivka	65f8682b9a	drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT which indicates that VF MMIO write access is not allowed in sriov runtime Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:07:58 -04:00
Linus Torvalds	5e714bf171	Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - fix a race which causes page refcounting errors in ZONE_DEVICE pages (Alistair Popple) - fix userfaultfd test harness instability (Peter Xu) - various other patches in MM, mainly fixes * tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits) highmem: fix kmap_to_page() for kmap_local_page() addresses mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page mm/selftest: uffd: explain the write missing fault check mm/hugetlb: use hugetlb_pte_stable in migration race check mm/hugetlb: fix race condition of uffd missing/minor handling zram: always expose rw_page LoongArch: update local TLB if PTE entry exists mm: use update_mmu_tlb() on the second thread kasan: fix array-bounds warnings in tests hmm-tests: add test for migrate_device_range() nouveau/dmem: evict device private memory during release nouveau/dmem: refactor nouveau_dmem_fault_copy_one() mm/migrate_device.c: add migrate_device_range() mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page() mm/memremap.c: take a pgmap reference on page allocation mm: free device private pages have zero refcount mm/memory.c: fix race when faulting a device private page mm/damon: use damon_sz_region() in appropriate place mm/damon: move sz_damon_region to damon_sz_region lib/test_meminit: add checks for the allocation functions ...	2022-10-14 12:28:43 -07:00
Nathan Chancellor	2130b87b22	drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n After commit `8799c0be89` ("drm/amd/display: Fix vblank refcount in vrr transition"), a build with CONFIG_DEBUG_FS=n is broken due to a misplaced brace, along the lines of: In file included from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_trace.h:39, from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:41: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: At top level: ./include/drm/drm_atomic.h:864:9: error: expected identifier or ‘(’ before ‘for’ 864 | for ((__i) = 0; \ | ^~~ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8317:9: note: in expansion of macro ‘for_each_new_crtc_in_state’ 8317 | for_each_new_crtc_in_state(state, crtc, new_crtc_state, j) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Move the brace within the #ifdef so that the file can be built with or without CONFIG_DEBUG_FS. Fixes: `8799c0be89` ("drm/amd/display: Fix vblank refcount in vrr transition") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-10-14 11:13:19 -07:00
Linus Torvalds	9c9155a350	Merge tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm Pull more drm updates from Dave Airlie: "Round of fixes for the merge window stuff, bunch of amdgpu and i915 changes, this should have the gcc11 warning fix, amongst other changes. 
amdgpu: - DC mutex fix - DC SubVP fixes - DCN 3.2.x fixes - DCN 3.1.x fixes - SDMA 6.x fixes - Enable DPIA for 3.1.4 - VRR fixes - VRAM BO swapping fix - Revert dirty fb helper change - SR-IOV suspend/resume fixes - Work around GCC array bounds check fail warning - UMC 8.10 fixes - Misc fixes and cleanups i915: - Round to closest in g4x+ HDMI clock readout - Update MOCS table for EHL - Fix PSR_IMR/IIR field handling - Fix watermark calculations for gen12+/DG2 modifiers - Reject excessive dotclocks early - Fix revocation of non-persistent contexts - Handle migration for dpt - Fix display problems after resume - Allow control over the flags when migrating - Consider DG2_RC_CCS_CC when migrating buffers" * tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm: (110 commits) drm/amd/display: Add HUBP surface flip interrupt handler drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers drm/i915: allow control over the flags when migrating drm/amd/display: Simplify bool conversion drm/amd/display: fix transfer function passed to build_coefficients() drm/amd/display: add a license to cursor_reg_cache.h drm/amd/display: make virtual_disable_link_output static drm/amd/display: fix indentation in dc.c drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static drm/amd/display: fix build error on arm64 drm/amd/display: 3.2.207 drm/amd/display: Clean some DCN32 macros drm/amdgpu: Add poison mode query for umc v8_10_0 drm/amdgpu: Update umc v8_10_0 headers drm/amdgpu: fix coding style issue for mca notifier drm/amdgpu: define convert_error_address for umc v8.7 drm/amdgpu: define RAS convert_error_address API drm/amdgpu: remove check for CE in RAS error address query drm/i915: Fix display problems after resume drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2] ...	2022-10-13 21:56:34 -07:00
Alistair Popple	ef23345089	mm: free device private pages have zero refcount Since `27674ef6c7` ("mm: remove the extra ZONE_DEVICE struct page refcount") device private pages have no longer had an extra reference count when the page is in use. However before handing them back to the owning device driver we add an extra reference count such that free pages have a reference count of one. This makes it difficult to tell if a page is free or not because both free and in use pages will have a non-zero refcount. Instead we should return pages to the driver's page allocator with a zero reference count. Kernel code can then safely use kernel functions such as get_page_unless_zero(). Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-12 18:51:49 -07:00
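Why a zero refcount for free pages matters can be shown with a small user-space model. This is a sketch, not kernel code: `struct fake_page` and `fake_get_page_unless_zero()` are hypothetical stand-ins that only mimic the "take a reference unless the count is zero" idea behind the kernel's get_page_unless_zero(). Under the old convention (free pages at refcount 1) the check wrongly succeeds on a free page; under the new one (free pages at 0) it correctly refuses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct fake_page {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is non-zero, in the spirit of
 * get_page_unless_zero(). Returns true if a reference was taken.
 */
bool fake_get_page_unless_zero(struct fake_page *page)
{
    int old = atomic_load(&page->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false; /* count was zero: the page is free, leave it alone */
}

/* Compare the two conventions for marking a page as free. */
void demo_refcount_conventions(void)
{
    struct fake_page free_new = { .refcount = 0 }; /* new: free == 0 */
    struct fake_page free_old = { .refcount = 1 }; /* old: free == 1 */

    assert(!fake_get_page_unless_zero(&free_new)); /* correctly refused */
    assert(fake_get_page_unless_zero(&free_old));  /* wrongly succeeds */
}
```

The compare-and-swap loop matters because a plain "load, test, increment" sequence could race with a concurrent drop to zero.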
Alistair Popple	16ce101db8	mm/memory.c: fix race when faulting a device private page Patch series "Fix several device private page reference counting issues", v2 This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed. During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory. This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those so any help there would be appreciated. The changes mimic what is done for both Nouveau and hmm-tests though so I doubt they will cause problems. This patch (of 8): When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages(). Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed.
Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not. [mpe@ellerman.id.au: fix build] Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Lyude Paul <lyude@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-10-12 18:51:49 -07:00
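The two halves of this fix (pin the page under the page-table lock, then let migration tolerate the one extra reference held by the faulting page) can be sketched as a simplified user-space model. Everything here is hypothetical: `struct model_page`, `model_ptl`, `model_fault_get_page()`, `model_can_migrate()` and the `base_refs` baseline are illustrative names, not the kernel's migrate_vma or do_swap_page code, and a pthread mutex stands in for the ptl.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page model; not kernel code. */
struct model_page {
    int refcount; /* protected by model_ptl */
    bool freed;
};

/* Stand-in for the page-table lock (ptl). */
static pthread_mutex_t model_ptl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Fault path: take a reference while holding the lock, so a concurrent
 * migration cannot free the page before migrate_to_ram() runs.
 * Returns false if the page was already freed.
 */
bool model_fault_get_page(struct model_page *page)
{
    bool ok = false;

    pthread_mutex_lock(&model_ptl);
    if (!page->freed && page->refcount > 0) {
        page->refcount++;
        ok = true;
    }
    pthread_mutex_unlock(&model_ptl);
    return ok;
}

/*
 * Migration check: the refcount must match what is expected. The faulting
 * page carries one extra, legitimate reference, so it is allowed for only
 * when page == fault_page. base_refs is a hypothetical baseline count.
 */
bool model_can_migrate(const struct model_page *page,
                       const struct model_page *fault_page, int base_refs)
{
    int expected = base_refs + (page == fault_page ? 1 : 0);

    return page->refcount == expected;
}

void model_demo(void)
{
    struct model_page page = { .refcount = 1, .freed = false };

    assert(model_fault_get_page(&page));        /* pinned: refcount now 2 */
    assert(model_can_migrate(&page, &page, 1)); /* extra ref is expected */
    assert(!model_can_migrate(&page, NULL, 1)); /* unexpected otherwise */
}
```

Passing the faulting page into the check, rather than simply raising the expected count for every page, keeps an unexpected extra reference on any other page a reason to skip migration.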
Aurabindo Pillai	0811b9e453	drm/amd/display: Add HUBP surface flip interrupt handler Add the hubp surface flip handler. This fixes some flip timeout issues. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x	2022-10-12 11:27:41 -04:00
Yang Li	1f768ba469	drm/amd/display: Simplify bool conversion The result of 'pwr_status == 0' is Boolean, and the question mark expression is redundant. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2354 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-11 11:58:42 -04:00
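The simplification described in this commit can be illustrated in isolation. The function names below are hypothetical; only `pwr_status` comes from the commit message. In C, `pwr_status == 0` already evaluates to 0 or 1 (type `int`), which converts to `bool` unchanged, so wrapping it in `? true : false` adds nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: the conditional operator is redundant (hypothetical name). */
bool status_is_zero_verbose(unsigned int pwr_status)
{
    return pwr_status == 0 ? true : false;
}

/* After: the comparison already yields a boolean result. */
bool status_is_zero(unsigned int pwr_status)
{
    return pwr_status == 0;
}

/* Both forms agree for every input checked here. */
void check_equivalent(void)
{
    for (unsigned int s = 0; s < 4; s++)
        assert(status_is_zero_verbose(s) == status_is_zero(s));
}
```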

23801 Commits