Commit Graph

31 Commits

Author SHA1 Message Date
Lijo Lazar
389d79a195 drm/amdgpu: Update supported modes for GC v9.5.0
For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in
NPS2 mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148)
Cc: stable@vger.kernel.org
2025-08-04 15:32:07 -04:00
Hawking Zhang
f268cef77e drm/amdgpu: Convert pre|post_partition_switch into common helpers
So they can be reused for future products

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 10:04:03 -04:00
Hawking Zhang
e0f14a2abf drm/amdgpu: Convert update_supported_modes into a common helper
So it can be used for future products

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 10:03:57 -04:00
Hawking Zhang
4dbc17b455 drm/amdgpu: Convert update_partition_sched_list into a common helper v3
The update_partition_sched_list function does not
need to remain as a soc specific callback. It can
be reused for future products.

v2: bypass the function if xcp_mgr is not available (Likun)

v3: Let caller check the availability of xcp_mgr (Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 10:03:41 -04:00
Hawking Zhang
bf587417ff drm/amdgpu: Convert select_sched into a common helper v3
The xcp select_sched function does not need to
remain as a soc specific callback. It can be reused
for future products

v2: bypass the function if xcp_mgr is not available (Likun)

v3: Let caller check the availability of xcp mgr (Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 10:03:32 -04:00
Likun Gao
37b791d667 drm/amdgpu: use common function to map ip for aqua_vanjaram
Transfer to use function amdgpu_ip_map_init to map ip
instance for aqua_vanjaram instead of operation on
different ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 10:03:25 -04:00
Lijo Lazar
04141c05f3 drm/amdgpu: Extend bus status check to more cases
In case of unexpected errors, check if device is alive on the bus.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18 12:19:21 -04:00
Lijo Lazar
cc473057bb drm/amdgpu: Allow NPS2-CPX combination for VFs
CPX partition mode is compatible with NPS2 on aquavanjaram VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:38:05 -04:00
Lijo Lazar
3aa37922c6 drm/amdgpu: Use compatible NPS mode info
Compatible NPS modes for a partition mode are exposed through xcp_config
interface. To determine if a compute partition mode is valid, check if
the current NPS mode is part of compatible NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16 13:37:50 -04:00
Lijo Lazar
ee97326fb9 drm/amdgpu: Add NPS2 to DPX compatible mode
Compute partition DPX is possible in NPS2 mode. Update the compatible
modes for DPX.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:32 -04:00
Tao Zhou
4d614ce8ff drm/amdgpu: add RAS CPER ring buffer
And initialize it, this is a pure software ring to store RAS CPER data.

v2: change ring size to 0x100000
v2: update the initialization of count_dw of cper ring, it's dword
variable
v3: skip VM inv eng for cper
v3: init/fini when aca enabled

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Amber Lin
5183e69090 drm/amdgpu: Remove extra checks for CPX
As far as the number of XCCs, the number of compute partitions, and the
number of memory partitions qualify, CPX is valid.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:49 -05:00
Lijo Lazar
990c4f5807 drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
For DPX mode, the number of memory partitions supported should be less
than or equal to 2.

Fixes: 1589c82a10 ("drm/amdgpu: Check memory ranges for valid xcp mode")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:34:06 -05:00
Lijo Lazar
e5ad71779d drm/amdgpu: Add compatible NPS mode info
Populate the compatible NPS modes also for providing partition
configuration details through sysfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:23 -05:00
Lijo Lazar
f501057aff drm/amdgpu: Add callback get xcp resource info
Add a callback interface to get the resource information of a partition
mode. Presently the information has number of resources and number of
entities sharing the resource.

Add the implementation for aquavanjaram SOCs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
1bc0b33915 drm/amd: Add helper to get partition config modes
Add helper to get supported/available partition config modes

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
42ac749d5b drm/amdgpu: Fix XCP instance mask calculation
Fix instance mask calculation for VCN IP. There are cases where VCN
instance could be shared across partitions. Fix here so that other
blocks don't need to check for any shared instances based on partition
mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Srinivasan Shanmugam
f846250b8a drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_4_3 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

The commit also includes a check for the type of the ring. If the type
of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the
`enforce_isolation` structure in the `gfx` structure of the
`amdgpu_device` is set to the `xcp_id` of the ring. This ensures that
the correct `xcp_id` is used when enforcing isolation on compute rings.
The `xcp_id` is an identifier for an XCP partition, and different rings
can be associated with different XCP partitions.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2024-08-20 22:08:02 -04:00
Lijo Lazar
d02ddefc7e drm/amdgpu: Initialize VF partition mode
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions.
Fetch the partition information during init and initialize partition
nodes. There is no support to switch partition mode in VF mode, hence
disable the same.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:28 -04:00
Rajneesh Bhardwaj
ef5c0f897e drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Jesse Zhang
1a00f2ac82 drm/amdgpu: Fix the warning division or modulo by zero
Checks the partition mode and returns an error for an invalid mode.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:47 -04:00
Lijo Lazar
8954c3fbe7 drm/amdgpu: Change AID detection logic
On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be
considered as a valid AID.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:47:19 -04:00
Lijo Lazar
91bc860116 drm/amdgpu: Fix VCN allocation in CPX partition
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:34 -04:00
Lijo Lazar
650f0487d6 drm/amdgpu: Read aquavanjaram USR register state
Add support to read state of USR links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar
13ac7c0e30 drm/amdgpu: Read aquavanjaram WAFL register state
Add support to read state of WAFL links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:36 -05:00
Lijo Lazar
92e508eaf3 drm/amdgpu: Read aquavanjaram XGMI register state
Add support to read state of XGMI links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Lijo Lazar
081a6eda2b drm/amdgpu: Read aquavanjaram PCIE register state
Add support to read aqua vanjaram PCIE register state

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Lijo Lazar
c45e38f217 drm/amdgpu: Restore partition mode after reset
On a full device reset, PSP FW gets unloaded. Hence restore the
partition mode by placing a new request.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:51 -04:00
Lijo Lazar
6cb209ed68 drm/amdgpu: Update ring scheduler info as needed
Not all rings have scheduler associated. Only update scheduler data for
rings with scheduler. It could result in out of bound access as total
rings are more than those associated with particular IPs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:39:29 -04:00
Guchun Chen
18cf073faa drm/amdgpu: use a macro to define no xcp partition case
~0 as no xcp partition is used in several places, so improve its
definition by a macro for code consistency.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:19:02 -04:00
Lijo Lazar
fe9aaddf90 drm/amdgpu: Rename aqua_vanjaram_reg_init.c
This contains SOC specific functions, rename to a more generic format
<soc>.c => aqua_vanjaram.c

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:38:27 -04:00