Commit Graph

53 Commits

Author SHA1 Message Date
Jay Cornwall
424648c383 drm/amdkfd: Fix instruction hazard in gfx12 trap handler
VALU instructions with SGPR source need wait states to avoid hazard
with SALU using different SGPR.

v2: Eliminate some hazards to reduce code explosion

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7e0459d453)
Cc: stable@vger.kernel.org # 6.12.x
2025-03-18 16:28:34 -04:00
Lancelot SIX
d584198a6f drm/amdkfd: Ensure consistent barrier state saved in gfx12 trap handler
It is possible for some waves in a workgroup to finish their save
sequence before the group leader has had time to capture the workgroup
barrier state.  When this happens, having those waves exit do impact the
barrier state.  As a consequence, the state captured by the group leader
is invalid, and is eventually incorrectly restored.

This patch proposes to have all waves in a workgroup wait for each other
at the end of their save sequence (just before calling s_endpgm_saved).

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
2025-02-12 19:47:15 -05:00
Jay Cornwall
1241b64d4b drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler
If user shader issues S_SETVSKIP then this state will persist when
executing the trap handler, causing vector instructions to be
skipped.

VSKIP state is already saved/restored through the MODE register.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-24 09:53:05 -05:00
Jay Cornwall
36a21f2686 drm/amdkfd: Sync trap handler binary with source
Source and binary have become mismatched during branch activity.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-24 09:52:24 -05:00
Lancelot SIX
5690011a70 drm/amdkfd: Handle save/restore of lds allocated in 1280B blocks
The gfx-9 trap handler is reading LDS allocation size in 256 bytes
granularity (from SQ_WAVE_LDS_ALLOC), but it using the assumption that
this value is always even (i.e. the LDS allocation is really done in
multiple of 512 bytes).  This was true so far, but gfx-950 allocates LDS
in chunks of 1280 bytes, making this assumption invalid.  This can cause
the trap handler to try to save / restore past the end of LDS, and past
the LDS allocated slot in the save are, overriding data from the
following wave.

This patch updates the trap handler to support LDS allocated in 1280
bytes blocks:
- During restore, copy from main memory directly to LDS in batch of 1280
  bytes.
- During save, continue to use 512 bytes blocks (we only have 2 VGPRs we
  can use to hold data), making sure to mask the upper half of the wave
  when handling when the LDS size is not a multiple of 512 bytes.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Co-authored-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Lancelot SIX
549120edfd drm/amdkfd: Adjust CWSR trap handler for gfx950
In gfx950, the SQ_WAVE_LDS_ALLOC.LDS_SIZE field is extended to bits 12
to 22.  The LDS_SIZE granularity remains unchanged (units of 64 dwords,
or 256 bytes).  This patch adjusts the CWSR trap handler to read the
full extent of LDS_SIZE.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Jay Cornwall
ec14eab37d drm/amdkfd: Extend gfx12 trap handler fix to gfx10/11
In commit fda812ebe3 ("drm/amdkfd: gfx12 context save/restore trap handler fixes")
the following fix was introduced but incorrectly restricted to gfx12.
The same issue and a corresponding fix apply to gfx10 and gfx11.

Do not overwrite TRAPSTS.{SAVECTX,HOST_TRAP} when restoring this
register. Both of these fields can assert while the wavefront is
running the trap handler.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:15 -04:00
Jay Cornwall
c5afb313e7 drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler
A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires VGPRs. In this case the trap handler should
terminate the wavefront.

Fixes intermittent VM faults under context switching load.

V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:05:57 -04:00
Jay Cornwall
fda812ebe3 drm/amdkfd: gfx12 context save/restore trap handler fixes
Fix LDS size interpretation: 512 bytes (>= gfx12) vs 256 (< gfx12).

Ensure STATE_PRIV.BARRIER_COMPLETE cannot change after reading or
before writing. Other waves in the threadgroup may cause this field
to assert if they complete the barrier.

Do not overwrite EXCP_FLAG_PRIV.{SAVE_CONTEXT,HOST_TRAP} when
restoring this register. Both of these fields can assert while the
wavefront is running the trap handler.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:40 -04:00
Jay Cornwall
c5e358913d drm/amdkfd: Replace deprecated gfx12 trap handler instructions
Newer assemblers reject S_WAITCNT. All instances of S_WAITCNT can be
replaced by S_WAITCNT 0 (< gfx12) or S_WAIT_IDLE (>= gfx12) since
there is no concurrency of different memory instruction classes.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Jay Cornwall
dec4f2d224 drm/amdkfd: Sync trap handler binary with source
Source and binary have become mismatched during branch activity.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Laurent Morichetti
cf338b5dfe drm/amdkfd: enable missed single-step workaround for gfx12
When trap_ctrl.trap_after_inst is set, it is possible for a wave to
enter the trap handler, after single-stepping an instruction and a
save_context is raised, with only save_context set in excp_flag_priv.

Because excp_flag_priv.trap_after_inst is not reliably set, we need to
use the missed single-step workaround for gfx12 as well.

Also add wave_start and wave_end as exceptions that should be handled
by the 2nd level trap handler.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Lancelot SIX
450abfe433 drm/amdkfd: save and restore barrier state for gfx12
Add support to save and restore the work group barrier state in gfx12
CWSR trap handler.

There is no support to directly restore the signal count of a barrier
state, so instead this patch repeatedly calls s_barrier_signal to
increment the signal count to the desired value.

In this patch, I have implemented the logic to restore the barrier at
the end of the block restoring the HWREGs.  This process needs to be
done by exactly 1 wave per work group.  To achieve this, the initial
value of s_restore_spi_init_hi (containing a FIRST_WAVE bit) needs to be
saved up until that point.  An alternative could be restore the barrier
earlier in the process (around when LDS is restored, as the same wave
does both).  Doing this would break the pattern that the restore
procedure follows the CWSR area layout.

Before restoring the barrier, this patch checks if the barrier was whose
state was saved has the "valid" bit set, even if I don't think this
barrier can be in an invalid state during context save.  I expect this
test to always be true.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
Jay Cornwall
f281003336 drm/amdkfd: Add gfx12 trap handler support
- HWREG changes since gfx11
- Save/restore barrier state
- get_wave_size is now reserved by assembler

v2: rebase (Alex)

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
Laurent Morichetti
45bbf800c5 drm/amdkfd: Use SQC when TCP would fail in gfx10.1 context save
Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is
raised. To work around this issue, use scalar stores instead of vector
stores when trapsts.xnack_error == 1.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-06 15:24:50 -05:00
Laurent Morichetti
804bf74b16 drm/amdkfd: pass debug exceptions to second-level trap handler
Call the 2nd level trap handler if the cwsr handler is entered with any
one of wave_start, wave_end, or trap_after_inst exceptions.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:05:33 -05:00
Jay Cornwall
7297ff96ea drm/amdkfd: Use S_ENDPGM_SAVED in trap handler
This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29 15:38:20 -05:00
Laurent Morichetti
f4fac4163c drm/amdkfd: Clear the VALU exception state in the trap handler
The trap handler could be entered with pending VALU exceptions, so
clear the exception state before issuing vector instructions.

Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:54 -05:00
Jay Cornwall
05c899eacc drm/amdkfd: Sign-extend TMA address in trap handler
SMEM instructions can reach addresses above 47 bits but require
bit 47 to be sign-extended through bits [63:48].

This allows the TMA to be relocated in a following patch.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07 17:14:06 -04:00
Jay Cornwall
631ddc3553 drm/amdkfd: Sync trap handler binaries with source
Some changes have been lost during rebases. Rebuild sources.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07 17:14:06 -04:00
Jay Cornwall
1d44ff3d7a drm/amdkfd: Trap handler changes for GC 9.4.3 v2
v1:
Check new exception bits in TRAPSTS register
Remove single step exception workaround, now part of
exception bits

v2:
GC 9.4.3 uses ttmp11 to store {1’b0, dispatch index [24:0],
wave_id_in_workgroup[5:0]}, so use ttmp13 instead of ttmp11 to
preserve ib_sts. (Laurent)

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31 11:18:55 -04:00
Jay Cornwall
6640f8e5ad drm/amdkfd: update GFX11 CWSR trap handler
With corresponding FW change fixes issue where triggering CWSR on a
workgroup with waves in s_barrier wouldn't lead to a back-off and
therefore cause a hang.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-11-02 17:16:25 -04:00
David Belanger
585a82618b drm/amdgpu: Enable SA software trap.
Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Jay Cornwall
6a8170383c drm/amdkfd: Add gfx11 trap handler
Based on gfx10 with following changes:

- GPR_ALLOC.VGPR_SIZE field moved (and size corrected in gfx10)
- s_sendmsg_rtn_b64 replaces some s_sendmsg/s_getreg
- Buffer instructions no longer have direct-to-LDS modifier

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26 14:56:33 -04:00
Eric Huang
5e613723f8 drm/amdkfd: port cwsr trap handler from dkms branch
Most of changes are for debugger feature, and it is
to simplify trap handler support for new asics in the
future.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26 14:56:32 -04:00
Laurent Morichetti
ad6cc94a6b drm/amdkfd: Fix saving the ACC vgprs for Aldebaran
get_num_acc_vgprs does not set status.scc if the number of acc vgprs
is 0, so use an and instruction to set the condition code.

The Aldebaran handler binary was not based on the latest version of
the sources, so this update to the binary is the minimal change only
adding two instructions to set the condition code.

A newer version of the handler should be generated and tested in
another commit.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 22:56:55 -04:00
Jay Cornwall
0ef6845c8c drm/amdkfd: Add aldebaran trap handler support
Similar to arcturus, but ARCH/ACC VGPRs may now be split unevenly.
A new field in SQ_WAVE_GPR_ALLOC tracks the boundary between the two
sets of VGPRs.

Squash below patches:

drm/amdkfd: Use preprocessor for IP-specific trap handler code
drm/amdkfd: Fix VGPR restore race in gfx8/gfx9 trap handler
drm/amdkfd: Remove duplicated code in gfx9 trap handler
drm/amdkfd: Separate ARCH/ACC VGPR restore in trap handler
drm/amdkfd: Reverse order of ARCH/ACC VGPR restore in trap handler

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-10 00:02:24 -05:00
Jay Cornwall
b60646a20c drm/amdkfd: Fix spurious debug exception on gfx10
s_barrier triggers a debug exception when issued with PRIV=1,
DEBUG_EN=1. This causes spurious notifications to rocm-gdb.

Clear MODE before issuing s_barrier and restore MODE afterwards
in the context restore handler.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:26:51 -04:00
Felix Kuehling
c342d7c579 Revert "drm/amdkfd: Unify gfx9/gfx10 context save area layouts"
This reverts commit 0a5baee415.

The change introduced a regression on some chips. Reverting until
a proper solution can be found.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:26:00 -04:00
Felix Kuehling
5218992251 Revert "drm/amdkfd: Fix spurious debug exception on gfx10"
This reverts commit ea368183ae.

Needed due to conflicts when reverting "drm/amdkfd: Unify gfx9/gfx10
context save area layouts".

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:24:51 -04:00
Jay Cornwall
ea368183ae drm/amdkfd: Fix spurious debug exception on gfx10
s_barrier triggers a debug exception when issued with PRIV=1,
DEBUG_EN=1. This causes spurious notifications to rocm-gdb.

Clear MODE before issuing s_barrier and restore MODE afterwards
in the context restore handler.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27 16:22:16 -04:00
Laurent Morichetti
0a5baee415 drm/amdkfd: Unify gfx9/gfx10 context save area layouts
Add some padding before the MODE register in the HWREGs block to
preserve the same layout as gfx9. This simplifies implementation of a
user-mode debugger.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27 16:20:47 -04:00
Jay Cornwall
3cefc7189c drm/amdkfd: Support debugger in Navi1x trap handler
- Preserve scalar GPRs ttmp[4:11] and ttmp13
- Add single step exception during context save workaround
- Remove incorrect PC adjustment during context save

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
d0f1a85366 drm/amdkfd: Support newer assemblers in gfx10 trap handler
The contents of macros are parsed by the assembler before conditions
have been tested. This causes assembly errors when using IP-specific
instructions in the IP-unified trap handler.

Add a preprocessing step to filter IP-specific code.

Also guard a Navi1x-specific instruction (no effect on Sienna_Cichlid).

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
80b6cfedd3 drm/amdkfd: Add Sienna_Cichlid trap handler support
- Replace SQC stores with TCP stores
- Synchronize with MSG_SAVEWAVE via lgkmcnt
- HW_REG_IB_STS is now read-only

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
c18cc2bb9e drm/amdkfd: Fix race in gfx10 context restore handler
Missing synchronization with VGPR restore leads to intermittent
VGPR trashing in the user shader.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-03 09:11:04 -05:00
Jay Cornwall
4b617e2b9e drm/amdkfd: Swap trap temporary registers in gfx10 trap handler
ttmp[4:5] hold information useful to the debugger. Use ttmp[14:15]
instead, aligning implementation with gfx9 trap handler.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 15:28:31 -05:00
Jay Cornwall
1faa3b8054 drm/amdkfd: Save/restore vcc on gfx10
VCC moved out of user SGPR allocation in gfx10. It's now stored
in SGPRs 106-107.

Also fixes incorrect SGPR read offsets.

Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:48:33 -05:00
Jay Cornwall
f9e346aba1 drm/amdkfd: Save/restore flat_scratch_lo/hi on gfx10
These moved from SGPRs in gfx9 to HWREG in gfx10.

Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:48:33 -05:00
Jay Cornwall
7ce55e0b6f drm/amdkfd: Fix gfx10 wave64 VGPR context restore
Copy/paste error, first 4 VGPRs are separated by 64 dwords (256 bytes).

Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:48:33 -05:00
Jay Cornwall
a36e896740 drm/amdkfd: Replace gfx10 trap handler with correct branch
Previously submitted code was taken from an incorrect branch and
was non-functional.

Cc: Oak Zeng <oak.zeng@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-By: Oak Zeng <oak.zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:22:11 -05:00
Jay Cornwall
7c2eaf5cdb drm/amdkfd: Fix lost single step exceptions in gfx9 trap handler
If the trap is entered due to MODE.DEBUG_EN=1 and SAVECTX is raised
concurrently the handler cannot identify the source of the exception.
This causes the debugger to lose single step exception notification
when a context save request arrives at the same time.

When MODE.DEBUG_EN=1 and STATUS.HALT=0 (exception not already handled)
jump to the second-level trap handler upon entering the trap. The
second-level trap will set STATUS.HALT=1 and return to the shader.
If SAVECTX was raised then control flow will return to the trap, which
will then handle the context save request.

Cc: Tony Tye <tony.tye@amd.com>
Cc: Laurent Morichetti <laurent.morichetti@amd.com>
Cc: Qingchuan Shi <qingchuan.shi@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:22:02 -05:00
Jay Cornwall
8c7a5d9e6f drm/amdkfd: Use SQC when TCP would fail in gfx9 context save.
When a wavefront raises TRAPSTS.XNACK_ERROR with STATUS.ALLOW_REPLAY=0
subsequent memory instructions have undefined behavior. In practice
SQC stores continue to work but TCP stores do not.

Context save is permitted to fail after XNACK error because the
wavefront will be halted and subsequently terminated. However the
debugger has an interest in retrieving the wavefront VGPR/LDS state.

Detect the out-of-spec case and use SQC stores during context save
in place of TCP stores.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:19:47 -05:00
Jay Cornwall
37f86a9b36 drm/amdkfd: Merge gfx9/arcturus trap handlers, add ACC VGPR save
ACC VGPRs are a secondary VGPR set of same size as the primary VGPRs.
Save them as a block immediately following VGPRs.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:06 -05:00
Oak Zeng
3baa24f0fc drm/amdkfd: Add arcturus CWSR trap handler
CWSR (compute wave save/restore) is used for
preempting compute queues.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:06 -05:00
Philip Cox
14328aa58c drm/amdkfd: Add navi10 support to amdkfd. (v3)
KFD (kernel fusion driver) is the kernel driver
for the compute backend for usermode compute
stack.

v2: squash in updates (Alex)
v3: squash in rebase fixes (Alex)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:59:24 -05:00
Jay Cornwall
fa722f0d98 drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]
ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values.
These values are more useful to the debugger than ttmp[14:15], which
carries dispatch_scratch_base*. There are too few registers to
preserve both.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:01 -05:00
Jay Cornwall
5883600901 drm/amdkfd: Fix gfx9 XNACK state save/restore
SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate
when saving in the high bits of TTMP1.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:01 -05:00
Jay Cornwall
157e586dc9 drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL
If instruction fetch fails the wave cannot be halted and returned to
the shader without raising MEM_VIOL again. Currently the wave is
terminated if this occurs, but this loses information about the cause
of the fault. The debugger would prefer the faulting wave state to be
context-saved.

Poll inside the trap handler until TRAPSTS.SAVECTX indicates context
save is ready. Exit the poll loop and complete the remainder of the
exception handler, then return to the shader. The next instruction
fetch will be from the trap handler and not the faulting PC. Context
save will then deschedule the wave and save its state.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:01 -05:00
Jay Cornwall
2db2f25959 drm/amdkfd: Fix gfx8 MEM_VIOL exception handler
When MEM_VIOL is asserted the context save handler rewinds the
program counter. This is incorrect for any source of the exception.
MEM_VIOL may be raised in normal operation by out-of-bounds access
to LDS or GDS and does not require special handling.

Remove PC adjustment when MEM_VIOL has been raised.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:01 -05:00