Commit Graph

22 Commits

Author SHA1 Message Date
Linus Torvalds
b08494a8f7 drm for 6.16-rc1
new drivers:
 - bring in the asahi uapi header standalone
 - nova-drm: stub driver
 
 rust dependencies (for nova-core):
 - auxiliary
   - bus abstractions
   - driver registration
   - sample driver
 - devres changes from driver-core
 - revocable changes
 
 core:
 - add Apple fourcc modifiers
 - add virtio capset definitions
 - extend EXPORT_SYNC_FILE for timeline syncobjs
 - convert to devm_platform_ioremap_resource
 - refactor shmem helper page pinning
 - DP powerup/down link helpers
 - remove disgusting turds
 - extended %p4cc in vsprintf.c to support fourcc prints
 - change vsprintf %p4cn to %p4chR, remove %p4cn
 - Add drm_file_err function
 - IN_FORMATS_ASYNC property
 - move sitronix from tiny to their own subdir
 
 rust:
 - add drm core infrastructure rust abstractions
   (device/driver, ioctl, file, gem)
 
 dma-buf:
 - adjust sg handling to not cache map on attach
 - allow setting dma-device for import
 - Add a helper to sort and deduplicate dma_fence arrays
 
 docs:
 - updated drm scheduler docs
 - fbdev todo update
 - fb rendering
 - actual brightness
 
 ttm:
 - fix delayed destroy resv object
 
 bridge:
 - add kunit tests
 - convert tc358775 to atomic
 - convert drivers to devm_drm_bridge_alloc
 - convert rk3066_hdmi to bridge driver
 
 scheduler:
 - add kunit tests
 
 panel:
 - refcount panels to improve lifetime handling
 - Powertip PH128800T004-ZZA01
 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00
 - Himax HX8279/HX8279-D DDIC
 - Visionox G2647FB105
 - Sitronix ST7571
 - ZOTAC rotation quirk
 
 vkms:
 - allow attaching more displays
 
 i915:
 - xe3lpd display updates
 - vrr refactor
 - intel_display struct conversions
 - xe2hpd memory type identification
 - add link rate/count to i915_display_info
 - cleanup VGA plane handling
 - refactor HDCP GSC
 - fix SLPC wait boosting reference counting
 - add 20ms delay to engine reset
 - fix fence release on early probe errors
 
 xe:
 - SRIOV updates
 - BMG PCI ID update
 - support separate firmware for each GT
 - SVM fix, prelim SVM multi-device work
 - export fan speed
 - temp disable d3cold on BMG
 - backup VRAM in PM notifier instead of suspend/freeze
 - update xe_ttm_access_memory to use GPU for non-visible access
 - fix guc_info debugfs for VFs
 - use copy_from_user instead of __copy_from_user
 - append PCIe gen5 limitations to xe_firmware document
 
 amdgpu:
 - DSC cleanup
 - DC Scaling updates
 - Fused I2C-over-AUX updates
 - DMUB updates
 - Use drm_file_err in amdgpu
 - Enforce isolation updates
 - Use new dma_fence helpers
 - USERQ fixes
 - Documentation updates
 - SR-IOV updates
 - RAS updates
 - PSP 12 cleanups
 - GC 9.5 updates
 - SMU 13.x updates
 - VCN / JPEG SR-IOV updates
 
 amdkfd:
 - Update error messages for SDMA
 - Userptr updates
 - XNACK fixes
 
 radeon:
 - CIK doorbell cleanup
 
 nouveau:
 - add support for NVIDIA r570 GSP firmware
 - enable Hopper/Blackwell support
 
 nova-core:
 - fix task list
 - register definition infrastructure
 - move firmware into own rust module
 - register auxiliary device for nova-drm
 
 nova-drm:
 - initial driver skeleton
 
 msm:
 - GPU:
   - ACD (adaptive clock distribution) for X1-85
   - drop fictional address_space_size
   - improve GMU HFI response time out robustness
   - fix crash when throttling during boot
 - DPU:
   - use single CTL path for flushing on DPU 5.x+
   - improve SSPP allocation code for better sharing
   - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
   - Added SAR2130P support
   - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
 - DP:
   - switch to new audio helpers
   - better LTTPR handling
 - DSI:
   - Added support for SA8775P
   - Added SAR2130P support
 - HDMI:
   - Switched to use new helpers for ACR data
   - Fixed old standing issue of HPD not working in some cases
 
 amdxdna:
 - add dma-buf support
 - allow empty command submits
 
 renesas:
 - add dma-buf support
 - add zpos, alpha, blend support
 
 panthor:
 - fail properly for NO_MMAP bos
 - add SET_LABEL ioctl
 - debugfs BO dumping support
 
 imagination:
 - update DT bindings
 - support TI AM68 GPU
 
 hibmc:
 - improve interrupt handling and HPD support
 
 virtio:
 - add panic handler support
 
 rockchip:
 - add RK3588 support
 - add DP AUX bus panel support
 
 ivpu:
 - add heartbeat based hangcheck
 
 mediatek:
 - prepares support for MT8195/99 HDMIv2/DDCv2
 
 anx7625:
 - improve HPD
 
 tegra:
 - speed up firmware loading
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmg2aVAACgkQDHTzWXnE
 hr6DjhAApr2fZjugU3EmpsARdcIWgEd+X65R97ef7RlUGqBKm2joSwZGOhH0oBsG
 9WyO92Qzu6XMe8OibKqY4D2hir9UPz5v+uEWe3q9CzZGbNyAwyVRjVkaKpnI9upv
 1dmHFI7HgPu6qbz6RfPIfgALBLXvVXMaQ4+ZgN/cLtZFa+OLAV5ByqWsRPPXZFb0
 F/pQGQ4ursglfA+LH3SVPfnTN53lu93IlM5/Os9OQQGj+44w94zQ6DCm7CY1AugH
 n+RM/0Yv7WaoF1ByeOtq4FcrmLRrd+ozsvITbRZqhOx7zS/mhP8LRzAwgKWOYzSh
 puKunyQiSdHR7FSqSi8uyY3YumcLWNa/17LMKoTf+KqweJbKGE7RVBuFBn6WUdPb
 AYHZrSB4USAeyahdrrsU+q7ltu5urs5ckpbXsRurMiaUz/BLim1PIm3N5FDLPY7B
 PD1n1FcMUv3CmJT5Y+aNIQgmf1/dETESRTSAgSoOo3gNp6jdRCYqSuWIBsppibWT
 26+tyz0/FGhE50QviHzg0Sv+jd/g93fN6snNlV8wNFMviq3bC69Toa+y3qJ5e7UC
 /42R7nCWdkCZJfr6E67rOaahe9TDV/LXLqPErwptOkdK8sMchaIgF+deybgTtTi/
 zGRBfjLvb5ocYBmPbeGX4mtXNRpyZ3o9I0QUyGUO4zMwFXmFwn0=
 =jpVr
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "As part of building up nova-core/nova-drm pieces we've brought in some
  rust abstractions through this tree, aux bus being the main one, with
  devres changes also in the driver-core tree. Along with the drm core
  abstractions and enough nova-core/nova-drm to use them. This is still
  all stub work under construction, to build the nova driver upstream.

  The other big NVIDIA related one is nouveau adds support for
  Hopper/Blackwell GPUs, this required a new GSP firmware update to
  570.144, and a bunch of rework in order to support multiple fw
  interfaces.

  There is also the introduction of an asahi uapi header file as a
  precursor to getting the real driver in later, but to unblock
  userspace mesa packages while the driver is trapped behind rust
  enablement.

  Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe,
  and msm being the main ones, and some changes to vsprintf.

  new drivers:
   - bring in the asahi uapi header standalone
   - nova-drm: stub driver

  rust dependencies (for nova-core):
   - auxiliary
       - bus abstractions
       - driver registration
       - sample driver
   - devres changes from driver-core
   - revocable changes

  core:
   - add Apple fourcc modifiers
   - add virtio capset definitions
   - extend EXPORT_SYNC_FILE for timeline syncobjs
   - convert to devm_platform_ioremap_resource
   - refactor shmem helper page pinning
   - DP powerup/down link helpers
   - extended %p4cc in vsprintf.c to support fourcc prints
   - change vsprintf %p4cn to %p4chR, remove %p4cn
   - Add drm_file_err function
   - IN_FORMATS_ASYNC property
   - move sitronix from tiny to their own subdir

  rust:
   - add drm core infrastructure rust abstractions
     (device/driver, ioctl, file, gem)

  dma-buf:
   - adjust sg handling to not cache map on attach
   - allow setting dma-device for import
   - Add a helper to sort and deduplicate dma_fence arrays

  docs:
   - updated drm scheduler docs
   - fbdev todo update
   - fb rendering
   - actual brightness

  ttm:
   - fix delayed destroy resv object

  bridge:
   - add kunit tests
   - convert tc358775 to atomic
   - convert drivers to devm_drm_bridge_alloc
   - convert rk3066_hdmi to bridge driver

  scheduler:
   - add kunit tests

  panel:
   - refcount panels to improve lifetime handling
   - Powertip PH128800T004-ZZA01
   - NLT NL13676BC25-03F, Tianma TM070JDHG34-00
   - Himax HX8279/HX8279-D DDIC
   - Visionox G2647FB105
   - Sitronix ST7571
   - ZOTAC rotation quirk

  vkms:
   - allow attaching more displays

  i915:
   - xe3lpd display updates
   - vrr refactor
   - intel_display struct conversions
   - xe2hpd memory type identification
   - add link rate/count to i915_display_info
   - cleanup VGA plane handling
   - refactor HDCP GSC
   - fix SLPC wait boosting reference counting
   - add 20ms delay to engine reset
   - fix fence release on early probe errors

  xe:
   - SRIOV updates
   - BMG PCI ID update
   - support separate firmware for each GT
   - SVM fix, prelim SVM multi-device work
   - export fan speed
   - temp disable d3cold on BMG
   - backup VRAM in PM notifier instead of suspend/freeze
   - update xe_ttm_access_memory to use GPU for non-visible access
   - fix guc_info debugfs for VFs
   - use copy_from_user instead of __copy_from_user
   - append PCIe gen5 limitations to xe_firmware document

  amdgpu:
   - DSC cleanup
   - DC Scaling updates
   - Fused I2C-over-AUX updates
   - DMUB updates
   - Use drm_file_err in amdgpu
   - Enforce isolation updates
   - Use new dma_fence helpers
   - USERQ fixes
   - Documentation updates
   - SR-IOV updates
   - RAS updates
   - PSP 12 cleanups
   - GC 9.5 updates
   - SMU 13.x updates
   - VCN / JPEG SR-IOV updates

  amdkfd:
   - Update error messages for SDMA
   - Userptr updates
   - XNACK fixes

  radeon:
   - CIK doorbell cleanup

  nouveau:
   - add support for NVIDIA r570 GSP firmware
   - enable Hopper/Blackwell support

  nova-core:
   - fix task list
   - register definition infrastructure
   - move firmware into own rust module
   - register auxiliary device for nova-drm

  nova-drm:
   - initial driver skeleton

  msm:
   - GPU:
       - ACD (adaptive clock distribution) for X1-85
       - drop fictional address_space_size
       - improve GMU HFI response time out robustness
       - fix crash when throttling during boot
   - DPU:
       - use single CTL path for flushing on DPU 5.x+
       - improve SSPP allocation code for better sharing
       - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
       - Added SAR2130P support
       - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
   - DP:
       - switch to new audio helpers
       - better LTTPR handling
   - DSI:
       - Added support for SA8775P
       - Added SAR2130P support
   - HDMI:
       - Switched to use new helpers for ACR data
       - Fixed old standing issue of HPD not working in some cases

  amdxdna:
   - add dma-buf support
   - allow empty command submits

  renesas:
   - add dma-buf support
   - add zpos, alpha, blend support

  panthor:
   - fail properly for NO_MMAP bos
   - add SET_LABEL ioctl
   - debugfs BO dumping support

  imagination:
   - update DT bindings
   - support TI AM68 GPU

  hibmc:
   - improve interrupt handling and HPD support

  virtio:
   - add panic handler support

  rockchip:
   - add RK3588 support
   - add DP AUX bus panel support

  ivpu:
   - add heartbeat based hangcheck

  mediatek:
   - prepares support for MT8195/99 HDMIv2/DDCv2

  anx7625:
   - improve HPD

  tegra:
   - speed up firmware loading

* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits)
  drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()
  drm/xe: Default auto_link_downgrade status to false
  drm/xe/guc: Make creation of SLPC debugfs files conditional
  drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()
  drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read
  drm/i915/ptl: Use everywhere the correct DDI port clock select mask
  drm/nouveau/kms: add support for GB20x
  drm/dp: add option to disable zero sized address only transactions.
  drm/nouveau: add support for GB20x
  drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle
  drm/nouveau: add support for GB10x
  drm/nouveau/gf100-: track chan progress with non-WFI semaphore release
  drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA
  drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos
  drm/nouveau: add support for GH100
  drm/nouveau: improve handling of 64-bit BARs
  drm/nouveau/gv100-: switch to volta semaphore methods
  drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES
  drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
  drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM
  ...
2025-05-28 09:46:39 -07:00
Umesh Nerlige Ramappa
66c8f7b435 drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value
For determining actual job execution time, save the current value of the
CTX_TIMESTAMP register rather than the value saved in LRC since the
current register value is the closest to the start time of the job.

v2: Define MI_STORE_REGISTER_MEM to fix compile error
v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas)

Fixes: 65921374c4 ("drm/xe: Emit ctx timestamp copy in ring ops")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 38b14233e5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14 09:03:29 -07:00
Kenneth Graunke
e775278cd7 drm/xe: Invalidate L3 read-only cachelines for geometry streams too
Historically, the Vertex Fetcher unit has not been an L3 client.  That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.

Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3.  However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.

To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe."  In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.

Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.

xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches).  Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.

Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb6055 ("drm/xe: enable lite restore")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 61672806b5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-07 13:16:07 -07:00
Kenneth Graunke
61672806b5
drm/xe: Invalidate L3 read-only cachelines for geometry streams too
Historically, the Vertex Fetcher unit has not been an L3 client.  That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.

Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3.  However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.

To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe."  In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.

Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.

xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches).  Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.

Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb6055 ("drm/xe: enable lite restore")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-31 12:18:41 -04:00
Michal Wajdeczko
b823f80bbd drm/xe: Add MI_MATH and ALU instruction definitions
The command streamer implements an Arithmetic Logic Unit (ALU)
which supports basic arithmetic and logical operations on two
64-bit operands. Access to this ALU is thru the MI_MATH command
and sixteen General Purpose Register (GPR) 64-bit registers,
which are used as temporary storage.

Bspec: 45737, 60236 # MI
Bspec: 45525, 60132 # ALU
Bspec: 45533, 60309 # GPR
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304162307.1866-1-michal.wajdeczko@intel.com
2025-03-12 11:37:50 +01:00
Michal Wajdeczko
4383dd88fa drm/xe: Add MI_LOAD_REGISTER_REG command definition
The MI_LOAD_REGISTER_REG command reads value from a source register
location and writes that value to a destination register location.

Bspec: 45730, 60233
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-2-michal.wajdeczko@intel.com
2025-03-12 11:37:49 +01:00
Matt Roper
3b545b216c drm/xe/xe3: Recognize 3DSTATE_COARSE_PIXEL in LRC dumps
Xe3 adds a new 3DSTATE_COARSE_PIXEL state instruction as part of the
render engine LRC.  Ensure we can recognize and report this properly in
the LRC dumps.

Bspec: 65182, 73415
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307190754.678376-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-10 10:23:12 -07:00
Daniele Ceraolo Spurio
f0c06677d1 drm/xe/pxp: Add VCS inline termination support
The key termination is done with a specific submission to the VCS
engine. This flow will be triggered in response to a termination
interrupt, whose handling is coming in a follow-up patch in the series.

v2: clean up defines and command emission code. (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com
2025-02-03 11:51:09 -08:00
José Roberto de Souza
88fca61ba5 Revert "drm/xe: Force write completion of MI_STORE_DATA_IMM"
This reverts commit 1460bb1fef.

In all places the MI_STORE_DATA_IMM are not followed by a read of
the same memory address in the same batch buffer and the posted writes
are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions
so there is no need to set this register.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Fixes: 1460bb1fef ("drm/xe: Force write completion of MI_STORE_DATA_IMM")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241227183230.101334-1-jose.souza@intel.com
2025-01-02 05:35:16 -08:00
José Roberto de Souza
1460bb1fef drm/xe: Force write completion of MI_STORE_DATA_IMM
With Force write completion unset there is no guarantees of when the
write will be globally visible what is not the behavior wanted.

Fixes: 9c57bc0865 ("drm/xe/lnl: Drop force_probe requirement")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com
2024-12-18 10:11:14 -08:00
Ashutosh Dixit
2f4a730fcd drm/xe/oa: Add OAR support
Add OAR support to allow userspace to execute MI_REPORT_PERF_COUNT on
render engines. Configuration batches are used to program the OAR unit, as
well as modifying the render engine context image of a specified exec queue
(to have correct register values when that context switches in).

v2: Rename/refactor xe_oa_modify_self (Umesh)
v3: Move IS_MI_LRI_CMD() into xe_oa.c (Michal)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-11-ashutosh.dixit@intel.com
2024-06-18 12:40:38 -07:00
Matthew Brost
9f46ecbb3f drm/xe: Add MI_COPY_MEM_MEM GPU instruction definitions
MI_COPY_MEM_MEM GPU instructions are used to copy ctx timestamp from a
LRC registers to another location at the beginning of every jobs
execution. Add MI_COPY_MEM_MEM GPU instruction definitions.

v2:
 - Include MI_COPY_MEM_MEM based on instruction order (Michal)
 - Fix tabs/spaces issue (Michal)
 - Use macro for DW definition (Michal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-3-matthew.brost@intel.com
2024-06-12 19:10:19 -07:00
Michal Wajdeczko
62010b3cd6 drm/xe: Move xe_gpu_commands.h file to instructions/
All other files with commands definitions are in instructions/
folder. Move xe_gpu_commands.h also there.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com
2024-05-09 21:17:57 +02:00
Matt Roper
b9b7db4908 drm/xe: Add LRC parsing for more GPU instructions
The LRCs on some of our newer platforms appear to contain a few GPU
instructions that weren't handled in our LRC parser.  Add the relevant
instruction names and opcodes so that our debugfs LRC dumps will
properly indicate what these are.

Bspec: 55866, 64848, 46931
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222184009.6857-2-matthew.d.roper@intel.com
2024-02-29 12:39:16 -08:00
Michal Wajdeczko
6901f73269 drm/xe: Add command MI_LOAD_REGISTER_MEM
We will need this shortly during context state preparation.

Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-2-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2023-12-21 16:31:29 -05:00
Daniele Ceraolo Spurio
0881cbe040 drm/xe/gsc: Query GSC compatibility version
The version is obtained via a dedicated MKHI GSC HECI command.
The compatibility version is what we want to match against for the GSC,
so we need to call the FW version checker after obtaining the version.

Since this is the first time we send a GSC HECI command via the GSCCS,
this patch also introduces common infrastructure to send such commands
to the GSC. Communication with the GSC FW is done via input/output
buffers, whose addresses are provided via a GSCCS command. The buffers
contain a generic header and a client-specific packet (e.g. PXP, HDCP);
the clients don't care about the header format and/or the GSCCS command
in the batch, they only care about their client-specific header. This
patch therefore introduces helpers that allow the callers to
automatically fill in the input header, submit the GSCCS job and decode
the output header, to make it so that the caller only needs to worry about
their client-specific input and output messages.

v3: squash of 2 separate patches ahead of merge, so that the common
functions and their first user are added at the same time

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.Com> #v1
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:06 -05:00
Daniele Ceraolo Spurio
dd0e89e5ed drm/xe/gsc: GSC FW load
The GSC FW must be copied in a 4MB stolen memory allocation, whose GGTT
address is then passed as a parameter to a dedicated load instruction
submitted via the GSC engine.

Since the GSC load is relatively slow (up to 250ms), we perform it
asynchronously via a worker. This requires us to make sure that the
worker has stopped before suspending/unloading.

Note that we can't yet use xe_migrate_copy for the copy because it
doesn't work with stolen memory right now, so we do a memcpy from the
CPU side instead.

v2: add comment about timeout value, fix GSC status checking
    before load (John)

Bspec: 65306, 65346
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:06 -05:00
Matt Roper
fb24b858a2 drm/xe/xe2: Update SVG state handling
As with DG2/MTL, Xe2 also fails to emit instruction headers for SVG
state instructions if no explicit state has been set.  The SVG part of
the LRC is nearly identical to DG2/MTL; the only change is that
3DSTATE_DRAWING_RECTANGLE has been replaced by
3DSTATE_DRAWING_RECTANGLE_FAST, so we can just re-use the same state
table and handle that single instruction when we encounter it.

Bspec: 65182
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20231025151732.3461842-8-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:22 -05:00
Matt Roper
72ac304769 drm/xe: Emit SVG state on RCS during driver load on DG2 and MTL
When recording the default LRC, the expectation is that the hardware's
original state settings (both register and instruction) will be written
out to the LRC upon first context switch.  For many 3DSTATE_* state
instructions that don't truly have "default" values, this translates to
a simple instruction header (opcodes + dword length) being written to
the LRC, followed by an appropriate number of blank dwords as a place
holder.  When userspace creates a context (which starts as a copy of the
default LRC), they'll generally emit real 3DSTATE_* as part of their
initialization to select the settings they desire.  If they don't emit
one of the 3DSTATE instructions, then the zeroed dwords that remain in
their LRC image generally translate to various state remaining disabled.
This will either be what userspace wants or will lead to very
reproducible and easily-debugged problems (rendering glitches, engine
hangs).

It turns out that a subset of the 3DSTATE instructions, specifically
those belonging to the SVG (State Variable - Global) unit, are not only
emitting 0's for the instruction's "body" dwords, but also for the
instruction header dword if no specific state has been explicitly set
before context switch.  This means that when the hardware switches to a
context that hasn't explicitly provided an appropriate state setting,
the hardware will just see a sequence of NOOPs in the spot reserved for
that 3DSTATE instruction while executing the LRC, and the actual
hardware state setting will unintentionally inherit the configuration
used by the previously running context.  Now when userspace makes a
mistake and forgets to emit an important state instruction they no
longer get consistent, easily-reproducible corruption/hangs, but rather
erratic behavior where the presence/absence of a problem depends on what
other workloads are running on the system and what order the contexts
are scheduled on the engine.

A specific example of this that came up recently related to mesh shading
The OpenGL driver was not specifically emitting a 3DSTATE_MESH_CONTROL
to disable mesh shading at context init, so on context switch, mesh
shading would either be on or off depending on what the previous context
had been doing.  Vulkan apps _were_ enabling mesh shading, so running a
Vulkan app and then context switching to an OpenGL app resulted in mesh
shading still unexpectedly being enabled during OpenGL operation, and
since other Mesh-related state was not properly initialized for that
context a GPU hang was seen.  Due to the specific ordering requirements
(Vulkan app runs first, followed by OpenGL app), it took additional
debug effort to track down the cause of the problem.

There are various workarounds related to this behavior, with current
implementations handled in the userspace drivers.  E.g., Wa_14019789679
and Wa_22018402687.  However it's been suggested that the kernel driver
can help simplify things here by emitting zeroed SVG state with proper
instruction headers as part of our default context creation (i.e., at
the same point we apply LRC workarounds).  This will help ensure that
any future cases where a userspace driver does not emit an important
state setting will result in consistent behavior.

Bspec: 46261
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20231025151732.3461842-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:22 -05:00
Matt Roper
6de492ae5f drm/xe/debugfs: Include GFXPIPE commands in LRC dump
RCS and CCS engines include several non-register gfxpipe commands in
their LRC images.  Include these in the dump output so that we can see
exactly what's inside the context snapshot.

v2:
 - Include raw instruction header in output
 - Add 3DSTATE_AMFS_TEXTURE_POINTERS and 3DSTATE_MONOFILTER_SIZE.  The
   first was supposed to be removed in Xe_HPG, and the second by
   gen12, but both still show up in the RCS LRC.

v3:
 - Sanity check that we don't have numdw > remaining_dw.  (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-14-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:01 -05:00
Matt Roper
0f60547f7d drm/xe/debugfs: Add dump of default LRCs' MI instructions
For non-RCS engines, nearly all of the LRC state is composed of MI
instructions (specifically MI_LOAD_REGISTER_IMM).  Providing a dump
interface allows us to verify that the context image layout matches
what's documented in the bspec, and also allows us to check whether LRC
workarounds are being properly captured by the default state we record
at startup.

For now, the non-MI instructions found in the RCS and CCS engines will
dump as "unknown;" parsing of those will be added in a follow-up patch.

v2:
 - Add raw instruction header as well as decoded meaning.  (Lucas)
 - Check that num_dw isn't greater than remaining_dw for instructions
   that have a "# dwords" field.  (Lucas)
 - Clarify comment about skipping over ppHWSP.  (Lucas)

Bspec: 64993
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00
Matt Roper
0134f130e7 drm/xe: Extract MI_* instructions to their own header
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.

Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that.  Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.

v2:
 - Use "instr" instead of "inst" as the short form of "instruction"
   everywhere.  (Lucas)
 - Include xe_reg_defs.h instead of the i915 compat header.  (Lucas)

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00