linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-30 20:07:36 +00:00

Author	SHA1	Message	Date
Raag Jadav	0ea07b6951	drm/xe/pm: Wire up suspend/resume for I2C controller Wire up suspend/resume handles for I2C controller to match its power state with SGUnit. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-10 10:19:41 -04:00
Heikki Krogerus	f0e53aadd7	drm/xe: Support for I2C attached MCUs Adding adaption/glue layer where the I2C host adapter (Synopsys DesignWare I2C adapter) and the I2C clients (the microcontroller units) are enumerated. The microcontroller units (MCU) that are attached to the GPU depend on the OEM. The initially supported MCU will be the Add-In Management Controller (AMC). Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags and SPDX format in the .c file]	2025-07-10 10:19:41 -04:00
Michal Wajdeczko	621a422079	drm/xe/guc: Don't allocate temporary policies object Since we are already using reusable buffer objects from the GuC buffer cache, we can directly write into their CPU pointers and spare unnecessary temporary allocation. While around, also make sure to clear obtained buffer, to avoid sending some stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com	2025-07-10 16:00:13 +02:00
Michal Wajdeczko	1fbe023d30	drm/xe/pf: Print configuration KLVs using debug printer While we print VF's configuration KLVs only under DEBUG_SRIOV config, we should be doing it at debug level, not info level. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com	2025-07-10 14:26:13 +02:00
Michal Wajdeczko	89cd027c94	drm/xe/pf: Print runtime registers using debug printer While we already print VF's runtime registers only under DEBUG_SRIOV config, we should be still doing it at debug level, not info. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com	2025-07-10 14:26:11 +02:00
Raag Jadav	cdc36b66cd	drm/xe: Expose fan control and voltage regulator version Add sysfs attributes for late binding features which expose bound version to the user. v2: Rework attribute and macro naming (Badal) v3: Drop fancy formatting (Rodrigo) v4: Form version string using local variables (Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-09 18:25:22 -04:00
Shuicheng Lin	017ef1228d	drm/xe: Release runtime pm for error path of xe_devcoredump_read() xe_pm_runtime_put() is missed to be called for the error path in xe_devcoredump_read(). Add function description comments for xe_devcoredump_read() to help understand it. v2: more detail function comments and refine goto logic (Matt) Fixes: `c4a2e5f865` ("drm/xe: Add devcoredump chunking") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com	2025-07-08 15:11:37 -07:00
Shuicheng Lin	8ce560d8e1	drm/xe: Remove unused code in devcoredump_snapshot() The deleted code is no longer needed because patch "drm/xe/guc: Plumb GuC-capture into dev coredump" has removed the related usage code. Remove the code to tidy up the function. v2: s/bacause/because Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com	2025-07-08 15:11:36 -07:00
Zhanjun Dong	b2c4ac219f	drm/xe/uc: Disable GuC communication on hardware initialization error Disable GuC communication on Xe micro controller hardware initialization error. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917 Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com	2025-07-08 14:56:49 -07:00
Shuicheng Lin	83dcee1785	drm/xe/pm: Restore display pm if there is error after display suspend xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there is error with xe_bo_evict_all(), display pm should be restored. Fixes: `51462211f4` ("drm/xe/pxp: add PXP PM support") Fixes: `cb8f81c175` ("drm/xe/display: Make display suspend/resume work on discrete") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-08 16:05:07 -04:00
Matt Atwood	94de1dfd47	drm/xe/ptl: Drop force_probe requirement Panther Lake has proven to be stable through testing and use. Remove the force_probe requirement and enable the platform by default. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-08 09:55:35 -04:00
Simona Vetter	69d09a2609	Merge tag 'drm-intel-next-2025-07-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull #2 for v6.17: Features and functionality: - Add drm_panic support for both i915 and xe drivers (Jocelyn Falempe) - Add initial flip queue implementation, disabled by default, for LNL and PTL (Ville) - Add support for Wildcat Lake (WCL) display, version 30.02 (Matt Roper, Matt Atwood, Dnyaneshwar) - Extend drm_panel and follower support to DDI eDP (Arun) Refactoring and cleanups: - Make all global state objects opaque (Jani) - Move display works to display specific unordered workqueue (Luca) - Add and use struct drm_device based pcode interface (Jani, Lucas) - Use clamp() instead of max()+min() combo (Ankit) - Simplify wait for power well disable (Jani) - Various stylistics cleanups and renames (Jani) Fixes: - Deal with loss of pipe DMC state (Ville) - Fix PTL HDCP2 stream status check (Suraj) - Add workaround for ADL-P DKL PHY DP and HDMI (Nemesa) - Fix skl_print_wm_changes() stack usage with KMSAN (Arnd Bergmann) - Fix PCON capability reads on non-branch devices (Chaitanya) - Fix which platforms have ultra joiner (Ankit) DRM core changes: - Add ttm_bo_kmap_try_from_panic() for xe drm_panic support (Jocelyn Falempe) - Add private pointer to struct drm_scanout buffer for xe/i915 drm_panic support (Jocelyn Falempe) Merges: - Backmerge drm-next for drm_panel and xe changes (Jani) Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6d728bf6ef23681b00dfbc7da9aeae41042dee02@intel.com	2025-07-08 14:07:44 +02:00
Matthew Auld	fee58ca135	drm/xe/bmg: fix compressed VRAM handling There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: `523f191cc0` ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com (cherry picked from commit `f7a2fd776e`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-07 20:57:17 -07:00
Matthew Brost	daa099fed5	Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2" This reverts commit `fe0154cf82`. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit `3a1edef8f4` ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: `fe0154cf82` ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `03d85ab36b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-07 20:57:17 -07:00
Matthew Brost	c9a95dbe06	drm/xe: Allocate PF queue size on pow2 boundary CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Fixes: `29582e0ea7` ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com (cherry picked from commit `491b978312`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-07 20:57:17 -07:00
Michal Wajdeczko	705a412a36	drm/xe/pf: Clear all LMTT pages on alloc Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: `b1d2040582` ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com (cherry picked from commit `3fae6918a3`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-07 20:57:07 -07:00
Daniele Ceraolo Spurio	4c93e2c341	drm/xe/ptl: Add HuC FW definition for PTL Add the unversioned define for the PTL HuC FW. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-15-daniele.ceraolospurio@intel.com	2025-07-07 10:37:12 -07:00
Daniele Ceraolo Spurio	5cdb71d3b0	drm/xe/ptl: Add GuC FW definition for PTL The first official GuC relase for PTL is 70.47.0, which maps to API version 1.22.4. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-14-daniele.ceraolospurio@intel.com	2025-07-07 10:37:12 -07:00
Julia Filipchuk	0b64addcae	drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2 UAPI compatibility version 1.22.2 Resolves various bugs. Recommend newer version. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250626182805.1701096-13-daniele.ceraolospurio@intel.com	2025-07-07 10:37:01 -07:00
Matthew Auld	f7a2fd776e	drm/xe/bmg: fix compressed VRAM handling There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: `523f191cc0` ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com	2025-07-04 12:31:06 +01:00
Dave Airlie	17d081ef84	drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - bridge: More reference counting - dp: Implement backlight control helpers - fourcc: Add half-float and 32b float formats, RGB161616, BGR161616 - mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flag - ttm: Improve eviction Driver Changes: - i915: Use backlight control helpers for eDP - tidss: Add AM65x OLDI bridge support - panels: - panel-edp: Add CMN N116BCJ-EAK support - raydium-rm67200: misc cleanups, optional reset - new panel: DJN HX83112B -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaGY7aQAKCRAnX84Zoj2+ dncVAYC+7mGk8UDugcIEn51fCLxv92DKeMRq/qsmGPz/x5c3TaXX7sN0/FLo91ek bLrwR9ABfjx+Qz+jO21LuwRBxgHv7XH5Bk1sPay1n7+TokndCj55+YG8vCbXISsk gsxtheA8Ig== =ybn3 -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-07-03' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - bridge: More reference counting - dp: Implement backlight control helpers - fourcc: Add half-float and 32b float formats, RGB161616, BGR161616 - mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flag - ttm: Improve eviction Driver Changes: - i915: Use backlight control helpers for eDP - tidss: Add AM65x OLDI bridge support - panels: - panel-edp: Add CMN N116BCJ-EAK support - raydium-rm67200: misc cleanups, optional reset - new panel: DJN HX83112B Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250703-chirpy-lilac-dalmatian-2c5838@houat	2025-07-04 11:54:31 +10:00
Matthew Brost	03d85ab36b	Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2" This reverts commit `fe0154cf82`. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit `3a1edef8f4` ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: `fe0154cf82` ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-03 12:23:45 -07:00
Tomasz Lis	7eba6a80fe	drm/xe/vf: Make multi-GT migration less error prone There is a remote chance that after migration, some GTs will not send the MIGRATED interrupt, or due to current VF KMD state the interrupt will not lead to marking the GT for recovery. Requiring IRQs from all GTs before starting migration introduces the possibility that the process will get stalled due to one GuC. One could argue it is also waste of time to wait for all IRQs, but we should get them all IRQs as soon as VGPU starts, so that's not really an impactful argument. Still, not waiting for all GTs makes it easier to handle situations: * where one GuC IRQ is missing * where state before probe is unclean - getting MIGRATED IRQ as soon as interrupts are enabled * where multiple migrations happen close to each other To help with these cases, this patch alters the post-migration recovery so that recovery task is started as soon as one GuC IRQ is handled, and other GTs are included in recovery later as the subsequent IRQs are serviced. The post-migration recovery can now be called for any selection of GTs, and it will perform recovery on all GTs for which IRQs have arrived, even multiple times if necessary. v2: Typos and style fixes v3: Transferring gt_flags by value rather than reference to last function where it is used Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-07-03 21:05:21 +02:00
Matthew Brost	491b978312	drm/xe: Allocate PF queue size on pow2 boundary CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: `3338e4f90c` ("drm/xe: Use topology to determine page fault queue size") Fixes: `29582e0ea7` ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com	2025-07-03 10:19:35 -07:00
Michal Wajdeczko	3fae6918a3	drm/xe/pf: Clear all LMTT pages on alloc Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: `b1d2040582` ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com	2025-07-03 16:38:45 +02:00
Matthew Brost	5459e16b21	drm/xe: Do not wedge device on killed exec queues When a user closes an exec queue or interrupts an app with Ctrl-C, this does not warrant wedging the device in mode 2. Avoid this by skipping the wedge check for killed exec queues in the TDR and LR exec queue cleanup worker. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com (cherry picked from commit `5a2f117a80`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-03 06:42:22 -07:00
Daniele Ceraolo Spurio	d008fc65eb	drm/xe: Extend WA 14018094691 to BMG This WA is applicable to BMG as well. Note that this is a GSC WA and we don't load the GSC on BMG, so extending the WA to BMG won't do anything right now. However, it helps future-proof the driver so that if we ever turn the GSC on we won't have to remember to extend this WA. v2: don't use VERSION_RANGE from 2001 to 2004 (Matt) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com (cherry picked from commit `1a5ce0c5b9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-03 06:40:50 -07:00
Riana Tauro	b9329f5167	drm/xe/xe_pmu: Validate gt in event supported Validate gt instead of checking gt_id is lesser than max gts per tile Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:09:11 -07:00
Matt Roper	d4eb4a0102	drm/xe/xe_query: Use separate iterator while filling GT list The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs being iterated over, and may skip over values if a GT is not present on the device. Use a separate iterator for GT list array assignments to ensure that the array will be filled properly on future platforms where index in the GT query list may not match the uapi ID. v2: - Include the missing increment of the iterator. (Jonathan) Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	457123d5a0	drm/xe: Don't compare GT ID to GT count when determining valid GTs On current platforms with multiple GTs, all of the GT IDs are consecutive; as a result we know that the GT IDs range from 0 to gt_count-1 and can determine if a GT ID is valid by comparing against the count. The consecutive nature of GT IDs may not hold true on future platforms if/when we have platforms that are both multi-tile and have multiple GTs within each tile. Once such platforms exist, it's quite possible that we could wind up with something like a GT list composed of IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with media only on the second tile). To future-proof the code we should stop comparing against the GT count to determine whether a GT ID is valid or not. Instead we should do an actual lookup of the ID to determine whether the GT exists. This also means that our GT loop macro should not end at the GT count, but should rather examine the entire space up to (# of tiles) * (max GT per tile) to ensure it doesn't stop prematurely. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	bd6a4b9785	drm/xe: Assign GT IDs properly on multi-tile + multi-GT platforms Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive characteristics on all of our platforms today, this may not always be true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that should work even if future platforms have different configurations. This patch should not change the behavior of current platforms; it only future-proofs for potential future designs. v2: - Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI) Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	fb72cd2104	drm/xe/tests/pci: Ensure all platforms have a valid GT/tile count Add a simple kunit test to ensure each platform's GT per tile count is non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition. We need to move 'struct xe_subplatform_desc' from the .c file to the types header to ensure it is accessible from the kunit test. v2: - Rebase on latest xe_pci test rework from Michal and convert to a parameterized test that runs on each PCI ID supported by the driver. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	f8e0f4c526	drm/xe: Track maximum GTs per tile on a per-platform basis Today all of our platforms fall into one of three cases: * Single tile platforms with a single (primary) GT * Single tile platforms with two GTs (primary + media) * Two-tile platforms with a single GT (primary) in each Our numbering of GTs has been a bit inconsistent between platforms (e.g., GT1 is the media GT on some platforms, but the second tile's primary GT on others). In the future we'll likely have platforms that are both multi-tile and multi-GT, which will make the situation more confusing. We could also wind up with more than just two types of GTs at some point in the future. Going forward we should standardize the way we assign uapi GT IDs to internal GT structures. Let's declare that for userspace GT ID n, GT[n]'s tile = n / (max gt per tile) GT[n]'s slot within tile = n % (max gt per tile) We don't want the GT numbering to change for any of our current platforms since the current IDs are part of our ABI contract with userspace so this means we should track the 'max gt per tile' value on a per-platform basis rather than just using a single value across the driver. Encode this into device descriptors in xe_pci.c and use the per-platform number for various checks in the code. Constant XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms for easy of sizing array allocations. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Matt Roper	0fc957c20d	drm/xe: Export xe_step_name for kunit tests xe_step_name() is used by xe_assert(), so adding assertions to functions like xe_device_get_gt() will result in ERROR: modpost: "xe_step_name" [drivers/gpu/drm/xe/tests/xe_test.ko] undefined! while building the kunit tests. Export xe_step_name to avoid these build failures when adding assertions. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-07-02 16:08:54 -07:00
Michal Wajdeczko	6797906074	drm/xe/hw_engine_group: Fix potential leak If we fail to allocate a workqueue we will leak kzalloc'ed group object since it was designed to be kfree'ed in the drmm cleanup action, but we didn't have a chance to register this action yet. To avoid this leak allocate a group object using drmm_kzalloc() and start using predefined drmm action to release the workqueue. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250627184143.1480-1-michal.wajdeczko@intel.com	2025-07-02 13:31:30 +02:00
Harry Austen	aa18d5769f	drm/xe: Allow dropping kunit dependency as built-in Fix Kconfig symbol dependency on KUNIT, which isn't actually required for XE to be built-in. However, if KUNIT is enabled, it must be built-in too. Fixes: `08987a8b68` ("drm/xe: Fix build with KUNIT=m") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Harry Austen <hpausten@protonmail.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `a559434880`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:59:02 -07:00
Lucas De Marchi	de6acfdc39	drm/xe: Fix kconfig prompt The xe driver is the official driver for Intel Xe2 and later, while maintaining experimental support for earlier GPUs. Reword the help message accordingly. Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `1488a3089d`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:58:50 -07:00
Vinay Belgaumkar	84c0b4a006	drm/xe/bmg: Update Wa_22019338487 Limit GT max frequency to 2600MHz and wait for frequency to reduce before proceeding with a transient flush. This is really only needed for the transient flush: if L2 flush is needed due to 16023588340 then there's no need to do this additional wait since we are already using the bigger hammer. v2: Use generic names, ensure user set max frequency requests wait for flush to complete (Rodrigo) v3: - User requests wait via wait_var_event_timeout (Lucas) - Close races on flush + user requests (Lucas) - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt rather than root gt (Lucas) v4: - Only apply the freq reducing part if a TDF is needed: L2 flush trumps the need for waiting a lower frequency Fixes: `aaa08078e7` ("drm/xe/bmg: Apply Wa_22019338487") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `deea6a7d6d`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:55:19 -07:00
Vinay Belgaumkar	a5c7dcdd96	drm/xe/bmg: Update Wa_14022085890 Set GT min frequency to 1200Mhz once driver load is complete. v2: Review comments (Rodrigo) v3: Apply Wa earlier so user_req_min is not clobbered. v4: Apply to all GTs (Lucas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `bdde16c9ac`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:55:09 -07:00
Lucas De Marchi	a1eec6cae9	drm/xe: Split xe_device_td_flush() xe_device_td_flush() has 2 possible implementations: an entire L2 flush or a transient flush, depending on WA 16023588340. Make this clear by splitting the function so it calls each of them. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `5e300ed8a5`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:53:57 -07:00
Lucas De Marchi	4cec9099b9	drm/xe/xe_guc_pc: Lock once to update stashed frequencies pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times to stash the current frequencies. It's not a problem since xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later in the init sequence. However, now that we have _locked() variants for this functions, use them and avoid potential issues when called from other places or using the same pattern. While at it, prefer and early return for the WA check to reduce indentation. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `d878c97daa`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:53:26 -07:00
Lucas De Marchi	d8390768dc	drm/xe/guc_pc: Add _locked variant for min/max freq There are places in which the getters/setters are called one after the other causing a multiple lock()/unlock(). These are not currently a problem since they are all happening from the same thread, but there's a race possibility as calls are added outside of the early init when the max/min and stashed values need to be correlated. Add the _locked() variants to prepare for that. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `1beae9aa2b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:53:16 -07:00
Matthew Brost	afcad92411	drm/xe: Make WA BB part of LRC BO No idea why, but without this GuC context switches randomly fail when running IGTs in a loop. Need to follow up why this fixes the aforementioned issue but can live with a stable driver for now. Fixes: `617d824c53` ("drm/xe: Add WA BB to capture active context utilization") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com (cherry picked from commit `3a1edef8f4`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:49:38 -07:00
Jia Yao	2d5cff2b4b	drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE. The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum valid value for x is 0x1FE, not 0x1FF. v2 - Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej) v3 - Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas) Bspec: 60246 Fixes: `9c44fd5f6e` ("drm/xe: Add migrate layer functions for SVM support") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Brian3 Nguyen <brian3.nguyen@intel.com> Cc: Alex Zuo <alex.zuo@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `c038bdba98`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 13:04:00 -07:00
Tvrtko Ursulin	a34ba68d09	drm/xe: Consolidate LRC offset calculations Attempt to consolidate the LRC offsets calculations by aligning the recently added wa_bb_offset with the naming scheme in the file and also change the size stored in struct xe_lrc to not include the ring buffer. The former makes it somewhat visually easier to follow the layout of the various logical blocks stored in the LRC bo, while the latter reduces the number of sprinkled around calculations. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250630124711.8209-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 08:35:23 -07:00
Maarten Lankhorst	5ac5e19197	drm/xe: Fix typo in Kconfig doubut -> doubt. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250627074119.347826-1-dev@lankhorst.se Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-01 08:01:40 -07:00
Harry Austen	a559434880	drm/xe: Allow dropping kunit dependency as built-in Fix Kconfig symbol dependency on KUNIT, which isn't actually required for XE to be built-in. However, if KUNIT is enabled, it must be built-in too. Fixes: `08987a8b68` ("drm/xe: Fix build with KUNIT=m") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Harry Austen <hpausten@protonmail.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-28 07:11:12 -07:00
Lucas De Marchi	05f3af5905	drm/xe: Fix conflicting intel_pcode_* symbols If CONFIG_DRM_XE_DISPLAY is set, the xe module can only be built as module to avoid duplicate symbols from i915. The interface for pcode was added without considering that, so the build breaks if both xe and i915 are built-in. Since the intel_pcode_* functions should only be called from the display side (xe side should call the xe interface directly) and there's already a protection in Kconfig to avoid the problematic configuration, ifdef it out in case CONFIG_DRM_XE_DISPLAY is disabled. Closes: https://lore.kernel.org/r/3667a992-a24b-4e49-aab2-5ca73f2c0a56@infradead.org Fixes: `d9465cc8ac` ("drm/xe/pcode: add struct drm_device based interface") Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-1-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-28 06:56:58 -07:00
Matthew Brost	ec9223b49a	drm/xe: Drop bo->size bo->size is redundant because the base GEM object already has a size field with the same value. Drop bo->size and use the base GEM object’s size instead. While at it, introduce xe_bo_size() to abstract the BO size. v2: - Fix typo in kernel doc (Ashutosh) - Fix kunit (CI) - Fix line wrap (Checkpatch) v3: - Fix sriov build (CI) v4: - Fix display build (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com	2025-06-27 14:52:31 -07:00
Daniele Ceraolo Spurio	9c7d93a8f1	drm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization The Dynamic Inhibit Context Switch is an optimization aimed at reducing the amount of time the HW is stuck waiting on an unsatisfied semaphore. When this optimization is enabled, the GuC will dynamically modify the CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of LRCs to enable immediate switching out on an unsatisfied semaphore wait when multiple contexts are competing for time on the same engine. This feature is available on recent HW from GuC 70.40.1 onwards and it is enabled via a per-VF feature opt-in. v2: rebase v3: switch to using guc_buf_cache instead of dedicated alloc v4: add helper to check for feature availability (Michal), don't enable if multi-lrc is possible. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com	2025-06-27 11:08:50 -07:00
Daniele Ceraolo Spurio	a7ffcea863	drm/xe/guc: Enable extended CAT error reporting On newer HW (Xe2 onwards + PVC) it is possible to get extra information when a CAT error occurs, specifically a dword reporting the error type. To enable this extra reporting, we need to opt-in with the GuC, which is done via a specific per-VF feature opt-in H2G. On platforms where the HW does not support the extra reporting, the GuC will set the type to 0xdeadbeef, so we can keep the code simple and opt-in to the feature on every platform and then just discard the data if it is invalid. Note that on native/PF we're guaranteed that the opt in is available because we don't support any GuC old enough to not have it, but if we're a VF we might be running on a non-XE PF with an older GuC, so we need to handle that case. We can re-use the invalid type above to handle this scenario the same way as if the feature was not supported in HW. Given that this patch is the first user of the guc_buf_cache on native and VF, it also extends that feature to non-PF use-cases. v2: simpler print for the error type (John), rebase v3: use guc_buf_cache instead of new alloc, simpler doc (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com	2025-06-27 11:08:40 -07:00
Ville Syrjälä	470022b5c2	drm/i915/flipq: Provide the nuts and bolts code for flip queue Provide the lower level code for PIPEDMC based flip queue. We'll use the so called semi-full flip queue mode where the PIPEDMC will start the provided DSB on a scanline a little ahead of the vblank. We need to program the triggering scanline early enough so that the DSB has enough time to complete writing all the double buffered registers before they get latched (at start of vblank). The firmware implements several queues: - 3 "plane queues" which execute a single DSB per entry - 1 "general queue" which can apparently execute 2 DSBs per entry - 1 vestigial "fast queue" that replaced the "simple flip queue" on ADL+, but this isn't supposed to be used due to issues. But we only need a single plane queue really, and we won't actually use it as a real queue because we don't allow queueing multiple commits ahead of time. So the whole thing is perhaps useless. I suppose there migth be some power saving benefits if we would get the flip scheduled by userspace early and then could keep some hardware powered off a bit longer until the DMC kicks off the flipq programming. But that is pure speculation at this time and needs to be proven. The code to hook up the flip queue into the actual atomic commit path will follow later. TODO: need to think how to do the "wait for DMC firmware load" nicely need to think about VRR and PSR etc. v2: Don't write DMC_FQ_W2_PTS_CFG_SEL on pre-lnl Don't oops at flipq init if there is no dmc v3: Adapt to PTL+ flipq changes (different queue entry layout, different trigger event, need VRR TG) Use the actual CDCLK frequency Ask the DSB code how long things are expected to take v3: Adjust the cdclk rounding (docs are 100% vague, Windows rounds like this) Initialize some undocumented magic DMC variables on PTL v4: Use PIPEDMC_FQ_STATUS for busy check (the busy bit in PIPEDMC_FQ_CTRL is apparently gone on LNL+) Based the preempt timeout on the max exec time Preempt before disabling the flip queue Order the PIPEDMC_SCANLINECMP* writes a bit more carefully Fix some typos v5: Try to deal with some clang-20 div-by-zero false positive (Nathan) Add some docs (Jani) Cc: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> epr Link: https://patchwork.freedesktop.org/patch/msgid/20250624170049.27284-5-ville.syrjala@linux.intel.com	2025-06-27 15:54:43 +03:00
Jocelyn Falempe	116d86dd69	drm/i915/display: Add drm_panic support for Y-tiling with DPT On Alder Lake and later, it's not possible to disable tiling when DPT is enabled. So this commit implements Y-Tiling support, to still be able to draw the panic screen. Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250624091501.257661-10-jfalempe@redhat.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-27 11:48:23 +02:00
Jocelyn Falempe	75fb60e5ad	drm/i915: Add intel_bo_panic_setup() and intel_bo_panic_finish() Implement both functions for i915 and xe, they prepare the work for drm_panic support. They both use kmap_try_from_panic(), and map one page at a time, to write the panic screen on the framebuffer. Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250624091501.257661-8-jfalempe@redhat.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-27 11:48:22 +02:00
Jocelyn Falempe	da091afacb	drm/i915: Add intel_bo_alloc_framebuffer() Encapsulate the struct intel_framebuffer into an xe_framebuffer or i915_framebuffer, and allow to add specific fields for each variant for the panic use-case. This is particularly needed to have a struct xe_res_cursor available to support drm panic on discrete GPU. Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250624091501.257661-7-jfalempe@redhat.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-27 11:48:22 +02:00
Jocelyn Falempe	d2782a0d8f	drm/i915/fbdev: Add intel_fbdev_get_map() The vaddr of the fbdev framebuffer is private to the struct intel_fbdev, so this function is needed to access it for drm_panic. Also the struct i915_vma is different between i915 and xe, so it requires a few functions to access fbdev->vma->iomap. Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250624091501.257661-3-jfalempe@redhat.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-27 11:48:22 +02:00
Jia Yao	c038bdba98	drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE. The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum valid value for x is 0x1FE, not 0x1FF. v2 - Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej) v3 - Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas) Bspec: 60246 Fixes: `9c44fd5f6e` ("drm/xe: Add migrate layer functions for SVM support") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Brian3 Nguyen <brian3.nguyen@intel.com> Cc: Alex Zuo <alex.zuo@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-26 14:10:28 -07:00
Maarten Lankhorst	18635b6328	drm/xe: Rename xe_uc_init_hw to xe_uc_load_hw It feels to me like load is closer to the intention than init_hw. It makes the init calls slightly less confusing to me. :) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-24-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	a42939ee86	drm/xe: Remove xe_uc_fini_hw xe_uc_init_hw() is called multiple times from xe_gt.c, and that makes the name xe_uc_fini_hw(), called for a different reason in xe_guc.c confusing. Remove it and inline the xe_uc_sanitize_reset into xe_guc.c directly. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-23-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	80fa03eb8a	drm/xe: Remove xe_uc_init_hwconfig() This function is called immediately after xe_uc_init(), so just put it there instead. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-22-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	3effd109c6	drm/xe: Move xe_ttm_sys_mgr_init() downwards. Now that all previous allocations are gone, ensure no new allocations will ever be done before xe_display_init_early(), by moving the call that allows allocations downwards. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	11bf0f0b3a	drm/xe: Split init of xe_gt_init_hwconfig to xe_gt_init and *_early Now that we added the separate step of initialising GUC in xe_gt_init_early, it should be ok to initialise the minimum during early init, and the rest after allocations are allowed. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	6386a49951	drm/xe: Rename gt_init sub-functions s/gt_fw_domain_init/gt_init_with_gt_forcewake()/ s/all_fw_domain_init/gt_init_with_all_forcewake()/ Clarify that the functions are the part of gt_init() that are called with the respective power domains held. all_domain() of course only works after discovering and initialisation of force_wake on all engines, that's why the split is needed in the first place. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-19-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	4c5517e9ec	drm/xe: Only dump PAT when xe_hw_engines_init_early fails After discussion with Lucas De Marchi, it turns out that is the specific caller requiring a dump. This allows us to cleanup gt_init in a bit. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-18-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	2e1efcafd4	drm/xe: Make it possible to read instance0 MCR registers after xe_gt_mcr_init_early After mcr_init_early, we need to be able to do VRAM and CCS probing without hwconfig probe. Fortunately the relevant registers are all instance 0, which fortunately means no dependencies on further initialization is required. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-17-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:28 +02:00
Maarten Lankhorst	396044c9d8	drm/xe: Simplify GuC early initialization Add a 2-stage GuC init. An early one for everything that is needed for VF, and a full one that loads GuC and is allowed to do allocations. Link: https://lore.kernel.org/r/20250619104858.418440-16-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-26 22:10:30 +02:00
Maarten Lankhorst	b3412d7233	drm/xe/sriov: Move VF bootstrap and query_config to vf_guc_init We want to split up GUC init to an alloc and noalloc part to keep the init path the same for VF and !VF as much as possible. Everything in vf_guc_init should be done as early as possible, otherwise VRAM probing becomes impossible. Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up the init in xe_device slightly. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:07:44 +02:00
Maarten Lankhorst	e6018b194b	drm/xe: Defer memirq init until needed memirqs require allocations into GGTT, which we cannot use until after display is enabled. Now that the initialisation of interrupts is postponed, move memirq init too. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ilia Levi <ilia.levi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-14-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:07:44 +02:00
Thomas Hellström	b587016878	drm/xe: Implement and use the drm_pagemap populate_mm op Add runtime PM since we might call populate_mm on a foreign device. v3: - Fix a kerneldoc failure (Matt Brost) - Revert the bo type change from device to kernel (Matt Brost) v4: - Add an assert in xe_svm_alloc_vram (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250619134035.170086-4-thomas.hellstrom@linux.intel.com	2025-06-26 18:00:10 +02:00
Matthew Brost	f86ad0ed62	drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap The migration functionality and track-keeping of per-pagemap VRAM mapped to the CPU mm is not per GPU_vm, but rather per pagemap. This is also reflected by the functions not needing the drm_gpusvm structures. So move to drm_pagemap. With this, drm_gpusvm shouldn't really access the page zone-device-data since its meaning is internal to drm_pagemap. Currently it's used to reject mapping ranges backed by multiple drm_pagemap allocations. For now, make the zone-device-data a void pointer. Alter the interface of drm_gpusvm_migrate_to_devmem() to ensure we don't pass a gpusvm pointer. Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP. Matt is listed as author of this commit since he wrote most of the code, and it makes sense to retain his git authorship. Thomas mostly moved the code around. v3: - Kerneldoc fixes (CI) - Don't update documentation about how the drm_pagemap migration should be interpreted until upcoming patches where the functionality is implemented. (Matt Brost) v4: - More kerneldoc fixes around timeslice_ms (Himal Ghimiray, Matt Brost) v6: - Fix an uninitialized pagemap pointer (CI) Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250619134035.170086-2-thomas.hellstrom@linux.intel.com	2025-06-26 18:00:07 +02:00
Thomas Hellström	bb8aa27eff	drm/ttm, drm_xe, Implement ttm_lru_walk_for_evict() using the guarded LRU iteration To avoid duplicating the tricky bo locking implementation, Implement ttm_lru_walk_for_evict() using the guarded bo LRU iteration. To facilitate this, support ticketlocking from the guarded bo LRU iteration. v2: - Clean up some static function interfaces (Christian König) - Fix Handling -EALREADY from ticketlocking in the loop by skipping to the next item. (Intel CI) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250623155313.4901-4-thomas.hellstrom@linux.intel.com	2025-06-26 17:20:00 +02:00
Thomas Hellström	e1e85eb0a9	drm/ttm, drm/xe: Modify the struct ttm_bo_lru_walk_cursor initialization Instead of the struct ttm_operation_ctx, Pass a struct ttm_lru_walk_arg to enable us to easily extend the walk functionality, and to implement ttm_lru_walk_for_evict() using the guarded LRU iteration. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250623155313.4901-3-thomas.hellstrom@linux.intel.com	2025-06-26 17:15:32 +02:00
Michal Wajdeczko	af2b588abe	drm/xe: Process deferred GGTT node removals on device unwind While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com (cherry picked from commit `89d2835c36`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 15:21:45 +02:00
Michal Wajdeczko	ad40098da5	drm/xe/guc: Explicitly exit CT safe mode on unwind During driver probe we might be briefly using CT safe mode, which is based on a delayed work, but usually we are able to stop this once we have IRQ fully operational. However, if we abort the probe quite early then during unwind we might try to destroy the workqueue while there is still a pending delayed work that attempts to restart itself which triggers a WARN. This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] ------------[ cut here ]------------ [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710 [ ] RIP: 0010:__queue_work+0x287/0x710 [ ] Call Trace: [ ] delayed_work_timer_fn+0x19/0x30 [ ] call_timer_fn+0xa1/0x2a0 Exit the CT safe mode on unwind to avoid that warning. Fixes: `09b286950f` ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com (cherry picked from commit `2ddbb73ec2`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:43 +02:00
Matthew Auld	f16873f42a	drm/xe: move DPT l2 flush to a more sensible place Only need the flush for DPT host updates here. Normal GGTT updates don't need special flush. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `35db1da40c`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:25 +02:00
Maarten Lankhorst	a4b1b51ae1	drm/xe: Move DSB l2 flush to a more sensible place Flushing l2 is only needed after all data has been written. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0dd2dd0182`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:19 +02:00
Matthew Brost	5a2f117a80	drm/xe: Do not wedge device on killed exec queues When a user closes an exec queue or interrupts an app with Ctrl-C, this does not warrant wedging the device in mode 2. Avoid this by skipping the wedge check for killed exec queues in the TDR and LR exec queue cleanup worker. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com	2025-06-25 09:55:20 -07:00
Daniele Ceraolo Spurio	dfe6c28132	Revert "drm/xe/ptl: Apply Wa_16026007364" This reverts commit `3972872e45`. There are several things wrong with the way this WA was implemented: - The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't apply it unconditionally. - The KLV requires 2 DWs of data, which are not currently provided. The GuC currently ignores any unknown KLVs, so on versions older that 70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts to parse the KLV and fails due to the missing data, causing a GuC load abort. Given that 70.47.0 is the first GuC version approved for public release for PTL, let's revert this patch so it doesn't cause the GuC load to fail with that blob. We can then re-apply it properly fixed after the GuC definition is merged, which will also have the added benefit of running the KLV addition through CI with the right GuC version. Fixes: `3972872e45` ("drm/xe/ptl: Apply Wa_16026007364") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: sanirban <sk.anirban@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-25 10:18:04 -04:00
Michal Wajdeczko	2ddbb73ec2	drm/xe/guc: Explicitly exit CT safe mode on unwind During driver probe we might be briefly using CT safe mode, which is based on a delayed work, but usually we are able to stop this once we have IRQ fully operational. However, if we abort the probe quite early then during unwind we might try to destroy the workqueue while there is still a pending delayed work that attempts to restart itself which triggers a WARN. This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] ------------[ cut here ]------------ [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710 [ ] RIP: 0010:__queue_work+0x287/0x710 [ ] Call Trace: [ ] delayed_work_timer_fn+0x19/0x30 [ ] call_timer_fn+0xa1/0x2a0 Exit the CT safe mode on unwind to avoid that warning. Fixes: `09b286950f` ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com	2025-06-24 22:27:12 +02:00
Michal Wajdeczko	89d2835c36	drm/xe: Process deferred GGTT node removals on device unwind While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com	2025-06-24 22:27:10 +02:00
Karthik Poosa	9127a69c71	drm/xe/hwmon: Fix xe_hwmon_power_max_write Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN \| PWR_LIM_VAL) wherever applicable. (Riana) Fixes: `25a2aa779f` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `8aa7306631`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-24 20:18:31 +02:00
Haoxiang Li	6220729347	drm/xe/display: Add check for alloc_ordered_workqueue() Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: `44e694958b` ("drm/xe/display: Implement display support") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/4ee1b0e5d1626ce1dde2e82af05c2edaed50c3aa.1747397638.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit `5b62d63395`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-24 20:18:31 +02:00
Matthew Auld	35db1da40c	drm/xe: move DPT l2 flush to a more sensible place Only need the flush for DPT host updates here. Normal GGTT updates don't need special flush. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 10:42:35 -07:00
Maarten Lankhorst	0dd2dd0182	drm/xe: Move DSB l2 flush to a more sensible place Flushing l2 is only needed after all data has been written. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 10:42:35 -07:00
Vinay Belgaumkar	deea6a7d6d	drm/xe/bmg: Update Wa_22019338487 Limit GT max frequency to 2600MHz and wait for frequency to reduce before proceeding with a transient flush. This is really only needed for the transient flush: if L2 flush is needed due to 16023588340 then there's no need to do this additional wait since we are already using the bigger hammer. v2: Use generic names, ensure user set max frequency requests wait for flush to complete (Rodrigo) v3: - User requests wait via wait_var_event_timeout (Lucas) - Close races on flush + user requests (Lucas) - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt rather than root gt (Lucas) v4: - Only apply the freq reducing part if a TDF is needed: L2 flush trumps the need for waiting a lower frequency Fixes: `aaa08078e7` ("drm/xe/bmg: Apply Wa_22019338487") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 09:58:29 -07:00
Lucas De Marchi	5e300ed8a5	drm/xe: Split xe_device_td_flush() xe_device_td_flush() has 2 possible implementations: an entire L2 flush or a transient flush, depending on WA 16023588340. Make this clear by splitting the function so it calls each of them. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 09:58:03 -07:00
Lucas De Marchi	d878c97daa	drm/xe/xe_guc_pc: Lock once to update stashed frequencies pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times to stash the current frequencies. It's not a problem since xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later in the init sequence. However, now that we have _locked() variants for this functions, use them and avoid potential issues when called from other places or using the same pattern. While at it, prefer and early return for the WA check to reduce indentation. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 09:58:03 -07:00
Lucas De Marchi	1beae9aa2b	drm/xe/guc_pc: Add _locked variant for min/max freq There are places in which the getters/setters are called one after the other causing a multiple lock()/unlock(). These are not currently a problem since they are all happening from the same thread, but there's a race possibility as calls are added outside of the early init when the max/min and stashed values need to be correlated. Add the _locked() variants to prepare for that. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-24 09:57:57 -07:00
Jani Nikula	54fd8f38d8	drm/xe/compat: remove old pcode compat interface With display code using the struct drm_device based pcode interface, we can drop the old pcode compat interface. We can also drop the __compat_uncore_to_tile() helper from intel_uncore.h compat header. Turns out a couple of headers depended on the intel_uncore.h include via intel_pcode.h. Fix them. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/948016a031dcb2acef0c97071aac09fa49613e07.1750678991.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-23 21:45:17 +03:00
Jani Nikula	d9465cc8ac	drm/xe/pcode: add struct drm_device based interface In preparation for dropping the dependency on struct intel_uncore or struct xe_tile from display code, add a struct drm_device based interface to pcode. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/eeaa9cc8438caab2e22f9cb2142fbc18cc0fd861.1750678991.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-23 21:45:16 +03:00
Jani Nikula	ab3ef56f58	drm/i915/pcode: drop fast wait from snb_pcode_write_timeout() Only use the ms granularity wait in snb_pcode_write_timeout(), primarily to better align with the xe driver, which also only has the millisecond wait. Use an arbitrary 250 us fast wait before the specified ms wait, and have snb_pcode_write() default to 1 ms. This means snb_pcode_write() and snb_pcode_write_timeout() will always be sleeping functions. There should not be any atomic users for pcode writes though, and any display code using pcode via xe has already been non-atomic. The uncore wait will do a might_sleep() annotation that should catch any problems. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/ba86280f53cea2d020308db35f1ecbd615d07d8a.1750678991.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-23 21:45:16 +03:00
Reuven Abliyev	a1c940cbf5	drm/xe/nvm: add support for non-posted erase Erase command is slow on discrete graphics storage and may overshot PCI completion timeout. BMG introduces the ability to have non-posted erase. Add driver support for non-posted erase with polling for erase completion. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Reuven Abliyev <reuven.abliyev@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://lore.kernel.org/r/20250617145159.3803852-9-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-23 13:14:50 -04:00
Alexander Usyskin	87e1ebbafb	drm/xe/nvm: add support for access mode Check NVM access mode from GSC FW status registers and overwrite access status read from SPI descriptor, if needed. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://lore.kernel.org/r/20250617145159.3803852-8-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-23 13:14:50 -04:00
Alexander Usyskin	c28bfb107d	drm/xe/nvm: add on-die non-volatile memory device Enable access to internal non-volatile memory on DGFX with GSC/CSC devices via a child device. The nvm child device is exposed via auxiliary bus. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-23 13:14:50 -04:00
Rodrigo Vivi	0089d6ee3b	Merge drm/drm-next into drm-xe-next Catch up on i915 changes to be able to include mtd driver for both xe and i915. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-23 13:14:13 -04:00
Jani Nikula	400ade1638	Merge drm/drm-next into drm-intel-next Sync with drm_panel changes from drm-misc-next, and xe driver changes from drm-xe-next. Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-23 10:57:13 +03:00
Dave Airlie	36c52fb703	Merge tag 'drm-intel-next-2025-06-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull for v6.17: Features and functionality: - Add support for DSC fractional link bpp on DP MST (Imre) - Add support for simultaneous Panel Replay and Adaptive Sync (Jouni) - Add support for PTL+ double buffered LUT registers (Chaitanya, Ville) - Add PIPEDMC event handling in preparation for flip queue (Ville) Refactoring and cleanups: - Rename lots of DPLL interfaces to unify them (Suraj) - Allocate struct intel_display dynamically (Jani) - Abstract VLV IOSF sideband better (Jani) - Use str_true_false() helper (Yumeng Fang) - Refactor DSB code in preparation for flip queue (Ville) - Use drm_modeset_lock_assert_held() instead of open coding (Luca) - Remove unused arg from skl_scaler_get_filter_select() (Luca) - Split out a separate display register header (Jani) - Abstract DRAM detection better (Jani) - Convert LPT/WPT SBI sideband to struct intel_display (Jani) Fixes: - Fix DSI HS command dispatch with forced pipeline flush (Gareth Yu) - Fix BMG and LNL+ DP adaptive sync SDP programming (Ankit) - Fix error path for xe display workqueue allocation (Haoxiang Li) - Disable DP AUX access probe where not required (Imre) - Fix DKL PHY access if the port is invalid (Luca) - Fix PSR2_SU_STATUS access on ADL+ (Jouni) - Add sanity checks for porch and sync on BXT/GLK DSI (Ville) DRM core changes: - Change AUX DPCD access probe address (Imre) - Refactor EDID quirks, amd make them available to drivers (Imre) - Add quirk for DPCD access probe (Imre) - Add DPCD definitions for Panel Replay capabilities (Jouni) Merges: - Backmerges to sync with v6.15-rcs and v6.16-rc1 (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/fff9f231850ed410bd81b53de43eff0b98240d31@intel.com	2025-06-23 10:49:27 +10:00
sanirban	3972872e45	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. v2: - Update klv name (Badal) Signed-off-by: sanirban <sk.anirban@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-20 15:36:43 -04:00
Dave Airlie	9356b50af5	drm-misc-next for 6.17: UAPI Changes: - Add Task Information for the wedge API Cross-subsystem Changes: Core Changes: - Fix warnings related to export.h - fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures - fence: Fix UAF issues - format-helper: Improve tests Driver Changes: - ivpu: Add turbo flag, Add Wildcat Lake Support - rz-du: Improve MIPI-DSI Support - vmwgfx: fence improvement -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaFOwgQAKCRAnX84Zoj2+ dkbjAX9aGa2vGeoz9fiT4wMMvxWzLSW7EzJW9oC/iFitHOcmd0yUZCfdmUfukQ3T cXtVHFcBf3clQ1iI4fV8EQwLOEaBpQ1H642/41pAebXOr9kQ6JOQ4AqhJBqamJzv teGbWnA2+w== =inwC -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: - Add Task Information for the wedge API Cross-subsystem Changes: Core Changes: - Fix warnings related to export.h - fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures - fence: Fix UAF issues - format-helper: Improve tests Driver Changes: - ivpu: Add turbo flag, Add Wildcat Lake Support - rz-du: Improve MIPI-DSI Support - vmwgfx: fence improvement Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat	2025-06-20 11:34:09 +10:00
Dave Airlie	377b2f15c0	UAPI Changes: - Expose media OA units (Ashutosh) Merge: - Restore GuC submit UAF fix around queue destruction accidentally removed in a drm-xe-fixes merge (Auld) Core Changes: - drm/gpusvm: Introduce devmem_only flag for allocation (Himal) - drm/gpusvm: Add timeslicing support to GPU SVM (Brost) Driver Changes: - Make gem shrinker drm managed (Thomas) - SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz) - Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele) - Prefetch Support for svm ranges (Himal, Brost) - Don't allocate managed BO for each policy change (Michal) - Simplify and fix diff calculation in GuC submit (Lucas) - Track FAST_REQ GuC H2Gs to report where errors came from (John) - SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr) - Check if all domains awake for MOCS dump (Tejas) - Make creation of SLPC debugfs files conditional (Aradhya) - Default auto_link_downgrade status to false (Aradhya) - Use xe_mmio_read32() to read mtcfg register (Shuicheng) - Updates in PCI ID tables (Atwood, Shekhar) - SRIOV VF: Fail migration recovery if fixups needed but not supported (Tomasz) - Add missing documentation around freq and RPa (Rodrigo) - Some other SVM related fixes (Himal, Auld, Brost, Maarten) - Allow to trigger GT resets using debugfs writes (Michal) - Optimise CCS case for WB pages (Auld) - Create LRC BO without VM (Niranjana) - Initialize MOCS index early (Bala) - HWMON fixes for BMG (Karthik, Lucas) - Drop redundant conversion to bool (Raag) - Rework eviction rejection of bound external bos (Thomas) - Stop re-submitting signalled jobs (Auld) - Small fixes and cleanups for PXP (Daniele) - Convert some print messages to GT-oriented ones (Michal) - Resend potentially lost GuC H2G MMIO request (Michal) - Add configfs to load with fewer engines (Lucas) - Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej) - SRIOV VF: Small updates around GGTT handling (Michal) - Make VMA tile_present, tile_invalidated access rules clear (Brost) - Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin) - Fixes for VF GuC version (Daniele) - Don't store the xe device pointer inside xe_ttm_tt (Dave) - Small improvements in topology code (Michal) - Stop relying on GGTT internals (Maarten) - GSM size should be constant on most platforms (Roper) - Reorder 'Get pages failed' message (Brost) - WA BB related fixes and improvements (Lucas, Brost) - Fix early wedge on GuC load failure (Daniele) - Add helper function to inject fault into ct_dead_capture (Satyanarayana) - Determine ATS / PTA programming during early sw init (Roper) - Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper) - Fix kconfig prompt (Lucas) - Convert xe_pci tests to parametrized tests (Michal) - Do not kill VM in PT code on -ENODATA (Brost) - Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost) - Enable media OA (Ashutosh) - GuC log level tuning (Lucas) - Add xe_vm_has_valid_gpu_mapping helper (Brost) - Opportunistically skip TLB invalidaion on unbind (Brost) -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmhTGwcACgkQ+mJfZA7r E8odlwf6A6bfNDdj56gMjxK/tyS3ud5VV6nAiCyHoGtcMeN6rZE2dDHOI3rP1fH7 6urnx6DqZu6lA1o1NJaidyc11WLlqB3hJN+tAVZChVe8N65syvpxdz38wZbJxrfQ MKw4uB8GfhNroQXuZcj+0dF+Ru/UqCbSAL7f1PMajAf4AcPBu/Ju7EYc2ALnINt1 jx+TOm1fOIMpA/Cw3DmGL3Uy/MtYRnnASp+qU4xSv/y8en7+83HDoKbC7+nY5NG0 j06O0QK2QeRTnltdvmbTlpjwQ+1ztyA1JS+pqj+QjyQ8iLfZaUQzED3iWAiMayn7 5A8zHkW02+v0pkFTFn2C4HShANAeHg== =Jq5v -----END PGP SIGNATURE----- Merge tag 'drm-xe-next-2025-06-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Expose media OA units (Ashutosh) Merge: - Restore GuC submit UAF fix around queue destruction accidentally removed in a drm-xe-fixes merge (Auld) Core Changes: - drm/gpusvm: Introduce devmem_only flag for allocation (Himal) - drm/gpusvm: Add timeslicing support to GPU SVM (Brost) Driver Changes: - Make gem shrinker drm managed (Thomas) - SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz) - Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele) - Prefetch Support for svm ranges (Himal, Brost) - Don't allocate managed BO for each policy change (Michal) - Simplify and fix diff calculation in GuC submit (Lucas) - Track FAST_REQ GuC H2Gs to report where errors came from (John) - SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr) - Check if all domains awake for MOCS dump (Tejas) - Make creation of SLPC debugfs files conditional (Aradhya) - Default auto_link_downgrade status to false (Aradhya) - Use xe_mmio_read32() to read mtcfg register (Shuicheng) - Updates in PCI ID tables (Atwood, Shekhar) - SRIOV VF: Fail migration recovery if fixups needed but not supported (Tomasz) - Add missing documentation around freq and RPa (Rodrigo) - Some other SVM related fixes (Himal, Auld, Brost, Maarten) - Allow to trigger GT resets using debugfs writes (Michal) - Optimise CCS case for WB pages (Auld) - Create LRC BO without VM (Niranjana) - Initialize MOCS index early (Bala) - HWMON fixes for BMG (Karthik, Lucas) - Drop redundant conversion to bool (Raag) - Rework eviction rejection of bound external bos (Thomas) - Stop re-submitting signalled jobs (Auld) - Small fixes and cleanups for PXP (Daniele) - Convert some print messages to GT-oriented ones (Michal) - Resend potentially lost GuC H2G MMIO request (Michal) - Add configfs to load with fewer engines (Lucas) - Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej) - SRIOV VF: Small updates around GGTT handling (Michal) - Make VMA tile_present, tile_invalidated access rules clear (Brost) - Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin) - Fixes for VF GuC version (Daniele) - Don't store the xe device pointer inside xe_ttm_tt (Dave) - Small improvements in topology code (Michal) - Stop relying on GGTT internals (Maarten) - GSM size should be constant on most platforms (Roper) - Reorder 'Get pages failed' message (Brost) - WA BB related fixes and improvements (Lucas, Brost) - Fix early wedge on GuC load failure (Daniele) - Add helper function to inject fault into ct_dead_capture (Satyanarayana) - Determine ATS / PTA programming during early sw init (Roper) - Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper) - Fix kconfig prompt (Lucas) - Convert xe_pci tests to parametrized tests (Michal) - Do not kill VM in PT code on -ENODATA (Brost) - Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost) - Enable media OA (Ashutosh) - GuC log level tuning (Lucas) - Add xe_vm_has_valid_gpu_mapping helper (Brost) - Opportunistically skip TLB invalidaion on unbind (Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/aFMb_NVF_oCW7UVl@intel.com	2025-06-20 09:08:01 +10:00
Daniele Ceraolo Spurio	a39d082c35	drm/xe: Fix early wedge on GuC load failure When the GuC fails to load we declare the device wedged. However, the very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done before the GT1 GuC objects are initialized, so things go bad when the wedge code attempts to cleanup GT1. To fix this, check the initialization status in the functions called during wedge. Fixes: `7dbe8af13c` ("drm/xe: Wedge the entire device") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: stable@vger.kernel.org # v6.12+: `1e1981b16b`: drm/xe: Fix taking invalid lock on wedge Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0b93b7dcd9`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 17:24:30 +02:00
Lucas De Marchi	87a15c89d8	drm/xe: Fix memset on iomem It should rather use xe_map_memset() as the BO is created with XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init(). Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `21cf47d89f`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 16:14:56 +02:00
Vinay Belgaumkar	16c1241b08	drm/xe/bmg: Update Wa_16023588340 This allows for additional L2 caching modes. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `6ab42fa03d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 16:14:47 +02:00
Jani Nikula	a649c2abfa	drm/i915/plane: rename intel_atomic_plane.[ch] to intel_plane.[ch] It's all atomic, no need to emphasize this. v2: Also update Documentation/gpu/i915.rst (Gustavo) Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/ba5f304e9fe71723191d872e6828d461e1a572bd.1750147992.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-19 12:14:29 +03:00
Matt Roper	c96e0df4e9	drm/xe/xe3: Add support for media IP version 30.02 Media version 30.02 should be treated the same as other Xe3 IP, but will have a slightly different set of workarounds. -v2: Extend the range in existing WA entry (Bala) -v3: Revert v2, Do not extend the range for the time being(Matt) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250613193146.3549862-4-dnyaneshwar.bhadane@intel.com	2025-06-18 15:42:12 -07:00
Matt Roper	b1c37a0030	drm/xe/xe3: Add support for graphics IP version 30.03 Graphics version 30.03 should be treated the same as other Xe3 IP, but will have a slightly different set of workarounds. -v2: Merge and extend the WA onto existing entry (Bala) -v3: Revert v2's feedback changes and keep entry saparate (Matt). Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@inte.com> Link: https://lore.kernel.org/r/20250613193146.3549862-3-dnyaneshwar.bhadane@intel.com	2025-06-18 15:41:49 -07:00
Karthik Poosa	8aa7306631	drm/xe/hwmon: Fix xe_hwmon_power_max_write Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN \| PWR_LIM_VAL) wherever applicable. (Riana) Fixes: `7596d839f6` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-18 12:26:27 -04:00
Matthew Brost	bcc287203c	drm/xe: Opportunistically skip TLB invalidaion on unbind If a range or VMA is invalidated and scratch page is disabled, there is no reason to issue a TLB invalidation on unbind, skip TLB innvalidation is this condition is true. This is an opportunistic check as it is done without the notifier lock, thus it possible for the range to be invalidated after this check is performed. This should improve performance of the SVM garbage collector, for example, xe_exec_system_allocator --r many-stride-new-prefetch, went ~20s to ~9.5s on a BMG. v2: - Use helper for valid check (Thomas) v3: - Avoid skipping TLB invalidation if PTEs are removed at a higher level than the range - Never skip TLB invalidations for VMA - Drop Himal's RB Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250616063024.2059829-3-matthew.brost@intel.com	2025-06-17 15:38:14 -07:00
Matthew Brost	fab76ce565	drm/xe: Add xe_vm_has_valid_gpu_mapping helper Rather than having multiple READ_ONCE of the tile_* fields and comments in code, use helper with kernel doc for single access point and clear rules. v3: - s/xe_vm_has_valid_gpu_pages/xe_vm_has_valid_gpu_mapping Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250616063024.2059829-2-matthew.brost@intel.com	2025-06-17 15:38:11 -07:00
Dave Airlie	45215c589e	drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - atomic-helpers: Tune the enable / disable sequence - bridge: Add destroy hook - color management: Add helpers for hardware gamma LUT handling - HDMI: Add CEC handling, YUV420 output support - sched: tracing improvements Driver Changes: - hyperv: Move out of simple-kms, drm_panic support - i915: drm_panel_follower support - imx: Add IMX8qxq Display Controller Support - lima: Add Rockchip RK3528 GPU Support - nouveau: fence handling cleanup - panfrost: Add BO labeling, 64-bit registers access - qaic: Add RAS Support - rz-du: Add RZ/V2H(P) Support, MIPI-DSI DCS Support - sun4i: Add H616 Support - tidss: Add TI AM62L Support - vkms: YUV and R* formats support - bridges: - Switched to reference counted drm_bridge allocations - panels: - Switched to reference counted drm_panel allocations - Add support for fwnode-based panel lookup - himax-hx8394: Support for Huiling hl055fhv028c - ilitek-ili9881c: Support for 7" Raspberry Pi 720x1280 - panel-edp: Support for KDC KD116N3730A05, N160JCE-ELL CMN, - panel-simple: Support for AUO P238HAN01 - st7701: Support for Winstar wf40eswaa6mnn0 - visionox-rm69299: Support for rm69299-shift - New panels: Renesas R61307, Renesas R69328 -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaEri7QAKCRAnX84Zoj2+ do3hAX4lLiyR2SP9DJP+i5nRKv0nq0LBLp5+gzko66iF3nzU26ILvHaiVAgP6pQ8 UssnZXIBgJPLXwa4mloU2ynnHaReHR+s2TEn5tg6TjI51TautKtN9i4o3vL+Vy7d UPogL3WwIQ== =/VQL -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - atomic-helpers: Tune the enable / disable sequence - bridge: Add destroy hook - color management: Add helpers for hardware gamma LUT handling - HDMI: Add CEC handling, YUV420 output support - sched: tracing improvements Driver Changes: - hyperv: Move out of simple-kms, drm_panic support - i915: drm_panel_follower support - imx: Add IMX8qxq Display Controller Support - lima: Add Rockchip RK3528 GPU Support - nouveau: fence handling cleanup - panfrost: Add BO labeling, 64-bit registers access - qaic: Add RAS Support - rz-du: Add RZ/V2H(P) Support, MIPI-DSI DCS Support - sun4i: Add H616 Support - tidss: Add TI AM62L Support - vkms: YUV and R* formats support - bridges: - Switched to reference counted drm_bridge allocations - panels: - Switched to reference counted drm_panel allocations - Add support for fwnode-based panel lookup - himax-hx8394: Support for Huiling hl055fhv028c - ilitek-ili9881c: Support for 7" Raspberry Pi 720x1280 - panel-edp: Support for KDC KD116N3730A05, N160JCE-ELL CMN, - panel-simple: Support for AUO P238HAN01 - st7701: Support for Winstar wf40eswaa6mnn0 - visionox-rm69299: Support for rm69299-shift - New panels: Renesas R61307, Renesas R69328 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250612-coucal-of-impossible-cleaning-a5eecf@houat	2025-06-18 08:09:35 +10:00
Daniele Ceraolo Spurio	1a5ce0c5b9	drm/xe: Extend WA 14018094691 to BMG This WA is applicable to BMG as well. Note that this is a GSC WA and we don't load the GSC on BMG, so extending the WA to BMG won't do anything right now. However, it helps future-proof the driver so that if we ever turn the GSC on we won't have to remember to extend this WA. v2: don't use VERSION_RANGE from 2001 to 2004 (Matt) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com	2025-06-17 14:29:34 -07:00
Lucas De Marchi	21cf47d89f	drm/xe: Fix memset on iomem It should rather use xe_map_memset() as the BO is created with XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init(). Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-17 13:22:15 -07:00
Lucas De Marchi	61a5a3f182	drm/xe: Annotate default for guc_log_level param Reword the parameter description so it's clear what's the default and what are the verbose levels. Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-2-cb84a63e49fe@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-17 12:21:55 -07:00
Lucas De Marchi	a37128ba61	drm/xe/guc: Default log level to non-verbose Currently xe sets the guc log level to a verbose level since it's useful to debug hangs and general development. However the verbose level may already be too much and affect performance. Michal Mrozek did some tests with the L0 compute stack for submission latency with ULLS disabled. Below are the normalized numbers with log level 3 (the current default) as baseline for each test: Test \ Log Level 3 0 1 2 ----------------------------------------------------------- ------ ------ ------ ------ BestWalkerNthCommandListSubmission(CmdListCount=2) 1.00 0.63 0.63 0.96 BestWalkerNthSubmission(KernelCount=2) 1.00 0.62 0.63 0.96 BestWalkerNthSubmissionImmediate(KernelCount=2) 1.00 0.58 0.58 0.85 BestWalkerSubmission 1.00 0.62 0.62 0.96 BestWalkerSubmissionImmediate 1.00 0.63 0.62 0.96 BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2) 1.00 0.58 0.58 0.86 BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4) 1.00 0.70 0.70 0.83 BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8) 1.00 0.53 0.52 0.78 Log level 2 is the first "verbose level" for GuC, where the biggest difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch to 1, i.e. GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds. Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-1-cb84a63e49fe@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-17 12:21:22 -07:00
Ashutosh Dixit	82a4be88c8	drm/xe/oa: Enable OAM latency measurement Enable OAM latency measurement for Xe3+ platforms. Bspec: 58840 v2: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG v3: Also add LNCF_MISC_CONFIG_REGISTER0 needed by MDAPI Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-6-ashutosh.dixit@intel.com	2025-06-17 11:34:20 -07:00
Ashutosh Dixit	10d42ef34b	drm/xe/oa: Assign hwe for OAM_SAG Because OAM_SAG doesn't have an attached hwe, assign another hwe belonging to the same gt (and different OAM unit) to OAM_SAG. A hwe is needed for batch submissions to program OA HW. v2: Assign an engine with a valid OA unit for OAM_SAG (Umesh) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-5-ashutosh.dixit@intel.com	2025-06-17 11:31:57 -07:00
Ashutosh Dixit	2d1fcec022	drm/xe/oa: Introduce stream->oa_unit Previously, the oa_unit associated with an OA stream was derived from hwe associated with the stream (stream->hwe->oa_unit). This breaks with OAM_SAG since OAM_SAG does not have any attached hardware engines. Resolve this by introducing stream->oa_unit and stop depending on stream->hwe. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-4-ashutosh.dixit@intel.com	2025-06-17 11:31:55 -07:00
Ashutosh Dixit	f3a3fd2c6f	drm/xe/oa: Print hwe to OA unit mapping Print hwe to OA unit mapping to dmesg, to help debug for current and new platforms. v2: Separate out xe_oa_print_gt_oa_units() (Umesh) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-3-ashutosh.dixit@intel.com	2025-06-17 11:31:53 -07:00
Ashutosh Dixit	e04dac12ce	drm/xe/oa/uapi: Expose media OA units On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM) units. One or more SCMI OAM units might be present on a platform. In addition there is another OAM unit for global events, called OAM-SAG. Performance metrics for media workloads can be obtained from these OAM units, similar to OAG. Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA unit without any attached engines. Bspec: 70819, 67103, 63844, 72572, 74476, 61284 v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for < 12.7 platforms v3: Return XE_OA_UNIT_INVALID for < 12.7 to indicate no OAM units v4: Move xe_oa_print_oa_units() to separate patch v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG v6: Introduce DRM_XE_OA_CAPS_OAM Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com	2025-06-17 11:31:50 -07:00
Matthew Brost	2e273e4f85	drm/xe: Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset The parallel scratch layout spans 2k and LRC_ENGINE_ID_PPHWSP_OFFSET lands within than space. This happens to be ok as the offset lands in reserved part of guc_sched_wq_desc, but for future safety move LRC_ENGINE_ID_PPHWSP_OFFSET to the unused offset of 1024 below parallel scratch layout. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250612172850.4170428-1-matthew.brost@intel.com	2025-06-17 08:25:58 -07:00
Matthew Brost	badf45650b	drm/xe: Do not kill VM in PT code on -ENODATA No need kill on -ENODATA as is this non-fatal error can occur when MMU notifiers race with prefetches. Fixes: `09ba0a8f06` ("drm/xe/svm: Implement prefetch support for SVM ranges") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>> Link: https://lore.kernel.org/r/20250613231808.752616-1-matthew.brost@intel.com	2025-06-17 08:25:57 -07:00
André Almeida	183bccafa1	drm: Create a task info option for wedge events When a device get wedged, it might be caused by a guilty application. For userspace, knowing which task was involved can be useful for some situations, like for implementing a policy, logs or for giving a chance for the compositor to let the user know what task was involved in the problem. This is an optional argument, when the task info is not available, the PID and TASK string won't appear in the event string. Sometimes just the PID isn't enough giving that the task might be already dead by the time userspace will try to check what was this PID's name, so to make the life easier also notify what's the task's name in the user event. Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250617124949.2151549-4-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>	2025-06-17 11:32:47 -03:00
Michal Wajdeczko	33c77e00f2	drm/xe/tests: Convert xe_pci tests to parametrized tests Instead of looping over known IP descriptors within single test case, without any diagnostics which IP descriptor is eventually broken, define kunit parameter generators with IP descriptors, and make existing xe_pci tests fully parametrized: [ ] =================== xe_pci (2 subtests) ==================== [ ] ==================== check_graphics_ip ==================== [ ] [PASSED] 12.70 Xe_LPG [ ] [PASSED] 12.71 Xe_LPG [ ] [PASSED] 12.74 Xe_LPG+ [ ] [PASSED] 20.01 Xe2_HPG [ ] [PASSED] 20.04 Xe2_LPG [ ] [PASSED] 30.00 Xe3_LPG [ ] [PASSED] 30.01 Xe3_LPG [ ] ================ [PASSED] check_graphics_ip ================ [ ] ===================== check_media_ip ====================== [ ] [PASSED] 13.00 Xe_LPM+ [ ] [PASSED] 13.01 Xe2_HPM [ ] [PASSED] 20.00 Xe2_LPM [ ] [PASSED] 30.00 Xe3_LPM [ ] ================= [PASSED] check_media_ip ================== [ ] ===================== [PASSED] xe_pci ====================== Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250614182446.2024-1-michal.wajdeczko@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-16 13:48:47 -07:00
Michal Wajdeczko	48f2f7a9fe	drm/xe/tests: Drop unused xe_device_fn typedef We missed to drop it in commit `50680d1698` ("drm/xe/tests: remove unused leftover xe_call_for_each_device()") so drop it now. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250613191938.1980-2-michal.wajdeczko@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-16 13:48:47 -07:00
Lucas De Marchi	1488a3089d	drm/xe: Fix kconfig prompt The xe driver is the official driver for Intel Xe2 and later, while maintaining experimental support for earlier GPUs. Reword the help message accordingly. Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-16 13:28:00 -07:00
Matt Roper	3091bd44cd	drm/xe/pat: Consolidate PAT programming logic for pre-Xe2 and post-Xe2 Now that the PAT settings for the new special entries introduced by Xe2 are decided during early software init and left NULL on platforms they don't apply to, there's no need to keep separate programming functions for pre-Xe2 and post-Xe2 platforms. Consolidate down to a single pair of programming functions (mcr and non-mcr) that can be used on any platform. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250613214751.792066-4-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-16 08:58:59 -07:00
Matt Roper	564e1a82fb	drm/xe/pat: Determine ATS / PTA programming during early sw init Decide whether programming of the special ATS and PTA PAT entries is necessary (and which entries should be programmed) during early software initialization rather than hardcoding this into the 'program' functions. Future platforms may want to re-use the same functions but utilize different special entry values. Consolidating all of the decisions into one place keeps things simple. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250613214751.792066-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-16 08:58:59 -07:00
Tvrtko Ursulin	6bd90e700b	drm/xe: Make dma-fences compliant with the safe access rules Xe can free some of the data pointed to by the dma-fences it exports. Most notably the timeline name can get freed if userspace closes the associated submit queue. At the same time the fence could have been exported to a third party (for example a sync_fence fd) which will then cause an use- after-free on subsequent access. To make this safe we need to make the driver compliant with the newly documented dma-fence rules. Driver has to ensure a RCU grace period between signalling a fence and freeing any data pointed to by said fence. For the timeline name we simply make the queue be freed via kfree_rcu and for the shared lock associated with multiple queues we add a RCU grace period before freeing the per GT structure holding the lock. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250610164226.10817-5-tvrtko.ursulin@igalia.com	2025-06-13 08:28:22 +01:00
Himal Prasad Ghimiray	3ee9f2058a	drm/xe/vm: Add a helper xe_vm_range_tilemask_tlb_invalidation() Introduce xe_vm_range_tilemask_tlb_invalidation(), which issues a TLB invalidation for a specified address range across GTs indicated by a tilemask. v2 (Matthew Brost) - Move WARN_ON_ONCE to svm caller - Remove xe_gt_tlb_invalidation_vma - s/XE_WARN_ON/WARN_ON_ONCE v3 - Rebase Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250609041616.1723636-1-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-06-13 12:51:43 +05:30
Vinay Belgaumkar	bdde16c9ac	drm/xe/bmg: Update Wa_14022085890 Set GT min frequency to 1200Mhz once driver load is complete. v2: Review comments (Rodrigo) v3: Apply Wa earlier so user_req_min is not clobbered. v4: Apply to all GTs (Lucas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-12 23:25:53 -07:00
Vinay Belgaumkar	6ab42fa03d	drm/xe/bmg: Update Wa_16023588340 This allows for additional L2 caching modes. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-12 23:23:39 -07:00
Vinay Belgaumkar	fa42438737	drm/xe/guc: Ignore GuC CT errors when wedged Messaging to GuC may get canceled when device is wedged. Don't flag this as an error in xe_guc_pc code. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-1-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-12 23:23:39 -07:00
Satyanarayana K V P	87c648c313	drm/xe: Add helper function to inject fault into ct_dead_capture() When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() functions, the CI test systems are going out of space and crashing. To avoid this issue, a new helper function is created and when fault is injected into this xe_is_injection_active() helper function, ct dead capture is avoided which suppresses ct dumps in the log. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Suggested-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250612080402.22011-1-satyanarayana.k.v.p@intel.com	2025-06-12 16:53:56 -07:00
Daniele Ceraolo Spurio	0b93b7dcd9	drm/xe: Fix early wedge on GuC load failure When the GuC fails to load we declare the device wedged. However, the very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done before the GT1 GuC objects are initialized, so things go bad when the wedge code attempts to cleanup GT1. To fix this, check the initialization status in the functions called during wedge. Fixes: `7dbe8af13c` ("drm/xe: Wedge the entire device") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: stable@vger.kernel.org # v6.12+: `1e1981b16b`: drm/xe: Fix taking invalid lock on wedge Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-12 15:17:12 -07:00
Matthew Brost	3a1edef8f4	drm/xe: Make WA BB part of LRC BO No idea why, but without this GuC context switches randomly fail when running IGTs in a loop. Need to follow up why this fixes the aforementioned issue but can live with a stable driver for now. Fixes: `617d824c53` ("drm/xe: Add WA BB to capture active context utilization") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com	2025-06-12 10:51:19 -07:00
Matthew Brost	0fccfb635e	drm/xe: Use WRITE_ONCE for range->tile_invalidated update Updating range->tile_invalidated should be done with WRITE_ONCE to pair with READ_ONCE in opportunistic checks. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhrost <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250604234712.2441130-1-matthew.brost@intel.com	2025-06-12 10:43:52 -07:00
Matthew Brost	265fa0692b	drm/xe: Don't use drm exec locking in SVM pagefaults Only the VM dma-resv lock is needed in SVM pagefaults so xe_vm_lock/unlock can be used instead of drm exec. Micro optimization but should save some CPU cycles in a critical path. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250603174012.2195759-1-matthew.brost@intel.com	2025-06-12 10:43:40 -07:00
Lucas De Marchi	9c7632faad	drm/xe/lrc: Use a temporary buffer for WA BB In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: `617d824c53` ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `ef48715b2d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-12 18:09:50 +02:00
Lucas De Marchi	0ed4b3c21c	drm/xe/lrc: Prepare WA BB setup for more users The post context restore (WA BB) is a mechanism in HW that may be used for things other than the utilization setup. Create a new function called setup_wa_bb() that wraps any function writing useful commands in the buffer. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-2-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-11 15:03:05 -07:00
Lucas De Marchi	ef48715b2d	drm/xe/lrc: Use a temporary buffer for WA BB In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: `82b98cadb0` ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-11 15:03:05 -07:00
Shekhar Chauhan	26ff87d2e7	drm/xe/xe2_hpg: Define additional Xe2_HPG GMD_ID Add another GMD_ID for Xe2_HPG Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250605190804.1287289-4-dnyaneshwar.bhadane@intel.com	2025-06-11 07:29:49 -07:00
Shekhar Chauhan	a5d221924e	drm/xe/xe2_hpg: Add set of workarounds Add set of workarounds for xe2_hpg. -v2: Fix xe2_hpg GMD version for some workarounds. -v3: Removed extra Workaround (Matt Roper) Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250605190804.1287289-3-dnyaneshwar.bhadane@intel.com	2025-06-11 07:29:48 -07:00
Jani Nikula	9d4e26042c	drm/i915/display: drop i915_reg.h include where possible A number of files have unnecessary i915_reg.h includes. Drop them. Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Link: https://lore.kernel.org/r/7c4002322f4d8132fd2eaa1a4d688539cdd043c3.1749469962.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-11 14:03:06 +03:00
Vivek Kasireddy	5f105b2e49	Revert "drm/xe/display: use xe->display to decide whether to do anything" This reverts commit `5a9f299f95`. The following crash/regression was seen with the reverted commit on a specific BMG SKU with no display capabilities: [ 115.582833] BUG: kernel NULL pointer dereference, address: 00000000000005d0 [ 115.589775] #PF: supervisor write access in kernel mode [ 115.594976] #PF: error_code(0x0002) - not-present page [ 115.600088] PGD 0 P4D 0 [ 115.602617] Oops: Oops: 0002 [#1] SMP [ 115.606267] CPU: 14 UID: 0 PID: 1547 Comm: kworker/14:3 Tainted: G U E 6.15.0-local+ #62 PREEMPT(voluntary) [ 115.617332] Tainted: [U]=USER, [E]=UNSIGNED_MODULE [ 115.622100] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPEMI1.R00.3471.D49.2401260852 01/26/2024 [ 115.635314] Workqueue: pm pm_runtime_work [ 115.639309] RIP: 0010:_raw_spin_lock+0x17/0x30 [ 115.662382] RSP: 0018:ffffd13f82e7bc30 EFLAGS: 00010246 [ 115.667581] RAX: 0000000000000000 RBX: ffff8be919076000 RCX: 0000000000000002 [ 115.674675] RDX: 0000000000000001 RSI: 000000000000004b RDI: 00000000000005d0 [ 115.681775] RBP: ffffd13f82e7bc60 R08: ffffd13f82e7bb00 R09: ffff8beb0c1b06c0 [ 115.688869] R10: ffff8be7c034f4c0 R11: fefefefefefefeff R12: fffffffffffffff0 [ 115.695965] R13: ffff8be9190762e8 R14: ffff8be919077798 R15: 00000000000005d0 [ 115.703062] FS: 0000000000000000(0000) GS:ffff8beb552b6000(0000) knlGS:0000000000000000 [ 115.711106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.716826] CR2: 00000000000005d0 CR3: 000000024c68d002 CR4: 0000000000f72ef0 [ 115.723921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.731015] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 115.738113] PKRU: 55555554 [ 115.740816] Call Trace: [ 115.743258] <TASK> [ 115.745363] ? xe_display_flush_cleanup_work+0x92/0x120 [xe] [ 115.751102] xe_display_pm_runtime_suspend+0x42/0x80 [xe] [ 115.756542] xe_pm_runtime_suspend+0x11b/0x1b0 [xe] [ 115.761463] xe_pci_runtime_suspend+0x23/0xd0 [xe] [ 115.766291] pci_pm_runtime_suspend+0x6b/0x1a0 [ 115.770717] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.774797] __rpm_callback+0x48/0x1e0 [ 115.778531] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.782614] rpm_callback+0x66/0x70 [ 115.786090] ? pci_pm_thaw_noirq+0xa0/0xa0 [ 115.790173] rpm_suspend+0xe1/0x5e0 [ 115.793647] ? psi_task_switch+0xb8/0x200 [ 115.797643] ? finish_task_switch.isra.0+0x8d/0x270 [ 115.802502] pm_runtime_work+0xa6/0xc0 [ 115.806238] process_one_work+0x186/0x350 [ 115.810234] worker_thread+0x33a/0x480 [ 115.813968] ? process_one_work+0x350/0x350 [ 115.818132] kthread+0x10c/0x220 [ 115.821350] ? kthreads_online_cpu+0x120/0x120 [ 115.825774] ret_from_fork+0x3a/0x60 [ 115.829339] ? kthreads_online_cpu+0x120/0x120 [ 115.833768] ret_from_fork_asm+0x11/0x20 [ 115.829339] ? kthreads_online_cpu+0x120/0x120 [ 115.833768] ret_from_fork_asm+0x11/0x20 [ 115.837680] </TASK> [ 115.839907] acpi_tad(E) drm(E) [ 115.931629] CR2: 00000000000005d0 [ 115.934935] ---[ end trace 0000000000000000 ]--- [ 115.939531] RIP: 0010:_raw_spin_lock+0x17/0x30 We cannot yet use xe->display to determine whether display hardware has been successfully probed/initialized or not. This is because xe->display would not be set to NULL even with GPUs with no display capabilities (e.g, GMD_ID_DISPLAY = 0). However, this might change in the future as Xe and i915 code is unified to deal with no display cases. Therefore, for now we have to continue to rely on xe->info.probe_display (which would be set to false with display-less GPUs) to decide whether to invoke any display related functions or not. Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250605054247.386633-1-vivek.kasireddy@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2025-06-11 13:31:33 +03:00
Thomas Zimmermann	c598d5eb9f	Merge drm/drm-next into drm-misc-next Backmerging to forward to v6.16-rc1 Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-06-11 09:01:34 +02:00
Matthew Brost	10201c7de5	drm/xe: Reorder 'Get pages failed' message Print the error from get pages failing, not the cast to -ENODATA. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>> Link: https://lore.kernel.org/r/20250610045649.3149801-1-matthew.brost@intel.com	2025-06-10 07:04:07 -07:00
Thomas Hellström	86e2d052c2	Merge drm/drm-next into drm-xe-next Backmerging to bring in 6.16 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-09 18:54:05 +02:00
Matt Roper	b5735e5e71	drm/xe: GSM size should be constant on most platforms On old Intel platforms, the size of the GSM (i.e., the stolen memory that holds the GGTT page table entries) could vary, so the driver needed to read the actual size from the PCI config space. However from Xe_HP onward, the GSM is now always guaranteed to be exactly 8MB (which translates to a 4GB GGTT address space); this is always true regardless of what the platform's much larger PPGTT address space is. The bspec doesn't document the PCI config space as being a valid way to query the size of the GSM after Xe_LP platforms, although so far it still seems to be giving us proper values for Xe_HP, Xe2, and Xe3. However we suspect that the config space will stop providing correct values on some upcoming platforms, so we should stop relying on it. Instead just use the hardcoded 8MB value as documented elsewhere in the bspec. Bspec: 49636, 67090, 50589 Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250605225352.2333981-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-09 09:03:33 -07:00
Maarten Lankhorst	d6fb4f0173	drm/xe/svm: Fix regression disallowing 64K SVM migration When changing the condition from >= SZ_64K, it was changed to <= SZ_64K. This disallows migration of 64K, which is the exact minimum allowed. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5057 Fixes: `794f5493f5` ("drm/xe: Strict migration policy for atomic SVM faults") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://lore.kernel.org/r/20250521090102.2965100-1-dev@lankhorst.se (cherry picked from commit `531bef26d1`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-09 17:10:20 +02:00
Michal Wajdeczko	227c394d13	drm/xe/uc: Use GT-oriented firmware messages We are already prepared to define firmwares per-GT type, so we should also prepare our messages to be GT-oriented. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606204311.813-1-michal.wajdeczko@intel.com	2025-06-09 16:01:52 +02:00

1 2 3 4 5 ...

3762 Commits