Wire up suspend/resume handles for I2C controller to match its power
state with SGUnit.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Adding adaption/glue layer where the I2C host adapter
(Synopsys DesignWare I2C adapter) and the I2C clients (the
microcontroller units) are enumerated.
The microcontroller units (MCU) that are attached to the GPU
depend on the OEM. The initially supported MCU will be the
Add-In Management Controller (AMC).
Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags and SPDX format in the .c file]
Since we are already using reusable buffer objects from the GuC
buffer cache, we can directly write into their CPU pointers and
spare unnecessary temporary allocation.
While around, also make sure to clear obtained buffer, to avoid
sending some stale data.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
While we print VF's configuration KLVs only under DEBUG_SRIOV
config, we should be doing it at debug level, not info level.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250703145709.1832-1-michal.wajdeczko@intel.com
While we already print VF's runtime registers only under DEBUG_SRIOV
config, we should be still doing it at debug level, not info.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://lore.kernel.org/r/20250604190021.725-2-michal.wajdeczko@intel.com
Add sysfs attributes for late binding features which expose bound version
to the user.
v2: Rework attribute and macro naming (Badal)
v3: Drop fancy formatting (Rodrigo)
v4: Form version string using local variables (Rodrigo)
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250709164224.2676086-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
xe_pm_runtime_put() is missed to be called for the error path in
xe_devcoredump_read().
Add function description comments for xe_devcoredump_read() to help
understand it.
v2: more detail function comments and refine goto logic (Matt)
Fixes: c4a2e5f865 ("drm/xe: Add devcoredump chunking")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-6-shuicheng.lin@intel.com
The deleted code is no longer needed because patch "drm/xe/guc: Plumb
GuC-capture into dev coredump" has removed the related usage code.
Remove the code to tidy up the function.
v2: s/bacause/because
Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250707004911.3502904-5-shuicheng.lin@intel.com
xe_bo_evict_all() is called after xe_display_pm_suspend(). So if there
is error with xe_bo_evict_all(), display pm should be restored.
Fixes: 51462211f4 ("drm/xe/pxp: add PXP PM support")
Fixes: cb8f81c175 ("drm/xe/display: Make display suspend/resume work on discrete")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250708035424.3608190-2-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Panther Lake has proven to be stable through testing and use.
Remove the force_probe requirement and enable the platform by default.
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/i915 feature pull #2 for v6.17:
Features and functionality:
- Add drm_panic support for both i915 and xe drivers (Jocelyn Falempe)
- Add initial flip queue implementation, disabled by default, for LNL and PTL
(Ville)
- Add support for Wildcat Lake (WCL) display, version 30.02 (Matt Roper, Matt
Atwood, Dnyaneshwar)
- Extend drm_panel and follower support to DDI eDP (Arun)
Refactoring and cleanups:
- Make all global state objects opaque (Jani)
- Move display works to display specific unordered workqueue (Luca)
- Add and use struct drm_device based pcode interface (Jani, Lucas)
- Use clamp() instead of max()+min() combo (Ankit)
- Simplify wait for power well disable (Jani)
- Various stylistics cleanups and renames (Jani)
Fixes:
- Deal with loss of pipe DMC state (Ville)
- Fix PTL HDCP2 stream status check (Suraj)
- Add workaround for ADL-P DKL PHY DP and HDMI (Nemesa)
- Fix skl_print_wm_changes() stack usage with KMSAN (Arnd Bergmann)
- Fix PCON capability reads on non-branch devices (Chaitanya)
- Fix which platforms have ultra joiner (Ankit)
DRM core changes:
- Add ttm_bo_kmap_try_from_panic() for xe drm_panic support (Jocelyn Falempe)
- Add private pointer to struct drm_scanout buffer for xe/i915 drm_panic support
(Jocelyn Falempe)
Merges:
- Backmerge drm-next for drm_panel and xe changes (Jani)
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6d728bf6ef23681b00dfbc7da9aeae41042dee02@intel.com
There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.
To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.
v2 (Jonathan): tweak the commit message
Fixes: 523f191cc0 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
(cherry picked from commit f7a2fd776e)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
This reverts commit fe0154cf82.
Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.
Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.
Cc: stable@vger.kernel.org
Fixes: fe0154cf82 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 03d85ab36b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
CIRC_SPACE does not work unless the size argument is a power of 2,
allocate PF queue size on power of 2 boundary.
Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Fixes: 29582e0ea7 ("drm/xe: Add page queue multiplier")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
(cherry picked from commit 491b978312)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Our LMEM buffer objects are not cleared by default on alloc
and during VF provisioning we only setup LMTT PTEs for the
actually provisioned LMEM range. But beyond that valid range
we might leave some stale data that could either point to some
other VFs allocations or even to the PF pages.
Explicitly clear all new LMTT page to avoid the risk that a
malicious VF would try to exploit that gap.
While around add asserts to catch any undesired PTE overwrites
and low-level debug traces to track LMTT PT life-cycle.
Fixes: b1d2040582 ("drm/xe/pf: Introduce Local Memory Translation Table")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
(cherry picked from commit 3fae6918a3)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.
To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.
v2 (Jonathan): tweak the commit message
Fixes: 523f191cc0 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
This reverts commit fe0154cf82.
Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.
Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.
Cc: stable@vger.kernel.org
Fixes: fe0154cf82 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There is a remote chance that after migration, some GTs will not
send the MIGRATED interrupt, or due to current VF KMD state the
interrupt will not lead to marking the GT for recovery.
Requiring IRQs from all GTs before starting migration introduces
the possibility that the process will get stalled due to one GuC.
One could argue it is also waste of time to wait for all IRQs,
but we should get them all IRQs as soon as VGPU starts, so that's
not really an impactful argument.
Still, not waiting for all GTs makes it easier to handle situations:
* where one GuC IRQ is missing
* where state before probe is unclean - getting MIGRATED IRQ as soon
as interrupts are enabled
* where multiple migrations happen close to each other
To help with these cases, this patch alters the post-migration
recovery so that recovery task is started as soon as one GuC IRQ
is handled, and other GTs are included in recovery later as the
subsequent IRQs are serviced.
The post-migration recovery can now be called for any selection of
GTs, and it will perform recovery on all GTs for which IRQs have
arrived, even multiple times if necessary.
v2: Typos and style fixes
v3: Transferring gt_flags by value rather than reference to last
function where it is used
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250630152155.195648-1-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
CIRC_SPACE does not work unless the size argument is a power of 2,
allocate PF queue size on power of 2 boundary.
Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Fixes: 29582e0ea7 ("drm/xe: Add page queue multiplier")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com
Our LMEM buffer objects are not cleared by default on alloc
and during VF provisioning we only setup LMTT PTEs for the
actually provisioned LMEM range. But beyond that valid range
we might leave some stale data that could either point to some
other VFs allocations or even to the PF pages.
Explicitly clear all new LMTT page to avoid the risk that a
malicious VF would try to exploit that gap.
While around add asserts to catch any undesired PTE overwrites
and low-level debug traces to track LMTT PT life-cycle.
Fixes: b1d2040582 ("drm/xe/pf: Introduce Local Memory Translation Table")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com
When a user closes an exec queue or interrupts an app with Ctrl-C,
this does not warrant wedging the device in mode 2.
Avoid this by skipping the wedge check for killed exec queues in
the TDR and LR exec queue cleanup worker.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com
(cherry picked from commit 5a2f117a80)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
This WA is applicable to BMG as well.
Note that this is a GSC WA and we don't load the GSC on BMG, so
extending the WA to BMG won't do anything right now. However, it helps
future-proof the driver so that if we ever turn the GSC on we won't have
to remember to extend this WA.
v2: don't use VERSION_RANGE from 2001 to 2004 (Matt)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 1a5ce0c5b9)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Validate gt instead of checking gt_id is lesser
than max gts per tile
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250630093741.2435281-1-riana.tauro@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
The 'id' value updated by for_each_gt() is the uapi GT ID of the GTs
being iterated over, and may skip over values if a GT is not present on
the device. Use a separate iterator for GT list array assignments to
ensure that the array will be filled properly on future platforms where
index in the GT query list may not match the uapi ID.
v2:
- Include the missing increment of the iterator. (Jonathan)
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-16-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
On current platforms with multiple GTs, all of the GT IDs are
consecutive; as a result we know that the GT IDs range from 0 to
gt_count-1 and can determine if a GT ID is valid by comparing against
the count. The consecutive nature of GT IDs may not hold true on future
platforms if/when we have platforms that are both multi-tile and have
multiple GTs within each tile. Once such platforms exist, it's quite
possible that we could wind up with something like a GT list composed of
IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with
media only on the second tile).
To future-proof the code we should stop comparing against the GT count
to determine whether a GT ID is valid or not. Instead we should do an
actual lookup of the ID to determine whether the GT exists. This also
means that our GT loop macro should not end at the GT count, but should
rather examine the entire space up to (# of tiles) * (max GT per tile)
to ensure it doesn't stop prematurely.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive
characteristics on all of our platforms today, this may not always be
true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that
should work even if future platforms have different configurations.
This patch should not change the behavior of current platforms; it only
future-proofs for potential future designs.
v2:
- Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI)
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Add a simple kunit test to ensure each platform's GT per tile count is
non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition.
We need to move 'struct xe_subplatform_desc' from the .c file to the
types header to ensure it is accessible from the kunit test.
v2:
- Rebase on latest xe_pci test rework from Michal and convert to
a parameterized test that runs on each PCI ID supported by the
driver.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Today all of our platforms fall into one of three cases:
* Single tile platforms with a single (primary) GT
* Single tile platforms with two GTs (primary + media)
* Two-tile platforms with a single GT (primary) in each
Our numbering of GTs has been a bit inconsistent between platforms
(e.g., GT1 is the media GT on some platforms, but the second tile's
primary GT on others). In the future we'll likely have platforms that
are both multi-tile and multi-GT, which will make the situation more
confusing. We could also wind up with more than just two types of GTs
at some point in the future.
Going forward we should standardize the way we assign uapi GT IDs to
internal GT structures. Let's declare that for userspace GT ID n,
GT[n]'s tile = n / (max gt per tile)
GT[n]'s slot within tile = n % (max gt per tile)
We don't want the GT numbering to change for any of our current
platforms since the current IDs are part of our ABI contract with
userspace so this means we should track the 'max gt per tile' value on a
per-platform basis rather than just using a single value across the
driver. Encode this into device descriptors in xe_pci.c and use the
per-platform number for various checks in the code. Constant
XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms
for easy of sizing array allocations.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
xe_step_name() is used by xe_assert(), so adding assertions to functions
like xe_device_get_gt() will result in
ERROR: modpost: "xe_step_name" [drivers/gpu/drm/xe/tests/xe_test.ko] undefined!
while building the kunit tests. Export xe_step_name to avoid these
build failures when adding assertions.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-11-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
If we fail to allocate a workqueue we will leak kzalloc'ed group
object since it was designed to be kfree'ed in the drmm cleanup
action, but we didn't have a chance to register this action yet.
To avoid this leak allocate a group object using drmm_kzalloc()
and start using predefined drmm action to release the workqueue.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250627184143.1480-1-michal.wajdeczko@intel.com
Fix Kconfig symbol dependency on KUNIT, which isn't actually required
for XE to be built-in. However, if KUNIT is enabled, it must be built-in
too.
Fixes: 08987a8b68 ("drm/xe: Fix build with KUNIT=m")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Harry Austen <hpausten@protonmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a559434880)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The xe driver is the official driver for Intel Xe2 and later, while
maintaining experimental support for earlier GPUs. Reword the help
message accordingly.
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1488a3089d)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.
v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
- User requests wait via wait_var_event_timeout (Lucas)
- Close races on flush + user requests (Lucas)
- Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
rather than root gt (Lucas)
v4:
- Only apply the freq reducing part if a TDF is needed: L2 flush trumps
the need for waiting a lower frequency
Fixes: aaa08078e7 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit deea6a7d6d)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Set GT min frequency to 1200Mhz once driver load is complete.
v2: Review comments (Rodrigo)
v3: Apply Wa earlier so user_req_min is not clobbered.
v4: Apply to all GTs (Lucas)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit bdde16c9ac)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 5e300ed8a5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times
to stash the current frequencies. It's not a problem since
xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later
in the init sequence. However, now that we have _locked() variants for
this functions, use them and avoid potential issues when called from
other places or using the same pattern.
While at it, prefer and early return for the WA check to reduce
indentation.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit d878c97daa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There are places in which the getters/setters are called one after the
other causing a multiple lock()/unlock(). These are not currently a
problem since they are all happening from the same thread, but there's a
race possibility as calls are added outside of the early init when the
max/min and stashed values need to be correlated.
Add the _locked() variants to prepare for that.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1beae9aa2b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
No idea why, but without this GuC context switches randomly fail when
running IGTs in a loop. Need to follow up why this fixes the
aforementioned issue but can live with a stable driver for now.
Fixes: 617d824c53 ("drm/xe: Add WA BB to capture active context utilization")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com
(cherry picked from commit 3a1edef8f4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE.
The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the
condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum
valid value for x is 0x1FE, not 0x1FF.
v2
- Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej)
v3
- Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas)
Bspec: 60246
Fixes: 9c44fd5f6e ("drm/xe: Add migrate layer functions for SVM support")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian3 Nguyen <brian3.nguyen@intel.com>
Cc: Alex Zuo <alex.zuo@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit c038bdba98)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Attempt to consolidate the LRC offsets calculations by aligning the
recently added wa_bb_offset with the naming scheme in the file and
also change the size stored in struct xe_lrc to not include the ring
buffer.
The former makes it somewhat visually easier to follow the layout of the
various logical blocks stored in the LRC bo, while the latter reduces the
number of sprinkled around calculations.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250630124711.8209-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Fix Kconfig symbol dependency on KUNIT, which isn't actually required
for XE to be built-in. However, if KUNIT is enabled, it must be built-in
too.
Fixes: 08987a8b68 ("drm/xe: Fix build with KUNIT=m")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Harry Austen <hpausten@protonmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
If CONFIG_DRM_XE_DISPLAY is set, the xe module can only be built as
module to avoid duplicate symbols from i915. The interface for pcode was
added without considering that, so the build breaks if both xe and i915
are built-in.
Since the intel_pcode_* functions should only be called from the display
side (xe side should call the xe interface directly) and there's already
a protection in Kconfig to avoid the problematic configuration, ifdef it
out in case CONFIG_DRM_XE_DISPLAY is disabled.
Closes: https://lore.kernel.org/r/3667a992-a24b-4e49-aab2-5ca73f2c0a56@infradead.org
Fixes: d9465cc8ac ("drm/xe/pcode: add struct drm_device based interface")
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-1-756fe5cd56cf@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.
v2:
- Fix typo in kernel doc (Ashutosh)
- Fix kunit (CI)
- Fix line wrap (Checkpatch)
v3:
- Fix sriov build (CI)
v4:
- Fix display build (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
The Dynamic Inhibit Context Switch is an optimization aimed at reducing
the amount of time the HW is stuck waiting on an unsatisfied semaphore.
When this optimization is enabled, the GuC will dynamically modify the
CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of
LRCs to enable immediate switching out on an unsatisfied semaphore wait
when multiple contexts are competing for time on the same engine.
This feature is available on recent HW from GuC 70.40.1 onwards and it
is enabled via a per-VF feature opt-in.
v2: rebase
v3: switch to using guc_buf_cache instead of dedicated alloc
v4: add helper to check for feature availability (Michal), don't enable
if multi-lrc is possible.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com
On newer HW (Xe2 onwards + PVC) it is possible to get extra information
when a CAT error occurs, specifically a dword reporting the error type.
To enable this extra reporting, we need to opt-in with the GuC, which is
done via a specific per-VF feature opt-in H2G.
On platforms where the HW does not support the extra reporting, the GuC
will set the type to 0xdeadbeef, so we can keep the code simple and
opt-in to the feature on every platform and then just discard the data
if it is invalid.
Note that on native/PF we're guaranteed that the opt in is available
because we don't support any GuC old enough to not have it, but if we're
a VF we might be running on a non-XE PF with an older GuC, so we need to
handle that case. We can re-use the invalid type above to handle this
scenario the same way as if the feature was not supported in HW.
Given that this patch is the first user of the guc_buf_cache on native
and VF, it also extends that feature to non-PF use-cases.
v2: simpler print for the error type (John), rebase
v3: use guc_buf_cache instead of new alloc, simpler doc (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com
Provide the lower level code for PIPEDMC based flip queue.
We'll use the so called semi-full flip queue mode where the
PIPEDMC will start the provided DSB on a scanline a little
ahead of the vblank. We need to program the triggering scanline
early enough so that the DSB has enough time to complete writing
all the double buffered registers before they get latched (at
start of vblank).
The firmware implements several queues:
- 3 "plane queues" which execute a single DSB per entry
- 1 "general queue" which can apparently execute 2 DSBs per entry
- 1 vestigial "fast queue" that replaced the "simple flip queue"
on ADL+, but this isn't supposed to be used due to issues.
But we only need a single plane queue really, and we won't actually
use it as a real queue because we don't allow queueing multiple commits
ahead of time. So the whole thing is perhaps useless. I suppose
there migth be some power saving benefits if we would get the flip
scheduled by userspace early and then could keep some hardware powered
off a bit longer until the DMC kicks off the flipq programming. But that
is pure speculation at this time and needs to be proven.
The code to hook up the flip queue into the actual atomic commit
path will follow later.
TODO: need to think how to do the "wait for DMC firmware load" nicely
need to think about VRR and PSR
etc.
v2: Don't write DMC_FQ_W2_PTS_CFG_SEL on pre-lnl
Don't oops at flipq init if there is no dmc
v3: Adapt to PTL+ flipq changes (different queue entry
layout, different trigger event, need VRR TG)
Use the actual CDCLK frequency
Ask the DSB code how long things are expected to take
v3: Adjust the cdclk rounding (docs are 100% vague, Windows
rounds like this)
Initialize some undocumented magic DMC variables on PTL
v4: Use PIPEDMC_FQ_STATUS for busy check (the busy bit in
PIPEDMC_FQ_CTRL is apparently gone on LNL+)
Based the preempt timeout on the max exec time
Preempt before disabling the flip queue
Order the PIPEDMC_SCANLINECMP* writes a bit more carefully
Fix some typos
v5: Try to deal with some clang-20 div-by-zero false positive (Nathan)
Add some docs (Jani)
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
epr
Link: https://patchwork.freedesktop.org/patch/msgid/20250624170049.27284-5-ville.syrjala@linux.intel.com
On Alder Lake and later, it's not possible to disable tiling when DPT
is enabled.
So this commit implements Y-Tiling support, to still be able to draw
the panic screen.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250624091501.257661-10-jfalempe@redhat.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Implement both functions for i915 and xe, they prepare the work for
drm_panic support.
They both use kmap_try_from_panic(), and map one page at a time, to
write the panic screen on the framebuffer.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250624091501.257661-8-jfalempe@redhat.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Encapsulate the struct intel_framebuffer into an xe_framebuffer
or i915_framebuffer, and allow to add specific fields for each
variant for the panic use-case.
This is particularly needed to have a struct xe_res_cursor available
to support drm panic on discrete GPU.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250624091501.257661-7-jfalempe@redhat.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
The vaddr of the fbdev framebuffer is private to the struct
intel_fbdev, so this function is needed to access it for drm_panic.
Also the struct i915_vma is different between i915 and xe, so it
requires a few functions to access fbdev->vma->iomap.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250624091501.257661-3-jfalempe@redhat.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE.
The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the
condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum
valid value for x is 0x1FE, not 0x1FF.
v2
- Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej)
v3
- Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas)
Bspec: 60246
Fixes: 9c44fd5f6e ("drm/xe: Add migrate layer functions for SVM support")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian3 Nguyen <brian3.nguyen@intel.com>
Cc: Alex Zuo <alex.zuo@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
It feels to me like load is closer to the intention than init_hw.
It makes the init calls slightly less confusing to me. :)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-24-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
xe_uc_init_hw() is called multiple times from xe_gt.c,
and that makes the name xe_uc_fini_hw(), called for a different
reason in xe_guc.c confusing.
Remove it and inline the xe_uc_sanitize_reset into xe_guc.c directly.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-23-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Now that all previous allocations are gone, ensure no new allocations
will ever be done before xe_display_init_early(), by moving the call
that allows allocations downwards.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Now that we added the separate step of initialising GUC in
xe_gt_init_early, it should be ok to initialise the minimum during early
init, and the rest after allocations are allowed.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
s/gt_fw_domain_init/gt_init_with_gt_forcewake()/
s/all_fw_domain_init/gt_init_with_all_forcewake()/
Clarify that the functions are the part of gt_init() that are called
with the respective power domains held. all_domain() of course only
works after discovering and initialisation of force_wake on all engines,
that's why the split is needed in the first place.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-19-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
After discussion with Lucas De Marchi, it turns out that is the
specific caller requiring a dump. This allows us to cleanup
gt_init in a bit.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-18-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
After mcr_init_early, we need to be able to do VRAM and CCS probing
without hwconfig probe. Fortunately the relevant registers are all
instance 0, which fortunately means no dependencies on further initialization
is required.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-17-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Add a 2-stage GuC init. An early one for everything that is needed
for VF, and a full one that loads GuC and is allowed to do allocations.
Link: https://lore.kernel.org/r/20250619104858.418440-16-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
We want to split up GUC init to an alloc and noalloc part to keep the
init path the same for VF and !VF as much as possible.
Everything in vf_guc_init should be done as early as possible, otherwise
VRAM probing becomes impossible.
Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up
the init in xe_device slightly.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
memirqs require allocations into GGTT, which we cannot use until
after display is enabled.
Now that the initialisation of interrupts is postponed, move memirq
init too.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-14-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Add runtime PM since we might call populate_mm on a foreign device.
v3:
- Fix a kerneldoc failure (Matt Brost)
- Revert the bo type change from device to kernel (Matt Brost)
v4:
- Add an assert in xe_svm_alloc_vram (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-4-thomas.hellstrom@linux.intel.com
The migration functionality and track-keeping of per-pagemap VRAM
mapped to the CPU mm is not per GPU_vm, but rather per pagemap.
This is also reflected by the functions not needing the drm_gpusvm
structures. So move to drm_pagemap.
With this, drm_gpusvm shouldn't really access the page zone-device-data
since its meaning is internal to drm_pagemap. Currently it's used to
reject mapping ranges backed by multiple drm_pagemap allocations.
For now, make the zone-device-data a void pointer.
Alter the interface of drm_gpusvm_migrate_to_devmem() to ensure we don't
pass a gpusvm pointer.
Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.
Matt is listed as author of this commit since he wrote most of the code,
and it makes sense to retain his git authorship.
Thomas mostly moved the code around.
v3:
- Kerneldoc fixes (CI)
- Don't update documentation about how the drm_pagemap
migration should be interpreted until upcoming
patches where the functionality is implemented.
(Matt Brost)
v4:
- More kerneldoc fixes around timeslice_ms
(Himal Ghimiray, Matt Brost)
v6:
- Fix an uninitialized pagemap pointer (CI)
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-2-thomas.hellstrom@linux.intel.com
To avoid duplicating the tricky bo locking implementation,
Implement ttm_lru_walk_for_evict() using the guarded bo LRU iteration.
To facilitate this, support ticketlocking from the guarded bo LRU
iteration.
v2:
- Clean up some static function interfaces (Christian König)
- Fix Handling -EALREADY from ticketlocking in the loop by
skipping to the next item. (Intel CI)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250623155313.4901-4-thomas.hellstrom@linux.intel.com
Instead of the struct ttm_operation_ctx, Pass a struct ttm_lru_walk_arg
to enable us to easily extend the walk functionality, and to
implement ttm_lru_walk_for_evict() using the guarded LRU iteration.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250623155313.4901-3-thomas.hellstrom@linux.intel.com
During driver probe we might be briefly using CT safe mode, which
is based on a delayed work, but usually we are able to stop this
once we have IRQ fully operational. However, if we abort the probe
quite early then during unwind we might try to destroy the workqueue
while there is still a pending delayed work that attempts to restart
itself which triggers a WARN.
This was recently observed during unsuccessful VF initialization:
[ ] xe 0000:00:02.1: probe with driver xe failed with error -62
[ ] ------------[ cut here ]------------
[ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
[ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
[ ] RIP: 0010:__queue_work+0x287/0x710
[ ] Call Trace:
[ ] delayed_work_timer_fn+0x19/0x30
[ ] call_timer_fn+0xa1/0x2a0
Exit the CT safe mode on unwind to avoid that warning.
Fixes: 09b286950f ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com
(cherry picked from commit 2ddbb73ec2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Only need the flush for DPT host updates here. Normal GGTT updates don't
need special flush.
Fixes: 01570b4469 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 35db1da40c)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Flushing l2 is only needed after all data has been written.
Fixes: 01570b4469 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 0dd2dd0182)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
When a user closes an exec queue or interrupts an app with Ctrl-C,
this does not warrant wedging the device in mode 2.
Avoid this by skipping the wedge check for killed exec queues in
the TDR and LR exec queue cleanup worker.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250624174103.2707941-1-matthew.brost@intel.com
This reverts commit 3972872e45.
There are several things wrong with the way this WA was implemented:
- The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't
apply it unconditionally.
- The KLV requires 2 DWs of data, which are not currently provided.
The GuC currently ignores any unknown KLVs, so on versions older that
70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts
to parse the KLV and fails due to the missing data, causing a GuC load
abort.
Given that 70.47.0 is the first GuC version approved for public release
for PTL, let's revert this patch so it doesn't cause the GuC load to
fail with that blob. We can then re-apply it properly fixed after the
GuC definition is merged, which will also have the added benefit of
running the KLV addition through CI with the right GuC version.
Fixes: 3972872e45 ("drm/xe/ptl: Apply Wa_16026007364")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: sanirban <sk.anirban@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
During driver probe we might be briefly using CT safe mode, which
is based on a delayed work, but usually we are able to stop this
once we have IRQ fully operational. However, if we abort the probe
quite early then during unwind we might try to destroy the workqueue
while there is still a pending delayed work that attempts to restart
itself which triggers a WARN.
This was recently observed during unsuccessful VF initialization:
[ ] xe 0000:00:02.1: probe with driver xe failed with error -62
[ ] ------------[ cut here ]------------
[ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
[ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
[ ] RIP: 0010:__queue_work+0x287/0x710
[ ] Call Trace:
[ ] delayed_work_timer_fn+0x19/0x30
[ ] call_timer_fn+0xa1/0x2a0
Exit the CT safe mode on unwind to avoid that warning.
Fixes: 09b286950f ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com
Prevent other bits of mailbox power limit from being overwritten with 0.
This issue was due to a missing read and modify of current power limit,
before setting a requested mailbox power limit, which is added in this
patch.
v2:
- Improve commit message. (Anshuman)
v3:
- Rebase.
- Rephrase commit message. (Riana)
- Add read-modify-write variant of xe_hwmon_pcode_write_power_limit()
i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal)
- Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits.
- Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits
writes use xe_hwmon_pcode_rmw_power_limit() only.
v4:
- Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever
applicable. (Riana)
Fixes: 25a2aa779f ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 8aa7306631)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Only need the flush for DPT host updates here. Normal GGTT updates don't
need special flush.
Fixes: 01570b4469 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Flushing l2 is only needed after all data has been written.
Fixes: 01570b4469 ("drm/xe/bmg: implement Wa_16023588340")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.
v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
- User requests wait via wait_var_event_timeout (Lucas)
- Close races on flush + user requests (Lucas)
- Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
rather than root gt (Lucas)
v4:
- Only apply the freq reducing part if a TDF is needed: L2 flush trumps
the need for waiting a lower frequency
Fixes: aaa08078e7 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
pc_set_mert_freq_cap() currently lock()/unlock() the mutex multiple times
to stash the current frequencies. It's not a problem since
xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later
in the init sequence. However, now that we have _locked() variants for
this functions, use them and avoid potential issues when called from
other places or using the same pattern.
While at it, prefer and early return for the WA check to reduce
indentation.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There are places in which the getters/setters are called one after the
other causing a multiple lock()/unlock(). These are not currently a
problem since they are all happening from the same thread, but there's a
race possibility as calls are added outside of the early init when the
max/min and stashed values need to be correlated.
Add the _locked() variants to prepare for that.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
With display code using the struct drm_device based pcode interface, we
can drop the old pcode compat interface.
We can also drop the __compat_uncore_to_tile() helper from
intel_uncore.h compat header.
Turns out a couple of headers depended on the intel_uncore.h include via
intel_pcode.h. Fix them.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/948016a031dcb2acef0c97071aac09fa49613e07.1750678991.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In preparation for dropping the dependency on struct intel_uncore or
struct xe_tile from display code, add a struct drm_device based
interface to pcode.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/eeaa9cc8438caab2e22f9cb2142fbc18cc0fd861.1750678991.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Only use the ms granularity wait in snb_pcode_write_timeout(), primarily
to better align with the xe driver, which also only has the millisecond
wait.
Use an arbitrary 250 us fast wait before the specified ms wait, and have
snb_pcode_write() default to 1 ms.
This means snb_pcode_write() and snb_pcode_write_timeout() will always
be sleeping functions. There should not be any atomic users for pcode
writes though, and any display code using pcode via xe has already been
non-atomic. The uncore wait will do a might_sleep() annotation that
should catch any problems.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/ba86280f53cea2d020308db35f1ecbd615d07d8a.1750678991.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Erase command is slow on discrete graphics storage
and may overshot PCI completion timeout.
BMG introduces the ability to have non-posted erase.
Add driver support for non-posted erase with polling
for erase completion.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Reuven Abliyev <reuven.abliyev@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-9-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Enable access to internal non-volatile memory on DGFX
with GSC/CSC devices via a child device.
The nvm child device is exposed via auxiliary bus.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/i915 feature pull for v6.17:
Features and functionality:
- Add support for DSC fractional link bpp on DP MST (Imre)
- Add support for simultaneous Panel Replay and Adaptive Sync (Jouni)
- Add support for PTL+ double buffered LUT registers (Chaitanya, Ville)
- Add PIPEDMC event handling in preparation for flip queue (Ville)
Refactoring and cleanups:
- Rename lots of DPLL interfaces to unify them (Suraj)
- Allocate struct intel_display dynamically (Jani)
- Abstract VLV IOSF sideband better (Jani)
- Use str_true_false() helper (Yumeng Fang)
- Refactor DSB code in preparation for flip queue (Ville)
- Use drm_modeset_lock_assert_held() instead of open coding (Luca)
- Remove unused arg from skl_scaler_get_filter_select() (Luca)
- Split out a separate display register header (Jani)
- Abstract DRAM detection better (Jani)
- Convert LPT/WPT SBI sideband to struct intel_display (Jani)
Fixes:
- Fix DSI HS command dispatch with forced pipeline flush (Gareth Yu)
- Fix BMG and LNL+ DP adaptive sync SDP programming (Ankit)
- Fix error path for xe display workqueue allocation (Haoxiang Li)
- Disable DP AUX access probe where not required (Imre)
- Fix DKL PHY access if the port is invalid (Luca)
- Fix PSR2_SU_STATUS access on ADL+ (Jouni)
- Add sanity checks for porch and sync on BXT/GLK DSI (Ville)
DRM core changes:
- Change AUX DPCD access probe address (Imre)
- Refactor EDID quirks, amd make them available to drivers (Imre)
- Add quirk for DPCD access probe (Imre)
- Add DPCD definitions for Panel Replay capabilities (Jouni)
Merges:
- Backmerges to sync with v6.15-rcs and v6.16-rc1 (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/fff9f231850ed410bd81b53de43eff0b98240d31@intel.com
As part of this WA GuC will save and restore value of two XE3_Media
control registers that were not included in the HW power context.
v2:
- Update klv name (Badal)
Signed-off-by: sanirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
UAPI Changes:
- Add Task Information for the wedge API
Cross-subsystem Changes:
Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests
Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaFOwgQAKCRAnX84Zoj2+
dkbjAX9aGa2vGeoz9fiT4wMMvxWzLSW7EzJW9oC/iFitHOcmd0yUZCfdmUfukQ3T
cXtVHFcBf3clQ1iI4fV8EQwLOEaBpQ1H642/41pAebXOr9kQ6JOQ4AqhJBqamJzv
teGbWnA2+w==
=inwC
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
- Add Task Information for the wedge API
Cross-subsystem Changes:
Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests
Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
- Expose media OA units (Ashutosh)
Merge:
- Restore GuC submit UAF fix around queue destruction
accidentally removed in a drm-xe-fixes merge (Auld)
Core Changes:
- drm/gpusvm: Introduce devmem_only flag for allocation (Himal)
- drm/gpusvm: Add timeslicing support to GPU SVM (Brost)
Driver Changes:
- Make gem shrinker drm managed (Thomas)
- SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz)
- Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele)
- Prefetch Support for svm ranges (Himal, Brost)
- Don't allocate managed BO for each policy change (Michal)
- Simplify and fix diff calculation in GuC submit (Lucas)
- Track FAST_REQ GuC H2Gs to report where errors came from (John)
- SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr)
- Check if all domains awake for MOCS dump (Tejas)
- Make creation of SLPC debugfs files conditional (Aradhya)
- Default auto_link_downgrade status to false (Aradhya)
- Use xe_mmio_read32() to read mtcfg register (Shuicheng)
- Updates in PCI ID tables (Atwood, Shekhar)
- SRIOV VF: Fail migration recovery if fixups needed but not supported (Tomasz)
- Add missing documentation around freq and RPa (Rodrigo)
- Some other SVM related fixes (Himal, Auld, Brost, Maarten)
- Allow to trigger GT resets using debugfs writes (Michal)
- Optimise CCS case for WB pages (Auld)
- Create LRC BO without VM (Niranjana)
- Initialize MOCS index early (Bala)
- HWMON fixes for BMG (Karthik, Lucas)
- Drop redundant conversion to bool (Raag)
- Rework eviction rejection of bound external bos (Thomas)
- Stop re-submitting signalled jobs (Auld)
- Small fixes and cleanups for PXP (Daniele)
- Convert some print messages to GT-oriented ones (Michal)
- Resend potentially lost GuC H2G MMIO request (Michal)
- Add configfs to load with fewer engines (Lucas)
- Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej)
- SRIOV VF: Small updates around GGTT handling (Michal)
- Make VMA tile_present, tile_invalidated access rules clear (Brost)
- Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin)
- Fixes for VF GuC version (Daniele)
- Don't store the xe device pointer inside xe_ttm_tt (Dave)
- Small improvements in topology code (Michal)
- Stop relying on GGTT internals (Maarten)
- GSM size should be constant on most platforms (Roper)
- Reorder 'Get pages failed' message (Brost)
- WA BB related fixes and improvements (Lucas, Brost)
- Fix early wedge on GuC load failure (Daniele)
- Add helper function to inject fault into ct_dead_capture (Satyanarayana)
- Determine ATS / PTA programming during early sw init (Roper)
- Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper)
- Fix kconfig prompt (Lucas)
- Convert xe_pci tests to parametrized tests (Michal)
- Do not kill VM in PT code on -ENODATA (Brost)
- Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost)
- Enable media OA (Ashutosh)
- GuC log level tuning (Lucas)
- Add xe_vm_has_valid_gpu_mapping helper (Brost)
- Opportunistically skip TLB invalidaion on unbind (Brost)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmhTGwcACgkQ+mJfZA7r
E8odlwf6A6bfNDdj56gMjxK/tyS3ud5VV6nAiCyHoGtcMeN6rZE2dDHOI3rP1fH7
6urnx6DqZu6lA1o1NJaidyc11WLlqB3hJN+tAVZChVe8N65syvpxdz38wZbJxrfQ
MKw4uB8GfhNroQXuZcj+0dF+Ru/UqCbSAL7f1PMajAf4AcPBu/Ju7EYc2ALnINt1
jx+TOm1fOIMpA/Cw3DmGL3Uy/MtYRnnASp+qU4xSv/y8en7+83HDoKbC7+nY5NG0
j06O0QK2QeRTnltdvmbTlpjwQ+1ztyA1JS+pqj+QjyQ8iLfZaUQzED3iWAiMayn7
5A8zHkW02+v0pkFTFn2C4HShANAeHg==
=Jq5v
-----END PGP SIGNATURE-----
Merge tag 'drm-xe-next-2025-06-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Expose media OA units (Ashutosh)
Merge:
- Restore GuC submit UAF fix around queue destruction
accidentally removed in a drm-xe-fixes merge (Auld)
Core Changes:
- drm/gpusvm: Introduce devmem_only flag for allocation (Himal)
- drm/gpusvm: Add timeslicing support to GPU SVM (Brost)
Driver Changes:
- Make gem shrinker drm managed (Thomas)
- SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz)
- Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele)
- Prefetch Support for svm ranges (Himal, Brost)
- Don't allocate managed BO for each policy change (Michal)
- Simplify and fix diff calculation in GuC submit (Lucas)
- Track FAST_REQ GuC H2Gs to report where errors came from (John)
- SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr)
- Check if all domains awake for MOCS dump (Tejas)
- Make creation of SLPC debugfs files conditional (Aradhya)
- Default auto_link_downgrade status to false (Aradhya)
- Use xe_mmio_read32() to read mtcfg register (Shuicheng)
- Updates in PCI ID tables (Atwood, Shekhar)
- SRIOV VF: Fail migration recovery if fixups needed but not supported (Tomasz)
- Add missing documentation around freq and RPa (Rodrigo)
- Some other SVM related fixes (Himal, Auld, Brost, Maarten)
- Allow to trigger GT resets using debugfs writes (Michal)
- Optimise CCS case for WB pages (Auld)
- Create LRC BO without VM (Niranjana)
- Initialize MOCS index early (Bala)
- HWMON fixes for BMG (Karthik, Lucas)
- Drop redundant conversion to bool (Raag)
- Rework eviction rejection of bound external bos (Thomas)
- Stop re-submitting signalled jobs (Auld)
- Small fixes and cleanups for PXP (Daniele)
- Convert some print messages to GT-oriented ones (Michal)
- Resend potentially lost GuC H2G MMIO request (Michal)
- Add configfs to load with fewer engines (Lucas)
- Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej)
- SRIOV VF: Small updates around GGTT handling (Michal)
- Make VMA tile_present, tile_invalidated access rules clear (Brost)
- Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin)
- Fixes for VF GuC version (Daniele)
- Don't store the xe device pointer inside xe_ttm_tt (Dave)
- Small improvements in topology code (Michal)
- Stop relying on GGTT internals (Maarten)
- GSM size should be constant on most platforms (Roper)
- Reorder 'Get pages failed' message (Brost)
- WA BB related fixes and improvements (Lucas, Brost)
- Fix early wedge on GuC load failure (Daniele)
- Add helper function to inject fault into ct_dead_capture (Satyanarayana)
- Determine ATS / PTA programming during early sw init (Roper)
- Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper)
- Fix kconfig prompt (Lucas)
- Convert xe_pci tests to parametrized tests (Michal)
- Do not kill VM in PT code on -ENODATA (Brost)
- Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost)
- Enable media OA (Ashutosh)
- GuC log level tuning (Lucas)
- Add xe_vm_has_valid_gpu_mapping helper (Brost)
- Opportunistically skip TLB invalidaion on unbind (Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aFMb_NVF_oCW7UVl@intel.com
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.
Fixes: 7dbe8af13c ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org # v6.12+: 1e1981b16b: drm/xe: Fix taking invalid lock on wedge
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 0b93b7dcd9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
It should rather use xe_map_memset() as the BO is created with
XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init().
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 21cf47d89f)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
This allows for additional L2 caching modes.
Fixes: 01570b4469 ("drm/xe/bmg: implement Wa_16023588340")
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6ab42fa03d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Media version 30.02 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.
-v2: Extend the range in existing WA entry (Bala)
-v3: Revert v2, Do not extend the range for the time being(Matt)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-4-dnyaneshwar.bhadane@intel.com
Graphics version 30.03 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.
-v2: Merge and extend the WA onto existing entry (Bala)
-v3: Revert v2's feedback changes and keep entry saparate (Matt).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@inte.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-3-dnyaneshwar.bhadane@intel.com
Prevent other bits of mailbox power limit from being overwritten with 0.
This issue was due to a missing read and modify of current power limit,
before setting a requested mailbox power limit, which is added in this
patch.
v2:
- Improve commit message. (Anshuman)
v3:
- Rebase.
- Rephrase commit message. (Riana)
- Add read-modify-write variant of xe_hwmon_pcode_write_power_limit()
i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal)
- Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits.
- Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits
writes use xe_hwmon_pcode_rmw_power_limit() only.
v4:
- Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever
applicable. (Riana)
Fixes: 7596d839f6 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If a range or VMA is invalidated and scratch page is disabled, there
is no reason to issue a TLB invalidation on unbind, skip TLB
innvalidation is this condition is true. This is an opportunistic check
as it is done without the notifier lock, thus it possible for the range
to be invalidated after this check is performed.
This should improve performance of the SVM garbage collector, for
example, xe_exec_system_allocator --r many-stride-new-prefetch, went
~20s to ~9.5s on a BMG.
v2:
- Use helper for valid check (Thomas)
v3:
- Avoid skipping TLB invalidation if PTEs are removed at a higher
level than the range
- Never skip TLB invalidations for VMA
- Drop Himal's RB
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250616063024.2059829-3-matthew.brost@intel.com
Rather than having multiple READ_ONCE of the tile_* fields and comments
in code, use helper with kernel doc for single access point and clear
rules.
v3:
- s/xe_vm_has_valid_gpu_pages/xe_vm_has_valid_gpu_mapping
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250616063024.2059829-2-matthew.brost@intel.com
This WA is applicable to BMG as well.
Note that this is a GSC WA and we don't load the GSC on BMG, so
extending the WA to BMG won't do anything right now. However, it helps
future-proof the driver so that if we ever turn the GSC on we won't have
to remember to extend this WA.
v2: don't use VERSION_RANGE from 2001 to 2004 (Matt)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250613231128.1261815-2-daniele.ceraolospurio@intel.com
Currently xe sets the guc log level to a verbose level since it's useful
to debug hangs and general development. However the verbose level may
already be too much and affect performance.
Michal Mrozek did some tests with the L0 compute stack for submission
latency with ULLS disabled. Below are the normalized numbers with log
level 3 (the current default) as baseline for each test:
Test \ Log Level 3 0 1 2
----------------------------------------------------------- ------ ------ ------ ------
BestWalkerNthCommandListSubmission(CmdListCount=2) 1.00 0.63 0.63 0.96
BestWalkerNthSubmission(KernelCount=2) 1.00 0.62 0.63 0.96
BestWalkerNthSubmissionImmediate(KernelCount=2) 1.00 0.58 0.58 0.85
BestWalkerSubmission 1.00 0.62 0.62 0.96
BestWalkerSubmissionImmediate 1.00 0.63 0.62 0.96
BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2) 1.00 0.58 0.58 0.86
BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4) 1.00 0.70 0.70 0.83
BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8) 1.00 0.53 0.52 0.78
Log level 2 is the first "verbose level" for GuC, where the biggest
difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch
to 1, i.e. GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds.
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250613-guc-log-level-v2-1-cb84a63e49fe@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Because OAM_SAG doesn't have an attached hwe, assign another hwe belonging
to the same gt (and different OAM unit) to OAM_SAG. A hwe is needed for
batch submissions to program OA HW.
v2: Assign an engine with a valid OA unit for OAM_SAG (Umesh)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-5-ashutosh.dixit@intel.com
Previously, the oa_unit associated with an OA stream was derived from hwe
associated with the stream (stream->hwe->oa_unit). This breaks with OAM_SAG
since OAM_SAG does not have any attached hardware engines. Resolve this by
introducing stream->oa_unit and stop depending on stream->hwe.
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-4-ashutosh.dixit@intel.com
Print hwe to OA unit mapping to dmesg, to help debug for current and new
platforms.
v2: Separate out xe_oa_print_gt_oa_units() (Umesh)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-3-ashutosh.dixit@intel.com
On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM)
units. One or more SCMI OAM units might be present on a platform. In
addition there is another OAM unit for global events, called
OAM-SAG. Performance metrics for media workloads can be obtained from these
OAM units, similar to OAG.
Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA
unit without any attached engines.
Bspec: 70819, 67103, 63844, 72572, 74476, 61284
v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for < 12.7 platforms
v3: Return XE_OA_UNIT_INVALID for < 12.7 to indicate no OAM units
v4: Move xe_oa_print_oa_units() to separate patch
v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v6: Introduce DRM_XE_OA_CAPS_OAM
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com
The parallel scratch layout spans 2k and LRC_ENGINE_ID_PPHWSP_OFFSET
lands within than space. This happens to be ok as the offset lands in
reserved part of guc_sched_wq_desc, but for future safety move
LRC_ENGINE_ID_PPHWSP_OFFSET to the unused offset of 1024 below parallel
scratch layout.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250612172850.4170428-1-matthew.brost@intel.com
When a device get wedged, it might be caused by a guilty application.
For userspace, knowing which task was involved can be useful for some
situations, like for implementing a policy, logs or for giving a chance
for the compositor to let the user know what task was involved in the
problem. This is an optional argument, when the task info is not
available, the PID and TASK string won't appear in the event string.
Sometimes just the PID isn't enough giving that the task might be already
dead by the time userspace will try to check what was this PID's name,
so to make the life easier also notify what's the task's name in the user
event.
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250617124949.2151549-4-andrealmeid@igalia.com
Signed-off-by: André Almeida <andrealmeid@igalia.com>
We missed to drop it in commit 50680d1698 ("drm/xe/tests: remove
unused leftover xe_call_for_each_device()") so drop it now.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250613191938.1980-2-michal.wajdeczko@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
The xe driver is the official driver for Intel Xe2 and later, while
maintaining experimental support for earlier GPUs. Reword the help
message accordingly.
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Now that the PAT settings for the new special entries introduced by Xe2
are decided during early software init and left NULL on platforms they
don't apply to, there's no need to keep separate programming functions
for pre-Xe2 and post-Xe2 platforms. Consolidate down to a single pair
of programming functions (mcr and non-mcr) that can be used on any
platform.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250613214751.792066-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Decide whether programming of the special ATS and PTA PAT entries is
necessary (and which entries should be programmed) during early software
initialization rather than hardcoding this into the 'program' functions.
Future platforms may want to re-use the same functions but utilize
different special entry values. Consolidating all of the decisions
into one place keeps things simple.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250613214751.792066-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Xe can free some of the data pointed to by the dma-fences it exports. Most
notably the timeline name can get freed if userspace closes the associated
submit queue. At the same time the fence could have been exported to a
third party (for example a sync_fence fd) which will then cause an use-
after-free on subsequent access.
To make this safe we need to make the driver compliant with the newly
documented dma-fence rules. Driver has to ensure a RCU grace period
between signalling a fence and freeing any data pointed to by said fence.
For the timeline name we simply make the queue be freed via kfree_rcu and
for the shared lock associated with multiple queues we add a RCU grace
period before freeing the per GT structure holding the lock.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20250610164226.10817-5-tvrtko.ursulin@igalia.com
Introduce xe_vm_range_tilemask_tlb_invalidation(), which issues a TLB
invalidation for a specified address range across GTs indicated by a
tilemask.
v2 (Matthew Brost)
- Move WARN_ON_ONCE to svm caller
- Remove xe_gt_tlb_invalidation_vma
- s/XE_WARN_ON/WARN_ON_ONCE
v3
- Rebase
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250609041616.1723636-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Set GT min frequency to 1200Mhz once driver load is complete.
v2: Review comments (Rodrigo)
v3: Apply Wa earlier so user_req_min is not clobbered.
v4: Apply to all GTs (Lucas)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Messaging to GuC may get canceled when device is wedged. Don't
flag this as an error in xe_guc_pc code.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-1-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
functions, the CI test systems are going out of space and crashing. To
avoid this issue, a new helper function is created and when fault is
injected into this xe_is_injection_active() helper function, ct dead
capture is avoided which suppresses ct dumps in the log.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250612080402.22011-1-satyanarayana.k.v.p@intel.com
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.
Fixes: 7dbe8af13c ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org # v6.12+: 1e1981b16b: drm/xe: Fix taking invalid lock on wedge
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
No idea why, but without this GuC context switches randomly fail when
running IGTs in a loop. Need to follow up why this fixes the
aforementioned issue but can live with a stable driver for now.
Fixes: 617d824c53 ("drm/xe: Add WA BB to capture active context utilization")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com
Updating range->tile_invalidated should be done with WRITE_ONCE to pair
with READ_ONCE in opportunistic checks.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhrost <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250604234712.2441130-1-matthew.brost@intel.com
Only the VM dma-resv lock is needed in SVM pagefaults so
xe_vm_lock/unlock can be used instead of drm exec. Micro optimization
but should save some CPU cycles in a critical path.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250603174012.2195759-1-matthew.brost@intel.com
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes: 617d824c53 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ef48715b2d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The post context restore (WA BB) is a mechanism in HW that may be used
for things other than the utilization setup. Create a new function
called setup_wa_bb() that wraps any function writing useful commands in
the buffer.
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-2-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes: 82b98cadb0 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Add set of workarounds for xe2_hpg.
-v2: Fix xe2_hpg GMD version for some workarounds.
-v3: Removed extra Workaround (Matt Roper)
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250605190804.1287289-3-dnyaneshwar.bhadane@intel.com
On old Intel platforms, the size of the GSM (i.e., the stolen memory
that holds the GGTT page table entries) could vary, so the driver needed
to read the actual size from the PCI config space. However from Xe_HP
onward, the GSM is now always guaranteed to be exactly 8MB (which
translates to a 4GB GGTT address space); this is always true regardless
of what the platform's much larger PPGTT address space is.
The bspec doesn't document the PCI config space as being a valid way to
query the size of the GSM after Xe_LP platforms, although so far it
still seems to be giving us proper values for Xe_HP, Xe2, and Xe3.
However we suspect that the config space will stop providing correct
values on some upcoming platforms, so we should stop relying on it.
Instead just use the hardcoded 8MB value as documented elsewhere in the
bspec.
Bspec: 49636, 67090, 50589
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250605225352.2333981-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
When changing the condition from >= SZ_64K, it was changed to <= SZ_64K.
This disallows migration of 64K, which is the exact minimum allowed.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5057
Fixes: 794f5493f5 ("drm/xe: Strict migration policy for atomic SVM faults")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250521090102.2965100-1-dev@lankhorst.se
(cherry picked from commit 531bef26d1)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
We are already prepared to define firmwares per-GT type, so we
should also prepare our messages to be GT-oriented.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606204311.813-1-michal.wajdeczko@intel.com