Initializing devcoredump before the GT looks harmless, but it leads to a
problem during driver unbind. Because of this order, the GT/Engine
release functions are called before the xe devcoredump release function
(xe_driver_devcoredump_fini), leading to the kernel crash below [1]
because the devcoredump functions might still use GT/Engine
data structures after those are freed.
The crash is observed while running the IGT test
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs, then does an unbind/rebind of the
driver to recover from the wedged state.
The hung workload leads to a devcoredump. The crash is
seen when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.
With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.
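The ordering problem can be illustrated with a small stand-alone model. This is only a sketch, assuming that release callbacks run in reverse order of registration, as the description above implies. The demo_* names are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a managed action stack: actions run in
 * reverse order of registration when the driver unbinds. */
#define DEMO_MAX_ACTIONS 8

struct demo_actions {
	const char *names[DEMO_MAX_ACTIONS];
	size_t count;
};

/* Register a release action; later registrations are released first. */
static void demo_add_action(struct demo_actions *a, const char *name)
{
	assert(a->count < DEMO_MAX_ACTIONS);
	a->names[a->count++] = name;
}

/* Fill out[] with the order in which actions would run at unbind. */
static size_t demo_release_order(const struct demo_actions *a,
				 const char *out[DEMO_MAX_ACTIONS])
{
	size_t i;

	for (i = 0; i < a->count; i++)
		out[i] = a->names[a->count - 1 - i];
	return a->count;
}

/* Return 1 if "devcoredump" is released before "gt", 0 otherwise. */
static int demo_coredump_released_first(const struct demo_actions *a)
{
	const char *order[DEMO_MAX_ACTIONS];
	size_t n = demo_release_order(a, order);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(order[i], "devcoredump") == 0)
			return 1;
		if (strcmp(order[i], "gt") == 0)
			return 0;
	}
	return 0;
}
```

Registering "devcoredump" first and "gt" second reproduces the problematic order (GT released first). Swapping the two registrations makes the devcoredump release run before the GT release.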
[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
<TASK>
? drm_printf+0x64/0x90
__xe_devcoredump_read+0x23f/0x2d0 [xe]
? __pfx___drm_printfn_coredump+0x10/0x10
? __pfx___drm_puts_coredump+0x10/0x10
xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
process_one_work+0x22e/0x6f0
worker_thread+0x1e8/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x11f/0x250
? __pfx_kthread+0x10/0x10
ret_from_fork+0x47/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)
Fixes: 4209d635a8 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1fdc4c381ff765479d76ccf3134717c430c871b8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
CSC is not accessible by VF drivers, so disable its support flag on VF
to prevent further initialization attempts.
Fixes: e02cea83d3 ("drm/xe/gsc: add Battlemage support")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com
(cherry picked from commit 552dbba1caaf0cb40ce961806d757615e26ec668)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Wa_15015404425 only needs to be applied on PTL platforms with an A-step
compute die. There is no way to map PCI revid to the compute die
stepping. The easiest way to figure out the compute die stepping on our
end is to map the media IP's stepping to the compute die. For PTL, the
compute die has an A stepping if and only if the media IP's stepping is
also A-step. (This relationship is determined on a per-platform basis
and just happens to be this way on PTL.)
In addition, this workaround is a chicken-and-egg problem. Wa_15015404425
requires that all register reads be preceded by four dummy MMIO writes
(including during early driver init and even pre-OS firmware). The
driver needs to perform some MMIO reads during init, including a read of
the GMD_ID register that contains the media IP's stepping. To handle
this in the safest manner, assume the workaround applies to all of PTL
during driver probe and deactivate the workaround afterwards.
The overall solution becomes a set of two workarounds:
* 15015404425 - a Device OOB workaround that's always active for PTL
* 15015404425_disable - a GT OOB workaround that applies to PTL
  platforms with a B0 or later stepping
The first of these workarounds issues the dummy MMIO writes whenever we
read registers. The second guards logic that disables the first once
we have the necessary information later in the probe process.
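The two-stage scheme can be sketched as a small model. This is an illustration only: the demo_* names, the fake register file, and the counters are assumptions made for the example and are not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DUMMY_WRITES 4 /* four dummy writes per read, per the text */
#define DEMO_NUM_REGS     4 /* size of the fake register file */

/* Hypothetical device model; none of these names come from the driver. */
struct demo_dev {
	bool wa_active;       /* device-level workaround state */
	unsigned int writes;  /* dummy MMIO writes issued so far */
	unsigned int reads;   /* register reads issued so far */
	uint32_t regs[DEMO_NUM_REGS];
};

/* Early probe: the stepping is unknown, so assume the workaround applies. */
static void demo_probe_early(struct demo_dev *d)
{
	d->wa_active = true;
}

/* Every read is preceded by the dummy writes while the workaround is on. */
static uint32_t demo_read(struct demo_dev *d, unsigned int reg)
{
	assert(reg < DEMO_NUM_REGS);
	if (d->wa_active)
		d->writes += DEMO_DUMMY_WRITES;
	d->reads++;
	return d->regs[reg];
}

/* Later in probe: turn the workaround off unless the media IP is A-step. */
static void demo_maybe_disable(struct demo_dev *d, bool media_is_a_step)
{
	if (!media_is_a_step)
		d->wa_active = false;
}
```

In this model a B0 or later part pays for the dummy writes only until demo_maybe_disable() runs, while an A-step part keeps issuing them on every read.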
v2: rename SoC to device, avoid null pointer dereference, update commit
message.
v3: rebase
v5: move disable check into xe_device_probe to avoid linking in xe_wa
into xe_pci, reword commit message
v6: squash extension and b0 support into 1 patch
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-7-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Some workarounds need to be applied ahead of any GT initialization,
for example 15015404425. This patch creates the XE_DEVICE_WA macro, in
the same vein as XE_WA. This macro can be used ahead of GT
initialization, and can be tracked in sysfs. This should alleviate some
of the complexities that exist in i915.
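A device-level workaround check of this kind could look roughly like the sketch below. It assumes a per-device bitmap that is filled once during early probe. The DEMO_* and demo_* names, the struct layout, and the PTL flag are hypothetical and are not the actual XE_DEVICE_WA definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical workaround ids; the real ids are generated at build time. */
enum demo_device_wa_id {
	DEMO_WA_15015404425,
	DEMO_WA_COUNT,
};

struct demo_device {
	struct {
		unsigned long active; /* one bit per active workaround */
		bool processed;       /* set once the rules were evaluated */
	} oob;
	bool is_ptl;                  /* stand-in for a platform check */
};

/* Evaluate the device-level rules; meant to run during early probe. */
static void demo_device_wa_process(struct demo_device *dev)
{
	dev->oob.active = 0;
	if (dev->is_ptl)
		dev->oob.active |= 1UL << DEMO_WA_15015404425;
	dev->oob.processed = true;
}

/* Query a workaround; only valid after the rules were processed. */
static bool demo_device_wa_test(const struct demo_device *dev,
				enum demo_device_wa_id id)
{
	assert(dev->oob.processed);
	return (dev->oob.active & (1UL << id)) != 0;
}

/* Convenience macro so callers can pass the bare workaround number. */
#define DEMO_DEVICE_WA(dev, id) demo_device_wa_test((dev), DEMO_WA_##id)
```

Because the bitmap lives in the device structure instead of a GT structure, a check such as DEMO_DEVICE_WA(dev, 15015404425) needs no GT to exist yet.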
v2: name change SoC to Device, address style issues
v5: split into separate patch from RTP changes, put oob within a struct,
move the initiation of oob workarounds into xe_device_probe_early(),
clean up the comments around XE_WA.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-5-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Add an adaptation/glue layer where the I2C host adapter
(Synopsys DesignWare I2C adapter) and the I2C clients (the
microcontroller units) are enumerated.
The microcontroller units (MCU) that are attached to the GPU
depend on the OEM. The initially supported MCU will be the
Add-In Management Controller (AMC).
Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags and SPDX format in the .c file]
Now that all previous allocations are gone, ensure no new allocations
will ever be done before xe_display_init_early(), by moving the call
that allows allocations downwards.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Now that we added the separate step of initialising GUC in
xe_gt_init_early, it should be ok to initialise the minimum during early
init, and the rest after allocations are allowed.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
We want to split up GUC init to an alloc and noalloc part to keep the
init path the same for VF and !VF as much as possible.
Everything in vf_guc_init should be done as early as possible, otherwise
VRAM probing becomes impossible.
Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up
the init in xe_device slightly.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
memirqs require allocations into GGTT, which we cannot use until
after display is enabled.
Now that the initialisation of interrupts is postponed, move memirq
init too.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-14-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.
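The decision described above can be sketched as follows. This is a simplified model under stated assumptions: the demo_* names are hypothetical, and the real driver also has to wait for the frequency to actually drop, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FLUSH_FREQ_CAP_MHZ 2600u /* cap named in the text */

enum demo_flush_kind { DEMO_FLUSH_L2, DEMO_FLUSH_TRANSIENT };

/* A full L2 flush is used when 16023588340 applies, else a transient flush. */
static enum demo_flush_kind demo_pick_flush(bool wa_16023588340)
{
	return wa_16023588340 ? DEMO_FLUSH_L2 : DEMO_FLUSH_TRANSIENT;
}

/* Max GT frequency to enforce while the flush is in progress. */
static unsigned int demo_flush_max_freq(enum demo_flush_kind kind,
					unsigned int user_max_mhz)
{
	assert(user_max_mhz > 0);

	/* The L2 flush is the bigger hammer: no frequency cap is needed. */
	if (kind == DEMO_FLUSH_L2)
		return user_max_mhz;

	/* Transient flush: never exceed the cap, but keep lower user limits. */
	return user_max_mhz < DEMO_FLUSH_FREQ_CAP_MHZ ?
	       user_max_mhz : DEMO_FLUSH_FREQ_CAP_MHZ;
}
```

With a user limit of 3000 MHz, the model returns 2600 MHz for a transient flush and 3000 MHz when the L2 flush path is taken.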
v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
- User requests wait via wait_var_event_timeout (Lucas)
- Close races on flush + user requests (Lucas)
- Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
rather than root gt (Lucas)
v4:
- Only apply the freq reducing part if a TDF is needed: L2 flush trumps
the need for waiting for a lower frequency
Fixes: aaa08078e7 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Enable access to internal non-volatile memory on DGFX
with GSC/CSC devices via a child device.
The nvm child device is exposed via auxiliary bus.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/i915 feature pull for v6.17:
Features and functionality:
- Add support for DSC fractional link bpp on DP MST (Imre)
- Add support for simultaneous Panel Replay and Adaptive Sync (Jouni)
- Add support for PTL+ double buffered LUT registers (Chaitanya, Ville)
- Add PIPEDMC event handling in preparation for flip queue (Ville)
Refactoring and cleanups:
- Rename lots of DPLL interfaces to unify them (Suraj)
- Allocate struct intel_display dynamically (Jani)
- Abstract VLV IOSF sideband better (Jani)
- Use str_true_false() helper (Yumeng Fang)
- Refactor DSB code in preparation for flip queue (Ville)
- Use drm_modeset_lock_assert_held() instead of open coding (Luca)
- Remove unused arg from skl_scaler_get_filter_select() (Luca)
- Split out a separate display register header (Jani)
- Abstract DRAM detection better (Jani)
- Convert LPT/WPT SBI sideband to struct intel_display (Jani)
Fixes:
- Fix DSI HS command dispatch with forced pipeline flush (Gareth Yu)
- Fix BMG and LNL+ DP adaptive sync SDP programming (Ankit)
- Fix error path for xe display workqueue allocation (Haoxiang Li)
- Disable DP AUX access probe where not required (Imre)
- Fix DKL PHY access if the port is invalid (Luca)
- Fix PSR2_SU_STATUS access on ADL+ (Jouni)
- Add sanity checks for porch and sync on BXT/GLK DSI (Ville)
DRM core changes:
- Change AUX DPCD access probe address (Imre)
- Refactor EDID quirks, and make them available to drivers (Imre)
- Add quirk for DPCD access probe (Imre)
- Add DPCD definitions for Panel Replay capabilities (Jouni)
Merges:
- Backmerges to sync with v6.15-rcs and v6.16-rc1 (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/fff9f231850ed410bd81b53de43eff0b98240d31@intel.com
Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
- Add Task Information for the wedge API
Cross-subsystem Changes:
Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests
Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
When a device gets wedged, it might be caused by a guilty application.
For userspace, knowing which task was involved can be useful in some
situations, such as implementing a policy, logging, or giving the
compositor a chance to let the user know which task was involved in the
problem. This is an optional argument: when the task info is not
available, the PID and TASK strings won't appear in the event string.
Sometimes just the PID isn't enough, given that the task might already be
dead by the time userspace tries to look up this PID's name,
so to make life easier also include the task's name in the user
event.
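The optional fields can be illustrated with a small formatter. This is a sketch only: the single-string layout, the "WEDGED=" prefix, and the demo_* names are assumptions made for the example, and only the PID and TASK keys come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical task info; pid <= 0 or an empty comm means "not available". */
struct demo_task_info {
	int pid;
	const char *comm;
};

/*
 * Build "WEDGED=<method>[ PID=<pid> TASK=<comm>]" into buf.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int demo_wedged_event(char *buf, size_t len, const char *method,
			     const struct demo_task_info *info)
{
	int n;

	assert(buf && method);

	if (info && info->pid > 0 && info->comm && info->comm[0])
		n = snprintf(buf, len, "WEDGED=%s PID=%d TASK=%s",
			     method, info->pid, info->comm);
	else
		n = snprintf(buf, len, "WEDGED=%s", method);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Passing NULL for the task info leaves PID and TASK out entirely, so a listener has to treat both keys as optional.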
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250617124949.2151549-4-andrealmeid@igalia.com
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Currently we perform the bootstrap for the primary GT early on during
device init, while the media GT bootstrap happens when we try and fetch
the hwconfig table. For consistency, move the bootstrap of the media GT
to happen at the same time as the primary GT, so that all the subsequent
code can rely on both GTs being in the same state.
v2: Also drop config query from min_guc_load since we now do it
early (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com
The future goal is to have intel_display_device_probe() create struct
intel_display. As the first step, postpone xe->display initialization
right before that call. This is the same location as in i915.
There's a subtle functional change here: xe->display will now be
initialized only if xe->info.probe_display.
The xe_display_create() function becomes empty, and can be removed. Move
its documentation to xe_display_probe().
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/6c3075739d84cecea258d686c3ef38455a61191c.1747397638.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Add some informal control for atomic SVM fault GPU timeslice to be able
to play around with values and tweak performance.
v2:
- Reduce timeslice default value to 5ms
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250512135500.1405019-6-matthew.brost@intel.com
Since xe_device_sysfs_init() exposes device specific attributes, a better
place for it is xe_device_probe().
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250506054835.3395220-2-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Enable survivability mode if supported and the configfs attribute is set.
Enabling survivability mode manually is useful in cases where pcode does
not detect failure, for validation, and for IFR (in-field repair).
To set the configfs survivability mode attribute for a device:
    echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode
The card enters survivability mode if supported.
v2: add a log if survivability mode is enabled for unsupported
platforms (Rodrigo)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
On device unbind, migrate exported bos, including pagemap bos to
system. This allows importers to take proper action without
disruption. In particular, SVM clients on remote devices may
continue as if nothing happened, and can choose a different
placement.
The evict_flags() placement is chosen in such a way that bos that
aren't exported are purged.
For pinned bos, we unmap DMA, but their pages are not freed yet
since we can't be 100% sure they are not accessed.
All pinned external bos (not just the VRAM ones) are put on the
pinned.external list with this patch. But this only affects the
xe_bo_pci_dev_remove_pinned() function since !VRAM bos are
ignored by the suspend / resume functionality. As a follow-up we
could look at removing the suspend / resume iteration over
pinned external bos since we currently don't allow pinning
external bos in VRAM, and other external bos don't need any
special treatment at suspend / resume.
v2:
- Address review comments. (Matthew Auld).
v3:
- Don't introduce an external_evicted list (Matthew Auld)
- Add a discussion around suspend / resume behaviour to the
commit message.
- Formatting fixes.
v4:
- Move dma-unmaps of pinned kernel bos to a dev managed
callback to give subsystems using these bos a chance to
clean them up. (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250326080551.40201-4-thomas.hellstrom@linux.intel.com
Allow testing whether the driver behaves correctly when
xe_pcode_probe_early() fails. Note that this is not sufficient for
testing survivability mode, as it's still required to read the hw to
check for errors, which doesn't happen on an injected failure.
To complete the early probe coverage, allow injection in the other
functions as well: xe_mmio_probe_early() and xe_device_probe_early().
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-3-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Commit d40f275d96 ("drm/xe: Move survivability entirely to xe_pci")
moved the survivability handling to be done entirely in the xe_pci
layer. However there are some issues with that approach:
1) Survivability mode needs at least the mmio initialized, otherwise it
can't really read a register to decide if it should enter that state
2) SR-IOV mode should be initialized, otherwise it's not possible to
check if it's VF
Besides, as pointed out by Riana, the check for
xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's
not a bool return.
Fix that by moving the initialization to be entirely in the xe_device
layer, with the correct dependencies handled: only after mmio and sriov
initialization, and not triggering it on error from
wait_for_lmem_ready(). This restores the trigger behavior before that
commit. The xe_pci layer now only checks for "is it enabled?",
like it's doing in xe_pci_suspend()/xe_pci_remove(), etc.
Cc: Riana Tauro <riana.tauro@intel.com>
Fixes: d40f275d96 ("drm/xe: Move survivability entirely to xe_pci")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-1-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Driver-FLR can't be triggered from the VF driver, so treat it
as disabled if VF. While at it, also fix the message, as it
shouldn't be printed just 'once' since we may have many devices.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250311135726.1998-2-michal.wajdeczko@intel.com
Merge tag 'drm-xe-next-2025-03-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Expose per-engine activity via perf pmu (Riana, Lucas, Umesh)
- Add support for EU stall sampling (Harish, Ashutosh)
- Allow userspace to provide low latency hint for submission (Tejas)
- GPU SVM and Xe SVM implementation (Matthew Brost)
Cross-subsystem Changes:
- devres handling for component drivers (Lucas)
- Backmerge drm-next to allow cross dependent change with i915
- GPU SVM and Xe SVM implementation (Matthew Brost)
Core Changes:
Driver Changes:
- Fixes to userptr and missing validations (Matthew Auld, Thomas
Hellström, Matthew Brost)
- devcoredump typos and error handling improvement (Shuicheng)
- Allow oa_exponent value of 0 (Umesh)
- Finish moving device probe to devm (Lucas)
- Fix race between submission restart and scheduled being freed (Tejas)
- Fix counter overflows in gt_stats (Francois)
- Refactor and add missing workarounds and tunings for pre-Xe2 platforms
(Aradhya, Tvrtko)
- Fix PXP locks interaction with exec queues being killed (Daniele)
- Eliminate TIMESTAMP_OVERRIDE from xe (Matt Roper)
- Change xe_gen_wa_oob to allow building on MacOS (Daniel Gomez)
- New workarounds for Panther Lake (Tejas)
- Fix VF resume errors (Satyanarayana)
- Fix workaround infra skipping some workarounds dependent on engine
initialization (Tvrtko)
- Improve per-IP descriptors (Gustavo)
- Add more error injections to probe sequence (Francois)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ilc5jvtyaoyi6woyhght5a6sw5jcluiojjueorcyxbynrcpcjp@mw2mi6rd6a7l
Merge tag 'drm-misc-next-2025-03-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.15:
Cross-subsystem Changes:
base:
- component: Provide helper to query bound status
fbdev:
- fbtft: Remove access to page->index
Core Changes:
- Fix usage of logging macros in several places
gem:
- Add test function for imported dma-bufs and use it in core and helpers
- Avoid struct drm_gem_object.import_attach
tests:
- Fix lockdep warnings
ttm:
- Add helpers for TTM shrinker
Driver Changes:
adp:
- Add support for Apple Touch Bar displays on M1/M2
amdxdna:
- Fix interrupt handling
appletbdrm:
- Add support for Apple Touch Bar displays on x86
bridge:
- synopsys: Add HDMI audio support
- ti-sn65dsi83: Support negative DE polarity
ipu-v3:
- Remove unused code
nouveau:
- Avoid multiple -Wflex-array-member-not-at-end warnings
panthor:
- Fix CS_STATUS_ defines
- Improve locking
rockchip:
- analogix_dp: Add eDP support
- lvds: Improve logging
- vop2: Improve HDMI mode handling; Add support for RK3576
- Fix shutdown
- Support rk3562-mali
xe:
- Use TTM shrinker
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306130700.GA485504@linux.fritz.box
Introduce xe_bo_put_async to put a bo where the context is such that
the bo destructor can't run due to lockdep problems or atomic context.
If the put is the final put, freeing will be done from a work item.
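The deferred-free idea can be sketched with a plain reference count and a pending list. This is a single-threaded illustration under stated assumptions: the demo_* names are hypothetical, a real implementation needs atomic refcounting and a lock-free or locked list, and demo_deferred_work() merely stands in for the work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object; not the driver's struct xe_bo. */
struct demo_bo {
	unsigned int refcount;
	bool freed;                    /* set by the stand-in destructor */
	struct demo_bo *next_deferred; /* link in the pending-free list */
};

/* Objects whose final reference was dropped where freeing is not allowed. */
struct demo_deferred {
	struct demo_bo *head;
};

/*
 * Drop a reference. On the final put, queue the object for later freeing
 * instead of running the destructor here. Returns true if it was queued.
 */
static bool demo_bo_put_async(struct demo_bo *bo, struct demo_deferred *d)
{
	assert(bo->refcount > 0);
	if (--bo->refcount)
		return false;

	bo->next_deferred = d->head;
	d->head = bo;
	return true;
}

/* Work item body: runs later, in a context where the destructor is safe. */
static unsigned int demo_deferred_work(struct demo_deferred *d)
{
	unsigned int freed = 0;

	while (d->head) {
		struct demo_bo *bo = d->head;

		d->head = bo->next_deferred;
		bo->freed = true; /* stands in for the real destructor */
		freed++;
	}
	return freed;
}
```

The caller in atomic context only pays for a decrement and a list insertion. The destructor cost moves to whichever context later runs the work.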
v5:
- Kernel doc for xe_bo_put_async (Thomas)
v7:
- Fix kernel doc (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Tested-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-6-matthew.brost@intel.com
Rather than relying on the TTM watermark accounting add a shrinker
for xe_bos in TT or system memory.
Leverage the newly added TTM per-page shrinking and shmem backup
support.
Although xe doesn't fully support WONTNEED (purgeable) bos yet,
introduce and add shrinker support for purgeable ttm_tts.
v2:
- Cleanups, bugfixes and a KUNIT shrinker test.
- Add writeback support, and activate if kswapd.
v3:
- Move the try_shrink() helper to core TTM.
- Minor cleanups.
v4:
- Add runtime pm for the shrinker. Shrinking may require an active
device for CCS metadata copying.
v5:
- Separately purge ghost- and zombie objects in the shrinker.
- Fix a format specifier - type inconsistency. (Kernel test robot).
v7:
- s/long/s64/ (Christian König)
- s/sofar/progress/ (Matt Brost)
v8:
- Rebase on Xe KUNIT update.
- Add content verifying to the shrinker kunit test.
- Split out TTM changes to a separate patch.
- Get rid of multiple bool arguments for clarity (Matt Brost)
- Avoid an error pointer dereference (Matt Brost)
- Avoid an integer overflow (Matt Auld)
- Address misc review comments by Matt Brost.
v9:
- Fix a compilation error.
- Rebase.
v10:
- Update to new LRU walk interface.
- Rework ghost-, zombie and purged object shrinking.
- Rebase.
v11:
- Use additional TTM helpers.
- Honor __GFP_FS and __GFP_IO
- Rebase.
v13:
- Use ttm_tt_setup_backup().
v14:
- Don't set up backup on imported bos.
v15:
- Rebase on backup interface changes.
Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/intel-xe/20250305092220.123405-7-thomas.hellstrom@linux.intel.com
- Add mmap support for PCI memory barrier (Tejas, Matthew Auld)
- Enable integration with perf pmu, exposing event counters: for now, just
GT C6 residency (Vinay, Lucas)
- Add "survivability mode" to allow putting the driver in a state capable of
firmware upgrade on critical failures (Riana, Rodrigo)
- Add PXP HWDRM support and enable for compatible platforms:
Meteor Lake and Lunar Lake (Daniele, John Harrison)
- Expose package and vram temperature over hwmon subsystem (Raag, Badal, Rodrigo)
Cross-subsystem Changes:
- Backmerge drm-next to synchronize with i915 display and other internal APIs
Display Changes (including i915):
- Device probe re-order to help with flicker-free boot (Maarten)
- Align watermark, hpd and dsm with i915 (Rodrigo)
- Better abstraction for d3cold (Rodrigo)
Driver Changes:
- Make sure changes to ccs_mode go through the helper for gt sync reset (Maciej)
- Drop mmio_ext abstraction since it didn't prove useful in its current form
(Matt Roper)
- Reject BO eviction if BO is bound to current VM (Oak, Thomas Hellström)
- Add GuC Power Conservation debugfs (Rodrigo)
- L3 cache topology updates for Xe3 (Francois, Matt Atwood)
- Better logging about missing GuC logs (John Harrison)
- Better logging for hwconfig-related data availability (John Harrison)
- Tracepoint updates for xe_bo_create, xe_vm and xe_vma (Oak)
- Add missing SPDX licenses (Francois)
- Xe suballocator improvements (Michal Wajdeczko)
- Improve logging for native vs SR-IOV driver mode (Satyanarayana)
- Make sure VF bootstrap is not attempted in execlist mode (Maarten)
- Add GuC Buffer Cache abstraction for some CTB H2G actions and use
during VF provisioning (Michal Wajdeczko)
- Better synchronization in gtidle for new users (Vinay)
- New workarounds for Panther Lake (Nirmoy, Vinay)
- PCI ID updates for Panther Lake (Matt Atwood)
- Enable SR-IOV for Panther Lake (Michal Wajdeczko)
- Update MAINTAINERS to stop directing xe changes to drm-misc (Lucas)
- New PCI IDs for Battle Mage (Shekhar)
- Better pagefault logging (Francois)
- SR-IOV fixes and refactors for past and new platforms (Michal Wajdeczko)
- Platform descriptor refactors and updates (Sai Teja)
- Add gt stats debugfs (Francois)
- Add guc_log debugfs to dump to dmesg (Lucas)
- Abstract per-platform LMTT availability (Piotr Piórkowski)
- Refactor VRAM manager location (Piotr Piórkowski)
- Add missing xe_pm_runtime_put when forcing wedged mode (Shuicheng)
- Fix possible lockup when forcing wedged mode (Xin Wang)
- Probe refactors to use cleanup actions with better error handling (Lucas)
- XE_IOCTL_DBG clarification for userspace (Maarten)
- Better xe_mmio initialization and abstraction (Ilia)
- Drop unnecessary GT lookup (Matt Roper)
- Skip client engine usage from fdinfo for VFs (Marcin Bernatowicz)
- Allow testing xe_sync_entry_parse with error injection (Priyanka)
- OA fix for polled read (Umesh)
Merge tag 'drm-xe-next-2025-02-24' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/m3gbuh32wgiep43i4zxbyhxqbenvtgvtao5sczivlasj7tikwv@dmlba4bfg2ny
This only changes info flags for SR-IOV reasons. Rename it
accordingly, because the flags are also updated in several other places
in probe that are not inside this function.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-11-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Do not ignore errors from xe_heci_gsc_init(). For example, it shouldn't
be fine to report successfully entering survivability mode when there's
no working communication with gsc. The driver should not be left
half-initialized in the normal case either.
Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There's an odd split between xe_pci.c and xe_device.c wrt
xe_survivability: it's initialized by xe_device, but then finalized by
xe_pci. Move it entirely to the outer layer, xe_pci, so it controls
the flow entirely.
This also allows us to stop ignoring some of the errors. E.g.: if there's
an -ENOMEM, it shouldn't continue as if survivability had been
enabled.
One change worth mentioning is that if "wait for lmem" fails, it will
now also check the pcode status to decide whether to enter
survivability mode, which it was not doing before. The bit from pcode
for that decision should remain the same after lmem initialization
fails, so it should be fine.
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Handle it as part of xe_display_fini(). The error handling was already
calling it if a step after xe_display_init() failed. Just re-use the
same xe_display_fini() for driver remove.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Now that devres supports component driver cleanup during driver
removal, the custom xe support for removal callbacks is no longer
needed. Drop it.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Rename so that xe_mmio_init() can be used in subsequent patches to
initialize an instance of struct xe_mmio.
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130105057.136586-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Not registering hwmon because it's not available (SRIOV_VF and DGFX) is
different from failing the initialization. Handle the errors
appropriately.
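
The distinction above can be sketched in plain userspace C. This is only
an illustration under stated assumptions: struct fake_dev, its fields and
both functions are hypothetical and are not the xe or hwmon API. It shows
a registration step that returns 0 when the feature is simply not
present, and a negative errno only when something actually failed, so
the probe path can propagate real errors without treating "not
available" as one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; none of these names exist in the xe driver. */
struct fake_dev {
	bool feature_available;	/* stands in for the platform/mode checks */
	bool alloc_should_fail;	/* knob to simulate an allocation failure */
	bool sensor_registered;
};

/* Returns 0 on success or when there is nothing to register. */
static int fake_sensor_register(struct fake_dev *dev)
{
	assert(dev != NULL);

	/* Feature not available: skip quietly, this is not an error. */
	if (!dev->feature_available)
		return 0;

	/* Genuine failure: report it so the caller can abort the probe. */
	if (dev->alloc_should_fail)
		return -ENOMEM;

	dev->sensor_registered = true;
	return 0;
}

/* Probe step that propagates only real errors. */
static int fake_probe(struct fake_dev *dev)
{
	int err = fake_sensor_register(dev);

	if (err)
		return err;

	return 0;
}
```

With this shape, a device without the feature probes successfully with
nothing registered, while a simulated -ENOMEM makes the probe fail.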
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-13-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Now that previous callers in xe_device_probe() are handling the errors,
that can be done for xe_pmu_register() as well.
Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-12-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
This is not really display-related and is needed for any sequence on
driver removal that has to interact with drm_dev_enter()/drm_dev_exit().
Just remove xe_device_remove_display() and inline it in the single
caller to make clear this is not done only for display.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
As done with other functions, clean up the error handling in
xe_device_probe() by moving the OA fini to be handled by xe_oa
itself, which relies on devm to call the cleanup function.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The only thing in xe_gt_remove() that really needs to happen on the
device remove callback is the xe_uc_remove(). That's because of the
following call chain:
xe_gt_remove()
xe_uc_remove()
xe_gsc_remove()
xe_gsc_proxy_remove()
Move xe_gsc_proxy_remove() to be handled as a xe_device_remove_action,
so it's recorded when it should run during device removal. The rest can
be handled normally by devm infra.
Besides removing the deep call chain above, xe_device_probe() doesn't
have to unwind the gt loop and it's also more in line with the
xe_device_probe() style.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Not being able to initialize pxp is fatal if the platform is expected to
have it. Update comment after commit 9c9dc9ba4a ("drm/xe/pxp: Fail the
load if PXP fails to initialize").
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Make sure to differentiate normal behavior, e.g. there's no stolen, from
allocation errors or failure to initialize lower layers.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
xe_display_fini() undoes things from xe_display_init() (technically from
intel_display_driver_probe()). Those `goto err` in xe_device_probe()
were wrong and had accumulated over time.
Commit 65e366ace5 ("drm/xe/display: Use a single early init call for
display") made it easier to fix now that we don't have xe_display_* init
calls spread on xe_device_probe(). Change xe_display_init() to use
devm_add_action_or_reset() that will finalize display in the right
order.
While at it, also add a newline and comment about calling
xe_driver_flr_fini.
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
xe device probe uses devm cleanup in most places. However, there are a
few cases where this is not possible: when the driver interacts with
component add/del. In that case, the resource group would be cleaned up
while the entire device resources are in the process of cleanup. One
example is xe_gsc_proxy and display, which use it to interact with mei
and audio.
Add a callback-based remove so the exception doesn't make the probe
use multiple error handling styles.
v2: Change internal API to mimic the devm API. This will make it easier
to migrate in the future when devm can be used.
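
A callback-based remove list that mimics the devm style could look
roughly like the userspace sketch below. All names here (remove_list,
remove_list_add_or_reset, remove_list_run, record_step) are hypothetical
and are not the functions added by this patch. The sketch assumes a
fixed-size array for simplicity. It shows two properties borrowed from
devm: actions run in reverse registration order, and a failed
registration runs the action immediately so the caller needs no manual
unwind.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; not the xe internal API. */
#define MAX_REMOVE_ACTIONS 8

typedef void (*remove_action_fn)(void *data);

struct remove_action {
	remove_action_fn fn;
	void *data;
};

struct remove_list {
	struct remove_action actions[MAX_REMOVE_ACTIONS];
	size_t count;
};

/*
 * Register an action. If there is no room left, run the action right
 * away and return an error, in the spirit of the "_or_reset" helpers.
 */
static int remove_list_add_or_reset(struct remove_list *list,
				    remove_action_fn fn, void *data)
{
	assert(list != NULL && fn != NULL);

	if (list->count == MAX_REMOVE_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}

	list->actions[list->count].fn = fn;
	list->actions[list->count].data = data;
	list->count++;
	return 0;
}

/* Run all actions in reverse registration order and empty the list. */
static void remove_list_run(struct remove_list *list)
{
	assert(list != NULL);

	while (list->count > 0) {
		struct remove_action *a = &list->actions[--list->count];

		a->fn(a->data);
	}
}

/* Example action: append an id to a trace so the order is observable. */
struct step {
	int id;
	int *trace;
	size_t *trace_len;
};

static void record_step(void *data)
{
	struct step *s = data;

	s->trace[(*s->trace_len)++] = s->id;
}
```

Registering step 1 and then step 2 and calling remove_list_run() invokes
step 2 first, mirroring how later-initialized parts are torn down before
the parts they depend on.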
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>