linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-09-01 09:36:29 +00:00

Author	SHA1	Message	Date
Maarten Lankhorst	3975d35683	drm/xe: Use xe_ggtt_map_bo_unlocked for resume This is the first step to hide the details of struct xe_ggtt. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:21:44 +02:00
Matthew Auld	7e3f4a3523	drm/xe: handle pinned memory in PM notifier Userspace is still alive and kicking at this point so actually moving pinned stuff here is tricky. However, we can instead pre-allocate the backup storage upfront from the notifier, such that we scoop up as much as we can, and then leave the final .suspend() to do the actual copy (or allocate anything that we missed). That way the bulk of our allocations will hopefully be done outside the more restrictive .suspend(). We do need to be extra careful though, since the pinned handling can now race with PM notifier, like something becoming unpinned after we prepare it from the notifier. v2 (Thomas): - Fix kernel doc and drop the pin as soon as we are done with the restore, instead of deferring to later. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-8-matthew.auld@intel.com	2025-04-23 09:32:16 +01:00
Matthew Auld	c6a4d46ec1	drm/xe: evict user memory in PM notifier In the case of VRAM we might need to allocate large amounts of GFP_KERNEL memory on suspend, however doing that directly in the driver .suspend()/.prepare() callback is not advisable (no swap for example). To improve on this we can instead hook up to the PM notifier framework which is invoked at an earlier stage. We effectively call the evict routine twice, where the notifier will have hopefully have cleared out most if not everything by the time we call it a second time when entering the .suspend() callback. For s4 we also get the added benefit of allocating the system pages before the hibernation image size is calculated, which looks more sensible. Note that the .suspend() hook is still responsible for dealing with all the pinned memory. Improving that is left to another patch. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1181 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4566 Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-6-matthew.auld@intel.com	2025-04-23 09:32:16 +01:00
Matthew Auld	7f387e6012	drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE With the idea of having more pinned objects using the blitter engine where possible, during suspend/resume, mark the pinned objects which can be done during the late phase once submission/migration has been setup. Start out simple with lrc and page-tables from userspace. v2: - s/early_restore/late_restore; early restore was way too bold with too many places being impacted at once. v3: - Split late vs early into separate lists, to align with newly added apply-to-pinned infra. v4: - Rebase. v5: - Make sure we restore the late phase kernel_bo_present in igpu. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-13-matthew.auld@intel.com	2025-04-04 11:41:05 +01:00
Matthew Auld	86f69c2611	drm/xe: use backup object for pinned save/restore Currently we move pinned objects, relying on the fact that the lpfn/fpfn will force the placement to occupy the same pages when restoring. However this then limits all such pinned objects to be contig underneath. In addition it is likely a little fragile moving pinned objects in the first place. Rather than moving such objects rather copy the page contents to a secondary system memory object, that way the VRAM pages never move and remain pinned. This also opens the door for eventually having non-contig pinned objects that can also be saved/restored using blitter. v2: - Make sure to drop the fence ref. - Handle NULL bo->migrate. v3: - Ensure we add the copy fence to the BOs, otherwise backup_obj can be freed before pipelined copy finishes. v4: - Rebase on newly added apply-to-pinned infra. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1182 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-10-matthew.auld@intel.com	2025-04-04 11:41:00 +01:00
Thomas Hellström	3cbb651117	drm/xe/bo: Add a bo remove callback On device unbind, migrate exported bos, including pagemap bos to system. This allows importers to take proper action without disruption. In particular, SVM clients on remote devices may continue as if nothing happened, and can chose a different placement. The evict_flags() placement is chosen in such a way that bos that aren't exported are purged. For pinned bos, we unmap DMA, but their pages are not freed yet since we can't be 100% sure they are not accessed. All pinned external bos (not just the VRAM ones) are put on the pinned.external list with this patch. But this only affects the xe_bo_pci_dev_remove_pinned() function since !VRAM bos are ignored by the suspend / resume functionality. As a follow-up we could look at removing the suspend / resume iteration over pinned external bos since we currently don't allow pinning external bos in VRAM, and other external bos don't need any special treatment at suspend / resume. v2: - Address review comments. (Matthew Auld). v3: - Don't introduce an external_evicted list (Matthew Auld) - Add a discussion around suspend / resume behaviour to the commit message. - Formatting fixes. v4: - Move dma-unmaps of pinned kernel bos to a dev managed callback to give subsystems using these bos a chance to clean them up. (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250326080551.40201-4-thomas.hellstrom@linux.intel.com	2025-03-27 12:07:57 +01:00
Thomas Hellström	f3e08e98bf	drm/xe: Simplify pinned bo iteration Introduce and use a helper to iterate over the various pinned bo lists. There are a couple of slight functional changes: 1) GGTT maps are now performed with the bo locked. 2) If the per-bo callback fails, keep the bo on the original list. v2: - Skip unrelated change in xe_bo.c Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250321133709.75327-1-thomas.hellstrom@linux.intel.com	2025-03-21 17:07:53 +01:00
Niranjana Vishwanathapura	5a3b0df25d	drm/xe: Allow bo mapping on multiple ggtts Make bo->ggtt an array to support bo mapping on multiple ggtts. Add XE_BO_FLAG_GGTTx flags to map the bo on ggtt of tile 'x'. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241120000222.204095-2-John.C.Harrison@Intel.com	2024-11-22 19:10:23 -08:00
Matthew Auld	c8b3c6db94	drm/xe: handle flat ccs during hibernation on igpu Starting from LNL, CCS has moved over to flat CCS model where there is now dedicated memory reserved for storing compression state. On platforms like LNL this reserved memory lives inside graphics stolen memory, which is not treated like normal RAM and is therefore skipped by the core kernel when creating the hibernation image. Currently if something was compressed and we enter hibernation all the corresponding CCS state is lost on such HW, resulting in corrupted memory. To fix this evict user buffers from TT -> SYSTEM to ensure we take a snapshot of the raw CCS state when entering hibernation, where upon resuming we can restore the raw CCS state back when next validating the buffer. This has been confirmed to fix display corruption on LNL when coming back from hibernation. Fixes: `cbdc52c11c` ("drm/xe/xe2: Support flat ccs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3409 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241112162827.116523-2-matthew.auld@intel.com	2024-11-13 10:04:36 +00:00
Matthew Auld	f2a6b8e396	drm/xe: improve hibernation on igpu The GGTT looks to be stored inside stolen memory on igpu which is not treated as normal RAM. The core kernel skips this memory range when creating the hibernation image, therefore when coming back from hibernation the GGTT programming is lost. This seems to cause issues with broken resume where GuC FW fails to load: [drm] ERROR GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1 [drm] ERROR GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01 [drm] ERROR GT0: firmware signature verification failed [drm] ERROR CRITICAL: Xe has declared device 0000:00:02.0 as wedged. Current GGTT users are kernel internal and tracked as pinned, so it should be possible to hook into the existing save/restore logic that we use for dgpu, where the actual evict is skipped but on restore we importantly restore the GGTT programming. This has been confirmed to fix hibernation on at least ADL and MTL, though likely all igpu platforms are affected. This also means we have a hole in our testing, where the existing s4 tests only really test the driver hooks, and don't go as far as actually rebooting and restoring from the hibernation image and in turn powering down RAM (and therefore losing the contents of stolen). v2 (Brost) - Remove extra newline and drop unnecessary parentheses. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241101170156.213490-2-matthew.auld@intel.com	2024-11-08 15:22:29 -08:00
Matthew Brost	a19d1db9a3	drm/xe: Restore system memory GGTT mappings GGTT mappings reside on the device and this state is lost during suspend / d3cold thus this state must be restored resume regardless if the BO is in system memory or VRAM. v2: - Unnecessary parentheses around bo->placements[0] (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241031182257.2949579-1-matthew.brost@intel.com	2024-11-01 09:11:30 -07:00
Lucas De Marchi	62742d1266	drm/xe: Normalize bo flags macros The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_\([_A-Z0-9]\)_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-02 10:33:57 -07:00
Matthew Brost	198bc28d0a	drm/xe: Pipeline evict / restore of pinned BOs during suspend / resume Rather than waiting for each evict / restore of pinned BOs to complete just wait on migrate exec queue to be idle once during suspend / resume. Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240305173503.285223-1-matthew.brost@intel.com	2024-03-05 11:45:51 -08:00
Francois Dugast	c73acc1eeb	drm/xe: Use Xe assert macros instead of XE_WARN_ON macro The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:08 -05:00
Thomas Hellström	08a4f00e62	drm/xe/bo: Simplify xe_bo_lock() xe_bo_lock() was, although it only grabbed a single lock, unnecessarily using ttm_eu_reserve_buffers(). Simplify and document the interface. v2: - Update also the xe_display subsystem. v4: - Reinstate a lost dma_resv_reserve_fences(). - Improve on xe_bo_lock() documentation (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-2-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:06 -05:00
Francois Dugast	99fea68288	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Matt Roper	08dea76745	drm/xe: Move migration from GT to tile Migration primarily focuses on the memory associated with a tile, so it makes more sense to track this at the tile level (especially since the driver was already skipping migration operations on media GTs). Note that the blitter engine used to perform the migration always lives in the tile's primary GT today. In theory that could change if media GTs ever start including blitter engines in the future, but we can extend the design if/when that happens in the future. v2: - Fix kunit test build - Kerneldoc parameter name update v3: - Removed leftover prototype for removed function. (Gustavo) - Remove unrelated / unwanted error handling change. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:15 -05:00
Matt Roper	876611c2b7	drm/xe: Memory allocations are tile-based, not GT-based Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:14 -05:00
Matt Roper	ad703e0637	drm/xe: Move GGTT from GT to tile The GGTT exists at the tile level. When a tile contains multiple GTs, they share the same GGTT. v2: - Include some changes that were mis-squashed into the VRAM patch. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-9-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:11 -05:00
Matthew Auld	36919ebeaa	drm/xe: fix suspend-resume for dgfx This stopped working now that TTM treats moving a pinned object through ttm_bo_validate() as an error, for the general case. Add some new routines to handle the new special casing needed for suspend-resume. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/244 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:31:40 -05:00
Lucas De Marchi	ea9f879d03	drm/xe: Sort includes Sort includes and split them in blocks: 1) .h corresponding to the .c. Example: xe_bb.c should have a "#include "xe_bb.h" first. 2) #include <linux/...> 3) #include <drm/...> 4) local includes 5) i915 includes This is accomplished by running `clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]` and ignoring all the changes after the includes. There are also some manual tweaks to split the blocks. v2: Also sort includes in headers Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:20 -05:00
Matthew Brost	c3ca546556	drm/xe: Lock GGTT on when restoring kernel BOs Make lockdep happy as we required to hold the GGTT when calling xe_ggtt_map_bo. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>	2023-12-19 18:27:47 -05:00
Matthew Brost	dd08ebf6c3	drm/xe: Introduce a new DRM driver for Intel GPUs Xe, is a new driver for Intel GPUs that supports both integrated and discrete platforms starting with Tiger Lake (first Intel Xe Architecture). The code is at a stage where it is already functional and has experimental support for multiple platforms starting from Tiger Lake, with initial support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO (for OpenCL and Level0). The new Xe driver leverages a lot from i915. As for display, the intent is to share the display code with the i915 driver so that there is maximum reuse there. But it is not added in this patch. This initial work is a collaboration of many people and unfortunately the big squashed patch won't fully honor the proper credits. But let's get some git quick stats so we can at least try to preserve some of the credits: Co-developed-by: Matthew Brost <matthew.brost@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Co-developed-by: Francois Dugast <francois.dugast@intel.com> Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com> Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Jani Nikula <jani.nikula@intel.com> Co-developed-by: José Roberto de Souza <jose.souza@intel.com> Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Co-developed-by: Dave Airlie <airlied@redhat.com> Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com>	2023-12-12 14:05:48 -05:00

23 Commits