Currently, ASID assignment for user VMs and page-table BO accounting for
client memory tracking are performed in xe_vm_create_ioctl.
To consolidate VM object initialization, move this logic to
xe_vm_create.
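For illustration, the consolidated user-VM initialization in xe_vm_create()
could look roughly like the sketch below; it assumes the driver's existing
xarray-based ASID allocator, with locking and error unwind elided:

if (xef) {
	u32 asid;
	int err;

	/* Hand out an ASID for the user VM; locking omitted for brevity */
	err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
			      XA_LIMIT(1, XE_MAX_ASID - 1),
			      &xe->usm.next_asid, GFP_KERNEL);
	if (err < 0)
		goto err_close;	/* error unwind elided in this sketch */

	vm->usm.asid = asid;
}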
v2:
- removed unnecessary duplicate BO tracking code
- using the local variable xef to verify whether the VM is being created
by userspace
Fixes: 658a1c8e0a ("drm/xe: Assign ioctl xe file handler to vm in xe_vm_create")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-3-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 30e0c3f43a414616e0b6ca76cf7f7b2cd387e1d4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Added fixes tag]
In several code paths, such as xe_pt_create(), the vm->xef field is used
to determine whether a VM originates from userspace or the kernel.
Previously, this handler was only assigned in xe_vm_create_ioctl(),
after the VM was created by xe_vm_create(). However, xe_vm_create()
triggers page table creation, and that code assumes vm->xef is already
set. This could lead to incorrect origin detection.
To fix this problem and ensure consistency in the initialization of
the VM object, let's move the assignment of this handler to
xe_vm_create.
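A minimal sketch of the intent, assuming the driver's xe_file_get() /
xe_file_put() reference helpers (illustrative, not the literal diff):

/* In xe_vm_create(), before any page-table BOs are created, so that
 * xe_pt_create() sees the correct userspace/kernel origin:
 */
if (xef)
	vm->xef = xe_file_get(xef);

/* ... and on the error path the reference is dropped again: */
if (vm->xef)
	xe_file_put(vm->xef);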
v2:
- take reference to the xe file object only when xef is not NULL
- release the reference to the xe file object on the error path (Matthew)
Fixes: 7f387e6012 ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 9337166fa1d80f7bb7c7d3a8f901f21c348c0f2a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The prefetch code was referencing CONFIG_DRM_XE_DEVMEM_MIRROR, which has
been replaced by CONFIG_DRM_XE_PAGEMAP. As a result, prefetches were
limited to SRAM. Update the code to use CONFIG_DRM_XE_PAGEMAP instead of
the deprecated option.
Fixes: f86ad0ed62 ("drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250710205413.1105595-1-matthew.brost@intel.com
bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.
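A plausible shape for the new helper (sketch; the in-tree accessor may
differ in detail):

/**
 * xe_bo_size() - Size of the BO, taken from the base GEM object.
 * @bo: the BO
 */
static inline size_t xe_bo_size(struct xe_bo *bo)
{
	return bo->ttm.base.size;
}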
v2:
- Fix typo in kernel doc (Ashutosh)
- Fix kunit (CI)
- Fix line wrap (Checkpatch)
v3:
- Fix sriov build (CI)
v4:
- Fix display build (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
Add runtime PM since we might call populate_mm on a foreign device.
v3:
- Fix a kerneldoc failure (Matt Brost)
- Revert the bo type change from device to kernel (Matt Brost)
v4:
- Add an assert in xe_svm_alloc_vram (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250619134035.170086-4-thomas.hellstrom@linux.intel.com
Rather than having multiple READ_ONCE accesses of the tile_* fields and
comments scattered through the code, add a helper with kernel doc to
provide a single access point with clear rules.
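Roughly, the helper gives one documented place for the opportunistic
check, along these lines (sketch; the exact form in the patch may
differ):

/*
 * A mapping is considered valid on a tile if it is present there and not
 * marked invalidated. Callers pass values read under the appropriate lock
 * (or via READ_ONCE() for opportunistic checks).
 */
static inline bool xe_vm_has_valid_gpu_mapping(struct xe_tile *tile,
					       u8 tile_present,
					       u8 tile_invalidated)
{
	return (tile_present & ~tile_invalidated) & BIT(tile->id);
}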
v3:
- s/xe_vm_has_valid_gpu_pages/xe_vm_has_valid_gpu_mapping
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250616063024.2059829-2-matthew.brost@intel.com
Introduce xe_vm_range_tilemask_tlb_invalidation(), which issues a TLB
invalidation for a specified address range across GTs indicated by a
tilemask.
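Assumed interface (illustrative sketch only):

/* Invalidate [start, end) on every GT whose tile bit is set in tile_mask,
 * e.g. from the SVM invalidation path:
 */
int xe_vm_range_tilemask_tlb_invalidation(struct xe_vm *vm, u64 start,
					  u64 end, u8 tile_mask);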
v2 (Matthew Brost)
- Move WARN_ON_ONCE to svm caller
- Remove xe_gt_tlb_invalidation_vma
- s/XE_WARN_ON/WARN_ON_ONCE
v3
- Rebase
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250609041616.1723636-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Document VMA tile_invalidated access rules, use READ_ONCE / WRITE_ONCE
for opportunistic checks of tile_present and tile_invalidated, move
tile_invalidated state change from page fault handler to PT code under
the correct locks, and add lockdep asserts to TLB invalidation paths.
v2:
- Assert VM dma-resv lock rather than BO in zap PTEs
v3:
- Back to BO's dma-resv lock, adjust documentation
v4:
- Add WRITE_ONCE in xe_vm_invalidate_vma (Thomas)
- Change lockdep assert for userptr in xe_vm_invalidate_vma (CI)
- Take userptr notifier lock in read mode in xe_vm_userptr_pin before
calling xe_vm_invalidate_vma (CI)
v5:
- Fix typos (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250602164412.1912293-1-matthew.brost@intel.com
In xe_vm_close_and_put() we need to be able to call xe_svm_fini();
however, during vm creation we can call this on the error path, before
having actually initialised the svm state, leading to various splats
followed by a fatal NPD.
Fixes: 6fd979c2f3 ("drm/xe: Add SVM init / close / fini to faulting VMs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4967
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250514152424.149591-4-matthew.auld@intel.com
In xe_vm_close_and_put() we need to be able to call
flush_work(rebind_work); however, during vm creation we can call this on
the error path, before having actually set up the worker, leading to a
splat from flush_work().
It looks like we can simply move the worker init step earlier to fix
this.
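In other words (sketch), the worker setup is hoisted to the start of VM
creation so that error paths through xe_vm_close_and_put() can always
flush it:

/* Initialize early, before anything that can fail */
INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func);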
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250514152424.149591-3-matthew.auld@intel.com
This commit adds prefetch support for SVM ranges, utilizing the
existing vm_bind ioctl functionality.
v2: rebase
v3:
- use xa_for_each() instead of manual loop
- check range is valid and in preferred location before adding to
xarray
- Fix naming conventions
- Fix return condition as -ENODATA instead of -EAGAIN (Matthew Brost)
- Handle sparsely populated cpu vma range (Matthew Brost)
v4:
- fix end address to find next cpu vma in case of -ENOENT
v5:
- Move find next vma logic to drm gpusvm layer
- Avoid mixing declaration and logic
v6:
- Use new function names
- Move eviction logic to prefetch_ranges
v7:
- devmem_only assigned 0
- nit address
v8:
- initialize ctx with 0
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250513040228.470682-15-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
This update renames the lookup_vma function to xe_vm_find_vma_by_addr and
makes it accessible externally. The function, which looks up a VMA by
its address within a specified VM, will be utilized in upcoming patches.
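Sketch of the exported lookup (illustrative; field and helper names
follow the existing fault-handling code):

struct xe_vma *xe_vm_find_vma_by_addr(struct xe_vm *vm, u64 page_addr)
{
	struct xe_vma *vma = NULL;

	/* Fast path: the VMA that faulted last time */
	if (vm->usm.last_fault_vma &&
	    xe_vma_start(vm->usm.last_fault_vma) <= page_addr &&
	    page_addr < xe_vma_end(vm->usm.last_fault_vma))
		vma = vm->usm.last_fault_vma;

	if (!vma)
		vma = xe_vm_find_overlapping_vma(vm, page_addr, SZ_4K);

	return vma;
}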
v2
- Fix doc
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250513040228.470682-9-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Add a flag in xe_vma_ops to determine whether it has svm prefetch ops or
not.
v2:
- s/false/0 (Matthew Brost)
v3:
- s/XE_VMA_OPS_HAS_SVM_PREFETCH/XE_VMA_OPS_FLAG_HAS_SVM_PREFETCH
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250513040228.470682-8-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Prefetch for SVM ranges can have more than one operation to increment,
hence modify the function to accept an increment value as input.
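Sketch of the reworked helper, assuming the existing per-tile
pt_update_ops bookkeeping (illustrative):

static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops,
					  u8 tile_mask, int inc_val)
{
	int i;

	if (!inc_val)	/* prefetch may legitimately contribute 0 ops */
		return;

	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
		if (BIT(i) & tile_mask)
			vops->pt_update_ops[i].num_ops += inc_val;
}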
v2:
- Call xe_vma_ops_incr_pt_update_ops only once for REMAP (Matthew Brost)
- Add check for 0 ops
v3:
- s/u8/int for inc_val and num_remap_ops (Matthew Brost)
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250513040228.470682-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
copy_from_user() has more checks and is safer than
__copy_from_user().
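Illustrative before/after of the pattern being changed:

/* Before: skips access_ok() and hardened usercopy checks */
if (__copy_from_user(&args, user_ptr, sizeof(args)))
	return -EFAULT;

/* After: copy_from_user() performs the full set of checks */
if (copy_from_user(&args, user_ptr, sizeof(args)))
	return -EFAULT;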
Suggested-by: Kees Cook <kees@kernel.org>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/acabf20aa8621c7bc8de09b1bffb8d14b5376484.1746126614.git.harish.chegondi@intel.com
Abort printing the coredump in the VM printer output if the printer is
full. This helps speed up large coredumps, which need to be walked
multiple times in xe_devcoredump_read().
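Sketch of the early-out in the VM snapshot printing path (placement and
function name are illustrative):

static void xe_vm_snapshot_print(struct xe_vm_snapshot *snap,
				 struct drm_printer *p)
{
	/* Nothing more will fit; stop walking the (possibly large) snapshot */
	if (drm_coredump_printer_is_full(p))
		return;

	/* ... print VM snapshot contents ... */
}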
v2:
- s/drm_printer_is_full/drm_coredump_printer_is_full (Jani)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250423171725.597955-5-matthew.brost@intel.com
Normally a scratch page is not allowed when a VM operates under page
fault mode, i.e., in the existing code DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutually exclusive. The reason
is that fault mode relies on recoverable page faults to work, while a
scratch page can mute recoverable page faults.
On xe2 and xe3, out-of-bound prefetch can cause a page fault and,
further, a system hang because xekmd can't resolve such page faults. The
SYCL and OCL language runtimes require out-of-bound prefetches to be
silently dropped without causing any functional problem, so the existing
behavior doesn't meet the language runtime requirements.
At the same time, HW prefetching can cause page fault interrupts. Due to
the page fault interrupt overhead (i.e., the GuC and KMD need to be
involved to fix the page fault), HW prefetching can be slowed down by
many orders of magnitude.
Fix those problems by allowing a scratch page under fault mode for xe2
and xe3. With a scratch page in place, HW prefetching always hits the
scratch page instead of causing an interrupt.
A side effect is that the scratch page could hide application errors:
out-of-bound application accesses are hidden by the scratch page mapping
instead of being reported to the user.
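Sketch of the relaxed creation-time check (the needs_scratch device-info
field is assumed here for illustration):

/* Scratch + fault mode is now only rejected on platforms that cannot
 * safely combine the two:
 */
if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
		 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
		 !xe->info.needs_scratch))
	return -EINVAL;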
v2: Refine commit message (Thomas)
v3: Move the scratch page flag check to after scratch page wa (Thomas)
v4: drop NEEDS_SCRATCH macro (matt)
Add a comment to DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250403165328.2438690-4-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
When a VM runs under fault mode with the scratch page enabled, we need
to clear the scratch page mapping on vm_bind for the vm_bind address
range. Under fault mode, we depend on recoverable page faults to
establish the mapping in the page table. If the scratch page is not
cleared, GPU access of the address won't cause a page fault because it
always hits the existing scratch page mapping.
When vm_bind is called with the IMMEDIATE flag, there is no need for
clearing, as an immediate bind overwrites the scratch page mapping.
So far only xe2 and xe3 products are allowed to enable the scratch page
under fault mode. On other platforms we don't allow the scratch page
under fault mode, so no such clearing is needed.
v2: Rework vm_bind pipeline to clear scratch page mapping. This is similar
to a map operation, with the exception that PTEs are cleared instead of
pointing to valid physical pages. (Matt, Thomas)
TLB invalidation is needed after clear scratch page mapping as larger
scratch page mapping could be backed by physical page and cached in
TLB. (Matt, Thomas)
v3: Fix the case of clearing huge pte (Thomas)
Improve commit message (Thomas)
v4: TLB invalidation on all LR cases, not only the clear on bind
cases (Thomas)
v5: Misc cosmetic changes (Matt)
Drop pt_update_ops.invalidate_on_bind. Directly wire
xe_vma_op.map.invalidate_on_bind to bind_op_prepare/commit (Matt)
v6: checkpatch fix (Matt)
v7: No need to check platform needs_scratch deciding invalidate_on_bind
(Matt)
v8: rebase
v9: rebase
v10: fix an error in xe_pt_stage_bind_entry, introduced in v9 rebase
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250403165328.2438690-3-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Don't rely on CONFIG_DRM_GPUSVM because other drivers may enable it,
causing us to compile in SVM support unintentionally.
Also take the opportunity to leave more code out of compilation if
!CONFIG_DRM_XE_GPUSVM and !CONFIG_DRM_XE_DEVMEM_MIRROR.
v3:
- Fixes for compilation errors on 32-bit. This changes the Kconfig
logic a bit.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250326080551.40201-2-thomas.hellstrom@linux.intel.com
The check for args->extensions is performed twice in
xe_vm_create_ioctl(). This commit removes the redundant check to
streamline the code.
Fixes: 7224788f67 ("drm/xe: Kill XE_VM_PROPERTY_BIND_OP_ERROR_CAPTURE_ADDRESS extension")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250303004942.951699-1-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
With support for CPU address mirror bindings in SRAM fully in place,
enable the implementation.
v3:
- s/system allocator/CPU address mirror (Thomas)
v7:
- Only enable uAPI if selected by GPU SVM (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-19-matthew.brost@intel.com
uAPI is designed with the use case that only mapping a BO to a malloc'd
address will unbind a CPU-address mirror VMA. Therefore, allowing a
CPU-address mirror VMA to unbind when the GPU has bindings in the range
being unbound does not make much sense. This behavior is not supported,
as disallowing it simplifies the code. This decision can always be revisited if a
use case arises.
v3:
- s/arrises/arises (Thomas)
- s/system allocator/GPU address mirror (Thomas)
- Kernel doc (Thomas)
- Newline between function defs (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-18-matthew.brost@intel.com
Add unbind to the SVM garbage collector. To facilitate this, add a
support function to the VM layer which unbinds an SVM range. Also teach
the PT layer to understand unbinds of SVM ranges.
v3:
- s/INVALID_VMA/XE_INVALID_VMA (Thomas)
- Kernel doc (Thomas)
- New GPU SVM range structure (Thomas)
- s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas)
v4:
- Use xe_vma_op_unmap_range (Himal)
v5:
- s/PY/PT (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-17-matthew.brost@intel.com
Add a basic SVM garbage collector which destroys an SVM range upon an
MMU UNMAP event. The garbage collector runs on a worker or in the GPU
fault handler and is required because locks in the reclaim path are
needed and cannot be taken in the notifier.
v2:
- Flush garbage collector in xe_svm_close
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Use list_first_entry_or_null for garbage collector loop (Thomas)
- Don't add to garbage collector if VM is closed (Thomas)
v4:
- Use %pe to print error (Thomas)
v5:
- s/visable/visible (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-16-matthew.brost@intel.com
Add (re)bind to the SVM page fault handler. To facilitate this, add a
support function to the VM layer which (re)binds an SVM range. Also
teach the PT layer to understand (re)binds of SVM ranges.
v2:
- Don't assert BO lock held for range binds
- Use xe_svm_notifier_lock/unlock helper in xe_svm_close
- Use drm_pagemap dma cursor
- Take notifier lock in bind code to check range state
v3:
- Use new GPU SVM range structure (Thomas)
- Kernel doc (Thomas)
- s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Tested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-15-matthew.brost@intel.com
Add an SVM range invalidation vfunc which invalidates PTEs. A new PT
layer function which accepts an SVM range is added to support this. In
addition, add the basic page fault handler which allocates an SVM range
to be used by the SVM range invalidation vfunc.
v2:
- Don't run invalidation if VM is closed
- Cycle notifier lock in xe_svm_close
- Drop xe_gt_tlb_invalidation_fence_fini
v3:
- Better commit message (Thomas)
- Add lockdep asserts (Thomas)
- Add kernel doc (Thomas)
- s/change/changed (Thomas)
- Use new GPU SVM range / notifier structures
- Ensure PTEs are zapped / dma mappings are unmapped on VM close (Thomas)
v4:
- Fix macro (Checkpatch)
v5:
- Use range start/end helpers (Thomas)
- Use notifier start/end helpers (Thomas)
v6:
- Use min/max helpers (Himal)
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-13-matthew.brost@intel.com
Clear the root PT entry and invalidate the entire VM's address space
when closing the VM. This will prevent the GPU from accessing any of the
VM's memory after closing.
v2:
- s/vma/vm in kernel doc (CI)
- Don't nuke migration VM as this occurs at driver unload (CI)
v3:
- Rebase and pull into SVM series (Thomas)
- Wait for pending binds (Thomas)
v5:
- Remove xe_gt_tlb_invalidation_fence_fini in error case (Matt Auld)
- Drop local migration bool (Thomas)
v7:
- Add drm_dev_enter/exit protecting invalidation (CI, Matt Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-12-matthew.brost@intel.com
Add SVM init / close / fini to faulting VMs. Minimal implementation
acting as a placeholder for follow-on patches.
v2:
- Add close function
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Update chunk array to be unsigned long (Thomas)
- Use new drm_gpusvm.h header location (Thomas)
- Newlines between functions in xe_svm.h (Thomas)
- Call drm_gpusvm_driver_set_lock in init (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
v7:
- Only select CONFIG_DRM_GPUSVM if DEVICE_PRIVATE (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-10-matthew.brost@intel.com
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.
CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.
It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.
Expected usage:
- Bind the entire virtual address (VA) space upon program load using the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
allocate a CPU address using mmap(PROT_NONE), bind the BO to the
mmapped address using existing bind IOCTLs. If a CPU map of the BO is
needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
space and then bind the mapping address with the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
remove GPU mappings.
Only supporting a 1 to 1 mapping between user address space and GPU
address space at the moment, as that is the expected use case. The uAPI
defines an interface for non 1 to 1 mappings but enforces 1 to 1; this
restriction can be lifted if use cases arise for non 1 to 1 mappings.
This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.
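A hypothetical userspace sketch of the first two expected-usage steps
above, using the xe uAPI structures (field values are illustrative):

/* 1) Mirror the entire VA space at program load */
struct drm_xe_vm_bind bind = {
	.vm_id = vm_id,
	.num_binds = 1,
	.bind = {
		.op = DRM_XE_VM_BIND_OP_MAP,
		.flags = DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR,
		.obj = 0,		/* no BO backing */
		.addr = 0,
		.range = va_space_size,
	},
};
ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);

/* 2) Runtime allocation: reserve a CPU address, then bind the BO there */
void *cpu_addr = mmap(NULL, bo_size, PROT_NONE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);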
v3:
- Call vm_bind_ioctl_ops_fini on -ENODATA
- Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
- s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
- Rework commit message for expected usage (Thomas)
- Describe state of code after patch in commit message (Thomas)
v4:
- Fix alignment (Checkpatch)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
Use fault injection infrastructure to allow specific functions to
be configured over debugfs for failing during the execution of
xe_exec_queue_create_ioctl(). xe_exec_queue_destroy_ioctl() and
xe_exec_queue_get_property_ioctl() are not considered as there is
no unwinding code to test with fault injection.
This allows more thorough testing from user space by going through
code paths for error handling and unwinding which cannot be reached
by simply injecting errors in IOCTL arguments. This can help
increase code robustness.
The corresponding IGT series is:
https://patchwork.freedesktop.org/series/144138/
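For example (sketch), opting the ioctl entry point into the kernel's
error-injection framework looks roughly like:

#include <linux/error-injection.h>

/* Allow this function to be forced to fail via debugfs fail_function */
ALLOW_ERROR_INJECTION(xe_exec_queue_create_ioctl, ERRNO);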
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250305150659.46276-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
If userptr pages are freed after a call to the xe mmu notifier,
the device will not be blocked out from theoretically accessing
these pages unless they are also unmapped from the iommu, and
this violates some aspects of the iommu-imposed security.
Ensure that userptrs are unmapped in the mmu notifier to
mitigate this. A naive attempt would try to free the sg table, but
the sg table itself may be accessed by a concurrent bind
operation, so settle for only unmapping.
v3:
- Update lockdep asserts.
- Fix a typo (Matthew Auld)
Fixes: 81e058a3e7 ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-4-thomas.hellstrom@linux.intel.com
Fix fault mode invalidation racing with unbind leading to the
PTE zapping potentially traversing an invalid page-table tree.
Do this by holding the notifier lock across PTE zapping. This
might transfer any contention waiting on the notifier seqlock
read side to the notifier lock read side, but that shouldn't be
a major problem.
At the same time get rid of the open-coded invalidation in the bind
code by relying on the notifier even when the vma bind is not
yet committed.
Finally let userptr invalidation call a dedicated xe_vm function
performing a full invalidation.
Fixes: e8babb280b ("drm/xe: Convert multiple bind ops into single job")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-4-thomas.hellstrom@linux.intel.com
If a userptr vma subject to prefetching was already invalidated
or invalidated during the prefetch operation, the operation would
repeatedly return -EAGAIN which would typically cause an infinite
loop.
Validate the userptr to ensure this doesn't happen.
v2:
- Don't fallthrough from UNMAP to PREFETCH (Matthew Brost)
Fixes: 5bd24e7882 ("drm/xe/vm: Subclass userptr vmas")
Fixes: 617eebb9c4 ("drm/xe: Fix array of binds")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-2-thomas.hellstrom@linux.intel.com
Doesn't look to be used.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-6-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
when called from xe_vm_userptr_pin() with the idea that we want to avoid
killing the entire vm and chucking an error, under the assumption that
the user just did an unmap or something, and has no intention of
actually touching that memory from the GPU. At this point we have
already zapped the PTEs so any access should generate a page fault, and
if the pin fails there also it will then become fatal.
However it looks like it's possible for the userptr vma to still be on
the rebind list in preempt_rebind_work_func(), if we had to retry the
pin again due to something happening in the caller before we did the
rebind step, but in the meantime needing to re-validate the userptr and
this time hitting the EFAULT.
This explains an internal user report of hitting:
[ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738690] Call Trace:
[ 191.738692] <TASK>
[ 191.738694] ? show_regs+0x69/0x80
[ 191.738698] ? __warn+0x93/0x1a0
[ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738759] ? report_bug+0x18f/0x1a0
[ 191.738764] ? handle_bug+0x63/0xa0
[ 191.738767] ? exc_invalid_op+0x19/0x70
[ 191.738770] ? asm_exc_invalid_op+0x1b/0x20
[ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738834] ? ret_from_fork_asm+0x1a/0x30
[ 191.738849] bind_op_prepare+0x105/0x7b0 [xe]
[ 191.738906] ? dma_resv_reserve_fences+0x301/0x380
[ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[ 191.738966] ? kmemleak_alloc+0x4b/0x80
[ 191.738973] ops_execute+0x188/0x9d0 [xe]
[ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe]
[ 191.739098] ? trace_hardirqs_on+0x4d/0x60
[ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe]
Followed by NPD, when running some workload, since the sg was never
actually populated but the vma is still marked for rebind when it should
be skipped for this special EFAULT case. This is confirmed to fix the
user report.
v2 (MattB):
- Move earlier.
v3 (MattB):
- Update the commit message to make it clear that this indeed fixes the
issue.
Fixes: 521db22a1d ("drm/xe: Invalidate userptr VMA on page pin fault")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-5-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
On error, restore anything still on the pin_list back to the
invalidation list. For the actual pin, so long as the vma is tracked on
either list it should get picked up on the next pin; however it looks
possible for the vma to get nuked but still be present on this per-vm
pin_list, leading to corruption. An alternative might then be to instead
just remove the link when destroying the vma.
v2:
- Also add some asserts.
- Keep the overzealous locking so that we are consistent with the docs;
updating the docs and related bits will be done as a follow up.
Fixes: ed2bdf3b26 ("drm/xe/vm: Subclass userptr vmas")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The driver needs to know if a BO is encrypted with PXP to enable the
display decryption at flip time.
Furthermore, we want to keep track of the status of the encryption and
reject any operation that involves a BO that is encrypted using an old
key. There are two points in time where such checks can kick in:
1 - at VM bind time, all operations except for unmapping will be
rejected if the key used to encrypt the BO is no longer valid. This
check is opt-in via a new VM_BIND flag, to avoid a scenario where a
malicious app purposely shares an invalid BO with a non-PXP aware
app (such as a compositor). If the VM_BIND was failed, the
compositor would be unable to display anything at all. Allowing the
bind to go through means that output still works, it just displays
garbage data within the bounds of the illegal BO.
2 - at job submission time, if the queue is marked as using PXP, all
objects bound to the VM will be checked and the submission will be
rejected if any of them was encrypted with a key that is no longer
valid.
Note that there is no risk of leaking the encrypted data if a user does
not opt in to those checks; the only consequence is that the user will
not realize that the encryption key has changed and that the data is no
longer valid.
v2: Better comments and descriptions (John), rebase
v3: Properly return the result of key_assign up the stack, do not use
xe_bo in display headers (Jani)
v4: improve key_instance variable documentation (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com
PXP requires submissions to the HW for the following operations:
1) Key invalidation, done via the VCS engine
2) Communication with the GSC FW for session management, done via the
GSCCS.
Key invalidation submissions are serialized (only 1 termination can be
serviced at a given time) and done via GGTT, so we can allocate a simple
BO and a kernel queue for it.
Submissions for session management are tied to a PXP client (identified
by a unique host_session_id); from the GSC POV this is a user-accessible
construct, so all related submissions must be done via PPGTT. The driver
does not currently support PPGTT submission from within the kernel, so
to add this support, the following changes have been included:
- a new type of kernel-owned VM (marked as GSC), required to ensure we
don't use fault mode on the engine and to mark the different lock
usage with lockdep.
- a new function to map a BO into a VM from within the kernel.
v2: improve comments and function name, remove unneeded include (John)
v3: fix variable/function names in documentation
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com
BO validation during vm_bind could trigger memory eviction when the
system runs under memory pressure. Right now we blindly evict BOs of
all VMs. This scheme has a problem when the system runs in
non-recoverable page fault mode: even though the vm_bind could succeed
by evicting BOs, the later rebinding of the evicted BOs would fail. So
it is better to report an out-of-memory failure at vm_bind time than at
rebind time, where xekmd currently doesn't have a good mechanism to
report errors to user space.
This patch implements a scheme to only evict objects of other VMs at
vm_bind time. Objects of the same VM will skip eviction. If we fail to
find enough memory for vm_bind, we report an error to user space at
vm_bind time.
This scheme is not needed for recoverable page fault mode, where we can
dynamically fault in pages on demand.
v1: Use xe_vm_in_preempt_fence_mode instead of stack variable (Thomas)
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241203021929.1919730-1-oak.zeng@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
ttm_bo_access() returns the size on success; account for that,
otherwise the caller in intel_atomic_prepare_plane_clear_colors()
incorrectly thinks this is an error.
v2 (Thomas)
- Make sure we check for the partial copy case. Also since this api is
easy to get wrong, wrap the whole thing in a new helper to hide the
details and then convert the existing users over.
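A sketch of such a helper (name illustrative), hiding the bytes-copied
return convention of ttm_bo_access():

static int xe_bo_read(struct xe_bo *bo, u64 offset, void *dst, int size)
{
	int ret;

	ret = ttm_bo_access(&bo->ttm, offset, dst, size, 0);
	if (ret == size)	/* full copy: success */
		ret = 0;
	else if (ret >= 0)	/* partial copy: report as an error */
		ret = -EIO;

	return ret;
}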
Fixes: b6308aaa24 ("drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3661
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241202170102.88893-2-matthew.auld@intel.com
Non-contiguous mapping of a BO in VRAM doesn't work; use ttm_bo_access()
instead.
v2:
- Fix error handling
Fixes: 0eb2a18a8f ("drm/xe: Implement VM snapshot support for BO's and userptr")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241126174615.2665852-7-matthew.brost@intel.com