Commit Graph

107 Commits

Author SHA1 Message Date
Matthew Brost
5fcbf83e39 drm/xe: Drop rebind argument from xe_pt_prepare_bind
This is unused, drop it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201184844.2317004-1-matthew.brost@intel.com
2024-02-01 18:43:11 -08:00
Thomas Hellström
5bd24e7882 drm/xe/vm: Subclass userptr vmas
The construct allocating only parts of the vma structure when
the userptr part is not needed is very fragile. A developer could
add additional fields below the userptr part, and the code could
easily attempt to access the userptr part even if its not persent.

So introduce xe_userptr_vma which subclasses struct xe_vma the
proper way, and accordingly modify a couple of interfaces.
This should also help if adding userptr helpers to drm_gpuvm.

v2:
- Fix documentation of to_userptr_vma() (Matthew Brost)
- Fix allocation and freeing of vmas to clearer distinguish
  between the types.

Closes: https://lore.kernel.org/intel-xe/0c4cc1a7-f409-4597-b110-81f9e45d1ffe@embeddedor.com/T/#u
Fixes: a4cc60a55f ("drm/xe: Only alloc userptr part of xe_vma for userptrs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131091628.12318-1-thomas.hellstrom@linux.intel.com
2024-02-01 09:49:26 +01:00
Thomas Hellström
06951c2ee7 drm/xe: Use NULL PTEs as scratch PTEs
Currently scratch PTEs are write-enabled and points to a single scratch
page. This has the side effect that buggy applications with out-of-bounds
memory accesses may not notice the bad access since what's written may
be read back.

Instead use NULL PTEs as scratch PTEs. These always return 0 when reading,
and writing has no effect. As a slight benefit, we can also use huge NULL
PTEs.

One drawback pointed out is that debugging may be hampered since previously
when inspecting the content of the scratch page, it might be possible to
detect writes to out-of-bound addresses and possibly also
from where the out-of-bounds address originated. However since the scratch
page-table structure is kept, it will be easy to add back the single
RW-enabled scratch page under a debug define if needed.

Also update the kerneldoc accordingly and move the function to create the
scratch page-tables from xe_pt.c to xe_pt.h since it is accessing
vm structure internals and this also makes it possible to make it static.

v2:
- Don't try to encode scratch PTEs larger than 1GiB.
- Move xe_pt_create_scratch(), Update kerneldoc.
v3:
- Rebase.

Cc: Brian Welty <brian.welty@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> #for general direction.
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231209151843.7903-3-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:28 -05:00
Thomas Hellström
e84d716dd4 drm/xe: Restrict huge PTEs to 1GiB
Add a define for the highest level for which we can encode a huge PTE,
and use it for page-table building. Also update an assert that checks that
we don't try to encode for larger sizes.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231209151843.7903-2-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:28 -05:00
Matthew Auld
e1fbc4f18d drm/xe/uapi: support pat_index selection with vm_bind
Allow userspace to directly control the pat_index for a given vm
binding. This should allow directly controlling the coherency, caching
behaviour, compression and potentially other stuff in the future for the
ppGTT binding.

The exact meaning behind the pat_index is very platform specific (see
BSpec or PRMs) but effectively maps to some predefined memory
attributes. From the KMD pov we only care about the coherency that is
provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
The vm_bind coherency mode for the given pat_index needs to be at least
1way coherent when using cpu_caching with DRM_XE_GEM_CPU_CACHING_WB. For
platforms that lack the explicit coherency mode attribute, we treat
UC/WT/WC as NONE and WB as AT_LEAST_1WAY.

For userptr mappings we lack a corresponding gem object, so the expected
coherency mode is instead implicit and must fall into either 1WAY or
2WAY. Trying to use NONE will be rejected by the kernel. For imported
dma-buf (from a different device) the coherency mode is also implicit
and must also be either 1WAY or 2WAY.

v2:
  - Undefined coh_mode(pat_index) can now be treated as programmer
    error. (Matt Roper)
  - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
    having to match exactly. This ensures imported dma-buf can always
    just use 1way (or even 2way), now that we also bundle 1way/2way into
    at_least_1way. We still require 1way/2way for external dma-buf, but
    the policy can now be the same for self-import, if desired.
  - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
  - Move as much of the pat_index validation as we can into
    vm_bind_ioctl_check_args. (José)
v3 (Matt Roper):
  - Split the pte_encode() refactoring into separate patch.
v4:
  - Rebase
v5:
  - Check for and reject !coh_mode which would indicate hw reserved
    pat_index on xe2.
v6:
  - Rebase on removal of coh_mode from uapi. We just need to reject
    cpu_caching=wb + pat_index with coh_none.

Testcase: igt@xe_pat
Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Cc: Zhengguo Xu <zhengguo.xu@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Zhengguo Xu <zhengguo.xu@intel.com>
Acked-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:07 -05:00
Thomas Hellström
fdb6a05383 drm/xe: Internally change the compute_mode and no_dma_fence mode naming
The name "compute_mode" can be confusing since compute uses either this
mode or fault_mode to achieve the long-running semantics, and compute_mode
can, moving forward, enable fault_mode under the hood to work around
hardware limitations.

Also the name no_dma_fence_mode really refers to what we elsewhere call
long-running mode and the mode contrary to what its name suggests allows
dma-fences as in-fences.

So in an attempt to be more consistent, rename
no_dma_fence_mode -> lr_mode
compute_mode      -> preempt_fence_mode

And adjust flags so that

preempt_fence_mode sets XE_VM_FLAG_LR_MODE
fault_mode sets XE_VM_FLAG_LR_MODE | XE_VM_FLAG_FAULT_MODE

v2:
- Fix a typo in the commit message (Oak Zeng)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231127123349.23698-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:58 -05:00
Thomas Hellström
a21fe5ee59 drm/xe/bo: Rename xe_bo_get_sg() to xe_bo_sg()
Using "get" typically refers to obtaining a refcount, which we don't do
here so rename to xe_bo_sg().

Suggested-by: Ohad Sharabi <osharabi@habana.ai>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/946
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Ohad Sharabi<osharabi@habana.ai>
Link: https://patchwork.freedesktop.org/patch/msgid/20231122110359.4087-3-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:57 -05:00
Lucas De Marchi
0bc519d20f drm/xe: Remove GEN[0-9]*_ prefixes
After noticing in logs there were still mentions to GEN6 registers, it
was clear commit d9b79ad275 ("drm/xe: Drop gen afixes from registers")
didn't take care of all the afixes. Some were added later, but there are
also constants and strings still using that.  Continue the cleanup
removing the remaining ones.

To keep it consistent with code nearby, a few other changes are made:

	- Remove prefix in INTEL_LEGACY_64B_CONTEXT
	- Remove GEN8_CTX_L3LLC_COHERENT since it's unused
	- Rename GEN9_FREQ_SCALER to GT_FREQUENCY_SCALER

v2: Use XELP_ as prefix for NUM_MOCS_ENTRIES and remove changes to
    MOCS_ENTRIES as this is now done as part of a previous commit
    (Matt Roper)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231117174049.527192-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:39 -05:00
José Roberto de Souza
b6f45db5d0 drm/xe: Set PTE_AE for smem allocations in integrated devices
Without this if a atomic operation is executed in Xe2 integrated GPUs
it causes engine memory catastrophic error.

This fixes at least 3 failures in piglit sanity and 2 failures in
crucible for LNL.

v3:
- only add PTE_AE to smem in integrated

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:59 -05:00
Matthew Auld
e814389ff1 drm/xe: directly use pat_index for pte_encode
In a future patch userspace will be able to directly set the pat_index
as part of vm_bind. To support this we need to get away from using
xe_cache_level in the low level routines and rather just use the
pat_index directly.

v2: Rebase
v3: Some missed conversions, also prefer tile_to_xe() (Niranjana)
v4: remove leftover const (Lucas)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:58 -05:00
Paulo Zanoni
0e1a234618 drm/xe: fix range printing for debug messages
We're already using the half-open interval notation "[A, B)", that "-
1" there makes it wrong. Also, getting rid of the "-1" makes it much
easier to grep for the logs when you're looking for an address that's
the end of a vma and the start of another.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:09 -05:00
Lucas De Marchi
0e5e77bd97 drm/xe: Use vfunc for pte/pde ppgtt encoding
Move the function to encode pte/pde to be vfuncs inside struct xe_vm.
This will allow to easily extend to platforms that don't have a
compatible encoding.

v2: Fix kunit build

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:19 -05:00
Lucas De Marchi
b3bb7d9c56 drm/xe: Remove check for vma == NULL
vma at this point can never be NULL as otherwise it would crash earlier
in the only caller, xe_pt_stage_bind_entry(). Remove the extra check and
avoid adding and removing the bits from the pte.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:19 -05:00
Lucas De Marchi
f24081cd62 drm/xe: Normalize pte/pde encoding
Split functions that do only part of the pde/pte encoding and that can
be called by the different places. This normalizes how pde/pte are
encoded so they can be moved elsewhere in a subsequent change.

xe_pte_encode() was calling __pte_encode() with a NULL vma, which is the
opposite of what xe_pt_stage_bind_entry() does. Stop passing a NULL vma
and just split another function that deals with a vma rather than a bo.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:19 -05:00
Tejas Upadhyay
2ff00c4f77 drm/xe: Track page table memory usage for client
Account page table memory usage in the owning client
memory usage stats.

V2:
  - Minor tweak to if (vm->pt_root[id]) check - Himal

Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:15 -05:00
Thomas Hellström
fc678ec7c2 drm/xe: Reinstate pipelined fence enable_signaling
With the GPUVA conversion, the xe_bo::vmas member became replaced with
drm_gem_object::gpuva.list, however there was a couple of usage instances
left using the old member. Most notably the pipelined fence
enable_signaling.

Remove the xe_bo::vmas member completely, fix usage instances and
also enable this pipelined fence enable_signaling even for faulting
VM:s since we actually wait for bind fences to complete.

v2:
- Rebase.
v3:
- Fix display code build error.

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230915172606.14436-1-thomas.hellstrom@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:14 -05:00
Francois Dugast
c73acc1eeb drm/xe: Use Xe assert macros instead of XE_WARN_ON macro
The XE_WARN_ON macro maps to WARN_ON which is not justified
in many cases where only a simple debug check is needed.
Replace the use of the XE_WARN_ON macro with the new xe_assert
macros which relies on drm_*. This takes a struct drm_device
argument, which is one of the main changes in this commit. The
other main change is that the condition is reversed, as with
XE_WARN_ON a message is displayed if the condition is true,
whereas with xe_assert it is if the condition is false.

v2:
- Rebase
- Keep WARN splats in xe_wopcm.c (Matt Roper)

v3:
- Rebase

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:08 -05:00
Francois Dugast
9b9529ce37 drm/xe: Rename engine to exec_queue
Engine was inappropriately used to refer to execution queues and it
also created some confusion with hardware engines. Where it applies
the exec_queue variable name is changed to q and comments are also
updated.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:39:20 -05:00
Francois Dugast
99fea68288 drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel
Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls
WARN() instead of BUG(). BUG() crashes the kernel and should only be
used when it is absolutely unavoidable in case of catastrophic and
unrecoverable failures, which is not the case here.

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:39:17 -05:00
Lucas De Marchi
b23ebae7ab drm/xe: Set PTE_DM bit for stolen on MTL
Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE
when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(),
and use it when encoding the PTE.

In some places in the spec the PTE bit is called "Local Memory",
abbreviated as LM, and in others it's called "Device Memory" (DM). Since
we moved away from "Local Memory" and preferred the "vram" terminology,
also rename the macros as DM to follow the name of the new function.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:39:05 -05:00
Lucas De Marchi
937b4be72b drm/xe: Decouple vram check from xe_bo_addr()
The output arg is_vram in xe_bo_addr() is unused by several callers.
It's also not what the function is mainly doing. Remove the argument and
let the interested callers to call xe_bo_is_vram().

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:39:05 -05:00
Lucas De Marchi
621c1fbd9b drm/xe: Remove vma arg from xe_pte_encode()
All the callers pass a NULL vma, so the buffer is always the BO. Remove
the argument and the side effects of dealing with it.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:39:04 -05:00
Lucas De Marchi
0e34fdb4a0 drm/xe: Fix checking for unset value
Commit 3743040261 ("drm/xe: NULL binding implementation") introduced
the NULL binding implementation, but left a case in which the out value
is_vram is not set and the caller will use whatever was on stack.
Eventually the is_vram out could be removed, but this should at least
fix the current bug.

Fixes: 3743040261 ("drm/xe: NULL binding implementation")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230726160708.3967790-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:38:30 -05:00
Matthew Brost
342206b7cc drm/xe: Always use xe_vm_queue_rebind_worker helper
Do not queue the rebind worker directly, rather use the helper
xe_vm_queue_rebind_worker. This ensures we use the correct work queue.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:55 -05:00
Matt Roper
7a060d786c drm/xe/mtl: Map PPGTT as CPU:WC
On MTL and beyond, the GPU performs non-coherent accesses to the PPGTT
page tables.  These page tables should be mapped as CPU:WC.

Removes CAT errors triggered by xe_exec_basic@once-basic on MTL:

   xe 0000:00:02.0: [drm:__xe_pt_bind_vma [xe]] Preparing bind, with range [1a0000...1a0fff) engine 0000000000000000.
   xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 1 entries to update
   xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]]  0: Update level 3 at (0 + 1) [0...8000000000) f:0
   xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
   xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
   xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4

v2:
 - Rename to XE_BO_PAGETABLE to make it more clear that this BO is the
   pagetable itself, rather than just being bound in the PPGTT.  (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230725003433.1992137-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:54 -05:00
Matthew Brost
7ead331564 drm/xe: Use migrate engine for page fault binds
We must use migrate engine for page fault binds in order to avoid a
deadlock as the migrate engine has a reserved BCS instance which cannot
be stuck on a fault. To use the migrate engine the engine argument to
xe_migrate_update_pgtables must be NULL, this was incorrectly wired up
so vm->eng[tile_id] was always being used. Fix this.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:53 -05:00
Matthew Brost
1655c893af drm/xe: Reduce the number list links in xe_vma
Combine the userptr, rebind, and destroy links into a union as
the lists these links belong to are mutually exclusive.

v2: Adjust which lists are combined (Thomas H)
v3: Add kernel doc why this is safe (Thomas H), remove related change
of list_del_init -> list_del (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:53 -05:00
Matthew Brost
8f33b4f054 drm/xe: Avoid doing rebinds
If we dont change page sizes we can avoid doing rebinds rather just do a
partial unbind. The algorithm to determine its page size is greedy as we
assume all pages in the removed VMA are the largest page used in the
VMA.

v2: Don't exceed 100 lines
v3: struct xe_vma_op_unmap remove in different patch, remove XXX comment

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:52 -05:00
Matthew Brost
fd84041d09 drm/xe: Make bind engines safe
We currently have a race between bind engines which can result in
corrupted page tables leading to faults.

A simple example:
bind A 0x0000-0x1000, engine A, has unsatisfied in-fence
bind B 0x1000-0x2000, engine B, no in-fences
exec A uses 0x1000-0x2000

Bind B will pass bind A and exec A will fault. This occurs as bind A
programs the root of the page table in a bind job which is held up by an
in-fence. Bind B in this case just programs a leaf entry of the
structure.

To fix use range-fence utility to track cross bind engine conflicts. In
the above example bind A would insert an dependency into the range-fence
tree with a key of 0x0-0x7fffffffff, bind B would find that dependency
and its bind job would scheduled behind the unsatisfied in-fence and
bind A's job.

Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:52 -05:00
Lucas De Marchi
0d39b6daa5 drm/xe: Normalize XE_VM_FLAG* names
Rename XE_VM_FLAGS_64K to XE_VM_FLAG_64K to follow the other names and
s/GT/TILE/ that got missed in commit 08dea76745 ("drm/xe: Move
migration from GT to tile").

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230718193924.3084759-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:37:37 -05:00
Matthew Brost
b06d47be7c drm/xe: Port Xe to GPUVA
Rather than open coding VM binds and VMA tracking, use the GPUVA
library. GPUVA provides a common infrastructure for VM binds to use mmap
/ munmap semantics and support for VK sparse bindings.

The concepts are:

1) xe_vm inherits from drm_gpuva_manager
2) xe_vma inherits from drm_gpuva
3) xe_vma_op inherits from drm_gpuva_op
4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the
GPUVA code to generate an VMA operations list which is parsed, committed,
and executed.

v2 (CI): Add break after default in case statement.
v3: Rebase
v4: Fix some error handling
v5: Use unlocked version VMA in error paths
v6: Rebase, address some review feedback mainly Thomas H
v7: Fix compile error in xe_vma_op_unwind, address checkpatch

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:35:18 -05:00
Matthew Brost
21ed3327e3 drm/xe: Add helpers to hide struct xe_vma internals
This will help with the GPUVA port as the internals of struct xe_vma
will change.

v2: Update comment around helpers

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:35:18 -05:00
Matthew Brost
3743040261 drm/xe: NULL binding implementation
Add uAPI and implementation for NULL bindings. A NULL binding is defined
as writes dropped and read zero. A single bit in the uAPI has been added
which results in a single bit in the PTEs being set.

NULL bindings are intendedd to be used to implement VK sparse bindings,
in particular residencyNonResidentStrict property.

v2: Fix BUG_ON shown in VK testing, fix check patch warning, fix
xe_pt_scan_64K, update __gen8_pte_encode to understand NULL bindings,
remove else if vma_addr

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Suggested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:34:44 -05:00
Lucas De Marchi
a0ea91db61 drm/xe: Rename pte/pde encoding functions
Remove the leftover TODO by renameing the functions to use xe prefix.
Since the static __gen8_pte_encode() already has a double score,
just remove the prefix.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230611222447.2837573-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:34:14 -05:00
Matthew Brost
3534b18c36 drm/xe: s/XE_PTE_READ_ONLY/XE_PTE_FLAG_READ_ONLY
This define is for internal PTE flags rather than fields in the hardware
PTEs, rename as such. This will help in an upcoming patch to avoid
further confusion.

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:35:21 -05:00
Thomas Hellström
85dbfe47d0 drm/xe: Invalidate TLB also on bind if in scratch page mode
For scratch table mode we need to cover the case where a scratch PTE might
have been pre-fetched and cached and used instead of that of the newly
bound vma.
For compute vms, invalidate TLB globally using GuC before signalling
bind complete. For !long-running vms, invalidate TLB at batch start.

Also document how TLB invalidation works.

v2:
- Fix a pointer to the comment about TLB invalidation (Jose Souza).
- Add a bool to the vm whether we want to invalidate TLB at batch start.
- Invalidate TLB also on BCS- and video engines at batch start where
  needed.
- Use BIT() macro instead of explicit shift.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com> #v1
Reported-by: José Roberto de Souza <jose.souza@intel.com> #v1
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:35:20 -05:00
Matt Roper
f6929e80cd drm/xe: Allocate GT dynamically
In preparation for re-adding media GT support, switch the primary GT
within the tile to a dynamic allocation.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-19-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:15 -05:00
Matt Roper
08dea76745 drm/xe: Move migration from GT to tile
Migration primarily focuses on the memory associated with a tile, so it
makes more sense to track this at the tile level (especially since the
driver was already skipping migration operations on media GTs).

Note that the blitter engine used to perform the migration always lives
in the tile's primary GT today.  In theory that could change if media
GTs ever start including blitter engines in the future, but we can
extend the design if/when that happens in the future.

v2:
 - Fix kunit test build
 - Kerneldoc parameter name update
v3:
 - Removed leftover prototype for removed function.  (Gustavo)
 - Remove unrelated / unwanted error handling change.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:15 -05:00
Matt Roper
876611c2b7 drm/xe: Memory allocations are tile-based, not GT-based
Since memory and address spaces are a tile concept rather than a GT
concept, we need to plumb tile-based handling through lots of
memory-related code.

Note that one remaining shortcoming here that will need to be addressed
before media GT support can be re-enabled is that although the address
space is shared between a tile's GTs, each GT caches the PTEs
independently in their own TLB and thus TLB invalidation should be
handled at the GT level.

v2:
 - Fix kunit test build.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:14 -05:00
Matt Roper
f79ee3013a drm/xe: Add backpointer from gt to tile
Rather than a backpointer to the xe_device, a GT should have a
backpointer to its tile (which can then be used to lookup the device if
necessary).

The gt_to_xe() helper macro (which moves from xe_gt.h to xe_gt_types.h)
can and should still be used to jump directly from an xe_gt to
xe_device.

v2:
 - Fix kunit test build
 - Move a couple changes to the previous patch. (Lucas)

Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:11 -05:00
Michael J. Ruhl
fb31517cd7 drm/xe: Rename GPU offset helper to reflect true usage
The _io_offset helper function is returning an offset into the GPU
address space.  Using the CPU address offset (io_) is not correct.

Rename to reflect usage.
Update to use GPU offset information.
Update PT dma_offset to use the helper

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:10 -05:00
Maarten Lankhorst
094d739f4d drm/xe: Prevent evicting for page tables
When creating page tables from xe_exec_ioctl, we may end up freeing
memory we just validated. To be certain this does not happen, do not
allow the current reservation to be evicted from the ioctl.

Callchain:
[  109.008522]  xe_bo_move_notify+0x5c/0xf0 [xe]
[  109.008548]  xe_bo_move+0x90/0x510 [xe]
[  109.008573]  ttm_bo_handle_move_mem+0xb7/0x170 [ttm]
[  109.008581]  ttm_bo_swapout+0x15e/0x360 [ttm]
[  109.008586]  ttm_device_swapout+0xc2/0x110 [ttm]
[  109.008592]  ttm_global_swapout+0x47/0xc0 [ttm]
[  109.008598]  ttm_tt_populate+0x7a/0x130 [ttm]
[  109.008603]  ttm_bo_handle_move_mem+0x160/0x170 [ttm]
[  109.008609]  ttm_bo_validate+0xe5/0x1d0 [ttm]
[  109.008614]  ttm_bo_init_reserved+0xac/0x190 [ttm]
[  109.008620]  __xe_bo_create_locked+0x153/0x260 [xe]
[  109.008645]  xe_bo_create_locked_range+0x77/0x360 [xe]
[  109.008671]  xe_bo_create_pin_map_at+0x33/0x1f0 [xe]
[  109.008695]  xe_bo_create_pin_map+0x11/0x20 [xe]
[  109.008721]  xe_pt_create+0x69/0xf0 [xe]
[  109.008749]  xe_pt_stage_bind_entry+0x208/0x430 [xe]
[  109.008776]  xe_pt_walk_range+0xe9/0x2a0 [xe]
[  109.008802]  xe_pt_walk_range+0x223/0x2a0 [xe]
[  109.008828]  xe_pt_walk_range+0x223/0x2a0 [xe]
[  109.008853]  __xe_pt_bind_vma+0x28d/0xbd0 [xe]
[  109.008878]  xe_vm_bind_vma+0xc7/0x2f0 [xe]
[  109.008904]  xe_vm_rebind+0x72/0x160 [xe]
[  109.008930]  xe_exec_ioctl+0x22b/0xa70 [xe]
[  109.008955]  drm_ioctl_kernel+0xb9/0x150 [drm]
[  109.008972]  drm_ioctl+0x210/0x430 [drm]
[  109.008988]  __x64_sys_ioctl+0x85/0xb0
[  109.008990]  do_syscall_64+0x38/0x90
[  109.008991]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Original warning:
[ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
...
[ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
[ 5613.316098] Call Trace:
[ 5613.318595]  <TASK>
[ 5613.320743]  xe_exec_ioctl+0x383/0x8a0 [xe]
[ 5613.325278]  ? __is_insn_slot_addr+0x8e/0x110
[ 5613.329719]  ? __is_insn_slot_addr+0x8e/0x110
[ 5613.334116]  ? kernel_text_address+0x75/0xf0
[ 5613.338429]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.343778]  ? __kernel_text_address+0x9/0x40
[ 5613.348181]  ? unwind_get_return_address+0x1a/0x30
[ 5613.353013]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 5613.358362]  ? arch_stack_walk+0x99/0xf0
[ 5613.362329]  ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.366996]  ? lock_acquire+0x287/0x2f0
[ 5613.370873]  ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.375530]  ? rcu_read_lock_sched_held+0xb/0x70
[ 5613.380181]  ? lock_release+0x225/0x2e0
[ 5613.384059]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.389092]  drm_ioctl_kernel+0xc0/0x170
[ 5613.393068]  drm_ioctl+0x1b7/0x490
[ 5613.396519]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
[ 5613.401547]  ? lock_release+0x225/0x2e0
[ 5613.405432]  __x64_sys_ioctl+0x8a/0xb0
[ 5613.409232]  do_syscall_64+0x37/0x90

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:34:10 -05:00
Lucas De Marchi
58e19acf0c drm/xe: Cleanup page-related defines
Rename the following defines to lose the GEN* prefixes since they don't
make sense for xe:

GEN8_PTE_SHIFT		-> XE_PTE_SHIFT
GEN8_PAGE_SIZE		-> XE_PAGE_SIZE
GEN8_PTE_MASK		-> XE_PTE_MASK
GEN8_PDE_SHIFT		-> XE_PDE_SHIFT
GEN8_PDES		-> XE_PDES
GEN8_PDE_MASK		-> XE_PDE_MASK
GEN8_64K_PTE_SHIFT	-> XE_64K_PTE_SHIFT
GEN8_64K_PAGE_SIZE	-> XE_64K_PAGE_SIZE
GEN8_64K_PTE_MASK	-> XE_64K_PTE_MASK
GEN8_64K_PDE_MASK	-> XE_64K_PDE_MASK
GEN8_PDE_PS_2M		-> XE_PDE_PS_2M
GEN8_PDPE_PS_1G		-> XE_PDPE_PS_1G
GEN8_PDE_IPS_64K	-> XE_PDE_IPS_64K
GEN12_GGTT_PTE_LM	-> XE_GGTT_PTE_LM
GEN12_USM_PPGTT_PTE_AE	-> XE_USM_PPGTT_PTE_AE
GEN12_PPGTT_PTE_LM	-> XE_PPGTT_PTE_LM
GEN12_PDE_64K		-> XE_PDE_64K
GEN12_PTE_PS64		-> XE_PTE_PS64
GEN8_PAGE_PRESENT	-> XE_PAGE_PRESENT
GEN8_PAGE_RW		-> XE_PAGE_RW
PTE_READ_ONLY		-> XE_PTE_READ_ONLY

Keep an XE_ prefix to make sure we don't mix the defines for the CPU
(e.g. PAGE_SIZE) with the ones fro the GPU).

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:31:43 -05:00
Matthew Brost
59ea53eecb drm/xe: Use BO's GT to determine dma_offset when programming PTEs
Rather than using the passed in GT, use the BO's GT determine dma_offset
when programming PTEs as these two GT's could differ (i.e. mapping a BO
from a remote GT). The BO's GT is correct GT to use as this where BO
resides, while the passed in GT is where the mapping is created.

v2:
  (Thomas)      - Kernel doc, extra new line
  (CI)          - Rebase to tip

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:30:21 -05:00
Matthew Auld
2a8477f761 drm/xe: s/lmem/vram/
This seems to be the preferred nomenclature in xe. Currently we are
intermixing vram and lmem, which is confusing.

v2 (Gwan-gyeong Mun & Lucas):
  - Rather apply to the entire driver

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:29:45 -05:00
Lucas De Marchi
ea9f879d03 drm/xe: Sort includes
Sort includes and split them in blocks:

1) .h corresponding to the .c. Example: xe_bb.c should have a "#include
   "xe_bb.h" first.
2) #include <linux/...>
3) #include <drm/...>
4) local includes
5) i915 includes

This is accomplished by running
`clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]`
and ignoring all the changes after the includes. There are also some
manual tweaks to split the blocks.

v2: Also sort includes in headers

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:29:20 -05:00
Matthew Brost
5387e865d9 drm/xe: Add TLB invalidation fence after rebinds issued from execs
If we add an TLB invalidation fence for rebinds issued from execs we
should be able to drop the TLB invalidation from the ring operations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
2023-12-19 18:27:47 -05:00
Matthew Brost
bae8ddae18 drm/xe: Propagate VM unbind error to invalidation fence
If a VM unbind hits an error, do not issue a TLB invalidation and
propagate the error the invalidation fence.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
2023-12-19 18:27:47 -05:00
Matthew Brost
da3799c975 drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
Only the GuC should be issuing TLB invalidations if it is enabled. Part
of this patch is sanitize the device on driver unload to ensure we do
not send GuC based TLB invalidations during driver unload.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
2023-12-19 18:27:47 -05:00
Matthew Brost
332dd0116c drm/xe: Add range based TLB invalidations
If the platform supports range based TLB invalidations use them. Hide
these details in the xe_gt_tlb_invalidation layer.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
2023-12-19 18:27:46 -05:00
Matthew Brost
24b52db6ae drm/xe: Add TLB invalidation fence ftrace
This will help debug issues with TLB invalidation fences.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:27:46 -05:00
Matthew Brost
f4a8add94f drm/xe: Invalidate TLB after unbind is complete
This gets tricky as we can't do the TLB invalidation until the unbind
operation is done on the hardware and we can't signal the unbind as
complete until the TLB invalidation is done. To work around this we
create an unbind fence which does a TLB invalidation after unbind is
done on the hardware, signals on TLB invalidation completion, and this
fence is installed in the BO dma-resv slot and installed in out-syncs
for the unbind operation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:27:45 -05:00
Matthew Auld
b1e52b6571 drm/xe/ppgtt: fix scratch page usage on DG2
On DG2 when running the xe_vm IGT, the kernel generates loads of CAT
errors and GT resets (sometimes at least).  On small-bar systems seems
to trigger a lot more easily (maybe due to difference in allocation
strategy). Appears to be related to scratch, since we seem to use the
64K TLB hint on scratch entries, even though the scratch page is a 4K
vram page. Bumping the scratch page size and physical alignment seems
to fix it. Or at least we no longer hit:

[  148.872683] xe 0000:03:00.0: [drm] Engine memory cat error: guc_id=0
[  148.872701] xe 0000:03:00.0: [drm] Engine memory cat error: guc_id=0
[  148.875108] WARNING: CPU: 0 PID: 953 at drivers/gpu/drm/xe/xe_guc_submit.c:797

However to keep things simple, so we don't have to deal with 64K TLB
hints, just move the scratch page into system memory on platforms that
require 64K VRAM pages.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:27:44 -05:00
Matthew Auld
e63f81adcc drm/xe/ppgtt: clear the scratch page
We need to ensure we don't leak the contents to userspace.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:27:44 -05:00
Matthew Brost
60694edf66 drm/xe: Ensure VMA not userptr before calling xe_bo_is_stolen
Fix the below splat:

[  142.510525] [IGT] xe_exec_basic: starting subtest once-userptr
[  142.511339] BUG: kernel NULL pointer dereference, address: 0000000000000228
[  142.518311] #PF: supervisor read access in kernel mode
[  142.523458] #PF: error_code(0x0000) - not-present page
[  142.528604] PGD 0 P4D 0
[  142.531153] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  142.535518] CPU: 4 PID: 1199 Comm: kworker/u16:8 Not tainted 6.1.0-rc1-xe+ #1
[  142.542656] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  142.556033] Workqueue: events_unbound async_op_work_func [xe]
[  142.561810] RIP: 0010:xe_bo_is_stolen+0x0/0x20 [xe]
[  142.566709] Code: 20 c8 75 05 83 fa 07 74 05 c3 cc cc cc cc 48 8b 87 08 02 00 00 0f b6 80 2c ff ff ff c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 <48> 8b 87 28 02 00 00 83 78 10 07 0f 94 c0 c3 cc cc cc cc 66 66 2e
[  142.585447] RSP: 0018:ffffc900019eb888 EFLAGS: 00010246
[  142.590678] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff88813f6a2108
[  142.597821] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  142.604962] RBP: ffffc900019ebbc0 R08: 0000000000000001 R09: 0000000000000000
[  142.612101] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88814107d600
[  142.619242] R13: ffffc900019eba20 R14: ffff888140442000 R15: 0000000000000000
[  142.626378] FS:  0000000000000000(0000) GS:ffff88849fa00000(0000) knlGS:0000000000000000
[  142.634468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  142.640219] CR2: 0000000000000228 CR3: 000000010a4c0006 CR4: 0000000000770ee0
[  142.647361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  142.654505] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  142.661639] PKRU: 55555554
[  142.664367] Call Trace:
[  142.666830]  <TASK>
[  142.668947]  __xe_pt_bind_vma+0x1a1/0xa50 [xe]
[  142.673417]  ? unwind_next_frame+0x187/0x770
[  142.677699]  ? __thaw_task+0xc0/0xc0
[  142.681293]  ? __lock_acquire+0x5e4/0x26e0
[  142.685409]  ? lockdep_hardirqs_on+0xbf/0x140
[  142.689779]  ? lock_acquire+0xd2/0x310
[  142.693548]  ? mark_held_locks+0x49/0x80
[  142.697485]  ? xe_vm_bind_vma+0xf1/0x3d0 [xe]
[  142.701866]  xe_vm_bind_vma+0xf1/0x3d0 [xe]
[  142.706082]  xe_vm_bind+0x76/0x140 [xe]
[  142.709944]  vm_bind_ioctl+0x26f/0xb40 [xe]
[  142.714161]  ? async_op_work_func+0x20c/0x450 [xe]
[  142.718974]  async_op_work_func+0x20c/0x450 [xe]
[  142.723620]  process_one_work+0x263/0x580
[  142.727645]  ? process_one_work+0x580/0x580
[  142.731839]  worker_thread+0x4d/0x3b0
[  142.735518]  ? process_one_work+0x580/0x580
[  142.739714]  kthread+0xeb/0x120
[  142.742872]  ? kthread_complete_and_exit+0x20/0x20
[  142.747671]  ret_from_fork+0x1f/0x30
[  142.751264]  </TASK>

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-12 14:06:00 -05:00
Maarten Lankhorst
d8b52a02cb drm/xe: Implement stolen memory.
This adds support for stolen memory, with the same allocator as
vram_mgr. This allows us to skip a whole lot of copy-paste,
by re-using parts of xe_ttm_vram_mgr.

The stolen memory may be bound using VM_BIND, so it performs like any
other memory region.

We should be able to map a stolen BO directly using the physical memory
location instead of through GGTT even on old platforms, but I don't know
what the effects are on coherency.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-12 14:06:00 -05:00
Matthew Brost
dd08ebf6c3 drm/xe: Introduce a new DRM driver for Intel GPUs
Xe, is a new driver for Intel GPUs that supports both integrated and
discrete platforms starting with Tiger Lake (first Intel Xe Architecture).

The code is at a stage where it is already functional and has experimental
support for multiple platforms starting from Tiger Lake, with initial
support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
drivers), as well as in NEO (for OpenCL and Level0).

The new Xe driver leverages a lot from i915.

As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there. But it is not added
in this patch.

This initial work is a collaboration of many people and unfortunately
the big squashed patch won't fully honor the proper credits. But let's
get some git quick stats so we can at least try to preserve some of the
credits:

Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Co-developed-by: Francois Dugast <francois.dugast@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: José Roberto de Souza <jose.souza@intel.com>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2023-12-12 14:05:48 -05:00