Currently we are blocking processing of all save-restore rules
by the VFs inside the xe_rtp_process_to_sr() function, but we
want to unblock that to allow processing of the LRC WA rules.
To avoid hitting WARNs about reading an inaccessible registers by
the VFs, stop applying save-restore MMIOs action if VF, without
relying that SR list will be always empty for the VF.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-5-michal.wajdeczko@intel.com
Instead of handling the whitelist directly in the GuC ADS
initialization, make it follow the same logic as other engine registers
that are save-restored. Main benefit is that then the SW tracking then
shows it in debugfs and there's no risk of an engine workaround to write
to the same nopriv register that is being passed directly to GuC.
This means that xe_reg_whitelist_process_engine() only has to process
the RTP and convert them to entries for the hwe. With that all the
registers should be covered by xe_reg_sr_apply_mmio() to write to the HW
and there's no special handling in GuC ADS to also add these registers
to the list of registers that is passed to GuC.
Example for DG2:
# cat /sys/kernel/debug/dri/0000\:03\:00.0/gt0/register-save-restore
...
Engine
rcs0
...
REG[0x24d0] clr=0xffffffff set=0x1000dafc masked=no mcr=no
REG[0x24d4] clr=0xffffffff set=0x1000db01 masked=no mcr=no
REG[0x24d8] clr=0xffffffff set=0x0000db1c masked=no mcr=no
...
Whitelist
rcs0
REG[0xdafc-0xdaff]: allow read access
REG[0xdb00-0xdb1f]: allow read access
REG[0xdb1c-0xdb1f]: allow rw access
v2:
- Use ~0u for clr bits so it's just a write (Matt Roper)
- Simplify helpers now that unused slots are not written
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Currently xe_reg_sr_apply_whitelist() sets the unused values to a known
used value for no good reason: it could just leave it with the HW
default. The behavior is slightly different if there are no whitelist
registers for the engine as the function returns early. This is not
needed, so just drop the addition writes for the unused slots.
Later this will allow to reduce the amount of registers passed to GuC
for save/restore.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
That pool implementation doesn't really work: if the krealloc happens to
move the memory and return another address, the entries in the xarray
become invalid, leading to use-after-free later:
BUG: KASAN: slab-use-after-free in xe_reg_sr_apply_mmio+0x570/0x760 [xe]
Read of size 4 at addr ffff8881244b2590 by task modprobe/2753
Allocated by task 2753:
kasan_save_stack+0x39/0x70
kasan_save_track+0x14/0x40
kasan_save_alloc_info+0x37/0x60
__kasan_kmalloc+0xc3/0xd0
__kmalloc_node_track_caller_noprof+0x200/0x6d0
krealloc_noprof+0x229/0x380
Simplify the code to fix the bug. A better pooling strategy may be added
back later if needed.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
With xe_force_wake_get() now returning the refcount-incremented
domain mask, a non-zero return value in the case of XE_FORCEWAKE_ALL does
not necessarily indicate success. Use xe_force_wake_ref_has_domain()
to determine the status of the call.
Modify the return handling of xe_force_wake_get() accordingly and
pass the return value to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
v5
- return unsigned int from xe_force_wake_get()
v6
- use helper xe_force_wake_ref_has_domain()
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-22-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Forcewake is a general GT power management concept that isn't specific
to MMIO register access. Move the forcewake information for a GT out of
the 'mmio' substruct and into a 'pm' substruct. Also use the gt_to_fw()
helper in a few more places where it was being open-coded.
v2:
- Kerneldoc tweaks. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-46-matthew.d.roper@intel.com
Use xe_gt_dbg() instead of drm_dbg() so the GT is added to the log for
easy identification.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230906012053.1733755-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For all RTP actions, clr_bits is a superset of the bits being modified.
That's also why the check for "changing all bits" can be done with
`clr_bits + 1`. So always use clr_bits for setting the upper bits of a
masked register.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://lore.kernel.org/r/20230906012053.1733755-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If RING_MAX_NONPRIV_SLOTS denotes the maximum number of whitelisting
slots, then it makes sense to refuse going above it.
v2:
- Use xe_gt_err() instead of drm_err() for more detailed info in the
error message. (Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230609143815.302540-3-gustavo.sousa@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
All other parameters can be extracted from a single struct xe_hw_engine
reference. This removes redundancy and simplifies the code.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230609143815.302540-2-gustavo.sousa@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
DRM_ERROR() has been deprecated in favor of pr_err(). However, we should
prefer to use xe_gt_err() or drm_err() whenever possible so we get gt-
or device-specific output with the error message.
v2:
- Prefer drm_err() over pr_err(). (Matt, Jani)
v3:
- Prefer xe_gt_err() over drm_err() when possible. (Matt)
v4:
- Use the already available dev variable instead of xe->drm as
parameter to drm_err(). (Matt)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230601194419.1179609-1-gustavo.sousa@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Rename the address field to "addr" rather than "reg" so it's easier to
understand what it is.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230508225322.2692066-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Convert all the callers to deal with xe_mmio_*() using struct xe_reg
instead of plain u32. In a few places there was also a rename
s/reg/reg_val/ when dealing with the value returned so it doesn't get
mixed up with the register address.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230508225322.2692066-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Now that struct xe_reg and struct xe_reg_mcr are types that can be used
by xe, convert more of the driver to use them. Some notes about the
conversions:
- The RTP tables don't need the MASKED flags anymore in the
actions as that information now comes from the register
definition
- There is no need for the _XE_RTP_REG/_XE_RTP_REG_MCR macros
and the register types on RTP infra: that comes from the
register definitions.
- When declaring the RTP entries, there is no need anymore to
undef XE_REG and friends: the RTP macros deal with removing
the cast where needed due to not being able to use a compound
statement for initialization in the tables
- The index in the reg-sr xarray is the register offset only.
Otherwise we wouldn't catch mistakes about adding both a
MCR-style and normal-style registers. For that, the register
is now also part of the entry, so the options can be compared
to check for compatible entries.
In order to be able to accomplish this, some improvements are needed on
the RTP macros. Change its implementation to concentrate on "pasting a prefix
to each argument" rather than the more general "call any macro for each
argument". Hopefully this will avoid trying to extend this infra and
making it more complex. With the use of tuples for building the
arguments, it's not possible to pass additional register fields and
using xe_reg in the RTP tables.
xe_mmio_* still need to be converted, from u32 to xe_reg, but that is
left for another change.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-10-lucas.demarchi@intel.com
Link: https://lore.kernel.org/r/20230427223256.1432787-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
These should replace the _MMIO() and MCR_REG() from i915, with the goal
of being more extensible, allowing to pass the additional fields for
struct xe_reg and struct xe_reg_mcr. Replace all uses of _MMIO() and
MCR_REG() in xe.
Since the RTP, reg-save-restore and WA infra are not ready to use the
new type, just undef the macro like was done for the i915 types
previously. That conversion will come later.
v2: Remove MEDIA_SOFT_SCRATCH_COUNT/MEDIA_SOFT_SCRATCH re-added by
mistake (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The 'clear' field for register save/restore entries was being placed in
the value bits of the register rather than the mask bits; make sure it
gets shifted into the mask bits.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230419224909.4000920-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add some basic unit tests for rtp. This is intended to prove the
functionality of the rtp itself, like coalescing entries, rejecting
non-disjoint values, etc.
Contrary to the other tests in xe, this is a unit test to test the
sw-side only, so it can be executed on any machine - it doesn't interact
with the real hardware. Running it produces the following output:
$ ./tools/testing/kunit/kunit.py run --raw_output-kunit \
--kunitconfig drivers/gpu/drm/xe/.kunitconfig xe_rtp
...
[01:26:27] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
KTAP version 1
# Subtest: xe_rtp
1..1
KTAP version 1
# Subtest: xe_rtp_process_tests
ok 1 coalesce-same-reg
ok 2 no-match-no-add
ok 3 no-match-no-add-multiple-rules
ok 4 two-regs-two-entries
ok 5 clr-one-set-other
ok 6 set-field
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000001, set: 00000001, masked: no): ret=-22
ok 7 conflict-duplicate
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000003, set: 00000000, masked: no): ret=-22
ok 8 conflict-not-disjoint
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000002, set: 00000002, masked: no): ret=-22
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000001, set: 00000001, masked: yes): ret=-22
ok 9 conflict-reg-type
# xe_rtp_process_tests: pass:9 fail:0 skip:0 total:9
ok 1 xe_rtp_process_tests
# Totals: pass:9 fail:0 skip:0 total:9
ok 1 xe_rtp
...
Note that the ERRORs in the kernel log are expected since it's testing
incompatible entries.
v2:
- Use parameterized table for tests (Michał Winiarski)
- Move everything to the xe_rtp_test.ko and only add a few exports to the
right namespace
- Add more tests to cover FIELD_SET, CLR, partially true rules, etc
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com> # v1
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-7-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When there's an entry that is dropped when xe_reg_sr_add(), there's
not much we can do other than reporting the error - it's for certain a
driver issue or conflicting workarounds/tunings. Save the number of
errors to be used later by kunit to report where it happens.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-6-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add debugfs entry to dump the final tables with register save-restore
information.
For the workarounds, this has a format a little bit different than when the
values are applied because we don't want to read the values from the HW
when dumping via debugfs. For whitelist it just re-uses the print
function added for when the whitelist is being built.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230314003012.2600353-5-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Besides printing the various register save-restore, it's also useful to
know the register being allowed/denied access from unprivileged batch
buffers. Print them during device probe.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230314003012.2600353-4-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The dump function was originally added with the idea that it could be
re-used both for printing the reg-sr data and saving it to pass to GuC
via ADS. This was not used by the GuC integration, so remove it now to
give place to a new debug.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Create regs/xe_gt_regs.h file with all the registers and bit
definitions used by the xe driver. Eventually the registers may be
defined in a different way and since xe doesn't supported below gen12,
the number of registers touched is much smaller, so create a new header.
The definitions themselves are direct copy from the
gt/intel_gt_regs.h file, just sorting the registers by address.
Cleaning those up and adhering to a common coding style is left for
later.
v2: Make the change to MCR_REG location in a separate patch to go
through the i915 branch (Matt Roper / Rodrigo)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Create regs/xe_engine_regs.h file with all the registers and bit
definitions used by the xe driver. Eventually the registers may be
defined in a different way and since xe doesn't supported below gen12,
the number of registers touched is much smaller, so create a new header.
The definitions themselves are direct copy from the
gt/intel_engine_regs.h file, just sorting the registers by address.
Cleaning those up and adhering to a common coding style is left for
later.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sort includes and split them in blocks:
1) .h corresponding to the .c. Example: xe_bb.c should have a "#include
"xe_bb.h" first.
2) #include <linux/...>
3) #include <drm/...>
4) local includes
5) i915 includes
This is accomplished by running
`clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]`
and ignoring all the changes after the includes. There are also some
manual tweaks to split the blocks.
v2: Also sort includes in headers
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Xe, is a new driver for Intel GPUs that supports both integrated and
discrete platforms starting with Tiger Lake (first Intel Xe Architecture).
The code is at a stage where it is already functional and has experimental
support for multiple platforms starting from Tiger Lake, with initial
support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
drivers), as well as in NEO (for OpenCL and Level0).
The new Xe driver leverages a lot from i915.
As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there. But it is not added
in this patch.
This initial work is a collaboration of many people and unfortunately
the big squashed patch won't fully honor the proper credits. But let's
get some git quick stats so we can at least try to preserve some of the
credits:
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Co-developed-by: Francois Dugast <francois.dugast@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: José Roberto de Souza <jose.souza@intel.com>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>