Now that the PAT settings for the new special entries introduced by Xe2
are decided during early software init and left NULL on platforms they
don't apply to, there's no need to keep separate programming functions
for pre-Xe2 and post-Xe2 platforms. Consolidate down to a single pair
of programming functions (mcr and non-mcr) that can be used on any
platform.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250613214751.792066-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Decide whether programming of the special ATS and PTA PAT entries is
necessary (and which entries should be programmed) during early software
initialization rather than hardcoding this into the 'program' functions.
Future platforms may want to re-use the same functions but utilize
different special entry values. Consolidating all of the decisions
into one place keeps things simple.
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250613214751.792066-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Dealing with CCS state is significant on LNL+, where we end up clearing
the compression state on every page alloc using the blitter for user
buffers, including also saving and restoring it when moving between
domains, plus we need to alloc extra pages to hold the raw CCS state for
the save step.
However all compression PAT modes, on platforms like LNL, also require
coh_none, meaning that only WC memory can use compression in the first
place. With this we can be sneaky and completely ignore CCS for WB
buffers, which is likely the common case anyway. This would then skip
all blitter moves/clears between sys <-> tt and then also means we can
drop the extra CCS pages.
This should be safe since there is no way to interact with the
compression state (potentially uncleared) without using a PAT enabled
index (which is rejected at bind), including if trying to be malicious
and copy the raw CCS state from userpace, which should give back all
zeroes if the src surface (indirect) is lacking compressed PAT index.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250516153810.223530-2-matthew.auld@intel.com
xe_force_wake_get() now returns the reference count-incremented domain
mask. If it fails for individual domains, the return value will always
be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even
in the event of failure. Update the return handling of xe_force_wake_get()
to reflect this behavior, and ensure that the return value is passed as
input to xe_force_wake_put().
v3
- return xe_wakeref_t instead of int in xe_force_wake_get()
- xe_force_wake_put() error doesn't need to be checked. It internally
WARNS on domain ack failure.
- don't use xe_assert() to report HW errors (Michal)
v5
- return unsigned int from xe_force_wake_get()
- remove redundant warns
v7
- Fix commit message
- Remove redundant header
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-20-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There is an implicit assumption in the driver that compression and
coh_1way+ are mutually exclusive. If this is ever not true then userptr
and imported dma-buf from external device will have uncleared ccs state.
Add a build bug for this so we don't forget.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240828092257.169063-2-matthew.auld@intel.com
This involves enabling l2 caching of host side memory access to VRAM
through the CPU BAR. The main fallout here is with display since VRAM
writes from CPU can now be cached in GPU l2, and display is never
coherent with caches, so needs various manual flushing. In the case of
fbc we disable it due to complications in getting this to work
correctly (in a later patch).
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703124338.208220-3-matthew.auld@intel.com
While xe_force_wake.h is now included from the xe_device.h, we
want to drop that include as we don't need it there. Explicitly
include xe_force_wake.h where needed.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240507110959.2747-3-michal.wajdeczko@intel.com
We may leave pat.ops unset when running on brand new platform or
when running as a VF. While the former is unlikely, the latter
is valid (future) use case and will cause NPD when someone will
try to dump PAT settings by debugfs.
It's better to check pointer to pat.ops instead of specific .dump
hook, as we have this hook always defined for every .ops variant.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409105106.1067-2-michal.wajdeczko@intel.com
Make sure that pat.ops (if selected) has all required function
pointers setup. Only .program_media may be omitted if we have
older media version.
This should help avoid late runtime checks against individual
function pointers.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409105106.1067-1-michal.wajdeczko@intel.com
Discrete Xe2 platforms require programming of one additional row of PAT
settings which controls the access characteristics for PPGTT and LMTT
page tables. Integrated GPUs do not need this programming and will
leave the register at its hardware default value.
Bspec: 71582
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240408170545.3769566-6-balasubramani.vivekanandan@intel.com
For indirect accessed buffer use compression enabled PAT index.
v2:
- Fix parameter name.
v3:
- use a relevant define instead of fix number.
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Future uapi needs to give userspace the ability to select the pat_index
for a given vm_bind. However we need to be able to extract the coherency
mode from the provided pat_index to ensure it's compatible with the
cpu_caching mode set at object creation. There are various security
reasons for why this matters. However the pat_index itself is very
platform specific, so seems reasonable to annotate each platform
definition of the pat table. On some older platforms there is no
explicit coherency mode, so we just pick whatever makes sense.
v2:
- Simplify with COH_AT_LEAST_1_WAY
- Add some kernel-doc
v3 (Matt Roper):
- Some small tweaks
v4:
- Rebase
v5:
- Rebase on Xe2 PAT additions
v6:
- Rebase on removal of coh_mode from uapi
Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Cc: Zhengguo Xu <zhengguo.xu@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We don't seem to use the 4-7 pat indexes, even though they are defined
by the HW. In a future patch userspace will be able to directly set the
pat_index as part of vm_bind and we don't want to allow setting 4-7.
Simplest is to just ignore them here.
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This is useful to debug cache issues, to double check if the PAT
indexes match what they were supposed to be set to from spec.
v2: Add separate functions for XeHP, XeHPC and XeLPG so it correctly
reads the index based on MCR/REG registers and also decodes the
fields (Matt Roper)
v3: Starting with XeHPC, do not translate values to human-readable
formats as the main goal is to make it easy to compare the table
with the spec. Also, share a single array for xelp/xehp str map
(Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231006182325.3617685-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The PAT tables become significantly more complicated on Xe2 platforms.
They now control L3, L4, and coherency settings, as well as additional
characteristics such as compression.
Aside from the main PAT table, there's an additional register that
also needs to be programmed with PAT settings for PCI Address
Translation Services.
Bspec: 71582
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20231006182325.3617685-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Some of the PAT entries are relevant for internal driver use, which
varies per platform. Let the PAT early initialization set what they
should point to so the rest of the driver can use them where needed.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Both DG2 and PVC are derived from XeHP, but DG2 should not really
re-use something introduced by PVC, so it's odd to have DG2 re-using the
PVC programming for PAT. Let's prefer using the architecture and/or IP
names.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
DG2 should use the MCR variant to program the PAT registers, like PVC,
but shouldn't use the same table as PVC.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Split the PAT initialization between SW-only and HW. The _early() only
sets up the ops and data structure that are used later to program the
tables. This allows the PAT to be easily extended to other platforms.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230927193902.2849159-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Use 0 in format string instead of space so it shows as
[drm] *ERROR* Missing PAT table for platform with graphics version 20.04!
instead of
[drm] *ERROR* Missing PAT table for platform with graphics version 20. 4!
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20230906193009.1912129-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Convert all the callers to deal with xe_mmio_*() using struct xe_reg
instead of plain u32. In a few places there was also a rename
s/reg/reg_val/ when dealing with the value returned so it doesn't get
mixed up with the register address.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230508225322.2692066-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
These should replace the _MMIO() and MCR_REG() from i915, with the goal
of being more extensible, allowing to pass the additional fields for
struct xe_reg and struct xe_reg_mcr. Replace all uses of _MMIO() and
MCR_REG() in xe.
Since the RTP, reg-save-restore and WA infra are not ready to use the
new type, just undef the macro like was done for the i915 types
previously. That conversion will come later.
v2: Remove MEDIA_SOFT_SCRATCH_COUNT/MEDIA_SOFT_SCRATCH re-added by
mistake (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Clarify a few things related to the PAT programming, particularly on
MTL:
- The register type doesn't change depending on the GT - what
happens is that media GT writes to other set of registers that
are not MCR
- Remove "UNICAST": otherwise it's confusing why it's not using
MCR registers with the unicast function variant
Also, there isn't much reason to keep those parts as macros: promote
them to proper functions and let the compiler inline if it sees fit.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The tables are only used within this file; there's no reason for them
not to be static.
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230327175824.2967914-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Starting with MTL, the number of entries in the PAT table increased to
16. The register offset jumped between index 7 and index 8, so a slight
adjustment is needed to ensure the PAT_INDEX macros select the proper
offset for the upper half of the table.
Note that although there are 16 registers in the hardware, the driver is
currently only asked to program the first 5, and we leave the rest at
their hardware default values. That means we don't actually touch the
upper half of the PAT table in the driver today and this patch won't
have any functional effect [yet].
Bspec: 44235
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Re-sync our MTL PAT table with the bspec. 1-way coherency should only
be set on table entry 3. We do not want an incorrect setting here to
accidentally paper over other bugs elsewhere in the driver.
Bspec: 45101
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The PAT_INDEX registers are MCR registers on some platforms and unicast
on others. On MTL the handling even varies between GTs: the primary GT
uses MCR registers while the media GT uses unicast registers. Let's add
proper MCR programming on the relevant platforms/GTs.
Given that we PAT tables to change pretty regularly on future platforms,
we'll make PAT programming an exception to the usual model of assuming
new platforms should inherit the previous platform's behavior. Instead
we'll raise a warning if the current platform isn't handled in the
if/else ladder. This should help prevent subtle cache misbehavior if we
forget to add the table for a new platform.
Bspec: 66534, 67609, 67788
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Provide per-platform tables of PAT values rather than per-platform
functions. This will simplify the handling of unicast vs MCR registers
in the upcoming patches.
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
PAT handling is growing in complexity and will continue to do so in
upcoming platforms. Separate it out to a dedicated file to keep things
tidy.
The code is moved as-is here (aside from a few unused #define's that are
just dropped); further changes will come in future patches.
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>